ATRAC3: Difference between revisions

From MultimediaWiki
m (Link to the original ATRAC codec page.)
 
(24 intermediate revisions by 4 users not shown)
Line 1: Line 1:
* FOURCC: atrc
* Format tag: 0x270
* Company: [[Real]]
* Company: [[Sony]]
* Samples: http://samples.mplayerhq.hu/real/AC-atrc/
* Samples: http://samples.mplayerhq.hu/A-codecs/ATRAC3/


Found in some old [[RealMedia]] files. The same as the [[Sony ATRAC]].


= ATRAC3 Introduction =
= ATRAC3 Introduction =


ATRAC (Adaptive TRansform Acoustic Coding) is the collective name for audio compression technologies
ATRAC3 is the next generation of the [[ATRAC]] codec. There are three major implementations for the PC:
developed by [[Sony]]. This codec family includes the following codecs today: ATRAC, ATRAC3,
[[RealAudio atrc]], the Sony ATRAC3 for [[Microsoft Audio Compression Manager API|Audio Compression Manager]] (ACM) and the Sonic Stage implementation.
ATRAC3plus and ATRAC Advanced lossless.
You can read about it at http://www.sony.net/Products/ATRAC3/overview/index.html#family
 
The ATRAC codec was introduced in 1992 with the MiniDisc. There is a good description at
http://www.minidisc.org/aes_atrac.html. It is used in MiniDisc portable players
by many companies.
 
ATRAC3 is the next generation of the ATRAC codec. There are three major implementations for the PC:
RealAudio8 atrc, the Sony ATRAC3 for [[Microsoft Audio Compression Manager API|Audio Compression Manager]] (ACM) and the Sonic Stage implementation.


ATRAC3 supports several different constant bitrates ("flavors"). The following table shows the
ATRAC3 supports several different constant bitrates ("flavors"). The following table shows the
Line 35: Line 25:
== Encoding algorithm ==
== Encoding algorithm ==


* Split the input signal into four bands using Quadrature mirror filter (QMF).
* Split the input signal into 4 bands using a Quadrature mirror filter (QMF).
* Gain control analyze and obtain gain control data.
* Perform gain control analysis to obtain gain control data.
* Convert all four bands into frequency domain using Modified Cosine Transform (MDCT or MLT).
* Convert all four bands into frequency domain using Modified Cosine Transform (MDCT or MLT).
* Find tonal components.
* Find tonal components.
* quantization
* Quantization
* Encode the bitstream.
* Encode the bitstream.
Even though this is for ATRAC2 (http://www.minidisc.org/atrac2.html) most of it applies to ATRAC3.


== Decoding algorithm ==
== Decoding algorithm ==
Line 59: Line 51:
and encodes them separate from the less important spectral data. A tone component is a group of
and encodes them separate from the less important spectral data. A tone component is a group of
consecutive spectral coefficients, described with parameters such as location and with. This allows
consecutive spectral coefficients, described with parameters such as location and with. This allows
finer qantization of such coefficients than a quantization within fixed subbands.
finer quantization of such coefficients than a quantization within fixed subbands.


== Joint-stereo mode ==
== Joint-stereo mode ==
Line 80: Line 72:
  --------------------------------------
  --------------------------------------


== Bitstream details ==
= Decoding Specification =
 
== Bitstream parsing ==
 
Parts is '''bold''' mean that a certain amount of bits are to be consumed from the bitstream.


===Header===
===Header===
Line 101: Line 97:


The presence of tonal components is indicated by the following field:
The presence of tonal components is indicated by the following field:
* '''numToneComp (5 bits)''' - number of coded tonal components. The value of 0 indicates no coded tonal components.
* '''numToneComp (5 bits)''' - Number of coded tonal components. The value of 0 indicates no coded tonal components.
To be continued...
* '''coding_mode_selector(2 bits)''' -- If this is equal to 2, return error. If this is equal to 3 then every component has it's own bit to select the coefficients coding mode. (VLC/CLC). If this is equal to 1 then all the components are CLC coded. If this is 0 all components are VLC coded. (coding_mode)


* For each tonal component
** For each number of bands, get band flags
*** '''band_flags (1 bit)''' -- Flag per band in the Tonal Component to be processed
** '''coded_values (3 bits)''' -- amount of coded coefficients
** '''quant_step_index (3 bits)''' -- index into the quant step table, if it is less then/equal to 1 then return error
** if coding_mode_selector is 3
*** '''coding_mode (1 bit)''' -- get the bands coding mode (CLC/VLC)


===Other spectral coefficients===
===Other spectral coefficients===
Line 120: Line 123:
Then follows the codes for each spectral coefficient in this subband. The VLC codes are shown below.
Then follows the codes for each spectral coefficient in this subband. The VLC codes are shown below.


= Decoding Specification =
== Scrambling ==
In [[RealMedia]] files the bitstream is scrambled. To unscramble the stream, perform a XOR on every 32 bits in the frame. The hex value to XOR with is 0x537F6103.
== Extra data format ==
In [[RealMedia]] files the extra data is as follows (big-endian order):
INT32 id, always 4
INT16 samples per frame, always 1024 * 2
INT16 delay, not used but always 0x88E
INT16 stereo coding mode, 2 - normal stereo, 0x12 - joint stereo
The length of this data is always 10 bytes.


== Transforms ==
== Transforms ==
Line 148: Line 135:
* 11.025 to 22.05 kHz (''f''/4 to ''f''/2)
* 11.025 to 22.05 kHz (''f''/4 to ''f''/2)


==== QMF window ====
The coeffs used in the QMF filter.
float qmf_48tap_half[24] = {
  -0.00001461907, -0.00009205479, -0.000056157569, 0.00030117269,
  0.0002422519,-0.00085293897, -0.0005205574, 0.0020340169,
  0.00078333891, -0.0042153862, -0.00075614988, 0.0078402944,
  -0.000061169922, -0.01344162, 0.0024626821, 0.021736089,
  -0.007801671, -0.034090221, 0.01880949, 0.054326009,
  -0.043596379, -0.099384367, 0.13207909, 0.46424159
};
These coeffs need to be mirrored and scaled by 2.
for (i=0 ; i<24; i++) {
  s = qmf_48tap_half[i] * 2.0;
  qmf_window[i] = s;
  qmf_window[47 - i] = s;
}


=== MLT ===
=== MLT ===
Line 171: Line 179:
== Huffman coding ==
== Huffman coding ==


VLC coding is used to compress the spectral coefficients.
VLC coding is used to compress the tonal and spectral coefficients.


=== Huffman tables ===
=== Huffman tables ===
Line 236: Line 244:
  };
  };


[[Category:Undiscovered Audio Codecs]]
[[Category:Audio Codecs]]
[[Category:Audio Codecs]]
[[Category: QMF Audio Codecs]]
[[Category: MDCT Audio Codecs]]

Latest revision as of 06:57, 29 November 2018


ATRAC3 Introduction

ATRAC3 is the next generation of the ATRAC codec. There are three major implementations for the PC: RealAudio atrc, the Sony ATRAC3 for Audio Compression Manager (ACM) and the Sonic Stage implementation.

ATRAC3 supports several different constant bitrates ("flavors"). The following table shows the bitrate, the size of a frame and the coding mode for each flavor respectively:

No             bitrate   frame size (stereo)     coding mode   samples per frame
--   -----------------   -------------------   -------------   -----------------
0     66 kbps  (66150)             192 bytes    joint stereo    1024 per channel
1     94 kbps  (93713)             272 bytes    joint stereo    1024 per channel
2    105 kbps (104738)             304 bytes   normal stereo    1024 per channel
3    132 kbps (132300)             384 bytes   normal stereo    1024 per channel
4    146 kbps (146081)             424 bytes   normal stereo    1024 per channel
5    176 kbps (176400)             512 bytes   normal stereo    1024 per channel
6    264 kbps (264600)             768 bytes   normal stereo    1024 per channel
7    352 kbps (352800)            1024 bytes   normal stereo    1024 per channel
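
The bitrate column follows from the frame size and the 1024 samples per channel carried by each frame. The sketch below reproduces the numbers in parentheses; it assumes a 44.1 kHz sample rate (implied by the QMF band edges listed further down, not stated in the table), and the function name is made up for illustration.

```c
/* Nominal bitrate in bits per second of a stream whose stereo frames are
 * frame_bytes long, rounded to the nearest integer.
 * Assumption: every frame carries 1024 samples per channel, as in the table. */
long atrac3_bitrate_bps(long frame_bytes, long sample_rate)
{
    const long samples_per_frame = 1024;
    long bits_times_rate = frame_bytes * 8 * sample_rate;

    /* Adding half the divisor rounds to nearest instead of truncating. */
    return (bits_times_rate + samples_per_frame / 2) / samples_per_frame;
}
```

For example, `atrac3_bitrate_bps(192, 44100)` gives 66150, matching flavor 0.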

Encoding algorithm

  • Split the input signal into 4 bands using a Quadrature mirror filter (QMF).
  • Perform gain control analysis to obtain gain control data.
  • Convert all four bands into the frequency domain using the Modified Discrete Cosine Transform (MDCT, also called MLT).
  • Find tonal components.
  • Quantize the coefficients.
  • Encode the bitstream.

The description at http://www.minidisc.org/atrac2.html was written for ATRAC2, but most of it applies to ATRAC3 as well.

Decoding algorithm

  • Parse the bitstream and extract the following:
    • gain control data
    • tonal components
    • quantized spectral coefficients
  • Inverse-quantize the tonal components and spectral coefficients.
  • Merge tonal components and other spectral coefficients together.
  • Reconstruct the time-domain signal using the inverse MDCT.
  • Apply gain compensation.
  • Apply the QMF synthesis filter to reconstruct the sound.

Tonal components

ATRAC3 extracts the psychoacoustically important tonal components from the input signal spectra and encodes them separately from the less important spectral data. A tone component is a group of consecutive spectral coefficients, described with parameters such as location and width. This allows finer quantization of such coefficients than quantization within fixed subbands.

Joint-stereo mode

ATRAC3 uses joint-stereo coding at low bitrates (66 and 94 kbps) to achieve better compression.

Bitstream overview

The ATRAC3 bitstream consists of so-called "Channel Sound Units". In stereo mode there are two such units. The structure of a unit is shown below:

--------------------------------------
| Header                             |
--------------------------------------
| Gain compensation data             |
--------------------------------------
| Tonal components                   |
--------------------------------------
| Other spectral coefficients        |
--------------------------------------

Decoding Specification

Bitstream parsing

Parts in bold indicate that the given number of bits is to be consumed from the bitstream.

Header

If not in the joint-stereo mode, this header should be interpreted as follows:

  • id (6 bits) - should contain the value 0x28
  • nBandsCoded (2 bits) - number of coded QMF bands, stored minus one: the value of 0 indicates one coded band.
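
A minimal sketch of reading this header, for the non-joint-stereo case only. The `BitReader` type and both function names are hypothetical helpers, not part of any existing API, and the sketch assumes that bits are consumed most significant bit first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; reads past the end return zero bits. */
typedef struct {
    const uint8_t *data;
    size_t size;   /* buffer length in bytes */
    size_t pos;    /* current position in bits */
} BitReader;

static unsigned read_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0) {
        unsigned bit = 0;
        if (br->pos < br->size * 8)
            bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Returns 0 on success and stores the number of coded QMF bands (1..4),
 * or returns -1 if the 6-bit id is not 0x28. */
int parse_unit_header(BitReader *br, int *bands_coded)
{
    unsigned id = read_bits(br, 6);
    unsigned n  = read_bits(br, 2);

    if (id != 0x28)
        return -1;
    *bands_coded = (int)n + 1;   /* a stored 0 means one coded band */
    return 0;
}
```

The byte 0xA2 (binary 101000 10) would parse as id 0x28 with three coded bands.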


Gain compensation data

For each coded QMF band (see nBandsCoded above) the following data will be transmitted:

  • numGainData (3 bits) - number of gain change points coded as level/location pairs. Value of 0 indicates no coded pairs. Each coded pair consists of the following fields:
    • levcode (4 bits) - level code
    • loccode (5 bits) - location code

This data is identical to the gain control tool from the MPEG AAC SSR profile, which was also developed by Sony. Please refer to the section "Gain compensation" below for a description of how to interpret this data.
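
The field layout above can be sketched as a small parser. The `BitReader` and `GainData` types and the function names are hypothetical, bits are assumed to be read most significant bit first, and `bands_coded` is taken to be the number of coded QMF bands (1 to 4). The codes are stored as read; interpreting them is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QMF_BANDS   4
#define MAX_GAIN_POINTS 7   /* a 3-bit count cannot exceed 7 */

/* Hypothetical MSB-first bit reader; reads past the end return zero bits. */
typedef struct {
    const uint8_t *data;
    size_t size;   /* buffer length in bytes */
    size_t pos;    /* current position in bits */
} BitReader;

static unsigned read_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0) {
        unsigned bit = 0;
        if (br->pos < br->size * 8)
            bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

typedef struct {
    int num_points;                 /* numGainData */
    int levcode[MAX_GAIN_POINTS];   /* 4-bit level codes */
    int loccode[MAX_GAIN_POINTS];   /* 5-bit location codes */
} GainData;

/* Reads the gain data of every coded QMF band into gain[0..bands_coded-1]. */
void parse_gain_data(BitReader *br, GainData gain[], int bands_coded)
{
    for (int b = 0; b < bands_coded; b++) {
        gain[b].num_points = (int)read_bits(br, 3);
        for (int j = 0; j < gain[b].num_points; j++) {
            gain[b].levcode[j] = (int)read_bits(br, 4);
            gain[b].loccode[j] = (int)read_bits(br, 5);
        }
    }
}
```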


Tonal components

The presence of tonal components is indicated by the following field:

  • numToneComp (5 bits) - Number of coded tonal components. The value of 0 indicates no coded tonal components.
  • coding_mode_selector (2 bits) -- selects how the coefficients of the tonal components are coded (coding_mode). 0: all components are VLC coded. 1: all components are CLC coded. 2: invalid, return an error. 3: every component has its own bit to select the coding mode (VLC/CLC).
  • For each tonal component
    • For each band, get its band flag
      • band_flags (1 bit) -- flag per band in the tonal component to be processed
    • coded_values (3 bits) -- number of coded coefficients
    • quant_step_index (3 bits) -- index into the quant step table; if it is less than or equal to 1, return an error
    • if coding_mode_selector is 3
      • coding_mode (1 bit) -- the band's coding mode (CLC/VLC)
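
The selector logic can be written as a small helper. This is only a sketch: the names are hypothetical, and the meaning of the per-component bit (0 for VLC, 1 for CLC) is an assumption borrowed from the codingMode field of the spectral coefficient block, not something the list above states.

```c
enum {
    MODE_ERROR = -1,
    MODE_VLC   = 0,   /* variable length codes */
    MODE_CLC   = 1    /* constant length codes */
};

/* Maps coding_mode_selector to the coding mode of one tonal component.
 * per_component_bit is only consulted when the selector is 3. */
int resolve_coding_mode(int selector, int per_component_bit)
{
    switch (selector) {
    case 0:
        return MODE_VLC;
    case 1:
        return MODE_CLC;
    case 3:
        /* Assumption: bit value 1 selects CLC, 0 selects VLC. */
        return per_component_bit ? MODE_CLC : MODE_VLC;
    default:
        return MODE_ERROR;   /* selector 2 is invalid */
    }
}
```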

Other spectral coefficients

The coefficients coded in this block are assumed not to be "tonal" (noise, etc.). They are quantized and coded within fixed subbands. ATRAC3 divides the whole MDCT spectrum (1024 points) into 32 subbands of unequal width (the higher the frequency, the wider the band). For each subband ATRAC3 transmits a scalefactor index and the VLC codes of its quantized spectral coefficients. The format of this block is shown below:

  • numSubbands (5 bits) - number of coded subbands. The value of 0 indicates no coded subbands.
  • codingMode (1 bit) - value indicates the coding mode for ALL subbands:
0 - coefficients are coded using variable length codes (VLC)
1 - coefficients are coded using constant length codes (CLC)

Then follows the array of coding table indexes for each coded subband:

  • tblIndex (3 bits) - indicates the coding table used (VLC) or number of bits used (CLC). The value of "0" indicates "skipped" (not coded) subband.

Then follows the array of scalefactor indexes for each coded subband:

  • sfIndex (6 bits) - indicates the index into scalefactor decoding table (see below).

Then follow the codes for each spectral coefficient in this subband. The VLC codes are shown below.


Transforms

QMF

Three stacked Quadrature Mirror Filters are used to split the signal into 4 different frequency bands.

  • 0 to 2.75625 kHz (DC to f/16)
  • 2.75625 to 5.5125 kHz (f/16 to f/8)
  • 5.5125 to 11.025 kHz (f/8 to f/4)
  • 11.025 to 22.05 kHz (f/4 to f/2)
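
The band edges above follow from halving the upper edge three times, once per stacked filter stage. A small sketch, assuming that each stage splits the lower half of the previous stage again (which is what the band list implies); the function name is hypothetical.

```c
/* Fills edges[0..4] with the boundaries, in Hz, of the four QMF bands
 * for a sample rate of fs Hz. Band k spans edges[k] to edges[k + 1]. */
void qmf_band_edges(double fs, double edges[5])
{
    double upper = fs / 2.0;   /* Nyquist frequency, top of band 3 */

    edges[4] = upper;
    for (int k = 3; k >= 1; k--) {   /* one halving per stacked QMF stage */
        upper /= 2.0;
        edges[k] = upper;            /* fs/4, fs/8, fs/16 */
    }
    edges[0] = 0.0;                  /* DC */
}
```

With `fs = 44100` this yields 0, 2756.25, 5512.5, 11025 and 22050 Hz.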


QMF window

The coefficients used in the QMF filter (one half of the symmetric 48-tap window):

float qmf_48tap_half[24] = {
  -0.00001461907, -0.00009205479, -0.000056157569, 0.00030117269,
  0.0002422519,-0.00085293897, -0.0005205574, 0.0020340169,
  0.00078333891, -0.0042153862, -0.00075614988, 0.0078402944,
  -0.000061169922, -0.01344162, 0.0024626821, 0.021736089,
  -0.007801671, -0.034090221, 0.01880949, 0.054326009,
  -0.043596379, -0.099384367, 0.13207909, 0.46424159
};

These coefficients need to be mirrored and scaled by 2 to obtain the full 48-tap window.

float qmf_window[48];
float s;
int i;

/* the window is symmetric: tap i equals tap 47 - i */
for (i = 0; i < 24; i++) {
  s = qmf_48tap_half[i] * 2.0;
  qmf_window[i] = s;
  qmf_window[47 - i] = s;
}

MLT

The transform is a regular MDCT.

Windows

The overlapping window is not the same for encoding and decoding. Perfect reconstruction is ensured by the encoding and decoding windows having an inverse relation. Technical details can be found in H. Malvar's paper "Fast algorithms for orthogonal modulated lapped transforms" [1].

Encoding
for (i = 0; i < 256; i++) {
  we[i] = (sin(((i + 0.5) / 256 - 0.5) * PI) + 1.0) * 0.5;
}
Decoding
for (i = 0; i < 256; i++) {
  /* ^ is XOR in C, so the squares are written as products */
  wd[i] = we[i] / (we[i] * we[i] + we[255 - i] * we[255 - i]);
}

Huffman coding

VLC coding is used to compress the tonal and spectral coefficients.

Huffman tables

huffcode1[9] = {
  0x0,0x4,0x5,0xC,0xD,0x1C,0x1D,0x1E,0x1F,
};

huffbits1[9] = {
  1,3,3,4,4,5,5,5,5,
};

huffcode2[5] = {
  0x0,0x4,0x5,0x6,0x7,
};

huffbits2[5] = {
  1,3,3,3,3,
};

huffcode3[7] = {
  0x0,0x4,0x5,0xC,0xD,0xE,0xF,
};

huffbits3[7] = {
  1,3,3,4,4,4,4,
};

huffcode4[9] = {
  0x0,0x4,0x5,0xC,0xD,0x1C,0x1D,0x1E,0x1F,
};

huffbits4[9] = {
  1,3,3,4,4,5,5,5,5,
};

huffcode5[15] = {
  0x0,0x2,0x3,0x8,0x9,0xA,0xB,0xC,0xD,0x1C,0x1D,0x3C,0x3D,0x3E,0x3F,
};

huffbits5[15] = {
  2,3,3,4,4,4,4,4,4,5,5,6,6,6,6,
};

huffcode6[31] = {
  0x0,0x2,0x3,0x4,0x5,0x6,0x7,0x8,0x9,0x14,0x15,0x16,0x17,0x18,0x19,0x34,0x35,
  0x36,0x37,0x38,0x39,0x3A,0x3B,0x78,0x79,0x7A,0x7B,0x7C,0x7D,0x7E,0x7F,
};

huffbits6[31] = {
  3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,
};

huffcode7[63] = {
  0x0,0x2,0x3,0x8,0x9,0xA,0xB,0xC,0xD,0xE,0xF,0x10,0x11,0x24,0x25,0x26,0x27,0x28,
  0x29,0x2A,0x2B,0x2C,0x2D,0x2E,0x2F,0x30,0x31,0x32,0x33,0x68,0x69,0x6A,0x6B,0x6C,
  0x6D,0x6E,0x6F,0x70,0x71,0x72,0x73,0x74,0x75,0xEC,0xED,0xEE,0xEF,0xF0,0xF1,0xF2,
  0xF3,0xF4,0xF5,0xF6,0xF7,0xF8,0xF9,0xFA,0xFB,0xFC,0xFD,0xFE,0xFF,
};

huffbits7[63] = {
  3,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,
  7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
};
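
A minimal sketch of how one codeword could be matched against a code/length table pair such as the ones above, shown here with table 2. It assumes bits are read most significant bit first and returns the index of the matching table entry; mapping that index to a coefficient value is not covered on this page and is left out. The `BitReader` type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Table 2 from the listing above, with explicit element types. */
const uint8_t huffcode2[5] = { 0x0, 0x4, 0x5, 0x6, 0x7 };
const uint8_t huffbits2[5] = { 1, 3, 3, 3, 3 };

/* Hypothetical MSB-first bit reader; reads past the end return zero bits. */
typedef struct {
    const uint8_t *data;
    size_t size;   /* buffer length in bytes */
    size_t pos;    /* current position in bits */
} BitReader;

static unsigned read_bit(BitReader *br)
{
    unsigned bit = 0;
    if (br->pos < br->size * 8)
        bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
    br->pos++;
    return bit;
}

/* Reads one codeword bit by bit and returns the index of the entry whose
 * code and length both match, or -1 if none matches within 8 bits
 * (the longest code in the tables above is 8 bits). */
int decode_vlc(BitReader *br, const uint8_t *codes, const uint8_t *bits,
               int count)
{
    unsigned code = 0;

    for (int len = 1; len <= 8; len++) {
        code = (code << 1) | read_bit(br);
        for (int i = 0; i < count; i++)
            if (bits[i] == len && codes[i] == code)
                return i;
    }
    return -1;
}
```

A linear search like this is slow but easy to follow; a real decoder would more likely build a lookup table from the same code and length arrays.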