ATRAC3plus
- Format tag: uses WAVE_FORMAT_EXTENSIBLE with the "SubFormat" field set to the following GUID: E923AABF-CB58-4471-A119-FFFA01E4CE62
- Company: Sony
- Samples: http://samples.mplayerhq.hu/A-codecs/ATRAC3+/
- Stored in: WAV and Oma/Omg containers.
- Official information: http://www.sony.net/Products/ATRAC3/tech/atrac3plus.html
ATRAC3plus introduction
ATRAC3plus is a proprietary audio compression algorithm developed by Sony. Like ATRAC3, ATRAC3plus is a successor of the original ATRAC codec introduced in 1992 with the MiniDisc. The codec is commonly used in newer MiniDisc players and in Sony's PlayStation Portable consoles.
Streams coded with ATRAC3plus are usually stored either in the WAV container (although those files carry the ".at3" extension) or in Sony's proprietary Oma/Omg container. In the case of the WAV container, the undocumented GUID:
E923AABF-CB58-4471-A119-FFFA01E4CE62
is used in order to indicate the ATRAC3plus codec.
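As a minimal sketch of how a demuxer might recognize such a file, the C fragment below checks the payload of a WAV "fmt " chunk. The function name is hypothetical; the format tag value 0xFFFE and the SubFormat offset of 24 bytes come from the standard WAVE_FORMAT_EXTENSIBLE layout, and the byte order of the GUID follows the usual on-disk convention (first three fields little-endian).

```c
#include <stdint.h>
#include <string.h>

/* E923AABF-CB58-4471-A119-FFFA01E4CE62 as stored in a WAV file:
 * the first three GUID fields are little-endian, the last 8 bytes verbatim. */
static const uint8_t atrac3plus_guid[16] = {
    0xBF, 0xAA, 0x23, 0xE9, 0x58, 0xCB, 0x71, 0x44,
    0xA1, 0x19, 0xFF, 0xFA, 0x01, 0xE4, 0xCE, 0x62
};

#define WAVE_FORMAT_EXTENSIBLE 0xFFFE
#define SUBFORMAT_OFFSET       24   /* offset of SubFormat in the "fmt " payload */

/* Hypothetical helper: returns 1 if the "fmt " chunk payload announces
 * ATRAC3plus, 0 otherwise. */
static int is_atrac3plus_fmt(const uint8_t *fmt, size_t size)
{
    if (size < SUBFORMAT_OFFSET + 16)
        return 0;                               /* too short for the extensible form */
    if ((fmt[0] | (fmt[1] << 8)) != WAVE_FORMAT_EXTENSIBLE)
        return 0;                               /* wFormatTag is little-endian */
    return memcmp(fmt + SUBFORMAT_OFFSET, atrac3plus_guid, 16) == 0;
}
```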
Only a very limited number of software products support encoding/decoding of ATRAC3plus streams; unfortunately, most of them are available for Microsoft Windows only. These are:
- Sony's own SonicStage software (Windows only)
- ATRAC Codec Plugin for Sony Media Software (Windows only)
- Sonic Studio's expensive N-code plugin for professionals (available for Windows and Mac OS X)
There is a multi-channel version of ATRAC3plus called "ATRAC-X".
ATRAC3plus technical documentation
Supported bitrates
ATRAC3plus operates at fixed bitrates only. The following bitrates are supported:
bitrate | frame size (stereo) |
---|---|
48 Kbps | 280 bytes |
64 Kbps | 376 bytes |
96 Kbps | 560 bytes |
128 Kbps | 744 bytes |
160 Kbps | 936 bytes |
192 Kbps | 1120 bytes |
256 Kbps | 1488 bytes |
320 Kbps | 1864 bytes |
352 Kbps | 2048 bytes |
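The frame sizes above can be related to the nominal bitrates with a little arithmetic. The sketch below assumes that one frame carries 2048 samples per channel and that the sample rate is 44100 Hz; neither value is stated in the table, and the function name is made up for illustration.

```c
#include <stdint.h>

#define SAMPLES_PER_FRAME 2048    /* assumption: samples per channel in one frame */
#define SAMPLE_RATE       44100   /* assumption: 44.1 kHz material */

/* Hypothetical helper: bits per second implied by a fixed frame size. */
static uint32_t frame_size_to_bps(uint32_t frame_bytes)
{
    /* 64-bit intermediate keeps the multiplication safe for any frame size */
    return (uint32_t)((uint64_t)frame_bytes * 8 * SAMPLE_RATE / SAMPLES_PER_FRAME);
}
```

Under these assumptions a 2048-byte frame yields exactly 352800 bit/s and a 280-byte frame about 48.2 kbit/s, so the values in the "bitrate" column appear to be rounded nominal figures.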
Coding techniques
ATRAC3plus is a hybrid subband/MDCT codec like MP3. The signal is split into 16 subbands using a Quadrature Mirror Filter (QMF) before MDCT and bit allocation. The MDCT window has a size of 2048 samples per channel. The resulting MDCT spectrum is then divided into 32 quantization units of unequal width (the higher the frequency, the wider the unit). The relationship between QMF bands and quantization units (QU) is shown in the table below:
QMF subband | Quant units |
---|---|
0 | 0...7 |
1 | 8...11 |
2 | 12...15 |
3 | 16...17 |
4 | 18...19 |
5 | 20...21 |
6 | 22 |
7 | 23 |
8 | 24 |
9 | 25 |
10 | 26 |
11 | 27 |
12 | 28 |
13 | 29 |
14 | 30 |
15 | 31 |
Various algorithms are used to improve compression results:
- gain control for reducing pre-echo artifacts
- generalized harmonic analysis (GHA) for separating tone components
- power compensation for better quality at low bitrates
The following techniques are used in order to make the compressed data smaller:
- variable-length (Huffman) coding
- vector quantization based on trained tables
- differential coding
Probably the most interesting part of the ATRAC3plus codec is the Generalized Harmonic Analysis (GHA), an inharmonic frequency analysis proposed by Norbert Wiener in 1930. Its main advantage is an excellent frequency resolution that surpasses that of the short-time discrete Fourier transform. However, it requires a huge amount of computation. Several algorithms that work around this problem have been introduced over the last 20 years, for example the one proposed by Dr. Hirata.
Multichannel ATRAC3plus (ATRAC-X)
ATRAC3plus supports multichannel streams (up to 8 channels). Such streams are encoded in units customarily called "channel blocks"; each block contains at most 2 channels (i.e. it is either MONO or STEREO). For example, channel_id = 3 in the table below describes a stream containing 2 channel blocks, 1 stereo + 1 mono, and thus 3 channels. The base codec operates on MONO or STEREO channel blocks only.
ATRAC-X channel configurations
channel_id | total channels | number of channel blocks | speaker mapping |
---|---|---|---|
0 | 0 | undefined | |
1 | 1 | 1 | |
2 | 2 | 1 | |
3 | 3 | 2 | |
4 | 4 | 3 | |
5 | 5+1 | 4 | |
6 | 6+1 | 5 | |
7 | 7+1 | 5 | |
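For illustration, the channel configuration table can be held in a small lookup array. The type and function names below are hypothetical, and reading "5+1" as five full-range channels plus one low-frequency channel is an assumption based on the usual notation.

```c
#include <stddef.h>

typedef struct {
    int main_channels;   /* full-range channels */
    int lfe_channels;    /* assumption: the "+1" in the table is an LFE channel */
    int num_blocks;      /* number of channel blocks, 0 = undefined */
} AtracXConfig;

/* Indexed by channel_id, values taken from the table above. */
static const AtracXConfig atracx_configs[8] = {
    {0, 0, 0}, {1, 0, 1}, {2, 0, 1}, {3, 0, 2},
    {4, 0, 3}, {5, 1, 4}, {6, 1, 5}, {7, 1, 5}
};

/* Hypothetical helper: returns NULL for channel_id 0 (undefined) or out of range. */
static const AtracXConfig *atracx_lookup(int channel_id)
{
    if (channel_id < 1 || channel_id > 7)
        return NULL;
    return &atracx_configs[channel_id];
}
```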
Bitstream overview
The table below shows the top-level organization of the ATRAC3plus bitstream. Depending on the channel configuration, a frame may contain more than one channel block. In this case the fields channel_block_type and channel_block_data are repeated for each block.
name | number of bits | value | description |
---|---|---|---|
start_marker | 1 | 0 | marks the start of the ATRAC3plus bitstream |
channel_block_type | 2 | | type of the channel block |
channel_block_data | variable | | contains encoded sound information |
terminator | 2 | 11b | indicates the end of the bitstream |
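A minimal sketch of reading the first two fields of a frame is given below. It assumes that bits are consumed most-significant bit first within each byte, which the table does not state; the function name and the constant are hypothetical. Only the terminator code (11b) is taken from the table, so the other block type values are returned without interpretation.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_TYPE_TERMINATOR 3   /* 11b, indicates the end of the bitstream */

/* Hypothetical helper: validates start_marker and returns the 2-bit field
 * that follows it, or -1 if the frame is empty or start_marker is not 0.
 * Assumption: MSB-first bit order. */
static int first_block_type(const uint8_t *frame, size_t size)
{
    if (size < 1)
        return -1;
    if (frame[0] & 0x80)            /* start_marker must be 0 */
        return -1;
    return (frame[0] >> 5) & 3;     /* channel_block_type or terminator */
}
```

A full parser would decode channel_block_data after each block type and repeat until it reads BLOCK_TYPE_TERMINATOR.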
Channel block types
ATRAC3plus defines the following channel block types:
- Mono channel block: contains monaural sound data.
- Stereo channel block: contains stereophonic sound data.
- Extension block: as its name indicates, it is intended to carry some extension information. Its purpose is unknown, however, because no official description is available. All existing decoder implementations are programmed to ignore such blocks.
Channel block layout
ATRAC3plus was designed to provide high-quality sound compression and therefore tries to save as many bits as possible. Compared to ATRAC3 it uses a new coding scheme for channel blocks: the channels of a stereo signal are no longer coded separately but together in one stereo channel block. The bitstream of such a block allows both channels to share several sound parameters, so the same information need not be transmitted twice. Depending on the correlation between the channels, this can lead to a significant bit reduction and thus improve coding quality.
A mono/stereo channel block contains the following pieces of sound information:
name | size in bits | description |
---|---|---|
sound_header | 6 | defines some global sound parameters |
wordlength_info | variable | quantization word length information for each quant unit |
scalefactor_info | variable | quantization scale factor indexes for each coded quant unit |
huffman_info | variable | huffman table information for each coded quant unit |
spectra | variable | huffman-coded spectral information for each coded quant unit |
window_info | variable | tells which IMDCT window shape should be used during the sound reconstruction |
gain_info | variable | gain envelope used by the gain compensation |
gha_info | variable | information about sine-like waves in the compressed sound obtained by the GHA. It contains quantized frequency, amplitude and phase for each wave to be synthesized in the decoder. |
noise_info | 1/9 | contains noise flag, level index and table selector for the white noise to be added during decoding. |
Sound header
The sound header is located at the start of each channel block. It contains the following fields:
size in bits | name | value(s) | comments |
---|---|---|---|
5 | num_quant_units | valid values: 0...27, 31 | number of coded quantization units minus 1. The value "0" indicates one coded unit, the value "31" indicates 32 units. The values 28, 29 and 30 are invalid. |
1 | x_flag | | unknown purpose (mute?) |
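The header fields can be unpacked as sketched below. The sketch assumes the 6 header bits have already been read MSB-first into the low bits of a byte, with num_quant_units in the upper 5 bits and x_flag in the lowest bit; the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    int num_quant_units;   /* actual count, i.e. coded value + 1 */
    int x_flag;            /* purpose unknown */
} SoundHeader;

/* Hypothetical helper: returns 0 on success, -1 if the coded unit count
 * is one of the invalid values 28, 29 or 30. */
static int parse_sound_header(uint8_t bits6, SoundHeader *hdr)
{
    int coded = (bits6 >> 1) & 0x1F;    /* first 5 bits */

    if (coded >= 28 && coded <= 30)
        return -1;
    hdr->num_quant_units = coded + 1;   /* 0 -> 1 unit, 31 -> 32 units */
    hdr->x_flag = bits6 & 1;            /* last bit */
    return 0;
}
```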
Word-length information
Word-length (or quantization precision) information follows the sound header. It defines the word-length parameter for each coded quantization unit. This parameter is in the range 0...7, where "7" indicates the highest quantization precision and "1" the lowest. The value "0" means no data, i.e. the corresponding quantization unit was not coded.
In a stereo channel block the word-length parameters for channel 1 (L) are transmitted first, followed by the word-length parameters for channel 2 (R). The word-lengths for channel 1 are always coded independently. The word-lengths for channel 2 can be coded either independently or relative to channel 1. In the latter case the 1st channel is called "master" and the 2nd one "slave". The word-lengths of a mono block are coded like channel 1 of a stereo block.
In order to keep the word-length data as small as possible, ATRAC3plus uses several coefficient packing techniques that require different numbers of bits for transmission:
- direct coding: each coefficient is coded as a 3-bit value. This means no packing; it is used at high bitrates, where the frame is big enough to hold the information unpacked.
- differential coding + huffman-coded delta: the first coefficient is coded directly; all others are huffman-coded deltas to the previous coefficient.
- prediction + huffman-coded residual: this technique offers the best packing and is used at low bitrates. It is analogous to lossless coding and is based on trained shape tables that serve as the prediction. The huffman-coded residual is then added to the prediction, perfectly reconstructing the coefficients.
- trailing-unit handling: the word-length coefficients of the trailing quantization units, which correspond to the high spectral bands, tend to be either 1 (low precision) or 0 (not coded). Such coefficients are either grouped together (in the case of "1") or trimmed (in the case of "0").
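The differential technique from the list above can be sketched as follows. Huffman decoding is left out: the deltas are assumed to be already decoded into plain integers. Wrapping the running sum into the range 0...7 with a bit mask is an assumption made here to keep the result a valid word-length; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: rebuilds n word-length coefficients from a directly
 * coded first value and n - 1 decoded deltas.
 * Assumption: sums wrap modulo 8 so every result stays within 0...7. */
static void wl_from_deltas(int first, const int *deltas, int n, uint8_t *wl)
{
    int i;

    if (n <= 0)
        return;
    wl[0] = (uint8_t)(first & 7);                        /* coded directly (3 bits) */
    for (i = 1; i < n; i++)
        wl[i] = (uint8_t)((wl[i - 1] + deltas[i - 1]) & 7);  /* delta to previous */
}
```

For example, a first value of 5 followed by the deltas -1, +2, -3 reconstructs the word-lengths 5, 4, 6, 3.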
The word-length information for each channel will be coded as follows:
size in bits | name | comments |
---|---|---|
2 | coding_mode | indicates the coding mode used. |
variable | coeff_info | word-length coefficients coded according to the coding_mode. |
The coding_mode parameter is interpreted differently depending on the channel number. The following pseudocode explains the coding modes for channel 1 ("master"):
- Mode 0 (direct coding):
for (i = 0; i < num_quant_units; i++) wl_coeffs[i] = get_bits(3);