Format tag: uses WAVE_FORMAT_EXTENSIBLE with the "SubFormat" field set to the following GUID: E923AABF-CB58-4471-A119-FFFA01E4CE62
Company: Sony
Samples: http://samples.mplayerhq.hu/A-codecs/ATRAC3+/
Stored in: WAV and Oma/Omg containers.
Official information: http://www.sony.net/Products/ATRAC3/tech/atrac3plus.html

ATRAC3plus introduction

ATRAC3plus is a proprietary audio compression algorithm developed by Sony. Like ATRAC3, it is a successor to the original ATRAC codec introduced in 1992 with the MiniDisc. The codec is commonly used in newer MiniDisc players and in Sony's PlayStation Portable.

Streams coded with ATRAC3plus are usually stored either in the WAV container (although those files carry the ".at3" extension) or in Sony's proprietary Oma/Omg container. In the case of the WAV container, the undocumented GUID:

E923AABF-CB58-4471-A119-FFFA01E4CE62

is used in order to indicate the ATRAC3plus codec.
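A minimal sketch of how a demuxer might test for this GUID, assuming the standard 40-byte WAVEFORMATEXTENSIBLE layout in which "SubFormat" occupies bytes 24...39 of the "fmt " chunk body. The function name is hypothetical and not taken from any existing implementation.

```python
import struct
import uuid

WAVE_FORMAT_EXTENSIBLE = 0xFFFE
ATRAC3PLUS_SUBFORMAT = uuid.UUID("E923AABF-CB58-4471-A119-FFFA01E4CE62")


def is_atrac3plus_fmt(fmt_chunk: bytes) -> bool:
    """Return True if a WAV "fmt " chunk body announces ATRAC3plus."""
    if len(fmt_chunk) < 40:  # WAVEFORMATEXTENSIBLE is 40 bytes long
        return False
    (format_tag,) = struct.unpack_from("<H", fmt_chunk, 0)
    if format_tag != WAVE_FORMAT_EXTENSIBLE:
        return False
    # GUIDs are stored with their first three fields little-endian,
    # which is exactly what the bytes_le constructor expects.
    sub_format = uuid.UUID(bytes_le=bytes(fmt_chunk[24:40]))
    return sub_format == ATRAC3PLUS_SUBFORMAT
```

The caller is expected to pass only the chunk body, without the 8-byte chunk header.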

Only a very limited number of software products support encoding or decoding of ATRAC3plus streams, and most of them are unfortunately available for Microsoft Windows only. Those are:

Sony's own SonicStage software (Windows only)
ATRAC Codec Plugin for Sony Media Software (Windows only)
Sonic Studio's expensive N-code plugin for professionals (available for Windows and Mac OS X)

There is a multi-channel version of ATRAC3plus called "ATRAC-X".

ATRAC3plus technical documentation

Supported bitrates

ATRAC3plus operates at fixed bitrates only. The following bitrates are supported:

   bitrate      frame size (stereo)
-------------   -------------------
   48 Kbps           280 bytes
   64 Kbps           376 bytes
   96 Kbps           560 bytes
  128 Kbps           744 bytes
  160 Kbps           936 bytes
  192 Kbps          1120 bytes
  256 Kbps          1488 bytes
  320 Kbps          1864 bytes
  352 Kbps          2048 bytes
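The frame sizes in the table can be related to the nominal bitrates with a little arithmetic. The sketch below assumes that one frame carries 2048 samples per channel and that the material is sampled at 44.1 kHz; neither value is stated in the table itself, so treat both as assumptions.

```python
FRAME_SAMPLES = 2048  # assumption: samples per channel carried by one frame
SAMPLE_RATE = 44100   # assumption: 44.1 kHz material

# nominal bitrate in kbps -> stereo frame size in bytes (from the table)
FRAME_SIZES = {48: 280, 64: 376, 96: 560, 128: 744, 160: 936,
               192: 1120, 256: 1488, 320: 1864, 352: 2048}


def effective_kbps(frame_bytes, sample_rate=SAMPLE_RATE):
    """Bitrate implied by a fixed frame size, in kbps."""
    frames_per_second = sample_rate / FRAME_SAMPLES
    return frame_bytes * 8 * frames_per_second / 1000.0


for nominal, size in FRAME_SIZES.items():
    print(f"{nominal:3d} kbps nominal -> {effective_kbps(size):6.1f} kbps effective")
```

Under these assumptions every computed value lands within about 1.5 kbps of the nominal figure, which suggests the nominal bitrates are rounded labels rather than exact rates.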

Coding techniques

ATRAC3plus is a hybrid subband/MDCT codec like MP3. The signal is split into 16 subbands using a quadrature mirror filter (QMF) bank before MDCT and bit allocation. The MDCT window has a size of 2048 samples per channel. The resulting MDCT spectrum is then divided into 32 quantization units of unequal width (the higher the frequency, the wider the unit). The relationship between QMF subbands and quantization units (QU) is shown in the table below:

QMF subband	quant units
0	0...7
1	8...11
2	12...15
3	16, 17
4	18, 19
5	20, 21
6	22
7	23
8	24
9	25
10	26
11	27
12	28
13	29
14	30
15	31
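The subband-to-unit relationship in the table above can be turned into a lookup table. This is a minimal sketch; the names are hypothetical and only the unit counts per subband come from the table.

```python
# Number of quantization units covered by each of the 16 QMF subbands,
# read off the table: subband 0 holds units 0...7, subband 1 holds 8...11, etc.
QUS_PER_SUBBAND = [8, 4, 4, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# QU_TO_SUBBAND[qu] is the QMF subband that quantization unit qu belongs to.
QU_TO_SUBBAND = [sb for sb, count in enumerate(QUS_PER_SUBBAND)
                 for _ in range(count)]


def subbands_needed(num_coded_units):
    """Number of QMF subbands touched by the first num_coded_units units."""
    if not 1 <= num_coded_units <= len(QU_TO_SUBBAND):
        raise ValueError("num_coded_units must be in 1...32")
    return QU_TO_SUBBAND[num_coded_units - 1] + 1
```

A decoder could use such a helper to skip the inverse transform for subbands that contain no coded units.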

Various algorithms are used to improve compression results:

gain control for reducing pre-echo artifacts
generalized harmonic analysis (GHA) for separating tone components
power compensation for better quality at low bitrates

The following techniques are used in order to make the compressed data smaller:

variable-length (Huffman) coding
vector quantization based on trained tables
differential coding

Probably the most interesting part of the ATRAC3plus codec is the Generalized Harmonic Analysis (GHA), an inharmonic frequency analysis proposed by Norbert Wiener in 1930. Its main advantage is an excellent frequency resolution that surpasses that of the short-time discrete Fourier transform. However, it requires a huge amount of computation. Several algorithms that work around this problem have been introduced during the last 20 years, for example the one proposed by Dr. Hirata.

Multichannel ATRAC3plus (ATRAC-X)

ATRAC3plus supports multichannel streams (up to 8 channels). Such streams are encoded in units customarily called "channel blocks"; each block contains at most 2 channels (i.e. it can be MONO or STEREO). For example, taking channel_id = 3 and looking at the table below, we have a stream containing 2 channel blocks: 1 stereo + 1 mono, and thus 3 channels. The base codec operates on MONO or STEREO channel blocks only.

ATRAC-X channel configurations

channel_id	total channels	number of channel blocks	speaker mapping
0	0	undefined	undefined
1	1	1	front: center (MONO)
2	2	1	front: L, R (STEREO)
3	3	2	front: L, R; front: center
4	4	3	front: L, R; front: center; rear: surround
5	5+1	4	front: L, R; front: center; rear: L, R; LFE
6	6+1	5	front: L, R; front: center; rear: L, R; rear: center; LFE
7	7+1	5	front: L, R; front: center; rear: L, R; side: L, R; LFE
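The channel configurations can be expressed as a lookup from channel_id to a list of channel blocks. The sketch below rests on one reading of the speaker-mapping column: every "L, R" pair is taken to be a STEREO block and every single speaker (center, surround, LFE) a MONO block. Only the channel_id = 3 case is confirmed by the text; the rest is an assumption that happens to reproduce the block and channel counts of the table.

```python
# channel_id -> channel blocks, in the order of the speaker-mapping column.
# Assumption: "L, R" pairs are STEREO blocks, single speakers are MONO blocks.
CHANNEL_BLOCKS = {
    1: ["MONO"],
    2: ["STEREO"],
    3: ["STEREO", "MONO"],
    4: ["STEREO", "MONO", "MONO"],
    5: ["STEREO", "MONO", "STEREO", "MONO"],
    6: ["STEREO", "MONO", "STEREO", "MONO", "MONO"],
    7: ["STEREO", "MONO", "STEREO", "STEREO", "MONO"],
}

CHANNELS_PER_BLOCK = {"MONO": 1, "STEREO": 2}


def total_channels(channel_id):
    """Total channel count (including LFE) for a channel_id."""
    blocks = CHANNEL_BLOCKS[channel_id]  # KeyError for undefined ids such as 0
    return sum(CHANNELS_PER_BLOCK[block] for block in blocks)
```

With this mapping, channel_id = 5 yields 4 blocks and 6 channels, which corresponds to the "5+1" entry.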

Bitstream overview

The table below shows the top-level bitstream organization of ATRAC3plus. Depending on the channel configuration, a frame may contain more than one channel block. In this case the fields channel_block_type and channel_block_data are repeated for each block.

name	number of bits	value	description
start_marker	1	0	marks the start of the ATRAC3plus bitstream
channel_block_type	2	00b - MONO block; 01b - STEREO block; 10b - EXTENSION block	type of the channel block
channel_block_data	variable		contains encoded sound information
terminator	2	11b	indicates the end of the bitstream
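The top-level layout above suggests a simple parsing loop: read the start marker, then keep reading 2-bit block types until the terminator appears. The following is a sketch only. It assumes MSB-first bit order, and because channel_block_data has a variable size that is known only after decoding, the block decoder is passed in as a hypothetical callback.

```python
class BitReader:
    """MSB-first bit reader (the bit order is an assumption of this sketch)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def get_bits(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


BLOCK_TYPES = {0b00: "MONO", 0b01: "STEREO", 0b10: "EXTENSION"}
TERMINATOR = 0b11


def parse_frame(data, decode_block):
    """Walk one frame; decode_block(reader, type_name) must consume the block data."""
    reader = BitReader(data)
    if reader.get_bits(1) != 0:
        raise ValueError("start_marker must be 0")
    blocks = []
    while True:
        block_type = reader.get_bits(2)
        if block_type == TERMINATOR:
            return blocks
        name = BLOCK_TYPES[block_type]
        decode_block(reader, name)  # hypothetical callback supplied by the caller
        blocks.append(name)
```

A frame that runs out of data before the terminator raises IndexError here; a real decoder would want a more explicit error path.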

Channel block types

The following channel block types exist in ATRAC3plus:

Mono channel block: contains monaural sound data.
Stereo channel block: contains stereophonic sound data.
Extension block: as its name indicates, it is intended to carry extension information. Its purpose is unknown, however, due to the lack of an official description. All existing decoder implementations are programmed to ignore such blocks.

Channel block layout

ATRAC3plus was designed to provide high-quality sound compression, and it therefore tries to save as many bits as possible. Compared to ATRAC3, it uses a new coding scheme for channel blocks: the channels of a stereo signal are no longer coded separately but together in one stereo channel block. The bitstream for such a block allows both channels to share several sound parameters, so there is no need to transmit the same information twice. Depending on the correlation between the channels, this can lead to a significant bit reduction and thus improve coding quality.

A mono/stereo channel block contains the following pieces of sound information:

name	size in bits	description
sound_header	6	defines some global sound parameters
wordlength_info	variable	quantization word length information for each quant unit
scalefactor_info	variable	quantization scale factor indexes for each coded quant unit
huffman_info	variable	Huffman table information for each coded quant unit
spectra	variable	Huffman-coded spectral information for each coded quant unit
window_info	variable	tells which IMDCT window shape should be used during the sound reconstruction
gain_info	variable	gain envelope used by the gain compensation
gha_info	variable	information about sine-like waves in the compressed sound obtained by the GHA. It contains quantized frequency, amplitude and phase for each wave to be synthesized in the decoder.
noise_info	1/9	contains noise flag, level index and table selector for the white noise to be added during decoding.

Sound header

The sound header is located at the start of each channel block. It contains the following fields:

size in bits	name	value(s)	comments
5	num_quant_units	valid values: 0...27,31	number of coded quantization units - 1. The value "0" indicates one coded unit, the value "31" indicates 32 units. The values 28, 29 and 30 are invalid.
1	x_flag		unknown purpose (mute?)
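A sketch of how the 6-bit sound header might be unpacked and validated. It assumes the fields are transmitted in table order (num_quant_units first, then x_flag) and that the first transmitted bit is the most significant one; the function name is hypothetical.

```python
INVALID_RAW_VALUES = (28, 29, 30)


def parse_sound_header(header_bits):
    """Split the 6 header bits into (number of coded units, x_flag).

    header_bits holds the 6 bits as an integer, first transmitted bit
    most significant (an assumption of this sketch).
    """
    if not 0 <= header_bits < 64:
        raise ValueError("sound header is exactly 6 bits wide")
    raw_units = header_bits >> 1  # upper 5 bits: num_quant_units field
    x_flag = header_bits & 1      # lowest bit: flag of unknown purpose
    if raw_units in INVALID_RAW_VALUES:
        raise ValueError("num_quant_units values 28, 29 and 30 are invalid")
    # The field stores the count minus one, so 0 means 1 unit and 31 means 32.
    return raw_units + 1, x_flag
```

Returning the actual unit count rather than the raw field keeps later loops free of off-by-one adjustments.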

Word-length information

Word-length (or quantization precision) information follows the sound header. It defines the word-length parameter for each coded quantization unit. This parameter is in the range 0...7, where the value "7" indicates the highest quantization precision and the value "1" the lowest. The value "0" means no data, i.e. the corresponding quantization unit was not coded.

In the case of a stereo channel block, the word-length parameters for channel 1 (L) are transmitted first, followed by the word-length parameters for channel 2 (R). The word-lengths for channel 1 are always coded independently. The word-lengths for channel 2 can be coded either independently or relative to channel 1; in the latter case the 1st channel is called "master" and the 2nd one "slave". The word-lengths for a mono block are coded like channel 1 of a stereo block.

In order to keep the word-length data as small as possible, ATRAC3plus uses several coefficient packing techniques that require different numbers of bits for transmission:

the coefficients are coded directly (3-bit values). This means no packing; it is used at high bitrates, where the frame is big enough to hold the information unpacked.

differential coding + Huffman-coded delta: the first coefficient is coded directly; all others are Huffman-coded deltas relative to the previous coefficient.

prediction + Huffman-coded residual: this technique offers the best packing and is used at low bitrates. It is analogous to lossless coding and is based on trained shape tables that serve as the prediction. The Huffman-coded residual is then added to the prediction, perfectly reconstructing the coefficients.

the word-length coefficients of the trailing quantization units, which correspond to the high spectral bands, tend to be either 1 (low precision) or 0 (not coded). Such coefficients are either grouped together (in the case of "1") or trimmed (in the case of "0").

The word-length information for each channel is coded as follows:

size in bits	name	comments
2	coding_mode	indicates the coding mode used.
variable	coeff_info	word-length coefficients coded according to the coding_mode.

The coding_mode parameter is interpreted differently depending on the channel number. The following pseudocode explains the coding modes for channel 1 ("master"):

Mode 0 (direct coding):

/* each word-length coefficient is an unsigned 3-bit value (0...7) */
for (i = 0; i < num_quant_units; i++)
    wl_coeffs[i] = get_bits(3);
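For experimentation, the direct coding mode can be written as a small runnable sketch. It assumes MSB-first bit order and that num_quant_units is the actual number of coded units (the header field value plus one); the helper names are hypothetical.

```python
def read_bits(data, bit_pos, n):
    """Read n bits starting at bit_pos, MSB first (assumed bit order)."""
    value = 0
    for i in range(bit_pos, bit_pos + n):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value


def decode_wl_direct(data, bit_pos, num_quant_units):
    """Mode 0: every word-length coefficient is a plain 3-bit value.

    Returns the coefficients and the bit position after the last one.
    """
    coeffs = []
    for _ in range(num_quant_units):
        coeffs.append(read_bits(data, bit_pos, 3))
        bit_pos += 3
    return coeffs, bit_pos
```

Returning the updated bit position lets the caller continue with the next field of the channel block.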
