ATRAC3plus

From MultimediaWiki

ATRAC3plus introduction

ATRAC3plus is a proprietary audio compression algorithm developed by Sony. Like its predecessor ATRAC3, ATRAC3plus is a successor of the original ATRAC codec introduced in 1992 with the MiniDisc. The codec is commonly used in newer MiniDisc players and in the PlayStation Portable made by Sony.

Streams coded with ATRAC3plus are usually stored either in the WAV container (those files have the ".at3" extension though) or in Sony's proprietary Oma/Omg container. In the case of the WAV container the undocumented GUID:

E923AABF-CB58-4471-A119-FFFA01E4CE62

is used in order to indicate the ATRAC3plus codec.

There is a very limited number of software products that support encoding/decoding of ATRAC3plus streams; most of them are unfortunately available for Microsoft Windows only. Those are:

  • Sony's own SonicStage software (Windows only)
  • ATRAC Codec Plugin for Sony Media Software (Windows only)
  • Sonic Studio's expensive N-code plugin for professionals (available for Windows and Mac OS X)

There is a multi-channel version of ATRAC3plus called "ATRAC-X".

ATRAC3plus technical documentation

Available bitrates

ATRAC3plus operates at fixed bitrates only. The following bitrates are offered by the Sony encoding software:

   bitrate      frame size (stereo)
-------------   -------------------
   48 Kbps           280 bytes
   64 Kbps           376 bytes
   96 Kbps           560 bytes
  128 Kbps           744 bytes
  160 Kbps           936 bytes
  192 Kbps          1120 bytes
  256 Kbps          1488 bytes
  320 Kbps          1864 bytes
  352 Kbps          2048 bytes
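
The relation between frame size and bitrate can be checked with a few lines of C. This is only a sketch: the frame length of 2048 samples per channel is taken from the "Coding techniques" section, while the sample rate of 44100 Hz is an assumption that is not stated on this page. The function name is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLES_PER_FRAME 2048   /* per channel, see "Coding techniques" */
#define SAMPLE_RATE_HZ    44100  /* ASSUMPTION: not stated on this page */

/* Average bitrate in bits per second for a given frame size in bytes
 * (hypothetical helper, rounds down). */
static uint32_t atrac3p_bitrate_bps(uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    return (uint32_t)((uint64_t)frame_bytes * 8 * SAMPLE_RATE_HZ
                      / SAMPLES_PER_FRAME);
}
```

Under these assumptions a 2048-byte frame gives exactly 352800 bits per second, which matches the last table row; the other rows come out close to their nominal values.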

Coding techniques

ATRAC3plus is a hybrid subband/MDCT codec like MP3. The signal is split into 16 subbands using a Polyphase Quadrature Filter (PQF) before MDCT and bit allocation. The sample-frame size is 2048 samples per channel.

After the subband splitting ATRAC3plus tries to extract sine waves from each subband using Generalized Harmonic Analysis (GHA). GHA encodes parameters of the extracted sine waves such as frequency, amplitude and phase into the final bitstream.

After the sine wave extraction the remaining signal (residual) will be transformed into the frequency domain by a 128-point modified discrete cosine transform (MDCT). The resulting MDCT spectrum will be divided into 32 quantization units of unequal width (higher frequencies - wider units). The relationship between subbands and quantization units (QU) is shown in the table below:

QMF subband 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Quant unit 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31


The flowchart of the ATRAC3plus decoding process is shown below:

[Figure: Atrac3plus decoder flow.png]

"Bitstream decoder" decodes various sound parameters from the supplied frame data. First the residual signal will be decoded by applying inverse quantization, power compensation, inverse MDCT and gain compensation. Then the sine waves will be synthesized according to their parameters such as frequency, amplitude and phase. Then the residual and the synthesized sine waves will be added together. Optionally, some white noise can be added if specified in the bitstream.

This processing will be repeated for each of the 16 subbands. Finally the PQF synthesis filter will be applied in order to sum all subbands together and reconstruct the encoded audio signal.

Various algorithms are used to improve compression results:

  • gain control for reducing pre-echo artifacts
  • power compensation for better quality at low bitrates

The following techniques are used in order to make the compressed data smaller:

Probably the most interesting part of the ATRAC3plus codec is the Generalized Harmonic Analysis (GHA) - an inharmonic frequency analysis proposed by Norbert Wiener in 1930. Its main advantage is an excellent frequency resolution that surpasses that of the short-time discrete Fourier transform. However, it requires a huge amount of calculations. Several algorithms to work around that problem were introduced during the last 20 years, for example the one proposed by Dr. Hirata.

Coding methods for compressing bitstream parameters

The coding methods described in this section serve the purpose of representing different bitstream parameters like word-length, scale factor etc. using a smaller number of bits. This is achieved by exploiting and removing redundancy from the signals being encoded. The coding techniques described here are lossless.

Huffman coding

ATRAC3plus uses this coding technique widely. There are more than 130 different Huffman tables in total for coding bitstream signals. Usually more frequently occurring values will have shorter codes. ATRAC3plus Huffman trees are canonical ones. That means they can be stored very compactly by specifying the following parameters:

  • number of bits of the shortest codeword
  • number of bits of the longest codeword
  • number of items for every bit length
  • order of items

In my code I'm using the following descriptor in order to specify a canonical huffman table:

uint8_t min; /* shortest codeword length */
uint8_t max; /* longest  codeword length */
uint8_t num_items[max - min + 1]; /* number of items for every bit length */

For example, the Huffman table vlc_tab_index = 3 (see Annex A) will be described as follows:

min = 1
max = 5
num_items = {1, 0, 2, 3, 2}

The 2nd element of the array "num_items" is set to "0" because there is no codeword with the length of 2 bits.

The following C-pseudocode can be used for generating huffman tables from the descriptor described above during decoder initialization:

code = 0;
index = 0;

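for (num_bits = min; num_bits <= max; num_bits++) {
    /* num_items[] starts at the shortest length, so index it relative to min */
    for (i = num_items[num_bits - min]; i > 0; i--) {
        bits [index] = num_bits;
        codes[index] = code++;
        index++;
    }
    code <<= 1; /* append a zero bit when moving on to the next length */
}

The array "bits" receives the length in bits of each codeword, "codes" receives the codeword itself.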

Finally, the order of codes needs to be specified. A simple remapping table will be used to translate the code index into the final value. For the table described above the translation table will look as follows:

0, 1, 7, 2, 3, 6, 4, 5
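
The table generation described above can be written as compilable C roughly as follows. This is a sketch only: the struct layout, the names HuffDesc and build_canonical and the MAX_CODES limit are made up for this example and are not taken from any existing decoder.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CODES 256  /* hypothetical upper bound for this sketch */

typedef struct {
    uint8_t min;              /* shortest codeword length */
    uint8_t max;              /* longest  codeword length */
    const uint8_t *num_items; /* max - min + 1 entries    */
} HuffDesc;

/* Fills bits[] and codes[] in canonical order.
 * Returns the number of codewords generated. */
static int build_canonical(const HuffDesc *d, uint8_t *bits, uint16_t *codes)
{
    unsigned code = 0;
    int index = 0;

    for (int num_bits = d->min; num_bits <= d->max; num_bits++) {
        for (int i = d->num_items[num_bits - d->min]; i > 0; i--) {
            assert(index < MAX_CODES);
            bits[index]  = (uint8_t)num_bits;
            codes[index] = (uint16_t)code++;
            index++;
        }
        code <<= 1; /* next length: append a zero bit */
    }
    return index;
}
```

For the descriptor of vlc_tab_index = 3 this produces the codewords 0, 100, 101, 1100, 1101, 1110, 11110 and 11111; entry i then stands for the value found at position i of the translation table above.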

Delta coding

ATRAC3plus utilizes various delta-coding schemes in order to remove linear correlation from the signal. It often uses modular arithmetic as well. The main advantage of modular arithmetic is that only half of the range of the difference values is required. An example: word-length coefficients in the range 0...7 need to be transmitted compactly. Plain delta coding would require coding difference values in the range -7...+7, i.e. 15 values.

In the case of modular arithmetic the range of the difference values can be reduced to 0...7 by introducing a "wrap-around" so that the final equation looks like this:

B = (A + delta) & 7;

Below is an example with "wrap around":

Suppose we need to code the value B = "1" and the reference value is A = "6". Then the difference value (delta) will be "-5". According to the equation above the delta value of "3" can be used instead of "-5":

(6 + 3) & 7 = 1;

Another example without "wrap around":

Suppose we need to code the value B = "7" and the reference value is A = "2". Then the difference value (delta) will be "5":

(2 + 5) & 7 = 7;

Furthermore, variable-length codes will be used to reduce the number of bits spent on the difference values in accordance with their probability.
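
Both directions of the modular delta can be sketched in C as follows. The function names are made up for this example; the arithmetic is the B = (A + delta) & 7 rule from above.

```c
#include <assert.h>

/* Coefficients are 3-bit values, so all arithmetic is done modulo 8. */
#define COEFF_MASK 7

/* Encoder side: wrapped difference in the range 0...7. */
static unsigned mod_delta_encode(unsigned ref, unsigned value)
{
    assert(ref <= COEFF_MASK && value <= COEFF_MASK);
    return (value - ref) & COEFF_MASK;
}

/* Decoder side: B = (A + delta) & 7. */
static unsigned mod_delta_decode(unsigned ref, unsigned delta)
{
    assert(ref <= COEFF_MASK && delta <= COEFF_MASK);
    return (ref + delta) & COEFF_MASK;
}
```

With the numbers of the two examples, mod_delta_encode(6, 1) yields 3 and mod_delta_encode(2, 7) yields 5; feeding these deltas into mod_delta_decode() restores 1 and 7.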

The following is a description of the delta-coding methods used in ATRAC3plus:

Method A: huffman-coded modulo difference to previous

Consider the following signal:

3, 6, 6, 3, 3, 3, 4, 2, 2, 1, 1, 1, 3

Now code it using delta coding:

Coefficient   Modulo delta value   Huffman code   Number of bits
-----------   ------------------   ------------   --------------
3             -                    -              3
6             3                    11110          5
6             0                    0              1
3             5                    1101           4
3             0                    0              1
3             0                    0              1
4             1                    100            3
2             6                    1110           4
2             0                    0              1
1             7                    101            3
1             0                    0              1
1             0                    0              1
3             2                    1100           4

The 1st coefficient has no delta value associated with it because there is no previous value. It will be coded "as is" using a fixed length of 3 bits. The following delta values get a variable-length code from the table vlc_tab_index = 2 (see Annex A), so the final number of bits to be transmitted will be 32. Compared to the unpacked version (13 x 3 bits = 39 bits) the coding method described above will yield a bit-reduction of 7 bits (18% smaller).
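
The bit count of this example can be reproduced with a short C sketch. The code lengths are copied from the table vlc_tab_index = 2 in Annex A; the function name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Code lengths for the delta values 0...7, copied from the table
 * vlc_tab_index = 2 in Annex A. */
static const unsigned char vlc2_len[8] = {1, 3, 4, 5, 5, 4, 4, 3};

/* Number of bits needed by Method A: the first coefficient is sent as
 * a raw 3-bit value, every following one as a Huffman-coded modulo-8
 * difference to its predecessor. */
static unsigned method_a_bits(const unsigned char *coeffs, size_t n)
{
    unsigned total = 3;

    assert(n > 0);
    for (size_t i = 1; i < n; i++) {
        unsigned delta = (unsigned)(coeffs[i] - coeffs[i - 1]) & 7;
        total += vlc2_len[delta];
    }
    return total;
}
```

Called with the 13 coefficients of the signal above, the function returns 32.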

Method B: huffman-coded modulo difference to master

In a stereo mix the signal of the left channel is often very similar to the signal of the right channel (i.e. there is a high cross-correlation between the channels). In this case the estimated sound parameters like word-length or scale factor will be very similar as well. Coding the differential signal between the channels can then lead to a significant bit reduction. Of course, at least one of the channels must be coded independently. Such a channel will be called "master" (it's usually the left channel but ATRAC3plus has the possibility to make the right channel act as the master as well). For the 2nd channel only the difference to the master will be coded. The 2nd channel will be called "slave" in this case.

Below is an example of such a highly correlated signal:

Left : 6, 5, 6, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1
Right: 6, 5, 6, 2, 2, 2, 3, 1, 1, 1, 2, 1, 1
Diff : 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0

Coding the difference signal using the table vlc_tab_index = 0 (see Annex A) will result in a signal that is 15 bits long. Compared to the unpacked version (13 x 3 bits = 39 bits) that coding method will yield a bit-reduction of 24 bits (62% smaller).
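
A sketch of the same calculation in C is given below. The code lengths are copied from the table vlc_tab_index = 0 in Annex A; the function name and the convention of returning 0 for an uncodable difference are choices made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Code lengths for vlc_tab_index = 0 (Annex A): delta 0 -> 1 bit,
 * deltas 1 and 7 (i.e. -1) -> 2 bits. Deltas that this table cannot
 * represent are marked with 0. */
static const unsigned char vlc0_len[8] = {1, 2, 0, 0, 0, 0, 0, 2};

/* Bit count of the slave channel coded relative to the master, or 0
 * if some difference is outside the range of the table. */
static unsigned method_b_bits(const unsigned char *master,
                              const unsigned char *slave, size_t n)
{
    unsigned total = 0;

    assert(master && slave);
    for (size_t i = 0; i < n; i++) {
        unsigned delta = (unsigned)(slave[i] - master[i]) & 7;
        if (!vlc0_len[delta])
            return 0;
        total += vlc0_len[delta];
    }
    return total;
}
```

For the left/right pair shown above the result is 15 bits: eleven 1-bit codes for the zero deltas and two 2-bit codes for the deltas of 1.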

Method C: shorter delta to min

Sometimes the coefficients in a signal are very close to each other, so subtracting the minimum value from each coefficient will result in smaller deltas which can be coded using fewer bits.

An example:

2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1

As one can see the values in the sequence above are very similar to each other. Let us find minimum and maximum values and then determine the number of delta bits:

min = 1; max = 2; num_delta_bits = ilog2(max - min + 1) = 1 bit

Now let us encode the sequence above using shorter deltas:

num_delta_bits = 1 will be coded as a 2-bit value
min = 1 will be coded as a 3-bit value
deltas: 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0

The encoded signal is 5 + 1 x 15 = 20 bits long while the unpacked one is 15 x 3 = 45 bits long. The bit-reduction is therefore 25 bits (56% smaller).

Another example:

1, 2, 3, 2, 4, 2, 1, 2, 3, 3, 1, 4, 4, 1, 1
min = 1; max = 4; num_delta_bits = ilog2(max - min + 1) = 2 bits

Now the encoded signal:

num_delta_bits = 2 (will be coded as a 2-bit value)
min = 1 (will be coded as a 3-bit value)
deltas: 0, 1, 2, 1, 3, 1, 0, 1, 2, 2, 0, 3, 3, 0, 0

The encoded signal is 5 + 2 x 15 = 35 bits long while the unpacked one is 15 x 3 = 45 bits long. The bit-reduction is therefore 10 bits (22% smaller).
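
The two examples can be checked with the following C sketch. It assumes that ilog2(n) means "the smallest number of bits that can distinguish n different values", which is consistent with the numbers used above but is not defined on this page; the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest b with (1 << b) >= n.
 * ASSUMPTION: this is what the text calls ilog2(). */
static unsigned ilog2_ceil(unsigned n)
{
    unsigned bits = 0;

    assert(n > 0);
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Bit count for Method C: 2 bits for num_delta_bits, 3 bits for the
 * minimum, then num_delta_bits for every coefficient. */
static unsigned method_c_bits(const unsigned char *coeffs, size_t n)
{
    unsigned min, max, num_delta_bits;

    assert(n > 0);
    min = max = coeffs[0];
    for (size_t i = 1; i < n; i++) {
        if (coeffs[i] < min) min = coeffs[i];
        if (coeffs[i] > max) max = coeffs[i];
    }
    num_delta_bits = ilog2_ceil(max - min + 1);
    assert(num_delta_bits <= 3); /* has to fit into the 2-bit field */
    return 2 + 3 + num_delta_bits * (unsigned)n;
}
```

The first example sequence gives 20 bits and the second one 35 bits.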

Method D: sequence of numbers in ascending order

Sometimes ATRAC3plus has to deal with sequences of numbers (e.g. gain control position information) where all items are known to be in ascending order (i.e. satisfy the following condition: Vn+1 > Vn). Such sequences can be packed without any additional bitstream information by examining the previous value (predecessor), calculating the distance between it and the maximum value and deriving from that the number of bits of the next delta value.

Consider the following sequence:

Position index: 0,  1,  2,  3,  4,  5,  6,  7
---------------------------------------------
Position info : 5,  7, 14, 15, 18, 25, 29, 30
---------------------------------------------
Bits used     : 5,  5,  5,  5,  4,  4,  3,  1

The 1st coefficient (position index = 0) will be coded directly using 5 bits because the sequence has to start somewhere. Each following coefficient will be coded according to the following pseudocode, where ilog2(n) is taken to be the number of bits needed to distinguish n different values:

num_delta_bits = ilog2(31 - prev_val);
if (num_delta_bits == 5)
    new_val = get_bits(5);
else
    new_val = prev_val + get_bits(num_delta_bits) + 1;

Let us return to our sequence. The 2nd value will be coded directly as well using 5 bits because ilog2(31 - 5) = 5. The same holds for the 3rd and the 4th one because ilog2(31 - 7) and ilog2(31 - 14) are 5 as well. No delta coding is applied in that case. The 5th value will be delta-coded using 4 bits:

num_delta_bits = ilog2(31 - 15) = 4 bits;
delta = 18 - 15 - 1 = 2

And so on until we reach the last value = 30. If one more value followed, there would be only one value that meets our condition Vn+1 > Vn: the value of "31". In this case no delta would be transmitted (ilog2(31 - 30) = 0) and the next value would be calculated just as:

new_val = prev_val + 1;

Therefore the resulting sequence will be 5 + 5 + 5 + 5 + 4 + 4 + 3 + 1 = 32 bits long. Compared to the unpacked version (8 x 5 bits = 40 bits) this packing method will yield a bit-reduction of 8 bits (20% smaller).
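
A round trip through this scheme can be sketched in C as shown below. Several things here are assumptions made for the example only: the toy bit buffer with MSB-first bit order, all function names, and the reading of ilog2(n) as "the smallest number of bits that can distinguish n values".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bit buffer, MSB first (ASSUMPTION). Must be zeroed before writing. */
typedef struct {
    uint8_t  buf[64];
    unsigned pos;      /* read/write position in bits */
} BitBuf;

static void put_bits(BitBuf *b, unsigned n, unsigned val)
{
    while (n--) {
        assert(b->pos < sizeof(b->buf) * 8);
        if ((val >> n) & 1)
            b->buf[b->pos >> 3] |= (uint8_t)(0x80 >> (b->pos & 7));
        b->pos++;
    }
}

static unsigned get_bits(BitBuf *b, unsigned n)
{
    unsigned val = 0;

    while (n--) {
        assert(b->pos < sizeof(b->buf) * 8);
        val = (val << 1) | ((b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1u);
        b->pos++;
    }
    return val;
}

/* Smallest b with (1 << b) >= n (ASSUMPTION: the ilog2() of the text). */
static unsigned ilog2_ceil(unsigned n)
{
    unsigned bits = 0;

    assert(n > 0);
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Packs n strictly ascending values in 0...31; returns bits written. */
static unsigned encode_ascending(BitBuf *b, const uint8_t *v, size_t n)
{
    unsigned start = b->pos;

    for (size_t i = 0; i < n; i++) {
        unsigned nbits;

        assert(v[i] <= 31 && (!i || v[i] > v[i - 1]));
        nbits = i ? ilog2_ceil(31u - v[i - 1]) : 5;
        if (nbits == 5)
            put_bits(b, 5, v[i]);                          /* direct value */
        else
            put_bits(b, nbits, (unsigned)(v[i] - v[i - 1] - 1)); /* delta */
    }
    return b->pos - start;
}

/* Inverse of encode_ascending(); mirrors the pseudocode above. */
static void decode_ascending(BitBuf *b, uint8_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned nbits = i ? ilog2_ceil(31u - v[i - 1]) : 5;

        if (nbits == 5)
            v[i] = (uint8_t)get_bits(b, 5);
        else
            v[i] = (uint8_t)(v[i - 1] + get_bits(b, nbits) + 1);
    }
}
```

To try it, zero-initialize a BitBuf, call encode_ascending(), reset pos to 0 and call decode_ascending(); the decoded values should equal the input.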

Vector quantization with residual encoding

One further packing technique used in ATRAC3plus is based on so-called "shape prediction vectors". The encoder decomposes a signal (word-length or scale factor info) into "shape prediction" + residual. Then only the index of the "shape prediction vector" and the Huffman-coded residual will be transmitted. The main advantage of this method is that when the shape matches the coded signal closely, the residual can be represented very compactly (usually 1-2 bits per value). Moreover, the majority of the residual values will turn into zeroes, which can be packed further.

Each entry of the "shape prediction tables" contains an average value over 3 coefficients. This helps to keep those tables comparatively small. For example, for a signal of 32 values each "shape table" will have 10 entries (the last entry usually contains an average value over 5 coefficients).

Consider the following signal to be encoded:

7, 7, 6, 5, 4, 4, 3, 2, 2, 2, 1, 1

Let us "quantize" that signal by dividing it into 4 groups of 3 values and finding the averaged value of each group:

floor((7 + 7 + 6) / 3 + 0.5) = 7,
floor((5 + 4 + 4) / 3 + 0.5) = 4,
floor((3 + 2 + 2) / 3 + 0.5) = 2,
floor((2 + 1 + 1) / 3 + 0.5) = 1

Find a "shape table" in the trained set that closely matches our "quantized" version. It will be (for example):

7, 5, 2, 1

Now compute the residual:

Original signal       7  7  6  5  4  4  3  2  2  2  1  1
Unpacked shape table  7  7  7  5  5  5  2  2  2  1  1  1
Residual              0  0 -1  0 -1 -1  1  0  0  1  0  0

Now select a Huffman table that represents the residual above as compactly as possible. The following Huffman tree assigns the shortest code (1 bit) to the most frequently occurring symbol = "0" and 2-bit codes to the others: "1" and "-1":

Huffman code   Number of bits   Delta value
------------   --------------   -----------
0              1                0
10             2                1
11             2                -1

The packed signal will occupy 21 bits: 4 bits "shape table" index + 17 bits residual (7 bits for "zeroes" + 10 bits for "non-zeroes"). Compared to the unpacked version (12 x 3 bits = 36 bits) this packing method will yield a bit-reduction of 15 bits (42% smaller).
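
The steps of this example can be sketched in C as follows. The function names are made up, the shape table {7, 5, 2, 1} is the example table from above rather than an entry of the real trained set, and the bit count assumes the 3-symbol Huffman tree shown above.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP 3  /* coefficients covered by one shape table entry */

/* Rounded average of each group of three: floor(sum / 3 + 0.5),
 * computed in integers as (2 * sum + 3) / 6. */
static void quantize_groups(const int *sig, size_t n, int *avg)
{
    assert(n % GROUP == 0);
    for (size_t g = 0; g < n / GROUP; g++) {
        int sum = sig[g * GROUP] + sig[g * GROUP + 1] + sig[g * GROUP + 2];
        avg[g] = (2 * sum + 3) / 6;
    }
}

/* Computes residual[i] = sig[i] - shape[i / 3] and returns the size of
 * the packed signal: 4 bits shape index, 1 bit per zero residual and
 * 2 bits per residual of +1 or -1. */
static unsigned residual_bits(const int *sig, size_t n,
                              const int *shape, int *residual)
{
    unsigned total = 4;

    for (size_t i = 0; i < n; i++) {
        residual[i] = sig[i] - shape[i / GROUP];
        assert(residual[i] >= -1 && residual[i] <= 1);
        total += residual[i] ? 2 : 1;
    }
    return total;
}
```

For the signal above quantize_groups() yields 7, 4, 2, 1 and residual_bits() with the shape table {7, 5, 2, 1} returns 21.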

Value grouping with "group coded" flag

If a signal contains lots of zeroes, grouping several values together and assigning the "group coded" flag to each group will achieve a significant bit-reduction. Consider the following sequence of numbers to be encoded:

0, 0, 1, 2, 0, 0, 3, 3, 0, 0, 0, 7, 0, 6, 0, 0

Let us cluster each two values together and assign the "coded" flag (1 bit) to each group:

(0, 0); flag = 0 (group not coded)
(1, 2); flag = 1 (group coded)
(0, 0); flag = 0 (group not coded)
(3, 3); flag = 1 (group coded)
(0, 0); flag = 0 (group not coded)
(0, 7); flag = 1 (group coded)
(0, 6); flag = 1 (group coded)
(0, 0); flag = 0 (group not coded)

As a result, each "not coded" group requires only one bit to be transmitted, indicating that all values in that group are zero. On the other hand, each "coded" group requires one extra bit to be transmitted, indicating that at least one value in that group is non-zero. In the case above that overhead is worthwhile because half of the groups consist of zeroes only.

The encoded signal is 4 x 1 + 4 x 7 = 32 bits long (4 "not coded" groups of 1 flag bit each plus 4 "coded" groups of 1 flag bit and 2 x 3 value bits each) while the unpacked one is 16 x 3 = 48 bits long. The bit-reduction is therefore 16 bits (33% smaller).
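
The bit count of this example can be reproduced with the sketch below. As in the example, the values of a coded group are counted as raw 3-bit values; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Bit count for pairwise grouping: every pair costs one "group coded"
 * flag; pairs containing a non-zero value cost two raw 3-bit values
 * on top of that. */
static unsigned grouped_bits(const unsigned char *v, size_t n)
{
    unsigned total = 0;

    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        total += 1;              /* "group coded" flag */
        if (v[i] || v[i + 1])
            total += 2 * 3;      /* two 3-bit values */
    }
    return total;
}
```

For the 16 values above the function returns 32.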

Multichannel ATRAC3plus (ATRAC-X)

ATRAC3plus supports multichannel streams (up to 8 channels). Such streams are encoded in units customarily called "channel blocks"; each block contains at most 2 channels (i.e. it can be MONO or STEREO). For example, taking channel_id = 3 and looking at the table below we have a stream containing 2 channel blocks: 1 stereo + 1 mono and thus 3 channels. The base codec operates on either MONO or STEREO channel blocks only.

ATRAC-X channel configurations

channel_id   total channels   number of channel blocks   speaker mapping
----------   --------------   ------------------------   ---------------
0            0                undefined                  undefined
1            1                1                          front: center (MONO)
2            2                1                          front: L, R (STEREO)
3            3                2                          front: L, R; front: center
4            4                3                          front: L, R; front: center; rear: surround
5            5+1              4                          front: L, R; front: center; rear: L, R; LFE
6            6+1              5                          front: L, R; front: center; rear: L, R; rear: center; LFE
7            7+1              5                          front: L, R; front: center; rear: L, R; side: L, R; LFE

Bitstream overview

The table below shows the bitstream organization of ATRAC3plus at the top level. Depending on the channel configuration a typical frame may contain more than one channel block. In this case the additional fields channel_block_type and channel_block_data will be included for each block.

name                 number of bits   value                   description
------------------   --------------   ---------------------   -----------
start_marker         1                0                       marks the start of the ATRAC3plus bitstream
channel_block_type   2                00b - MONO block        type of the channel block
                                      01b - STEREO block
                                      10b - EXTENSION block
channel_block_data   variable                                 contains encoded sound information
terminator           2                11b                     indicates the end of the bitstream

Channel block types

ATRAC3plus defines the following channel block types:

  • Mono channel block: contains monaural sound data.
  • Stereo channel block: contains stereophonic sound data.
  • Extension block: as indicated by its name it's intended to carry some extension information. Its purpose is unknown though due to the lack of an official description. All existing decoder implementations are programmed to ignore blocks of that type.

Channel block layout

ATRAC3plus was designed to provide high-quality sound compression. Therefore it tries to save as many bits as possible. It uses a new coding scheme for channel blocks compared to ATRAC3: the channels of a stereo signal are no longer coded separately but rather in one stereo channel block. The bitstream for such a block allows both channels to share several sound parameters so that there is no need to transmit the same things twice. Depending on the correlation between the channels this can lead to a significant bit reduction and thus improve coding quality.

A mono/stereo channel block contains the following pieces of sound information:

name               size in bits   description
----------------   ------------   -----------
sound_header       6              defines some global sound parameters
wordlength_info    variable       quantization word length information for each quant unit
scalefactor_info   variable       quantization scale factor indexes for each coded quant unit
codetable_info     variable       code table information for each coded quant unit
spectra            variable       Huffman-coded spectral information for each coded quant unit
window_info        variable       tells which IMDCT window shape should be used during the sound reconstruction
gain_info          variable       gain envelope used by the gain compensation
gha_info           variable       information about sine-like waves in the compressed sound obtained by the GHA. It contains quantized frequency, amplitude and phase for each wave to be synthesized in the decoder.
noise_info         1/9            contains noise flag, level index and table selector for the white noise to be added during decoding.


Sound header

At the start of each channel block the sound header is located. It contains the following fields:

size in bits   name              value(s)                   comments
------------   ---------------   ------------------------   --------
5              num_quant_units   valid values: 0...27, 31   number of coded quantization units - 1. The value of "0" indicates one coded unit, the value of "31" indicates 32 units. The values 28, 29 and 30 are invalid.
1              x_flag                                       to be figured out
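
Parsing and validating these 6 bits could look like the C sketch below. The struct, the function name and the way the 6 bits are passed in are choices made for this example; the field order follows the table above.

```c
#include <assert.h>

typedef struct {
    int num_quant_units;  /* number of coded units: 1...28 or 32 */
    int x_flag;           /* meaning still to be figured out */
} SoundHeader;

/* Parses the 6 header bits, given in the low bits of 'bits' with the
 * 5-bit field first (ASSUMPTION: same order as in the table).
 * Returns 0 on success, -1 for the invalid codes 28, 29 and 30. */
static int parse_sound_header(unsigned bits, SoundHeader *h)
{
    unsigned code;

    assert(bits < 64);
    code = (bits >> 1) & 0x1F;
    if (code >= 28 && code <= 30)
        return -1;
    h->num_quant_units = (int)code + 1; /* stored value is count - 1 */
    h->x_flag = (int)(bits & 1);
    return 0;
}
```

Note that the sketch stores the real number of units (stored value + 1) in num_quant_units.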


Word-length information

Coding summary

Word-length (or quantization precision) information follows the sound header. It defines the word-length parameter for each coded quantization unit. This parameter is in the range 0...7, where the value of "7" indicates the highest quantization precision and the value of "1" - the lowest one. The value of "0" means no data, i.e. the corresponding quantization unit was not coded.

In the case of the stereo channel block the word-length parameters for the channel 1(L) will be transmitted first, followed by the word-length parameters for the channel 2(R). The word-lengths for the channel 1 are always coded independently. The word-lengths for the channel 2 can be coded either independently or relative to the channel 1. In this case the 1st channel is called "master" and the 2nd one - "slave". The word-lengths for the mono block will be coded just like the channel 1 in the stereo block.

In order to keep the word-length data as small as possible ATRAC3plus uses several coefficient packing techniques that need different amounts of bits for transmission:

  • the coefficients are coded directly (3-bit values). This means no packing and is used at high bitrates because the frame size is big enough to keep the information unpacked.
  • differential coding + Huffman-coded delta: the first coefficient is coded directly; all others are Huffman-coded deltas to the previous coefficient.
  • prediction + Huffman-coded residual: this technique offers the best packing and is used at low bitrates. It's analogous to lossless coding and is based on trained shape tables serving as prediction. Later the Huffman-coded residual will be added to the prediction, perfectly reconstructing the coefficients.

Reconstruction of trimmed word-length coefficients

Word-length coefficients of the trailing quantization units, which correspond to the high spectral bands, tend to be either 1 (low precision) or 0 (not coded). Such coefficients will be omitted and one of the following modes will be used in order to reconstruct their values during decoding:

mode code (2 bits)   num_coded_vals   split_point_delta   Action (master)                         Action (slave)
------------------   --------------   -----------------   -------------------------------------   --------------
0                    not present      not present         no trimmed coefficients                 same as master
1                    5 bits           not present         set all trimmed coefficients to "0"     same as master
2                    5 bits           not present         set all trimmed coefficients to "1"     for each trimmed coefficient read one bit of its direct value
3                    5 bits           2 bits              set all trimmed coefficients up to the split point to "1" and after the split point to "0". The split point is calculated differently for master and slave channels (see below)

To calculate the split point from split_point_delta do the following:

  • for the master channel: number of zeroes = split_point_delta + 1
  • for the slave channel: number of ones = split_point_delta + 3

The following C-pseudocode shows how to parse a bitstream according to the table above:

mode = get_bits(2);
if (mode) {
    num_coded_vals = get_bits(5);
    if (mode == 3)
        split_point_delta = get_bits(2);
} else {
    num_coded_vals = num_quant_units;
}

The following C-pseudocode shows how to reconstruct trimmed word-length coefficients according to the table above:

switch (mode) {
case 0: /* no further action */
    break;
case 1:
    for (pos = num_coded_vals; pos < num_quant_units; pos++)
        wl_coeffs[pos] = 0;
    break;
case 2:
    for (pos = num_coded_vals; pos < num_quant_units; pos++) {
        if (channel == master)
            wl_coeffs[pos] = 1;
        else
            wl_coeffs[pos] = get_bits(1);
    }
    break;
case 3:
    if (channel == master)
        split_point = num_quant_units - split_point_delta - 1;
    else
        split_point = num_coded_vals + split_point_delta + 3;

    for (pos = num_coded_vals; pos < split_point; pos++)
        wl_coeffs[pos] = 1;

    for (; pos < num_quant_units; pos++)
        wl_coeffs[pos] = 0;
}
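
The two pseudocode fragments above can be combined into the compilable sketch below. The toy bit source (a string of '0'/'1' characters), the TrimInfo struct, the function names and the two range checks are additions made for this example; they are not part of the pseudocode. In a real stream the coded coefficients sit between the two steps, which is why parsing and reconstruction are kept as separate functions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bit source: a string of '0'/'1' characters, read left to right. */
typedef struct { const char *s; } BitStr;

static unsigned get_bits(BitStr *b, unsigned n)
{
    unsigned val = 0;

    while (n--) {
        assert(*b->s == '0' || *b->s == '1');
        val = (val << 1) | (unsigned)(*b->s++ - '0');
    }
    return val;
}

typedef struct {
    unsigned mode;
    int num_coded_vals;
    int split_point_delta;
} TrimInfo;

/* Reads mode, num_coded_vals and split_point_delta. Returns 0 or -1. */
static int parse_trim_info(BitStr *b, int num_quant_units, TrimInfo *t)
{
    t->mode = get_bits(b, 2);
    t->num_coded_vals = num_quant_units;
    t->split_point_delta = 0;
    if (t->mode) {
        t->num_coded_vals = (int)get_bits(b, 5);
        if (t->num_coded_vals > num_quant_units)
            return -1; /* defensive check, not in the pseudocode */
        if (t->mode == 3)
            t->split_point_delta = (int)get_bits(b, 2);
    }
    return 0;
}

/* Fills wl[num_coded_vals ... num_quant_units - 1]. Returns 0 or -1. */
static int reconstruct_trimmed(BitStr *b, const TrimInfo *t, int is_master,
                               int num_quant_units, uint8_t *wl)
{
    int pos = t->num_coded_vals, split;

    switch (t->mode) {
    case 1:
        for (; pos < num_quant_units; pos++)
            wl[pos] = 0;
        break;
    case 2:
        for (; pos < num_quant_units; pos++)
            wl[pos] = is_master ? 1 : (uint8_t)get_bits(b, 1);
        break;
    case 3:
        split = is_master ? num_quant_units - t->split_point_delta - 1
                          : t->num_coded_vals + t->split_point_delta + 3;
        if (split > num_quant_units)
            return -1; /* defensive check, not in the pseudocode */
        for (; pos < split; pos++)
            wl[pos] = 1;
        for (; pos < num_quant_units; pos++)
            wl[pos] = 0;
        break;
    default:
        break; /* mode 0: nothing was trimmed */
    }
    return 0;
}
```

For a master channel with 12 quantization units, the bit string "110100001" selects mode 3 with 8 coded values and split_point_delta = 1, so the units 8 and 9 are set to "1" and the units 10 and 11 to "0".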


Word-length coding in detail

The word-length information for each channel will be coded as follows:

size in bits name comments
2 coding_mode indicates the coding mode used.
variable coeff_info word-length coefficients coded according to the coding_mode.


The coding_mode parameter may be interpreted differently depending on the channel number. The following pseudocode examples explain the coding modes in detail:

Mode 0 (master and slave)

All coefficients will be directly coded as follows:

for (i = 0; i < num_quant_units; i++)
     wl_coeffs[i] = get_bits(3);

Mode 1 (master)

The leading "n" values are stored directly while the trailing ones are packed using Method C (shorter delta to min).

Data stored in the bitstream:

  • 2 bits: index of the table of weights, "0" - indicates "no table used"
  • 2/7/9 or more bits (depending on mode): info for the reconstruction of trimmed coefficients
  • 5 bits: number of directly coded coefficients (num_direct_coeffs). This value must be <= num_coded_vals
  • 2 bits: size of deltas in bits (delta_bits)
  • 3 bits: minimum value (min_value)
  • for each num_direct_coeffs
    • 3 bits: coefficient value
  • if delta_bits > 0: for each (num_coded_vals - num_direct_coeffs)
    • delta_bits: delta value to be added to the min_value

The following C-pseudocode summarizes all above:

weights_tab_indx = get_bits(2); /* get index of weights table to be added after decoding */

/* parse mode/num_coded_vals/split_point_delta parameters for trimmed coefficients */

num_direct_coeffs = get_bits(5);
if (num_direct_coeffs > num_coded_vals)
    ABORT("Invalid number of directly coded coefficients");

delta_bits = get_bits(2);
min_value  = get_bits(3);

for (pos = 0; pos < num_direct_coeffs; pos++)
    wl_coeffs[pos] = get_bits(3);

for (; pos < num_coded_vals; pos++) {
    if (delta_bits)
        wl_coeffs[pos] = min_value + get_bits(delta_bits);
    else
        wl_coeffs[pos] = min_value;
}

/* reconstruct trimmed coefficients as described above */

/* add weighting coefficients if requested */
if (weights_tab_indx) {
    for (pos = 0; pos < num_quant_units; pos++)
        wl_coeffs[pos] += wl_weights[channel_num][weights_tab_indx - 1][pos];
}

Mode 1 (slave)

Coding method: Huffman-coded modulo difference to master.

Data stored in the bitstream:

/* parse mode/num_coded_vals/split_point_delta parameters for trimmed coefficients */

vlc_sel = get_bits(2); /* selects a Huffman table from the set in Annex A */

for (i = 0; i < num_coded_vals; i++) {
    delta = get_vlc(vlc_sel);
    wl_coeffs[i] = (master_ch->wl_coeffs[i] + delta) & 7;
}

Mode 2 (master)

Coding method: Vector quantization with residual encoding and Value grouping with "group coded" flag.

Data stored in the bitstream:

  • 2/7/9 or more bits (depending on mode): info for the reconstruction of trimmed coefficients.
  • 1 bit: enable_grouping flag. "1" indicates that residual values were coded pairwise (in groups of two).
  • 1 bit: selects one of the first two Huffman tables from the set in Annex A.
  • 3 bits: start_value selecting a subset of "shape tables" from the trained set.
  • 4 bits: shape_index selecting a "shape table" within the subset indicated by start_value.
  • for each num_coded_vals
    • if enable_grouping == 1:
      • 1 bit: group_coded flag
      • if group_coded == 1:
        • 2 huffman-coded residual values to be added to the unpacked "shape table" using modular arithmetic
    • if enable_grouping == 0:
      • one huffman-coded residual value to be added to the unpacked "shape table" using modular arithmetic

Annex A: Decoding tables

Word-length related tables

Tables of weights

The weights below will be added to the decoded word-length coefficients. The tables are organized as follows:

  • [channel_number: 0 or 1][index: 0...2][coeff_indx: 0...31]

wl_weights[2][3][32] = {
    {
        {5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {5, 5, 5, 4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {6, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
    },
    {
        {5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {5, 5, 5, 4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {6, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    }
};

Huffman tables for delta coding

PLEASE NOTE: delta values indicated in the tables below will be added using modular arithmetic as described in the "Delta coding" section, so in the case of "wrap around" the value of "7" will be treated as "-1", the value of "6" as "-2" and so on.

  • vlc_tab_index = 0, delta range -1...1

Huffman code   Number of bits   Delta value
------------   --------------   -----------
0              1                0
10             2                1
11             2                7

  • vlc_tab_index = 1, delta range -2...2

Huffman code   Number of bits   Delta value
------------   --------------   -----------
0              1                0
100            3                1
101            3                2
110            3                6
111            3                7

  • vlc_tab_index = 2, delta range 0...7 (-4...3)

Huffman code   Number of bits   Delta value
------------   --------------   -----------
0              1                0
100            3                1
101            3                7
1100           4                2
1101           4                5
1110           4                6
11110          5                3
11111          5                4

  • vlc_tab_index = 3, delta range 0...7 (-4...3)

Huffman code   Number of bits   Delta value
------------   --------------   -----------
0              1                0
100            3                1
101            3                7
1100           4                2
1101           4                3
1110           4                6
11110          5                4
11111          5                5
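
To show how such a table is used together with the modular arithmetic, here is a small C sketch that decodes a bit string with the table vlc_tab_index = 2. It is a toy: the bits are given as a string of '0'/'1' characters, the codewords are matched by linear search, and all names are made up for this example. A real decoder would use a proper bit reader and lookup tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *code;   /* codeword as a string of '0'/'1' */
    unsigned    delta;  /* delta value, 0...7 */
} VlcEntry;

/* vlc_tab_index = 2, copied from the table above. */
static const VlcEntry vlc_tab2[8] = {
    {"0", 0},    {"100", 1},  {"101", 7},   {"1100", 2},
    {"1101", 5}, {"1110", 6}, {"11110", 3}, {"11111", 4},
};

/* Matches one codeword at *bits and advances the pointer.
 * Returns the delta value, or -1 if no codeword matches. */
static int get_vlc(const char **bits)
{
    for (size_t i = 0; i < 8; i++) {
        size_t len = strlen(vlc_tab2[i].code);
        if (strncmp(*bits, vlc_tab2[i].code, len) == 0) {
            *bits += len;
            return (int)vlc_tab2[i].delta;
        }
    }
    return -1;
}

/* Decodes n coefficients: a raw 3-bit start value followed by
 * Huffman-coded modulo-8 deltas to the previous coefficient.
 * Returns 0 on success, -1 on a broken bit string. */
static int decode_deltas(const char *bits, unsigned char *out, size_t n)
{
    unsigned prev = 0;

    assert(n > 0 && strlen(bits) >= 3);
    for (int i = 0; i < 3; i++)
        prev = (prev << 1) | (unsigned)(*bits++ - '0');
    out[0] = (unsigned char)prev;

    for (size_t i = 1; i < n; i++) {
        int delta = get_vlc(&bits);
        if (delta < 0)
            return -1;
        prev = (prev + (unsigned)delta) & 7; /* "wrap around" */
        out[i] = (unsigned char)prev;
    }
    return 0;
}
```

Feeding it the 32 bits from the Method A example (011 11110 0 1101 0 0 100 1110 0 101 0 0 1100, written without spaces) gives back the signal 3, 6, 6, 3, 3, 3, 4, 2, 2, 1, 1, 1, 3.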