Decoding AAC CPE

From MultimediaWiki

Part of Understanding AAC

A CPE is a channel pair element. This element contains the encoded data for two audio channels that are likely to share some data, such as window information and mid/side stereo information. This description currently covers only what is needed to decode low complexity (LC) data; processing for other profiles and features is skipped, except where its data still has to be parsed from the bitstream.

A note about the ad-hoc conventions in this syntax description: This notation:

6 bits: foo

indicates that the next 6 bits are to be read from the bitstream and stored in variable foo. Similarly, this notation:

foo: bar

indicates that the next foo bits (foo being a variable that holds a bit count) are to be read from the bitstream and stored in variable bar.
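A minimal sketch in C of a reader for this notation, assuming the bitstream is read most significant bit first, as AAC bitstreams are. The `bit_reader` type and the `br_init`/`br_read` names are hypothetical; this is not FAAD2's own bit-reading API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit reader (not FAAD2's bitfile API). */
typedef struct {
    const uint8_t *buf;  /* bitstream bytes */
    size_t size;         /* number of bytes in buf */
    size_t pos;          /* current read position, in bits */
    int error;           /* set to 1 when a read runs past the end */
} bit_reader;

static void br_init(bit_reader *br, const uint8_t *buf, size_t size)
{
    br->buf = buf;
    br->size = size;
    br->pos = 0;
    br->error = 0;
}

/* Read n bits (0..32), most significant bit first, as "n bits: foo" does.
   Returns 0 and sets br->error if the buffer is exhausted. */
static uint32_t br_read(bit_reader *br, unsigned n)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < n; i++) {
        if (br->pos >= br->size * 8) {
            br->error = 1;
            return 0;
        }
        unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        value = (value << 1) | bit;
        br->pos++;
    }
    return value;
}
```

With this sketch, `4 bits: element_instance_tag` becomes `tag = br_read(&br, 4)`, and `foo: bar` becomes `bar = br_read(&br, foo)`.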

Function Hierarchy

When FAAD2 wants to decode a CPE, this is the sequence of functions it calls in its internal hierarchy:

syntax.c:decode_cpe()
 +-syntax.c:channel_pair_element()
    +-syntax.c:ics_info()
      +-specrec.c:window_grouping_info()
    +-syntax.c:ltp_data() (only for LTP decoding)
    +-syntax.c:individual_channel_stream()
      +-syntax.c:ics_info()
      +-syntax.c:section_data()
      +-syntax.c:scale_factor_data()
        +-syntax.c:decode_scale_factors()
      +-syntax.c:pulse_data()
      +-syntax.c:tns_data()
      +-syntax.c:gain_control_data() (for SSR decoding)
      +-rvlc.c:rvlc_decode_scale_factors()
      +-hcr.c:reordered_spectral_data()
      +-syntax.c:spectral_data()
        +-huffman.c:huffman_spectral_data()
      +-pulse.c:pulse_decode()
    +-specrec.c:reconstruct_channel_pair()

decode_cpe()

 channel_pair_element()

channel_pair_element()

 declare 2 ic_stream structures: ics1 and ics2
 declare 2 arrays of 16-bit ints for spectral data: spec_data1 and spec_data2
 4 bits: element_instance_tag
 1 bit:  common_window
 if common_window is 1 then both channels share common ics information
   ics_info(ics1)
   2 bits: ics1.ms_mask_present
   if ics1.ms_mask_present is 3
     error (reserved value)
   if ics1.ms_mask_present is 1
     foreach g in 0..ics1.num_window_groups-1
       foreach sfb in 0..ics1.max_sfb-1
         1 bit: ics1.ms_used[g][sfb]
   // error resilience stuff
   copy ics1 into ics2
 else
   ics1.ms_mask_present = 0
 individual_channel_stream(ics1, spec_data1)
 // error resilience stuff
 individual_channel_stream(ics2, spec_data2)
 // SBR stuff
 reconstruct_channel_pair(ics1, ics2, spec_data1, spec_data2)

ics_info(ic_stream ics)

 1 bit:  reserved (must be 0)
 2 bits: ics.window_sequence
   #define ONLY_LONG_SEQUENCE   0x0
   #define LONG_START_SEQUENCE  0x1
   #define EIGHT_SHORT_SEQUENCE 0x2
   #define LONG_STOP_SEQUENCE   0x3
 1 bit:  ics.window_shape
 if ics.window_sequence = EIGHT_SHORT_SEQUENCE
   4 bits: ics.max_sfb
   7 bits: ics.scale_factor_grouping
 else
   6 bits: ics.max_sfb
 window_grouping_info(ics)
 if ics.max_sfb > ics.num_swb
   error
 if ics.window_sequence != EIGHT_SHORT_SEQUENCE
   1 bit: ics.predictor_data_present
   if ics.predictor_data_present = 1
     // main profile stuff
     // LTP stuff
     // ER stuff

window_grouping_info(ic_stream ics)

 if ics.window_sequence is 0, 1, or 3 (ONLY_LONG_SEQUENCE, LONG_START_SEQUENCE, or LONG_STOP_SEQUENCE)
   ics.num_windows = 1
   ics.num_window_groups = 1
   ics.window_group_length[ics.num_window_groups - 1] = 1
   if aac_object_type is LD
     if frame_length is 512
       ics.num_swb = num_swb_512_window[sf_index]
     else
       ics.num_swb = num_swb_480_window[sf_index]
   else
     if frame_length is 1024
       ics.num_swb = num_swb_1024_window[sf_index]
     else
       ics.num_swb = num_swb_960_window[sf_index]
   if aac_object_type is LD
     if frame_length is 512
       foreach i in 0..ics.num_swb-1
         ics.sect_sfb_offset[0][i] = swb_offset_512_window[sf_index][i]
         ics.swb_offset[i] = swb_offset_512_window[sf_index][i]
     else
       foreach i in 0..ics.num_swb-1
         ics.sect_sfb_offset[0][i] = swb_offset_480_window[sf_index][i]
         ics.swb_offset[i] = swb_offset_480_window[sf_index][i]
   else
     foreach i in 0..ics.num_swb-1
       ics.sect_sfb_offset[0][i] = swb_offset_1024_window[sf_index][i]
       ics.swb_offset[i] = swb_offset_1024_window[sf_index][i]
   ics.sect_sfb_offset[0][ics.num_swb] = frame_length
   ics.swb_offset[ics.num_swb] = frame_length
 else (ics.window_sequence is 2, EIGHT_SHORT_SEQUENCE)
   ics.num_windows = 8
   ics.num_window_groups = 1
   ics.window_group_length[ics.num_window_groups - 1] = 1
   ics.num_swb = num_swb_128_window[sf_index]
   foreach i in 0..ics.num_swb-1
     ics.swb_offset[i] = swb_offset_128_window[sf_index][i]
   ics.swb_offset[ics.num_swb] = frame_length / 8
   // bits 6..0 of scale_factor_grouping describe windows 1..7;
   // a set bit means the window joins the previous window's group
   foreach i in 0..ics.num_windows-2
     if bit #(6-i) in ics.scale_factor_grouping is clear (0)
       ics.num_window_groups++
       ics.window_group_length[ics.num_window_groups - 1] = 1
     else
       ics.window_group_length[ics.num_window_groups - 1]++
   foreach g in 0..ics.num_window_groups-1
     declare width
     declare sect_sfb = 0
     declare offset = 0
     foreach i in 0..ics.num_swb-1
       if i + 1 == ics.num_swb
         width = frame_length / 8 - swb_offset_128_window[sf_index][i]
       else
         width = swb_offset_128_window[sf_index][i+1] - swb_offset_128_window[sf_index][i]
       width *= ics.window_group_length[g]
       ics.sect_sfb_offset[g][sect_sfb++] = offset
       offset += width
     ics.sect_sfb_offset[g][sect_sfb] = offset
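The grouping step of the short-window branch can be isolated as a small C function. This is a sketch that assumes the ISO/FAAD2 convention that a set grouping bit keeps a window in the previous window's group; the name `group_windows` is hypothetical.

```c
#include <stdint.h>

/* Sketch of the EIGHT_SHORT_SEQUENCE grouping step. Fills
   window_group_length[] and returns num_window_groups. */
static unsigned group_windows(uint8_t scale_factor_grouping,
                              uint8_t window_group_length[8])
{
    unsigned num_window_groups = 1;
    window_group_length[0] = 1;          /* window 0 always opens group 0 */
    for (unsigned i = 0; i < 7; i++) {   /* bits 6..0 describe windows 1..7 */
        if ((scale_factor_grouping >> (6 - i)) & 1u) {
            /* bit set: this window joins the current group */
            window_group_length[num_window_groups - 1]++;
        } else {
            /* bit clear: this window starts a new group */
            num_window_groups++;
            window_group_length[num_window_groups - 1] = 1;
        }
    }
    return num_window_groups;
}
```

For example, a grouping value of 0x5B (binary 1011011) yields three groups of 2, 3, and 3 windows, while 0x7F puts all eight windows in one group.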

individual_channel_stream(ic_stream ics, spec_data[1024])

 8 bits: ics.global_gain
 if element.common_window is 0 and scal_flag is 0
   ics_info(ics)
 section_data(ics)
 scale_factor_data(ics)
 if scal_flag is 0
   1 bit: pulse_data_present
   if pulse_data_present is 1
     pulse_data(ics)
   1 bit: tns_data_present
   if tns_data_present
     tns_data(ics)
   1 bit: gain_control_data_present
   if gain_control_data_present
     if object_type is SSR
       gain_control_data(ics)
     else
       error
 // error resilience and DRM stuff
 spectral_data(ics)
 if pulse_data_present
   if ics.window_sequence == EIGHT_SHORT_SEQUENCE
     error
   else
     pulse_decode(ics)

section_data(ic_stream ics)

 if ics.window_sequence = EIGHT_SHORT_SEQUENCE
   section_bits = 3
 else
   section_bits = 5
 section_escape_value = (1 << section_bits) - 1 (either 7 or 31/0x1F)
 foreach g in 0..ics.num_window_groups-1
   k = i = 0
   // remember to check that the following loop is not stuck
   while k < ics.max_sfb
     section_length = 0
     if aacSectionDataResilienceFlag
       section_codebook_bits = 5
     else
       section_codebook_bits = 4
     section_codebook_bits: ics.section_codebook[g][i]
     if ics.section_codebook[g][i] = NOISE_HCB // 13
       ics.noise_used = 1
     // error resilience stuff
     section_bits: section_length_increment
     while section_length_increment = section_escape_value
       section_length += section_length_increment
       section_bits: section_length_increment
     section_length += section_length_increment
     ics.section_start[g][i] = k
     ics.section_end[g][i] = k + section_length
     if k + section_length >= 8*15
       error
     if i >= 8*15
       error
     foreach sfb in k..k+section_length-1
       ics.sfb_cb[g][sfb] = ics.section_codebook[g][i]
     k += section_length
     i++
   ics.num_sec[g] = i
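The escape mechanism for section lengths can be shown in isolation. This hypothetical helper works on increments that have already been read from the bitstream (each section_bits wide) rather than on the bitstream itself.

```c
#include <stddef.h>

/* Sketch: rebuild one section_length from successive increments.
   An increment equal to the escape value ((1 << section_bits) - 1,
   i.e. 7 or 31) means "add it and read another increment".
   *consumed receives the number of increments used. */
static unsigned section_length_from_increments(const unsigned *increments,
                                               size_t count,
                                               unsigned section_bits,
                                               size_t *consumed)
{
    unsigned escape = (1u << section_bits) - 1;
    unsigned length = 0;
    size_t i = 0;
    while (i < count) {
        unsigned inc = increments[i++];
        length += inc;
        if (inc != escape)
            break;  /* a non-escape increment ends the length field */
    }
    *consumed = i;
    return length;
}
```

With short windows (section_bits = 3), the increments 7, 7, 2 encode a section of 16 scalefactor bands.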

scale_factor_data(ic_stream ics)

 decode_scale_factors(ics)

decode_scale_factors(ic_stream ics)

Comment from FAAD2:

/*
* All scalefactors (and also the stereo positions and pns energies) are
* transmitted using Huffman coded DPCM relative to the previous active
* scalefactor (respectively previous stereo position or previous pns energy,
* see subclause 4.6.2 and 4.6.3). The first active scalefactor is
* differentially coded relative to the global gain.
*/

Decoding process:

 local g, sfb
 local t
 local noise_pcm_flag = 1
 local scale_factor = ics.global_gain
 local is_position = 0
 local noise_energy = ics.global_gain - 90

 foreach g in 0..ics.num_window_groups-1
   foreach sfb in 0..ics.max_sfb-1
     if ics.sfb_cb[g][sfb] is ZERO_HCB (0) /* zero book */
       ics.scale_factors[g][sfb] = 0
     else if ics.sfb_cb[g][sfb] is INTENSITY_HCB (15) or INTENSITY_HCB2 (14)
       t = get Huffman coded scale factor
       is_position += (t - 60)
       ics.scale_factors[g][sfb] = is_position
     else if ics.sfb_cb[g][sfb] is NOISE_HCB (13)
       if noise_pcm_flag is 1
         noise_pcm_flag = 0
         9 bits: t
         t -= 256
       else
         t = get Huffman coded scale factor
         t -= 60
       noise_energy += t
       ics.scale_factors[g][sfb] = noise_energy
     else  /* spectral books */
       ics.scale_factors[g][sfb] = 0
       t = get Huffman coded scale factor
       scale_factor += (t - 60)
       if scale_factor < 0 or scale_factor > 255
         error
       ics.scale_factors[g][sfb] = scale_factor
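A C sketch of the same DPCM bookkeeping, assuming the Huffman decoding has already happened. `codes` holds, for each band in decode order (window groups concatenated, since the running values carry across groups), either the decoded scale factor index t or, for the first noise band, the raw 9-bit value. `dpcm_scale_factors` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

enum { ZERO_HCB = 0, NOISE_HCB = 13, INTENSITY_HCB2 = 14, INTENSITY_HCB = 15 };

/* Returns 0 on success, -1 if a spectral scale factor leaves 0..255. */
static int dpcm_scale_factors(int global_gain, const uint8_t *sfb_cb,
                              const int *codes, size_t num_bands, int *out)
{
    int scale_factor = global_gain;        /* spectral books start here */
    int is_position = 0;                   /* intensity positions start at 0 */
    int noise_energy = global_gain - 90;   /* PNS energies start here */
    int noise_pcm_flag = 1;                /* first noise band is 9-bit PCM */

    for (size_t sfb = 0; sfb < num_bands; sfb++) {
        switch (sfb_cb[sfb]) {
        case ZERO_HCB:
            out[sfb] = 0;
            break;
        case INTENSITY_HCB:
        case INTENSITY_HCB2:
            is_position += codes[sfb] - 60;
            out[sfb] = is_position;
            break;
        case NOISE_HCB:
            if (noise_pcm_flag) {
                noise_pcm_flag = 0;
                noise_energy += codes[sfb] - 256;
            } else {
                noise_energy += codes[sfb] - 60;
            }
            out[sfb] = noise_energy;
            break;
        default:                           /* spectral books */
            scale_factor += codes[sfb] - 60;
            if (scale_factor < 0 || scale_factor > 255)
                return -1;
            out[sfb] = scale_factor;
            break;
        }
    }
    return 0;
}
```

For example, with global_gain 100 and two spectral bands coded with indices 62 and 55, the scale factors come out as 102 and 97.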

For the scale factor Huffman table, see AAC Huffman Tables.

rvlc_decode_scale_factors(ic_stream ics)

Decodes scale factors coded with reversible variable-length codes (RVLCs). This is only applicable when the error resilience tools are enabled.

pulse_data(ic_stream ics, pulse_data pulse)

 2 bits: pulse.number_pulse
 6 bits: pulse.pulse_start_sfb
 if pulse.pulse_start_sfb > ics.num_swb
   error
 foreach i in 0..pulse.number_pulse   // number_pulse + 1 pulses are coded
   5 bits: pulse.pulse_offset[i]
   4 bits: pulse.pulse_amp[i]

tns_data(ic_stream ics, tns)

Definition: tns = temporal noise shaping

 if ics.window_sequence = EIGHT_SHORT_SEQUENCE
   n_filter_bits = 1
   length_bits = 4
   order_bits = 3
 else
   n_filter_bits = 2
   length_bits = 6
   order_bits = 5
 foreach w in 0..ics.num_windows-1
   n_filter_bits: tns.n_filter[w]
   if tns.n_filter[w]
     1 bit: tns.coef_res[w]
     if tns.coef_res[w] = 1
       start_coef_bits = 4
     else
       start_coef_bits = 3
   for filter in 0..tns.n_filter[w]-1
     length_bits: tns.length[w][filter]
     order_bits: tns.order[w][filter]
     if tns.order[w][filter]
       1 bit: tns.direction[w][filter]
       1 bit: tns.coef_compress[w][filter]
       coefficient_bits = start_coef_bits - tns.coef_compress[w][filter]
        foreach i in 0..tns.order[w][filter]-1
         coefficient_bits: tns.coef[w][filter][i]

gain_control_data(ic_stream ics)

This function pertains to SSR decoding.

 local bd, wd, ad
 2 bits: ssr.max_band
 if ics.window_sequence is ONLY_LONG_SEQUENCE
   foreach bd in 1..ssr.max_band
     foreach wd in 0..0  /* yes, just one iteration */
       3 bits: ssr.adjust_num[bd][wd]
       foreach ad in 0..ssr.adjust_num[bd][wd] - 1
         4 bits: ssr.alevcode[bd][wd][ad]
         5 bits: ssr.aloccode[bd][wd][ad]
 else if ics.window_sequence is LONG_START_SEQUENCE
   foreach bd in 1..ssr.max_band
     foreach wd in 0..1
       3 bits: ssr.adjust_num[bd][wd]
       foreach ad in 0..ssr.adjust_num[bd][wd] - 1
         4 bits: ssr.alevcode[bd][wd][ad]
         if wd is 0
           4 bits: ssr.aloccode[bd][wd][ad]
         else
           2 bits: ssr.aloccode[bd][wd][ad]
 else if ics.window_sequence is EIGHT_SHORT_SEQUENCE
   foreach bd in 1..ssr.max_band
      foreach wd in 0..7
       3 bits: ssr.adjust_num[bd][wd]
       foreach ad in 0..ssr.adjust_num[bd][wd] - 1
         4 bits: ssr.alevcode[bd][wd][ad]
         2 bits: ssr.aloccode[bd][wd][ad]
 else if ics.window_sequence is LONG_STOP_SEQUENCE
   foreach bd in 1..ssr.max_band
     foreach wd in 0..1
       3 bits: ssr.adjust_num[bd][wd]
       foreach ad in 0..ssr.adjust_num[bd][wd] - 1
         4 bits: ssr.alevcode[bd][wd][ad]
         if wd is 0
           4 bits: ssr.aloccode[bd][wd][ad]
         else
           5 bits: ssr.aloccode[bd][wd][ad]
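The four branches of gain_control_data() differ only in how many windows the wd loop covers and in the width of aloccode; adjust_num is always 3 bits and alevcode always 4 bits. The hypothetical helpers below sketch that as two lookups.

```c
/* Window sequence values, as listed in ics_info(). */
enum {
    ONLY_LONG_SEQUENCE   = 0,
    LONG_START_SEQUENCE  = 1,
    EIGHT_SHORT_SEQUENCE = 2,
    LONG_STOP_SEQUENCE   = 3
};

/* Number of gain-control windows (iterations of the wd loop). */
static int ssr_num_windows(int window_sequence)
{
    switch (window_sequence) {
    case ONLY_LONG_SEQUENCE:   return 1;
    case LONG_START_SEQUENCE:  return 2;
    case EIGHT_SHORT_SEQUENCE: return 8;
    case LONG_STOP_SEQUENCE:   return 2;
    default:                   return -1;
    }
}

/* Width in bits of ssr.aloccode[bd][wd][ad] for window wd. */
static int ssr_aloccode_bits(int window_sequence, int wd)
{
    switch (window_sequence) {
    case ONLY_LONG_SEQUENCE:   return 5;
    case LONG_START_SEQUENCE:  return wd == 0 ? 4 : 2;
    case EIGHT_SHORT_SEQUENCE: return 2;
    case LONG_STOP_SEQUENCE:   return wd == 0 ? 4 : 5;
    default:                   return -1;
    }
}
```

A table-driven parser built this way needs a single loop nest instead of four nearly identical ones.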

spectral_data(ic_stream ics)

 p = 0
 groups = 0
 nshort = frame_length / 8
 foreach g in 0..ics.num_window_groups-1
   p = nshort * groups
   foreach i in 0..ics.num_sec[g]-1
     section_codebook = ics.section_codebook[g][i]
     if section_codebook >= FIRST_PAIR_HCB (5)
       increment = 2
     else
       increment = 4
     if section_codebook is ZERO_HCB (0), NOISE_HCB (13),
       INTENSITY_HCB2 (14), or INTENSITY_HCB (15)
       p += ics.sect_sfb_offset[g][ics.section_end[g][i]] -
            ics.sect_sfb_offset[g][ics.section_start[g][i]]
     else
       k = ics.sect_sfb_offset[g][ics.section_start[g][i]]
       while k < ics.sect_sfb_offset[g][ics.section_end[g][i]]
         huffman_spectral_data(section_codebook, &spectral_data[p])
         p += increment
         k += increment
   groups += ics.window_group_length[g]
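The inner loop of spectral_data() calls the Huffman decoder once per pair or quadruple, so the number of calls per section follows from the section's coefficient span. The helper below is a hypothetical sketch of that count; it assumes valid offsets (end not below start).

```c
/* Codebook numbers used by spectral_data(). */
enum { ZERO_HCB = 0, FIRST_PAIR_HCB = 5, NOISE_HCB = 13,
       INTENSITY_HCB2 = 14, INTENSITY_HCB = 15 };

/* Huffman decoder calls for one section spanning the coefficient
   offsets [start_offset, end_offset) taken from sect_sfb_offset. */
static unsigned section_codeword_count(int codebook, unsigned start_offset,
                                       unsigned end_offset)
{
    if (codebook == ZERO_HCB || codebook == NOISE_HCB ||
        codebook == INTENSITY_HCB2 || codebook == INTENSITY_HCB)
        return 0;  /* no spectral codewords; p just skips the span */
    unsigned increment = (codebook >= FIRST_PAIR_HCB) ? 2 : 4;
    /* Same count as: for (k = start; k < end; k += increment) */
    return (end_offset - start_offset + increment - 1) / increment;
}
```

A 16-coefficient section therefore costs 4 quadruple codewords with codebook 1 but 8 pair codewords with codebook 7.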

pulse_decode(ic_stream ics, array spectral_data (16-bit ints), frame_length)

Note that this function does not read from the bitstream; it only modifies spectral values that have already been decoded.

 k = ics.swb_offset[ics.pulse.pulse_start_sfb]
 for i in 0..ics.pulse.number_pulse
   k += ics.pulse.pulse_offset[i]
   if (k >= frame_length)
     error
   if (spectral_data[k] > 0)
     spectral_data[k] += ics.pulse.pulse_amp[i]
   else
     spectral_data[k] -= ics.pulse.pulse_amp[i]
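A standalone C version of pulse_decode(), assuming the caller has already looked up `start = ics.swb_offset[pulse_start_sfb]`; `apply_pulses` is a hypothetical name. As in the pseudocode, a coefficient that is exactly zero has the amplitude subtracted, so the pulse makes it negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Add pulse amplitudes to the magnitudes of decoded coefficients.
   number_pulse is the raw 2-bit field, so number_pulse + 1 pulses
   are applied. Returns 0, or -1 if a pulse lands outside the frame. */
static int apply_pulses(int16_t *spec, size_t frame_length, unsigned start,
                        unsigned number_pulse, const uint8_t *offset,
                        const uint8_t *amp)
{
    size_t k = start;
    for (unsigned i = 0; i <= number_pulse; i++) {
        k += offset[i];               /* offsets are relative to the last pulse */
        if (k >= frame_length)
            return -1;
        if (spec[k] > 0)
            spec[k] += amp[i];        /* grow positive magnitudes upward */
        else
            spec[k] -= amp[i];        /* grow zero/negative values downward */
    }
    return 0;
}
```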

huffman_spectral_data(section_codebook, int16_t *spectral_data)

This function dispatches to one of several Huffman decoders depending on the codebook. Each call decodes either a pair or a quadruple of 16-bit spectral values.

 if section_codebook is:
   1 or 2:
     decode a quadruple of signed values from Huffman table 1 or 2, respectively
   3 or 4:
     decode a quadruple of magnitudes from Huffman table 3 or 4, respectively,
     then read a sign bit for each nonzero value
   5 or 6:
     decode a pair of signed values from Huffman table 5 or 6, respectively
   7, 8, 9, or 10:
     decode a pair of magnitudes from Huffman table 7, 8, 9, or 10, respectively,
     then read a sign bit for each nonzero value
   11:
     decode a pair of magnitudes from Huffman table 11, read the sign bits,
     then decode an escape code for each magnitude of 16
   12:
     decode a pair using Huffman table 11 (yes, 11), then replace both values with constants
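The dimension, sign handling, and largest absolute value (LAV) of each spectral codebook are fixed by the AAC spectrum codebook table, so the dispatch above can be driven by a small lookup. This is a sketch; the struct and function names are hypothetical. It also shows why spectral_data() advances by 4 below FIRST_PAIR_HCB (5) and by 2 from there on.

```c
/* Properties of spectral Huffman codebooks 1..11. */
typedef struct {
    int dimension;    /* 4 = quadruples, 2 = pairs */
    int unsigned_cb;  /* 1: magnitudes coded, sign bits follow nonzero values */
    int lav;          /* largest absolute value; 16 in book 11 signals an escape */
} spectral_cb_info;

/* Returns 0 and fills *info for codebooks 1..11, or -1 otherwise. */
static int spectral_codebook_info(int cb, spectral_cb_info *info)
{
    static const spectral_cb_info table[12] = {
        {0, 0, 0},               /* 0: ZERO_HCB, carries no Huffman data */
        {4, 0, 1}, {4, 0, 1},    /* 1, 2 */
        {4, 1, 2}, {4, 1, 2},    /* 3, 4 */
        {2, 0, 4}, {2, 0, 4},    /* 5, 6 */
        {2, 1, 7}, {2, 1, 7},    /* 7, 8 */
        {2, 1, 12}, {2, 1, 12},  /* 9, 10 */
        {2, 1, 16},              /* 11 */
    };
    if (cb < 1 || cb > 11)
        return -1;
    *info = table[cb];
    return 0;
}
```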

Section codebook 12 is interesting. It uses Huffman table 11 to read a pair of values. FAAD2 then replaces the values with the results of a function called huffman_codebook(int i): if i is 0, it returns the upper 16 bits of the decimal value 16428320; otherwise it returns the lower 16 bits. Expressed in hex, 16428320 is 0xFAAD20. Codebook 12 is reserved in the AAC specification, so this appears to be FAAD2-specific behavior (the constant spells out the decoder's name) rather than something the standard defines.
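The arithmetic can be checked directly. This sketch reproduces only the bit split described in the paragraph; `constant_half` is a hypothetical name, not FAAD2's function.

```c
#include <stdint.h>

/* i == 0 selects the upper 16 bits of 16428320, any other i the lower 16. */
static uint16_t constant_half(int i)
{
    const uint32_t data = 16428320u;  /* 0x00FAAD20 */
    return (i == 0) ? (uint16_t)(data >> 16) : (uint16_t)(data & 0xFFFFu);
}
```

The two halves are 0x00FA (250) and 0xAD20 (44320). Stored in a signed 16-bit spectral sample, 0xAD20 reads as -21216.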

The Huffman tables are listed on a separate page: AAC Huffman Tables

reordered_spectral_data()

Comment block from FAAD2 source, which comes from the ISO spec:

 ISO/IEC 14496-3/Amd.1
 8.5.3.3: Huffman Codeword Reordering for AAC spectral data (HCR)

 HCR divides the spectral data in known fixed size segments, and
 sorts it by the importance of the data. The importance is firstly
 the (lower) position in the spectrum, and secondly the largest
 value in the used codebook.
 The most important data is written at the start of each segment
 (at known positions), the remaining data is interleaved in between,
 with the writing direction alternating.
 Data length is not increased.

This operation is only applicable when the error resilience tool for spectral data (aacSpectralDataResilienceFlag) is enabled.

Reconstruction

If you have made it this far, congratulations! You are ready to proceed to the CPE reconstruction phase: Reconstructing AAC CPE.