Decoding AAC CPE
Part of Understanding AAC
A CPE is a channel pair element. It contains the encoded data for 2 audio channels, which usually share some data. For now, this description covers only what is needed to decode low complexity (LC) data. Processing for other features is skipped, except where their data must still be parsed from the bitstream.
A note about the ad-hoc conventions in this syntax description: This notation:
6 bits: foo
indicates that the next 6 bits are to be read from the bitstream and stored in variable foo. Similarly, this notation:
foo: bar
indicates that the next (foo) quantity bits are to be read from the bitstream and stored in variable bar.
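The notation above maps onto a simple bit reader. The following is a minimal sketch, assuming bits are consumed most significant bit first; the `bit_reader` type and `read_bits` function are hypothetical names used for illustration and are not FAAD2's actual bitstream API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader (illustration only, not FAAD2's API). */
typedef struct {
    const uint8_t *data;  /* input buffer */
    size_t size;          /* buffer size in bytes */
    size_t pos;           /* current read position, in bits */
} bit_reader;

/* Read n bits (0 <= n <= 32), most significant bit first.
 * Bits past the end of the buffer are read as zero. */
static uint32_t read_bits(bit_reader *br, unsigned n)
{
    uint32_t value = 0;
    while (n-- > 0) {
        size_t byte = br->pos >> 3;
        unsigned shift = 7u - (unsigned)(br->pos & 7u);
        unsigned bit = 0;
        if (byte < br->size)
            bit = (br->data[byte] >> shift) & 1u;
        value = (value << 1) | bit;
        br->pos++;
    }
    return value;
}
```

With this sketch, "6 bits: foo" corresponds to `foo = read_bits(&br, 6)`, and "foo: bar" corresponds to `bar = read_bits(&br, foo)`.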
Function Hierarchy
When FAAD2 wants to decode a CPE, this is the sequence of functions it calls in its internal hierarchy:
syntax.c:decode_cpe()
 +-syntax.c:channel_pair_element()
    +-syntax.c:ics_info()
       +-specrec.c:window_grouping_info()
       +-syntax.c:ltp_data() (only for LTP decoding)
    +-syntax.c:individual_channel_stream()
       +-syntax.c:ics_info()
       +-syntax.c:section_data()
       +-syntax.c:scale_factor_data()
          +-rvlc.c:decode_scale_factors()
       +-syntax.c:pulse_data()
       +-syntax.c:tns_data()
       +-syntax.c:gain_control_data() (for SSR decoding)
       +-rvlc.c:rvlc_decode_scale_factors()
       +-hcr.c:reordered_spectral_data()
       +-syntax.c:spectral_data()
          +-huffman.c:huffman_spectral_data()
       +-pulse.c:pulse_decode()
    +-specrec.c:reconstruct_channel_pair()
decode_cpe()
channel_pair_element()
channel_pair_element()
declare 2 ic_stream structures: ics1 and ics2
declare 2 arrays of 16-bit ints for spectral data: spec_data1 and spec_data2

4 bits: element_instance_tag
1 bit: common_window
if common_window is 1   // both channels share common ics information
  ics_info(ics1)
  2 bits: ics1.ms_mask_present
  if ics1.ms_mask_present is 1
    foreach g in 0..ics1.num_window_groups-1
      foreach sfb in 0..ics1.max_sfb-1
        1 bit: ics1.ms_used[g][sfb]
  // error resilience stuff
  copy ics1 into ics2
else
  ics1.ms_mask_present = 0
individual_channel_stream(ics1, spec_data1)
// error resilience stuff
individual_channel_stream(ics2, spec_data2)
// SBR stuff
reconstruct_channel_pair(ics1, ics2, spec_data1, spec_data2)
ics_info(ic_stream ics)
1 bit: reserved
2 bits: ics.window_sequence
1 bit: ics.window_shape

#define ONLY_LONG_SEQUENCE   0x0
#define LONG_START_SEQUENCE  0x1
#define EIGHT_SHORT_SEQUENCE 0x2
#define LONG_STOP_SEQUENCE   0x3

if ics.window_sequence = EIGHT_SHORT_SEQUENCE
  4 bits: ics.max_sfb
  7 bits: ics.scale_factor_grouping
else
  6 bits: ics.max_sfb
window_grouping_info(ics)
if ics.max_sfb > ics.num_swb
  error
if ics.window_sequence != EIGHT_SHORT_SEQUENCE
  1 bit: ics.predictor_data_present
  if ics.predictor_data_present = 1
    // main profile stuff
    // LTP stuff
    // ER stuff
window_grouping_info(ic_stream ics)
if ics.window_sequence is 0, 1, or 3   { any long window sequence }
  ics.num_windows = 1
  ics.num_window_groups = 1
  ics.window_group_length[ics.num_window_groups - 1] = 1
  if aac_object_type is LD
    if frame_length is 512
      ics.num_swb = num_swb_512_window[sf_index]
    else
      ics.num_swb = num_swb_480_window[sf_index]
  else
    if frame_length is 1024
      ics.num_swb = num_swb_1024_window[sf_index]
    else
      ics.num_swb = num_swb_960_window[sf_index]

  if aac_object_type is LD
    if frame_length is 512
      foreach i in 0..ics.num_swb-1
        ics.sect_sfb_offset[0][i] = swb_offset_512_window[sf_index][i]
        ics.swb_offset[i] = swb_offset_512_window[sf_index][i]
    else
      foreach i in 0..ics.num_swb-1
        ics.sect_sfb_offset[0][i] = swb_offset_480_window[sf_index][i]
        ics.swb_offset[i] = swb_offset_480_window[sf_index][i]
    ics.sect_sfb_offset[0][ics.num_swb] = frame_length
    ics.swb_offset[ics.num_swb] = frame_length
  else
    foreach i in 0..ics.num_swb-1
      ics.sect_sfb_offset[0][i] = swb_offset_1024_window[sf_index][i]
      ics.swb_offset[i] = swb_offset_1024_window[sf_index][i]
    ics.sect_sfb_offset[0][ics.num_swb] = frame_length
    ics.swb_offset[ics.num_swb] = frame_length

else (ics.window_sequence is 2)   { EIGHT_SHORT_SEQUENCE }
  ics.num_windows = 8
  ics.num_window_groups = 1
  ics.window_group_length[ics.num_window_groups - 1] = 1
  ics.num_swb = num_swb_128_window[sf_index]

  foreach i in 0..ics.num_swb-1
    ics.swb_offset[i] = swb_offset_128_window[sf_index][i]
  ics.swb_offset[ics.num_swb] = frame_length / 8

  // scale_factor_grouping has 7 bits, one for each window after the first;
  // a set bit keeps the window in the current group
  foreach i in 0..ics.num_windows-2
    if bit #6-i in ics.scale_factor_grouping is clear
      ics.num_window_groups++
      ics.window_group_length[ics.num_window_groups - 1] = 1
    else
      ics.window_group_length[ics.num_window_groups - 1]++

  foreach g in 0..ics.num_window_groups-1
    declare width
    declare sect_sfb = 0
    declare offset = 0
    foreach i in 0..ics.num_swb-1
      if i + 1 == ics.num_swb
        width = frame_length / 8 - swb_offset_128_window[sf_index][i]
      else
        width = swb_offset_128_window[sf_index][i+1] - swb_offset_128_window[sf_index][i]
      width *= ics.window_group_length[g]
      ics.sect_sfb_offset[g][sect_sfb++] = offset
      offset += width
    ics.sect_sfb_offset[g][sect_sfb] = offset
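The last loop of the short-window case scales each scalefactor band width by the number of windows in the group. A minimal sketch of that step for a single group follows; the function name `build_sect_sfb_offset` and its parameters are hypothetical, and the band table in the usage note is a toy table, not one of the real `swb_offset_128_window` tables.

```c
#include <stdint.h>

/* Fill sect_sfb_offset[0..num_swb] for one window group (sketch).
 * swb_offset:  band start offsets within a single short window
 * short_len:   length of one short window (frame length / 8)
 * group_len:   number of short windows in this group */
static void build_sect_sfb_offset(const uint16_t *swb_offset, int num_swb,
                                  uint16_t short_len, uint8_t group_len,
                                  uint16_t *sect_sfb_offset)
{
    uint16_t offset = 0;
    for (int i = 0; i < num_swb; i++) {
        /* The last band ends at the end of the short window. */
        uint16_t end = (i + 1 == num_swb) ? short_len : swb_offset[i + 1];
        uint16_t width = (uint16_t)((end - swb_offset[i]) * group_len);
        sect_sfb_offset[i] = offset;
        offset = (uint16_t)(offset + width);
    }
    /* One extra entry marks the end of the last band. */
    sect_sfb_offset[num_swb] = offset;
}
```

For example, with a toy table {0, 4, 8, 16}, a short window of 32 coefficients, and a group of 3 windows, the resulting offsets are 0, 12, 24, 48, and 96.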
individual_channel_stream(ic_stream ics, spec_data[1024])
8 bits: ics.global_gain
if element.common_window is 0 and scal_flag is 0
  ics_info(ics)
section_data(ics)
scale_factor_data(ics)
if scal_flag is 0
  1 bit: pulse_data_present
  if pulse_data_present is 1
    pulse_data(ics)
  1 bit: tns_data_present
  if tns_data_present
    tns_data(ics)
  1 bit: gain_control_data_present
  if gain_control_data_present
    if object_type is SSR
      gain_control_data(ics)
// error resilience and DRM stuff
spectral_data(ics)
if pulse_data_present
  if ics.window_sequence == EIGHT_SHORT_SEQUENCE
    error
  else
    pulse_decode(ics)
section_data(ic_stream ics)
if ics.window_sequence = EIGHT_SHORT_SEQUENCE
  section_bits = 3
else
  section_bits = 5
section_escape_value = (1 << section_bits) - 1   // either 7 or 31 (0x1F)

foreach g in 0..ics.num_window_groups-1
  k = i = 0
  // remember to check that the following loop is not stuck
  while k < ics.max_sfb
    if aacSectionDataResilienceFlag
      section_codebook_bits = 5
    else
      section_codebook_bits = 4
    section_codebook_bits: ics.section_codebook[g][i]
    if ics.section_codebook[g][i] = NOISE_HCB // 13
      ics.noise_used = 1
    // error resilience stuff
    section_length = 0
    section_bits: section_length_increment
    while section_length_increment = section_escape_value
      section_length += section_length_increment
      section_bits: section_length_increment
    section_length += section_length_increment
    ics.section_start[g][i] = k
    ics.section_end[g][i] = k + section_length
    if k + section_length >= 8*15
      error
    if i >= 8*15
      error
    foreach sfb = k..k+section_length-1
      ics.sfb_codebook[g][sfb] = ics.section_codebook[g][i]
    k += section_length
    i++
  ics.num_sec[g] = i
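The escape mechanism for section lengths can be sketched in isolation. In this illustration, the array `incr` stands in for successive `section_length_increment` values already read from the bitstream; the function name and its signature are assumptions made for the example.

```c
#include <stddef.h>

/* Accumulate one section length from successive increments (sketch).
 * An increment equal to the escape value means "more follows".
 * Stores the number of increments consumed in *used. */
static unsigned section_length_from_increments(const unsigned *incr,
                                               size_t count,
                                               unsigned section_bits,
                                               size_t *used)
{
    unsigned escape = (1u << section_bits) - 1u;  /* 7 or 31 */
    unsigned length = 0;
    size_t i = 0;
    while (i < count) {
        unsigned v = incr[i++];
        length += v;
        if (v != escape)
            break;  /* a non-escape increment ends the section length */
    }
    *used = i;
    return length;
}
```

With 5-bit increments 31, 31, 5 the section length is 67; with 3-bit increments 7, 2 it is 9.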
scale_factor_data(ic_stream ics)
decode_scale_factors(ics)
decode_scale_factors(ic_stream ics)
Comment from FAAD2:
/*
 * All scalefactors (and also the stereo positions and pns energies) are
 * transmitted using Huffman coded DPCM relative to the previous active
 * scalefactor (respectively previous stereo position or previous pns energy,
 * see subclause 4.6.2 and 4.6.3). The first active scalefactor is
 * differentially coded relative to the global gain.
 */
Decoding process:
local g, sfb
local t
local noise_pcm_flag = 1
local scale_factor = ics.global_gain
local is_position = 0
local noise_energy = ics.global_gain - 90

foreach g in 0..ics.num_window_groups-1
  foreach sfb in 0..ics.max_sfb-1
    if ics.sfb_codebook[g][sfb] is ZERO_HCB /* zero book */
      ics.scale_factors[g][sfb] = 0
    else if ics.sfb_codebook[g][sfb] is INTENSITY_HCB (15) or INTENSITY_HCB2 (14)
      t = get Huffman coded scale factor
      is_position += (t - 60)
      ics.scale_factors[g][sfb] = is_position
    else if ics.sfb_codebook[g][sfb] is NOISE_HCB (13)
      if noise_pcm_flag is 1
        noise_pcm_flag = 0
        9 bits: t
        t -= 256
      else
        t = get Huffman coded scale factor
        t -= 60
      noise_energy += t
      ics.scale_factors[g][sfb] = noise_energy
    else /* spectral books */
      ics.scale_factors[g][sfb] = 0
      t = get Huffman coded scale factor
      scale_factor += (t - 60)
      if scale_factor < 0 or scale_factor > 255
        error
      ics.scale_factors[g][sfb] = scale_factor
For the scale factor Huffman table, see AAC Huffman Tables.
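The DPCM step for the spectral codebooks can be sketched as follows. This is only an illustration: it assumes the Huffman decoding has already happened and that the decoded indices are supplied in the array `t`; the function name `dpcm_scale_factors` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuild scale factors from already-decoded DPCM indices (sketch).
 * Each index is offset by 60, so an index of 60 means "no change".
 * Returns 0 on success, -1 if a scale factor leaves the range 0..255. */
static int dpcm_scale_factors(uint8_t global_gain, const uint8_t *t,
                              size_t n, int16_t *out)
{
    int16_t scale_factor = global_gain;  /* first value is relative to global gain */
    for (size_t i = 0; i < n; i++) {
        scale_factor = (int16_t)(scale_factor + t[i] - 60);
        if (scale_factor < 0 || scale_factor > 255)
            return -1;
        out[i] = scale_factor;
    }
    return 0;
}
```

For a global gain of 100 and indices 60, 62, 55, the scale factors come out as 100, 102, and 97.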
rvlc_decode_scale_factors(ic_stream ics)
Decodes reversible VLCs. This is only applicable when the error resilience facilities are enabled.
pulse_data(ic_stream ics, pulse_data pulse)
2 bits: pulse.number_pulse
6 bits: pulse.pulse_start_sfb
if pulse.pulse_start_sfb > ics.num_swb
  error
foreach i in 0..pulse.number_pulse
  5 bits: pulse.pulse_offset[i]
  4 bits: pulse.pulse_amp[i]
tns_data(ic_stream ics, tns)
Definition: tns = temporal noise shaping
if ics.window_sequence = EIGHT_SHORT_SEQUENCE
  n_filter_bits = 1
  length_bits = 4
  order_bits = 3
else
  n_filter_bits = 2
  length_bits = 6
  order_bits = 5

foreach w in 0..ics.num_windows-1
  n_filter_bits: tns.n_filter[w]
  if tns.n_filter[w]
    1 bit: tns.coef_res[w]
    if tns.coef_res[w] = 1
      start_coef_bits = 4
    else
      start_coef_bits = 3
  for filter in 0..tns.n_filter[w]-1
    length_bits: tns.length[w][filter]
    order_bits: tns.order[w][filter]
    if tns.order[w][filter]
      1 bit: tns.direction[w][filter]
      1 bit: tns.coef_compress[w][filter]
      coefficient_bits = start_coef_bits - tns.coef_compress[w][filter]
      foreach i in 0..tns.order[w][filter]-1
        coefficient_bits: tns.coef[w][filter][i]
gain_control_data(ic_stream ics)
This function pertains to SSR decoding
local bd, wd, ad

2 bits: ssr.max_band
if ics.window_sequence is ONLY_LONG_SEQUENCE
  foreach bd in 1..ssr.max_band
    foreach wd in 0..0 /* yes, just one iteration */
      3 bits: ssr.adjust_num[bd][wd]
      foreach ad in 0..ssr.adjust_num[bd][wd] - 1
        4 bits: ssr.alevcode[bd][wd][ad]
        5 bits: ssr.aloccode[bd][wd][ad]
else if ics.window_sequence is LONG_START_SEQUENCE
  foreach bd in 1..ssr.max_band
    foreach wd in 0..1
      3 bits: ssr.adjust_num[bd][wd]
      foreach ad in 0..ssr.adjust_num[bd][wd] - 1
        4 bits: ssr.alevcode[bd][wd][ad]
        if wd is 0
          4 bits: ssr.aloccode[bd][wd][ad]
        else
          2 bits: ssr.aloccode[bd][wd][ad]
else if ics.window_sequence is EIGHT_SHORT_SEQUENCE
  foreach bd in 1..ssr.max_band
    foreach wd in 0..7
      3 bits: ssr.adjust_num[bd][wd]
      foreach ad in 0..ssr.adjust_num[bd][wd] - 1
        4 bits: ssr.alevcode[bd][wd][ad]
        2 bits: ssr.aloccode[bd][wd][ad]
else if ics.window_sequence is LONG_STOP_SEQUENCE
  foreach bd in 1..ssr.max_band
    foreach wd in 0..1
      3 bits: ssr.adjust_num[bd][wd]
      foreach ad in 0..ssr.adjust_num[bd][wd] - 1
        4 bits: ssr.alevcode[bd][wd][ad]
        if wd is 0
          4 bits: ssr.aloccode[bd][wd][ad]
        else
          5 bits: ssr.aloccode[bd][wd][ad]
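The four branches differ mainly in how many bits `aloccode` occupies. A compact way to see this is a lookup function; the sketch below only restates the bit widths listed above, and the name `aloccode_bits` is hypothetical.

```c
enum {
    ONLY_LONG_SEQUENCE   = 0,
    LONG_START_SEQUENCE  = 1,
    EIGHT_SHORT_SEQUENCE = 2,
    LONG_STOP_SEQUENCE   = 3
};

/* Number of bits used by ssr.aloccode for window index wd (sketch).
 * Returns -1 for an unknown window sequence. */
static int aloccode_bits(int window_sequence, int wd)
{
    switch (window_sequence) {
    case ONLY_LONG_SEQUENCE:   return 5;
    case LONG_START_SEQUENCE:  return (wd == 0) ? 4 : 2;
    case EIGHT_SHORT_SEQUENCE: return 2;
    case LONG_STOP_SEQUENCE:   return (wd == 0) ? 4 : 5;
    default:                   return -1;
    }
}
```

In every branch, `adjust_num` takes 3 bits and `alevcode` takes 4 bits, so only the `aloccode` width needs this kind of selection.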
spectral_data(ic_stream ics)
p = 0
groups = 0
nshort = frame_length / 8

foreach g in 0..ics.num_window_groups-1
  p = nshort * groups
  for i in 0..ics.num_sec[g]-1
    section_codebook = ics.section_codebook[g][i]
    if section_codebook >= FIRST_PAIR_HCB (5)
      increment = 2
    else
      increment = 4
    if section_codebook is ZERO_HCB (0), NOISE_HCB (13), INTENSITY_HCB2 (14), or INTENSITY_HCB (15)
      p += ics.sect_sfb_offset[g][ics.section_end[g][i]] - ics.sect_sfb_offset[g][ics.section_start[g][i]]
    else
      for k in ics.sect_sfb_offset[g][ics.section_start[g][i]] .. ics.sect_sfb_offset[g][ics.section_end[g][i]] - 1, stepping by increment
        huffman_spectral_data(section_codebook, spectral_data[p])
        p += increment
  groups += ics.window_group_length[g]
pulse_decode(ic_stream ics, array spectral_data (16-bit ints), frame_length)
Note that this function does not modify the bitstream. It just modifies decoded variables.
k = ics.swb_offset[ics.pulse.pulse_start_sfb]
for i in 0..ics.pulse.number_pulse
  k += ics.pulse.pulse_offset[i]
  if (k >= frame_length)
    error
  if (spectral_data[k] > 0)
    spectral_data[k] += ics.pulse.pulse_amp[i]
  else
    spectral_data[k] -= ics.pulse.pulse_amp[i]
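The same steps can be written as a small C function. This is a sketch under stated assumptions: the `pulse_info` struct and `pulse_decode_sketch` name are hypothetical, the loop runs over indices 0 through `number_pulse` inclusive as in the pseudocode, and the band offsets in the usage note are toy values.

```c
#include <stdint.h>

/* Hypothetical container for the fields read by pulse_data(). */
typedef struct {
    uint8_t number_pulse;     /* 2-bit field; loop bound is inclusive */
    uint8_t pulse_start_sfb;
    uint8_t pulse_offset[4];
    uint8_t pulse_amp[4];
} pulse_info;

/* Add pulse amplitudes to already-decoded spectral values (sketch).
 * Returns 0 on success, -1 if a pulse lands outside the frame. */
static int pulse_decode_sketch(const pulse_info *pul,
                               const uint16_t *swb_offset,
                               int16_t *spec, uint16_t frame_length)
{
    uint16_t k = swb_offset[pul->pulse_start_sfb];
    for (int i = 0; i <= pul->number_pulse; i++) {
        k = (uint16_t)(k + pul->pulse_offset[i]);
        if (k >= frame_length)
            return -1;
        /* Increase the magnitude, keeping the sign of the coefficient. */
        if (spec[k] > 0)
            spec[k] = (int16_t)(spec[k] + pul->pulse_amp[i]);
        else
            spec[k] = (int16_t)(spec[k] - pul->pulse_amp[i]);
    }
    return 0;
}
```

With band offsets {0, 4, 8, 12}, a start band of 1, pulse offsets 1 and 2, and amplitudes 2 and 5, the coefficients at positions 5 and 7 are modified: a value of 3 becomes 5, and a value of -2 becomes -7.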
huffman_spectral_data(section_codebook, int16_t *spectral_data)
This function branches into one of several Huffman decoders depending on the codebook. It decodes a pair or a quadruple of 16-bit spectral values.
if section_codebook is:
  1 or 2: decode a quadruple of signed values from Huffman table 1 or 2, respectively
  3 or 4: decode a quadruple of unsigned values from Huffman table 3 or 4, respectively, then read a sign bit for each nonzero value
  5 or 6: decode a pair of signed values from Huffman table 5 or 6, respectively
  7, 8, 9, or 10: decode a pair of unsigned values from Huffman table 7, 8, 9, or 10, respectively, then read a sign bit for each nonzero value
  11: decode a pair of values, with escape codes, using Huffman table 11
  12: generate some other code using Huffman table 11 (yes, 11)
Section codebook 12 is interesting. It uses Huffman table 11 to retrieve a pair of values. Then FAAD2 runs the values through a function called huffman_codebook(int i). If i is 0, return the upper 16 bits of the decimal value 16428320, else return the lower 16 bits. What is 16428320 expressed in hex? 0xFAAD20. Must be some application-defined piece of the AAC spec.
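The arithmetic behind that constant is easy to check. The sketch below mirrors the described behavior but is not FAAD2's code: the function name `codebook12_word` is hypothetical, and it returns unsigned 16-bit values to keep the bit patterns obvious.

```c
#include <stdint.h>

/* Return the upper (i == 0) or lower (i != 0) 16 bits of 16428320 (sketch). */
static uint16_t codebook12_word(int i)
{
    const uint32_t data = 16428320u;  /* 0x00FAAD20 */
    if (i == 0)
        return (uint16_t)(data >> 16);     /* 0x00FA */
    return (uint16_t)(data & 0xFFFFu);     /* 0xAD20 */
}
```

The upper half is 0x00FA and the lower half is 0xAD20, which together spell out FAAD20.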
The Huffman tables are listed on a separate page: AAC Huffman Tables
reordered_spectral_data()
Comment block from FAAD2 source, which comes from the ISO spec:
ISO/IEC 14496-3/Amd.1
8.5.3.3: Huffman Codeword Reordering for AAC spectral data (HCR)

HCR divides the spectral data in known fixed size segments, and sorts it by the importance of the data. The importance is firstly the (lower) position in the spectrum, and secondly the largest value in the used codebook. The most important data is written at the start of each segment (at known positions), the remaining data is interleaved in between, with the writing direction alternating. Data length is not increased.
This operation is applicable only when the error resilience facilities are enabled; it takes the place of spectral_data() in that case.
Reconstruction
If you have made it this far, congratulations! You are ready to proceed to the CPE reconstruction phase: Reconstructing AAC CPE.