Decoding AAC CPE
Revision as of 01:04, 6 March 2006
Part of Understanding AAC
A CPE is a channel pair element. This element contains the encoded data for 2 audio channels which probably have data in common. Presently, this description is only concerned with what it takes to decode low complexity (LC) data and processing that affects other features is skipped (except if the data needs to be parsed from the bitstream).
A note about the ad-hoc conventions in this syntax description: This notation:
6 bits: foo
indicates that the next 6 bits are to be read from the bitstream and stored in variable foo. Similarly, this notation:
foo: bar
indicates that the next (foo) quantity bits are to be read from the bitstream and stored in variable bar.
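The two notations above map onto a small MSB-first bit reader. The following is a minimal Python sketch for illustration, not FAAD2's bit reader; the class name `BitReader` and its `read` method are hypothetical.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object (illustrative only)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read(self, n: int) -> int:
        """Return the next n bits as an unsigned integer."""
        if self.pos + n > len(self.data) * 8:
            raise EOFError("read past end of bitstream")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


reader = BitReader(bytes([0b00001110, 0b10000000]))
foo = reader.read(6)    # "6 bits: foo"  -> the bits 000011
bar = reader.read(foo)  # "foo: bar"     -> the next foo bits
```

Here the first read consumes a fixed 6 bits, and the second read consumes a number of bits given by the value just decoded.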
Function Hierarchy
When FAAD2 wants to decode a CPE, this is the sequence of functions it calls in its internal hierarchy:
syntax.c:decode_cpe()
  +-syntax.c:channel_pair_element()
    +-syntax.c:ics_info()
      +-specrec.c:window_grouping_info()
      +-syntax.c:ltp_data() (only for LTP decoding)
    +-syntax.c:individual_channel_stream()
      +-syntax.c:ics_info()
      +-syntax.c:section_data()
      +-syntax.c:scale_factor_data()
        +-rvlc.c:decode_scale_factors()
      +-syntax.c:pulse_data()
      +-syntax.c:tns_data()
      +-syntax.c:gain_control_data() (for SSR decoding)
      +-rvlc.c:rvlc_decode_scale_factors()
      +-hcr.c:reordered_spectral_data()
      +-syntax.c:spectral_data()
        +-huffman.c:huffman_spectral_data()
      +-pulse.c:pulse_decode()
    +-specrec.c:reconstruct_channel_pair()
decode_cpe
channel_pair_element()
channel_pair_element
declare 2 ic_stream structures: ics1 and ics2
4 bits: element_instance_tag
1 bit: common_window
if common_window is 1
  // both channels have common ics information
  ics_info(ics1)
  2 bits: ics1.ms_mask_present
  if ics1.ms_mask_present is 1
    foreach g in 0..ics1.num_window_groups-1
      foreach sfb in 0..ics1.max_sfb-1
        1 bit: ics1.ms_used[g][sfb]
  // error resilience stuff
  copy ics1 into ics2
else
  ics1.ms_mask_present = 0
individual_channel_stream(ics1)
// error resilience stuff
individual_channel_stream(ics2)
// SBR stuff
reconstruct_channel_pair(ics1, ics2)
ics_info(ic_stream ics)
#define ONLY_LONG_SEQUENCE   0x0
#define LONG_START_SEQUENCE  0x1
#define EIGHT_SHORT_SEQUENCE 0x2
#define LONG_STOP_SEQUENCE   0x3

1 bit: reserved
2 bits: ics.window_sequence
1 bit: ics.window_shape
if ics.window_sequence = EIGHT_SHORT_SEQUENCE
  4 bits: ics.max_sfb
  7 bits: ics.scale_factor_grouping
else
  6 bits: ics.max_sfb
window_grouping_info(ics)
if ics.max_sfb > ics.num_swb
  error
if ics.window_sequence != EIGHT_SHORT_SEQUENCE
  1 bit: ics.predictor_data_present
  if ics.predictor_data_present = 1
    // main profile stuff
    // LTP stuff
    // ER stuff
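The field layout above can be sketched as a small parser. This is an illustrative Python sketch that covers only the fields listed for LC decoding; the helper `read_bits` and the dictionary keys are hypothetical names, and the call to window_grouping_info() and the max_sfb range check are left out.

```python
EIGHT_SHORT_SEQUENCE = 0x2


def read_bits(data, pos, n):
    """Read n bits MSB-first starting at bit offset pos; return (value, new_pos)."""
    value = 0
    for i in range(pos, pos + n):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value, pos + n


def parse_ics_info(data, pos=0):
    """Parse the ics_info fields; return (fields, bit position after the element)."""
    ics = {}
    _, pos = read_bits(data, pos, 1)  # reserved bit, value ignored
    ics["window_sequence"], pos = read_bits(data, pos, 2)
    ics["window_shape"], pos = read_bits(data, pos, 1)
    if ics["window_sequence"] == EIGHT_SHORT_SEQUENCE:
        ics["max_sfb"], pos = read_bits(data, pos, 4)
        ics["scale_factor_grouping"], pos = read_bits(data, pos, 7)
    else:
        ics["max_sfb"], pos = read_bits(data, pos, 6)
        # Prediction payloads (main profile, LTP, ER) are not parsed here.
        ics["predictor_data_present"], pos = read_bits(data, pos, 1)
    return ics, pos


# Short-window example: 0 | 10 | 1 | 1100 | 1011011 (+1 padding bit)
short_ics, short_end = parse_ics_info(bytes([0x5C, 0xB6]))
# Long-window example:  0 | 00 | 0 | 101001 | 0 (+5 padding bits)
long_ics, long_end = parse_ics_info(bytes([0x0A, 0x40]))
```

The short-window element occupies 15 bits and the long-window element 11 bits, which is why the two branches cannot share a fixed-size read.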
window_grouping_info(ic_stream ics)
if ics.window_sequence is 0, 1, or 3 { ONLY_LONG_SEQUENCE, LONG_START_SEQUENCE, LONG_STOP_SEQUENCE }
  ics.num_windows = 1
  ics.num_window_groups = 1
  ics.window_group_length[ics.num_window_groups - 1] = 1
  if aac_object_type is LD
    if frame_length is 512
      ics.num_swb = num_swb_512_window[sf_index]
    else
      ics.num_swb = num_swb_480_window[sf_index]
  else
    if frame_length is 1024
      ics.num_swb = num_swb_1024_window[sf_index]
    else
      ics.num_swb = num_swb_960_window[sf_index]
  if aac_object_type is LD
    if frame_length is 512
      foreach i in 0..ics.num_swb-1
        ics.sect_sfb_offset[0][i] = swb_offset_512_window[sf_index][i]
        ics.swb_offset[i] = swb_offset_512_window[sf_index][i]
    else
      foreach i in 0..ics.num_swb-1
        ics.sect_sfb_offset[0][i] = swb_offset_480_window[sf_index][i]
        ics.swb_offset[i] = swb_offset_480_window[sf_index][i]
    ics.sect_sfb_offset[0][ics.num_swb] = frame_length
    ics.swb_offset[ics.num_swb] = frame_length
  else
    foreach i in 0..ics.num_swb-1
      ics.sect_sfb_offset[0][i] = swb_offset_1024_window[sf_index][i]
      ics.swb_offset[i] = swb_offset_1024_window[sf_index][i]
    ics.sect_sfb_offset[0][ics.num_swb] = frame_length
    ics.swb_offset[ics.num_swb] = frame_length
else (ics.window_sequence is 2) { EIGHT_SHORT_SEQUENCE }
  ics.num_windows = 8
  ics.num_window_groups = 1
  ics.window_group_length[ics.num_window_groups - 1] = 1
  ics.num_swb = num_swb_128_window[sf_index]
  foreach i in 0..ics.num_swb-1
    ics.swb_offset[i] = swb_offset_128_window[sf_index][i]
  ics.swb_offset[ics.num_swb] = frame_length / 8
  foreach i in 0..ics.num_windows-2
    if bit #6-i in ics.scale_factor_grouping is set
      // window i+1 belongs to the same group as window i
      ics.window_group_length[ics.num_window_groups - 1]++
    else
      // window i+1 starts a new group
      ics.num_window_groups++
      ics.window_group_length[ics.num_window_groups - 1] = 1
  foreach g in 0..ics.num_window_groups-1
    declare width
    declare sect_sfb = 0
    declare offset = 0
    foreach i in 0..ics.num_swb-1
      if i + 1 == ics.num_swb
        width = frame_length / 8 - swb_offset_128_window[sf_index][i]
      else
        width = swb_offset_128_window[sf_index][i+1] - swb_offset_128_window[sf_index][i]
      width *= ics.window_group_length[g]
      ics.sect_sfb_offset[g][sect_sfb++] = offset
      offset += width
    ics.sect_sfb_offset[g][sect_sfb] = offset
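The short-window grouping can be illustrated on its own. The following Python sketch assumes the ISO/IEC 14496-3 convention that a set bit in scale_factor_grouping keeps a window in the same group as the previous one. The function names are hypothetical, and the band table in the example is a made-up toy table (4 bands in a 16-bin window), not one of the real swb_offset_128_window tables.

```python
def window_group_lengths(scale_factor_grouping, num_windows=8):
    """Return the length of each window group for an EIGHT_SHORT_SEQUENCE."""
    lengths = [1]  # the first window always opens the first group
    for i in range(num_windows - 1):
        if (scale_factor_grouping >> (6 - i)) & 1:
            lengths[-1] += 1   # bit set: window i+1 joins the current group
        else:
            lengths.append(1)  # bit clear: window i+1 starts a new group
    return lengths


def sect_sfb_offsets(swb_offset, window_len, group_lengths):
    """Per-group band offsets; band widths scale with the group length.

    swb_offset holds the start bin of each band within one short window.
    """
    result = []
    for length in group_lengths:
        offsets, offset = [], 0
        for i, start in enumerate(swb_offset):
            end = swb_offset[i + 1] if i + 1 < len(swb_offset) else window_len
            offsets.append(offset)
            offset += (end - start) * length
        offsets.append(offset)  # end of the last band in this group
        result.append(offsets)
    return result


groups = window_group_lengths(0b1011011)
offsets = sect_sfb_offsets([0, 4, 8, 12], 16, groups)  # toy band table
```

The group lengths always sum to 8, since every short window lands in exactly one group.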
individual_channel_stream(ic_stream ics)
8 bits: ics.global_gain
if both element.common_window and scal_flag are 0
  ics_info(ics)
section_data(ics)
scale_factor_data(ics)
if scal_flag is 0
  1 bit: pulse_data_present
  if pulse_data_present is 1
    pulse_data(ics)
  1 bit: tns_data_present
  if tns_data_present
    tns_data(ics)
  1 bit: gain_control_data_present
  if gain_control_data_present
    if object_type is SSR
      gain_control_data(ics)
// error resilience and DRM stuff
spectral_data(ics)
if pulse_data_present
  if ics.window_sequence == EIGHT_SHORT_SEQUENCE
    error
  else
    pulse_decode(ics)
section_data(ic_stream ics)
if ics.window_sequence = EIGHT_SHORT_SEQUENCE
  section_bits = 3
else
  section_bits = 5
section_escape_value = (1 << section_bits) - 1 (either 7 or 31/0x1F)
foreach g in 0..ics.num_window_groups-1
  k = i = 0
  // remember to check that the following loop is not stuck
  while k < ics.max_sfb
    if aacSectionDataResilienceFlag
      section_codebook_bits = 5
    else
      section_codebook_bits = 4
    section_codebook_bits: ics.section_codebook[g][i]
    if ics.section_codebook[g][i] = NOISE_HCB // 13
      ics.noise_used = 1
    // error resilience stuff
    section_length = 0
    section_bits: section_length_increment
    while section_length_increment = section_escape_value
      section_length += section_length_increment
      section_bits: section_length_increment
    section_length += section_length_increment
    ics.section_start[g][i] = k
    ics.section_end[g][i] = k + section_length
    if k + section_length >= 8*15
      error
    if i >= 8*15
      error
    foreach sfb in k..k+section_length-1
      ics.sfb_codebook[g][sfb] = ics.section_codebook[g][i]
    k += section_length
    i++
  ics.num_sec[g] = i
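The escape mechanism for section lengths is easier to follow with concrete bits. This Python sketch assumes aacSectionDataResilienceFlag is 0 (4-bit codebook index) and returns each section as a hypothetical (codebook, start, end) tuple with an exclusive end. Its error check is stricter than the 8*15 limit in the outline: it rejects zero-length and overrunning sections so that the loop cannot get stuck. `read_bits` and `parse_section_data` are illustrative names, not FAAD2 functions.

```python
def read_bits(data, pos, n):
    """Read n bits MSB-first starting at bit offset pos; return (value, new_pos)."""
    value = 0
    for i in range(pos, pos + n):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value, pos + n


def parse_section_data(data, pos, is_short, num_window_groups, max_sfb):
    """Return (sections per group, bit position after the section data)."""
    sect_bits = 3 if is_short else 5
    escape = (1 << sect_bits) - 1  # 7 or 31
    groups = []
    for _ in range(num_window_groups):
        sections, k = [], 0
        while k < max_sfb:
            codebook, pos = read_bits(data, pos, 4)
            length = 0
            incr, pos = read_bits(data, pos, sect_bits)
            while incr == escape:  # escape value: the length continues
                length += incr
                incr, pos = read_bits(data, pos, sect_bits)
            length += incr
            if length == 0 or k + length > max_sfb:
                raise ValueError("invalid section length")
            sections.append((codebook, k, k + length))
            k += length
        groups.append(sections)
    return groups, pos


# Long window, one group, max_sfb = 40:
#   codebook 1, length 31 (escape) + 4 = 35, then codebook 0, length 5
sections, end = parse_section_data(bytes([0x1F, 0x90, 0x0A]), 0, False, 1, 40)
```

A length of 35 bands does not fit in 5 bits, so it is sent as the escape value 31 followed by a final increment of 4.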
scale_factor_data(ic_stream ics)
decode_scale_factors(ics)
pulse_data(ic_stream ics, pulse_data pulse)
2 bits: pulse.number_pulse
6 bits: pulse.pulse_start_sfb
if pulse.pulse_start_sfb > ics.num_swb
  error
foreach i in 0..pulse.number_pulse
  5 bits: pulse.pulse_offset[i]
  4 bits: pulse.pulse_amp[i]
tns_data(ic_stream ics, tns)
Definition: tns = temporal noise shaping
if ics.window_sequence = EIGHT_SHORT_SEQUENCE
  n_filter_bits = 1
  length_bits = 4
  order_bits = 3
else
  n_filter_bits = 2
  length_bits = 6
  order_bits = 5
foreach w in 0..ics.num_windows-1
  n_filter_bits: tns.n_filter[w]
  if tns.n_filter[w]
    1 bit: tns.coef_res[w]
    if tns.coef_res[w] = 1
      start_coef_bits = 4
    else
      start_coef_bits = 3
  foreach filter in 0..tns.n_filter[w]-1
    length_bits: tns.length[w][filter]
    order_bits: tns.order[w][filter]
    if tns.order[w][filter]
      1 bit: tns.direction[w][filter]
      1 bit: tns.coef_compress[w][filter]
      coefficient_bits = start_coef_bits - tns.coef_compress[w][filter]
      foreach i in 0..tns.order[w][filter]-1
        coefficient_bits: tns.coef[w][filter][i]
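The way the field widths depend on earlier fields can be sketched as follows. This is an illustrative Python parser, not FAAD2 code; `read_bits`, `parse_tns_data` and the dictionary keys are hypothetical names, and each filter is assumed to carry exactly `order` coefficients.

```python
def read_bits(data, pos, n):
    """Read n bits MSB-first starting at bit offset pos; return (value, new_pos)."""
    value = 0
    for i in range(pos, pos + n):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value, pos + n


def parse_tns_data(data, pos, is_short, num_windows):
    """Return (list of filters per window, bit position after the TNS data)."""
    n_filt_bits, length_bits, order_bits = (1, 4, 3) if is_short else (2, 6, 5)
    windows = []
    for _ in range(num_windows):
        filters = []
        n_filt, pos = read_bits(data, pos, n_filt_bits)
        if n_filt:
            coef_res, pos = read_bits(data, pos, 1)
            start_coef_bits = 4 if coef_res else 3
            for _ in range(n_filt):
                f = {"coef": []}
                f["length"], pos = read_bits(data, pos, length_bits)
                f["order"], pos = read_bits(data, pos, order_bits)
                if f["order"]:
                    f["direction"], pos = read_bits(data, pos, 1)
                    f["coef_compress"], pos = read_bits(data, pos, 1)
                    coef_bits = start_coef_bits - f["coef_compress"]
                    for _ in range(f["order"]):
                        c, pos = read_bits(data, pos, coef_bits)
                        f["coef"].append(c)
                filters.append(f)
        windows.append(filters)
    return windows, pos


# Long window: n_filt=1, coef_res=1, length=10, order=2, direction=0,
# coef_compress=1, then two 3-bit coefficients (5 and 3), plus padding.
tns, tns_end = parse_tns_data(bytes([0x65, 0x09, 0xAC]), 0, False, 1)
```

With coef_res set and coef_compress set, each coefficient takes 4 - 1 = 3 bits, so the whole element in this example is 22 bits long.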
gain_control_data(ic_stream ics)
This function pertains to SSR decoding
local bd, wd, ad
2 bits: ssr.max_band
if ics.window_sequence is ONLY_LONG_SEQUENCE
  foreach bd in 1..ssr.max_band
    foreach wd in 0..0 /* yes, just one iteration */
      3 bits: ssr.adjust_num[bd][wd]
      foreach ad in 0..ssr.adjust_num[bd][wd] - 1
        4 bits: ssr.alevcode[bd][wd][ad]
        5 bits: ssr.aloccode[bd][wd][ad]
else if ics.window_sequence is LONG_START_SEQUENCE
  foreach bd in 1..ssr.max_band
    foreach wd in 0..1
      3 bits: ssr.adjust_num[bd][wd]
      foreach ad in 0..ssr.adjust_num[bd][wd] - 1
        4 bits: ssr.alevcode[bd][wd][ad]
        if wd is 0
          4 bits: ssr.aloccode[bd][wd][ad]
        else
          2 bits: ssr.aloccode[bd][wd][ad]
else if ics.window_sequence is EIGHT_SHORT_SEQUENCE
  foreach bd in 1..ssr.max_band
    foreach wd in 0..7
      3 bits: ssr.adjust_num[bd][wd]
      foreach ad in 0..ssr.adjust_num[bd][wd] - 1
        4 bits: ssr.alevcode[bd][wd][ad]
        2 bits: ssr.aloccode[bd][wd][ad]
else if ics.window_sequence is LONG_STOP_SEQUENCE
  foreach bd in 1..ssr.max_band
    foreach wd in 0..1
      3 bits: ssr.adjust_num[bd][wd]
      foreach ad in 0..ssr.adjust_num[bd][wd] - 1
        4 bits: ssr.alevcode[bd][wd][ad]
        if wd is 0
          4 bits: ssr.aloccode[bd][wd][ad]
        else
          5 bits: ssr.aloccode[bd][wd][ad]
spectral_data(ic_stream ics)
p = 0
groups = 0
nshort = frame_length / 8
foreach g in 0..ics.num_window_groups-1
  p = nshort * groups
  foreach i in 0..ics.num_sec[g]-1
    section_codebook = ics.section_codebook[g][i]
    if section_codebook >= FIRST_PAIR_HCB (5)
      increment = 2
    else
      increment = 4
    if section_codebook is ZERO_HCB (0), NOISE_HCB (13), INTENSITY_HCB2 (14), or INTENSITY_HCB (15)
      p += ics.sect_sfb_offset[g][ics.section_end[g][i]] - ics.sect_sfb_offset[g][ics.section_start[g][i]]
    else
      for k from ics.sect_sfb_offset[g][ics.section_start[g][i]] up to (but not including) ics.sect_sfb_offset[g][ics.section_end[g][i]], in steps of increment
        huffman_spectral_data(section_codebook, spectral_data[p])
        p += increment
  groups += ics.window_group_length[g]
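The position bookkeeping in this loop can be checked without any Huffman tables. The Python sketch below only records where each Huffman decode call would write and how many values it would produce. The function name and the (codebook, start, end) section tuples with exclusive ends are assumptions made for this illustration, and the offset table in the example is a toy table.

```python
FIRST_PAIR_HCB = 5
# Codebooks that carry no Huffman-coded spectral values:
# the zero codebook, the noise codebook and the two intensity codebooks.
NO_SPECTRAL_DATA = {0, 13, 14, 15}


def plan_spectral_reads(sections, sect_sfb_offset, group_lengths, nshort):
    """Return one (codebook, position, count) triple per Huffman decode call."""
    calls = []
    groups = 0
    for g, group_sections in enumerate(sections):
        p = nshort * groups  # each group starts on a short-window boundary
        for codebook, start, end in group_sections:
            first = sect_sfb_offset[g][start]
            last = sect_sfb_offset[g][end]
            if codebook in NO_SPECTRAL_DATA:
                p += last - first  # skip the bins, read nothing
                continue
            increment = 2 if codebook >= FIRST_PAIR_HCB else 4
            for _ in range(first, last, increment):
                calls.append((codebook, p, increment))
                p += increment
        groups += group_lengths[g]
    return calls


# One group with three bands at bins 0..4, 4..8 and 8..16 (toy offsets).
calls = plan_spectral_reads(
    sections=[[(1, 0, 1), (0, 1, 2), (5, 2, 3)]],
    sect_sfb_offset=[[0, 4, 8, 16]],
    group_lengths=[1],
    nshort=128,
)
```

The quad codebook covers bins 0..3 in one call, the zero codebook skips bins 4..7, and the pair codebook needs four calls for bins 8..15.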
pulse_decode(ic_stream ics, array spectral_data (16-bit ints), frame_length)
Note that this function does not modify the bitstream. It just modifies decoded variables.
k = ics.swb_offset[ics.pulse.pulse_start_sfb]
for i in 0..ics.pulse.number_pulse
  k += ics.pulse.pulse_offset[i]
  if (k >= frame_length)
    error
  if (spectral_data[k] > 0)
    spectral_data[k] += ics.pulse.pulse_amp[i]
  else
    spectral_data[k] -= ics.pulse.pulse_amp[i]
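A direct Python transcription of the steps above, for illustration only. The function signature is hypothetical, the length of the coefficient list stands in for frame_length, and the offset table in the example is a toy table.

```python
def pulse_decode(spectral_data, swb_offset, pulse_start_sfb, pulse_offset, pulse_amp):
    """Add pulse amplitudes to already-decoded coefficients, in place.

    pulse_offset and pulse_amp each hold number_pulse + 1 entries.
    """
    k = swb_offset[pulse_start_sfb]
    for offset, amp in zip(pulse_offset, pulse_amp):
        k += offset  # offsets accumulate from one pulse to the next
        if k >= len(spectral_data):
            raise ValueError("pulse position outside the frame")
        if spectral_data[k] > 0:
            spectral_data[k] += amp
        else:
            spectral_data[k] -= amp  # zero and negative values move downwards
    return spectral_data


spec = [0, 3, -2, 0, 5, 0, 0, 0]
pulse_decode(spec, swb_offset=[0, 2, 4, 8], pulse_start_sfb=1,
             pulse_offset=[0, 2], pulse_amp=[1, 4])
```

The pulse always increases the magnitude of the coefficient it lands on, and a coefficient that was zero becomes negative.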
huffman_spectral_data(section_codebook, int16_t *spectral_data)
This function branches into one of several Huffman decoders depending on the codebook. It decodes a pair or quad of 16-bit spectral values.
if section_codebook is 1 or 2
  2-step method for data quadruples
if section_codebook is 3
  binary search for data quadruples
if section_codebook is 4
  2-step method for data quadruples
if section_codebook is 5
  binary search for data pairs
if section_codebook is 6
  2-step method for data pairs
if section_codebook is 7 or 9
  binary search for data pairs
if section_codebook is 8 or 10
  2-step method for data pairs
if section_codebook is 12
  2-step method for data pairs using section codebook 11
  extra processing
2-step method for data pairs
binary search for data pairs
2-step method for data quadruples
binary search for data quadruples
rvlc_decode_scale_factors(ic_stream ics)
Reversible variable-length codes (RVLC), used for error resilience.
reordered_spectral_data()
Comment block from FAAD2 source, which comes from the ISO spec:
ISO/IEC 14496-3/Amd.1
8.5.3.3: Huffman Codeword Reordering for AAC spectral data (HCR)

HCR divides the spectral data in known fixed size segments, and sorts it by the importance of the data. The importance is firstly the (lower) position in the spectrum, and secondly the largest value in the used codebook. The most important data is written at the start of each segment (at known positions), the remaining data is interleaved in between, with the writing direction alternating. Data length is not increased.
Reconstruction
If you have made it this far, congratulations! You are ready to proceed to the CPE reconstruction phase: Reconstructing AAC CPE.