Reconstructing AAC CPE: Difference between revisions

From MultimediaWiki
Jump to navigation Jump to search
(revert spam/vandalism)
mNo edit summary
Line 5: Line 5:
== Reconstruction Process ==
== Reconstruction Process ==
   specrec.c:reconstruct_channel_pair()
   specrec.c:reconstruct_channel_pair()
    +-specrec.c:allocate_channel_pair()
    -specrec.c:allocate_channel_pair()
    +-specrec.c:inverse_quantization()
    -specrec.c:inverse_quantization()
    +-specrec.c:apply_scalefactors()
    -specrec.c:apply_scalefactors()
    +-specrec.c:quant_to_spec()
    -specrec.c:quant_to_spec()
    +-pns.c:pns_decode()
    -pns.c:pns_decode()
      +-pns.c:generate_random_vector()
      -pns.c:generate_random_vector()
    +-ms.c:ms_decode()
    -ms.c:ms_decode()
    +-is.c:is_decode()
    -is.c:is_decode()
    +-ic_predict.c:ic_prediction() (main profile)
    -ic_predict.c:ic_prediction() (main profile)
    +-ic_predict.c:pns_reset_pred_state() (main profile)
    -ic_predict.c:pns_reset_pred_state() (main profile)
    +-(processing that is specific LTP & LD profiles)
    -(processing that is specific LTP
    +-tns.c:tns_decode_frame()
    +-drc.c:drc_decode()
    +-filtbank.c:ifilter_bank()
      +-filtbank.c:imdct_long()
      +-mdct.c:faad_imdct()
    +-ssr.c:ssr_decode()
    +-lt_predict.c:lt_update_state()
    +-sbr_dec.c:sbrDecodeInit()
    +-sbr_dec.c:sbrDecodeCoupleFrame()
 
== reconstruct_channel_pair(2 ic_streams, 2 spec_data arrays(16 bit ints)) ==
  declare 2 1024-element float arrays for spectral coefficients (spec_coeff1 and spec_coeff2)
  inverse_quantization(spec_coeff1, spec_data1)
  inverse_quantization(spec_coeff2, spec_data2)
  apply_scalefactors(ics1, spec_coeff1)
  apply_scalefactors(ics2, spec_coeff2)
  if ics1.window_sequence is EIGHT_SHORT_SEQUENCE (2)
    quant_to_spec(ics1, spec_coeff1)
  if ics2.window_sequence is EIGHT_SHORT_SEQUENCE (2)
    quant_to_spec(ics2, spec_coeff2)
  if ics1.ms_mask_present
    pns_decode(ics1, ics2, spec_coef1, spec_coeff2)
  else
    pns_decode(ics1, spec_coef1)
    pns_decode(ics2, spec_coef2)
  ms_decode(ics1, ics2, spec_coef1, spec_coef2)
  is_decode(ics1, ics2, spec_coef1, spec_coef2)
  ''// main profile decoding''
  ''// LTP decoding''
  tns_decode(ics1, spec_coeff1)
  tns_decode(ics2, spec_coeff2)
  ''// DRC stuff''
  ifilter_bank(ics1, spec_coeff1)
  ifilter_bank(ics2, spec_coeff2)
  save window shape for next frame (I thought frames were independent)
  ''// LTP stuff''
  ''// SBR stuff''
 
== allocate_channel_pair() ==
This function mostly contains considerations for non-LC decoding modes. Need to revisit the mechanics of this later.
 
== inverse_quantization(real spec_coeff[1024], int16 spec_data[1024]) ==
Perform inverse non-linear quantization of spectral coefficients, converting from int -> real in the process. Can this be effectively parallelized via SIMD?
  foreach i in 0..1023
    spec_coeff[i] = abs(spec_data[i])<sup>4/3</sup>
    if (spec_data[i] < 0)
      spec_coeff[i] = -spec_coeff[i]
 
== apply_scalefactors(ic_string ics, real spec_coeff[1024]) ==
  nshort = frame_length (1024) / 8
  groups = 0
  foreach g in 0..ics.num_window_groups - 1
    k = 0
    foreach scalefactor_band in 0..ics.max_scalefactor_bands - 1
      top = ics.section_scalefactor_band_offset[g][scalefactor_band + 1]
      if (ics.scalefactors[g][scalefactor_band] < 0) OR (ics.scalefactors[g][scalefactor_band] > 255)
        scale = 1
      else
        exponent = (ics.scalefactors[g][scalefactor_band] - 100) / 4.0
        scale = 2<sup>exponent</sup>
      while (k < top)
        spec_coeff[k + (groups * nshort) + 0] *= scale
        spec_coeff[k + (groups * nshort) + 1] *= scale
        spec_coeff[k + (groups * nshort) + 2] *= scale
        spec_coeff[k + (groups * nshort) + 3] *= scale
        k += 4
    groups += ics.window_group_length[g]
 
== quant_to_spec(ic_stream ics, real spec_coeff[1024]) ==
Comment from FAAD2:
  For ONLY_LONG_SEQUENCE windows (num_window_groups = 1,
  window_group_length[0] = 1) the spectral data is in ascending spectral
  order.
  For the EIGHT_SHORT_SEQUENCE window, the spectral order depends on the
  grouping in the following manner:
  - Groups are ordered sequentially
  - Within a group, a scalefactor band consists of the spectral data of all
    grouped SHORT_WINDOWs for the associated scalefactor window band. To
    clarify via example, the length of a group is in the range of one to eight
    SHORT_WINDOWs.
  - If there are eight groups each with length one (num_window_groups = 8,
    window_group_length[0..7] = 1), the result is a sequence of eight spectra,
    each in ascending spectral order.
  - If there is only one group with length eight (num_window_groups = 1,
    window_group_length[0] = 8), the result is that spectral data of all eight
    SHORT_WINDOWs is interleaved by scalefactor window bands.
  - Within a scalefactor window band, the coefficients are in ascending
    spectral order.
Algorithm:
  allocate a temporary spectral data array: temp_spec[1024]
  k = gindex = 0
  foreach g in 0..ics.num_window_groups - 1
    j = gincrease = 0
    window_increment = ics.scalefactor_window_band_offset[ics.num_scalefactor_window_bands]
    foreach scalefactor_band in 0..ics.num_scalefactor_window_bands - 1
      width = ics.scalefactor_window_band_offset[scalefactor_band + 1] -
      ics.scalefactor_window_band[scalefactor_band]
      foreach window in 0..ics.window_group_length[g] - 1
        foreach bin in 0..width - 1, step 4
          temp_spec[gindex + (window * window_increment) + j + bin + 0] = spec_coeff[k + 0]
          temp_spec[gindex + (window * window_increment) + j + bin + 1] = spec_coeff[k + 1]
          temp_spec[gindex + (window * window_increment) + j + bin + 2] = spec_coeff[k + 2]
          temp_spec[gindex + (window * window_increment) + j + bin + 3] = spec_coeff[k + 3]
          gincrease += 4
          k += 4
      j += width
    gindex + gincrease
  copy temp_spec -> spec_coeff
 
== pns_decode(ic_stream ics1, ic_stream ics2, real spec_coeff1[1024], real spec_coeff2[1024], bool channel_pair) ==
pns = perceptual noise substitution
 
  group = 0
  nshort = frame_length / 8 (frame_length is usually 1024)
  foreach g in ics1.num_window_groups - 1
    foreach b in ics1.window_group_length[g] - 1
      foreach scalefactor_band in 0..ics1.max_scalefactor_bands - 1
        if ics1.scalefactor_band_codebook[group][scalefactor_band] == NOISE_HCB (13)
          /* LTP and PNS are not allowed in the same band and PNS takes precedence;
          * thus, disable LTP for this band */
          ics1.ltp.long_used[scalefactor_band] = 0
          ics1.ltp2.long_used[scalefactor_band] = 0
          /* disable predictors for bands using PNS */
          ics1.pred.prediction_used[scalefactor_band] = 0
          offset = ics1.scalefactor_window_band_offset[scalefactor_band]
          size = ics1.scalefactor_window_band_offset[scalefactor_band + 1] - offset
          generate_random_vector(&spec_coeff1[(group * nshort) + offset], ics1.scalefactors[g][scalefactor_band], size)
        if channel_pair AND ics2.scalefactor_band_codebook[g][scalefactor_band] == NOISE_HCB (13)
          if (ics1.midside_mask_present == 1 AND ics1.midside_used[g][scalefactor_band]) OR ics1.midside_mask_present == 2
            offset = ics2.scalefactor_window_band_offset[scalefactor_band]
            size = ics2.scalefactor_window_band_offset[scalefactor_band + 1] - offset
            foreach c in 0..size - 1
              spec_coeff2[(group * nshort) + offset + c] = spec_coeff1[(group * nshort) + offset + c]
          else
            /* LTP and PNS are not allowed in the same band and PNS takes precedence;
            * thus, disable LTP for this band */
            ics2.ltp.long_used[scalefactor_band] = 0
            ics2.ltp2.long_used[scalefactor_band] = 0
            /* disable predictors for bands using PNS */
            ics2.pred.prediction_used[scalefactor_band] = 0
            offset = ics2.scalefactor_window_band_offset[scalefactor_band]
            size = ics2.scalefactor_window_band_offset[scalefactor_band + 1] - offset
            generate_random_vector(&spec_coeff2[(group * nshort) + offset], ics2.scalefactors[g][scalefactor_band], size)
        group++
 
== generate_random_vector(real spec_coeff[], int scale_factor, int size) ==
The AAC decode process can generate random numbers as part of the decode process. It is not known if any random number generator will do. This section documents the precise operation of the generator used in FAAD2. A comment from the source code:
/* The function gen_rand_vector(addr, size) generates a vector of length
  <size> with signed random values of average energy MEAN_NRG per random
  value. A suitable random number generator can be realized using one
  multiplication/accumulation per random value.
*/
Algorithm:
  energy = 0
  scale = 1/size
  foreach i in 0..size - 1
    tmp = scale * random_int()
    spec_coeff[i] = tmp
    energy += tmp * tmp
  exponent = 0.25 * scale_factor;
  scale = 2<sup>exponent</sup>/sqrt(energy)
  foreach 1 in 0..size - 1
    spec_coeff[i] *= scale
 
As for the random_int() function, this is FAAD's comment:
/*
*  This is a simple random number generator with good quality for audio purposes.
*  It consists of two polycounters with opposite rotation direction and different
*  periods. The periods are coprime, so the total period is the product of both.
*
*    -------------------------------------------------------------------------------------------------
* +-> |31:30:29:28:27:26:25:24:23:22:21:20:19:18:17:16:15:14:13:12:11:10: 9: 8: 7: 6: 5: 4: 3: 2: 1: 0|
* |  -------------------------------------------------------------------------------------------------
* |                                                                          |  |  |  |    |        |
* |                                                                          +--+--+--+-XOR-+--------+
* |                                                                                      |
* +--------------------------------------------------------------------------------------+
*
*    -------------------------------------------------------------------------------------------------
*    |31:30:29:28:27:26:25:24:23:22:21:20:19:18:17:16:15:14:13:12:11:10: 9: 8: 7: 6: 5: 4: 3: 2: 1: 0| <-+
*    -------------------------------------------------------------------------------------------------  |
*      |  |          |  |                                                                              |
*      +--+----XOR----+--+                                                                              |
*                |                                                                                        |
*                +----------------------------------------------------------------------------------------+
*
*
*  The first has an period of 3*5*17*257*65537, the second of 7*47*73*178481,
*  which gives a period of 18.410.713.077.675.721.215. The result is the
*  XORed values of both generators.
*/
The algorithm is:
  declare static r1 and r2 as 32-bit unsigned its, initialize both to 0
  declare t1, t2, t3, and t4 as 32-bit unsigned ints
  t3 = t1 = r1
  t4 = t2 = r2
  t1 &= 0xF5
  t2 >>= 25
  t1 = parity[t1]
  t2 &= 0x63
  t1 <<= 31
  t2 = parity[t2]
  r1 = (t3 >> 1) | t1
  r2 = (t4 + t4) | t2
  random_int = r1 ^ r2
 
This is the parity table referenced in the function:
static const  uint8_t    parity [256] = {
        0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0,1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,1,
        1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0,
        1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0,
        0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0,1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,1,
        1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0,
        0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0,1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,1,
        0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0,1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,1,
        1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0
};
 
== ms_decode(ic_stream ics1, ic_stream ics2, real spec_coeff1[1024], real spec_coeff2[1024]) ==
This function decodes mid/side coding between two channels.
  group = 0
  nshort = frame_length / 8 (frame_length is usualy 1024)
  if ics1.midside_mask_present
    foreach g in 0..ics1.num_window_groups - 1
      foreach b in 0..ics.window_group_length[g] - 1
        foreach scalefactor_band in 0..ics1.max_scalefactor_band - 1
          /* no M/S coding if in PNS or IS are enable for the scalefactor band */
          if (ics1.midside_used[g][scalefactor_band] OR ics1.midside_mask_present == 2) AND
              ics1.scalefactor_band_codebook[group][scalefactor_band] != NOISE_HCB (13) AND
              ics2.scalefactor_band_codebook[group][scalefactor_band] != INTENSITY_HCB or INTENSITY_HCB2 (15 or 14, respectively)
            foreach i in ics1.scalefactor_window_band_offset[scalefactor_band] .. ics1.scalefactor_window_band_offset[scalefactor_band + 1]
              k = (group * nshort) + i
              tmp = spec_coeff1[k] - spec_coeff2[k]
              spec_coeff1[k] += spec_coeff2[k]
              spec_coeff2[k] = tmp
        group++
 
== is_decode(ic_stream ics1, ic_stream ics2, real spec_coeff1[1024], real spec_coeff2[1024]) ==
This function decodes intensity stereo coding.
  nshort = frame_length / 8 (frame_length is usually 1024)
  group = 0
  foreach g in 0..ics2.num_window_groups - 1
    foreach b in 0..ics2.window_group_length[g]
      if ics2.scalefactor_band_codebook[group][scalefactor_band] is either INTENSITY_HCB or INTENSITY_HCB2 (15 or 14)
        /* disable predictors if intensity stereo is in use for this scalefactor band */
        ics1.pred.prediction_used[scalefactor_band] = 0
        ics2.pred.prediction_used[scalefactor_band] = 0
        exponent = 0.25 * ics2.scalefactors[g][scalefactor_band]
        scale = 0.5<sup>exponent</sup>
        foreach i in ics2.scalefactor_window_band_offset[scalefactor_band] .. ics2.scalefactor_window_band_offset[scalefactor_band + 1]
          spec_coeff2[(group * nshort) + i] = spec_coeff1[(group * nshort) + i] * scale
          invert spec_coeff2[(group * nshort) + i] under certain conditions '''(TODO)'''
      group++
 
== ic_prediction() ==
This function applies to the main profile.
 
== pns_reset_pred_state() ==
This function applies to the main profile.
 
== tns_decode_frame(ic_stream ics, tns_info tns, int sample_rate_index, int object_type, real spec_coeff[1024]) ==
tns = temporal noise shaping
  nshort = frame_length / 8 (frame_length is usually 1024)
  declare array of real called lpc[TNS_MAX_ORDER + 1] (TNS_MAX_ORDER = 20)
  if ics.tns_data_present == 0
    nothing to do in this phase
  '''(TODO)'''
 
== drc_decode() ==
 
== ifilter_bank() ==
This function transforms the spectral data through a modified discrete cosine transform (MDCT) and windows the transformed time domain samples.
 
== imdct_long() ==
In low complexity mode, this function simply calls down to faad_imdct(). If LD is enabled, there is some more processing beforehand.
 
== faad_imdct(real *in, real *out) ==
This comment appears in FAAD2's mdct.c file header:
 
"Fast (I)MDCT Implementation using (I)FFT ((Inverse) Fast Fourier Transform) and consists of three steps: pre-(I)FFT complex multiplication, complex (I)FFT, post-(I)FFT complex multiplication."
 
This function is apparently a good candidate for SIMD optimization.
 
== ssr_decode() ==
 
== lt_update_state() ==
 
== sbrDecodeInit() ==
 
== sbrDecodeCoupleFrame() ==

Revision as of 07:52, 11 April 2007

Part of Understanding AAC

This page describes the process of reconstructing PCM data based on the decoded CPE parameters in an AAC bitstream. For the moment, this description focuses on what is necessary to reconstruct low complexity (LC) profile data.

Reconstruction Process

 specrec.c:reconstruct_channel_pair()
    -specrec.c:allocate_channel_pair()
    -specrec.c:inverse_quantization()
    -specrec.c:apply_scalefactors()
    -specrec.c:quant_to_spec()
    -pns.c:pns_decode()
      -pns.c:generate_random_vector()
    -ms.c:ms_decode()
    -is.c:is_decode()
    -ic_predict.c:ic_prediction() (main profile)
    -ic_predict.c:pns_reset_pred_state() (main profile)
    -(processing that is specific LTP