Understanding AAC: Difference between revisions

From MultimediaWiki
== Overview ==
AAC is a perceptual audio codec, meaning that during compression it discards information that has been deemed less important to the listener.
Surface details of the format can be found at Wikipedia: [http://en.wikipedia.org/wiki/Advanced_Audio_Coding http://en.wikipedia.org/wiki/Advanced_Audio_Coding]
Conformance vectors can be obtained here: [ftp://mpaudconf:adif2mp4@ftp.iis.fhg.de/ ftp://mpaudconf:adif2mp4@ftp.iis.fhg.de/]
AAC is a block-based codec, typically variable bitrate (VBR), in which each block decodes to 1024 time-domain samples. Each frame can be parsed on its own, but reconstruction is not fully independent: like many perceptual audio codecs, AAC uses an overlapped transform (MDCT), so the output of a frame is overlap-added with data from the previous frame.
AAC includes a variety of profiles:
* low complexity (LC): reported to be the simplest (Apple iTunes files)
* main (MAIN): LC profile with backwards prediction
* scalable sample rate (SSR): submitted by Sony and reportedly similar to ATRAC/3
* long term prediction (LTP): main profile with forward prediction
* high efficiency (HE, HE-AAC, aacPlus): uses spectral band replication (SBR) and may use parametric stereo
* low delay (LD): FAAD refers to this as another profile; it is a separate object type, distinct from SSR
* error resilience (ER): provisions for it appear all over the libfaad source
== Bitpacking ==
Fields are packed most significant byte first and, within each byte, most significant bit first. Example:
5 bits: 2 (00010)
4 bits: 4 (0100)
4 bits: 2 (0010)
3 bits: 0 (000)
Byte 1: 00010010
Byte 2: 00010000
00010010 00010000
[ 2 ][ 4 ][2 ][0]
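
The packing rule above can be sketched in a few lines of Python. This is only an illustration, not code from FAAD2; the names <code>pack_bits</code> and <code>BitReader</code> are made up for this page. It packs the four example fields and reads them back.

```python
def pack_bits(fields):
    """Pack (value, width) pairs MSB-first; the last byte is zero-padded."""
    acc = 0
    nbits = 0
    for value, width in fields:
        if value < 0 or value >= (1 << width):
            raise ValueError("value %d does not fit in %d bits" % (value, width))
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8          # bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


class BitReader:
    """Hypothetical helper: reads unsigned fields MSB-first from bytes."""

    def __init__(self, data):
        self.data = data
        self.pos = 0            # current position, counted in bits

    def read(self, width):
        value = 0
        for _ in range(width):
            if self.pos >= len(self.data) * 8:
                raise EOFError("read past end of data")
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# The example from the text: 5, 4, 4 and 3 bit fields.
packed = pack_bits([(2, 5), (4, 4), (2, 4), (0, 3)])
reader = BitReader(packed)
fields = [reader.read(5), reader.read(4), reader.read(4), reader.read(3)]
```

Here <code>packed</code> holds the two bytes shown above (00010010 00010000) and <code>fields</code> recovers the original four values.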
== Packaging/Encapsulation And Setup Data ==
There are a variety of methods for packaging AAC data for transport. Two methods used for packaging raw streams are ADTS and ADIF headers. The libfaad knowledge base also makes reference to LATM and LOAS packaging.
Much AAC data is encapsulated in MPEG-4 files, a container format derived from the [[Apple QuickTime]] container format. The MPEG-4 file has an audio 'trak' atom whose nested atoms include a 'stsd' description atom, which contains an 'mp4a' atom, which in turn contains an 'esds' atom. Part of the esds atom holds the setup data for the associated AAC stream. '''(TODO: need to document the precise format and method for obtaining the setup data.)''' The setup data is generally 2 bytes long and has the following layout:
5 bits: object type
4 bits: frequency index
if (frequency index == 15)
    24 bits: frequency
4 bits: channel configuration
1 bit: frame length flag
1 bit: dependsOnCoreCoder
1 bit: extensionFlag
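
As a rough sketch of how the layout above could be read, the following Python function walks the fields in order. The function name and the dictionary keys are invented for this example, and it assumes the setup bytes have already been extracted from the esds atom (which this page does not yet document). It only covers the fields listed above.

```python
def parse_setup_data(data):
    """Parse the setup data fields listed above from a bytes object."""
    total_bits = len(data) * 8
    bits = int.from_bytes(data, "big")
    pos = 0                     # number of bits consumed so far

    def read(width):
        nonlocal pos
        if pos + width > total_bits:
            raise ValueError("setup data too short")
        shift = total_bits - pos - width
        pos += width
        return (bits >> shift) & ((1 << width) - 1)

    config = {}
    config["object_type"] = read(5)
    config["frequency_index"] = read(4)
    if config["frequency_index"] == 15:
        # Index 15 means the frequency follows explicitly as 24 bits.
        config["frequency"] = read(24)
    config["channel_configuration"] = read(4)
    config["frame_length_flag"] = read(1)
    config["depends_on_core_coder"] = read(1)
    config["extension_flag"] = read(1)
    return config


# The two bytes from the bitpacking example: 0x12 0x10.
example = parse_setup_data(bytes([0x12, 0x10]))
```

For the bytes 0x12 0x10 this yields object type 2 (AAC Low complexity), frequency index 4 and channel configuration 2, with all three flags zero.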
These are the possible object types:
* 0: NULL
* 1: AAC Main
* 2: AAC Low complexity
* 3: AAC SSR
* 4: AAC Long term prediction
* 5: AAC High efficiency
* 6: Scalable
* 7: [[TwinVQ]]
* 8: CELP
* 9: HVXC
* 10: Reserved
* 11: Reserved
* 12: TTSI
* 13: Main synthetic
* 14: Wavetable synthesis
* 15: General MIDI
* 16: Algorithmic Synthesis and Audio FX
* 17: AAC Low complexity with error resilience
* 18: Reserved
* 19: AAC Long term prediction with error resilience
* 20: AAC scalable with error resilience
* 21: TwinVQ with error resilience
* 22: BSAC with error resilience
* 23: AAC LD with error resilience
* 24: CELP with error resilience
* 25: HVXC with error resilience
* 26: HILN with error resilience
* 27: Parametric with error resilience
* 28: Reserved
* 29: Reserved
* 30: Reserved
* 31: Reserved
There are 13 supported frequencies (frequency indices 13..14 are invalid):
* 0: 96000 Hz
* 1: 88200 Hz
* 2: 64000 Hz
* 3: 48000 Hz
* 4: 44100 Hz
* 5: 32000 Hz
* 6: 24000 Hz
* 7: 22050 Hz
* 8: 16000 Hz
* 9: 12000 Hz
* 10: 11025 Hz
* 11: 8000 Hz
* 12: 7350 Hz
* 15: frequency is written explicitly
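
The table above maps directly onto a small lookup. The following Python sketch uses invented names (<code>SAMPLE_RATES</code>, <code>sample_rate_for_index</code>) and treats indices 13 and 14 as errors, as the text states.

```python
# Frequencies for indices 0..12, in the order listed above.
SAMPLE_RATES = (
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350,
)


def sample_rate_for_index(index, explicit_frequency=None):
    """Return the sample rate in Hz for a 4-bit frequency index."""
    if 0 <= index < len(SAMPLE_RATES):
        return SAMPLE_RATES[index]
    if index == 15:
        # The rate was written explicitly as a 24-bit field.
        if explicit_frequency is None:
            raise ValueError("index 15 requires an explicit frequency")
        return explicit_frequency
    raise ValueError("invalid frequency index: %d" % index)


rate = sample_rate_for_index(4)
```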
These are the channel configurations:
* 0: custom configuration '''(TODO)'''
* 1: 1 channel: front-center
* 2: 2 channels: front-left, front-right
* 3: 3 channels: front-center, front-left, front-right
* 4: 4 channels: front-center, front-left, front-right, back-center
* 5: 5 channels: front-center, front-left, front-right, back-left, back-right
* 6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
* 7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
frame length flag:
* 0: Each packet contains 1024 samples
* 1: Each packet contains 960 samples
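
Combining the frame length flag with the sample rate gives the playback duration of one packet. A small worked example in Python follows; the function name is made up for illustration.

```python
def frame_duration_seconds(frame_length_flag, sample_rate):
    """Duration of one decoded packet, per the frame length flag above."""
    samples_per_frame = 960 if frame_length_flag else 1024
    return samples_per_frame / sample_rate


# 1024 samples at 44100 Hz is roughly 23.2 milliseconds per packet.
duration_ms = frame_duration_seconds(0, 44100) * 1000.0
```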
== Frames And Syntax Elements ==
In an MPEG-4 file, the AAC data is broken up into a series of variable length frames.
An AAC frame is composed of blocks called syntax elements. Read the first 3 bits from the frame's bitstream to find the first element type, then decode that element. Read the next 3 bits to find the type of the following element, and repeat the decoding process until an END element is reached or the frame is depleted.
There are 8 different syntax elements:
* 0 SCE  single channel element (codes a single audio channel)
* 1 CPE  channel pair element (codes stereo signal)
* 2 CCE  coupling channel element (channel coupling), not implemented in libfaad2
* 3 LFE  low-frequency effects channel element, referenced as "special effects" in RTP doc
* 4 DSE  data stream element (user data)
* 5 PCE  program configuration element (describe bitstream)
* 6 FIL  fill element (pad space/extension data)
* 7 END  marks the end of the frame
This is an example layout for a 5.1 audio stream:
SCE CPE CPE LFE END
indicates
center - left/right - surround left/right - lfe - end
An ID within the respective CPE blocks indicates its channel assignments (front vs. surround).
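
The element sequence of a frame also tells how many output channels it carries. The Python sketch below counts them from a list of 3-bit element IDs that have already been read from the bitstream. The names are invented here, and it assumes that only SCE, CPE and LFE contribute output channels (it does not decode any element payload).

```python
# Element names indexed by their 3-bit ID, as listed above.
ELEMENT_NAMES = ("SCE", "CPE", "CCE", "LFE", "DSE", "PCE", "FIL", "END")

# Assumption: only these element types carry output channels.
CHANNELS_PER_ELEMENT = {"SCE": 1, "CPE": 2, "LFE": 1}


def count_channels(element_ids):
    """Count output channels in a frame given its sequence of element IDs."""
    total = 0
    for element_id in element_ids:
        name = ELEMENT_NAMES[element_id]
        if name == "END":
            break               # nothing after END belongs to this frame
        total += CHANNELS_PER_ELEMENT.get(name, 0)
    return total


# SCE CPE CPE LFE END, the 5.1 layout from the example above.
channels = count_channels([0, 1, 1, 3, 7])
```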
== Decoding Process ==
First, let's list a few basic terms that FAAD2 uses throughout its decoding process:
* ics = individual channel stream, the basic audio unit that FAAD2 is concerned with
* ms = any parameter with this in its name deals with mid/side coding
* sfb = scalefactor band
* swb = scalefactor window band
* is = intensity stereo
As mentioned above, the ics is an important data structure in AAC decoding. These are its fields, according to FAAD2:
  max_sfb
  num_swb
  num_window_groups
  num_windows
  window_sequence
  window_group_length[8]
  window_shape
  scale_factor_grouping
  section_sfb_offset[8][8*15]
  swb_offset
  section_codebook[8][15*8]
  section_start[8][15*8]
  section_end[8][15*8]
  sfb_codebook[8][15*8]
  number_sections[8]  ''// number of sections in a group''
  global_gain
  scale_factors[8][51]  ''// FAAD2 comment: [0..255]?''
  ms_mask_present
  ms_used[MAX_WINDOW_GROUPS][MAX_SFB]  ''// dimensions = [8][51]''
  noise_used
  pulse_data_present
  tns_data_present
  gain_control_data_present
  predictor_data_present
  pulse_info pulse
  tns_info tns
  ''data structures for main profile, document later''
  ''data structures for LTP, document later''
  ''data structures for SSR, document later''
  ''data structures for error resilience, document later''
These pages detail the process for decoding the various syntax elements:
* [[Decoding AAC SCE and LFE]]
* [[Decoding AAC CPE]]
** [[Reconstructing AAC CPE]]
* Decoding AAC CCE
* Decoding AAC DSE
* Decoding AAC PCE
* [[Decoding AAC FIL]]
* [[Decoding AAC END]]

Revision as of 21:59, 23 March 2006

This portion of the MultimediaWiki tracks an effort to get an open, freely-distributable, usable, and clear specification for the Advanced Audio Coding (AAC) format. The goal is to understand enough details about the format to create new decoder implementations that can handle production bitstreams starting with data packaged inside MPEG-4 files.

The homepage for libfaad has a Wiki that provides some decent details regarding the background coding concepts: http://www.audiocoding.com/modules/wiki/?page=AAC

More possible details here: http://www.ietf.org/proceedings/99nov/I-D/draft-ietf-avt-rtp-mpeg2aac-00.txt or http://tools.ietf.org/wg/avt/draft-ietf-avt-rtp-mpeg2aac/



Overview

AAC is a perceptual audio codec which means that it throws away certain information during the compression process, information that has been deemed less important.

Surface details of the format can be found at Wikipedia: http://en.wikipedia.org/wiki/Advanced_Audio_Coding

Conformance vectors can be obtained here: ftp://mpaudconf:adif2mp4@ftp.iis.fhg.de/

AAC is a variable bitrate (VBR) block-based codec where each block decodes to 1024 time-domain samples. Allegedly, each frame stands alone and does not depend on previous frames (whereas many perceptual audio codecs overlap data with the previous frame).

AAC includes a variety of profiles:

  • low complexity (LC): reported to be the simplest (Apple iTunes files)
  • main (MAIN): LC profile with backwards prediction
  • sample-rate scalability (SRS): submitted by Sony and reportedly similar to ATRAC/3
  • long term prediction (LTP): main profile with forward prediction
  • high efficiency (HE, HE-AAC, aacPlus): uses spectral band replication (SBR) and may use parametric stereo
  • FAAD refers to another profile named LD, possibly the same as SRS
  • provisions all over the libfaad source for error recovery (ER)

Bitpacking

Done in most significant byte first, most significant bit first. Example:

5 bits: 2 (00010)
4 bits: 4 (0100)
4 bits: 2 (0010)
3 bits: 0 (000)

Byte 1: 00010010
Byte 2: 00010000

00010010 00010000
[ 2 ][ 4 ][2 ][0]

Packaging/Encapsulation And Setup Data

There is a variety of methods for packaging AAC data from transport. 2 methods used in packaging raw streams are to use ADTS and ADIF headers. The libfaad knowledge base also makes reference to LATM and LOAS packaging.

Much AAC data is encapsulated in MPEG-4 files which is an extension of the Apple QuickTime container format. the MPEG-4 file will have an audio 'trak' atom which will contain a 'stsd' description atom which will contain an 'mp4a' atom which will contain an 'esds' atom. Part of the esds atom contains the setup data for associated AAC stream. (TODO: need to document the precise format and method for obtaining the setup data.) This setup data is generally 2 bytes. This setup data has the following layout:

5 bits: object type
4 bits: frequency index
if (frequency index == 15)
    24 bits: frequency
4 bits: channel configuration
1 bit: frame length flag
1 bit: dependsOnCoreCoder
1 bit: extensionFlag
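
A minimal sketch of reading these fields, assuming the setup data has already been extracted from the esds atom and follows exactly the layout listed above (any extension data that may follow is ignored). The function name `parse_setup_data` and the dictionary keys are invented for this example.

```python
def parse_setup_data(data: bytes) -> dict:
    """Parse the setup data fields in the order listed above."""
    bits = int.from_bytes(data, "big")
    remaining = len(data) * 8  # number of unread bits

    def read(nbits: int) -> int:
        # Take the next nbits bits, most significant bit first.
        nonlocal remaining
        if nbits > remaining:
            raise ValueError("setup data too short")
        remaining -= nbits
        return (bits >> remaining) & ((1 << nbits) - 1)

    config = {"object_type": read(5), "frequency_index": read(4)}
    # Index 15 is an escape: the rate follows as an explicit 24-bit value.
    config["frequency"] = read(24) if config["frequency_index"] == 15 else None
    config["channel_configuration"] = read(4)
    config["frame_length_flag"] = read(1)
    config["depends_on_core_coder"] = read(1)
    config["extension_flag"] = read(1)
    return config


# The two bytes from the bitpacking example happen to be valid setup data.
example = parse_setup_data(bytes([0x12, 0x10]))
```

For the bytes 0x12 0x10 this gives object type 2, frequency index 4, channel configuration 2 and all three flags zero, which the tables in this section map to AAC low complexity, 44100 Hz, stereo, 1024 samples per packet.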

These are the possible object types:

  • 0: NULL
  • 1: AAC Main
  • 2: AAC Low complexity
  • 3: AAC SSR
  • 4: AAC Long term prediction
  • 5: AAC High efficiency
  • 6: Scalable
  • 7: TwinVQ
  • 8: CELP
  • 9: HVXC
  • 10: Reserved
  • 11: Reserved
  • 12: TTSI
  • 13: Main synthetic
  • 14: Wavetable synthesis
  • 15: General MIDI
  • 16: Algorithmic Synthesis and Audio FX
  • 17: AAC Low complexity with error resilience
  • 18: Reserved
  • 19: AAC Long term prediction with error resilience
  • 20: AAC scalable with error resilience
  • 21: TwinVQ with error resilience
  • 22: BSAC with error resilience
  • 23: AAC LD with error resilience
  • 24: CELP with error resilience
  • 25: HVXC with error resilience
  • 26: HILN with error resilience
  • 27: Parametric with error resilience
  • 28: Reserved
  • 29: Reserved
  • 30: Reserved
  • 31: Reserved

There are 13 supported frequencies (frequency indices 13..14 are invalid):

  • 0: 96000 Hz
  • 1: 88200 Hz
  • 2: 64000 Hz
  • 3: 48000 Hz
  • 4: 44100 Hz
  • 5: 32000 Hz
  • 6: 24000 Hz
  • 7: 22050 Hz
  • 8: 16000 Hz
  • 9: 12000 Hz
  • 10: 11025 Hz
  • 11: 8000 Hz
  • 12: 7350 Hz
  • 15: frequency is written explicitly (in the 24-bit field that follows the index)
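
The table lends itself to a simple lookup. The following is a sketch only; `SAMPLE_RATES` and `sample_rate` are names chosen for this example, and the explicit frequency is assumed to have been read from the 24-bit field of the setup data.

```python
# Sampling frequencies for indices 0..12, in table order.
SAMPLE_RATES = (96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350)


def sample_rate(frequency_index: int, explicit_frequency=None) -> int:
    """Map a 4-bit frequency index to a sampling frequency in Hz."""
    if 0 <= frequency_index < len(SAMPLE_RATES):
        return SAMPLE_RATES[frequency_index]
    if frequency_index == 15:
        if explicit_frequency is None:
            raise ValueError("index 15 requires the explicit 24-bit frequency")
        return explicit_frequency
    # Indices 13 and 14 have no frequency assigned.
    raise ValueError(f"invalid frequency index {frequency_index}")
```

Raising on indices 13 and 14 rather than guessing a rate keeps a corrupted header from silently producing audio at the wrong speed.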

These are the channel configurations:

  • 0: custom configuration, described by a program configuration element (PCE) rather than by this table (TODO)
  • 1: 1 channel: front-center
  • 2: 2 channels: front-left, front-right
  • 3: 3 channels: front-center, front-left, front-right
  • 4: 4 channels: front-center, front-left, front-right, back-center
  • 5: 5 channels: front-center, front-left, front-right, back-left, back-right
  • 6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
  • 7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
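
As a sketch, the channel configurations can be held in a table keyed by the 4-bit value. The names `CHANNEL_LAYOUTS` and `channel_count` are made up for this example; configuration 0 is deliberately left out because its layout does not come from this table.

```python
# Speaker positions per channel configuration, in the order listed above.
CHANNEL_LAYOUTS = {
    1: ("front-center",),
    2: ("front-left", "front-right"),
    3: ("front-center", "front-left", "front-right"),
    4: ("front-center", "front-left", "front-right", "back-center"),
    5: ("front-center", "front-left", "front-right",
        "back-left", "back-right"),
    6: ("front-center", "front-left", "front-right",
        "back-left", "back-right", "LFE-channel"),
    7: ("front-center", "front-left", "front-right", "side-left",
        "side-right", "back-left", "back-right", "LFE-channel"),
}


def channel_count(channel_configuration: int) -> int:
    """Return the number of channels for configurations 1..7."""
    try:
        return len(CHANNEL_LAYOUTS[channel_configuration])
    except KeyError:
        raise ValueError(
            f"no fixed layout for configuration {channel_configuration}"
        ) from None
```

Note that configuration 7 carries 8 channels, so the configuration value cannot be used directly as a channel count.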

frame length flag:

  • 0: Each packet contains 1024 samples
  • 1: Each packet contains 960 samples
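
Together with the sampling frequency, the frame length flag fixes how much playback time one packet covers. A small sketch of that arithmetic follows; the function names are hypothetical, and the calculation ignores any sample-rate doubling performed by SBR in high efficiency streams.

```python
from fractions import Fraction


def samples_per_packet(frame_length_flag: int) -> int:
    """Return the number of samples per channel in one packet."""
    return 960 if frame_length_flag else 1024


def packet_duration(frame_length_flag: int, sample_rate_hz: int) -> Fraction:
    """Return the exact duration of one packet in seconds."""
    return Fraction(samples_per_packet(frame_length_flag), sample_rate_hz)


# 960 samples at 48000 Hz is exactly 1/50 of a second (20 ms).
short_packet = packet_duration(1, 48000)
# 1024 samples at 44100 Hz is 1024/44100 s, a little over 23 ms.
long_packet = packet_duration(0, 44100)
```

Using `Fraction` keeps timestamps exact when durations are accumulated over many packets; a floating-point sum would drift slightly.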

Frames And Syntax Elements

In an MPEG-4 file, the AAC data is broken up into a series of variable length frames.

An AAC frame is composed of blocks called syntax elements. Read the first 3 bits from the frame's bitstream to find the type of the first element, then decode that element. Read the next 3 bits to find the type of the following element, and repeat the process until the END element is reached or the frame is depleted.

There are 8 different syntax elements:

  • 0 SCE single channel element (codes a single audio channel)
  • 1 CPE channel pair element (codes stereo signal)
  • 2 CCE coupling channel element (channel coupling), not implemented in libfaad2
  • 3 LFE low-frequency effects channel element, referenced as "special effects" in the RTP doc
  • 4 DSE data stream element (user data)
  • 5 PCE program configuration element (describe bitstream)
  • 6 FIL fill element (pad space/extension data)
  • 7 END marks the end of the frame

This is an example layout for a 5.1 audio stream:

SCE CPE CPE LFE END

indicates

center - left/right - surround left/right - lfe - end 

An ID within the respective CPE blocks indicates its channel assignments (front vs. surround).
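
The element IDs and the 5.1 example can be sketched as follows. This does not decode any element payload; it only shows how the 3-bit ID is taken from the start of a frame and how the element sequence adds up to a channel count. `ElementId`, `first_element_id` and `count_channels` are names invented for this example.

```python
from enum import IntEnum


class ElementId(IntEnum):
    SCE = 0  # single channel element
    CPE = 1  # channel pair element
    CCE = 2  # channel coupling
    LFE = 3  # low-frequency effects
    DSE = 4  # data stream element
    PCE = 5  # program configuration element
    FIL = 6  # fill element
    END = 7  # end of frame


# Output channels contributed by each element type; others contribute none.
CHANNELS_PER_ELEMENT = {ElementId.SCE: 1, ElementId.CPE: 2, ElementId.LFE: 1}


def first_element_id(frame: bytes) -> ElementId:
    """Return the ID stored in the top 3 bits of the frame's first byte."""
    return ElementId(frame[0] >> 5)


def count_channels(elements) -> int:
    """Sum the channels of an element sequence, stopping at END."""
    total = 0
    for element in elements:
        if element == ElementId.END:
            break
        total += CHANNELS_PER_ELEMENT.get(element, 0)
    return total


layout = [ElementId[name] for name in "SCE CPE CPE LFE END".split()]
channels = count_channels(layout)
```

For the layout above, `channels` comes out as 6 (1 + 2 + 2 + 1), matching the six channels of a 5.1 stream.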

Decoding Process

First, let's list a few basic terms that FAAD2 uses throughout its decoding process:

  • ics = individual channel stream, the basic audio unit that FAAD2 is concerned with
  • ms = any parameter with this in its name deals with mid/side coding
  • sfb = scalefactor band
  • swb = scalefactor window band
  • is = intensity stereo

As mentioned above, the ics is an important data structure in AAC decoding. These are its fields, according to FAAD2:

 max_sfb
 num_swb
 num_window_groups
 num_windows
 window_sequence
 window_group_length[8]
 window_shape
 scale_factor_grouping
 section_sfb_offset[8][8*15]
 swb_offset
 section_codebook[8][15*8]
 section_start[8][15*8]
 section_end[8][15*8]
 sfb_codebook[8][15*8]
 number_sections[8]  // number of sections in a group
 global_gain
 scale_factors[8][51]  // FAAD2 comment: [0..255]?
 ms_mask_present
 ms_used[MAX_WINDOW_GROUPS][MAX_SFB]  // dimensions = [8][51]
 noise_used
 pulse_data_present
 tns_data_present
 gain_control_data_present
 predictor_data_present
 pulse_info pulse
 tns_info tns
 data structures for main profile, document later
 data structures for LTP, document later
 data structures for SSR, document later
 data structures for error resilience, document later

These pages detail the process for decoding the various syntax elements:

