|
|
(8 intermediate revisions by 7 users not shown) |
Line 20: |
Line 20: |
| * sample-rate scalability (SRS): submitted by Sony and reportedly similar to ATRAC/3 | | * sample-rate scalability (SRS): submitted by Sony and reportedly similar to ATRAC/3 |
| * long term prediction (LTP): main profile with forward prediction | | * long term prediction (LTP): main profile with forward prediction |
| * high efficiency (HE, HE-AAC, aacPlus): uses spectral band replication (SBR) and may use parametric stereo | | * high efficiency (HE, HE-AAC, aacPlus): uses [http://www.codingtechnologies.com/products/sbr.htm Spectral Band Replication (SBR)] and may use [http://www.codingtechnologies.com/products/paraSter.htm Parametric Stereo (PS)] |
| | ** [http://www.codingtechnologies.com/products/aacPlus.htm aacPlus] (a.k.a. AAC+) decoder Note: aacPlus v1 is AAC + [http://www.codingtechnologies.com/products/sbr.htm SBR], aacPlus v2 is AAC + [http://www.codingtechnologies.com/products/sbr.htm SBR] + [http://www.codingtechnologies.com/products/paraSter.htm PS]. |
| * FAAD refers to another profile named LD, possibly the same as SRS | | * FAAD refers to another profile named LD, possibly the same as SRS |
| * provisions all over the libfaad source for error recovery (ER) | | * provisions all over the libfaad source for error recovery (ER) |
| == Bitpacking ==
| |
| Fields are packed most significant byte first, most significant bit first. Example:
| |
| 5 bits: 2 (00010)
| |
| 4 bits: 4 (0100)
| |
| 4 bits: 2 (0010)
| |
| 3 bits: 0 (000)
| |
|
| |
| Byte 1: 00010010
| |
| Byte 2: 00010000
| |
|
| |
| 00010010 00010000
| |
| [ 2 ][ 4 ][2 ][0]
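The packing example above can be reproduced with a small MSB-first bit reader. This is only a sketch: the `BitReader` class and its `read` method are hypothetical names chosen for illustration and are not taken from libfaad.

```python
class BitReader:
    """Reads unsigned integers from a byte string, most significant bit first."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, counted in bits from the start

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            if self.pos >= len(self.data) * 8:
                raise EOFError("read past end of data")
            byte = self.data[self.pos >> 3]
            # Bit 0 of the stream is the top bit (0x80) of the first byte.
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


reader = BitReader(bytes([0b00010010, 0b00010000]))
fields = [reader.read(n) for n in (5, 4, 4, 3)]
print(fields)  # [2, 4, 2, 0]
```

Note that the second field straddles the byte boundary, which is why the reader tracks a bit position rather than a byte position.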
| |
|
| |
| == Packaging/Encapsulation And Setup Data==
| |
| There are a variety of methods for packaging AAC data for transport. Two methods used for packaging raw streams are ADTS and ADIF headers. The libfaad knowledge base also makes reference to LATM and LOAS packaging.
| |
|
| |
| Much AAC data is encapsulated in MPEG-4 files, a format that is an extension of the [[Apple QuickTime]] container format. The MPEG-4 file will have an audio 'trak' atom, which will contain a 'stsd' description atom, which will contain an 'mp4a' atom, which will contain an 'esds' atom. Part of the esds atom contains the setup data for the associated AAC stream. '''(TODO: need to document the precise format and method for obtaining the setup data.)''' This setup data is generally 2 bytes and has the following layout:
| |
| 5 bits: object type
| |
| 4 bits: frequency index
| |
| if (frequency index == 15)
| |
| 24 bits: frequency
| |
| 4 bits: channel configuration
| |
| 1 bit: frame length flag
| |
| 1 bit: dependsOnCoreCoder
| |
| 1 bit: extensionFlag
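As a rough illustration of the layout above, the following sketch unpacks the setup data into its raw fields. The function name `parse_setup_data` and the dictionary keys are assumptions made for this example; how the bytes are located inside the esds atom is not covered here.

```python
def parse_setup_data(data: bytes) -> dict:
    """Unpack the raw setup-data fields, MSB first (illustrative sketch)."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    pos = 0

    def take(n: int) -> int:
        # Pull the next n bits off the front of the bit string.
        nonlocal pos
        if pos + n > total:
            raise ValueError("setup data too short")
        shift = total - pos - n
        pos += n
        return (bits >> shift) & ((1 << n) - 1)

    cfg = {"object_type": take(5), "frequency_index": take(4)}
    if cfg["frequency_index"] == 15:
        # The sampling frequency is written explicitly in this case.
        cfg["frequency"] = take(24)
    cfg["channel_configuration"] = take(4)
    cfg["frame_length_flag"] = take(1)
    cfg["depends_on_core_coder"] = take(1)
    cfg["extension_flag"] = take(1)
    return cfg


config = parse_setup_data(bytes([0x12, 0x10]))
print(config["object_type"], config["frequency_index"], config["channel_configuration"])
```

The two bytes used here are the same ones as in the bitpacking example, so they decode to object type 2, frequency index 4 and channel configuration 2.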
| |
| These are the possible object types:
| |
| * 0: NULL
| |
| * 1: AAC Main
| |
| * 2: AAC Low complexity
| |
| * 3: AAC SSR
| |
| * 4: AAC Long term prediction
| |
| * 5: AAC High efficiency
| |
| * 6: Scalable
| |
| * 7: [[TwinVQ]]
| |
| * 8: CELP
| |
| * 9: HVXC
| |
| * 10: Reserved
| |
| * 11: Reserved
| |
| * 12: TTSI
| |
| * 13: Main synthetic
| |
| * 14: Wavetable synthesis
| |
| * 15: General MIDI
| |
| * 16: Algorithmic Synthesis and Audio FX
| |
| * 17: AAC Low complexity with error recovery
| |
| * 18: Reserved
| |
| * 19: AAC Long term prediction with error recovery
| |
| * 20: AAC scalable with error recovery
| |
| * 21: TwinVQ with error recovery
| |
| * 22: BSAC with error recovery
| |
| * 23: AAC LD with error recovery
| |
| * 24: CELP with error recovery
| |
| * 25: HVXC with error recovery
| |
| * 26: HILN with error recovery
| |
| * 27: Parametric with error recovery
| |
| * 28: Reserved
| |
| * 29: Reserved
| |
| * 30: Reserved
| |
| * 31: Reserved
| |
| There are 13 supported frequencies (frequency indices 13..14 are invalid):
| |
| * 0: 96000 Hz
| |
| * 1: 88200 Hz
| |
| * 2: 64000 Hz
| |
| * 3: 48000 Hz
| |
| * 4: 44100 Hz
| |
| * 5: 32000 Hz
| |
| * 6: 24000 Hz
| |
| * 7: 22050 Hz
| |
| * 8: 16000 Hz
| |
| * 9: 12000 Hz
| |
| * 10: 11025 Hz
| |
| * 11: 8000 Hz
| |
| * 12: 7350 Hz
| |
| * 15: frequency is written explicitly
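The frequency table lends itself to a simple lookup. A minimal sketch follows; `SAMPLE_RATES` and `sample_rate` are hypothetical names, and the explicit 24-bit frequency is assumed to have been read separately when the index is 15.

```python
from typing import Optional

# Sampling frequencies for indices 0..12, in Hz, as listed above.
SAMPLE_RATES = (
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350,
)


def sample_rate(frequency_index: int, explicit_frequency: Optional[int] = None) -> int:
    """Map a 4-bit frequency index to a sampling frequency in Hz."""
    if 0 <= frequency_index < len(SAMPLE_RATES):
        return SAMPLE_RATES[frequency_index]
    if frequency_index == 15:
        if explicit_frequency is None:
            raise ValueError("index 15 requires the explicit 24-bit frequency")
        return explicit_frequency
    # Indices 13 and 14 are invalid.
    raise ValueError(f"invalid frequency index: {frequency_index}")


print(sample_rate(4))  # 44100
```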
| |
| These are the channel configurations:
| |
| * 0: custom configuration '''(TODO)'''
| |
| * 1: 1 channel: front-center
| |
| * 2: 2 channels: front-left, front-right
| |
| * 3: 3 channels: front-center, front-left, front-right
| |
| * 4: 4 channels: front-center, front-left, front-right, back-center
| |
| * 5: 5 channels: front-center, front-left, front-right, back-left, back-right
| |
| * 6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
| |
| * 7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
| |
| frame length flag:
| |
| * 0: Each packet contains 1024 samples
| |
| * 1: Each packet contains 960 samples
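The channel configuration and frame length flag can be turned into a channel count and a packet duration. This is a sketch under the assumption that the tables above are complete; the names `CHANNEL_LAYOUTS`, `channel_count`, `samples_per_packet` and `packet_duration_seconds` are invented for the example, and the custom configuration (value 0) is deliberately left unhandled.

```python
# Speaker positions for channel configurations 1..7, as listed above.
CHANNEL_LAYOUTS = {
    1: ("front-center",),
    2: ("front-left", "front-right"),
    3: ("front-center", "front-left", "front-right"),
    4: ("front-center", "front-left", "front-right", "back-center"),
    5: ("front-center", "front-left", "front-right", "back-left", "back-right"),
    6: ("front-center", "front-left", "front-right", "back-left", "back-right", "LFE"),
    7: ("front-center", "front-left", "front-right", "side-left", "side-right",
        "back-left", "back-right", "LFE"),
}


def channel_count(channel_configuration: int) -> int:
    """Number of channels for a channel configuration value."""
    if channel_configuration == 0:
        raise NotImplementedError("custom configuration is not covered here")
    if channel_configuration not in CHANNEL_LAYOUTS:
        raise ValueError(f"unknown channel configuration: {channel_configuration}")
    return len(CHANNEL_LAYOUTS[channel_configuration])


def samples_per_packet(frame_length_flag: int) -> int:
    """1024 samples when the flag is 0, 960 samples when it is 1."""
    return 960 if frame_length_flag else 1024


def packet_duration_seconds(frame_length_flag: int, sample_rate_hz: int) -> float:
    """Playback time covered by one packet."""
    return samples_per_packet(frame_length_flag) / sample_rate_hz


print(channel_count(7), samples_per_packet(0))  # 8 1024
```

Note that configuration 7 yields 8 channels, so the configuration value cannot be used directly as a channel count.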
| |
|
| |
| == Frames And Syntax Elements ==
| |
| In an MPEG-4 file, the AAC data is broken up into a series of variable-length frames.
| |
|
| |
| An AAC frame is composed of blocks called syntax elements. Read the first 3 bits from the frame's bitstream to find the first element type. Decode the element. Then read the first 3 bits of the next element and repeat the decoding process until the frame is depleted.
| |
|
| |
| There are 8 different syntax elements:
| |
| * 0 SCE single channel element (codes a single audio channel)
| |
| * 1 CPE channel pair element (codes stereo signal)
| |
| * 2 CCE coupling channel element (channel coupling; not implemented in libfaad2)
| |
| * 3 LFE low-frequency effects channel element (referenced as "special effects" in RTP doc)
| |
| * 4 DSE data stream element (user data)
| |
| * 5 PCE program configuration element (describe bitstream)
| |
| * 6 FIL fill element (pad space/extension data)
| |
| * 7 END marks the end of the frame
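Since bits are packed most significant bit first, the first element type of a frame is simply the top 3 bits of its first byte. The sketch below illustrates that; the `SyntaxElement` enum and `first_element_type` are hypothetical names. Finding the following elements requires fully decoding each element body, which is not attempted here.

```python
from enum import IntEnum


class SyntaxElement(IntEnum):
    """3-bit syntax element identifiers, as listed above."""
    SCE = 0
    CPE = 1
    CCE = 2
    LFE = 3
    DSE = 4
    PCE = 5
    FIL = 6
    END = 7


def first_element_type(frame: bytes) -> SyntaxElement:
    """Return the type of the first syntax element in a frame."""
    if not frame:
        raise ValueError("empty frame")
    # The first 3 bits of the stream are the top 3 bits of byte 0.
    return SyntaxElement(frame[0] >> 5)


print(first_element_type(bytes([0x20])).name)  # CPE
```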
| |
| This is an example layout for a 5.1 audio stream:
| |
| SCE CPE CPE LFE END
| |
| indicates
| |
| center - left/right - surround left/right - lfe - end
| |
| An ID within the respective CPE blocks indicates its channel assignments (front vs. surround).
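To make the 5.1 example concrete, the element sequence can be tallied into a channel count. This is a sketch only: `CHANNELS_PER_ELEMENT` and `count_channels` are invented for illustration, and elements that carry no audio channel of their own are counted as zero.

```python
# Audio channels contributed by each element type; others contribute none.
CHANNELS_PER_ELEMENT = {"SCE": 1, "CPE": 2, "LFE": 1}


def count_channels(elements) -> int:
    """Sum the channels coded by a sequence of syntax element names."""
    total = 0
    for name in elements:
        if name == "END":
            break  # END marks the end of the frame
        total += CHANNELS_PER_ELEMENT.get(name, 0)
    return total


print(count_channels("SCE CPE CPE LFE END".split()))  # 6
```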
| |
|
| |
| == Decoding Process ==
| |
| First, let's list a few basic terms that FAAD2 uses throughout its decoding process:
| |
|
| |
| * ics = individual channel stream, the basic audio unit that FAAD2 is concerned with
| |
| * ms = any parameter with this in its name deals with mid/side coding
| |
| * sfb = scalefactor band
| |
| * swb = scalefactor window band
| |
| * is = intensity stereo
| |
|
| |
| As mentioned above, the ics is an important data structure in AAC decoding. These are its fields, according to FAAD2:
| |
|
| |
| max_sfb
| |
| num_swb
| |
| num_window_groups
| |
| num_windows
| |
| window_sequence
| |
| window_group_length[8]
| |
| window_shape
| |
| scale_factor_grouping
| |
| section_sfb_offset[8][8*15]
| |
| swb_offset
| |
| section_codebook[8][15*8]
| |
| section_start[8][15*8]
| |
| section_end[8][15*8]
| |
| sfb_codebook[8][15*8]
| |
| number_sections[8] ''// number of sections in a group''
| |
| global_gain
| |
| scale_factors[8][51] ''// FAAD2 comment: [0..255]?''
| |
| ms_mask_present
| |
| ms_used[MAX_WINDOW_GROUPS][MAX_SFB] ''// dimensions = [8][51]''
| |
| noise_used
| |
| pulse_data_present
| |
| tns_data_present
| |
| gain_control_data_present
| |
| predictor_data_present
| |
| pulse_info pulse
| |
| tns_info tns
| |
| ''data structures for main profile, document later''
| |
| ''data structures for LTP, document later''
| |
| ''data structures for SSR, document later''
| |
| ''data structures for error resilience, document later''
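For readers who prefer to see the shape of the structure, here is a partial sketch of the ics as a Python dataclass. It mirrors only some of the fields listed above, using the array dimensions given there (8 window groups, 51 scalefactor bands); the class name `IndividualChannelStream` and the helper `_grid` are assumptions for illustration, not FAAD2's actual definition.

```python
from dataclasses import dataclass, field
from typing import List

MAX_WINDOW_GROUPS = 8
MAX_SFB = 51


def _grid() -> List[List[int]]:
    """A zeroed [MAX_WINDOW_GROUPS][MAX_SFB] table with independent rows."""
    return [[0] * MAX_SFB for _ in range(MAX_WINDOW_GROUPS)]


@dataclass
class IndividualChannelStream:
    """Partial, illustrative mirror of the ics fields listed above."""
    max_sfb: int = 0
    num_swb: int = 0
    num_window_groups: int = 0
    num_windows: int = 0
    window_sequence: int = 0
    window_shape: int = 0
    global_gain: int = 0
    ms_mask_present: int = 0
    window_group_length: List[int] = field(
        default_factory=lambda: [0] * MAX_WINDOW_GROUPS
    )
    scale_factors: List[List[int]] = field(default_factory=_grid)
    ms_used: List[List[int]] = field(default_factory=_grid)


ics = IndividualChannelStream()
print(len(ics.scale_factors), len(ics.scale_factors[0]))  # 8 51
```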
| |
|
| |
| These pages detail the process for decoding the various syntax elements:
| |
|
| |
| * [[Decoding AAC SCE and LFE]]
| |
| * [[Decoding AAC CPE]]
| |
| ** [[Reconstructing AAC CPE]]
| |
| * Decoding AAC CCE
| |
| * Decoding AAC DSE
| |
| * Decoding AAC PCE
| |
| * [[Decoding AAC FIL]]
| |
| * [[Decoding AAC END]]
| |
|
| |
|
| |
|
| |
|
| |
| == Overview ==
| |
| AAC is a perceptual audio codec, which means that it throws away certain information during the compression process: information that has been deemed less important.
| |
|
| |
| Surface details of the format can be found at Wikipedia: [http://en.wikipedia.org/wiki/Advanced_Audio_Coding http://en.wikipedia.org/wiki/Advanced_Audio_Coding]
| |
|
| |
| Conformance vectors can be obtained here: [ftp://mpaudconf:adif2mp4@ftp.iis.fhg.de/ ftp://mpaudconf:adif2mp4@ftp.iis.fhg.de/]
| |
|
| |
| AAC is a variable bitrate (VBR) block-based codec where each block decodes to 1024 time-domain samples. Allegedly, each frame stands alone and does not depend on previous frames (whereas many perceptual audio codecs overlap data with the previous frame).
| |
|
| |
| AAC includes a variety of profiles:
| |
| * low complexity (LC): reported to be the simplest (Apple iTunes files)
| |
| * main (MAIN): LC profile with backwards prediction
| |
| * sample-rate scalability (SRS): submitted by Sony and reportedly similar to ATRAC/3
| |
| * long term prediction (LTP): main profile with forward prediction
| |
| * high efficiency (HE, HE-AAC, aacPlus): uses spectral band replication (SBR) and may use parametric stereo
| |
| * FAAD refers to another profile named LD, possibly the same as SRS
| |
| * provisions all over the libfaad source for error recovery (ER)
| |
| == Bitpacking == | | == Bitpacking == |
| Fields are packed most significant byte first, most significant bit first. Example: | | Fields are packed most significant byte first, most significant bit first. Example: |
Line 576: |
Line 41: |
| There are a variety of methods for packaging AAC data for transport. Two methods used for packaging raw streams are ADTS and ADIF headers. The libfaad knowledge base also makes reference to LATM and LOAS packaging. | | There are a variety of methods for packaging AAC data for transport. Two methods used for packaging raw streams are ADTS and ADIF headers. The libfaad knowledge base also makes reference to LATM and LOAS packaging. |
|
| |
|
| Much AAC data is encapsulated in MPEG-4 files, a format that is an extension of the [[Apple QuickTime]] container format. The MPEG-4 file will have an audio 'trak' atom, which will contain a 'stsd' description atom, which will contain an 'mp4a' atom, which will contain an 'esds' atom. Part of the esds atom contains the setup data for the associated AAC stream. '''(TODO: need to document the precise format and method for obtaining the setup data.)''' This setup data is generally 2 bytes and has the following layout: | | Much AAC data is encapsulated in MPEG-4 files, a format that is an extension of the [[QuickTime container|QuickTime]] container format. The MPEG-4 file will have an audio 'trak' atom, which will contain a 'stsd' description atom, which will contain an 'mp4a' atom, which will contain an 'esds' atom. Part of the esds atom contains the setup data for the associated AAC stream. '''(TODO: need to document the precise format and method for obtaining the setup data.)''' This setup data is generally 2 bytes and has the following layout: |
| 5 bits: object type | | 5 bits: object type |
| 4 bits: frequency index | | 4 bits: frequency index |
Line 585: |
Line 50: |
| 1 bit: dependsOnCoreCoder | | 1 bit: dependsOnCoreCoder |
| 1 bit: extensionFlag | | 1 bit: extensionFlag |
| | Object type and sampling frequency (index) are described in detail in the [[MPEG-4 Audio]] article. |
|
| |
| == Overview ==
| |
| AAC is a perceptual audio codec, which means that during compression it discards information deemed less important to the listener.
| |
|
| |
| Surface details of the format can be found at Wikipedia: [http://en.wikipedia.org/wiki/Advanced_Audio_Coding http://en.wikipedia.org/wiki/Advanced_Audio_Coding]
| |
|
| |
| Conformance vectors can be obtained here: [ftp://mpaudconf:adif2mp4@ftp.iis.fhg.de/ ftp://mpaudconf:adif2mp4@ftp.iis.fhg.de/]
| |
|
| |
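| AAC is a variable bitrate (VBR) block-based codec where each block normally decodes to 1024 time-domain samples (960 when the frame length flag is set). Each frame can be parsed without reference to previous frames, but the decoded output is not fully independent: the MDCT-based filterbank overlap-adds each frame's windowed output with that of the previous frame, as many perceptual audio codecs do.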
| |
|
| |
| AAC includes a variety of profiles:
| |
| * low complexity (LC): reported to be the simplest (Apple iTunes files)
| |
| * main (MAIN): LC profile with backwards prediction
| |
| * scalable sampling rate (SSR): submitted by Sony and reportedly similar to ATRAC/3
| |
| * long term prediction (LTP): main profile with forward prediction
| |
| * high efficiency (HE, HE-AAC, aacPlus): uses spectral band replication (SBR) and may use parametric stereo
| |
| * low delay (LD): another profile referenced by FAAD; a separate low-latency profile, not the same as SSR
| |
| * error recovery (ER), i.e. error resilience: libfaad has provisions for it throughout its source
| |
| == Bitpacking ==
| |
| Fields are packed most significant byte first and, within each byte, most significant bit first. Example:
| |
| 5 bits: 2 (00010)
| |
| 4 bits: 4 (0100)
| |
| 4 bits: 2 (0010)
| |
| 3 bits: 0 (000)
| |
|
| |
| Byte 1: 00010010
| |
| Byte 2: 00010000
| |
|
| |
| 00010010 00010000
| |
| [ 2 ][ 4 ][2 ][0]
| |
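The example above can be checked with a small bit reader. This is a minimal sketch in Python, not code taken from FAAD2; the `BitReader` class and its `read` method are hypothetical names introduced only for illustration.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object (illustrative helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # absolute bit position from the start of data

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            if self.pos >= len(self.data) * 8:
                raise EOFError("read past end of bitstream")
            byte = self.data[self.pos // 8]
            # Bit 7 of each byte is consumed first, bit 0 last.
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# The two bytes from the example: 00010010 00010000
reader = BitReader(bytes([0b00010010, 0b00010000]))
fields = [reader.read(n) for n in (5, 4, 4, 3)]
print(fields)  # [2, 4, 2, 0]
```

Note that the second field straddles the byte boundary: its top three bits come from byte 1 and its last bit from byte 2.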
|
| |
| == Packaging/Encapsulation And Setup Data ==
| |
| There are several methods for packaging AAC data for transport. Two methods used for packaging raw streams are ADTS and ADIF headers. The libfaad knowledge base also makes reference to LATM and LOAS packaging.
| |
|
| |
| Much AAC data is encapsulated in MPEG-4 files, a container format that extends the [[Apple QuickTime]] container format. The MPEG-4 file has an audio 'trak' atom containing a 'stsd' description atom, which contains an 'mp4a' atom, which in turn contains an 'esds' atom. Part of the esds atom holds the setup data for the associated AAC stream. '''(TODO: need to document the precise format and method for obtaining the setup data.)''' The setup data is generally 2 bytes and has the following layout:
| |
| 5 bits: object type
| |
| 4 bits: frequency index
| |
| if (frequency index == 15)
| |
| 24 bits: frequency
| |
| 4 bits: channel configuration
| |
| 1 bit: frame length flag
| |
| 1 bit: dependsOnCoreCoder
| |
| 1 bit: extensionFlag
| |
| These are the possible object types:
| |
| * 0: NULL
| |
| * 1: AAC Main
| |
| * 2: AAC Low complexity
| |
| * 3: AAC SSR
| |
| * 4: AAC Long term prediction
| |
| * 5: AAC High efficiency
| |
| * 6: Scalable
| |
| * 7: [[TwinVQ]]
| |
| * 8: CELP
| |
| * 9: HVXC
| |
| * 10: Reserved
| |
| * 11: Reserved
| |
| * 12: TTSI
| |
| * 13: Main synthetic
| |
| * 14: Wavetable synthesis
| |
| * 15: General MIDI
| |
| * 16: Algorithmic Synthesis and Audio FX
| |
| * 17: AAC Low complexity with error recovery
| |
| * 18: Reserved
| |
| * 19: AAC Long term prediction with error recovery
| |
| * 20: AAC scalable with error recovery
| |
| * 21: TwinVQ with error recovery
| |
| * 22: BSAC with error recovery
| |
| * 23: AAC LD with error recovery
| |
| * 24: CELP with error recovery
| |
| * 25: HVXC with error recovery
| |
| * 26: HILN with error recovery
| |
| * 27: Parametric with error recovery
| |
| * 28: Reserved
| |
| * 29: Reserved
| |
| * 30: Reserved
| |
| * 31: Reserved
| |
| There are 13 supported frequencies (frequency indices 13..14 are invalid):
| |
| * 0: 96000 Hz
| |
| * 1: 88200 Hz
| |
| * 2: 64000 Hz
| |
| * 3: 48000 Hz
| |
| * 4: 44100 Hz
| |
| * 5: 32000 Hz
| |
| * 6: 24000 Hz
| |
| * 7: 22050 Hz
| |
| * 8: 16000 Hz
| |
| * 9: 12000 Hz
| |
| * 10: 11025 Hz
| |
| * 11: 8000 Hz
| |
| * 12: 7350 Hz
| |
| * 15: frequency is written explicitly
| |
| These are the channel configurations:
| |
| * 0: custom configuration '''(TODO)'''
| |
| * 1: 1 channel: front-center
| |
| * 2: 2 channels: front-left, front-right
| |
| * 3: 3 channels: front-center, front-left, front-right
| |
| * 4: 4 channels: front-center, front-left, front-right, back-center
| |
| * 5: 5 channels: front-center, front-left, front-right, back-left, back-right
| |
| * 6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
| |
| * 7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
| |
| frame length flag:
| |
| * 0: Each packet contains 1024 samples
| |
| * 1: Each packet contains 960 samples
| |
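The setup data layout and the tables above can be combined into a short parser. The following Python sketch is only an illustration under the assumptions stated on this page; `parse_setup_data`, `SAMPLE_RATES`, and the returned dictionary keys are hypothetical names, and the sketch does not handle any extension data that may follow the fields listed above.

```python
# Sample rates indexed by the 4-bit frequency index (indices 13 and 14 are invalid).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_setup_data(data: bytes) -> dict:
    """Parse the AAC setup data fields described above (MSB-first bit order)."""
    pos = 0  # current bit position

    def read(nbits: int) -> int:
        nonlocal pos
        value = 0
        for _ in range(nbits):
            if pos >= len(data) * 8:
                raise ValueError("setup data is too short")
            value = (value << 1) | ((data[pos // 8] >> (7 - pos % 8)) & 1)
            pos += 1
        return value

    object_type = read(5)
    freq_index = read(4)
    if freq_index == 15:
        sample_rate = read(24)  # frequency written explicitly
    elif freq_index < len(SAMPLE_RATES):
        sample_rate = SAMPLE_RATES[freq_index]
    else:
        raise ValueError(f"invalid frequency index {freq_index}")
    channel_config = read(4)
    frame_length_flag = read(1)
    depends_on_core_coder = read(1)
    extension_flag = read(1)
    return {
        "object_type": object_type,
        "sample_rate": sample_rate,
        "channel_config": channel_config,
        "samples_per_frame": 960 if frame_length_flag else 1024,
        "depends_on_core_coder": depends_on_core_coder,
        "extension_flag": extension_flag,
    }


config = parse_setup_data(bytes([0x12, 0x10]))
print(config["object_type"], config["sample_rate"], config["channel_config"])
# 2 44100 2
```

The two bytes 0x12 0x10 are the same bytes used in the bitpacking example; read with this layout they describe AAC Low complexity, 44100 Hz, 2 channels, and 1024 samples per packet.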
|
| |
| == Frames And Syntax Elements ==
| |
| In an MPEG-4 file, the AAC data is broken up into a series of variable length frames.
| |
|
| |
| An AAC frame is composed of blocks called syntax elements. Read the first 3 bits from the frame's bitstream to find the first element type, then decode that element. Read the next 3 bits to find the type of the next element and repeat the decoding process until the frame is depleted.
| |
|
| |
| There are 8 different syntax elements:
| |
| * 0 SCE single channel element (codes a single audio channel)
| |
| * 1 CPE channel pair element (codes stereo signal)
| |
| * 2 CCE coupling channel element (channel coupling), not implemented in libfaad2
| |
| * 3 LFE low-frequency effects channel element (referenced as "special effects" in the RTP doc)
| |
| * 4 DSE data stream element (user data)
| |
| * 5 PCE program configuration element (describe bitstream)
| |
| * 6 FIL fill element (pad space/extension data)
| |
| * 7 END marks the end of the frame
| |
| This is an example layout for a 5.1 audio stream:
| |
| SCE CPE CPE LFE END
| |
| indicates
| |
| center - left/right - surround left/right - lfe - end
| |
| An ID within each CPE block indicates its channel assignment (front vs. surround).
| |
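As a rough illustration of the 5.1 example above, the following Python sketch maps the 3-bit element IDs to their names and counts the audio channels in a sequence of elements. The names `SYNTAX_ELEMENTS`, `CHANNELS_PER_ELEMENT`, and `count_channels` are hypothetical, and the sketch assumes the element IDs have already been extracted; in a real frame each element must be fully decoded before the next 3-bit ID can be located. Treating CCE, DSE, PCE, and FIL as contributing no output channels is also an assumption of this sketch.

```python
# 3-bit syntax element IDs, as listed above.
SYNTAX_ELEMENTS = {
    0: "SCE", 1: "CPE", 2: "CCE", 3: "LFE",
    4: "DSE", 5: "PCE", 6: "FIL", 7: "END",
}

# Output channels carried by each element type (assumption: others carry none).
CHANNELS_PER_ELEMENT = {"SCE": 1, "CPE": 2, "LFE": 1}


def count_channels(element_ids):
    """Count audio channels in a frame given its element IDs in order."""
    total = 0
    for element_id in element_ids:
        name = SYNTAX_ELEMENTS[element_id]
        if name == "END":
            break  # END marks the end of the frame
        total += CHANNELS_PER_ELEMENT.get(name, 0)
    return total


layout = [0, 1, 1, 3, 7]  # SCE CPE CPE LFE END
print([SYNTAX_ELEMENTS[i] for i in layout])
print(count_channels(layout))  # 6
```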
|
| |
| == Decoding Process ==
| |
| First, let's list a few basic terms that FAAD2 uses throughout its decoding process:
| |
|
| |
| * ics = individual channel stream, the basic audio unit that FAAD2 is concerned with
| |
| * ms = any parameter with this in its name deals with mid/side coding
| |
| * sfb = scalefactor band
| |
| * swb = scalefactor window band
| |
| * is = intensity stereo
| |
|
| |
| As mentioned above, the ics is an important data structure in AAC decoding. These are its fields, according to FAAD2:
| |
|
| |
| max_sfb
| |
| num_swb
| |
| num_window_groups
| |
| num_windows
| |
| window_sequence
| |
| window_group_length[8]
| |
| window_shape
| |
| scale_factor_grouping
| |
| section_sfb_offset[8][8*15]
| |
| swb_offset
| |
| section_codebook[8][15*8]
| |
| section_start[8][15*8]
| |
| section_end[8][15*8]
| |
| sfb_codebook[8][15*8]
| |
| number_sections[8] ''// number of sections in a group''
| |
| global_gain
| |
| scale_factors[8][51] ''// FAAD2 comment: [0..255]?''
| |
| ms_mask_present
| |
| ms_used[MAX_WINDOW_GROUPS][MAX_SFB] ''// dimensions = [8][51]''
| |
| noise_used
| |
| pulse_data_present
| |
| tns_data_present
| |
| gain_control_data_present
| |
| predictor_data_present
| |
| pulse_info pulse
| |
| tns_info tns
| |
| ''data structures for main profile, document later''
| |
| ''data structures for LTP, document later''
| |
| ''data structures for SSR, document later''
| |
| ''data structures for error resilience, document later''
| |
|
| |
| These pages detail the process for decoding the various syntax elements:
| |
|
| |
| * [[Decoding AAC SCE and LFE]]
| |
| * [[Decoding AAC CPE]]
| |
| ** [[Reconstructing AAC CPE]]
| |
| * Decoding AAC CCE
| |
| * Decoding AAC DSE
| |
| * Decoding AAC PCE
| |
| * [[Decoding AAC FIL]]
| |
| * [[Decoding AAC END]]
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
| frame length flag: | | frame length flag: |
| * 0: Each packet contains 1024 samples | | * 0: Each packet contains 1024 samples |