Understanding AAC

Revision as of 22:02, 24 December 2005

This portion of the MultimediaWiki tracks an effort to get an open, freely-distributable, usable, and clear specification for the Advanced Audio Coding (AAC) format. The goal is to understand enough details about the format to create new decoder implementations that can handle production bitstreams starting with data packaged inside MPEG-4 files.

The homepage for libfaad has a Wiki that provides some decent details regarding the background coding concepts: http://www.audiocoding.com/modules/wiki/?page=AAC

More possible details here: http://www.ietf.org/proceedings/99nov/I-D/draft-ietf-avt-rtp-mpeg2aac-00.txt

Overview

AAC is a perceptual audio codec, which means that it throws away information during the compression process that has been deemed less important to the listener.

Surface details of the format can be found at Wikipedia: http://en.wikipedia.org/wiki/Advanced_Audio_Coding

Conformance vectors can be obtained here: ftp://mpaudconf:adif2mp4@ftp.iis.fhg.de/

AAC is a variable bitrate (VBR) block-based codec where each block decodes to 1024 time-domain samples. Allegedly, each frame can be parsed on its own without reference to previous frames. Note, however, that AAC is built on the MDCT, so reconstructing the time-domain samples still overlaps each block's output with the previous block's, as in many other perceptual audio codecs.
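As a quick worked example of what a 1024-sample block means in time, the sketch below converts the block size to a duration. It assumes the 1024 samples are per channel; the names SAMPLES_PER_BLOCK and block_duration_ms are made up for illustration.

```python
# Block size stated above; assumed here to be per channel.
SAMPLES_PER_BLOCK = 1024


def block_duration_ms(sample_rate_hz: int) -> float:
    """Return the playback duration of one decoded block in milliseconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return 1000.0 * SAMPLES_PER_BLOCK / sample_rate_hz


if __name__ == "__main__":
    for rate in (44100, 48000):
        print(rate, "Hz:", round(block_duration_ms(rate), 2), "ms per block")
```

At 44100 Hz this works out to roughly 23.22 ms per block.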

AAC includes a variety of profiles:

  • low complexity (LC): reported to be the simplest (Apple iTunes files)
  • main (MAIN): LC profile with backwards prediction
  • scalable sampling rate (SSR): submitted by Sony and reportedly similar to ATRAC/3
  • long term prediction (LTP): main profile with forward prediction
  • high efficiency (HE, HE-AAC, aacPlus): uses spectral band replication (SBR) and may use parametric stereo
  • low delay (LD): another profile referenced by FAAD
  • error resilience (ER): the libfaad source has provisions for this throughout

Packaging/Encapsulation

There is a variety of methods for packaging AAC data for transport. Two methods for packaging raw streams are ADTS and ADIF headers. The libfaad knowledge base also makes reference to LATM and LOAS packaging.

Much AAC data is encapsulated in MPEG-4 files, a format that extends the Apple QuickTime container format. The MPEG-4 file will have an audio 'trak' atom, which will contain an 'stsd' description atom, which will contain an 'mp4a' atom, which will contain an 'esds' atom. Part of the 'esds' atom contains the setup data for the associated AAC stream. (TODO: need to document the precise format and method for obtaining the setup data.) This setup data is generally 2 bytes: 13 bits of fields padded to 16 bits. The setup data has the following layout:

5 bits: object type
4 bits: frequency index
4 bits: channel configuration
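The layout above can be unpacked with a few shifts and masks. This is a minimal sketch, assuming the fields are packed most significant bit first in the order listed and that the last 3 bits are the padding; parse_setup_data is a hypothetical helper name, not a libfaad function.

```python
def parse_setup_data(data: bytes) -> dict:
    """Unpack the 5/4/4-bit fields from the first two bytes of the setup data."""
    if len(data) < 2:
        raise ValueError("setup data needs at least 2 bytes")
    # Combine the two bytes into one 16-bit word (assumed MSB-first packing).
    word = (data[0] << 8) | data[1]
    return {
        "object_type": (word >> 11) & 0x1F,           # top 5 bits
        "frequency_index": (word >> 7) & 0x0F,        # next 4 bits
        "channel_configuration": (word >> 3) & 0x0F,  # next 4 bits
        # The low 3 bits are treated as padding, per the description above.
    }


if __name__ == "__main__":
    # 0x12 0x10 = 00010 0100 0010 000 in binary.
    print(parse_setup_data(bytes([0x12, 0x10])))
```

For the bytes 0x12 0x10, this yields object type 2, frequency index 4, and channel configuration 2.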

These are the possible object types:

  • 1: MAIN
  • 2: LC (low complexity)
  • 3: SSR
  • 4: LTP (long term prediction)
  • 5: HE_AAC (high efficiency)
  • 17: ER_LC (low complexity with error resilience)
  • 19: ER_LTP (long term prediction with error resilience)
  • 23: LD (low delay)
  • 27: DRM_ER_LC

There are 12 supported frequencies (frequency indices 12..15 are invalid):

  • 0: 96000 Hz
  • 1: 88200 Hz
  • 2: 64000 Hz
  • 3: 48000 Hz
  • 4: 44100 Hz
  • 5: 32000 Hz
  • 6: 24000 Hz
  • 7: 22050 Hz
  • 8: 16000 Hz
  • 9: 12000 Hz
  • 10: 11025 Hz
  • 11: 8000 Hz
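The frequency table maps naturally onto a lookup with a range check. The sketch below follows the statement above that indices 12..15 are invalid; SAMPLE_RATES_HZ and sample_rate_from_index are illustrative names only.

```python
# Sample rates in Hz, indexed by the 4-bit frequency index listed above.
SAMPLE_RATES_HZ = (
    96000, 88200, 64000, 48000, 44100, 32000,
    24000, 22050, 16000, 12000, 11025, 8000,
)


def sample_rate_from_index(index: int) -> int:
    """Return the sample rate for a frequency index, rejecting 12..15."""
    if not 0 <= index < len(SAMPLE_RATES_HZ):
        raise ValueError(f"invalid frequency index: {index}")
    return SAMPLE_RATES_HZ[index]


if __name__ == "__main__":
    print(sample_rate_from_index(4))
```

A decoder would likely perform this check right after unpacking the setup data, so that a bad index is rejected before any frame is decoded.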

These are the channel configurations: (TODO)

Syntax Elements

An AAC frame is composed of blocks called syntax elements. There are 8 different syntax elements:

  • 0 SCE single channel element (codes a single audio channel)
  • 1 CPE channel pair element (codes stereo signal)
  • 2 CCE coupling channel element (channel coupling), not implemented in libfaad2
  • 3 LFE low-frequency effects element (referenced as "special effects" in the RTP doc)
  • 4 DSE data stream element (user data)
  • 5 PCE program configuration element (describes the bitstream)
  • 6 FIL fill element (pad space/extension data)
  • 7 END marks the end of the frame

This is an example layout for a 5.1 audio stream:

SCE CPE CPE LFE END

indicates

center - left/right - surround left/right - lfe - end 

An ID within each CPE block indicates its channel assignment (front vs. surround).
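To make the element IDs and the 5.1 example concrete, the sketch below models the eight syntax elements as an enumeration and counts the output channels in a frame layout. It assumes that only SCE, CPE, and LFE contribute output channels (1, 2, and 1 respectively) and that all other elements contribute none; SyntaxElement and count_channels are hypothetical names.

```python
from enum import IntEnum


class SyntaxElement(IntEnum):
    """The eight syntax element IDs listed above."""
    SCE = 0
    CPE = 1
    CCE = 2
    LFE = 3
    DSE = 4
    PCE = 5
    FIL = 6
    END = 7


# Assumption: only these element types carry output audio channels.
CHANNELS_PER_ELEMENT = {
    SyntaxElement.SCE: 1,
    SyntaxElement.CPE: 2,
    SyntaxElement.LFE: 1,
}


def count_channels(elements) -> int:
    """Sum the output channels of a frame layout, stopping at END."""
    total = 0
    for element in elements:
        if element == SyntaxElement.END:
            break
        total += CHANNELS_PER_ELEMENT.get(element, 0)
    return total


if __name__ == "__main__":
    layout = [SyntaxElement[name] for name in "SCE CPE CPE LFE END".split()]
    print(count_channels(layout))
```

For the layout SCE CPE CPE LFE END this gives 6 channels, matching the five full-range channels plus LFE of a 5.1 stream.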