MPEG-4 Audio: Difference between revisions

From MultimediaWiki
* Samples: http://samples.mplayerhq.hu/MPEG-4/
* Samples: http://samples.mplayerhq.hu/A-codecs/AAC/
* Samples: [http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_IEC_14496-26_2010_Bitstreams/ sample repo at standards.iso.org]
* Sample Docs: [http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_IEC_14496-26_2010_Bitstreams/DVD1/mpeg4audio-conformance/doc/fileNameConventions.html sample docs]


Specification links:
*MPEG-4 Audio: [http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=53943 ISO/IEC 14496-3:2009]
*Conformance: [http://standards.iso.org/ittf/PubliclyAvailableStandards/c053750_ISO_IEC_14496-26_2010.zip ISO/IEC 14496-26:2010]



Revision as of 11:32, 7 December 2011

MPEG-4 Audio

MPEG-4 includes a system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type (AOT) to represent it. The global header shared by all AOTs is called the Audio Specific Config.

Subparts

  • Subpart 0: Overview
  • Subpart 1: Main (Systems Interaction)
  • Subpart 2: Speech coding - HVXC
  • Subpart 3: Speech coding - CELP
  • Subpart 4: General Audio coding (GA) - AAC, TwinVQ, BSAC
  • Subpart 5: Structured Audio (SA)
  • Subpart 6: Text To Speech Interface (TTSI)
  • Subpart 7: Parametric Audio Coding - HILN
  • Subpart 8: Parametric coding for high quality audio - SSC (and Parametric Stereo)
  • Subpart 9: MPEG-1/2 Audio in MPEG-4
  • Subpart 10: Lossless coding of oversampled audio - DST
  • Subpart 11: Audio lossless coding - ALS
  • Subpart 12: Scalable lossless coding - SLS

Audio Specific Config

The Audio Specific Config is the global header for MPEG-4 Audio:

5 bits: object type
if (object type == 31)
    6 bits + 32: object type
4 bits: frequency index
if (frequency index == 15)
    24 bits: frequency
4 bits: channel configuration
var bits: AOT Specific Config
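
The layout above can be read with a small MSB-first bit reader. The following is a minimal sketch, not a reference implementation: the names `BitReader` and `parse_audio_specific_config` are made up for illustration, the fields are assumed to be packed most significant bit first, and the AOT Specific Config that follows the channel configuration is left unparsed.

```python
class BitReader:
    """Reads MSB-first bit fields from a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, n: int) -> int:
        if self.pos + n > len(self.data) * 8:
            raise ValueError("not enough bits in input")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_audio_specific_config(data: bytes) -> dict:
    """Parse the fixed leading fields of an Audio Specific Config."""
    r = BitReader(data)

    object_type = r.read(5)
    if object_type == 31:            # escape value: 6 more bits, offset by 32
        object_type = 32 + r.read(6)

    frequency_index = r.read(4)
    frequency = None
    if frequency_index == 15:        # frequency is written explicitly
        frequency = r.read(24)

    channel_configuration = r.read(4)

    return {
        "object_type": object_type,
        "frequency_index": frequency_index,
        "frequency": frequency,      # None unless frequency_index == 15
        "channel_configuration": channel_configuration,
        "bits_consumed": r.pos,      # the AOT Specific Config starts here
    }
```

For example, the two bytes 0x12 0x10 decode to object type 2, frequency index 4 and channel configuration 2, with 13 bits consumed.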

Audio Object Types

MPEG-4 Audio Object Types:

  • 0: Null
  • 1: AAC Main
  • 2: AAC LC (Low Complexity)
  • 3: AAC SSR (Scalable Sample Rate)
  • 4: AAC LTP (Long Term Prediction)
  • 5: SBR (Spectral Band Replication)
  • 6: AAC Scalable
  • 7: TwinVQ
  • 8: CELP (Code Excited Linear Prediction)
  • 9: HVXC (Harmonic Vector eXcitation Coding)
  • 10: Reserved
  • 11: Reserved
  • 12: TTSI (Text-To-Speech Interface)
  • 13: Main Synthesis
  • 14: Wavetable Synthesis
  • 15: General MIDI
  • 16: Algorithmic Synthesis and Audio Effects
  • 17: ER (Error Resilient) AAC LC
  • 18: Reserved
  • 19: ER AAC LTP
  • 20: ER AAC Scalable
  • 21: ER TwinVQ
  • 22: ER BSAC (Bit-Sliced Arithmetic Coding)
  • 23: ER AAC LD (Low Delay)
  • 24: ER CELP
  • 25: ER HVXC
  • 26: ER HILN (Harmonic and Individual Lines plus Noise)
  • 27: ER Parametric
  • 28: SSC (SinuSoidal Coding)
  • 29: PS (Parametric Stereo)
  • 30: MPEG Surround
  • 31: (Escape value)
  • 32: Layer-1
  • 33: Layer-2
  • 34: Layer-3
  • 35: DST (Direct Stream Transfer)
  • 36: ALS (Audio Lossless)
  • 37: SLS (Scalable LosslesS)
  • 38: SLS non-core
  • 39: ER AAC ELD (Enhanced Low Delay)
  • 40: SMR (Symbolic Music Representation) Simple
  • 41: SMR Main
  • 42: USAC (Unified Speech and Audio Coding) (no SBR)
  • 43: SAOC (Spatial Audio Object Coding)
  • 44: LD MPEG Surround
  • 45: USAC
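
Object types 32 and above do not fit in the 5-bit field, so they are written as the escape value 31 followed by a 6-bit field holding the type minus 32. A minimal sketch of the writing side, assuming that layout; `encode_object_type` is a hypothetical helper that returns the bits as a string of '0' and '1' characters for readability:

```python
def encode_object_type(aot: int) -> str:
    """Return the object type field as a string of bits, MSB first."""
    if 0 <= aot <= 30:
        # Fits directly in the 5-bit field.
        return format(aot, "05b")
    if 32 <= aot <= 95:
        # Escape value 31, then 6 bits holding (aot - 32).
        return "11111" + format(aot - 32, "06b")
    # 31 is the escape value itself; anything above 95 does not fit in 6 bits.
    raise ValueError(f"object type {aot} cannot be encoded")
```

With this sketch AAC LC (2) takes 5 bits, while ALS (36) and USAC (45) take 11 bits each.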

Sampling Frequencies

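The 4-bit frequency index selects one of 13 predefined sampling frequencies; index 15 signals that the frequency is written explicitly: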

  • 0: 96000 Hz
  • 1: 88200 Hz
  • 2: 64000 Hz
  • 3: 48000 Hz
  • 4: 44100 Hz
  • 5: 32000 Hz
  • 6: 24000 Hz
  • 7: 22050 Hz
  • 8: 16000 Hz
  • 9: 12000 Hz
  • 10: 11025 Hz
  • 11: 8000 Hz
  • 12: 7350 Hz
  • 13: Reserved
  • 14: Reserved
  • 15: frequency is written explicitly

Channel Configurations

These are the channel configurations:

  • 0: Defined in AOT Specific Config
  • 1: 1 channel: front-center
  • 2: 2 channels: front-left, front-right
  • 3: 3 channels: front-center, front-left, front-right
  • 4: 4 channels: front-center, front-left, front-right, back-center
  • 5: 5 channels: front-center, front-left, front-right, back-left, back-right
  • 6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
  • 7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
  • 8-15: Reserved
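
The channel configurations listed above can be kept as a small table keyed by the 4-bit value. A minimal sketch, assuming the channel order given in the list; `channel_layout` is a hypothetical helper, and it returns `None` for value 0 because the layout then comes from the AOT Specific Config, which is not parsed here.

```python
CHANNEL_LAYOUTS = {
    1: ("front-center",),
    2: ("front-left", "front-right"),
    3: ("front-center", "front-left", "front-right"),
    4: ("front-center", "front-left", "front-right", "back-center"),
    5: ("front-center", "front-left", "front-right",
        "back-left", "back-right"),
    6: ("front-center", "front-left", "front-right",
        "back-left", "back-right", "LFE-channel"),
    7: ("front-center", "front-left", "front-right",
        "side-left", "side-right", "back-left", "back-right", "LFE-channel"),
}


def channel_layout(config: int):
    """Return the channel names for a channel configuration value."""
    if config == 0:
        return None  # defined in the AOT Specific Config
    if config in CHANNEL_LAYOUTS:
        return CHANNEL_LAYOUTS[config]
    raise ValueError(f"channel configuration {config} is reserved or out of range")
```

Note that value 7 describes 8 channels, so the channel count should be taken from the length of the returned tuple and not from the configuration value itself.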