QuickTime container

From MultimediaWiki

Known FOURCCs

The following sections list FOURCCs known to appear in Apple QuickTime files. Note that sometimes the FOURCC is only 3 characters and there is a space (ASCII 0x20) to round out the full 4 characters.
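As a small illustration of the padding rule above, a lookup key can be normalized before it is compared against the 4 bytes read from a file. The helper name `normalize_fourcc` is hypothetical and not part of any QuickTime API; this is only a sketch.

```python
def normalize_fourcc(code: str) -> bytes:
    """Pad a short code with trailing spaces (ASCII 0x20) to exactly 4 bytes."""
    raw = code.encode("latin-1")
    if len(raw) > 4:
        raise ValueError(f"FOURCC longer than 4 bytes: {code!r}")
    return raw.ljust(4, b" ")
```

With this, the list entry written as `rle` is matched against the on-disk bytes `b"rle "`.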

Video FOURCCs

  • 8BPS "Apple Planar RGB"
  • SVQ1 "Sorenson Video Compressor"
  • SVQ3 "Sorenson Video 3 Compressor"
  • WRLE "Apple BMP"
  • avc1 "H.264 Encoder"
  • cvid "Apple Cinepak"
  • dv5n "Apple DVCPRO50 - NTSC"
  • dv5p "Apple DVCPRO50 - PAL"
  • dvc "Apple DV/DVCPRO - NTSC"
  • dvcp "Apple DV - PAL"
  • dvpp "Apple DVCPRO - PAL"
  • h261 "Apple H.261"
  • h263 "H.263"
  • h263 "Apple VC H.263"
  • icod "Apple Intermediate Codec"
  • jpeg "Apple Photo - JPEG"
  • mjp2 "JPEG 2000 Encoder"
  • mjpa "Apple Motion JPEG A"
  • mjpb "Apple Motion JPEG B"
  • mp4v "Apple MPEG4 Compressor"
  • png "Apple PNG"
  • pxlt "Apple Pixlet Video"
  • raw "Apple None"
  • rle "Apple Animation"
  • rpza "Apple Video"
  • smc "Apple Graphics"
  • tga "Apple TGA"
  • tiff "Apple TIFF"
  • yuv2 "Apple Component Video - YUV422"
  • .... "DivX 4 Decoder"
  • .SGI "Apple SGI"
  • 2vuy "Apple YUV422 Codec (2vuy)"
  • 2vuy "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
  • 3IV2 "3ivx Decoder"
  • 3IVD "3ivx Decoder"
  • 3iv2 "3ivx Decoder"
  • 3ivd "3ivx Decoder"
  • 8BPS "Apple Planar RGB"
  • AP41 "DivX 3 Decoder"
  • BLZ0 "XVID Decoder"
  • COL0 "DivX 3 Decoder"
  • COL1 "DivX 3 Decoder"
  • DAVC "H264 Decoder"
  • DIV1 "MS-MPEG4 v1 Decoder"
  • DIV2 "MS-MPEG4 v2 Decoder"
  • DIV3 "DivX 3 Decoder"
  • DIV4 "DivX 3 Decoder"
  • DIV5 "DivX 3 Decoder"
  • DIV6 "DivX 3 Decoder"
  • DIVX "DivX 4 Decoder"
  • DX50 "DivX 5 Decoder"
  • FLV1 "Sorenson H.263 Decoder"
  • FMP4 "MPEG-4 Decoder"
  • FSV1 "Flash Screen Video Decoder"
  • H264 "H264 Decoder"
  • M4S2 "DivX 4 Decoder"
  • M4s2 "WMV Image Codec"
  • MP42 "MS-MPEG4 v2 Decoder"
  • MP43 "DivX 3 Decoder"
  • MP4S "DivX 4 Decoder"
  • MPG3 "DivX 3 Decoder"
  • MPG4 "MS-MPEG4 v1 Decoder"
  • Mjpg "VCM Image Codec"
  • Mp42 "WMV Image Codec"
  • Mp43 "WMV Image Codec"
  • Mp4S "WMV Image Codec"
  • PNTG "Apple MacPaint"
  • RMP4 "MPEG-4 Decoder"
  • SEDG "MPEG-4 Decoder"
  • SMP4 "MPEG-4 Decoder"
  • SVQ1 "Sorenson Video Decompressor"
  • SVQ3 "Sorenson Video 3 Decompressor"
  • UMP4 "DivX 4 Decoder"
  • VP62 "TrueMotion VP6 Decoder"
  • VP6F "TrueMotion VP6 Decoder"
  • VSSH "H264 Decoder"
  • WMV1 "WMV Image Codec"
  • WMV2 "WMV Image Codec"
  • WMV3 "WMV Image Codec"
  • WRLE "Apple BMP"
  • WV1F "MPEG-4 Decoder"
  • X264 "H264 Decoder"
  • XVID "XVID Decoder"
  • XVIX "XVID Decoder"
  • XviD "XVID Decoder"
  • ac16 "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
  • ac32 "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
  • acBG "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
  • avc1 "H.264 Decoder"
  • avr "Apple AVR JPEG"
  • b16g "Apple 16-bit Gray"
  • b32a "Apple 32-bit Gray with Alpha"
  • b48r "Apple 48-bit RGB"
  • b64a "Apple 64-bit ARGB"
  • base "Apple 64-bit ARGB"
  • blit "Apple 64-bit ARGB"
  • blnd "Alpha Compositor"
  • blur "Blur"
  • brco "Brightness and Contrast"
  • chan "Channel Compositor"
  • ckey "Chroma Key"
  • clou "Cloud"
  • cmyk "Apple CMYK"
  • col0 "DivX 3 Decoder"
  • col1 "DivX 3 Decoder"
  • cupa "QTVR Cubic Codec"
  • cvid "Apple Cinepak"
  • div1 "MS-MPEG4 v1 Decoder"
  • div2 "MS-MPEG4 v2 Decoder"
  • div3 "DivX 3 Decoder"
  • div4 "DivX 3 Decoder"
  • div5 "DivX 3 Decoder"
  • div6 "DivX 3 Decoder"
  • divx "DivX 4 Decoder"
  • dmb1 "Apple OpenDML JPEG"
  • drmi "AVC0 Media"
  • dslv "Cross Fade"
  • dv5n "Apple DVCPRO50"
  • dv5n "DVCPRO50"
  • dv5p "Apple DVCPRO50"
  • dv5p "DVCPRO50"
  • dvc "Apple DV"
  • dvcp "Apple DV"
  • dvpp "Apple DVCPRO"
  • edge "Edge Detection"
  • embs "Emboss"
  • fire "Fire"
  • flic "Apple FLC"
  • fmns "Film Noise"
  • fpix "FlashPix Image"
  • gain "Alpha Gain"
  • geff "Special Effects and Filters"
  • genk "General Convolution"
  • gif "Apple GIF"
  • glas "Glass"
  • h261 "Apple H.261"
  • h263 "H.263"
  • h263 "Apple VC H.263"
  • h264 "H264 Decoder"
  • hslb "HSL Balance"
  • j420 "Apple YUV420 Codec"
  • jpeg "Apple Photo - JPEG"
  • kpcd "Apple Photo CD"
  • lens "Lens Flare"
  • ltpa "QTVR Cylindrical Codec"
  • m4s2 "DivX 4 Decoder"
  • matt "Gradient Wipe"
  • mjp2 "JPEG 2000 Decoder"
  • mjpa "Apple Motion JPEG A"
  • mjpb "Apple Motion JPEG B"
  • mp42 "MS-MPEG4 v2 Decoder"
  • mp43 "DivX 3 Decoder"
  • mp4s "DivX 4 Decoder"
  • mp4v "Apple MPEG4 Decompressor"
  • mpg3 "DivX 3 Decoder"
  • mpg4 "MS-MPEG4 v1 Decoder"
  • mplo "Implode"
  • msvc "Apple - Microsoft Video 1"
  • myuv "Apple YUV420 Codec"
  • path "Apple Curve"
  • pdf "PDF Image"
  • png "Apple PNG"
  • push "Push"
  • pxlt "Apple Pixlet Video"
  • qdrw "Apple QuickDraw"
  • r408 "Apple r408"
  • raw "Apple None"
  • raw "DV"
  • raw "Apple RGB to YUV"
  • rgbb "RGB Balance"
  • ripl "Ripple"
  • rle "Apple Animation"
  • rpza "Apple Video"
  • scal "Apple Scaling Codec"
  • shrp "Sharpen"
  • slid "Slide"
  • smc "Apple Graphics"
  • smp2 "Iris"
  • smp3 "Radial"
  • smp4 "Matrix Wipe"
  • smpt "Wipe"
  • solr "Color Style"
  • sync "ColorSync"
  • syv9 "Apple Sorenson YUV9 Codec"
  • text "Apple Text ATSUI Codec"
  • tga "Apple TGA"
  • tiff "Apple TIFF"
  • tint "Color Tint"
  • trav "Traveling Matte"
  • v408 "Apple v408"
  • x264 "H264 Decoder"
  • xplo "Explode"
  • xvid "XVID Decoder"
  • y420 "Apple YUV420 Codec"
  • yuv2 "Apple Component Video - YUV422"
  • yuvs "Apple YUV422 Codec (yuvs)"
  • yuvs "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
  • yuvu "Apple YUV422 Codec (yuvu)"
  • yuvx "Apple YUV422 Codec"
  • zoom "Zoom"
imdc:2Vuy:Ajav "AJA Kona 2Vuy Codec"
imdc:2vuy:AjaV "AJA Kona 2vuy VideoOut Codec"
imdc:2Vuy:AjaV "AJA Kona 2Vuy VideoOut Codec"
imdc:2vuy:appl "Digital Cinema Desktop Transfer Codec"
imdc:2vuy:AVSI "Aurora 8bit eXtreme UC"
imdc:2Vuy:AVSS "Aurora 8bit Advanced UC"
imdc:2VUY:AVSS "Aurora 8bit UC Legacy"
imdc:2Vuy:BMAG "Blackmagic 2Vuy 8 Bit"
imdc:2vuy:BMAG "Blackmagic 8 Bit"
imdc:2vuy:DV64 "Digital Voodoo SD 8 Bit"
imdc:2vuy:KeyG "Apple FCP Uncompressed 8-bit 4:2:2"
imdc:BGGR:ASC "ASC Bayer Decompressor"
imdc:C310:GL3N "10-bit Cineon Decompressor"
imdc:cini:bsmt "Cineon Codec"
imdc:D210:GL3N "DPX 10-bit Y'CbCr 4:2:2 Decompressor"
imdc:DV10:AVSS "Aurora DV 10 Bit UC"
imdc:DV10:BMAG "Blackmagic DV10 10 Bit"
imdc:DV10:DV64 "Digital Voodoo SD 10 Bit"
imdc:DV10:DVOO "Digital Voodoo SD 10 Bit"
imdc:DVOO:AVSS "Aurora DV 8 Bit UC"
imdc:DVOO:BMAG "Blackmagic DVOO 8 Bit"
imdc:DVOO:DV64 "Digital Voodoo SD 8 Bit"
imdc:DVOO:DVOO "Digital Voodoo SD 8 Bit"
imdc:dx45:DARC "DivX 6.0"
imdc:DX45:DARC "DivX 6.0"
imdc:Mczm:Thry "Microcosm Codec"
imdc:MIFF:Maya "Maya IFF Image Codec"
imdc:NO16:Thry "None16 Codec"
imdc:pRiz:appl "LiveType Codec Decompressor"
imdc:R10g:Ajav "AJA Kona 10-bit Log RGB Codec"
imdc:R10k:Ajav "AJA Kona 10-bit RGB Codec"
imdc:r408:AVSI "Aurora r408 Transfer Decompressor"
imdc:Shr0:BtJz "Sheer"
imdc:Shr1:BtJz "Sheer RGB[A] 8b"
imdc:Shr2:BtJz "Sheer Y'CbCr[A] 8bv 4:4:4[:4]"
imdc:Shr3:BtJz "Sheer Y'CbCr[A] 8bv 4:2:2[:4]"
imdc:Shr4:BtJz "Sheer Y'CbCr 8bw 4:2:2"
imdc:Shr5:BtJz "Sheer Y'CbCr[A] 10bv 4:4:4[:4]"
imdc:Shr6:BtJz "Sheer Y'CbCr[A] 10bv 4:2:2[:4]"
imdc:Shr7:BtJz "Sheer RGB[A] 10b"
imdc:v408:AVSI "Aurora v408 Transfer Decompressor"

Audio FOURCCs

  • MAC3:appl "MACE 3:1"
  • MAC6:appl "MACE 6:1"
  • QDM2:QDes "QDesign Music 2"
  • Qclp:QCOM "Qualcomm PureVoice"
  • alac:appl "Apple Lossless"
  • alaw:appl "ALaw 2:1"
  • fl32:appl "32-bit Floating Point"
  • fl64:appl "64-bit Floating Point"
  • ima4:appl "IMA 4:1"
  • in24:appl "24-bit Integer"
  • in32:appl "32-bit Integer"
  • mp4a:appl "MPEG-4 Audio"
  • samr:appl "AMR Narrowband"
  • sowt:appl "16-bit Little Endian"
  • twos:appl "16-bit Big Endian"
  • ulaw:appl "uLaw 2:1"
  • .mp3:FhG "MPEG Layer-3 Audio"
  • MAC3:appl "MACE 3:1"
  • MAC6:appl "MACE 6:1"
  • QDM2:QDes "QDesign Music 2"
  • QDMC:QDes "QDesign Music 1 Decoder"
  • Qclp:QCOM "Qualcomm PureVoice"
  • Qclq:QCOM "Qualcomm QCELP"
  • TS..:TELE "Microsoft ADPCM"
  • TS..:TELE "Microsoft G.711 aLaw"
  • TS..:TELE "Microsoft G.711 uLaw"
  • TS..:TELE "Microsoft IMA ADPCM"
  • TS.E:TELE "Microsoft G.726"
  • TS.U:TELE "Microsoft MPEG Layer-3"
  • WMA1:TELE "Windows Media Audio 7"
  • WMA2:TELE "Windows Media Audio 9 Standard"
  • WMA3:TELE "Windows Media Audio 9 Professional"
  • agsm:appl "Apple GSM 10:1"
  • alac:appl "Apple Lossless"
  • alaw:appl "ALaw 2:1"
  • drms:appl "DRM"
  • dvca:appl "DV"
  • dvi :appl "DVI 4:1"
  • fl32:appl "32-bit Floating Point"
  • fl64:appl "64-bit Floating Point"
  • ima4:appl "IMA 4:1"
  • in24:appl "24-bit Integer"
  • in32:appl "32-bit Integer"
  • lpc :appl "LPC 23:1"
  • mp4a:appl "MPEG-4 Audio"
  • ms..:appl "MS ADPCM"
  • ms..:appl "DVI IMA"
  • ms.1:appl "MS-GSM 6.10"
  • ms.U:FhG "MPEG Layer-3 Audio"
  • samr:appl "AMR Narrowband"
  • sowt:appl "16-bit Little Endian"
  • twos:appl "16-bit Big Endian"
  • ulaw:appl "uLaw 2:1"
  • vdva:appl "DV"

Microsoft ID FOURCCs

These FOURCCs indicate that the audio information stsd atom also transports a Microsoft-style WAVEFORMATEX header.

Technical Description

The Apple Quicktime file format is an extremely well-defined file format. A little too well-defined, in fact. Some would even call it "over-engineered". The official Quicktime documentation is a magnificently detailed beast that gives equal time to explaining all parts of the spec, no matter how important or ignored a particular component may be in the actual implementation. The official spec can be a lot to digest at once and this document is intended to help interested programmers come up to speed on the Quicktime internals much more quickly.

This document emphasizes the components of the Quicktime file format that a programmer would need to know in order to write a general purpose Quicktime file decoder. This document also contains a discussion of decoding strategies.

Note that this document will probably never be complete since there is so much flexibility in the Quicktime format. But it is designed to cover the majority of QT files ever produced.

Byte Ordering

The first important fact to know about Quicktime files when writing a decoder is that all multi-byte numbers are big endian owing to Apple's Motorola heritage.
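A minimal sketch of what this means in practice, using Python's standard `struct` module, where the `>` prefix selects big-endian byte order. The helper names are illustrative only.

```python
import struct

def read_u32_be(buf: bytes, offset: int = 0) -> int:
    """Read an unsigned 32-bit big-endian integer from buf at offset."""
    return struct.unpack_from(">I", buf, offset)[0]

def read_u64_be(buf: bytes, offset: int = 0) -> int:
    """Read an unsigned 64-bit big-endian integer from buf at offset."""
    return struct.unpack_from(">Q", buf, offset)[0]
```

The bytes `00 00 01 00` therefore decode to 256, not to the 65536 that a little-endian read would produce.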

Atoms: The Fundamental Quicktime Building Blocks

Apple's Quicktime designers were thinking differently when they came up with the notion of an "atom" as "something that can contain other atoms". Atoms are the chunks of data that comprise a Quicktime file. Sometimes they contain data and sometimes they contain other atoms.

An atom consists of a size, a type, and a data payload. An atom is laid out as follows:

bytes 0-3    atom size (including 8-byte size and type preamble)
bytes 4-7    atom type
bytes 8..n   data

The 4 bytes allotted for the atom size field limit the maximum size of an atom to 4 GB. Quicktime also has a provision to allow atoms with 64-bit atom size fields: the 32-bit size field is set to 1 and an 8-byte size field is added after the atom type:

bytes 0-3    always 0x00000001
bytes 4-7    atom type
bytes 8-15   atom size (including 16-byte size and type preamble)
bytes 16..n  data

This is a logical exception since an atom always needs to be at least 8 bytes in length to account for the preamble. Therefore, if the size field is 1, load the 64-bit atom size from just after the atom type field.

If, on the other hand, the size field is 0, then the atom extends to the end of the file.
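The three size cases described above (normal, 64-bit, and "extends to end of file") can be sketched as a single header parser. The function name and return shape are assumptions made for illustration, and the buffer is assumed to hold the rest of the file so that a size of 0 can be resolved.

```python
import struct

def parse_atom_header(buf: bytes, offset: int = 0):
    """Return (atom_type, atom_size, header_size) for the atom at offset.

    atom_size always includes the preamble, as in the layouts above.
    """
    size, atom_type = struct.unpack_from(">I4s", buf, offset)
    header_size = 8
    if size == 1:
        # 64-bit size stored right after the atom type
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_size = 16
    elif size == 0:
        # atom extends to the end of the file (here: end of the buffer)
        size = len(buf) - offset
    return atom_type, size, header_size
```

The payload of the atom then occupies `buf[offset + header_size : offset + atom_size]`.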

Known Top-Level Atoms

These are all of the QuickTime atoms that are known to be legal top-level atoms.

  • 'ftyp'
  • 'pdin'
  • 'moov'
  • 'moof'
  • 'mfra'
  • 'free'
  • 'skip'
  • 'junk'
  • 'wide'
  • 'pnot'
  • 'pict'
  • 'meta'
  • 'meco'
  • 'uuid' : Used by Sony's MSNV brand of MP4
  • 'mdat'

QuickTime File Types

Somewhere along the line the 'ftyp' atom was added as a possible top-level QuickTime atom. It is supposed to appear first in a QuickTime file. These are the known ftyp values:

  • 'qt '
  • 'isom'
  • 'mp41'
  • 'mp42'
  • '3gp1'
  • '3gp2'
  • '3gp3'
  • '3gp4'
  • '3gp5'
  • '3g2a'
  • 'mmp4' : Mobile MPEG-4
  • 'M4A '
  • 'M4P '
  • 'M4V '
  • 'mjp2' : Motion JPEG 2000
  • 'MSNV' : Sony's private brand; used, for example, to encode MP4s for the PSP
  • 'FACE'

A more authoritative list can be found at http://ftyps.com/ .

General File Organization

In the abstract atom hierarchy, this is how a Quicktime file is laid out:

moov
  mvhd
  trak
    tkhd
    edts
      elst
    mdia
      mdhd
      minf
        stbl
          stsd
          stco
          co64
          stts
          stss
          stsc
          stsz
  trak
  trak
  ..
mdat
  [data]
  [data]
  [...]

Note that this is not an exhaustive tree of all possible or known atoms; these are only the atoms that have been empirically determined as "interesting" for the purposes of writing a general-purpose decoder that can handle most Quicktime files.

All Quicktime files need to have a moov atom and a mdat atom at the top level. There are other top level atoms as well (e.g. the 'ftyp' atom), which generally are not interesting and can safely be skipped if encountered. The moov atom contains instructions for playing the data in the file. The mdat atom contains the data that will be played.
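A decoder typically starts by walking the top-level atoms to locate moov and mdat and skipping everything else. The sketch below assumes the whole file has been read into memory, which is only reasonable for small files; a real decoder would seek through the file instead. The function name is hypothetical.

```python
import struct

def list_top_level_atoms(data: bytes):
    """Return [(atom_type, offset, size), ...] for each top-level atom."""
    atoms = []
    offset = 0
    while offset + 8 <= len(data):
        size, atom_type = struct.unpack_from(">I4s", data, offset)
        min_size = 8
        if size == 1:
            # 64-bit size follows the atom type
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            min_size = 16
        elif size == 0:
            # last atom: runs to the end of the file
            size = len(data) - offset
        if size < min_size or offset + size > len(data):
            raise ValueError(f"bad atom size {size} at offset {offset}")
        atoms.append((atom_type, offset, size))
        offset += size
    return atoms
```

A caller can then pick out the entries whose type is `b"moov"` or `b"mdat"` and ignore the rest.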

Meta Data

A 'meta' atom contains atoms that hold human-readable text with meta information about the file. Like all atoms, these are identified by a 4-byte type, but the first byte of the type has the value 0xA9. The remaining 3 characters can be:

  • nam: Name of song or video
  • cpy: Copyright information
  • des: File description
  • cmt: General comment
  • alb: Album name
  • gen: Custom Genre
  • ART: Artist name
  • too: Encoder
  • wrt: Writer
  • day: Content created year

Decompressing Compressed moov Atoms With zlib

The prospect of having to decode compressed moov atoms in Quicktime files seems to give many programmers pause. This need not be the case. When a compressed moov atom is detected, the free, open source zlib compression library can be called upon to do all the hard work.

In the abstract atom hierarchy, a compressed moov atom is laid out like this:

moov
  cmov
    dcom
    cmvd

On disk, a compressed moov atom will look like this:

bytes 0-3:   atom size (including 8-byte size and type preamble)
bytes 4-7:   atom type ('moov', movie header)
bytes 8-11:  atom size (including 8-byte size and type preamble)
bytes 12-15: atom type ('cmov', compressed movie header)
bytes 16-19: atom size (this should be 12 bytes)
bytes 20-23: atom type ('dcom', decompressor)
bytes 24-27: decompression library used (usually 'zlib')
bytes 28-31: atom size (including 8-byte size and type preamble)
bytes 32-35: atom type ('cmvd', compressed movie header data)
bytes 36-39: size of decompressed data
bytes 40-n:  compressed data

Note that this structure makes it theoretically possible to use other libraries to compress moov atoms, but zlib is most commonly used.

Here is a lazy algorithm for decompressing a compressed moov atom:

  1. check if bytes 12-15 contain 'cmov'; if yes:
  2. allocate a buffer for the decompressed moov atom, the size of which is specified by bytes 36-39
  3. initialize the zlib library, initialize a z_stream structure with pointers to the compressed and decompressed buffers, and all the other necessary variables
  4. call zlib to decompress the atom
  5. free the compressed moov atom, process the newly-decompressed moov atom (which will begin with a proper size and 'moov' type)
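The steps above can be sketched with Python's standard `zlib` module, which hides the z_stream setup behind a single `zlib.decompress` call. The function name `decompress_moov` is hypothetical, and the sketch assumes the complete moov atom is already in memory and follows the byte layout shown earlier exactly.

```python
import struct
import zlib

def decompress_moov(moov: bytes) -> bytes:
    """Return the decompressed moov atom, or moov unchanged if not compressed."""
    if moov[4:8] != b"moov":
        raise ValueError("not a moov atom")
    if moov[12:16] != b"cmov":
        return moov  # step 1: nothing to do
    if moov[20:24] != b"dcom" or moov[24:28] != b"zlib":
        raise ValueError("unsupported compression method")
    cmvd_size, cmvd_type, out_size = struct.unpack_from(">I4sI", moov, 28)
    if cmvd_type != b"cmvd":
        raise ValueError("expected cmvd atom at offset 28")
    # cmvd_size counts from byte 28, so the compressed data ends at 28 + cmvd_size
    compressed = moov[40:28 + cmvd_size]
    out = zlib.decompress(compressed)  # steps 2-4
    if len(out) != out_size:
        raise ValueError("decompressed size does not match bytes 36-39")
    return out  # step 5: begins with a proper size and 'moov' type
```

The size check against bytes 36-39 is an extra sanity test, not something the list above requires.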

As an aside, one might wonder about the rationale behind compressing moov atoms. The data inside QT files can reach gargantuan sizes, and the moov atom will be rather tiny in comparison. Why bother saving a few tens of kilobytes on the moov atom? One suggestion I have received is data integrity: a zlib stream carries a checksum (strictly speaking Adler-32 rather than a CRC). If an error occurs in the data stream while transmitting the compressed moov atom, a problem will be detected during decompression.

Audio Handling

Constant Bitrate Audio

Constant Bitrate Audio Width Header

Variable Bitrate Audio

Palette Handling

QuickTime Atom Reference

cmov

cmov
Function: compressed moov atom
Contained In: moov
Can Contain: cmvd, dcom


cmvd

cmvd
Function: compressed moov atom data
Contained In: cmov
Can Contain: leaf atom


co64

co64
Function: 64-bit chunk offsets
Contained In: stbl
Can Contain: leaf atom

The co64 atom for a track lists the offsets for the various chunks that comprise a media track. These offsets are absolute offsets within the file starting from offset 0. Note that this atom allows for 64-bit offsets which is necessary for QuickTime files exceeding 4 gigabytes. Smaller files can save space in this table by using the stco atom which only allows for 32-bit offsets.

 QT Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   total entries in offset table (n)
 8 bytes   chunk offset 0
 8 bytes   chunk offset 1
  ..
  ..
 8 bytes   chunk offset n-1
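A sketch of reading this table, assuming `payload` holds the atom's bytes after the 8-byte preamble. Because stco has the same layout with 4-byte entries, one hypothetical helper can cover both atoms.

```python
import struct

def parse_chunk_offsets(payload: bytes, atom_type: bytes):
    """Return the chunk offsets of a co64 or stco atom as a list of ints."""
    entry_code = {b"stco": "I", b"co64": "Q"}[atom_type]  # 4- or 8-byte entries
    # payload[0] is the version, payload[1:4] the flags; neither is needed here
    (count,) = struct.unpack_from(">I", payload, 4)
    return list(struct.unpack_from(f">{count}{entry_code}", payload, 8))
```

Each returned value is an absolute file offset, so it can be passed straight to a seek call.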

ctts

ctts
Function: 32 bits difference PTS-DTS
Contained In: stbl
Can Contain: tbd

When storing a video stream with B-frames, the PTS (presentation timestamp) may be larger than the DTS (decoder timestamp). This happens because a B-frame requires frames that follow it to be decoded first. The value stored in this atom is also called the Composition Time Offset, for example in the FLV format.

 QT Atom Preamble
 1 byte    version, 0
 3 bytes   flags
 4 bytes   entry count
 4 bytes   sample count, having following offset
 4 bytes   offset
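A sketch of how the table might be applied, assuming the (sample count, offset) pair repeats "entry count" times, that version 0 offsets are unsigned, and that `payload` starts right after the 8-byte preamble. The helper names and the list of decode timestamps are illustrative inputs.

```python
import struct

def parse_ctts(payload: bytes):
    """Return [(sample_count, offset), ...] from a version 0 ctts payload."""
    (count,) = struct.unpack_from(">I", payload, 4)
    return [struct.unpack_from(">II", payload, 8 + 8 * i) for i in range(count)]

def presentation_times(dts_list, ctts_entries):
    """Expand the run-length table and add each offset to its sample's DTS."""
    offsets = []
    for sample_count, offset in ctts_entries:
        offsets.extend([offset] * sample_count)
    return [dts + off for dts, off in zip(dts_list, offsets)]
```

Sorting the samples by the resulting values gives the display order.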

dcom

dcom
Function: compressed moov compression method
Contained In: cmov
Can Contain: leaf atom


edts

edts
Function: edit samples
Contained In: trak
Can Contain: elst


elst

elst
Function: edit list
Contained In: edts
Can Contain: leaf atom

The elst atom contains the edit list. The edit list contains information about the times and durations that pieces of a media track are to be presented during playback. There are many Quicktime file decoders that choose to ignore this atom. This is not a good idea. The edit list atom must be taken into account to guarantee proper A/V sync on certain files.

An edit list atom has the following structure:

 QuickTime Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   number of edit list entries
  <edit list entries>

An individual edit list entry is 12 bytes in size and has the following structure:

 bytes 0-3   edit duration (in global timescale units)
 bytes 4-7   edit media time (in trak timescale units)
 bytes 8-11  playback speed
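A sketch of reading the entries. Two details here are not stated on this page and are taken from Apple's file format documentation, so treat them as assumptions: the media time is signed, with -1 marking an empty edit, and the playback speed is a 16.16 fixed-point number. `payload` is assumed to start after the 8-byte preamble.

```python
import struct

def parse_elst(payload: bytes):
    """Return [(duration, media_time, speed), ...] from an elst payload."""
    (count,) = struct.unpack_from(">I", payload, 4)
    edits = []
    for i in range(count):
        duration, media_time, rate = struct.unpack_from(">Iii", payload, 8 + 12 * i)
        # rate is assumed to be 16.16 fixed point: 0x00010000 means normal speed
        edits.append((duration, media_time, rate / 65536.0))
    return edits
```

Note that the duration is in the movie's global timescale while the media time is in the trak's own timescale, so the two must not be mixed without conversion.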

esds

esds
Function: Elementary Stream Descriptors
Contained In: tbd
Can Contain: tbd

fiel

fiel
Function: field ordering http://developer.apple.com/quicktime/icefloe/dispatch019.html#fiel
Contained In: tbd
Can Contain: none

free

free
Function: free space
Contained In: top level
Can Contain: tbd

ftyp

ftyp
Function: file type
Contained In: top level
Can Contain: tbd


gmhd

gmhd
Function: generic media header
Contained In: minf
Can Contain: leaf atom

hdlr

hdlr
Function: handler type
Contained In: mdia or minf
Can Contain: leaf atom
   version
   flags
   component_type
   subtype
   manufacturer
   res_flags
   res_flags_mask
   name

The component_type is 'dhlr' for a data handler or 'mhlr' for a media handler. The subtype is a 4-letter code identifying the specific handler, for example 'vide' for video, 'soun' for sound, or 'alis' for a file alias. The hdlr atom under mdia seems more useful than the one under minf.

iods

iods
Function: ????
Contained In: tbd
Can Contain: tbd


junk

junk
Function: junk space
Contained In: top level
Can Contain: tbd


mdat

mdat
Function: media data
Contained In: top level
Can Contain: media data

Media data contains the actual video and audio samples. The only way to interpret the raw samples is with the metadata from moov.

mdhd

mdhd
Function: media header
Contained In: mdia
Can Contain: leaf atom

The mdhd atom is the media header atom, containing some parameters for the current stream. The atom exists in at least two versions; the version number is stored in the first byte after the preamble.

An mdhd version 0 atom has the following structure:

 QuickTime Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   creation time
 4 bytes   modification time
 4 bytes   time scale
 4 bytes   duration
 2 bytes   language
 2 bytes   quality

In version 1, a few fields grow from 4 to 8 bytes:

 QuickTime Atom Preamble
 1 byte    version
 3 bytes   flags
 8 bytes   creation time
 8 bytes   modification time
 4 bytes   time scale
 8 bytes   duration
 2 bytes   language
 2 bytes   quality

The language value is a three-letter ISO 639 language code packed as three 5-bit values (each of which is the ASCII value of the letter minus 0x60).
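A sketch covering both layouts and the language packing, assuming `payload` starts after the 8-byte preamble and that the first letter sits in the highest of the three 5-bit groups. The function names are hypothetical.

```python
import struct

def decode_language(packed: int) -> str:
    """Unpack three 5-bit values into a three-letter ISO 639 code."""
    return "".join(chr(((packed >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))

def parse_mdhd(payload: bytes) -> dict:
    """Parse a version 0 or version 1 mdhd payload into a dict."""
    version = payload[0]
    if version == 0:
        fmt = ">IIIIHH"   # 4-byte times and duration
    elif version == 1:
        fmt = ">QQIQHH"   # 8-byte times and duration
    else:
        raise ValueError(f"unknown mdhd version {version}")
    creation, modification, timescale, duration, language, quality = \
        struct.unpack_from(fmt, payload, 4)
    return {
        "version": version,
        "creation_time": creation,
        "modification_time": modification,
        "timescale": timescale,
        "duration": duration,
        "language": decode_language(language),
        "quality": quality,
    }
```

For example, the packed value 0x15C7 unpacks to 5, 14, 7, which maps to the letters "eng".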

mdia

mdia
Function: media
Contained In: trak
Can Contain: mdhd, minf


minf

minf
Function: media information
Contained In: mdia
Can Contain: gmhd, smhd, stbl, vmhd


moov

moov
Function: movie header
Contained In: top level
Can Contain: mvhd, trak, cmov, rmra


mvhd

mvhd
Function: movie header
Contained In: moov
Can Contain: tbd


pict

pict
Function: ????
Contained In: top level
Can Contain: tbd


pnot

pnot
Function: image preview
Contained In: top level
Can Contain: tbd

This atom contains information about the preview image. This image, or poster, can be shown in a file browser, for example.

rdrf

rdrf
Function: data reference
Contained In: rmda
Can Contain: none allowed

This atom defines where the reference movie can be found. The location can be given as an alias or as a URL.

Only one data reference atom is allowed per reference movie descriptor atom.


rmcd

rmcd
Function: component detect
Contained In: rmda
Can Contain: none allowed

This atom specifies a Quicktime component, such as a codec, that is required to play the reference movie. It can also specify the minimum version of that component.

Multiple component detect atoms are allowed in a reference movie descriptor atom and all minimum versions of all the components must be met.


rmcs

rmcs
Function: CPU speed atom
Contained In: rmda
Can Contain: none allowed

This atom describes the minimum amount of computing power required to play this reference movie. The power is given as a relative number on a scale that the documentation does not define. Examples from Apple's documentation include 100 for a 166 MHz Pentium and 500 for a 400 MHz PowerPC G4. The Apple documentation estimates that a gigahertz computer with a graphics accelerator might rate as high as 1000 and that future computers will allow even higher numbers. The number is probably not very useful in practice, since nothing indicates where multi-gigahertz machines, which deliver more computing power per Hz than previous generations did, fall on this ill-defined scale.

If the CPU speed atom is given then a computer that does not meet the specifications will play the reference movie with the next lower power requirement that it meets.

An application should assume that the reference movie with the highest valued CPU speed will have the highest quality.

Only one CPU speed atom is allowed per reference movie descriptor atom.


rmda

rmda
Function: reference movie descriptor atom
Contained In: rmra
Can Contain: rdrf, rmdr, rmcs, rmvc, rmcd, rmqu

The reference movie descriptor contains atoms that describe where to find an alternate movie. It also may contain atoms for items such as system requirements and movie quality.

Multiple reference movie descriptors are usually found in a reference movie atom.


rmdr

rmdr
Function: minimum data rate
Contained In: rmda
Can Contain: none allowed

The data rate atom indicates the minimum connection speed needed to display this reference movie. The data rate is given in bits per second.

Only one data rate atom is allowed per container reference movie descriptor atom.


rmqu

rmqu
Function: quality atom
Contained In: rmda
Can Contain: none allowed

The quality atom acts as a tie breaker. It gives a relative quality number that helps the application decide which reference movie to play if all other quality-defining atoms are equal. A higher value indicates a higher quality.

Only one quality atom is allowed per container reference movie descriptor atom.


rmra

rmra
Function: reference movie
Contained In: moov
Can Contain: rmda

The reference movie atom typically contains data about alternate movies. Which alternate movie plays depends upon conditions such as the system requirements listed in the contained atoms. There can be only one reference movie atom per movie.


rmvc

rmvc
Function: version check
Contained In: rmda
Can Contain: none allowed

This atom indicates the minimum version of the software package, such as Quicktime or Quicktime VR, that is needed to play the reference movie. The application to match against is given as a Macintosh Gestalt type (e.g. 'qtim').

Multiple version check atoms are allowed per reference movie descriptor atom. The application must meet all minimum version check requirements to play the reference movie.


skip

skip
Function: skipped space
Contained In: top level
Can Contain: tbd


smhd

smhd
Function: sound media header
Contained In: minf
Can Contain: leaf atom


stbl

stbl
Function: sample table
Contained In: minf
Can Contain: co64, ctts, stco, stsc, stsd, stss, stsz, stts

stco

stco
Function: sample table chunk offset
Contained In: stbl
Can Contain: leaf atom

The stco atom for a track lists the offsets for the various chunks that comprise a media track. These offsets are absolute offsets within the file starting from offset 0. Note that this atom only allows for 32-bit offsets. Files requiring 64-bit offsets must use the co64 atom.

 QT Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   total entries in offset table (n)
 4 bytes   chunk offset 0
 4 bytes   chunk offset 1
  ..
  ..
 4 bytes   chunk offset n-1

stsc

stsc
Function: sample table sample to chunk map
Contained In: stbl
Can Contain: leaf atom


stsd

stsd
Function: sample table sample description
Contained In: stbl
Can Contain: leaf atom


stss

stss
Function: sample table sync samples
Contained In: stbl
Can Contain: leaf atom

The stss atom is the sample table sync sample atom. This atom contains a list of all samples in the track that are marked as sync samples. Sync samples are also known as keyframes or intra-coded frames. These samples indicate which video frames can be completely decoded on their own, without any information from other video frames, thus making the frames safe to jump to randomly.

An stss atom has the following structure:

 QuickTime Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   number of sync samples (n)
 4 bytes   sync sample 1
 4 bytes   sync sample 2
  ..
  ..
 4 bytes   sync sample n

Each entry in the sync sample table indicates the ID of a sample that is a sync sample. Note that this table begins numbering from 1 rather than 0.

As an example, if the stss atom of a video trak has 4 entries and those entries are 1, 9, 19, and 34, that means that video frames 1, 9, 19, and 34 (or 0, 8, 18, and 33 if your frames are numbered beginning at 0) are sync samples.

If a trak has no stss atom then all of the samples in the track are implicitly sync samples.
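A sketch of reading the table and answering "is this frame a keyframe?" with 0-based frame indices. `payload` is assumed to start after the 8-byte preamble, and passing `None` stands for a trak without an stss atom; both helper names are illustrative.

```python
import struct

def parse_stss(payload: bytes):
    """Return the 1-based sync sample numbers as a set."""
    (count,) = struct.unpack_from(">I", payload, 4)
    return set(struct.unpack_from(f">{count}I", payload, 8))

def is_sync_sample(frame_index: int, sync_samples) -> bool:
    """frame_index is 0-based; sync_samples is None when no stss atom exists."""
    if sync_samples is None:
        return True  # every sample is implicitly a sync sample
    return (frame_index + 1) in sync_samples
```

A seek routine would search backwards from the target frame for the nearest index where this returns true.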

stsz

stsz
Function: sample table sizes
Contained In: stbl
Can Contain: leaf atom

The stsz atom is the sample table sample size atom. This atom contains the sizes of all of the samples in a trak.

 QuickTime Atom Preamble
 1 byte     version
 3 bytes    flags
 4 bytes    uniform size of each sample
 4 bytes    number of sample sizes (n)
 4 bytes    sample 0 size
 4 bytes    sample 1 size
  ..
  ..
 4 bytes    sample (n-1) size

The stsz atom can operate in one of two modes. First, it is possible that all of the samples in a trak have the same size. In this case, the uniform size field is set to the constant size. The number of sample sizes field is set to the total number of samples in the trak, and there is no sample size table following. This mode is commonly used in the stsz atom of audio traks. For example, in an audio file with a length of 2 seconds and a sample rate of 22050 Hz, the uniform size field will be set to 1, indicating that the size of each sample is 1. The number of sample sizes field will be set to 44100 (22050 samples/sec * 2 sec = 44100 samples).

In the second mode, all of the samples are a different size (logically, this mode would have to be used even if all of the samples were the same size except for one). In this case, the uniform size field is set to 0. The number of sample sizes field contains the number of entries in the sample size table. Each entry in the sample size table contains the size of a sample in the trak.
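Both modes can be handled by one small routine. The sketch below assumes `payload` starts after the 8-byte preamble and expands the uniform case into a full list for simplicity; a memory-conscious decoder would keep the uniform size as a single number. The function name is hypothetical.

```python
import struct

def parse_stsz(payload: bytes):
    """Return a list with the size of every sample in the trak."""
    uniform_size, count = struct.unpack_from(">II", payload, 4)
    if uniform_size != 0:
        # first mode: no table follows, every sample has the same size
        return [uniform_size] * count
    # second mode: one 4-byte entry per sample
    return list(struct.unpack_from(f">{count}I", payload, 12))
```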

stts

stts
Function: sample table time to sample map
Contained In: stbl
Can Contain: leaf atom

The stts atom appears, for example, in tracks of type vide or alis. The table is an array of (sample_count, sample_time_delta) pairs. Summing sample_count over all entries gives the total number of frames in a video track. The total duration of the track, in seconds, should be:

   duration = (sample_count1 * sample_time_delta1 + ... + sample_countN * sample_time_deltaN ) / timescale
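The formula can be sketched as follows. The page does not give the byte layout of stts, so the sketch assumes the usual arrangement of version and flags, a 4-byte entry count, and then 4-byte (sample_count, sample_time_delta) pairs, with `payload` starting after the 8-byte preamble. The timescale comes from the trak's mdhd atom.

```python
import struct

def parse_stts(payload: bytes):
    """Return [(sample_count, sample_time_delta), ...] (layout is an assumption)."""
    (count,) = struct.unpack_from(">I", payload, 4)
    return [struct.unpack_from(">II", payload, 8 + 8 * i) for i in range(count)]

def track_totals(entries, timescale: int):
    """Return (total_samples, duration_in_seconds) for a time-to-sample table."""
    total_samples = sum(count for count, _ in entries)
    total_ticks = sum(count * delta for count, delta in entries)
    return total_samples, total_ticks / timescale
```

For a constant frame rate track the table usually collapses to a single pair, for example 300 samples of 100 ticks each at a timescale of 3000, which gives 10 seconds.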

tkhd

tkhd
Function: track header
Contained In: trak
Can Contain: leaf atom


trak

trak
Function: track
Contained In: moov
Can Contain: tkhd, edts, mdia


uuid

uuid
Function: Used by the PSP MSNV brand of MP4
Contained In: tbd
Can Contain: tbd

vmhd

vmhd
Function: video media header
Contained In: minf
Can Contain: leaf atom


wide

wide
Function: skipped data
Contained In: top level
Can Contain: tbd

wfex

wfex
Function: wraps a Microsoft WAVEFORMATEX structure
Contained In: stsd
Can Contain: none?
