QuickTime container
- Extensions: mov, qt, mp4, m4v, m4a, m4p, m4b, m4r, k3g, skm, 3gp, 3g2
- Company: Apple
- Official specification: http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/index.html (mirrored PDF)
- ISO base media file format: ISO/IEC 14496-12:2008+Cor.1+Cor.2
- MIME Types:
- mov, qt: video/quicktime or video/x-quicktime
- m4a, m4b: audio/x-m4a
Known FOURCCs
The following sections list FOURCCs known to appear in Apple QuickTime files. Note that sometimes the FOURCC is only 3 characters and there is a space (ASCII 0x20) to round out the full 4 characters.
Video FOURCCs
- 'raw '
- 'rle '
- 'smc '
- 'rpza'
- '8bps'
- 'qdrw'
- 'cvid'
- 'svq1'
- 'svq3'
- 'DIVX'
- 'h263'
- 'mp4v'
- 'mx5p': MPEG2 IMX 625/50 50 Mb/s produced by Final Cut Pro
- 'mx3n': MPEG2 IMX 525/60 30 Mb/s produced by Final Cut Pro
- 'dvpp': DVCPRO PAL produced by Final Cut Pro
- 'dv5p': DVCPRO50 produced by Final Cut Pro
- 'hdv3': HDV produced by Final Cut Pro
- 8BPS "Apple Planar RGB"
- SVQ1 "Sorenson Video Compressor"
- SVQ3 "Sorenson Video 3 Compressor"
- WRLE "Apple BMP"
- avc1 "H.264 Encoder"
- cvid "Apple Cinepak"
- dv5n "Apple DVCPRO50 - NTSC"
- dv5p "Apple DVCPRO50 - PAL"
- dvc "Apple DV/DVCPRO - NTSC"
- dvcp "Apple DV - PAL"
- dvpp "Apple DVCPRO - PAL"
- h261 "Apple H.261"
- h263 "H.263"
- h263 "Apple VC H.263"
- icod "Apple Intermediate Codec"
- jpeg "Apple Photo - JPEG"
- mjp2 "JPEG 2000 Encoder"
- mjpa "Apple Motion JPEG A"
- mjpb "Apple Motion JPEG B"
- mp4v "Apple MPEG4 Compressor"
- png "Apple PNG"
- pxlt "Apple Pixlet Video"
- raw "Apple None"
- rle "Apple Animation"
- rpza "Apple Video"
- smc "Apple Graphics"
- tga "Apple TGA"
- tiff "Apple TIFF"
- yuv2 "Apple Component Video - YUV422"
- .... "DivX 4 Decoder"
- .SGI "Apple SGI"
- 2vuy "Apple YUV422 Codec (2vuy)"
- 2vuy "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
- 3IV2 "3ivx Decoder"
- 3IVD "3ivx Decoder"
- 3iv2 "3ivx Decoder"
- 3ivd "3ivx Decoder"
- 8BPS "Apple Planar RGB"
- AP41 "DivX 3 Decoder"
- BLZ0 "XVID Decoder"
- COL0 "DivX 3 Decoder"
- COL1 "DivX 3 Decoder"
- DAVC "H264 Decoder"
- DIV1 "MS-MPEG4 v1 Decoder"
- DIV2 "MS-MPEG4 v2 Decoder"
- DIV3 "DivX 3 Decoder"
- DIV4 "DivX 3 Decoder"
- DIV5 "DivX 3 Decoder"
- DIV6 "DivX 3 Decoder"
- DIVX "DivX 4 Decoder"
- DX50 "DivX 5 Decoder"
- FLV1 "Sorenson H.263 Decoder"
- FMP4 "MPEG-4 Decoder"
- FSV1 "Flash Screen Video Decoder"
- H264 "H264 Decoder"
- M4S2 "DivX 4 Decoder"
- M4s2 "WMV Image Codec"
- MP42 "MS-MPEG4 v2 Decoder"
- MP43 "DivX 3 Decoder"
- MP4S "DivX 4 Decoder"
- MPG3 "DivX 3 Decoder"
- MPG4 "MS-MPEG4 v1 Decoder"
- Mjpg "VCM Image Codec"
- Mp42 "WMV Image Codec"
- Mp43 "WMV Image Codec"
- Mp4S "WMV Image Codec"
- PNTG "Apple MacPaint"
- RMP4 "MPEG-4 Decoder"
- SEDG "MPEG-4 Decoder"
- SMP4 "MPEG-4 Decoder"
- SVQ1 "Sorenson Video Decompressor"
- SVQ3 "Sorenson Video 3 Decompressor"
- UMP4 "DivX 4 Decoder"
- VP62 "TrueMotion VP6 Decoder"
- VP6F "TrueMotion VP6 Decoder"
- VSSH "H264 Decoder"
- WMV1 "WMV Image Codec"
- WMV2 "WMV Image Codec"
- WMV3 "WMV Image Codec"
- WRLE "Apple BMP"
- WV1F "MPEG-4 Decoder"
- X264 "H264 Decoder"
- XVID "XVID Decoder"
- XVIX "XVID Decoder"
- XviD "XVID Decoder"
- ac16 "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
- ac32 "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
- acBG "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
- avc1 "H.264 Decoder"
- avr "Apple AVR JPEG"
- b16g "Apple 16-bit Gray"
- b32a "Apple 32-bit Gray with Alpha"
- b48r "Apple 48-bit RGB"
- b64a "Apple 64-bit ARGB"
- base "Apple 64-bit ARGB"
- blit "Apple 64-bit ARGB"
- blnd "Alpha Compositor"
- blur "Blur"
- brco "Brightness and Contrast"
- chan "Channel Compositor"
- ckey "Chroma Key"
- clou "Cloud"
- cmyk "Apple CMYK"
- col0 "DivX 3 Decoder"
- col1 "DivX 3 Decoder"
- cupa "QTVR Cubic Codec"
- cvid "Apple Cinepak"
- div1 "MS-MPEG4 v1 Decoder"
- div2 "MS-MPEG4 v2 Decoder"
- div3 "DivX 3 Decoder"
- div4 "DivX 3 Decoder"
- div5 "DivX 3 Decoder"
- div6 "DivX 3 Decoder"
- divx "DivX 4 Decoder"
- dmb1 "Apple OpenDML JPEG"
- drmi "AVC0 Media"
- dslv "Cross Fade"
- dv5n "Apple DVCPRO50"
- dv5n "DVCPRO50"
- dv5p "Apple DVCPRO50"
- dv5p "DVCPRO50"
- dvc "Apple DV"
- dvcp "Apple DV"
- dvpp "Apple DVCPRO"
- edge "Edge Detection"
- embs "Emboss"
- fire "Fire"
- flic "Apple FLC"
- fmns "Film Noise"
- fpix "FlashPix Image"
- gain "Alpha Gain"
- geff "Special Effects and Filters"
- genk "General Convolution"
- gif "Apple GIF"
- glas "Glass"
- h261 "Apple H.261"
- h263 "H.263"
- h263 "Apple VC H.263"
- h264 "H264 Decoder"
- hslb "HSL Balance"
- j420 "Apple YUV420 Codec"
- jpeg "Apple Photo - JPEG"
- kpcd "Apple Photo CD"
- lens "Lens Flare"
- ltpa "QTVR Cylindrical Codec"
- m4s2 "DivX 4 Decoder"
- matt "Gradient Wipe"
- mjp2 "JPEG 2000 Decoder"
- mjpa "Apple Motion JPEG A"
- mjpb "Apple Motion JPEG B"
- mp42 "MS-MPEG4 v2 Decoder"
- mp43 "DivX 3 Decoder"
- mp4s "DivX 4 Decoder"
- mp4v "Apple MPEG4 Decompressor"
- mpg3 "DivX 3 Decoder"
- mpg4 "MS-MPEG4 v1 Decoder"
- mplo "Implode"
- msvc "Apple - Microsoft Video 1"
- myuv "Apple YUV420 Codec"
- path "Apple Curve"
- pdf "PDF Image"
- png "Apple PNG"
- push "Push"
- pxlt "Apple Pixlet Video"
- qdrw "Apple QuickDraw"
- r408 "Apple r408"
- raw "Apple None"
- raw "DV"
- raw "Apple RGB to YUV"
- rgbb "RGB Balance"
- ripl "Ripple"
- rle "Apple Animation"
- rpza "Apple Video"
- scal "Apple Scaling Codec"
- shrp "Sharpen"
- slid "Slide"
- smc "Apple Graphics"
- smp2 "Iris"
- smp3 "Radial"
- smp4 "Matrix Wipe"
- smpt "Wipe"
- solr "Color Style"
- sync "ColorSync"
- syv9 "Apple Sorenson YUV9 Codec"
- text "Apple Text ATSUI Codec"
- tga "Apple TGA"
- tiff "Apple TIFF"
- tint "Color Tint"
- trav "Traveling Matte"
- v408 "Apple v408"
- x264 "H264 Decoder"
- xplo "Explode"
- xvid "XVID Decoder"
- y420 "Apple YUV420 Codec"
- yuv2 "Apple Component Video - YUV422"
- yuvs "Apple YUV422 Codec (yuvs)"
- yuvs "YUV 4:2:2 Hardware Acceleration Codec (yuvs)"
- yuvu "Apple YUV422 Codec (yuvu)"
- yuvx "Apple YUV422 Codec"
- zoom "Zoom"
- imdc:2Vuy:Ajav "AJA Kona 2Vuy Codec"
- imdc:2vuy:AjaV "AJA Kona 2vuy VideoOut Codec"
- imdc:2Vuy:AjaV "AJA Kona 2Vuy VideoOut Codec"
- imdc:2vuy:appl "Digital Cinema Desktop Transfer Codec"
- imdc:2vuy:AVSI "Aurora 8bit eXtreme UC"
- imdc:2Vuy:AVSS "Aurora 8bit Advanced UC"
- imdc:2VUY:AVSS "Aurora 8bit UC Legacy"
- imdc:2Vuy:BMAG "Blackmagic 2Vuy 8 Bit"
- imdc:2vuy:BMAG "Blackmagic 8 Bit"
- imdc:2vuy:DV64 "Digital Voodoo SD 8 Bit"
- imdc:2vuy:KeyG "Apple FCP Uncompressed 8-bit 4:2:2"
- imdc:BGGR:ASC "ASC Bayer Decompressor"
- imdc:C310:GL3N "10-bit Cineon Decompressor"
- imdc:cini:bsmt "Cineon Codec"
- imdc:D210:GL3N "DPX 10-bit Y'CbCr 4:2:2 Decompressor"
- imdc:DV10:AVSS "Aurora DV 10 Bit UC"
- imdc:DV10:BMAG "Blackmagic DV10 10 Bit"
- imdc:DV10:DV64 "Digital Voodoo SD 10 Bit"
- imdc:DV10:DVOO "Digital Voodoo SD 10 Bit"
- imdc:DVOO:AVSS "Aurora DV 8 Bit UC"
- imdc:DVOO:BMAG "Blackmagic DVOO 8 Bit"
- imdc:DVOO:DV64 "Digital Voodoo SD 8 Bit"
- imdc:DVOO:DVOO "Digital Voodoo SD 8 Bit"
- imdc:dx45:DARC "DivX 6.0"
- imdc:DX45:DARC "DivX 6.0"
- imdc:Mczm:Thry "Microcosm Codec"
- imdc:MIFF:Maya "Maya IFF Image Codec"
- imdc:NO16:Thry "None16 Codec"
- imdc:pRiz:appl "LiveType Codec Decompressor"
- imdc:R10g:Ajav "AJA Kona 10-bit Log RGB Codec"
- imdc:R10k:Ajav "AJA Kona 10-bit RGB Codec"
- imdc:r408:AVSI "Aurora r408 Transfer Decompressor"
- imdc:Shr0:BtJz "Sheer"
- imdc:Shr1:BtJz "Sheer RGB[A] 8b"
- imdc:Shr2:BtJz "Sheer Y'CbCr[A] 8bv 4:4:4[:4]"
- imdc:Shr3:BtJz "Sheer Y'CbCr[A] 8bv 4:2:2[:4]"
- imdc:Shr4:BtJz "Sheer Y'CbCr 8bw 4:2:2"
- imdc:Shr5:BtJz "Sheer Y'CbCr[A] 10bv 4:4:4[:4]"
- imdc:Shr6:BtJz "Sheer Y'CbCr[A] 10bv 4:2:2[:4]"
- imdc:Shr7:BtJz "Sheer RGB[A] 10b"
- imdc:v408:AVSI "Aurora v408 Transfer Decompressor"
Audio FOURCCs
- 'raw '
- 'twos'
- 'sowt'
- 'mac3'
- 'mac6'
- 'ima4'
- 'ulaw'
- 'alaw'
- '.mp3'
- 'OggS'
- 'QDMC'
- 'QDM2'
- 'mp4a'
- 'drms'
- 'in24'
- 'in32'
- 'fl32'
- 'fl64'
- 'Qclp'
- 'agsm'
- MAC3:appl "MACE 3:1"
- MAC6:appl "MACE 6:1"
- QDM2:QDes "QDesign Music 2"
- Qclp:QCOM "Qualcomm PureVoice"
- alac:appl "Apple Lossless"
- alaw:appl "ALaw 2:1"
- fl32:appl "32-bit Floating Point"
- fl64:appl "64-bit Floating Point"
- ima4:appl "IMA 4:1"
- in24:appl "24-bit Integer"
- in32:appl "32-bit Integer"
- mp4a:appl "MPEG-4 Audio"
- samr:appl "AMR Narrowband"
- sowt:appl "16-bit Little Endian"
- twos:appl "16-bit Big Endian"
- ulaw:appl "uLaw 2:1"
- .mp3:FhG "MPEG Layer-3 Audio"
- MAC3:appl "MACE 3:1"
- MAC6:appl "MACE 6:1"
- QDM2:QDes "QDesign Music 2"
- QDMC:QDes "QDesign Music 1 Decoder"
- Qclp:QCOM "Qualcomm PureVoice"
- Qclq:QCOM "Qualcomm QCELP"
- TS..:TELE "Microsoft ADPCM"
- TS..:TELE "Microsoft G.711 aLaw"
- TS..:TELE "Microsoft G.711 uLaw"
- TS..:TELE "Microsoft IMA ADPCM"
- TS.E:TELE "Microsoft G.726"
- TS.U:TELE "Microsoft MPEG Layer-3"
- WMA1:TELE "Windows Media Audio 7"
- WMA2:TELE "Windows Media Audio 9 Standard"
- WMA3:TELE "Windows Media Audio 9 Professional"
- agsm:appl "Apple GSM 10:1"
- alac:appl "Apple Lossless"
- alaw:appl "ALaw 2:1"
- drms:appl "DRM"
- dvca:appl "DV"
- dvi :appl "DVI 4:1"
- fl32:appl "32-bit Floating Point"
- fl64:appl "64-bit Floating Point"
- ima4:appl "IMA 4:1"
- in24:appl "24-bit Integer"
- in32:appl "32-bit Integer"
- lpc :appl "LPC 23:1"
- mp4a:appl "MPEG-4 Audio"
- ms..:appl "MS ADPCM"
- ms..:appl "DVI IMA"
- ms.1:appl "MS-GSM 6.10"
- ms.U:FhG "MPEG Layer-3 Audio"
- samr:appl "AMR Narrowband"
- sowt:appl "16-bit Little Endian"
- twos:appl "16-bit Big Endian"
- ulaw:appl "uLaw 2:1"
- vdva:appl "DV"
Microsoft ID FOURCCs
These FOURCCs indicate that the audio information stsd atom also transports a Microsoft-style WAVEFORMATEX header.
- 'm', 's', 0x00, 0x02: Microsoft ADPCM
- 'm', 's', 0x00, 0x11: Microsoft IMA ADPCM
- 'm', 's', 0x00, 0x55: MP3
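The three entries above suggest that the last two bytes of an 'ms' FOURCC hold the 16-bit WAVEFORMATEX format tag, stored big-endian like everything else in the file. The following Python sketch works under that assumption; the function name `ms_format_tag` and the `KNOWN_MS_TAGS` table are illustrative, not part of any specification.

```python
import struct

# Format tags taken from the list above; other tags may exist.
KNOWN_MS_TAGS = {
    0x0002: "Microsoft ADPCM",
    0x0011: "Microsoft IMA ADPCM",
    0x0055: "MP3",
}

def ms_format_tag(fourcc):
    """Return the 16-bit format tag of an 'ms' FOURCC, or None otherwise."""
    if len(fourcc) != 4 or fourcc[:2] != b"ms":
        return None
    # Assumption: bytes 2-3 are a big-endian 16-bit format tag.
    (tag,) = struct.unpack(">H", fourcc[2:])
    return tag
```

For example, `KNOWN_MS_TAGS.get(ms_format_tag(b"ms\x00\x55"))` yields "MP3", while a FOURCC such as b"mp4a" yields None.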
Technical Description
The Apple Quicktime file format is an extremely well-defined file format. A little too well-defined, in fact. Some would even call it "over-engineered". The official Quicktime documentation is a magnificently detailed beast that gives equal time to explaining all parts of the spec, no matter how important or ignored a particular component may be in the actual implementation. The official spec can be a lot to digest at once and this document is intended to help interested programmers come up to speed on the Quicktime internals much more quickly.
This document emphasizes the components of the Quicktime file format that a programmer would need to know in order to write a general purpose Quicktime file decoder. This document also contains a discussion of decoding strategies.
Note that this document will probably never be complete since there is so much flexibility in the Quicktime format. But it is designed to cover the majority of QT files ever produced.
Byte Ordering
The first important fact to know about Quicktime files when writing a decoder is that all multi-byte numbers are big endian owing to Apple's Motorola heritage.
Atoms: The Fundamental Quicktime Building Blocks
Apple's Quicktime designers were thinking differently when they came up with the notion of an "atom" as "something that can contain other atoms". Atoms are the chunks of data that comprise a Quicktime file. Sometimes they contain data and sometimes they contain other atoms.
An atom consists of a size, a type, and a data payload. An atom is laid out as follows:
  bytes 0-3    atom size (including 8-byte size and type preamble)
  bytes 4-7    atom type
  bytes 8..n   data
The 4 bytes allotted for the atom size field limit the maximum size of an atom to 4 GB. Quicktime also has a provision for atoms with 64-bit sizes: set the size field to 1 and add an 8-byte size field after the atom type:
  bytes 0-3    always 0x00000001
  bytes 4-7    atom type
  bytes 8-15   atom size (including 16-byte size and type preamble)
  bytes 16..n  data
This is a logical exception since an atom always needs to be at least 8 bytes in length to account for the preamble. Therefore, if the size field is 1, load the 64-bit atom size from just after the atom type field.
If, on the other hand, the size field is 0, then the atom extends to the end of the file.
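The three size cases described above (ordinary 32-bit size, size 1 with a 64-bit extension, and size 0 meaning "to the end of the file") can be sketched in Python as follows. This is a minimal illustration that works on an in-memory buffer; the names `read_atom_header` and `iter_atoms` are made up for this example.

```python
import struct

def read_atom_header(buf, pos, end):
    """Return (atom_type, header_size, total_size) for the atom at pos."""
    if end - pos < 8:
        raise ValueError("truncated atom header")
    # All multi-byte numbers are big-endian.
    size, atom_type = struct.unpack_from(">I4s", buf, pos)
    header_size = 8
    if size == 1:
        # A 64-bit size follows the atom type.
        if end - pos < 16:
            raise ValueError("truncated 64-bit atom header")
        (size,) = struct.unpack_from(">Q", buf, pos + 8)
        header_size = 16
    elif size == 0:
        # The atom extends to the end of the enclosing data.
        size = end - pos
    if size < header_size or pos + size > end:
        raise ValueError("bad atom size")
    return atom_type, header_size, size

def iter_atoms(buf, start=0, end=None):
    """Yield (atom_type, payload_start, payload_end) for consecutive atoms."""
    end = len(buf) if end is None else end
    pos = start
    while pos < end:
        atom_type, header_size, size = read_atom_header(buf, pos, end)
        yield atom_type, pos + header_size, pos + size
        pos += size
```

Calling `iter_atoms` again on the payload range of a container atom walks its children, so the same routine serves every level of the hierarchy. A real decoder would seek in the file rather than load it into memory, since mdat atoms can be very large.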
Known Top-Level Atoms
These are all of the QuickTime atoms that are known to be legal top-level atoms.
- 'ftyp'
- 'pdin'
- 'moov'
- 'moof'
- 'mfra'
- 'free'
- 'skip'
- 'junk'
- 'wide'
- 'pnot'
- 'pict'
- 'meta'
- 'meco'
- 'uuid' : Used by Sony's MSNV brand of MP4
- 'mdat'
QuickTime File Types
Somewhere along the line the 'ftyp' atom was added as a possible top-level QuickTime atom. It is supposed to appear first in a QuickTime file. These are the known ftyp values:
- 'qt '
- 'isom'
- 'mp41'
- 'mp42'
- '3gp1'
- '3gp2'
- '3gp3'
- '3gp4'
- '3gp5'
- '3g2a'
- 'mmp4' : Mobile MPEG-4
- 'M4A '
- 'M4P '
- 'M4V '
- 'mjp2' : Motion JPEG 2000
- 'MSNV' : Sony's private brand; used, for example, to encode MP4s for the PSP
- 'FACE'
A more authoritative list can be found at http://ftyps.com/ .
General File Organization
In the abstract atom hierarchy, this is how a Quicktime file is laid out:
  moov
    mvhd
    trak
      tkhd
      edts
        elst
      mdia
        mdhd
        minf
          stbl
            stsd
            stco
            co64
            stts
            stss
            stsc
            stsz
    trak
    trak
    ..
  mdat
    [data]
    [data]
    [...]
Note that this is not an exhaustive tree of all possible or known atoms; these are only the atoms that have been empirically determined as "interesting" for the purposes of writing a general-purpose decoder that can handle most Quicktime files.
All Quicktime files need to have a moov atom and a mdat atom at the top level. There are other top level atoms as well (e.g. the 'ftyp' atom), which generally are not interesting and can safely be skipped if encountered. The moov atom contains instructions for playing the data in the file. The mdat atom contains the data that will be played.
Meta Data
A 'meta' atom contains atoms that hold human-readable text with meta information about the file. Like all atoms, these have 4-byte types, but the first byte has the value 0xA9 (the copyright sign). The remaining 3 characters can be:
- nam: Name of song or video
- cpy: Copyright information
- des: File description
- cmt: General comment
- alb: Album name
- gen: Custom Genre
- ART: Artist name
- too: Encoder
- wrt: Writer
- day: Content created year
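As a small illustration of the naming scheme above, the following Python sketch recognizes a metadata atom type by its leading 0xA9 byte and looks up the remaining three characters. The table simply restates the list above; the names `META_KEYS` and `describe_meta_type` are hypothetical.

```python
# Three-character suffixes from the list above.
META_KEYS = {
    "nam": "Name of song or video",
    "cpy": "Copyright information",
    "des": "File description",
    "cmt": "General comment",
    "alb": "Album name",
    "gen": "Custom Genre",
    "ART": "Artist name",
    "too": "Encoder",
    "wrt": "Writer",
    "day": "Content created year",
}

def describe_meta_type(atom_type):
    """Return a description for a 4-byte metadata atom type, or None."""
    if len(atom_type) != 4 or atom_type[0] != 0xA9:
        return None
    suffix = atom_type[1:].decode("ascii", "replace")
    return META_KEYS.get(suffix)
```

For example, `describe_meta_type(b"\xa9nam")` returns "Name of song or video", and an ordinary type such as b"moov" returns None.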
Decompressing Compressed moov Atoms With zlib
The prospect of having to decode compressed moov atoms in Quicktime files seems to give many programmers pause. This need not be the case. When a compressed moov atom is detected, the free, open source zlib compression library can be called upon to do all the hard work.
In the abstract atom hierarchy, a compressed moov atom is laid out like this:
  moov
    cmov
      dcom
      cmvd
On disk, a compressed moov atom will look like this:
  bytes 0-3:    atom size (including 8-byte size and type preamble)
  bytes 4-7:    atom type ('moov', movie header)
  bytes 8-11:   atom size (including 8-byte size and type preamble)
  bytes 12-15:  atom type ('cmov', compressed movie header)
  bytes 16-19:  atom size (this should be 12 bytes)
  bytes 20-23:  atom type ('dcom', decompressor)
  bytes 24-27:  decompression library used (usually 'zlib')
  bytes 28-31:  atom size (including 8-byte size and type preamble)
  bytes 32-35:  atom type ('cmvd', compressed movie header data)
  bytes 36-39:  size of decompressed data
  bytes 40-n:   compressed data
Note that this structure makes it theoretically possible to use other libraries to compress moov atoms, but zlib is most commonly used.
Here is a lazy algorithm for decompressing a compressed moov atom:
- check if bytes 12-15 contain 'cmov'; if yes:
- allocate a buffer for the decompressed moov atom, the size of which is specified by bytes 36-39
- initialize the zlib library, initialize a z_stream structure with pointers to the compressed and decompressed buffers, and all the other necessary variables
- call zlib to decompress the atom
- free the compressed moov atom, process the newly-decompressed moov atom (which will begin with a proper size and 'moov' type)
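The lazy algorithm above can be sketched in Python, whose standard zlib module wraps the same library and takes care of buffer allocation and stream setup. The function name `decompress_moov` is made up for this example, and the sketch assumes the fixed byte offsets listed earlier (32-bit atom sizes, dcom before cmvd).

```python
import struct
import zlib

def decompress_moov(moov):
    """Return the decompressed moov atom, or moov itself if not compressed.

    moov: bytes of a complete 'moov' atom, starting at its size field.
    """
    if len(moov) < 40 or moov[12:16] != b"cmov":
        return moov  # ordinary, uncompressed moov atom
    if moov[20:24] != b"dcom" or moov[32:36] != b"cmvd":
        raise ValueError("unexpected compressed moov layout")
    if moov[24:28] != b"zlib":
        raise ValueError("unsupported decompressor: %r" % moov[24:28])
    (cmvd_size,) = struct.unpack_from(">I", moov, 28)
    (expected_size,) = struct.unpack_from(">I", moov, 36)
    # The cmvd atom starts at byte 28; its compressed data starts at 40.
    compressed = moov[40:28 + cmvd_size]
    data = zlib.decompress(compressed)
    if len(data) != expected_size:
        raise ValueError("decompressed size does not match header")
    return data
```

The returned bytes begin with a proper size and 'moov' type, so they can be handed to the same parser that handles uncompressed moov atoms.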
As an aside, one might wonder about the rationale behind compressing moov atoms. The data inside QT files can reach gargantuan sizes, and the moov atom will be rather tiny in comparison. Why bother saving a few tens of kilobytes on the moov atom? One suggestion I have received is data integrity: a zlib stream carries a checksum of the uncompressed data (Adler-32 rather than a CRC). If an error occurs in the data stream while transmitting the compressed moov atom, the problem will be detected during decompression.
Audio Handling
Constant Bitrate Audio
Constant Bitrate Audio Width Header
Variable Bitrate Audio
Palette Handling
QuickTime Atom Reference
cmov
cmov | |
Function: | compressed moov atom |
Contained In: | moov |
Can Contain: | cmvd, dcom |
cmvd
cmvd | |
Function: | compressed moov atom data |
Contained In: | cmov |
Can Contain: | leaf atom |
co64
co64 | |
Function: | 64-bit chunk offsets |
Contained In: | stbl |
Can Contain: | leaf atom |
The co64 atom for a track lists the offsets for the various chunks that comprise a media track. These offsets are absolute offsets within the file starting from offset 0. Note that this atom allows for 64-bit offsets which is necessary for QuickTime files exceeding 4 gigabytes. Smaller files can save space in this table by using the stco atom which only allows for 32-bit offsets.
  QT Atom Preamble
  1 byte    version
  3 bytes   flags
  4 bytes   total entries in offset table (n)
  8 bytes   chunk offset 0
  8 bytes   chunk offset 1
  ..
  ..
  8 bytes   chunk offset n-1
ctts
ctts | |
Function: | 32 bits difference PTS-DTS |
Contained In: | stbl |
Can Contain: | tbd |
When storing a video stream with B-frames, the PTS (presentation timestamp) of a sample may be larger than its DTS (decoding timestamp). This happens because a B-frame requires frames that follow it in presentation order to be decoded first. The value stored in this atom is also called the composition time offset, as, for example, in the FLV format.
  QT Atom Preamble
  1 byte    version, 0
  3 bytes   flags
  4 bytes   entry count
  then, for each entry:
  4 bytes   sample count (number of consecutive samples having the following offset)
  4 bytes   offset
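A minimal Python sketch of reading this table follows. It expands the run-length pairs into one offset per sample, so that the PTS of sample i is its DTS plus `offsets[i]`. The function name `parse_ctts` is made up, the payload is assumed to be the bytes after the atom preamble, and offsets are read as unsigned 32-bit values, which is an assumption tied to version 0.

```python
import struct

def parse_ctts(payload):
    """Return a list with one composition offset per sample."""
    if payload[0] != 0:
        raise ValueError("only version 0 is handled in this sketch")
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    offsets = []
    for i in range(entry_count):
        # Each entry is (sample count, offset), 8 bytes in total.
        sample_count, offset = struct.unpack_from(">II", payload, 8 + 8 * i)
        offsets.extend([offset] * sample_count)
    return offsets
```

Expanding the table costs memory on long tracks; a decoder that only needs sequential access could keep the pairs and step through them instead.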
dcom
dcom | |
Function: | compressed moov compression method |
Contained In: | cmov |
Can Contain: | leaf atom |
edts
edts | |
Function: | edit samples |
Contained In: | trak |
Can Contain: | elst |
elst
elst | |
Function: | edit list |
Contained In: | edts |
Can Contain: | leaf atom |
The elst atom contains the edit list. The edit list contains information about the times and durations that pieces of a media track are to be presented during playback. There are many Quicktime file decoders that choose to ignore this atom. This is not a good idea. The edit list atom must be taken into account to guarantee proper A/V sync on certain files.
An edit list atom has the following structure:
  QuickTime Atom Preamble
  1 byte    version
  3 bytes   flags
  4 bytes   number of edit list entries
  <edit list entries>
An individual edit list entry is 12 bytes in size and has the following structure:
  bytes 0-3    edit duration (in global timescale units)
  bytes 4-7    edit media time (in trak timescale units)
  bytes 8-11   playback speed
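The edit list layout can be read with a few lines of Python. This sketch goes slightly beyond the text above and labels the extras as assumptions: the media time is read as a signed value (a value of -1 conventionally marks an empty edit), and the playback speed is treated as a 16.16 fixed-point number, so 0x00010000 means normal speed. The function name `parse_elst` is hypothetical, and the payload is the data after the atom preamble.

```python
import struct

def parse_elst(payload):
    """Return a list of (duration, media_time, speed) tuples."""
    if payload[0] != 0:
        raise ValueError("only the 12-byte entry layout is handled here")
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    edits = []
    for i in range(entry_count):
        # Assumption: media time and speed are signed 32-bit values.
        duration, media_time, rate = struct.unpack_from(
            ">Iii", payload, 8 + 12 * i)
        edits.append((duration, media_time, rate / 65536.0))
    return edits
```

Note that the duration is in global (movie) timescale units while the media time is in the trak's own timescale, so the two must be converted before they are compared.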
esds
esds | |
Function: | Elementary Stream Descriptors |
Contained In: | tbd |
Can Contain: | tbd |
fiel
fiel | |
Function: | field ordering http://developer.apple.com/quicktime/icefloe/dispatch019.html#fiel |
Contained In: | tbd |
Can Contain: | none |
free
free | |
Function: | free space |
Contained In: | top level |
Can Contain: | tbd |
ftyp
ftyp | |
Function: | file type |
Contained In: | top level |
Can Contain: | tbd |
gmhd
gmhd | |
Function: | generic media header |
Contained In: | minf |
Can Contain: | leaf atom |
hdlr
hdlr | |
Function: | handler type |
Contained In: | mdia or minf |
Can Contain: | leaf atom |
version flags component_type subtype manufacturer res_flags res_flags_mask name
The component_type is 'dhlr' for a data handler or 'mhlr' for a media handler. The subtype is a 4-letter code identifying the specific handler, for example 'vide' for video, 'soun' for sound, or 'alis' for a file alias. The hdlr atom under mdia seems more useful than the one under minf.
iods
iods | |
Function: | ???? |
Contained In: | tbd |
Can Contain: | tbd |
junk
junk | |
Function: | junk space |
Contained In: | top level |
Can Contain: | tbd |
mdat
mdat | |
Function: | media data |
Contained In: | top level |
Can Contain: | media data |
Media data contains the actual video and audio samples. The only way to interpret the raw samples is with the metadata from moov.
mdhd
mdhd | |
Function: | media header |
Contained In: | mdia |
Can Contain: | leaf atom |
The mdhd atom is the media header atom, containing some parameters for the current stream. There are at least two versions available of the mdhd atom (which can be found in the first byte after the preamble).
An mdhd version 0 atom has the following structure:
  QuickTime Atom Preamble
  1 byte    version
  3 bytes   flags
  4 bytes   creation time
  4 bytes   modification time
  4 bytes   time scale
  4 bytes   duration
  2 bytes   language
  2 bytes   quality
In version 1, a few fields grow from 4 to 8 bytes:
  QuickTime Atom Preamble
  1 byte    version
  3 bytes   flags
  8 bytes   creation time
  8 bytes   modification time
  4 bytes   time scale
  8 bytes   duration
  2 bytes   language
  2 bytes   quality
The language value is a three-letter ISO 639 language code packed into three 5-bit values (each of which is the ASCII value of the letter minus 0x60).
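Both mdhd versions and the packed language code can be handled with a short Python sketch. The names `decode_language` and `parse_mdhd` are made up for this example; the payload is assumed to be the bytes after the atom preamble, and the language field is assumed to hold a packed ISO code as described above.

```python
import struct

def decode_language(code):
    """Unpack three 5-bit values into a three-letter language code."""
    return "".join(
        chr(((code >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))

def parse_mdhd(payload):
    """Return (time_scale, duration, language) from an mdhd payload."""
    version = payload[0]
    if version == 0:
        # creation, modification, time scale, duration, language, quality
        _, _, time_scale, duration, language, _ = struct.unpack_from(
            ">IIIIHH", payload, 4)
    elif version == 1:
        # Same fields, with the two times and the duration widened to 64 bits.
        _, _, time_scale, duration, language, _ = struct.unpack_from(
            ">QQIQHH", payload, 4)
    else:
        raise ValueError("unknown mdhd version %d" % version)
    return time_scale, duration, decode_language(language)
```

As a worked case, 'e', 'n', 'g' become 5, 14 and 7 after subtracting 0x60, which pack to (5 << 10) | (14 << 5) | 7 = 0x15C7, so `decode_language(0x15C7)` gives "eng". Dividing the duration by the time scale gives the media length in seconds.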
mdia
mdia | |
Function: | media |
Contained In: | trak |
Can Contain: | hdlr, mdhd, minf |
minf
minf | |
Function: | media information |
Contained In: | mdia |
Can Contain: | gmhd, hdlr, smhd, stbl, vmhd |
moov
moov | |
Function: | movie header |
Contained In: | top level |
Can Contain: | cmov, mvhd, rmra, trak |
mvhd
mvhd | |
Function: | movie header |
Contained In: | moov |
Can Contain: | tbd |
pict
pict | |
Function: | ???? |
Contained In: | top level |
Can Contain: | tbd |
pnot
pnot | |
Function: | image preview |
Contained In: | top level |
Can Contain: | tbd |
This atom contains information about the preview image. This image or poster can be shown in, for example, a file browser.
rdrf
rdrf | |
Function: | data reference |
Contained In: | rmda |
Can Contain: | none allowed |
This atom defines where the reference movie can be found. The location can be given as an alias or as a URL.
Only one data reference atom is allowed per reference movie descriptor atom.
rmcd
rmcd | |
Function: | component detect |
Contained In: | rmda |
Can Contain: | none allowed |
This atom specifies the Quicktime components, such as codecs, that are required to play the movie, and can also specify the minimum version of each component.
Multiple component detect atoms are allowed in a reference movie descriptor atom and all minimum versions of all the components must be met.
rmcs
rmcs | |
Function: | CPU speed atom |
Contained In: | rmda |
Can Contain: | none allowed |
This atom describes the minimum amount of computer power required to play this reference movie. The power is given as a relative number on a scale that the documentation does not define. Examples from Apple's documentation include 100 for a 166 MHz Pentium and 500 for a 400 MHz PowerPC G4. The Apple documentation estimates that a gigahertz computer with a graphics accelerator might rate as high as 1000 and that future computers will allow even higher numbers. The number is probably not very useful in practice, since nothing indicates where multi-gigahertz machines, which do more work per clock cycle than previous generations, fall on this ill-defined scale.
If the CPU speed atom is given then a computer that does not meet the specifications will play the reference movie with the next lower power requirement that it meets.
An application should assume that the reference movie with the highest valued CPU speed will have the highest quality.
Only one CPU speed atom is allowed per reference movie descriptor atom.
rmda
rmda | |
Function: | reference movie descriptor atom |
Contained In: | rmra |
Can Contain: | rdrf, rmdr, rmcs, rmvc, rmcd, rmqu |
The reference movie descriptor contains atoms that describe where to find an alternate movie. It also may contain atoms for items such as system requirements and movie quality.
Multiple reference movie descriptors are usually found in a reference movie atom.
rmdr
rmdr | |
Function: | minimum data rate |
Contained In: | rmda |
Can Contain: | none allowed |
The data rate atom indicates the minimum connection speed needed to display this reference movie. The data rate is given in bits per second.
Only one data rate atom is allowed per container reference movie descriptor atom.
rmqu
rmqu | |
Function: | quality atom |
Contained In: | rmda |
Can Contain: | none allowed |
The quality atom acts as a tie breaker. It gives a relative quality number that helps the application decide which reference movie to play if all other quality-defining atoms are equal. A higher value indicates a higher quality.
Only one quality atom is allowed per container reference movie descriptor atom.
rmra
rmra | |
Function: | reference movie |
Contained In: | moov |
Can Contain: | rmda |
The reference movie atom typically contains data about alternate movies. Which alternate movie plays depends upon conditions such as the system requirements listed in the contained atoms. There can be only one reference movie atom per movie.
rmvc
rmvc | |
Function: | version check |
Contained In: | rmda |
Can Contain: | none allowed |
This atom indicates the minimum version of the software package, such as Quicktime or Quicktime VR, that is needed to play the reference movie. The application to match against is given as a Macintosh Gestalt type (e.g. 'qtim').
Multiple version check atoms are allowed per reference movie descriptor atom. The application must meet all minimum version check requirements to play the reference movie.
skip
skip | |
Function: | skipped space |
Contained In: | top level |
Can Contain: | tbd |
smhd
smhd | |
Function: | sound media header |
Contained In: | minf |
Can Contain: | leaf atom |
stbl
stbl | |
Function: | sample table |
Contained In: | minf |
Can Contain: | co64, ctts, stco, stsc, stsd, stss, stsz, stts |
stco
stco | |
Function: | sample table chunk offset |
Contained In: | stbl |
Can Contain: | leaf atom |
The stco atom for a track lists the offsets for the various chunks that comprise a media track. These offsets are absolute offsets within the file starting from offset 0. Note that this atom only allows for 32-bit offsets. Files requiring 64-bit offsets must use the co64 atom.
  QT Atom Preamble
  1 byte    version
  3 bytes   flags
  4 bytes   total entries in offset table (n)
  4 bytes   chunk offset 0
  4 bytes   chunk offset 1
  ..
  ..
  4 bytes   chunk offset n-1
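Since stco and co64 differ only in the width of each offset, one routine can read both. The following Python sketch assumes the payload is the data after the atom preamble; the function name `parse_chunk_offsets` is made up for this example.

```python
import struct

def parse_chunk_offsets(atom_type, payload):
    """Return the list of absolute chunk offsets from an stco or co64 payload."""
    # 'I' is an unsigned 32-bit field, 'Q' an unsigned 64-bit field.
    field = {b"stco": "I", b"co64": "Q"}[atom_type]
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    return list(struct.unpack_from(">%d%s" % (entry_count, field), payload, 8))
```

The offsets are measured from the start of the file, not from the start of the mdat atom, so they can be used directly as seek positions.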
stsc
stsc | |
Function: | sample table sample to chunk map |
Contained In: | stbl |
Can Contain: | leaf atom |
stsd
stsd | |
Function: | sample table sample description |
Contained In: | stbl |
Can Contain: | leaf atom |
stss
stss | |
Function: | sample table sync samples |
Contained In: | stbl |
Can Contain: | leaf atom |
The stss atom is the sample table sync sample atom. This atom contains a list of all samples in the track that are marked as sync samples. Sync samples are also known as keyframes or intra-coded frames. These samples indicate which video frames can be completely decoded on their own, without any information from other video frames, thus making the frames safe to jump to randomly.
An stss atom has the following structure:
  QuickTime Atom Preamble
  1 byte    version
  3 bytes   flags
  4 bytes   number of sync samples (n)
  4 bytes   sync sample 1
  4 bytes   sync sample 2
  ..
  ..
  4 bytes   sync sample n
Each entry in the sync sample table indicates the ID of a sample that is a sync sample. Note that this table begins numbering from 1 rather than 0.
As an example, if the stss atom of a video trak has 4 entries and those entries are 1, 9, 19, and 34, that means that video frames 1, 9, 19, and 34 (or 0, 8, 18, and 33 if your frames are numbered beginning at 0) are sync samples.
If a trak has no stss atom then all of the samples in the track are implicitly sync samples.
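The one-based numbering and the "no stss means everything is a sync sample" rule are easy to get wrong, so here is a minimal Python sketch of both. The function names are hypothetical, and the payload is assumed to be the data after the atom preamble.

```python
import struct

def parse_stss(payload):
    """Return the sync sample numbers exactly as stored (counting from 1)."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    return list(struct.unpack_from(">%dI" % entry_count, payload, 8))

def is_sync_sample(index0, sync_samples):
    """Tell whether the sample with zero-based index index0 is a sync sample.

    sync_samples is the list from parse_stss, or None if the trak has
    no stss atom, in which case every sample is a sync sample.
    """
    if sync_samples is None:
        return True
    return (index0 + 1) in sync_samples
```

With the example table 1, 9, 19, 34, `is_sync_sample(8, table)` is true (zero-based frame 8 is stored as 9) and `is_sync_sample(9, table)` is false. For long tracks, converting the list to a set or using a binary search would make the lookup faster.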
stsz
stsz | |
Function: | sample table sizes |
Contained In: | stbl |
Can Contain: | leaf atom |
The stsz atom is the sample table sample size atom. This atom contains the sizes of all of the samples in a trak.
  QuickTime Atom Preamble
  1 byte    version
  3 bytes   flags
  4 bytes   uniform size of each sample
  4 bytes   number of sample sizes (n)
  4 bytes   sample 0 size
  4 bytes   sample 1 size
  ..
  ..
  4 bytes   sample (n-1) size
The stsz atom can operate in one of two modes. First, it is possible that all of the samples in a trak have the same size. In this case, the uniform size field is set to the constant size. The number of sample sizes field is set to the total number of samples in the trak, and there is no sample size table following. This mode is commonly used in the stsz atom of audio traks. For example, in an audio file with a length of 2 seconds that has a sample rate of 22050 Hz, the uniform size field will be set to 1, indicating that the size of each sample is 1. The number of sample sizes field will be set to 44100 (22050 samples/sec * 2 sec = 44100 samples).
In the second mode, all of the samples are a different size (logically, this mode would have to be used even if all of the samples were the same size except for one). In this case, the uniform size field is set to 0. The number of sample sizes field contains the number of entries in the sample size table. Each entry in the sample size table contains the size of a sample in the trak.
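Both modes can be handled by one small routine. This Python sketch returns one size per sample in either case; the function name `parse_stsz` is made up for this example, and the payload is assumed to be the data after the atom preamble.

```python
import struct

def parse_stsz(payload):
    """Return a list with the size of every sample in the trak."""
    uniform_size, sample_count = struct.unpack_from(">II", payload, 4)
    if uniform_size != 0:
        # First mode: every sample has the same size and no table follows.
        return [uniform_size] * sample_count
    # Second mode: one 32-bit size per sample, starting after the two fields.
    return list(struct.unpack_from(">%dI" % sample_count, payload, 12))
```

For the audio example above, the result is a list of 44100 entries that are all 1. A decoder working with long uniform tracks may prefer to keep the pair (uniform size, count) instead of materializing the list.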
stts
stts | |
Function: | sample table time to sample map |
Contained In: | stbl |
Can Contain: | leaf atom |
The stts atom appears, for example, in tracks of type vide or alis. Its table is an array of (sample_count, sample_time_delta) pairs. Summing sample_count over all entries gives the total number of frames in a video track, and the total duration of the track in seconds should be:
duration = (sample_count1 * sample_time_delta1 + ... + sample_countN * sample_time_deltaN ) / timescale
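The same computation in Python, as a minimal sketch: the function name `stts_totals` is hypothetical, and the entries are assumed to have been read already as (sample_count, sample_time_delta) pairs, with the timescale taken from the track's mdhd atom.

```python
def stts_totals(entries, timescale):
    """Return (total_samples, duration_in_seconds) for a list of stts pairs."""
    total_samples = sum(count for count, _ in entries)
    total_ticks = sum(count * delta for count, delta in entries)
    return total_samples, total_ticks / timescale
```

For instance, 100 samples lasting 20 ticks each followed by 50 samples lasting 40 ticks each, at a timescale of 1000, give 150 samples and (100 * 20 + 50 * 40) / 1000 = 4.0 seconds.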
tkhd
tkhd | |
Function: | track header |
Contained In: | trak |
Can Contain: | leaf atom |
trak
trak | |
Function: | track |
Contained In: | moov |
Can Contain: | edts, mdia, tkhd |
uuid
uuid | |
Function: | Used by the PSP MSNV brand of MP4 |
Contained In: | tbd |
Can Contain: | tbd |
vmhd
vmhd | |
Function: | video media header |
Contained In: | minf |
Can Contain: | leaf atom |
wide
wide | |
Function: | skipped data |
Contained In: | top level |
Can Contain: | tbd |
wfex
wfex | |
Function: | wraps a Microsoft WAVEFORMATEX structure |
Contained In: | stsd |
Can Contain: | none? |
References
- Quicktime File Format Specification
- zlib compression/decompression library
- Advanced Video Coding (AVC) file format
- ISO 14496 specifications
- ISO 14496 draft specifications
- List of mp4/mov atoms
- MP4 layout
- Some different Quicktime codec samples.
- QuickTime for DirectShow
- Complete List of all known MP4 / QuickTime 'ftyp' designations
- QT Codec list