QuickTime container

From MultimediaWiki
  • Extensions: mov, qt, mp4, m4v, m4a, m4p, m4b
  • Company: Apple


The following sections list FOURCCs known to appear in Apple QuickTime files. Note that sometimes the FOURCC is only 3 characters and there is a space (ASCII 0x20) to round out the full 4 characters.



Microsoft ID FOURCCs

These FOURCCs indicate that the audio information stsd atom also transports a Microsoft-style WAVEFORMATEX header.

Technical Description

The Apple QuickTime file format is an extremely well-defined file format. A little too well-defined, in fact. Some would even call it "over-engineered". The official QuickTime documentation is a magnificently detailed beast that gives equal time to explaining all parts of the spec, no matter how important or ignored a particular component may be in actual implementations. The official spec can be a lot to digest at once, and this document is intended to help interested programmers come up to speed on the QuickTime internals much more quickly.

This document emphasizes the components of the QuickTime file format that a programmer would need to know in order to write a general-purpose QuickTime file decoder. It also contains a discussion of decoding strategies.

Note that this document will probably never be complete since there is so much flexibility in the QuickTime format. It is, however, designed to cover the majority of QuickTime files ever produced.

Byte Ordering

The first important fact to know about QuickTime files when writing a decoder is that all multi-byte numbers are big-endian, owing to Apple's Motorola heritage.
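
As a small illustration of the byte order, the following sketch decodes one 32-bit field with Python's standard struct module. The byte values are made up for the example; they are not taken from a real file.

```python
import struct

# Four bytes as they might appear on disk: the big-endian encoding of 0x00000118.
raw = bytes([0x00, 0x00, 0x01, 0x18])

# ">" selects big-endian byte order; "I" is an unsigned 32-bit integer.
(value,) = struct.unpack(">I", raw)

# int.from_bytes decodes the same bytes without a format string.
same_value = int.from_bytes(raw, byteorder="big")

# Decoding the same bytes as little-endian yields a different, wrong number.
(wrong,) = struct.unpack("<I", raw)

print(value, same_value, hex(wrong))
```

On a little-endian host such as x86, reading these fields with a native-order integer load would give the wrong value, so an explicit big-endian read is the safer habit.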

Atoms: The Fundamental QuickTime Building Blocks

Apple's QuickTime designers were thinking differently when they came up with the notion of an "atom" as "something that can contain other atoms". Atoms are the chunks of data that comprise a QuickTime file. Sometimes they contain data and sometimes they contain other atoms.

An atom consists of a size, a type, and a data payload. An atom is laid out as follows:

bytes 0-3    atom size (including 8-byte size and type preamble)
bytes 4-7    atom type
bytes 8..n   data

The 4 bytes allotted for the atom size field limit the maximum size of an atom to 4 GB. QuickTime also has a provision to allow atoms with 64-bit atom size fields by setting the size field to 1 and adding an 8-byte size field after the atom type:

bytes 0-3    always 0x00000001
bytes 4-7    atom type
bytes 8-15   atom size (including 16-byte size and type preamble)
bytes 16..n  data

This works as a flag because a size of 1 can never be a real atom size: an atom always needs to be at least 8 bytes in length to account for the preamble. Therefore, if the size field is 1, load the 64-bit atom size from just after the atom type field.
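
The two header layouts can be read with one small routine. This is a minimal sketch, assuming the atom is available in memory as a bytes object; the function name `read_atom_header` and its return shape are choices made for this example, not part of any QuickTime API.

```python
import struct

def read_atom_header(buf, offset=0):
    """Return (atom_type, total_size, header_size) for the atom at offset."""
    # Bytes 0-3 are the 32-bit size, bytes 4-7 the four-character type.
    size, atom_type = struct.unpack_from(">I4s", buf, offset)
    header_size = 8
    if size == 1:
        # Extended form: the real 64-bit size follows the type field.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_size = 16
    return atom_type, size, header_size

# A normal 16-byte 'free' atom carrying 8 payload bytes.
small = struct.pack(">I4s", 16, b"free") + bytes(8)
# The same payload written with the 64-bit size form (24 bytes in total).
large = struct.pack(">I4sQ", 1, b"free", 24) + bytes(8)

print(read_atom_header(small))
print(read_atom_header(large))
```

Returning the header size alongside the total size lets the caller compute the payload length without caring which form was used.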

Known Top-Level Atoms

These are all of the QuickTime atoms that are known to be legal top-level atoms.

  • 'moov'
  • 'mdat'
  • 'free'
  • 'junk'
  • 'pnot'
  • 'skip'
  • 'wide'
  • 'pict'
  • 'ftyp'

QuickTime File Types

Somewhere along the line the 'ftyp' atom was added as a possible top-level QuickTime atom. It is supposed to appear first in a QuickTime file. These are the known ftyp values:

  • 'qt '
  • 'isom'
  • 'mp41'
  • 'mp42'
  • '3gp1'
  • '3gp2'
  • '3gp3'
  • '3gp4'
  • '3gp5'
  • '3g2a'
  • 'mmp4': Mobile MPEG-4
  • 'M4A '
  • 'M4P '
  • 'mjp2': Motion JPEG 2000

General File Organization

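In the abstract atom hierarchy, this is how a QuickTime file is laid out:

 moov
   mvhd
   trak
     tkhd
     edts
       elst
     mdia
       mdhd
       minf
         vmhd / smhd / gmhd
         stbl
           stsd
           stts
           stss
           stsc
           stsz
           stco / co64
 mdat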

Note that this is not an exhaustive tree of all possible or known atoms; these are only the atoms that have been empirically determined as "interesting" for the purposes of writing a general-purpose decoder that can handle most QuickTime files.

All QuickTime files need to have a moov atom and an mdat atom at the top level. There are other top-level atoms as well, which generally are not interesting and can safely be skipped if encountered. The moov atom contains instructions for playing the data in the file. The mdat atom contains the data that will be played.
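
Locating moov and mdat while skipping everything else could look like the sketch below. It assumes the whole file is held in a bytes object and simply stops at the first atom whose size does not make sense. The helper names (`iter_top_level_atoms`, `find_moov_and_mdat`, `make_atom`) are hypothetical, and `make_atom` exists only to build a tiny synthetic file for the demonstration.

```python
import struct

def iter_top_level_atoms(buf):
    """Yield (atom_type, payload_offset, payload_size) for each top-level atom."""
    offset = 0
    while offset + 8 <= len(buf):
        size, atom_type = struct.unpack_from(">I4s", buf, offset)
        header = 8
        if size == 1:
            # 64-bit size form: the real size follows the type field.
            if offset + 16 > len(buf):
                break
            (size,) = struct.unpack_from(">Q", buf, offset + 8)
            header = 16
        if size < header or offset + size > len(buf):
            break  # malformed or truncated atom; this sketch just stops
        yield atom_type, offset + header, size - header
        offset += size

def find_moov_and_mdat(buf):
    """Map b'moov' and b'mdat' to (payload_offset, payload_size), skipping the rest."""
    found = {}
    for atom_type, payload_offset, payload_size in iter_top_level_atoms(buf):
        if atom_type in (b"moov", b"mdat"):
            found[atom_type] = (payload_offset, payload_size)
    return found

def make_atom(atom_type, payload=b""):
    """Build a synthetic atom for the demonstration below."""
    return struct.pack(">I4s", 8 + len(payload), atom_type) + payload

sample = (make_atom(b"ftyp", b"qt  ") + make_atom(b"free")
          + make_atom(b"moov", bytes(10)) + make_atom(b"mdat", bytes(5)))
print(find_moov_and_mdat(sample))
```

A real decoder would read headers from the file and seek past the payloads rather than load everything into memory, but the skipping logic is the same.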

Meta Data

A 'meta' atom contains atoms that hold human-readable text with meta information about the file. The types of these atoms are 4 bytes long, as usual, but the first byte has the value 0xA9. The remaining 3 characters can be:

  • nam: Name of song or video
  • cpy: Copyright information
  • des: File description
  • cmt: General comment
  • alb: Album name
  • gen: Genre
  • ART: Artist name
  • too: Encoding tool (name of the program that wrote the file)
  • wrt: Writer or composer
  • day: Creation or release date
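
Because the first byte is not printable ASCII, these atom types are easiest to handle as raw bytes rather than text. The helpers below are a small sketch with hypothetical names, showing how such a type can be built and recognized.

```python
def metadata_atom_type(suffix):
    """Build the 4-byte atom type for a 3-character suffix such as 'nam'."""
    if len(suffix) != 3:
        raise ValueError("suffix must be exactly 3 characters")
    return b"\xa9" + suffix.encode("ascii")

def is_metadata_atom_type(atom_type):
    """True if a 4-byte atom type starts with the 0xA9 marker byte."""
    return len(atom_type) == 4 and atom_type[0] == 0xA9

print(metadata_atom_type("nam"))
print(is_metadata_atom_type(b"\xa9ART"), is_metadata_atom_type(b"moov"))
```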

Decompressing Compressed moov Atoms With zlib

The prospect of having to decode compressed moov atoms in QuickTime files seems to give many programmers pause. This need not be the case. When a compressed moov atom is detected, the free, open source zlib compression library can be called upon to do all the hard work.

In the abstract atom hierarchy, a compressed moov atom is laid out like this:

 moov
   cmov
     dcom
     cmvd

On disk, a compressed moov atom will look like this:

bytes 0-3:   atom size (including 8-byte size and type preamble)
bytes 4-7:   atom type ('moov', movie header)
bytes 8-11:  atom size (including 8-byte size and type preamble)
bytes 12-15: atom type ('cmov', compressed movie header)
bytes 16-19: atom size (this should be 12 bytes)
bytes 20-23: atom type ('dcom', decompressor)
bytes 24-27: decompression library used (usually 'zlib')
bytes 28-31: atom size (including 8-byte size and type preamble)
bytes 32-35: atom type ('cmvd', compressed movie header data)
bytes 36-39: size of decompressed data
bytes 40-n:  compressed data

Note that this structure makes it theoretically possible to use other libraries to compress moov atoms, but zlib is most commonly used.

Here is a lazy algorithm for decompressing a compressed moov atom:

  1. check if bytes 12-15 contain 'cmov'; if yes:
  2. allocate a buffer for the decompressed moov atom, the size of which is specified by bytes 36-39
  3. initialize the zlib library, initialize a z_stream structure with pointers to the compressed and decompressed buffers, and all the other necessary variables
  4. call zlib to decompress the atom
  5. free the compressed moov atom, process the newly-decompressed moov atom (which will begin with a proper size and 'moov' type)
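
The steps above can be sketched in Python, whose standard zlib module wraps the same library and hides the z_stream bookkeeping. This is a minimal sketch, assuming the whole moov atom is already in memory as a bytes object and follows the fixed layout shown earlier. The names `decompress_moov` and `build_compressed_moov` are hypothetical, and the builder exists only so the round trip can be demonstrated.

```python
import struct
import zlib

def decompress_moov(moov_atom):
    """Return the decompressed moov atom, or the input unchanged if it is not compressed."""
    if moov_atom[12:16] != b"cmov":
        return moov_atom
    if moov_atom[20:24] != b"dcom" or moov_atom[24:28] != b"zlib":
        raise ValueError("unsupported decompressor")
    if moov_atom[32:36] != b"cmvd":
        raise ValueError("cmvd atom not found at the expected offset")
    # The cmvd atom starts at byte 28; its size includes its 8-byte preamble.
    (cmvd_size,) = struct.unpack_from(">I", moov_atom, 28)
    (expected_size,) = struct.unpack_from(">I", moov_atom, 36)
    compressed = moov_atom[40:28 + cmvd_size]
    decompressed = zlib.decompress(compressed)
    if len(decompressed) != expected_size:
        raise ValueError("decompressed size does not match the cmvd header")
    return decompressed

def build_compressed_moov(plain_moov):
    """Wrap a plain moov atom in moov/cmov/dcom/cmvd (demo helper only)."""
    data = zlib.compress(plain_moov)
    cmvd = struct.pack(">I4sI", 12 + len(data), b"cmvd", len(plain_moov)) + data
    dcom = struct.pack(">I4s4s", 12, b"dcom", b"zlib")
    cmov = struct.pack(">I4s", 8 + len(dcom) + len(cmvd), b"cmov") + dcom + cmvd
    return struct.pack(">I4s", 8 + len(cmov), b"moov") + cmov

plain = struct.pack(">I4s", 16, b"moov") + bytes(8)
wrapped = build_compressed_moov(plain)
print(decompress_moov(wrapped) == plain)
```

A C implementation would follow the same offsets and call inflateInit, inflate, and inflateEnd on a z_stream, as described in step 3.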

As an aside, one might wonder about the rationale behind compressing moov atoms. The data inside QT files can reach gargantuan sizes, and the moov atom will be rather tiny in comparison. Why bother saving a few tens of kilobytes on the moov atom? One suggestion I have received is data integrity: the zlib stream format carries an Adler-32 checksum of the uncompressed data. If an error occurs in the data stream while transmitting the compressed moov atom, the problem will be detected during decompression.

Audio Handling

Constant Bitrate Audio

Constant Bitrate Audio Width Header

Variable Bitrate Audio

Palette Handling

QuickTime Atom Reference


cmov

Function: compressed moov atom
Contained In: moov
Can Contain: cmvd, dcom


cmvd

Function: compressed moov atom data
Contained In: cmov
Can Contain: leaf atom


co64

Function: 64-bit chunk offsets
Contained In: stbl
Can Contain: leaf atom

The co64 atom for a track lists the offsets for the various chunks that comprise a media track. These offsets are absolute offsets within the file starting from offset 0. Note that this atom allows for 64-bit offsets which is necessary for QuickTime files exceeding 4 gigabytes. Smaller files can save space in this table by using the stco atom which only allows for 32-bit offsets.

 QT Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   total entries in offset table (n)
 8 bytes   chunk offset 0
 8 bytes   chunk offset 1
 8 bytes   chunk offset n-1
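
Reading this table is a matter of unpacking the count and then that many 64-bit values. The sketch below assumes the complete atom, including its 8-byte preamble, is in a bytes object; `parse_co64` is a hypothetical name, and the sample atom is synthetic.

```python
import struct

def parse_co64(atom):
    """Return the list of 64-bit chunk offsets from a complete co64 atom."""
    # Preamble (size, type), then version+flags packed in 4 bytes, then the entry count.
    size, atom_type, version_flags, count = struct.unpack_from(">I4sII", atom, 0)
    if atom_type != b"co64":
        raise ValueError("not a co64 atom")
    # The offset table starts right after the 16 bytes read above.
    return list(struct.unpack_from(f">{count}Q", atom, 16))

# Synthetic atom with two offsets, one of them beyond the 4 GB mark.
demo = struct.pack(">I4sII2Q", 16 + 2 * 8, b"co64", 0, 2, 40, 5_000_000_000)
print(parse_co64(demo))
```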


ctts

Function: ????
Contained In: stbl
Can Contain: tbd


dcom

Function: compressed moov compression method
Contained In: cmov
Can Contain: leaf atom


edts

Function: edit samples
Contained In: trak
Can Contain: elst


elst

Function: edit list
Contained In: edts
Can Contain: leaf atom

The elst atom contains the edit list. The edit list contains information about the times and durations that pieces of a media track are to be presented during playback. Many QuickTime file decoders choose to ignore this atom. This is not a good idea. The edit list atom must be taken into account to guarantee proper A/V sync on certain files.

An edit list atom has the following structure:

 QuickTime Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   number of edit list entries
  <edit list entries>

An individual edit list entry is 12 bytes in size and has the following structure:

 bytes 0-3   edit duration (in global timescale units)
 bytes 4-7   edit media time (in trak timescale units)
 bytes 8-11  playback speed
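
The structure above can be unpacked as follows. This is a sketch with two assumptions that come from Apple's specification rather than from the text above: the playback speed is treated as a 16.16 fixed-point number (0x00010000 is normal speed), and the media time is read as signed, with -1 marking an empty edit. `parse_elst` is a hypothetical name and the sample atom is synthetic.

```python
import struct

def parse_elst(atom):
    """Return a list of (duration, media_time, speed) tuples from a complete elst atom."""
    size, atom_type, version_flags, count = struct.unpack_from(">I4sII", atom, 0)
    if atom_type != b"elst":
        raise ValueError("not an elst atom")
    entries = []
    for i in range(count):
        # Each entry is 12 bytes and the first one starts at byte 16 of the atom.
        duration, media_time, speed = struct.unpack_from(">Iii", atom, 16 + 12 * i)
        # Assumption: speed is 16.16 fixed point, so divide by 65536.
        entries.append((duration, media_time, speed / 65536.0))
    return entries

demo = (struct.pack(">I4sII", 16 + 2 * 12, b"elst", 0, 2)
        + struct.pack(">Iii", 600, 0, 0x00010000)
        + struct.pack(">Iii", 300, -1, 0x00010000))
print(parse_elst(demo))
```

Note that the duration and the media time use different timescales, so each has to be divided by its own timescale before the two can be compared in seconds.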


Function: ????
Contained In: tbd
Can Contain: tbd


free

Function: free space
Contained In: top level
Can Contain: tbd


ftyp

Function: file type
Contained In: top level
Can Contain: tbd


gmhd

Function: generic media header
Contained In: minf
Can Contain: leaf atom


Function: ????
Contained In: tbd
Can Contain: tbd


junk

Function: junk space
Contained In: top level
Can Contain: tbd


mdat

Function: media data
Contained In: top level
Can Contain: media data


mdhd

Function: media header
Contained In: mdia
Can Contain: leaf atom


mdia

Function: media
Contained In: trak
Can Contain: mdhd, minf


minf

Function: media information
Contained In: mdia
Can Contain: gmhd, smhd, stbl, vmhd


moov

Function: movie
Contained In: top level
Can Contain: cmov, mvhd, rmra, trak


mvhd

Function: movie header
Contained In: moov
Can Contain: tbd


pict

Function: ????
Contained In: top level
Can Contain: tbd


pnot

Function: ????
Contained In: top level
Can Contain: tbd


rdrf

Function: ????
Contained In: rmda
Can Contain: leaf atom


rmda

Function: reference movie descriptor atom
Contained In: rmra
Can Contain: rdrf, rmdr, rmvc


rmdr

Function: ????
Contained In: rmda
Can Contain: leaf atom


rmra

Function: reference movie
Contained In: moov
Can Contain: rmda

The reference movie atom typically contains data about alternate movies. Which alternate movie plays depends upon conditions such as the system requirements listed in the contained atoms. There can be only one reference movie atom per movie.


rmvc

Function: ????
Contained In: rmda
Can Contain: leaf atom


skip

Function: skipped space
Contained In: top level
Can Contain: tbd


smhd

Function: sound media header
Contained In: minf
Can Contain: leaf atom


stbl

Function: sample table
Contained In: minf
Can Contain: co64, ctts, stco, stsc, stsd, stss, stsz, stts


stco

Function: sample table chunk offset
Contained In: stbl
Can Contain: leaf atom

The stco atom for a track lists the offsets for the various chunks that comprise a media track. These offsets are absolute offsets within the file starting from offset 0. Note that this atom only allows for 32-bit offsets. Files requiring 64-bit offsets must use the co64 atom.

 QT Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   total entries in offset table (n)
 4 bytes   chunk offset 0
 4 bytes   chunk offset 1
 4 bytes   chunk offset n-1
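
The choice between the two offset atoms can be made mechanically when writing a file. The sketch below parses an stco atom and also shows a writer that picks stco when every offset fits in 32 bits and co64 otherwise. Both function names are hypothetical, and the atoms are assumed to be complete bytes objects including the 8-byte preamble.

```python
import struct

def parse_stco(atom):
    """Return the list of 32-bit chunk offsets from a complete stco atom."""
    size, atom_type, version_flags, count = struct.unpack_from(">I4sII", atom, 0)
    if atom_type != b"stco":
        raise ValueError("not an stco atom")
    return list(struct.unpack_from(f">{count}I", atom, 16))

def build_chunk_offset_atom(offsets):
    """Serialize offsets as stco when they all fit in 32 bits, otherwise as co64."""
    if all(offset <= 0xFFFFFFFF for offset in offsets):
        atom_type, code, width = b"stco", "I", 4
    else:
        atom_type, code, width = b"co64", "Q", 8
    size = 16 + width * len(offsets)
    # Version and flags are written as zero in this sketch.
    return struct.pack(f">I4sII{len(offsets)}{code}",
                       size, atom_type, 0, len(offsets), *offsets)

small_file = build_chunk_offset_atom([48, 4096])
large_file = build_chunk_offset_atom([48, 2**32])
print(small_file[4:8], parse_stco(small_file), large_file[4:8])
```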


stsc

Function: sample table sample to chunk map
Contained In: stbl
Can Contain: leaf atom


stsd

Function: sample table sample description
Contained In: stbl
Can Contain: leaf atom


stss

Function: sample table sync samples
Contained In: stbl
Can Contain: leaf atom

The stss atom is the sample table sync sample atom. This atom contains a list of all samples in the track that are marked as sync samples. Sync samples are also known as keyframes or intra-coded frames. These samples indicate which video frames can be completely decoded on their own, without any information from other video frames, thus making the frames safe to jump to randomly.

An stss atom has the following structure:

 QuickTime Atom Preamble
 1 byte    version
 3 bytes   flags
 4 bytes   number of sync samples (n)
 4 bytes   sync sample 1
 4 bytes   sync sample 2
 4 bytes   sync sample n

Each entry in the sync sample table indicates the ID of a sample that is a sync sample. Note that this table begins numbering from 1 rather than 0.

As an example, if the stss atom of a video trak has 4 entries and those entries are 1, 9, 19, and 34, that means that video frames 1, 9, 19, and 34 (or 0, 8, 18, and 33 if your frames are numbered beginning at 0) are sync samples.

If a trak has no stss atom then all of the samples in the track are implicitly sync samples.
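
The numbering rule and the "no stss atom" case can be captured in a few lines. This is a sketch with hypothetical function names; it assumes a complete stss atom in a bytes object and uses None to represent a trak that has no stss atom. The sample values are the ones from the example above.

```python
import struct

def parse_stss(atom):
    """Return the set of 1-based sync sample numbers from a complete stss atom."""
    size, atom_type, version_flags, count = struct.unpack_from(">I4sII", atom, 0)
    if atom_type != b"stss":
        raise ValueError("not an stss atom")
    return set(struct.unpack_from(f">{count}I", atom, 16))

def is_sync_sample(frame_index, sync_samples):
    """frame_index counts from 0; sync_samples is the 1-based set, or None if no stss."""
    if sync_samples is None:
        return True  # no stss atom: every sample is a sync sample
    return frame_index + 1 in sync_samples

demo = struct.pack(">I4sII4I", 16 + 4 * 4, b"stss", 0, 4, 1, 9, 19, 34)
sync = parse_stss(demo)
print(sorted(sync), is_sync_sample(8, sync), is_sync_sample(9, sync))
```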


stsz

Function: sample table sizes
Contained In: stbl
Can Contain: leaf atom

The stsz atom is the sample table sample size atom. This atom contains the sizes of all of the samples in a trak.

 QuickTime Atom Preamble
 1 byte     version
 3 bytes    flags
 4 bytes    uniform size of each sample
 4 bytes    number of sample sizes (n)
 4 bytes    sample 0 size
 4 bytes    sample 1 size
 4 bytes    sample (n-1) size

The stsz atom can operate in one of two modes. First, it is possible that all of the samples in a trak have the same size. In this case, the uniform size field is set to the constant size. The number of sample sizes field is set to the total number of samples in the trak, and there is no sample size table following. This mode is commonly used in the stsz atom of audio traks. For example, in an audio file with a length of 2 seconds and a sample rate of 22050 Hz, the uniform size field will be set to 1, indicating that the size of each sample is 1. The number of sample sizes field will be set to 44100 (22050 samples/sec * 2 sec = 44100 samples).

In the second mode, the samples vary in size (logically, this mode has to be used even if all of the samples are the same size except for one). In this case, the uniform size field is set to 0. The number of sample sizes field contains the number of entries in the sample size table. Each entry in the sample size table contains the size of a sample in the trak.
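
Both modes can be handled by checking the uniform size field first. The sketch below assumes a complete stsz atom in a bytes object and returns one size per sample in either mode; `parse_stsz` is a hypothetical name and both sample atoms are synthetic, the first one mirroring the 2-second audio example.

```python
import struct

def parse_stsz(atom):
    """Return a list with the size of every sample described by a complete stsz atom."""
    size, atom_type, version_flags, uniform_size, count = struct.unpack_from(
        ">I4sIII", atom, 0)
    if atom_type != b"stsz":
        raise ValueError("not an stsz atom")
    if uniform_size != 0:
        # First mode: every sample has the same size and no table follows.
        return [uniform_size] * count
    # Second mode: a table of per-sample sizes starts at byte 20.
    return list(struct.unpack_from(f">{count}I", atom, 20))

uniform = struct.pack(">I4sIII", 20, b"stsz", 0, 1, 44100)
table = struct.pack(">I4sIII3I", 20 + 3 * 4, b"stsz", 0, 0, 3, 100, 250, 75)
print(len(parse_stsz(uniform)), parse_stsz(table))
```

For long uniform tracks a decoder would normally keep just the uniform size and the count instead of expanding a list; the expansion here only keeps the two modes behind one return type.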


stts

Function: sample table time to sample map
Contained In: stbl
Can Contain: leaf atom


tkhd

Function: track header
Contained In: trak
Can Contain: leaf atom


trak

Function: track
Contained In: moov
Can Contain: edts, mdia, tkhd


vmhd

Function: video media header
Contained In: minf
Can Contain: leaf atom


wide

Function: skipped data
Contained In: top level
Can Contain: tbd
