QuickTime container

From MultimediaWiki
Revision as of 21:50, 13 January 2006 by Multimedia Mike (talk | contribs)
  • Extensions: mov, qt, mp4, m4v, m4a, m4p
  • Company: Apple

Technical Description

Introduction

The Apple Quicktime file format is an extremely well-defined file format. A little too well-defined, in fact. Some would even call it "over-engineered". The official Quicktime documentation is a magnificently detailed beast that gives equal time to explaining all parts of the spec, no matter how important or ignored a particular component may be in the actual implementation. The official spec can be a lot to digest at once and this document is intended to help interested programmers come up to speed on the Quicktime internals much more quickly.

This document emphasizes the components of the Quicktime file format that a programmer would need to know in order to write a general purpose Quicktime file decoder. This document also contains a discussion of decoding strategies.

Note that this document will probably never be complete since there is so much flexibility in the Quicktime format. But it is designed to cover the majority of QT files ever produced.

Byte Ordering

The first important fact to know about Quicktime files when writing a decoder is that all multi-byte numbers are big endian owing to Apple's Motorola heritage.
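
As a small illustration of the byte ordering, here is a minimal Python sketch that reads big-endian integers from a byte buffer. The helper names (read_u16, read_u32, read_u64) are made up for this example and are not part of any Quicktime API.

```python
import struct

# Hypothetical helpers: ">" selects big-endian byte order in the struct
# module, which is what every multi-byte number in a Quicktime file uses.

def read_u16(buf, offset=0):
    """Read an unsigned 16-bit big-endian integer."""
    return struct.unpack_from(">H", buf, offset)[0]

def read_u32(buf, offset=0):
    """Read an unsigned 32-bit big-endian integer."""
    return struct.unpack_from(">I", buf, offset)[0]

def read_u64(buf, offset=0):
    """Read an unsigned 64-bit big-endian integer."""
    return struct.unpack_from(">Q", buf, offset)[0]

# The bytes 00 00 01 00 are 0x00000100 when read most significant byte first.
sample = bytes([0x00, 0x00, 0x01, 0x00])
print(read_u32(sample))  # 256
```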

Atoms: The Fundamental Quicktime Building Blocks

Apple's Quicktime designers were thinking differently when they came up with the notion of an "atom" as "something that can contain other atoms". Atoms are the chunks of data that comprise a Quicktime file. Sometimes they contain data and sometimes they contain other atoms.

An atom consists of a size, a type, and a data payload. An atom is laid out as follows:

bytes 0-3    atom size (including 8-byte size and type preamble)
bytes 4-7    atom type
bytes 8..n   data

The 4 bytes allotted for the atom size field limit the maximum size of an atom to 4 GB. Quicktime also has a provision to allow atoms with 64-bit atom size fields by setting the size field to 1 and adding an 8-byte size field after the atom type:

bytes 0-3    always 0x00000001
bytes 4-7    atom type
bytes 8-15   atom size (including 16-byte size and type preamble)
bytes 16..n  data

This works as an escape value because a real atom size can never be 1: an atom always needs to be at least 8 bytes long to account for the preamble. Therefore, if the size field is 1, load the 64-bit atom size from just after the atom type field.
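
The two layouts above could be handled with something like the following Python sketch. The function name parse_atom_header and its return tuple are choices made for this example only. Any size value other than 1 that is smaller than the preamble is simply rejected here, since this document does not describe such cases.

```python
import struct

def parse_atom_header(buf, offset=0):
    """Return (atom_type, total_size, header_size) for the atom at offset.

    Hypothetical helper: total_size includes the preamble, and header_size
    is 8 for a normal atom or 16 for an atom with a 64-bit size field.
    """
    if len(buf) - offset < 8:
        raise ValueError("need at least 8 bytes for the atom preamble")

    # bytes 0-3: big-endian size, bytes 4-7: four-character atom type
    size, atom_type = struct.unpack_from(">I4s", buf, offset)
    header_size = 8

    if size == 1:
        # Size field of 1: the real size is a 64-bit number after the type.
        if len(buf) - offset < 16:
            raise ValueError("need 16 bytes for a 64-bit atom preamble")
        size = struct.unpack_from(">Q", buf, offset + 8)[0]
        header_size = 16

    if size < header_size:
        raise ValueError("atom size is smaller than its own preamble")

    return atom_type, size, header_size

# A 12-byte atom with a 32-bit size, and a 24-byte atom with a 64-bit size.
small = struct.pack(">I4s", 12, b"free") + b"\x00" * 4
large = struct.pack(">I4sQ", 1, b"mdat", 24) + b"\x00" * 8
print(parse_atom_header(small))  # (b'free', 12, 8)
print(parse_atom_header(large))  # (b'mdat', 24, 16)
```

The payload of an atom then starts at offset + header_size and the next atom starts at offset + total_size.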

Decompressing Compressed moov Atoms With zlib

The prospect of having to decode compressed moov atoms in Quicktime files seems to give many programmers pause. This need not be the case. When a compressed moov atom is detected, the free, open source zlib compression library can be called upon to do all the hard work.

In the abstract atom hierarchy, a compressed moov atom is laid out like this:

moov
  cmov
    dcom
    cmvd

On disk, a compressed moov atom will look like this:

bytes 0-3:   atom size (including 8-byte size and type preamble)
bytes 4-7:   atom type ('moov', movie header)
bytes 8-11:  atom size (including 8-byte size and type preamble)
bytes 12-15: atom type ('cmov', compressed movie header)
bytes 16-19: atom size (this should be 12 bytes)
bytes 20-23: atom type ('dcom', decompressor)
bytes 24-27: decompression library used (usually 'zlib')
bytes 28-31: atom size (including 8-byte size and type preamble)
bytes 32-35: atom type ('cmvd', compressed movie header data)
bytes 36-39: size of decompressed data
bytes 40-n:  compressed data

Note that this structure makes it theoretically possible to use other libraries to compress moov atoms, but zlib is most commonly used.

Here is a lazy algorithm for decompressing a compressed moov atom:

  1. check if bytes 12-15 contain 'cmov'; if yes:
  2. allocate a buffer for the decompressed moov atom, the size of which is specified by bytes 36-39
  3. initialize the zlib library, initialize a z_stream structure with pointers to the compressed and decompressed buffers, and all the other necessary variables
  4. call zlib to decompress the atom
  5. free the compressed moov atom, process the newly-decompressed moov atom (which will begin with a proper size and 'moov' type)
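
The steps above can be sketched in Python as follows. This is only an illustration under a few assumptions: the whole moov atom is already in memory as a bytes object starting at its size field, every atom in it uses a 32-bit size so the byte offsets listed earlier apply, and Python's zlib.decompress() stands in for the z_stream setup a C program would do. The function name decompress_moov is made up for this example.

```python
import struct
import zlib

def decompress_moov(moov):
    """Return an uncompressed moov atom given the bytes of a moov atom.

    Hypothetical helper: if the atom is not compressed it is returned
    unchanged; otherwise the cmvd payload is inflated with zlib.
    """
    # Step 1: bytes 12-15 hold 'cmov' when the moov atom is compressed.
    if moov[4:8] != b"moov" or moov[12:16] != b"cmov":
        return moov

    if moov[20:24] != b"dcom" or moov[32:36] != b"cmvd":
        raise ValueError("unexpected atom layout inside cmov")
    if moov[24:28] != b"zlib":
        raise ValueError("unsupported decompressor: %r" % moov[24:28])

    # The cmvd atom starts at byte 28; its size includes its 8-byte preamble.
    cmvd_size = struct.unpack_from(">I", moov, 28)[0]
    # Step 2: bytes 36-39 give the size of the decompressed data.
    expected_size = struct.unpack_from(">I", moov, 36)[0]
    compressed = moov[40:28 + cmvd_size]

    # Steps 3 and 4: zlib.decompress() allocates the output buffer itself
    # and raises zlib.error if the stream is damaged.
    data = zlib.decompress(compressed)

    # Step 5: sanity-check the result before handing it to the moov parser.
    if len(data) != expected_size:
        raise ValueError("decompressed size does not match the cmvd header")
    if data[4:8] != b"moov":
        raise ValueError("decompressed data is not a moov atom")
    return data
```

The returned bytes begin with a regular size and 'moov' type, so they can be passed to the same code that handles uncompressed moov atoms.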

As an aside, one might wonder about the rationale behind compressing moov atoms. The data inside QT files can reach gargantuan sizes, and the moov atom will be rather tiny in comparison. Why bother saving a few tens of kilobytes on the moov atom? One suggestion I have received is data integrity: a zlib stream carries a checksum of the uncompressed data (Adler-32, rather than a CRC). If an error occurs in the data stream while transmitting the compressed moov atom, a problem will be detected during decompression.

References

Quicktime File Format Specification: [1]