Apple QuickTime RLE

This page is based on the document 'Description of the Apple Quicktime Animation (RLE) Format' by Mike Melanson found here: http://multimedia.cx/qtrle.txt.

The Apple QuickTime Animation compression algorithm is a simple run-length encoding (RLE) scheme that can be used in Apple QuickTime files. Source data can be 1, 2, 4, 8, 16, 24, or 32 bits per pixel (bpp). The bit depth for a particular file is stored in the sample description (stsd) atom of the video track's sample table, inside the QuickTime moov atom, as is any palette data for the lower bit depths.

General Data Format

All multi-byte values are stored as big-endian numbers.

This is the layout of an Apple Animation chunk:

 4 bytes    chunk size
 2 bytes    header
 [8 bytes]  optional information, depending on header
 n bytes    compressed lines

The first 4 bytes hold the chunk length. This field appears to carry other, unknown flags as well, since at least one of the high bits is sometimes set.

If the overall length of the chunk is less than 8, treat the frame as a NOP, which means that the frame is the same as the one before it.

Next, there is a header of either 0x0000 or 0x0008. A header value with bit 3 set (header & 0x0008) indicates that 8 more bytes follow, giving the line at which decoding begins and the number of lines to update:

 2 bytes    starting line at which to begin updating frame
 2 bytes    unknown
 2 bytes    the number of lines to update
 2 bytes    unknown

If the header is 0x0000, then the decode begins from the first line and continues through the entire height of the image.
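
The header rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name and the frame_height parameter are hypothetical, and the 4-byte size field is skipped rather than interpreted because its flag bits are not understood.

```python
import struct

def parse_chunk_header(chunk, frame_height):
    """Return (first_line, line_count, data_offset), or None for a NOP frame.

    frame_height is supplied by the caller (hypothetical parameter); the
    chunk itself does not store the image height.
    """
    if len(chunk) < 8:
        return None  # NOP: the frame is identical to the previous one
    # Bytes 0-3 hold the chunk size plus unknown flag bits. This sketch
    # skips them and relies on len(chunk) instead (an assumption).
    (header,) = struct.unpack_from(">H", chunk, 4)
    offset = 6
    if header & 0x0008:
        if len(chunk) < offset + 8:
            raise ValueError("chunk too short for the optional header fields")
        # All fields are big-endian 16-bit values; two of them are unknown.
        first_line, _unknown_a, line_count, _unknown_b = struct.unpack_from(
            ">4H", chunk, offset)
        offset += 8
    else:
        first_line, line_count = 0, frame_height
    return first_line, line_count, offset
```

The returned data_offset is where the first compressed line (its skip byte) would begin.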

After the header come the individual RLE-compressed lines. Each compressed line consists of a skip code, followed by a series of RLE codes and pixel data:

 1 byte     skip code
 1 byte     RLE code
 n bytes    pixel data
 1 byte     RLE code
 n bytes    pixel data
 ..
 ..

Each line begins with a byte that defines the number of pixels to skip in the output line before new pixel data is written. The stored value is one more than the number of pixels to skip. For example, a skip byte of 15 means "skip 14 pixels", while a skip byte of 1 means "don't skip any pixels". If the skip byte is 0, then the frame decode is finished. The maximum skip byte value of 255 therefore allows at most 254 pixels to be skipped.

After the skip byte is the first RLE code, which is a single signed byte. The RLE code can have the following meanings:

  • equal to 0: There is another single-byte skip code in the stream. Again, the actual number of pixels to skip is 1 less than the skip code.
  • equal to -1: End of the RLE-compressed line
  • greater than 0: Run of pixel data is copied directly from the encoded stream to the output frame.
  • less than -1: Repeat pixel data -(RLE code) times.
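
As an illustration of the four cases above, the following Python sketch converts the raw byte to a signed value and names the operation. The function name and the returned labels are hypothetical and chosen only for this example.

```python
def classify_rle_code(code_byte):
    """Map a raw RLE code byte (0..255) to an (operation, count) pair."""
    # The RLE code is a signed byte, so 0x80..0xFF represent -128..-1.
    code = code_byte - 256 if code_byte >= 128 else code_byte
    if code == 0:
        return ("skip", None)         # the skip amount is in the next byte
    if code == -1:
        return ("end_of_line", None)
    if code > 0:
        return ("copy", code)         # run copied from the stream
    return ("repeat", -code)          # pixel data repeated -(code) times
```

For example, a code byte of 0xFE is -2 as a signed value and so denotes a repeat of 2.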

Exactly what happens during a run operation (code > 0) or a repeat operation (code < -1) depends on the color depth of the data. The specific operation of each of the 7 color depths is described next.

1-Bit RLE Data

The details for this variant are known and implemented in FFmpeg, but not yet documented.

2-Bit RLE Data

The details for this variant are known and implemented in FFmpeg, but not yet documented.

4-Bit RLE Data

Pixels are packed two to a byte and are copied or repeated in groups of 8 (4 bytes). Each pixel is a palette index (the palette is determined by the QuickTime file transporting the data). If (code > 0), copy (4 * code) bytes, that is (4 * code) pixel pairs, from the encoded stream to the output. The precise algorithm is:

 count = code * 4
 while (count--)
   get next byte from encoded stream
   output upper 4 bits of byte as next pixel
   output lower 4 bits of byte as next pixel

Thus, if code = 5, extract 20 bytes from the encoded stream and render 40 pixels to the output frame.

If (code < -1), extract the next 8 pixels (4 bytes) from the encoded stream and render the entire group -(code) times to the output frame. The pixels, numbered 0..7, are packed as:

 00001111 22223333 44445555 66667777
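
Both 4-bit operations can be sketched in Python as below. The helper names are hypothetical, pixels are returned as a list of palette indices rather than written into a frame, and no bounds checking is done on the stream.

```python
def unpack_nibbles(data):
    """Split each byte into two 4-bit palette indices, upper nibble first."""
    pixels = []
    for byte in data:
        pixels.append(byte >> 4)
        pixels.append(byte & 0x0F)
    return pixels

def run_4bit(stream, pos, code):
    """Return (pixels, new_pos) for a copy (code > 0) or repeat (code < -1)."""
    if code > 0:
        count = code * 4                      # number of bytes to copy
        return unpack_nibbles(stream[pos:pos + count]), pos + count
    if code < -1:
        group = unpack_nibbles(stream[pos:pos + 4])   # 8 pixels
        return group * -code, pos + 4
    raise ValueError("codes 0 and -1 are not pixel operations")
```

With code = 5 this consumes 20 bytes and yields 40 pixels, matching the example in the text.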

8-Bit RLE Data

Pixels are copied or repeated in groups of 4. Each pixel is a palette index (the palette is determined by the QuickTime file transporting the data). If (code > 0), copy (4 * code) pixels from the encoded stream to the output.

If (code < -1), extract the next 4 pixels from the encoded stream and render the entire group -(code) times to the output frame.
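
A minimal Python sketch of the 8-bit case, assuming one byte per palette index. The function name is hypothetical and the stream is not bounds-checked.

```python
def run_8bit(stream, pos, code):
    """Return (palette_indices, new_pos) for an 8-bit copy or repeat."""
    if code > 0:
        count = code * 4                  # 4 pixels (bytes) per code unit
        return stream[pos:pos + count], pos + count
    if code < -1:
        group = stream[pos:pos + 4]       # one group of 4 pixels
        return group * -code, pos + 4
    raise ValueError("codes 0 and -1 are not pixel operations")
```

A code of 2 therefore copies 8 bytes, while a code of -3 reads 4 bytes and emits 12 pixels.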

16-Bit RLE Data

Each pixel is represented by a 16-bit RGB value with 5 bits used for each of the red, green, and blue color components and 1 unused bit to round the value out to 16 bits:

 xrrrrrgg gggbbbbb

Pixel data is rendered to the output frame one pixel at a time. If (code > 0), copy the run of (code) pixels from the encoded stream to the output.

If (code < -1), unpack the next 16-bit RGB value from the encoded stream and render it to the output frame -(code) times.
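
The bit layout shown above can be unpacked as in the following Python sketch. The function names are hypothetical. The widening of 5-bit components to 8 bits by bit replication is a common convention and an assumption here, not something this format description specifies.

```python
def unpack_rgb555(value):
    """Split a 16-bit xrrrrrgg gggbbbbb value into 5-bit (r, g, b)."""
    r = (value >> 10) & 0x1F
    g = (value >> 5) & 0x1F
    b = value & 0x1F          # the top bit (x) is ignored
    return r, g, b

def expand_5_to_8(component):
    """Widen a 5-bit component to 8 bits by bit replication (assumption)."""
    return (component << 3) | (component >> 2)
```

Because multi-byte values are big-endian, a pixel at offset pos in the stream would be read with int.from_bytes(stream[pos:pos + 2], "big") before unpacking.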


24-Bit RLE Data

Each pixel is represented by a 24-bit RGB value with 8 bits (1 byte) used for each of the red, green, and blue color components:

 rrrrrrrr gggggggg bbbbbbbb

Pixel data is rendered to the output frame one pixel at a time. If (code > 0), copy the run of (code) pixels from the encoded stream to the output.

If (code < -1), unpack the next 24-bit RGB value from the encoded stream and render it to the output frame -(code) times.
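
Putting the skip byte, the RLE codes, and the 24-bit pixel operations together, one compressed line might be decoded as in the Python sketch below. The function name and the row layout (a bytearray of packed RGB triples) are assumptions made for this example, and the sketch performs no bounds checking on the stream or the row.

```python
def decode_line_24bit(stream, pos, row):
    """Decode one line into row (bytearray of RGB triples).

    Returns the new stream position, or None if the skip byte is 0
    (frame decode finished). No bounds checks are performed.
    """
    skip = stream[pos]
    pos += 1
    if skip == 0:
        return None
    x = skip - 1                               # pixels skipped so far
    while True:
        code = stream[pos]
        pos += 1
        if code >= 128:
            code -= 256                        # interpret as a signed byte
        if code == -1:                         # end of line
            return pos
        if code == 0:                          # another skip byte follows
            x += stream[pos] - 1
            pos += 1
        elif code > 0:                         # copy `code` pixels
            n = code * 3
            row[x * 3:x * 3 + n] = stream[pos:pos + n]
            pos += n
            x += code
        else:                                  # repeat one pixel -code times
            pixel = stream[pos:pos + 3]
            pos += 3
            count = -code
            row[x * 3:(x + count) * 3] = pixel * count
            x += count
```

For instance, the bytes 02 02 01 02 03 04 05 06 FE 09 09 09 FF skip one pixel, copy two pixels, repeat the pixel (9, 9, 9) twice, and end the line.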

32-Bit RLE Data

Each pixel is represented by a 32-bit ARGB value with 8 bits (1 byte) used for each of the alpha (?), red, green, and blue color components:

 aaaaaaaa rrrrrrrr gggggggg bbbbbbbb

Pixel data is rendered to the output frame one pixel at a time. If (code > 0), copy the run of (code) pixels from the encoded stream to the output.

If (code < -1), unpack the next 32-bit ARGB value from the encoded stream and render it to the output frame -(code) times.
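
The byte order shown above can be illustrated with a short Python sketch. The function names are hypothetical, and the first byte is treated as alpha only tentatively, in line with the uncertainty noted in the text.

```python
def unpack_argb(stream, pos):
    """Read one 32-bit pixel as (a, r, g, b); byte 0 is presumed alpha."""
    a, r, g, b = stream[pos:pos + 4]   # raises ValueError if truncated
    return a, r, g, b

def repeat_argb(stream, pos, code):
    """Expand a repeat operation (code < -1) into a list of pixels."""
    if code >= -1:
        raise ValueError("not a repeat code")
    return [unpack_argb(stream, pos)] * -code, pos + 4
```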
