Duck TrueMotion 1

From MultimediaWiki

Duck TrueMotion 1 (TM1) is the first video codec developed by The Duck Corporation. It uses differential pulse code modulation and interframe differencing. The primary application for which Duck TrueMotion 1 was developed is gaming software, typically PC or Sega Saturn titles.

Underlying Concepts

TM1 operates on bidirectional delta quantization. The mathematical premise of delta coding is simple addition. For example, take the following sequence of numbers:

 82 84 81 80 86 88 85

All of the numbers are reasonably large (on a scale of 0..255 in this example). However, they are quite similar to each other. Using delta coding, only the differences between successive numbers are stored:

 82 +2 -3 -1 +6 +2 -3

The first number is still large but since the remaining numbers are clustered close to each other, the deltas are relatively small. Thus, the deltas require less information to encode.
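
The round trip above can be sketched in a few lines of C. This is only an illustration of plain one-dimensional delta coding, not TM1 itself; the function names delta_encode and delta_decode are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Keep the first value as-is; replace every later value with its
 * difference from the previous one. in and out must not overlap. */
static void delta_encode(const int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (i == 0) ? in[0] : in[i] - in[i - 1];
}

/* Rebuild the original values by keeping a running sum of the deltas. */
static void delta_decode(const int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}
```

Encoding the sequence 82 84 81 80 86 88 85 with delta_encode yields the deltas listed above, and delta_decode restores the original numbers.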

TM1 takes this concept and extends it to 2 dimensions. A particular pixel is assumed to have a pixel directly above it and a pixel directly to the left (if either of these pixels does not exist in the frame, e.g., for the top-left corner pixel, 0 is used in place of the nonexistent pixel). The current pixel is decoded as the sum of the up and left pixels, plus a delta from the encoded video bitstream.

 encoded video bitstream: ...D5 D6 D7 D8...
 
 decoded video frame:
   A B C D
   E F G H
   

In this example, the encoded video bitstream is sitting at delta D5 when it is time to decode pixel E. A is the pixel above E. There is no pixel to the left of E, so 0 is used as the left value. Thus:

 E = A + 0 + D5

In the case of pixel F, both the up and left pixel values (called the vertical and horizontal predictors, respectively) are available:

 F = B + E + D6
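
A minimal sketch of this simplified model follows. It implements only the rule stated here (pixel = up + left + delta, with 0 for missing neighbors) and assumes one delta per pixel in raster order; the function name predict_2d_decode and the use of plain int samples are choices made for the example, and the real decoder differs as described in the rest of this page.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model only: pixel = up + left + delta, where a neighbor
 * outside the frame counts as 0. One delta is consumed per pixel,
 * in raster order. frame must hold width * height ints.
 */
static void predict_2d_decode(const int *deltas, int *frame,
                              size_t width, size_t height)
{
    assert(deltas != NULL && frame != NULL);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            int up   = (y > 0) ? frame[(y - 1) * width + x] : 0;
            int left = (x > 0) ? frame[y * width + x - 1] : 0;
            frame[y * width + x] = up + left + deltas[y * width + x];
        }
    }
}
```

For a 4x2 frame, element 4 of the output corresponds to pixel E and element 5 to pixel F in the diagram above.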

That is the general idea behind decoding TM1 data. The actual decoding algorithm is more involved. The TM1 bytestream actually contains a series of indices that point into tables with the delta values to be applied to the vertical and horizontal predictor pixels. These tables only specify small deltas to be added to pixel predictors. When a larger delta is needed because the delta between 2 numbers is too large, then a special bytestream code indicates that the next delta is to be multiplied by 5 before it is applied.

TM1 operates on 4x4 macroblocks of pixels. Each block in a frame can be broken into 4 2x2 blocks, 2 halves (either 4x2 or 2x4), or encoded as a 4x4 block. The block type is encoded in the frame header.

While the TM1 algorithm operates on RGB colorspace data at the input and output level, it borrows some ideas from YUV colorspaces. For more information on RGB and YUV colorspaces, see the References section.

TM1 uses a modified colorspace that embodies luminance (Y) and chrominance (C) information encoded as RGB deltas. Since Y information is more important to the human eye than C information, the Y data must be updated more frequently than the C data (i.e., more Y predictors than C predictors are applied to the image). For every pixel within a given block in a macroblock, a Y predictor must be applied. However, only one C predictor is applied for each block in the macroblock.

TrueMotion v1 Frame Format and Header

All multi-byte numbers are stored in little endian format.

An encoded intraframe (keyframe) of TM1 data is laid out as:

 header  |  predictor indices

An encoded interframe is laid out as:

 header  |  block change bits  |  predictor indices

The difference between the 2 types of frames is that an interframe has a section of bits that specify which blocks in the frame are unchanged from the previous frame.

A TM1 header is quasi-encrypted with a logical XOR operation. This is probably done to provide some obfuscation of the header and thwart casual inspection of the data format.

An encoded TM1 frame begins with a single byte that holds the length of the decrypted header. The length is stored with a dummy high bit and rotated left by 5 bits. To obtain the actual length L from this byte B:

 L = ((B >> 5) | (B << 3)) & 0x7F

Then, decrypt the header by starting with byte 1 in the encoded frame (indexing from 0) and XORing each byte with its successive byte. Assuming the header is of length L bytes as computed above, and that the encoded header starts at buffer[1] (buffer[0] had the rotated length), the decode process is:

 for (i = 1; i < L; i++)
   decoded_header[i - 1] = buffer[i] ^ buffer[i + 1];
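
The two steps above (recovering the length, then XORing neighboring bytes) can be combined into one bounds-checked routine. This is a sketch: the name tm1_unscramble_header, the -1 error return, and the decision to reject a zero length are assumptions made for the example, not part of the format description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unscramble a TM1 header from the start of an encoded frame.
 * Returns the number of header bytes written to out (length - 1),
 * or -1 if either buffer is too small or the length is zero.
 */
static int tm1_unscramble_header(const uint8_t *buf, size_t buf_size,
                                 uint8_t *out, size_t out_size)
{
    assert(buf != NULL && out != NULL);
    if (buf_size < 1)
        return -1;

    /* Undo the rotation of byte 0 and mask off the dummy high bit. */
    size_t len = (size_t)(((buf[0] >> 5) | (buf[0] << 3)) & 0x7F);

    /* The loop below reads up to buf[len] and writes up to out[len - 2]. */
    if (len < 1 || buf_size < len + 1 || out_size < len - 1)
        return -1;

    for (size_t i = 1; i < len; i++)
        out[i - 1] = buf[i] ^ buf[i + 1];

    return (int)(len - 1);
}
```

Note that the loop reads buffer[L], so the encoded frame must contain at least L + 1 bytes; the check before the loop guards against truncated input.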

The decoded header data structure is laid out as follows:

 byte 0       compression method
 byte 1       delta set
 byte 2       vector set
 bytes 3-4    frame height
 bytes 5-6    frame width
 bytes 7-8    checksum
 byte 9       version
 byte 10      header type
 byte 11      flags
 byte 12      control
 bytes 13-14  x offset
 bytes 15-16  y offset
 bytes 17-18  width
 bytes 19-20  height
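
As an illustration, the layout above could be read into a structure as follows. The structure and field names are hypothetical. The sketch assumes that all 21 bytes are present and that multi-byte fields are little endian, as stated earlier; a real decoded header may be shorter, in which case the trailing fields would have to be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names are illustrative; byte offsets follow the layout above. */
typedef struct {
    uint8_t  compression;   /* byte 0 */
    uint8_t  delta_set;     /* byte 1 */
    uint8_t  vector_set;    /* byte 2 */
    uint16_t frame_height;  /* bytes 3-4 */
    uint16_t frame_width;   /* bytes 5-6 */
    uint16_t checksum;      /* bytes 7-8 */
    uint8_t  version;       /* byte 9 */
    uint8_t  header_type;   /* byte 10 */
    uint8_t  flags;         /* byte 11 */
    uint8_t  control;       /* byte 12 */
    uint16_t x_offset;      /* bytes 13-14 */
    uint16_t y_offset;      /* bytes 15-16 */
    uint16_t width;         /* bytes 17-18 */
    uint16_t height;        /* bytes 19-20 */
} tm1_header;

/* Read a 16-bit little-endian value. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if fewer than 21 decoded bytes are available. */
static int tm1_parse_header(const uint8_t *h, size_t size, tm1_header *out)
{
    assert(h != NULL && out != NULL);
    if (size < 21)
        return -1;
    out->compression  = h[0];
    out->delta_set    = h[1];
    out->vector_set   = h[2];
    out->frame_height = read_le16(h + 3);
    out->frame_width  = read_le16(h + 5);
    out->checksum     = read_le16(h + 7);
    out->version      = h[9];
    out->header_type  = h[10];
    out->flags        = h[11];
    out->control      = h[12];
    out->x_offset     = read_le16(h + 13);
    out->y_offset     = read_le16(h + 15);
    out->width        = read_le16(h + 17);
    out->height       = read_le16(h + 19);
    return 0;
}
```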
 

The compression method field indicates the type of compression used to encode this frame. There are 2 general types: 16-bit and 24-bit. Further, each has 4 block sizes to select from. The valid compression types are:

 0, 9, 11, 13, 15: NOP frames; frame is unchanged from previous frame
 1:  16-bit 4x4 (V)
 2:  16-bit 4x4 (H)
 3:  16-bit 4x2 (V)
 4:  16-bit 4x2 (H)
 5:  16-bit 2x4 (V)
 6:  16-bit 2x4 (H)
 7:  16-bit 2x2 (V)
 8:  16-bit 2x2 (H)
 10: 24-bit 4x4 (H)
 12: 24-bit 4x2 (H)
 14: 24-bit 2x4 (H)
 16: 24-bit 2x2 (H)
 >16: invalid compression type
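
The list above maps naturally onto a lookup table. The sketch below uses hypothetical names (tm1_compression, tm1_lookup_compression) and assumes that a size such as 4x2 means width x height, which the list itself does not state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int  bits;          /* 16 or 24; 0 marks a NOP frame */
    int  block_width;   /* assumes "WxH" in the list means width x height */
    int  block_height;
    char direction;     /* 'H' or 'V' as in the list; 0 for NOP frames */
} tm1_compression;

/* Returns 0 and fills *out for methods 0..16, or -1 for invalid methods. */
static int tm1_lookup_compression(int method, tm1_compression *out)
{
    static const tm1_compression table[17] = {
        { 0, 0, 0, 0 },                          /*  0: NOP */
        { 16, 4, 4, 'V' }, { 16, 4, 4, 'H' },    /*  1,  2 */
        { 16, 4, 2, 'V' }, { 16, 4, 2, 'H' },    /*  3,  4 */
        { 16, 2, 4, 'V' }, { 16, 2, 4, 'H' },    /*  5,  6 */
        { 16, 2, 2, 'V' }, { 16, 2, 2, 'H' },    /*  7,  8 */
        { 0, 0, 0, 0 },    { 24, 4, 4, 'H' },    /*  9: NOP, 10 */
        { 0, 0, 0, 0 },    { 24, 4, 2, 'H' },    /* 11: NOP, 12 */
        { 0, 0, 0, 0 },    { 24, 2, 4, 'H' },    /* 13: NOP, 14 */
        { 0, 0, 0, 0 },    { 24, 2, 2, 'H' }     /* 15: NOP, 16 */
    };

    assert(out != NULL);
    if (method < 0 || method > 16)
        return -1;
    *out = table[method];
    return 0;
}
```

A caller can treat bits == 0 as "frame unchanged from the previous frame" and a return value of -1 as an invalid compression type.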

The (H) and (V) designations come from the original Duck source code. It is unclear what they mean, although they presumably correspond to the horizontal and vertical designations common in video terminology.

The delta set and vector set fields are used to generate the set of predictor tables that will be used to decode this frame.

The height and width fields should be the same as those specified in the AVI file that contains the data.

The checksum field appears to contain the frame's sequence number modulo 512. The first frame is 0x0000 and the next frame is 0x0001. Frame #511 has a checksum of 0x01FF while frame #512 wraps around to 0x0000.

If the version field is less than 2, then the frame is intracoded (this may indicate that early versions of the coding method were purely intracoded). If the version field is greater than or equal to 2, the header type field decides: if it is greater than 3, the header is invalid; if it is 2 or 3, the flags field has bit 4 set (flags & 0x10) to indicate an intraframe; otherwise the frame is intracoded.
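
The decision procedure above can be written as a small function over the decoded header bytes. The enum and function names are invented for this sketch, and it follows the rule exactly as worded here: with header type 2 or 3, a frame whose flags lack bit 4 is treated as an interframe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    TM1_FRAME_INVALID = -1,
    TM1_FRAME_INTER   = 0,
    TM1_FRAME_INTRA   = 1
} tm1_frame_kind;

/* hdr points at the decoded header; bytes 9..11 are version,
 * header type, and flags, per the layout given earlier. */
static tm1_frame_kind tm1_classify_frame(const uint8_t *hdr, size_t size)
{
    assert(hdr != NULL);
    if (size < 12)
        return TM1_FRAME_INVALID;

    uint8_t version     = hdr[9];
    uint8_t header_type = hdr[10];
    uint8_t flags       = hdr[11];

    if (version < 2)
        return TM1_FRAME_INTRA;        /* early versions: always intracoded */
    if (header_type > 3)
        return TM1_FRAME_INVALID;
    if (header_type == 2 || header_type == 3)
        return (flags & 0x10) ? TM1_FRAME_INTRA : TM1_FRAME_INTER;
    return TM1_FRAME_INTRA;            /* header type 0 or 1 */
}
```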

The purpose of the control field is unclear.

The x & y offset, width, and height fields apparently pertain to a sprite mode that is not covered in this document.

16-bit Data

Decoding a 16-bit frame requires 2 tables. The tables can change from one frame to the next and must be rebuilt if the header specified a new table. One table contains the C predictors while the other contains the Y predictors.

Each table consists of 1024 32-bit entries. Each group of 4 entries corresponds to a byte index from 0..255. Each entry contains a double pixel predictor shifted left by one. If the very last bit (bit 0) in the 32-bit entry is 0, then there is another predictor for that index. If the last bit is 1, then this predictor is the last one for this list.
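
A sketch of how one byte index might be looked up in such a table is shown below. It relies only on the layout just described (4 entries per index, predictor in the upper 31 bits, bit 0 as the end-of-list flag). The function name is hypothetical, and the predictor bits are returned raw because this page does not say how they should be sign-extended or split into pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TM1_TABLE_ENTRIES     1024
#define TM1_ENTRIES_PER_INDEX 4

/*
 * Collect the predictors stored for one byte index (0..255).
 * Writes the raw predictor bits (entry >> 1) to out and returns
 * how many were written, between 1 and 4.
 */
static size_t tm1_predictors_for_index(const uint32_t *table, uint8_t index,
                                       uint32_t out[TM1_ENTRIES_PER_INDEX])
{
    assert(table != NULL && out != NULL);
    const uint32_t *entry = &table[(size_t)index * TM1_ENTRIES_PER_INDEX];
    size_t count = 0;

    for (size_t i = 0; i < TM1_ENTRIES_PER_INDEX; i++) {
        out[count++] = entry[i] >> 1;   /* drop the flag bit */
        if (entry[i] & 1)               /* bit 0 set: last predictor in list */
            break;
    }
    return count;
}
```

The loop is capped at 4 entries so that a table with no terminating flag in a group cannot run into the next index's entries.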

For example...

 A B C D ...
 E F G H ...
 I J K L ...
 M N O P ...

(UNFINISHED)

24-bit Data

The process of decoding a 24-bit frame is similar to that of decoding a 16-bit frame. However, the frame is decoded into 2 separate planes, a Y plane and a C plane, and recombined into a final RGB map at render time.

(UNFINISHED)

Duck TrueMotion v1 Tables

http://svn.mplayerhq.hu/ffmpeg/trunk/libavcodec/truemotion1data.h?view=markup&rev=4679

Games Using Duck TrueMotion 1

These software titles are known to use the Duck TrueMotion 1 video codec to encode full motion video: