Duck TrueMotion 1

From MultimediaWiki


Duck TrueMotion 1 (TM1) is the first video codec developed by The Duck Corporation. It uses differential pulse code modulation and interframe differencing. The codec was developed primarily for gaming software, typically PC or Sega Saturn titles.

Underlying Concepts

TM1 operates on bidirectional delta quantization. The mathematical premise of delta coding is simple addition. For example, take the following sequence of numbers:

 82 84 81 80 86 88 85

All of the numbers are reasonably large (on a scale of 0..255 in this example). However, they are quite similar to each other. Using delta coding, only the differences between successive numbers are stored:

 82 +2 -3 -1 +6 +2 -3

The first number is still large but since the remaining numbers are clustered close to each other, the deltas are relatively small. Thus, the deltas require less information to encode.
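
The encode and decode steps for the sequence above can be sketched in C as follows. This is only an illustration of plain one-dimensional delta coding; the function names are made up for this example and do not come from any Duck source code.

```c
#include <stddef.h>

/* Encode: the first delta is the first value itself (previous value is
 * taken as 0), every later delta is the difference to its predecessor. */
void delta_encode(const int *values, size_t n, int *deltas)
{
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        deltas[i] = values[i] - prev;
        prev = values[i];
    }
}

/* Decode: a running sum of the deltas restores the original values. */
void delta_decode(const int *deltas, size_t n, int *values)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += deltas[i];
        values[i] = acc;
    }
}
```

Decoding needs nothing but addition, which is the "simple addition" premise mentioned above.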

TM1 takes this concept and extends it to 2 dimensions. A particular pixel is assumed to have a pixel directly above it and a pixel directly to the left. If either of these pixels does not exist in the frame (e.g., the top-left corner pixel has neither), 0 is used in place of the nonexistent pixel. The current pixel is decoded as the sum of the up and left pixels, plus a delta from the encoded video bitstream.

 encoded video bitstream: ...D5 D6 D7 D8...
 
 decoded video frame:
   A B C D
   E F G H
   

In this example, the encoded video bitstream is sitting at delta D5 when it is time to decode pixel E. A is the pixel above E. There is no pixel to the left of E, so 0 is used as the left value. Thus:

 E = A + 0 + D5

In the case of pixel F, both the up and left pixel values (called the vertical and horizontal predictors, respectively) are available:

 F = B + E + D6
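
A literal sketch of this simplified model in C might look like the following. It assumes one delta per pixel, stored in raster order, and plain integer samples; the function name and the flat array layout are assumptions for illustration only, and the real TM1 decoder is more involved.

```c
/* Simplified 2D prediction: pixel = up + left + delta.
 * A missing up or left neighbour is replaced by 0.
 * 'deltas' and 'frame' both hold width * height entries in raster order. */
void predict_decode(const int *deltas, int width, int height, int *frame)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int up   = (y > 0) ? frame[(y - 1) * width + x] : 0;
            int left = (x > 0) ? frame[y * width + x - 1]   : 0;
            frame[y * width + x] = up + left + deltas[y * width + x];
        }
    }
}
```

For a 2x2 frame with deltas 5, 1, 2, 3 this yields A = 5, B = 6, E = 7 and F = B + E + 3 = 16.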

That is the general idea behind decoding TM1 data. The actual decoding algorithm is more involved. The TM1 bytestream contains a series of indices that point into tables holding the delta values to be applied to the vertical and horizontal predictor pixels. These tables only specify small deltas. When the difference between 2 values is too large for the tables, a special bytestream code indicates that the next delta is to be multiplied by 5 before it is applied.

TM1 operates on 4x4 macroblocks of pixels. Each block in a frame can be broken into 4 2x2 blocks, 2 halves (either 4x2 or 2x4), or encoded as a 4x4 block. The block type is encoded in the frame header.

While the TM1 algorithm operates on RGB colorspace data at the input and output level, it borrows some ideas from YUV colorspaces. For more information on RGB and YUV colorspaces, see the References section.

TM1 uses a modified colorspace that embodies luminance (Y) and chrominance (C) information encoded as RGB deltas. Since Y information is more important to the human eye than C information, the Y data must be updated more frequently than the C data (i.e., more Y predictors than C predictors are applied to the image). For every pixel within a given block in a macroblock, a Y predictor must be applied. However, only one C predictor is applied for each block in the macroblock.

TrueMotion v1 Frame Format and Header

All multi-byte numbers are stored in little endian format.

An encoded intraframe (keyframe) of TM1 data is laid out as:

 header  |  predictor indices

An encoded interframe is laid out as:

 header  |  block change bits  |  predictor indices

The difference between the 2 types of frames is that an interframe has a section of bits that specify which blocks in the frame are unchanged from the previous frame.

A TM1 header is quasi-encrypted with a logical XOR operation. This is probably done to provide some obfuscation of the header and thwart casual inspection of the data format.

An encoded TM1 frame begins with one byte that holds the length of the decrypted header, with a dummy high bit set and the whole byte rotated left by 5 bits. To obtain the actual length from byte B:

 L = ((B >> 5) | (B << 3)) & 0x7F
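
As a small C helper, the same computation could be written as below. The function name is hypothetical; the expression is the one given above.

```c
#include <stdint.h>

/* Undo the rotation (rotate right by 5 within 8 bits), then mask off
 * the dummy high bit to leave a 7-bit header length. */
unsigned tm1_header_length(uint8_t b)
{
    return ((b >> 5) | (b << 3)) & 0x7F;
}
```

For example, a first byte of 0x12 corresponds to a header length of 16: rotating 0x12 back gives 0x90, and clearing the high bit leaves 0x10.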

Then, decrypt the header by starting with byte 1 in the encoded frame (indexing from 0) and XORing each byte with its successive byte. Assuming the header is of length L bytes as computed above, and that the encoded header starts at buffer[1] (buffer[0] had the rotated length), the decode process is:

 for (i = 1; i < L; i++)
   decoded_header[i - 1] = buffer[i] ^ buffer[i + 1];
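
A slightly more defensive version of the loop above, wrapped in a function with bounds checks, is sketched here. The function name, the return convention and the checks are assumptions added for illustration; only the length formula and the XOR loop come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Decrypts the header of an encoded frame into decoded_header.
 * Returns the number of bytes written (L - 1), or -1 if the length byte
 * yields L == 0 or the buffer is too short for the loop to run safely. */
int tm1_decrypt_header(const uint8_t *buffer, size_t buffer_len,
                       uint8_t *decoded_header)
{
    if (buffer_len < 1)
        return -1;

    size_t len = ((buffer[0] >> 5) | (buffer[0] << 3)) & 0x7F;

    /* The loop reads up to buffer[len], so len + 1 bytes are required. */
    if (len < 1 || buffer_len < len + 1)
        return -1;

    for (size_t i = 1; i < len; i++)
        decoded_header[i - 1] = buffer[i] ^ buffer[i + 1];

    return (int)(len - 1);
}
```

The caller is expected to supply an output buffer of at least 126 bytes, since the length field is 7 bits wide.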

The decoded header data structure is laid out as follows (depending on the version and header type, some of the later fields may be absent):

 byte 0       compression method
 byte 1       delta set
 byte 2       vector set
 bytes 3-4    frame height
 bytes 5-6    frame width
 bytes 7-8    checksum
 byte 9       version
 byte 10      header type
 byte 11      flags
 byte 12      control
 bytes 13-14  x offset
 bytes 15-16  y offset
 bytes 17-18  width
 bytes 19-20  height
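
The layout above can be read into a C structure roughly as follows. This is a sketch under stated assumptions: the struct and function names are invented for this example, fields beyond the supplied length are simply left at zero, and the minimum of 9 bytes (through the checksum) is a guess rather than a documented rule.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tm1_header {
    uint8_t  compression, delta_set, vector_set;
    uint16_t height, width, checksum;
    uint8_t  version, header_type, flags, control;
    uint16_t x_offset, y_offset, sprite_width, sprite_height;
};

/* All multi-byte fields are little endian. */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if fewer than 9 bytes are available. */
int tm1_parse_header(const uint8_t *h, size_t len, struct tm1_header *out)
{
    if (len < 9)
        return -1;
    memset(out, 0, sizeof *out);

    out->compression = h[0];
    out->delta_set   = h[1];
    out->vector_set  = h[2];
    out->height      = rd16le(h + 3);
    out->width       = rd16le(h + 5);
    out->checksum    = rd16le(h + 7);

    if (len >= 13) {            /* version .. control */
        out->version     = h[9];
        out->header_type = h[10];
        out->flags       = h[11];
        out->control     = h[12];
    }
    if (len >= 21) {            /* sprite rectangle */
        out->x_offset      = rd16le(h + 13);
        out->y_offset      = rd16le(h + 15);
        out->sprite_width  = rd16le(h + 17);
        out->sprite_height = rd16le(h + 19);
    }
    return 0;
}
```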
 

The compression method field indicates the type of compression used to encode this frame. There are 2 general types: 16-bit and 24-bit. Further, each has 4 block sizes to select from. The valid compression types are:

 0, 9, 11, 13, 15: NOP frames; frame is unchanged from previous frame
 1:  16-bit 4x4 (V)
 2:  16-bit 4x4 (H)
 3:  16-bit 4x2 (V)
 4:  16-bit 4x2 (H)
 5:  16-bit 2x4 (V)
 6:  16-bit 2x4 (H)
 7:  16-bit 2x2 (V)
 8:  16-bit 2x2 (H)
 10: 24-bit 4x4 (H)
 12: 24-bit 4x2 (H)
 14: 24-bit 2x4 (H)
 16: 24-bit 2x2 (H)
>16: invalid compression type (or TrueMotion RT)

The (H) and (V) designations come from the original Duck source code. It is unclear what they mean, beyond the usual horizontal and vertical designations common in video terminology.
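
The table of compression types can be turned into a small lookup, sketched below. The struct, the function name, and the reading of "4x2" as width 4 by height 2 are assumptions made for this example; the (H)/(V) distinction is not modelled because its meaning is unclear.

```c
struct tm1_method {
    int valid;      /* 0 for values above 16 (invalid or TrueMotion RT) */
    int nop;        /* 1 if the frame is unchanged from the previous one */
    int bits;       /* 16 or 24, 0 for NOP/invalid */
    int block_w;    /* first number of the block size in the table */
    int block_h;    /* second number of the block size in the table */
};

struct tm1_method tm1_classify_method(int m)
{
    static const int sizes[4][2] = { {4, 4}, {4, 2}, {2, 4}, {2, 2} };
    struct tm1_method r = {0, 0, 0, 0, 0};
    int idx;

    if (m < 0 || m > 16)
        return r;                       /* invalid compression type */
    r.valid = 1;

    if (m == 0 || (m >= 9 && (m & 1))) {
        r.nop = 1;                      /* 0, 9, 11, 13, 15 */
        return r;
    }
    if (m <= 8) {
        r.bits = 16;
        idx = (m - 1) / 2;              /* 1,2 -> 4x4 ... 7,8 -> 2x2 */
    } else {
        r.bits = 24;
        idx = (m - 10) / 2;             /* 10 -> 4x4 ... 16 -> 2x2 */
    }
    r.block_w = sizes[idx][0];
    r.block_h = sizes[idx][1];
    return r;
}
```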

The delta set and vector set fields are used to generate the set of predictor tables that will be used to decode this frame.

The height and width fields should be the same as those specified in the AVI file that contains the data.

The checksum field appears to contain the frame's sequence number modulo 512. The first frame is 0x0000 and the next frame is 0x0001. Frame #511 has a checksum of 0x01FF while frame #512 wraps around to 0x0000.

If the version field is less than 2, then the frame is intracoded (this may indicate that early versions of the coding method were purely intracoded). If the version field is greater than or equal to 2, the header type field decides: if it is 2 or 3, the flags field has bit 4 set (flags & 0x10) to indicate an intraframe; if it is greater than 3, the header is invalid; otherwise the frame is intracoded.
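
The decision rules in the previous paragraph can be written out as a short C function. The enum and function names are hypothetical, and treating a clear bit 4 as "interframe" is an inference from the text rather than something it states outright.

```c
enum tm1_frame_kind { TM1_INVALID, TM1_INTRA, TM1_INTER };

enum tm1_frame_kind tm1_classify_frame(int version, int header_type, int flags)
{
    if (version < 2)
        return TM1_INTRA;               /* old versions: always intracoded */
    if (header_type > 3)
        return TM1_INVALID;             /* unknown header type */
    if (header_type == 2 || header_type == 3)
        /* bit 4 set marks an intraframe; clear is assumed to mean inter */
        return (flags & 0x10) ? TM1_INTRA : TM1_INTER;
    return TM1_INTRA;                   /* header types 0 and 1 */
}
```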

The control field indicates the platform for which the video file was mastered (e.g. PC or SEGA console).

The x & y offset, width, and height fields pertain to a sprite mode.

Overall frame coding

The frame is split into 4x4 pixel blocks (or 2x4 pixel blocks in 24-bit mode, where each pixel should be repeated twice during the reconstruction phase). Depending on the coding mode, each block may contain 1-4 chroma deltas (one for each sub-block) and a luma delta for each pixel. Data is still coded per line, though. Delta values obtained during compression are substituted with the values from the corresponding delta table (with possible escapes, described below), grouped into pairs, and sent to a Tunstall code compressor with a fixed codebook. A Tunstall code replaces a sequence of input codes with one fixed-width output value; the TM1 codebooks replace sequences of 2-8 delta pairs with a single byte.
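
On the decoding side, a Tunstall code with a fixed codebook amounts to a table lookup: every input byte expands to a short run of delta pairs. The sketch below shows only that expansion step. The codebook structure, the function name and the stop-at-zero behaviour are assumptions for illustration; the real TM1 codebooks and table formats are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* One codebook entry: the run of delta-pair indices a byte stands for.
 * Hypothetical layout; TM1 entries hold 2-8 pairs. */
struct tm1_code {
    int count;
    int pairs[8];
};

/* Expands input bytes into delta-pair indices until the input ends,
 * the output is full, or a zero byte (escape marker) is reached.
 * Returns the number of indices written; *consumed receives the number
 * of input bytes that were expanded. */
size_t tm1_expand(const struct tm1_code *codebook,
                  const uint8_t *in, size_t in_len,
                  int *out, size_t out_cap, size_t *consumed)
{
    size_t n = 0, i = 0;

    for (; i < in_len; i++) {
        if (in[i] == 0)
            break;                              /* escape: handled elsewhere */
        const struct tm1_code *c = &codebook[in[i]];
        if (n + (size_t)c->count > out_cap)
            break;                              /* no room for this run */
        for (int k = 0; k < c->count; k++)
            out[n++] = c->pairs[k];
    }
    *consumed = i;
    return n;
}
```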

If a delta value is too large, it is coded as the sum of a small delta and an escape delta value. In this case the Tunstall coder flushes its output sequence, sends a zero byte to signal the escape, and codes the delta indices for the escape values. In 16-bit mode those values are the ordinary delta values multiplied by five; in 24-bit mode they come from the so-called fat tables instead.
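
For the 16-bit case, the arithmetic of an escaped delta can be illustrated as below. The function name and the flat integer table are hypothetical; actual TM1 delta tables are generated from the delta set and vector set header fields and are not shown here.

```c
/* 16-bit mode: an escaped delta is the sum of a small delta and an
 * escape delta, where the escape delta is an ordinary table value
 * multiplied by five. */
int tm1_escaped_delta16(const int *delta_table, int small_index, int escape_index)
{
    return delta_table[small_index] + 5 * delta_table[escape_index];
}
```

With an illustrative table of 0, 1, -1, 2, -2, a small index of 1 and an escape index of 3 give 1 + 5 * 2 = 11.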

16-bit Data

Deltas in 16-bit mode are applied to two pixels at once, so the horizontal predictor contains two pixels as well. A luma delta pair codes deltas that should be applied to all components of the pixel. A chroma delta pair codes deltas for the red and blue components of the pixel pair (i.e. the first delta codes the red offset for both pixels, and the second delta codes the blue offset for both pixels).

Sprite mode

Sprite mode augments the 16-bit intra-only coding mode with 4x4 sub-blocks by adding modes signalled by two bits per block. The first bit tells whether the block is coded like ordinary TM1 data. The second bit signals that the block is either coded with transparency information (transmitted as a delta pair right after the luma delta pair) or completely transparent.

A working implementation of sprite support can be found in NihAV: https://git.nihav.org/?p=nihav.git;a=blob;f=nihav-duck/src/codecs/truemotion1.rs;hb=HEAD

24-bit Data

In 24-bit mode, a delta pair codes the components of a single pixel. Because of the nature of the compression, deltas should be applied as a single 32-bit word to the packed pixel value ((R << 16) | (G << 8) | B). A luma delta pair should be unpacked as ((lo << 16) | (lo << 8) | hi), and a chroma delta pair should be unpacked as ((lo << 16) | hi).
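
A sketch of this unpacking in C follows. It assumes lo and hi are signed delta values and therefore uses modular multiplication and addition on 32-bit unsigned words instead of literal shifts and ORs; for non-negative values the result is identical to the expressions above, while negative deltas borrow across components when the word is added. The function names are invented for this example.

```c
#include <stdint.h>

/* Luma pair: lo goes to the two upper components, hi to the lowest. */
uint32_t tm1_luma_delta24(int lo, int hi)
{
    return (uint32_t)lo * 0x10000u + (uint32_t)lo * 0x100u + (uint32_t)hi;
}

/* Chroma pair: lo goes to the top component, hi to the lowest. */
uint32_t tm1_chroma_delta24(int lo, int hi)
{
    return (uint32_t)lo * 0x10000u + (uint32_t)hi;
}

/* The delta is applied to the packed pixel as one 32-bit addition. */
uint32_t tm1_apply_delta24(uint32_t pixel, uint32_t delta)
{
    return pixel + delta;
}
```

For example, a luma pair with lo = 1 and hi = 2 unpacks to 0x010102, and adding it to the packed pixel 0x102030 gives 0x112132.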

Duck TrueMotion v1 Tables

FFmpeg implementation

Games Using Duck TrueMotion 1

These software titles are known to use the Duck TrueMotion 1 video codec to encode full motion video: