Video XL

From MultimediaWiki

This page is based on the document 'Simple YUV Coding Formats' by Mike Melanson found at http://multimedia.cx/simple-yuv.txt.

This is a video codec used in hardware products by Miro Video and Pinnacle.

The Miro Video XL codec uses differential coding on a reduced-precision YUV 4:1:1 colorspace image. Each Y, U, or V component is only 7 bits wide (8 is more typical). Each group of 32 bits in the bitstream holds six 5-bit delta-table indices, leaving 2 bits unused: one index for each of the next 4 Y samples on the line and one index for each of the two color samples (U and V).

The Pinnacle Video XL codec appears to use the same algorithm as the Miro codec; the only known difference is that its frames are 8 bytes longer. The same decoding process applies to both.
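Because every 4-pixel group occupies one 4-byte doubleword, a line of W pixels takes exactly W bytes, so a frame takes width * height bytes. The sketch below computes the expected frame size. The function name `xl_frame_size` is hypothetical, and it assumes that the width is a multiple of 4 and that Miro frames carry no header. Where the 8 extra Pinnacle bytes sit is not stated here, so the sketch only accounts for the length.

```c
#include <stddef.h>

/* Hypothetical helper: expected size in bytes of one Video XL frame.
 * One 32-bit doubleword (4 bytes) encodes 4 pixels, so each line of
 * `width` pixels takes exactly `width` bytes.
 * Assumes width is a multiple of 4 and no per-frame header (assumption). */
static size_t xl_frame_size(unsigned width, unsigned height, int pinnacle)
{
    size_t size = (size_t)width * height;
    /* Pinnacle frames are 8 bytes longer; their position is not specified. */
    return pinnacle ? size + 8 : size;
}
```

For example, a 320x240 Miro frame would be 76800 bytes and the corresponding Pinnacle frame 76808 bytes.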

Data Format

For each group of 4 pixels on a line, fetch the next 32 bits as a little-endian number and then swap the two 16-bit words to obtain the correct bit order for decoding. To illustrate, this is the arrangement of the next four bytes (A, B, C, and D) on disk:

 aaaaaaaa bbbbbbbb cccccccc dddddddd

Load the 4 bytes into a program variable so that the bytes are in this order:

 dddddddd cccccccc bbbbbbbb aaaaaaaa

Then, swap the upper and lower 16-bit words to achieve this order:

31                                 0
 bbbbbbbb aaaaaaaa dddddddd cccccccc
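The load-and-swap step can be sketched in C as follows. The function name `xl_load_dword` is illustrative only; assembling the value byte by byte keeps the result independent of the host CPU's endianness.

```c
#include <stdint.h>

/* Hypothetical helper: read 4 bytes as a little-endian 32-bit number,
 * then swap the 16-bit halves to get the decoding bit order. */
static uint32_t xl_load_dword(const uint8_t *p)
{
    uint32_t le = (uint32_t)p[0]
                | (uint32_t)p[1] << 8
                | (uint32_t)p[2] << 16
                | (uint32_t)p[3] << 24;  /* dddddddd cccccccc bbbbbbbb aaaaaaaa */
    return (le >> 16) | (le << 16);      /* bbbbbbbb aaaaaaaa dddddddd cccccccc */
}
```

With on-disk bytes 0xAA 0xBB 0xCC 0xDD, the little-endian load gives 0xDDCCBBAA and the word swap gives 0xBBAADDCC, matching the diagram.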

In addition, the 32-bit blocks of each line are stored in reverse order. For example, an image 16 pixels wide has 4 pixel groups per line. Each pixel group is represented by a 32-bit doubleword, byte-ordered and swapped as described above, and the doublewords are stored in the bytestream as:

 D3 D2 D1 D0
 

D0 represents the first 4 pixels on the line and D3 represents the final 4 pixels on the line. A decoder must therefore jump to the end of each line's data and read its doublewords backwards through the bytestream while producing pixels left to right, then jump forward to the end of the next line's data to decode the next line.
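The reversed storage order reduces to a simple offset calculation. This sketch assumes that lines are stored one after another from the top of the frame, that each line occupies `width` bytes with no padding or header, and that the width is a multiple of 4; the name `xl_group_offset` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of pixel group `g` (0 = leftmost 4
 * pixels) on line `line`. Doublewords within a line are stored in
 * reverse order, so group 0 is the last doubleword of the line's bytes.
 * Assumes width % 4 == 0 and consecutive, unpadded lines (assumption). */
static size_t xl_group_offset(unsigned width, unsigned line, unsigned g)
{
    unsigned groups = width / 4;  /* doublewords per line */
    return (size_t)line * width + (size_t)(groups - 1 - g) * 4;
}
```

For the 16-pixel example, D0 is at byte offset 12 of the line and D3 at offset 0.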

The 32 bits of the doubleword represent the following values:

 bit 31:      unused
 bits 30-26:  V delta index
 bits 25-21:  U delta index
 bits 20-16:  Y3 delta index
 bit 15:      unused
 bits 14-10:  Y2 delta index
 bits 9-5:    Y1 delta index
 bits 4-0:    Y0 delta index
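Unpacking the six fields from a swapped doubleword is a matter of shifts and 5-bit masks. The struct and function names below are illustrative, not part of any existing decoder API.

```c
#include <stdint.h>

/* Hypothetical container for the delta indices of one pixel group. */
typedef struct {
    unsigned y[4];  /* Y0..Y3 delta indices */
    unsigned u, v;  /* U and V delta indices */
} xl_indices;

/* Extract the six 5-bit indices from an already word-swapped doubleword. */
static xl_indices xl_unpack(uint32_t dw)
{
    xl_indices ix;
    ix.y[0] = dw         & 0x1F;  /* bits 4-0   */
    ix.y[1] = (dw >> 5)  & 0x1F;  /* bits 9-5   */
    ix.y[2] = (dw >> 10) & 0x1F;  /* bits 14-10; bit 15 is ignored */
    ix.y[3] = (dw >> 16) & 0x1F;  /* bits 20-16 */
    ix.u    = (dw >> 21) & 0x1F;  /* bits 25-21 */
    ix.v    = (dw >> 26) & 0x1F;  /* bits 30-26; bit 31 is ignored */
    return ix;
}
```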

Each delta index selects an entry in the following table, and that value is added to the previous sample on the same plane (Y, U, or V):

const int xl_delta_table[32] = {
   0,   1,   2,   3,   4,   5,   6,   7,
   8,   9,  12,  15,  20,  25,  34,  46,
  64,  82,  94, 103, 108, 113, 116, 119,
 120, 121, 122, 123, 124, 125, 126, 127
};

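Remember that the YUV components only have 7 bits of precision, so the additions wrap modulo 128. As a result, the values in the second half of the table (64 through 127) act as negative deltas: 127 is effectively -1, 126 is -2, and 64 is -64.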
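A minimal sketch of applying one delta with 7-bit wraparound, where masking with 0x7F performs the modulo-128 step. The function name `xl_apply_delta` is hypothetical.

```c
static const int xl_delta_table[32] = {
    0,   1,   2,   3,   4,   5,   6,   7,
    8,   9,  12,  15,  20,  25,  34,  46,
   64,  82,  94, 103, 108, 113, 116, 119,
  120, 121, 122, 123, 124, 125, 126, 127
};

/* Add the table entry for `index` to a 7-bit component. Masking with
 * 0x7F wraps the sum modulo 128, so entries 64..127 act as -64..-1. */
static unsigned xl_apply_delta(unsigned prev, unsigned index)
{
    return (prev + (unsigned)xl_delta_table[index & 0x1F]) & 0x7F;
}
```

For example, index 31 (value 127) turns 10 into 9, and index 16 (value 64) turns 100 into 36, the same as subtracting 64.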

At the beginning of a line, the Y0, U, and V delta indices actually represent the top 5 bits of the absolute 7-bit component value.
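Treating a 5-bit index as the top 5 bits of a 7-bit value means shifting it left by 2, so the two low bits of the first Y, U, and V samples on each line always start at zero. The helper name is hypothetical.

```c
/* Hypothetical helper: convert a line-start 5-bit index into an
 * absolute 7-bit component value (top 5 bits set, low 2 bits zero). */
static unsigned xl_initial_value(unsigned index)
{
    return (index & 0x1F) << 2;
}
```

For example, index 31 gives 124 and index 1 gives 4.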

The final, concise decoding algorithm operates as follows:

 foreach line in image
   foreach 32-bit doubleword, working from right -> left in bytestream
     load doubleword as little-endian number, swap 16-bit words
     if this is the first pixel group in line
       Y0 = (Y0 delta index) << 2
       U  = (U delta index) << 2
       V  = (V delta index) << 2
     else
       Y0 = (Y3 of previous group + xl_delta_table[Y0 delta index]) mod 128
       U  = (previous U + xl_delta_table[U delta index]) mod 128
       V  = (previous V + xl_delta_table[V delta index]) mod 128
     Y1 = (Y0 + xl_delta_table[Y1 delta index]) mod 128
     Y2 = (Y1 + xl_delta_table[Y2 delta index]) mod 128
     Y3 = (Y2 + xl_delta_table[Y3 delta index]) mod 128
     output Y0..Y3 for the next 4 pixels and U, V for this group
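The algorithm above can be sketched as a complete frame decoder in C. This is an illustrative implementation, not a reference one; the function name and output layout are assumptions: the buffer holds width * height bytes of line data with no header, the width is a multiple of 4, Y is written as width x height samples, and U and V as (width / 4) x height samples (4:1:1). Outputs stay at 7-bit precision.

```c
#include <stddef.h>
#include <stdint.h>

static const int xl_delta_table[32] = {
    0,   1,   2,   3,   4,   5,   6,   7,
    8,   9,  12,  15,  20,  25,  34,  46,
   64,  82,  94, 103, 108, 113, 116, 119,
  120, 121, 122, 123, 124, 125, 126, 127
};

/* Hypothetical decoder sketch: 7-bit planar Y, U, V output. */
static void xl_decode_frame(const uint8_t *buf, unsigned width, unsigned height,
                            uint8_t *y, uint8_t *u, uint8_t *v)
{
    unsigned groups = width / 4;  /* doublewords per line */
    for (unsigned line = 0; line < height; line++) {
        const uint8_t *row = buf + (size_t)line * width;
        unsigned ly = 0, lu = 0, lv = 0;  /* last Y, U, V on this line */
        for (unsigned g = 0; g < groups; g++) {
            /* Groups are stored right to left: group 0 is the last dword. */
            const uint8_t *p = row + (size_t)(groups - 1 - g) * 4;
            uint32_t le = (uint32_t)p[0] | (uint32_t)p[1] << 8
                        | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
            uint32_t dw = (le >> 16) | (le << 16);  /* swap 16-bit words */
            unsigned i0 = dw & 0x1F, i1 = (dw >> 5) & 0x1F;
            unsigned i2 = (dw >> 10) & 0x1F, i3 = (dw >> 16) & 0x1F;
            unsigned iu = (dw >> 21) & 0x1F, iv = (dw >> 26) & 0x1F;
            if (g == 0) {  /* line start: indices are the top 5 bits */
                ly = i0 << 2;
                lu = iu << 2;
                lv = iv << 2;
            } else {       /* otherwise: deltas, wrapped modulo 128 */
                ly = (ly + xl_delta_table[i0]) & 0x7F;
                lu = (lu + xl_delta_table[iu]) & 0x7F;
                lv = (lv + xl_delta_table[iv]) & 0x7F;
            }
            uint8_t *yo = y + (size_t)line * width + (size_t)g * 4;
            yo[0] = (uint8_t)ly;
            ly = (ly + xl_delta_table[i1]) & 0x7F; yo[1] = (uint8_t)ly;
            ly = (ly + xl_delta_table[i2]) & 0x7F; yo[2] = (uint8_t)ly;
            ly = (ly + xl_delta_table[i3]) & 0x7F; yo[3] = (uint8_t)ly;
            u[(size_t)line * groups + g] = (uint8_t)lu;
            v[(size_t)line * groups + g] = (uint8_t)lv;
        }
    }
}
```

Reading each line's doublewords through a computed offset avoids moving a single bytestream pointer backwards and forwards; the two approaches visit the same bytes in the same order.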

Since the components only have 7 bits of meaningful precision, each decoded component will typically need one more left shift to produce 8-bit output samples.
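The extra shift maps the 7-bit range 0..127 onto 0..254. The sketch below shows that conversion, plus a hypothetical alternative that is not from the source: replicating the top bit into the new low bit so that 127 maps to 255 and the full 8-bit range is used. Both function names are illustrative.

```c
#include <stdint.h>

/* Plain one-bit left shift, as described above: 0..127 -> 0..254. */
static uint8_t xl_to_8bit(unsigned c7)
{
    return (uint8_t)((c7 & 0x7F) << 1);
}

/* Hypothetical alternative (not from the source): also copy the top
 * bit into bit 0 so that 127 -> 255. */
static uint8_t xl_to_8bit_replicate(unsigned c7)
{
    c7 &= 0x7F;
    return (uint8_t)((c7 << 1) | (c7 >> 6));
}
```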