Mimic: Difference between revisions

From MultimediaWiki
Jump to navigation Jump to search
(Committed in r12491)
(Mimic uses YVU order)
Line 8: Line 8:


== Overview ==
== Overview ==
The Mimic codec operates in a native [[YUV 4:2:0]] colorspace. The codec employs both intraframes and interframes. Each of the 3 planes, Y, U, and V, is encoded separately. Each plane is broken up into a series of 8x8 blocks. In an intraframe each block, progressing from left -> right, top -> bottom, is transformed using a discrete cosine transform (DCT), quantized, and re-ordered in a zigzag pattern. Finally the transformed non-zero coefficients and the runs of zeros between them are encoded into a bitstream using variable length codes (VLCs). Interframes encode a bit for each block to indicate that the block is unchanged from the block at the same position from the previous frame, or that the block is completely recoded using the same algorithm as each block in the intraframe.
The Mimic codec operates in a native [[YUV 4:2:0]] colorspace. The codec employs both intraframes and interframes. Each of the 3 planes, Y, U, and V, is encoded separately, in the YVU order. Each plane is broken up into a series of 8x8 blocks. In an intraframe each block, progressing from left -> right, top -> bottom, is transformed using a discrete cosine transform (DCT), quantized, and re-ordered in a zigzag pattern. Finally the transformed non-zero coefficients and the runs of zeros between them are encoded into a bitstream using variable length codes (VLCs). Interframes encode a bit for each block to indicate that the block is unchanged from the block at the same position from the previous frame, or that the block is completely recoded using the same algorithm as each block in the intraframe.


Note that this process bears some similarity to [[JPEG]] coding. Notably absent are macroblocks as well as delta coding of DC coefficients.
Note that this process bears some similarity to [[JPEG]] coding. Notably absent are macroblocks as well as delta coding of DC coefficients.
Line 26: Line 26:
   bytes 17-19  unknown
   bytes 17-19  unknown


The encoded frame begins at byte 20 (counting from 0). To decode an intraframe, iterate through each plane, Y, U, and V. For each plane, iterate through all the 8x8 blocks from left -> right, top -> bottom.
The encoded frame begins at byte 20 (counting from 0). To decode an intraframe, iterate through each plane, Y, V, and U. For each plane, iterate through all the 8x8 blocks from left -> right, top -> bottom.


For each block:
For each block:

Revision as of 18:50, 20 March 2008

A video encoding used by MSN Messenger for webcam conversations.

Open source codec library: libmimic; Note that this website is does not exist as of April 6, 2006. However, the Farsight project incorporates the libmimic source.

Overview

The Mimic codec operates in a native YUV 4:2:0 colorspace. The codec employs both intraframes and interframes. Each of the 3 planes, Y, U, and V, is encoded separately, in the YVU order. Each plane is broken up into a series of 8x8 blocks. In an intraframe each block, progressing from left -> right, top -> bottom, is transformed using a discrete cosine transform (DCT), quantized, and re-ordered in a zigzag pattern. Finally the transformed non-zero coefficients and the runs of zeros between them are encoded into a bitstream using variable length codes (VLCs). Interframes encode a bit for each block to indicate that the block is unchanged from the block at the same position from the previous frame, or that the block is completely recoded using the same algorithm as each block in the intraframe.

Note that this process bears some similarity to JPEG coding. Notably absent are macroblocks as well as delta coding of DC coefficients.

Data Format

Each frame begins with a 20-byte header. All multi-byte numbers in the frame header are in little endian format:

 bytes 0-1    unknown
 bytes 2-3    quality setting
 bytes 4-5    frame width
 bytes 6-7    frame height
 bytes 8-11   unknown
 bytes 12-15  frame type
   0 = intraframe
   non-zero = interframe
 byte 16      number of coefficients coded in each block in the frame
 bytes 17-19  unknown

The encoded frame begins at byte 20 (counting from 0). To decode an intraframe, iterate through each plane, Y, V, and U. For each plane, iterate through all the 8x8 blocks from left -> right, top -> bottom.

For each block:

  • decode n coefficients from the VLC bitstream, where n is obtained from the frame header
  • dequantize the coefficients
  • de-zigzag the coefficients
  • transform the coefficients using an inverse DCT
  • saturate the transformed samples to an unsigned byte range (0..255)

The process for decoding an interframe is similar as for an intraframe. Iterate through the planes and the blocks in the same manner. For each block:

  • read 1 bit from the bitstream
  • if the bit is 1 then the block is unchanged from the block at the same position in the previous frame
  • if the bit is 0 then the block is to be decoded using the same process as in an intraframe

Bitstream Packing

The Mimic bitstream is packed into 32-bit integers which are then stored in memory and transferred over the network wire in little endian format. To begin reading a packed Mimic bitstream, read the first 32-bit number from memory in little endian format. Read the bits from right -> left within the integer. When those 32 bits are exhausted, the next 4 bytes are read from memory in little endian byte order and the process is repeated.

Decoding Coefficients

Each 8x8 block is coded in the bitstream as a DC coefficient and some number (up to 63) AC coefficients. Begin the decode process by clearing all coefficients to 0. Then proceed to decode n coefficients, according to the number set in the frame header. If there are 15 coefficients coded, that translates to 1 DC coefficient and 14 AC coefficients.

The DC coefficient is always stored as the next 8 bits in the bitstream.

For each of the remaining AC coefficients, decode a VLC from the bitstream as the number of zero coefficients to skip in the transform block. Then, decode another VLC as the quantized AC coefficient.

TODO: import VLC tables into separate page

De-zigzag

This is the zigzag table used in the Mimic coding method:

 unsigned char zigzag[64] = {
    0,  8,  1,  2,  9, 16, 24, 17,
   10,  3,  4, 11, 18, 25, 32, 40,
   33, 26, 19, 12,  5,  6, 13, 20,
   27, 34, 41, 48, 56, 49, 42, 35,
   28, 21, 14,  7, 15, 22, 29, 36,
   43, 50, 57, 58, 51, 44, 37, 30,
   23, 31, 38, 45, 52, 59, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63
 };

To de-zigzag decoded coefficient n from the bitstream into a 64-element transform matrix:

 transform_matrix[zigzag[n]] = decoded_coefficient[n]

Dequantization

Using the quality setting decoded from a Mimic frame's header, compute the block's dequantization factor as:

 f = (10000 - quality_setting) * 10 * 9.9999997e-5

If the block being dequantized belongs to a chrominance plane then saturate the dequantization factor between 2.0..10.0. If the block belongs to the luminance/Y plane, saturate the dequantization factor between 1.0..10.0.

To dequantize the matrix of 64 coefficients, multiply the DC coefficient (element 0) by 2 and multiply the AC coefficients at indices 1 and 8 by 4. Multiply the remainder of the AC coefficients by the computed quantization factor.

Inverse Discrete Cosine Transform

To transform a 8x8 matrix, declare 8 temporary integers, v1..v8. For each row (1..8) p in the matrix:

  • va = (p[0] << 11) + (p[4] << 11);
  • vb = ((p[2] << 2) * 392) + (((p[2] << 2) + (p[6] << 2)) * 277);
  • v1 = va + vb + 512;
  • v2 = va - vb + 512;
  • va = (p[0] << 11) - (p[4] << 11);
  • vb = (((p[2] << 2) + (p[6] << 2)) * 277) - ((p[6] << 2) * 946);
  • v3 = va + vb + 512;
  • v4 = va - vb + 512;
  • va = (p[1] << 9) + (p[3] * 724) + (p[7] << 9);
  • vb = (p[1] << 9) + (p[5] * 724) - (p[7] << 9);
  • v5 = (((va + vb) * 213) - (vb * 71)) >> 6;
  • v6 = (((va + vb) * 213) - (va * 355)) >> 6;
  • va = (p[1] << 9) - (p[3] * 724) + (p[7] << 9);
  • vb = (p[1] << 9) - (p[5] * 724) - (p[7] << 9);
  • v7 = (((va + vb) * 251) - (va * 201)) >> 6;
  • v8 = (((va + vb) * 251) - (vb * 301)) >> 6;
  • p[0] = (v1 + v5) >> 10;
  • p[1] = (v3 + v7) >> 10;
  • p[2] = (v4 + v8) >> 10;
  • p[3] = (v2 + v6) >> 10;
  • p[4] = (v2 - v6) >> 10;
  • p[5] = (v4 - v8) >> 10;
  • p[6] = (v3 - v7) >> 10;
  • p[7] = (v1 - v5) >> 10;

Next is the column transform. Treat p[] as the entire 64-element array. For each column (1..8):

  • va = (p[0] << 9) + (p[32] << 9);
  • vb = ((p[16] + p[48]) * 277) + (p[16] * 392);
  • v1 = va + vb + 1024;
  • v2 = va - vb + 1024;
  • va = (p[0] << 9) - (p[32] << 9);
  • vb = ((p[16] + p[48]) * 277) - (p[48] * 946);
  • v3 = va + vb + 1024;
  • v4 = va - vb + 1024;
  • va = ((p[8] << 7) + (p[24] * 181) + (p[56] << 7)) >> 6;
  • vb = ((p[8] << 7) + (p[40] * 181) - (p[56] << 7)) >> 6;
  • v5 = ((va + vb) * 213) - (vb * 71);
  • v6 = ((va + vb) * 213) - (va * 355);
  • va = ((p[8] << 7) - (p[24] * 181) + (p[56] << 7)) >> 6;
  • vb = ((p[8] << 7) - (p[40] * 181) - (p[56] << 7)) >> 6;
  • v7 = ((va + vb) * 251) - (va * 201);
  • v8 = ((va + vb) * 251) - (vb * 301);
  • p[0] = (v1 + v5) >> 11;
  • p[8] = (v3 + v7) >> 11;
  • p[16] = (v4 + v8) >> 11;
  • p[24] = (v2 + v6) >> 11;
  • p[32] = (v2 - v6) >> 11;
  • p[40] = (v4 - v8) >> 11;
  • p[48] = (v3 - v7) >> 11;
  • p[56] = (v1 - v5) >> 11;

Post Processing

The open source libmimic package contains an impressive amount of post processing code as well. It is not known if any of these functions are actually employed.