From MultimediaWiki

A video encoding used by MSN Messenger for webcam conversations.

Open source codec library: libmimic. Note that the libmimic website no longer exists as of April 6, 2006. However, the Farsight project incorporates the libmimic source.

FFmpeg has had a native decoder for Mimic since r12491.


The Mimic codec operates in a native YUV 4:2:0 colorspace and employs both intraframes and interframes. Each of the 3 planes (Y, U, and V) is encoded separately, in Y, V, U order. Each plane is broken up into a series of 8x8 blocks.

In an intraframe, each block, progressing from left -> right, bottom -> top, is transformed using a discrete cosine transform (DCT), quantized, and re-ordered in a zigzag pattern. Finally, the non-zero transformed coefficients and the runs of zeros between them are encoded into a bitstream using variable length codes (VLCs).

Interframes encode a bit for each block to indicate either that the block is unchanged from the block at the same position in the previous frame, or that the block is coded anew. For luma blocks that are not copied from the previous frame, another bit indicates whether the block is instead unchanged from one of the previous 15 frames; if so, another 4 bits follow to indicate which frame the back reference refers to. In all remaining cases the block is completely recoded using the same algorithm as each block in an intraframe.

Note that this process bears some similarity to JPEG coding. Notably absent are macroblocks as well as delta coding of DC coefficients.

Data Format

Each frame begins with a 20-byte header. All multi-byte numbers in the frame header are in little endian format:

 bytes 0-1    unknown
 bytes 2-3    quality setting
 bytes 4-5    frame width
 bytes 6-7    frame height
 bytes 8-11   unknown
 bytes 12-15  frame type
   0 = intraframe
   non-zero = interframe
 byte 16      number of coefficients coded in each block in the frame
 bytes 17-19  unknown
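The header layout above can be read with a few little endian helpers. The following is a minimal sketch, not code from libmimic or FFmpeg: the `MimicHeader` struct and the function names are hypothetical, the fields are assumed to be unsigned, and the bytes marked unknown are simply skipped.

```c
#include <stddef.h>
#include <stdint.h>

#define MIMIC_HEADER_SIZE 20

/* Hypothetical container for the header fields whose meaning is known. */
typedef struct {
    uint16_t quality;     /* bytes 2-3 (assumed unsigned) */
    uint16_t width;       /* bytes 4-5 */
    uint16_t height;      /* bytes 6-7 */
    uint32_t frame_type;  /* bytes 12-15: 0 = intraframe, non-zero = interframe */
    uint8_t  num_coeffs;  /* byte 16: coefficients coded per block */
} MimicHeader;

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if fewer than 20 bytes are available. */
int mimic_parse_header(const uint8_t *buf, size_t len, MimicHeader *h)
{
    if (len < MIMIC_HEADER_SIZE)
        return -1;
    h->quality    = read_le16(buf + 2);
    h->width      = read_le16(buf + 4);
    h->height     = read_le16(buf + 6);
    h->frame_type = read_le32(buf + 12);
    h->num_coeffs = buf[16];
    return 0;
}
```

The encoded frame data then starts at `buf + MIMIC_HEADER_SIZE`.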

The encoded frame begins at byte 20 (counting from 0). To decode an intraframe, iterate through each plane, Y, V, and U. For each plane, iterate through all the 8x8 blocks from left -> right, bottom -> top.

For each block:

  • decode n coefficients from the VLC bitstream, where n is obtained from the frame header
  • dequantize the coefficients
  • de-zigzag the coefficients
  • transform the coefficients using an inverse DCT
  • saturate the transformed samples to an unsigned byte range (0..255)
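The block traversal order and the final saturation step can be sketched as below. This is an illustration under stated assumptions, not libmimic's code: the function names are hypothetical, row 0 of the plane is assumed to be the top row in memory, and the plane dimensions are assumed to be multiples of 8.

```c
#include <stddef.h>

/*
 * Computes the top-left pixel (x, y) of the k-th 8x8 block in decode order:
 * left -> right within a row of blocks, rows visited bottom -> top.
 * Returns 0 on success, -1 if k is out of range.
 */
int mimic_block_origin(size_t k, int plane_w, int plane_h, int *x, int *y)
{
    int blocks_per_row = plane_w / 8;
    int block_rows = plane_h / 8;

    if (blocks_per_row <= 0 || block_rows <= 0 ||
        k >= (size_t)blocks_per_row * (size_t)block_rows)
        return -1;

    int row_from_bottom = (int)(k / (size_t)blocks_per_row);
    *x = (int)(k % (size_t)blocks_per_row) * 8;
    *y = (block_rows - 1 - row_from_bottom) * 8;
    return 0;
}

/* Saturates a reconstructed sample to the unsigned byte range 0..255. */
unsigned char mimic_clamp_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (unsigned char)v;
}
```

For a 16x16 plane, block 0 is the bottom-left block and block 3 is the top-right block.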

The process for decoding an interframe is similar to that for an intraframe. Iterate through the planes and the blocks in the same manner. For each block, follow this pseudo-code:

 read 1 bit
 if (bit == 1 for luma plane) or (bit == 0 for chroma planes)
     copy block from previous frame
 else
     if luma plane
         read 1 bit
         if bit == 1
             read 4 bits
             copy block from the backreference frame selected by those 4 bits
     if chroma plane or no backreference
         decode block as in intraframe
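The per-block decision can also be written as a small C function. This is a sketch of one reading of the pseudo-code, in which the backreference bits are only consulted when the block is not copied from the previous frame. The `BitSource` type (one bit per array element, no bounds checking) and all names are hypothetical, and the 4-bit backreference is assumed to be read most significant bit first.

```c
#include <stddef.h>

typedef enum {
    MIMIC_COPY_PREVIOUS,  /* block unchanged from the previous frame */
    MIMIC_COPY_BACKREF,   /* block copied from an older frame (luma only) */
    MIMIC_DECODE_BLOCK    /* block coded as in an intraframe */
} MimicBlockAction;

/* Hypothetical bit source for illustration: each element holds one bit. */
typedef struct {
    const unsigned char *bits;
    size_t pos;
} BitSource;

static unsigned next_bits(BitSource *s, int n)
{
    unsigned v = 0;
    while (n-- > 0)
        v = (v << 1) | (s->bits[s->pos++] & 1u);
    return v;
}

/* Decides what to do with one block; *backref is set only for COPY_BACKREF. */
MimicBlockAction mimic_block_action(BitSource *s, int is_chroma,
                                    unsigned *backref)
{
    unsigned bit = next_bits(s, 1);

    /* Luma: 1 means unchanged. Chroma: 0 means unchanged. */
    if ((!is_chroma && bit == 1) || (is_chroma && bit == 0))
        return MIMIC_COPY_PREVIOUS;

    /* Only luma blocks may carry a backreference. */
    if (!is_chroma && next_bits(s, 1) == 1) {
        *backref = next_bits(s, 4);
        return MIMIC_COPY_BACKREF;
    }
    return MIMIC_DECODE_BLOCK;
}
```

A real decoder would pull the bits from the packed bitstream rather than from an array of single bits.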

Bitstream Packing

The Mimic bitstream is packed into 32-bit integers which are then stored in memory and transferred over the network wire in little endian format. To begin reading a packed Mimic bitstream, read the first 32-bit number from memory in little endian format. Read the bits from right -> left within the integer. When those 32 bits are exhausted, the next 4 bytes are read from memory in little endian byte order and the process is repeated.

As an alternative reading method, byteswap each 32-bit number in the entire input bytestream and use a standard left -> right bitstream reader.
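The alternative method, byteswapping followed by a conventional bit reader, might look like the sketch below. The `BitReader` type and function names are hypothetical. Two choices here are assumptions of this sketch rather than documented behavior: trailing bytes that do not fill a whole 32-bit group are left untouched, and reads past the end of the data return zero bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverses the byte order inside each complete 32-bit group of buf. */
void mimic_byteswap32(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint8_t t = buf[i];
        buf[i] = buf[i + 3];
        buf[i + 3] = t;
        t = buf[i + 1];
        buf[i + 1] = buf[i + 2];
        buf[i + 2] = t;
    }
}

/* Standard left -> right (most significant bit first) bit reader. */
typedef struct {
    const uint8_t *data;
    size_t size_bits;
    size_t pos_bits;
} BitReader;

/* Reads n bits (0..32) and returns them right-aligned. */
uint32_t bitreader_get(BitReader *br, int n)
{
    uint32_t v = 0;
    while (n-- > 0) {
        uint32_t bit = 0;
        if (br->pos_bits < br->size_bits) {
            uint8_t byte = br->data[br->pos_bits >> 3];
            bit = (byte >> (7 - (br->pos_bits & 7))) & 1u;
        }
        br->pos_bits++;
        v = (v << 1) | bit;
    }
    return v;
}
```

Swapping the whole buffer once up front keeps the per-bit read path free of any endianness handling.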

Decoding Coefficients

Each 8x8 block is coded in the bitstream as a DC coefficient and some number (up to 63) of AC coefficients. Begin the decode process by clearing all coefficients to 0. Then proceed to decode n coefficients, according to the number set in the frame header. For example, if there are 15 coefficients coded, that translates to 1 DC coefficient and 14 AC coefficients.

The DC coefficient is always stored as the next 8 bits in the bitstream.

For each of the remaining AC coefficients, decode a VLC from the bitstream as the number of zero coefficients to skip in the transform block. Then, decode another VLC as the quantized AC coefficient.

TODO: import VLC tables into separate page


This is the zigzag table used in the Mimic coding method:

 unsigned char zigzag[64] = {
    0,  8,  1,  2,  9, 16, 24, 17,
   10,  3,  4, 11, 18, 25, 32, 40,
   33, 26, 19, 12,  5,  6, 13, 20,
   27, 34, 41, 48, 56, 49, 42, 35,
   28, 21, 14,  7, 15, 22, 29, 36,
   43, 50, 57, 58, 51, 44, 37, 30,
   23, 31, 38, 45, 52, 59, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63
 };

To de-zigzag decoded coefficient n from the bitstream into a 64-element transform matrix:

 transform_matrix[zigzag[n]] = decoded_coefficient[n]
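As a self-contained illustration of the mapping above, the following sketch applies the zigzag table to a run of decoded coefficients. The function name and the use of `int` for coefficients are assumptions made for this example; entries that were not decoded are cleared to zero, in line with the decode process described earlier.

```c
#include <string.h>

static const unsigned char mimic_zigzag[64] = {
     0,  8,  1,  2,  9, 16, 24, 17,
    10,  3,  4, 11, 18, 25, 32, 40,
    33, 26, 19, 12,  5,  6, 13, 20,
    27, 34, 41, 48, 56, 49, 42, 35,
    28, 21, 14,  7, 15, 22, 29, 36,
    43, 50, 57, 58, 51, 44, 37, 30,
    23, 31, 38, 45, 52, 59, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/*
 * Scatters the first n_coeffs decoded coefficients (bitstream order)
 * into a 64-element transform matrix; all other entries become 0.
 */
void mimic_dezigzag(const int *decoded, int n_coeffs, int matrix[64])
{
    memset(matrix, 0, 64 * sizeof matrix[0]);
    for (int n = 0; n < n_coeffs && n < 64; n++)
        matrix[mimic_zigzag[n]] = decoded[n];
}
```

Note that the second decoded coefficient lands at index 8 and the third at index 1, so this scan starts down the first column rather than along the first row.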


Using the quality setting decoded from a Mimic frame's header, compute the block's dequantization factor as:

 qscale = (10000 - quality_setting) / 1001

If the block being dequantized belongs to a chrominance plane then saturate the dequantization factor between 2.0..10.0. If the block belongs to the luminance/Y plane, saturate the dequantization factor between 1.0..10.0.

To dequantize the matrix of 64 coefficients, multiply the DC coefficient (element 0) by 2 and multiply the AC coefficients at indices 1 and 8 by 4. Multiply the remainder of the AC coefficients by the computed dequantization factor.
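The dequantization rules can be sketched as follows. This example follows the text exactly as written, including the saturation bounds for each plane type, and has not been checked against libmimic. It assumes that the division by 1001 is real-valued and that indices 1 and 8 refer to positions in the 64-element transform matrix; the function names are hypothetical.

```c
/* Computes the dequantization factor from the header's quality setting. */
double mimic_qscale(int quality, int is_chroma)
{
    double q = (10000 - quality) / 1001.0;
    double lo = is_chroma ? 2.0 : 1.0;  /* lower bound as stated in the text */

    if (q < lo)
        q = lo;
    if (q > 10.0)
        q = 10.0;
    return q;
}

/* Dequantizes a 64-element transform matrix into out. */
void mimic_dequantize(const int in[64], double out[64], double qscale)
{
    for (int i = 0; i < 64; i++) {
        if (i == 0)
            out[i] = in[i] * 2.0;      /* DC coefficient */
        else if (i == 1 || i == 8)
            out[i] = in[i] * 4.0;      /* first two AC coefficients */
        else
            out[i] = in[i] * qscale;   /* remaining AC coefficients */
    }
}
```

Because only the remaining AC coefficients depend on the quality setting, the DC term and the two lowest-frequency AC terms keep a fixed precision at every quality level.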

Inverse Discrete Cosine Transform

The IDCT is compatible with JPEG's, differing only by a scale factor of 4: multiplying the input data by 4 and passing the block to JPEG's IDCT gives the same output as libmimic's code.
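To make the scale factor concrete, here is a floating-point sketch that multiplies the coefficients by 4 and feeds them to a textbook 8x8 JPEG-style IDCT. It is a slow reference formulation written for this page, not libmimic's or FFmpeg's transform, so it is not guaranteed to match either bit for bit after rounding. The function names are hypothetical, and the cosine values come from a small table so that the example does not depend on the math library.

```c
/* cos(k * pi / 16) for k = 0..8 */
static const double COS16[9] = {
    1.0, 0.9807852804032304, 0.9238795325112867, 0.8314696123025452,
    0.7071067811865476, 0.5555702330196022, 0.3826834323650898,
    0.19509032201612825, 0.0
};

/* Returns cos(k * pi / 16) for any k >= 0, using the table's symmetries. */
static double cos16(int k)
{
    k %= 32;
    if (k <= 8)
        return COS16[k];
    if (k <= 16)
        return -COS16[16 - k];
    if (k <= 24)
        return -COS16[k - 16];
    return COS16[32 - k];
}

/* Textbook 2D IDCT: in[v*8+u] are coefficients, out[y*8+x] are samples. */
void jpeg_idct_reference(const double in[64], double out[64])
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++) {
                for (int u = 0; u < 8; u++) {
                    double cu = u ? 1.0 : 0.7071067811865476;
                    double cv = v ? 1.0 : 0.7071067811865476;
                    sum += cu * cv * in[v * 8 + u] *
                           cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
                }
            }
            out[y * 8 + x] = sum / 4.0;
        }
    }
}

/* Applies the factor of 4 described above, then the JPEG-style IDCT. */
void mimic_idct_sketch(const double coeffs[64], double out[64])
{
    double scaled[64];
    for (int i = 0; i < 64; i++)
        scaled[i] = coeffs[i] * 4.0;
    jpeg_idct_reference(scaled, out);
}
```

With this formulation, a block whose only non-zero entry is a DC value of 8 reconstructs to a flat block of samples with value 4.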

Post Processing

The open source libmimic package contains an impressive amount of post processing code as well.