# Mimic

## Latest revision as of 15:37, 8 April 2008

- FourCC: ML20
- Company: Logitech/Microsoft
- Samples: http://samples.mplayerhq.hu/V-codecs/ML20/

A video encoding used by MSN Messenger for webcam conversations.

Open source codec library: libmimic (http://www.jblinux.net/libmimic/). Note that this website no longer existed as of April 6, 2006. However, the Farsight project (http://farsight.sourceforge.net) incorporates the libmimic source.

FFmpeg has included a native decoder for Mimic since r12491.

## Overview

The Mimic codec operates in a native YUV 4:2:0 colorspace and employs both intraframes and interframes. Each of the 3 planes is encoded separately, in Y, V, U order. Each plane is broken up into a series of 8x8 blocks. In an intraframe, each block, progressing from left -> right, bottom -> top, is transformed using a discrete cosine transform (DCT), quantized, and re-ordered in a zigzag pattern. Finally, the non-zero transformed coefficients and the runs of zeros between them are encoded into a bitstream using variable length codes (VLCs).

Interframes encode a bit for each block to indicate either that the block is unchanged from the block at the same position in the previous frame, or that it is not a plain copy. Chroma blocks that are not copied are completely recoded using the same algorithm as blocks in an intraframe. Luma blocks that are not copied carry another bit indicating whether the block is instead taken from one of the previous 15 frames. If so, another 4 bits follow to indicate which frame the back reference refers to; otherwise the block is recoded as in an intraframe.

Note that this process bears some similarity to JPEG coding. Notably absent are macroblocks as well as delta coding of DC coefficients.

## Data Format

Each frame begins with a 20-byte header. All multi-byte numbers in the frame header are in little endian format:

    bytes 0-1    unknown
    bytes 2-3    quality setting
    bytes 4-5    frame width
    bytes 6-7    frame height
    bytes 8-11   unknown
    bytes 12-15  frame type
                   0 = intraframe
                   non-zero = interframe
    byte 16      number of coefficients coded in each block in the frame
    bytes 17-19  unknown
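The header layout above can be read with a small helper. The following is a minimal sketch in C, not code taken from libmimic or FFmpeg: the struct, its field names, and the function names are hypothetical, and only the byte offsets and the little endian byte order come from the table. The unknown fields are skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the documented header fields. */
typedef struct {
    uint16_t quality;     /* bytes 2-3 */
    uint16_t width;       /* bytes 4-5 */
    uint16_t height;      /* bytes 6-7 */
    uint32_t frame_type;  /* bytes 12-15: 0 = intraframe, non-zero = interframe */
    uint8_t  num_coeffs;  /* byte 16: coefficients coded per block */
} MimicHeader;

/* Read a 16-bit little endian value. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit little endian value. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is shorter than the 20-byte header. */
static int mimic_parse_header(const uint8_t *buf, size_t len, MimicHeader *h)
{
    if (len < 20)
        return -1;
    h->quality    = read_le16(buf + 2);
    h->width      = read_le16(buf + 4);
    h->height     = read_le16(buf + 6);
    h->frame_type = read_le32(buf + 12);
    h->num_coeffs = buf[16];
    return 0;
}
```

A caller would pass the start of a frame and then continue reading the packed bitstream at `buf + 20`.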

The encoded frame begins at byte 20 (counting from 0). To decode an intraframe, iterate through each plane, Y, V, and U. For each plane, iterate through all the 8x8 blocks from left -> right, bottom -> top.

For each block:

- decode *n* coefficients from the VLC bitstream, where *n* is obtained from the frame header
- dequantize the coefficients
- de-zigzag the coefficients
- transform the coefficients using an inverse DCT
- saturate the transformed samples to an unsigned byte range (0..255)
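The final saturation step can be sketched as follows. This is an illustration only: the function names, the use of `int` for IDCT output, and the row-major layout of the 64 samples are assumptions, not details confirmed by the text above.

```c
#include <stdint.h>

/* Saturate a transformed sample to the unsigned byte range 0..255. */
static uint8_t clamp_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Write one 8x8 block of IDCT output into a plane.
 * dst points at the block's first sample inside the plane and stride is
 * the number of bytes per plane row (both hypothetical parameters). */
static void store_block(const int samples[64], uint8_t *dst, int stride)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = clamp_u8(samples[y * 8 + x]);
}
```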

The process for decoding an interframe is similar to that for an intraframe. Iterate through the planes and the blocks in the same manner. For each block, follow this pseudo-code:

    read 1 bit
    if bit == 1 for luma plane or bit == 0 for chroma planes
        copy block from previous frame
    else
        if luma plane
            read 1 bit
            if bit == 1
                read 4 bits
                copy block from backreference read in bits
            endif
        endif
        if chroma plane or no backreference
            decode block as in intraframe
        endif
    endif
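The per-block decision in the pseudo-code can also be written as a small C function. This is a sketch under stated assumptions: `BitSource`, `take_bits`, and `BlockAction` are hypothetical names, the bit source holds one bit per array element so the example does not depend on how bits are packed, and the 4-bit back reference is assumed to be assembled with its first bit most significant.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { BLOCK_COPY_PREV, BLOCK_COPY_BACKREF, BLOCK_DECODE } BlockAction;

/* Hypothetical bit source: one bit per array element. */
typedef struct {
    const uint8_t *bits;
    size_t pos;
    size_t count;
} BitSource;

/* Take n bits, first bit most significant (an assumption). */
static unsigned take_bits(BitSource *s, int n)
{
    unsigned v = 0;
    while (n-- > 0 && s->pos < s->count)
        v = (v << 1) | (s->bits[s->pos++] & 1u);
    return v;
}

/* Decide how one interframe block is reconstructed.
 * *backref is written only when BLOCK_COPY_BACKREF is returned. */
static BlockAction interframe_block_action(BitSource *s, int is_chroma,
                                           unsigned *backref)
{
    unsigned bit = take_bits(s, 1);

    if (is_chroma)
        return bit == 0 ? BLOCK_COPY_PREV : BLOCK_DECODE;
    if (bit == 1)
        return BLOCK_COPY_PREV;

    /* Luma block that is not a plain copy: check for a back reference. */
    if (take_bits(s, 1) == 1) {
        *backref = take_bits(s, 4);
        return BLOCK_COPY_BACKREF;
    }
    return BLOCK_DECODE;
}
```

Returning an action instead of performing the copy keeps the sketch independent of how previous frames are stored.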

## Bitstream Packing

The Mimic bitstream is packed into 32-bit integers which are then stored in memory and transferred over the network wire in little endian format. To begin reading a packed Mimic bitstream, read the first 32-bit number from memory in little endian format. Read the bits from right -> left within the integer. When those 32 bits are exhausted, the next 4 bytes are read from memory in little endian byte order and the process is repeated.

As an alternative reading method, byteswap each 32-bit number in the entire input bytestream and use a standard left -> right bitstream reader.
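The alternative method can be sketched in C as below. This is a hedged illustration rather than the libmimic or FFmpeg implementation: the function and type names are hypothetical, and "standard left -> right" is assumed here to mean reading each byte starting from its most significant bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of every complete 32-bit word in place. */
static void mimic_byteswap_words(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint8_t t0 = buf[i], t1 = buf[i + 1];
        buf[i]     = buf[i + 3];
        buf[i + 1] = buf[i + 2];
        buf[i + 2] = t1;
        buf[i + 3] = t0;
    }
}

/* Hypothetical bit reader over a byteswapped buffer. */
typedef struct {
    const uint8_t *data;
    size_t len;     /* buffer length in bytes */
    size_t bitpos;  /* index of the next bit to read */
} BitReader;

/* Read n bits (0..32), most significant bit first (assumed order).
 * Bits past the end of the buffer read as 0. */
static uint32_t read_bits(BitReader *r, int n)
{
    uint32_t v = 0;
    while (n-- > 0) {
        size_t byte = r->bitpos >> 3;
        unsigned bit = 0;
        if (byte < r->len)
            bit = (r->data[byte] >> (7 - (r->bitpos & 7))) & 1u;
        v = (v << 1) | bit;
        r->bitpos++;
    }
    return v;
}
```

Swapping the whole buffer once up front lets the rest of the decoder use an ordinary byte-oriented reader instead of tracking 32-bit word boundaries.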

## Decoding Coefficients

Each 8x8 block is coded in the bitstream as a DC coefficient and up to 63 AC coefficients. Begin the decode process by clearing all coefficients to 0. Then proceed to decode *n* coefficients, according to the number set in the frame header. If there are 15 coefficients coded, that translates to 1 DC coefficient and 14 AC coefficients.

The DC coefficient is always stored as the next 8 bits in the bitstream.

For each of the remaining AC coefficients, decode a VLC from the bitstream as the number of zero coefficients to skip in the transform block. Then, decode another VLC as the quantized AC coefficient.

**TODO: import VLC tables into separate page**

## De-zigzag

This is the zigzag table used in the Mimic coding method:

    unsigned char zigzag[64] = {
         0,  8,  1,  2,  9, 16, 24, 17,
        10,  3,  4, 11, 18, 25, 32, 40,
        33, 26, 19, 12,  5,  6, 13, 20,
        27, 34, 41, 48, 56, 49, 42, 35,
        28, 21, 14,  7, 15, 22, 29, 36,
        43, 50, 57, 58, 51, 44, 37, 30,
        23, 31, 38, 45, 52, 59, 39, 46,
        53, 60, 61, 54, 47, 55, 62, 63
    };

To de-zigzag decoded coefficient *n* from the bitstream into a 64-element transform matrix:

transform_matrix[zigzag[n]] = decoded_coefficient[n]
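Applied to a whole block, the assignment above becomes a short loop. The following C sketch reuses the zigzag table from this section; the function name, the `int16_t` coefficient type, and the choice to clear the matrix first are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

static const unsigned char zigzag[64] = {
     0,  8,  1,  2,  9, 16, 24, 17,
    10,  3,  4, 11, 18, 25, 32, 40,
    33, 26, 19, 12,  5,  6, 13, 20,
    27, 34, 41, 48, 56, 49, 42, 35,
    28, 21, 14,  7, 15, 22, 29, 36,
    43, 50, 57, 58, 51, 44, 37, 30,
    23, 31, 38, 45, 52, 59, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Scatter the first n coefficients (in bitstream order) into a
 * 64-element transform matrix; all other positions are left at 0. */
static void mimic_dezigzag(const int16_t *decoded, int n, int16_t matrix[64])
{
    memset(matrix, 0, 64 * sizeof matrix[0]);
    for (int i = 0; i < n && i < 64; i++)
        matrix[zigzag[i]] = decoded[i];
}
```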

## Dequantization

Using the quality setting decoded from a Mimic frame's header, compute the block's dequantization factor as:

qscale = (10000 - quality_setting) / 1001

If the block being dequantized belongs to a chrominance plane then saturate the dequantization factor between 2.0..10.0. If the block belongs to the luminance/Y plane, saturate the dequantization factor between 1.0..10.0.

To dequantize the matrix of 64 coefficients, multiply the DC coefficient (element 0) by 2 and multiply the AC coefficients at indices 1 and 8 by 4. Multiply the remainder of the AC coefficients by the computed dequantization factor (qscale).
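The dequantization rules in this section can be sketched in C as follows. Several points are assumptions rather than facts from the text: the function names are hypothetical, the division is done in floating point (suggested by the 1.0..10.0 and 2.0..10.0 ranges), and indices 1 and 8 are taken to be positions in the 64-element transform matrix. The clamp bounds follow the description above as written.

```c
/* Compute the dequantization factor from the header's quality setting,
 * using the per-plane limits described in this section. */
static double mimic_qscale(int quality_setting, int is_chroma)
{
    double q = (10000 - quality_setting) / 1001.0;
    double lo = is_chroma ? 2.0 : 1.0;

    if (q < lo)
        q = lo;
    if (q > 10.0)
        q = 10.0;
    return q;
}

/* Dequantize a 64-element coefficient matrix in place. */
static void mimic_dequantize(double coeffs[64], double qscale)
{
    coeffs[0] *= 2.0;  /* DC coefficient */
    for (int i = 1; i < 64; i++)
        coeffs[i] *= (i == 1 || i == 8) ? 4.0 : qscale;
}
```

A real decoder would likely use integer arithmetic; doubles are used here only to keep the mapping to the formula obvious.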

## Inverse Discrete Cosine Transform

The IDCT is compatible with JPEG's, differing only by a factor of 4: multiplying the input data by 4 and passing the block to JPEG's IDCT yields the same output as libmimic's code.

## Post Processing

The open source libmimic package contains an impressive amount of post processing code as well.