Indeo 3

FourCC: IV31, IV32
Company: Intel, Ligos
Technical Description: http://www.csse.monash.edu.au/~timf/videocodec/indeo3.txt (mirrored)
Samples: http://samples.mplayerhq.hu/V-codecs/IV32/
Partial specification: http://multimedia.cx/mirror/5386232.pdf

Introduction

Indeo Video 3 is a proprietary video compression algorithm developed by Intel. It is primarily used to encode video in AVI files. The format was seen commonly since 1992 after the release of Windows 3.1 and was superseded by Indeo Video Interactive version 4 and 5 (see Indeo 4 and Indeo 5).

There are several versions of this codec (R3.1, R3.2) but the version R3.2 (FOURCC 'IV32') is the most common. There is a support for this codec for all major platforms (Windows, Macintosh, Linux).

This document focuses on principles necessary to implement an Indeo Video 3 decoder.

Decoder specification

Coding techniques

Indeo Video 3 employs color subsampling, differential coding, vector quantization, run-lenght coding and motion compensation to achieve its compression.

Supported picture dimensions

Indeo3 has the following size restrictions:

width:  16...640 pixels
height: 16...480 pixels

All picture dimensions must be a multiple of 4.

Internal pixel format

Indeo Video 3 operates completely in the YUV color space. It uses a special downsampled case of the YVU 4:1:0 format where pixel are represented using only 7 bits. The downsampling is done in the encoder by applying the right shifting on pixel values after color conversion and subsampling. Thus, both encoder and decoder operate internally on pixel values in the range of 0-127. This was done in order to permit the software-based SIMD processing (see software vector operations).

Additionaly the upsampling is needed in the decoder. After each plane was decoded, the pixel values shall be shifted one bit left to convert them back to 8-bit values.

Software vector operations

Indeo3 is built upon the so-called "softSIMD" technique, which allows an emulation of the vector operations purely in software. As this codec was developed there was no processors equipped with a SIMD instruction set, therefore the "softSIMD" can significally boost performance.

In a processor with explicit SIMD (single instruction, multiply data) support, one operation is executed on two or more data sets to produce multiple outputs. The basic idea of the "softSIMD" is to treat the processor registers as containing multiple data words; for example, a 32-bit register can be treated as containing four 8-bit data words.

The examples below illustrate how the "softSIMD" is implemented in Indeo3:

Example 1: Adding two delta values to two pixels at once
 
src_pixels = 0x20202020; // predicted pixels
delta      = 0x0000FE01; // low-order bits contain two delta values: -2 and +1
dst_pixels = src_pixels + delta; // dst_pixels contains 0x20201E21 now

Example 2: Adding four delta values to four pixels at once
 
src_pixels = 0x20202020; // predicted pixels
delta      = 0xFE030201; // delta values: -2, +3, +2 and +1
dst_pixels = src_pixels + delta; // dst_pixels contains 0x1E232221 now

Example 3: Averaging four pixels at once
 
dst_pixels = ((src_pixels1 + src_pixels2) >> 1) & 0x7F7F7F7F;

Internal picture representation

Each picture (frame) is divided into following pieces: planes, strips, cells and blocks.

Each frame consists of three planes: Y, V and U.

Each plane can be segmented into one or more vertical strips depends on picture width. A strip is a region of the image being encoded as self-contained section. A strip has fixed width (160 pixels for luminance and 40 pixels for chrominance). Its height is always the same as the height of the picture. Large pictures shall be encoded in several strips. This makes the encoder/decoder simplier and faster. The following example shows a typical picture segmentation (sizes are given for the luminance plane; for chrominance ones those must be divided by 4):

if the width of the image <= 160:	there is only one strip which width is <= 160
if the width of the image > 160:	there are two strips: 160 + (pic_width % 160)
if the width of the image > 320:	there are three strips: 160 + 160 + (pic_width % 320)
if the width of the image > 480:	there are four strips: 160 + 160 + 160 + (pic_width % 480)

The "%" in the examples above denotes the "MOD" operation.

Each strip is encoded with either intra-frame or inter-frame prediction. Inter-frame prediction is obtained by applying motion estimation.

Each strip will be further segmented into cells using binary tree image segmentation. Each cell consists of blocks in the raster order. All blocks in a cell share the same quantizations parameters: quantization mode and delta table index.

Available block sizes are 4x4, 4x8 and 8x8 pixels. The size of the block is determined during the encoding process according with energy of the cell. Blocks are quantized using adaptive vector quantization and the resulting bytecodes are packed into the compressed bitstream.

Binary tree segmentation

Binary tree segmentation is performed by splitting a region in half horizontally or vertically, and then possibly splitting each of the resulting sub-regions in half and so on until the desired segmentation (suitable for quantization) is achieved. The resulting structure is referred to as a "binary tree" because each node which is not a terminal node is being splitted to form two subbranches. The top node of the tree represents the entire strip. Each time a region is split, two new nodes are formed. The terminal node of the tree (i.e. which has no branches) will be encoded using vector quantization and run-length encoding.

Picture header

Picture header of indeo3 consists of the frame header and the bitstream header.

Frame header

This header is undocumented in the mentioned patent but its structure is very easy to understand:

size	name	comments
DWORD	frame_number	frame number starting with "0"
DWORD	unknown1	seems to be always "0"
DWORD	check_sum	checksum used for identifying indeo3 streams. Obtained as follows: frame_number XOR unknown1 XOR frame_size XOR 'FRMH'.
DWORD	frame_size	size of frame data in bytes

Bitstream header

A good description of this header is given in the patent mentioned above. The table below explains it with some additional comments:

size	name	value(s)	comments
WORD	dec_version	always = 0x20	indicates decoder version. Values other than 0x20 seem to be unsupported.
WORD	frame_flags	bit 0 => "1" indicates a periodic INTRA (key) frame. bit 1 => "1" indicates the use of the YVU9 8-bit pixel format. Unsupported by all known indeo3 decoders. bit 2 => "1" indicates a INTRA (key) frame, either periodic or random one. bit 3 => "1" tells that the next frame is an INTRA frame. bit 4 => "1" horizontal motion vectors have the halfpel resolution. bit 5 => "1" vertical motion vectors have the halfpel resolution. bit 6 => unused. bit 7 => reserved. bit 8 => this frame is a droppable INTER frame. bit 9 => buffer selector: 0 - primary, 1 - secondary. bits 10...15 are reserved.	frame flags. Bit 0 is the LSB.
DWORD	data_size	value of "128" indicates a NULL (sync) frame.	size of the bitstream in bits.
BYTE	cb_offset		offset for selecting VQ tables for the modes 1 and 4 (see below).
BYTE	reserved1		should be ignored.
WORD	checksum		introduced for debugging. Unfortunately all known encoders set this field to "0" making it unusable.
WORD	height		height of the coded picture.
WORD	width		width of the coded picture.
DWORD	y_offset		offset to the data of the luminance plane from the beginning of this header.
DWORD	v_offset		offset to the data of the chroma v plane from the beginning of this header.
DWORD	u_offset		offset to the data of the chroma u plane from the beginning of this header.
DWORD	reserved2		should be ignored.
16 bytes	alt_quant	each byte indicates a pair of the VQ tables: bits 0...3 => secondary table index bits 4...7 => primary table index	VQ table set should be used with the modes 1 and 4.

Plane data

Each plane data has the following format:

size	name	condition	comments
DWORD	num_vectors		number of motion vectors if any. Contains "0" for INTRA frames.
num_vectors * 2	mc_vectors	present only if num_vectors != 0	array of motion vectors. Each entry contains two signed bytes. First one is the vertical component, second one is the horizontal component.
variable	vq_data		binary tree codes interleaved with cell data.

The plane data itself stores a binary tree segmentation interleaved with VQ codes for each cell and usually looks like this:

binary tree codes for selecting the size of the cell 0
VQ codes for the cell 0
binary tree codes for selecting the size of the cell 1
VQ codes for the cell 1
...
binary tree codes for selecting the size of the cell n
VQ codes for the cell n

The C-pseudocode example here shows how to parse the binary tree segmentation properly.

VQ data codes

The VQ data for each cell contains one-byte VQ codes of the following three types:

DYAD: this indicates a two-byte delta applied to two pixels at once.
QUAD: this indicates a four-byte delta applied to four pixels at once.
RLE_ESC: one of the special run-length codes (see below).

Each 4x4 block is coded using 8 dyads. The following table shows the order of dyads and their pixels in a block:

line	pixels of the dyad 1	pixels of the dyad 0
0	3, 2	1, 0
1	7, 6	5, 4
2	11, 10	9, 8
3	15, 14	13, 12

According with the table above a 4x4 block can be coded using 8 codes of the type "DYAD" as follows:

code 0 for the dyad 0, line 0
code 1 for the dyad 1, line 0
code 2 for the dyad 0, line 1
code 3 for the dyad 1, line 1
code 4 for the dyad 0, line 2
code 5 for the dyad 1, line 2
code 6 for the dyad 0, line 3
code 7 for the dyad 1, line 3

Two dyads can form a quad (four-byte delta). In this case the corresponding line will be coded using only one code of the type "QUAD". The following example shows a valid block encoding intermixing both "DYAD" and "QUAD" codes:

code 0, type = DYAD for the dyad 0, line 0
code 1, type = DYAD for the dyad 1, line 0
code 2, type = QUAD for the dyads 0 and 1, line 1
code 3, type = DYAD for the dyad 0, line 2
code 4, type = DYAD for the dyad 1, line 2
code 5, type = QUAD for the dyads 0 and 1, line 3

As we've seen above a line in a 4x4 block can be coded using either two codes of the type "DYAD" or one code of the type "QUAD". All other combinations are forbidden. The following example shows an invalid encoding intermixing both "DYAD" and "QUAD" codes:

code 0, type = DYAD for the dyad 0, line 0
code 1, type = QUAD --------> ERROR!!!

Run-length codes

Values 0xF8...0xFF of the VQ data codes are reserved for the special run-length codes. Their generalized meaning is described in the table below:

name	value	description	valid for lines
RLE_ESC_FF	0xFF	apply null delta to all lines up to the 2nd line	0
RLE_ESC_FE	0xFE	apply null delta to all lines up to the 3rd line	0 and 1
RLE_ESC_FD	0xFD	apply null delta to all remaining lines of this block	0, 1 and 2
RLE_ESC_FC	0xFC	apply null delta to all remaining lines of this block and to whole next block	all
RLE_ESC_FB	0xFB, counter	apply null delta to n blocks/skip n blocks. The "counter" is a byte and should be interpreted as follows: bits 0...4 => number of blocks to process bit 5 => 0 - copy blocks, 1 - mark blocks as skipped bits 6 and 7 are unused and should be set to "0"	all
RLE_ESC_FA	0xFA	INTRA: mark this block as skipped INTER: copy data from the corresponding reference block	0
RLE_ESC_F9	0xF9	INTRA: mark this block and the next one as skipped INTER: copy data from the corresponding two reference blocks	0
RLE_ESC_F8	0xF8	seems to be never used

Depends on coding mode, block size and block type there are some variations in the processing of this codes. For a detailed description see the explanation of the VQ coding modes below.

Pseudocode examples

Parsing binary tree codes

The C-pseudocode example below shows how to parse the binary tree segmentation properly:

#define H_SPLIT 0
#define V_SPLIT 1
#define INTRA   2
#define INTER   3
#define VQ_NULL 2
#define VQ_DATA 3

STRIP_WIDTH = 160 pixels for luma and 40 pixels for chroma planes;

current_cell.position = (0,0);
current_cell.dimensions = plane.dimensions;
current_cell.tree = MC_TREE;
current_cell.type = INTRA;

while (binary tree not exhausted)
{
    btree_code = get 2-bit binary tree code from bitstream;
    switch (btree_code) {
        if (current_cell.tree == MC_TREE) {
            case H_SPLIT:
                split current cell into two vertical subcells and push them onto stack;
                break;
            case V_SPLIT:
                if (current_cell.width >= STRIP_WIDTH)
                    split current strip into two horizontal strips and push them onto stack;
                else
                    split current cell into two horizontal subcells and push them onto stack;
                break;
            case INTRA:
                current_cell.type = INTRA;
                current_cell.tree = VQ_TREE;
                break;
            case INTER:
                current_cell.type = INTER;
                current_cell.tree = VQ_TREE;
                current_cell.vec_indx = get next byte from bitstream;
                break;
        } else { // VQ_TREE
            case H_SPLIT:
                split current cell into two vertical subcells and push them onto stack;
                break;
            case V_SPLIT:
                if (current_cell.width >= STRIP_WIDTH)
                    split current strip into two horizontal strips and push them onto stack;
                else
                    split current cell into two horizontal subcells and push them onto stack;
                break;
            case VQ_NULL:
                code = get 2-bit binary tree code from bitstream;
                if (code == 0)
                    copy data from the referenced cell into current one;
                else if (code == 1)
                    mark current cell as skipped;
                else
                    return error "invalid null code";
                pop current cell from stack;
                break;
            case VQ_DATA:
                decode VQ data for current cell;
                pop current cell from stack;
                place binary tree data pointer to next byte after decoded data;
                break;
        }
    }
}

Indeo 3

Contents

Introduction

Decoder specification

Coding techniques

Supported picture dimensions

Internal pixel format

Software vector operations

Internal picture representation

Binary tree segmentation

Picture header

Frame header

Bitstream header

Plane data

VQ data codes

Run-length codes

Pseudocode examples

Parsing binary tree codes

Navigation menu

Indeo 3

Introduction

Decoder specification

Coding techniques

Supported picture dimensions

Internal pixel format

Software vector operations

Internal picture representation

Binary tree segmentation

Picture header

Frame header

Bitstream header

Plane data

VQ data codes

Run-length codes

Pseudocode examples

Parsing binary tree codes

Navigation menu

Search