- FourCC: IV31, IV32
- Company: Intel, Ligos
- Technical Description: http://www.csse.monash.edu.au/~timf/videocodec/indeo3.txt (mirrored)
- Samples: http://samples.mplayerhq.hu/V-codecs/IV32/
- Partial specification: http://multimedia.cx/mirror/5386232.pdf
Indeo Video 3 is a proprietary video compression algorithm developed by Intel. It is primarily used to encode video in AVI files. The format was commonly seen around the time of 1996 and has since been superseded by Indeo Video Interactive version 4 and 5 (see Indeo 4 and Indeo 5). There are several versions of this codec (R3.1, R3.2) but the version R3.2 (FOURCC 'IV32') is the most common. There is a support for this codec for all major platforms (Windows, Macintosh, Linux).
This document focuses on principles necessary to implement an Indeo Video 3 decoder.
Supported picture dimensions
Indeo3 has the following size restrictions:
width: 16...640 pixels height: 16...480 pixels
All picture dimensions must be a multiple of 4.
Internal pixel format
Indeo Video 3 operates completely in the YUV color space. It uses a special downsampled case of the YVU 4:1:0 format where pixel are represented using only 7 bits. The downsampling is done in the encoder by applying the right shifting on pixel values after color conversion and subsampling. Thus, both encoder and decoder operate internally on pixel values in the range of 0-127. This was done in order to permit the software-based SIMD processing (see software vector operations).
Additionaly the upsampling is needed in the decoder. After each plane was decoded, the pixel values shall be shifted one bit left to convert them back to 8-bit values.
Software vector operations
Indeo3 is built upon the so-called "softSIMD" technique, which allows an emulation of the vector operations purely in software. As this codec was developed there was no processors equipped with a SIMD instruction set, therefore the "softSIMD" can significally boost performance.
In a processor with explicit SIMD (single instruction, multiply data) support, one operation is executed on two or more data sets to produce multiple outputs. The basic idea of the "softSIMD" is to treat the processor registers as containing multiple data words; for example, a 32-bit register can be treated as containing four 8-bit data words.
The examples below illustrate how the "softSIMD" is implemented in Indeo3:
Example 1: Adding two delta values to two pixels at once src_pixels = 0x20202020; // predicted pixels delta = 0x0000FE01; // low-order bits contain two delta values: -2 and +1 dst_pixels = src_pixels + delta; // dst_pixels contains 0x20201E21 now
Example 2: Adding four delta values to four pixels at once src_pixels = 0x20202020; // predicted pixels delta = 0xFE030201; // delta values: -2, +3, +2 and +1 dst_pixels = src_pixels + delta; // dst_pixels contains 0x1E232221 now
Example 3: averaging four pixels at once dst_pixels = ((src_pixels1 + src_pixels2) >> 1) & 0x7F7F7F7F;
Internal picture representation
Each picture (frame) is divided into following pieces: planes, strips, cells and blocks.
Each frame consists of three planes: Y, V and U.
Each plane can be segmented into one or more vertical strips depends on picture width. A strip is a region of the image being encoded as self-contained section. A strip has fixed width (160 pixels for luminance and 40 pixels for chrominance). Its height is always the same as the height of the picture. Large pictures shall be encoded in several strips. This makes the encoder/decoder simplier and faster. The following example shows a typical picture segmentation (sizes are given for the luminance plane; for chrominance ones those must be divided by 4):
if the width of the image <= 160: there is only one strip which width is <= 160 if the width of the image > 160: there are two strips: 160 + (pic_width % 160) if the width of the image > 320: there are three strips: 160 + 160 + (pic_width % 320) if the width of the image > 480: there are four strips: 160 + 160 + 160 + (pic_width % 480)
The "%" in the examples above denotes the "MOD" operation.
Each strip is encoded with either intra-frame or inter-frame prediction. Inter-frame prediction is obtained by applying motion estimation.
Each strip will be further segmented into cells using binary tree image segmentation. Each cell consists of blocks in the raster order. All blocks in a cell share the same quantizations parameters: quantization mode and delta table index.
Available block sizes are 4x4, 4x8 and 8x8 pixels. The size of the block is determined during the encoding process according with energy of the cell. Blocks are quantized using adaptive vector quantization and the resulting bytecodes are packed into the compressed bitstream.
Binary tree segmentation
Binary tree segmentation is performed by splitting a region in half horizontally or vertically, and then possibly splitting each of the resulting sub-regions in half and so on until the desired segmentation (suitable for quantization) is achieved. The resulting structure is referred to as a "binary tree" because each node which is not a terminal node is being splitted to form two subbranches. The top node of the tree represents the entire strip. Each time a region is split, two new nodes are formed. The terminal node of the tree (i.e. which has no branches) will be encoded using vector quantization and run-length encoding.
Picture header of indeo3 consists of the frame header and the bitstream header.
This header is undocumented in the mentioned patent but its structure is very easy to understand:
|DWORD||frame_number||frame number starting with "0"|
|DWORD||unknown1||seems to be always "0"|
|DWORD||check_sum||checksum used for identifying indeo3 streams. Obtained as follows: frame_number XOR unknown1 XOR frame_size XOR 'FRMH'.|
|DWORD||frame_size||size of frame data in bytes|
A good description of this header is given in the patent mentioned above. The table below explains it with some additional comments:
|WORD||dec_version||always = 0x20||indicates decoder version. Values other than 0x20 seem to be unsupported.|
||frame flags. Bit 0 is the LSB.|
|DWORD||data_size||value of "128" indicates a NULL (sync) frame.||size of the bitstream in bits.|
|BYTE||cb_offset||offset for selecting VQ tables for the modes 1 and 4 (see below).|
|BYTE||reserved1||should be ignored.|
|WORD||checksum||introduced for debugging. Unfortunately all known encoders set this field to "0" making it unusable.|
|WORD||height||height of the coded picture.|
|WORD||width||width of the coded picture.|
|DWORD||y_offset||offset to the data of the luminance plane from the beginning of this header.|
|DWORD||v_offset||offset to the data of the chroma v plane from the beginning of this header.|
|DWORD||u_offset||offset to the data of the chroma u plane from the beginning of this header.|
|DWORD||reserved2||should be ignored.|
|16 bytes||alt_quant||each byte indicates a pair of the VQ tables:
||VQ table set should be used with the modes 1 and 4.|