Indeo 4
- FourCC: IV41
- Company: Intel, then Ligos
- Samples: http://samples.mplayerhq.hu/V-codecs/IV41/
Introduction
Indeo Video Interactive (further IVI) is a proprietary video compression algorithm developed by Intel. IVI is a completely different technology compared to the previous Indeo releases, which were based on vector quantization (see Indeo 2 and Indeo 3). IVI is a block-based interframe compression algorithm similar to MPEG, with the exception that it uses block wavelet transforms instead of the common Discrete Cosine Transform. IVI offers a set of "interactive" features which go beyond the traditional services provided by codecs. These include transparency (chroma-keying), local decoding, scalability and media protection.
There are currently two main versions of IVI: version 4 (further Indeo4) and version 5 (further Indeo5). Version 5 is very similar to version 4, but uses a different, more compact bitstream format. For a description of IVI version 5 see here: Indeo 5.
Although IVI codec software is currently available for Windows, Macintosh (QuickTime) and Linux (Xanim player), it is not well suited to cross-platform playback due to several incompatibilities and speed differences.
The following links provide good user-level information about IVI: [1] [2] [3]
This article focuses on the technical details of Indeo4 necessary to build a decoder. IVI is considered an undocumented algorithm. The author of this article was able to find several patents issued by Intel describing different parts of IVI. You can find all of them in the references below. Unfortunately, the most important parts of the codec, like the bitstream format or the decoding tables, were not described in the patents (or it was difficult to find any). The detailed information was obtained through reverse engineering of binary codecs.
Indeo Video Interactive Version 4 (Indeo4)
Brief description of the coding techniques
Color subsampling
Indeo4 operates internally in the YUV color space. The color format used is YVU 4:1:0 aka YVU9 aka Indeo raw.
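As an illustration of what YVU9 means for a decoder's buffers, the sketch below computes the plane dimensions for a given picture size. The function name `yvu9_plane_sizes` is made up for this example, and rounding the chroma dimensions up for picture sizes that are not a multiple of 4 is an assumption, not something confirmed for Indeo4.

```python
def yvu9_plane_sizes(width, height):
    """Return (width, height) of each plane for a YVU 4:1:0 picture.

    Chroma is subsampled by 4 horizontally and by 4 vertically.
    Rounding up for odd sizes is an assumption made for this sketch.
    """
    chroma_w = (width + 3) // 4
    chroma_h = (height + 3) // 4
    return {
        "Y": (width, height),
        "V": (chroma_w, chroma_h),
        "U": (chroma_w, chroma_h),
    }


def yvu9_frame_bytes(width, height):
    """Total number of 8-bit samples in one raw YVU9 frame."""
    return sum(w * h for w, h in yvu9_plane_sizes(width, height).values())
```

For a 320x240 picture this gives one 320x240 luma plane and two 80x60 chroma planes.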
Spatial compression
Transform coding is used to convert images from the spatial domain to the frequency domain. Indeo4 decomposes an image into square blocks and performs a block wavelet transform in order to obtain frequency coefficients. Quantization, run-length coding and Huffman coding are then performed in order to compress these coefficients. Indeo4 uses the block Haar and block Slant transforms as block transforms. The latest version of Indeo4 (available only under Windows) utilizes the Discrete Cosine Transform (further DCT) as well. Available block sizes are 8x8 and 4x4 pixels.
The decision which transform to use (DCT, Slant or Haar) depends on the quality/speed settings. Although the DCT produces better looking pictures than the Slant or Haar transforms, it usually requires a lot of computations (multiplies). Furthermore, it is unsupported in the Indeo4 decoders for Mac and Linux. Unlike the DCT, both the Haar and Slant transforms can be programmed to use only shifts and adds without multiplies, which makes the coding much faster. In addition, these transforms can be easily parallelized using a SIMD instruction set.
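To make the "shifts and adds only" point concrete, here is a minimal sketch of a generic one-level integer Haar step (the so-called S-transform) on a row of samples. It is not Indeo4's actual block Haar transform, whose exact butterflies and scaling are not described in this article. The function names are invented for the example.

```python
def haar_forward(samples):
    """One-level integer Haar step: averages (low band) and differences (high band).

    Uses only additions, subtractions and a right shift.
    """
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    lows, highs = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        lows.append((a + b) >> 1)   # floor of the average
        highs.append(a - b)         # difference
    return lows, highs


def haar_inverse(lows, highs):
    """Exact inverse of haar_forward, again without multiplications."""
    out = []
    for s, d in zip(lows, highs):
        a = s + ((d + 1) >> 1)      # recovers the bit lost by the shift
        out += [a, a - d]
    return out
```

Every pair of samples is processed independently of the others, which is also why such butterflies map well onto SIMD instructions.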
Temporal compression
Indeo4 uses motion estimation in order to perform temporal compression. It supports intra-coded frames (frames without prediction) and inter-coded frames (delta frames). Delta frames can be coded relative to a previous frame (P frame) or relative to previous and subsequent frames (bi-directional or B frame).
Brief description of the interactive features
Transparency
Indeo4 supports a form of transparency analogous to chroma-keying or blue-screening, allowing foreground video objects to be composited dynamically over a different background: a bitmap or possibly even another video. The encoder analyzes each frame and generates a 1-bit transparency bitmask: a pixel is either transparent or not. This bitmask is then encoded as an additional plane into the bitstream using Huffman coding.
Local decoding
Sometimes an application (a game, for example) needs to display only a part of the decoded video image. In this case, much of the source image doesn't need to be decoded at all. Indeo4 provides a capability called local decode that saves processor resources by decoding images partially. The playback application can tell Indeo4 to decode only a rectangular subregion, called the viewport, of the source video image. The minimum possible size of the local decode viewport is defined during compression, but the display size and location of the viewport can be changed dynamically during playback. During compression, Indeo4 breaks each frame into small pieces called slices (tiles). Each slice is then encoded into the bitstream as a self-contained section that can be easily skipped at decoding time. The decoder decides whether a particular slice should be decoded or not.
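The decision which slices to decode can be sketched as a rectangle intersection test. The example below assumes that slices form a regular grid starting at the top-left corner of the picture and that the viewport is given as (x, y, width, height) in pixels. Both the function name and this representation are assumptions made for illustration.

```python
def slices_in_viewport(pic_w, pic_h, slice_w, slice_h, view):
    """Return (row, column) indices of all slices that overlap the viewport."""
    vx, vy, vw, vh = view
    # Clamp the viewport to the picture area.
    x0, y0 = max(vx, 0), max(vy, 0)
    x1, y1 = min(vx + vw, pic_w), min(vy + vh, pic_h)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies completely outside the picture
    first_col, last_col = x0 // slice_w, (x1 - 1) // slice_w
    first_row, last_row = y0 // slice_h, (y1 - 1) // slice_h
    return [(row, col)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

For a 320x240 picture with 64x64 slices and a 60x40 viewport at (100, 50), only four of the twenty slices need to be decoded. All others can be skipped.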
Scalability
This feature allows Indeo4 to adapt playback to the processor power of the particular machine being used for playback. This works by dividing the image into a number of frequency bands using wavelet decomposition. These bands represent the image at different levels of sharpness. All bands are necessary to perfectly recreate the original image. But if there is not enough processor power available, the decoder can decompress fewer bands of each frame, rather than simply dropping frames. This produces blurry images, but preserves the motion.
Media protection through access keys
Indeo4 allows video to be compressed with an embedded "password" (access key) which must be supplied by the player software for the video to be usable. If no valid password is supplied, the decoder should not decode a protected video clip. Nevertheless, no encryption of the frame data is performed.
Decoder specification
Picture layout
Each Indeo4 video frame is represented by three two-dimensional component planes: one luminance (Y) plane and two chrominance (U and V) planes. Each component plane can be subdivided into several frequency bands using wavelet decomposition. There are two subdivision dispositions:
Profile | Number of luma bands | Number of chroma bands |
---|---|---|
Normal (Scalability mode is off) | 1 | 1 |
Scalable (Scalability mode is on) | 4 | 1 |
Each band is divided into blocks - usually 8x8 pixels for the luminance plane and 4x4 for the chrominance planes.
Blocks are grouped into macroblocks; available macroblock sizes are 16x16, 8x8 and 4x4 pixels.
The relation between planes, macroblocks and blocks is shown in the table below:
Plane | Macroblock size | block size | Number of blocks in a MB |
---|---|---|---|
luma | 16x16 | 8x8 | 4 |
luma | 8x8 | 8x8 | 1 |
chroma | 4x4 | 4x4 | 1 |
Macroblocks have a set of parameters like MB type (intra or inter), quantization delta, motion vector. These parameters are shared between blocks in a macroblock.
One or more rows of macroblocks are grouped into slices. Slices are coded as independent parts of the bitstream and can be easily skipped during decoding if they are not in the view. See the Local decoding feature for a further description. A slice may be as big as an entire component plane or have a size assigned during encoding.
Bitstream organization
The indeo4 bitstream is organized hierarchically and has the following structure:
    Picture header
    Luma plane data
    Chroma V plane data
    Chroma U plane data
    [Transparency plane data]
Transparency plane is only present in the bitstream if transparency mode is enabled.
Each plane data (except the transparency plane) has the following organization:
    Band 0: Band header, Slice1 data, [Slice2 data, ... SliceN data]
    Band 1: Band header, Slice1 data, [Slice2 data, ... SliceN data]
    Band N: Band header, Slice1 data, [Slice2 data, ... SliceN data]
Bands with the numbers 1...N are only present if scalability mode is enabled. Otherwise the bitstream contains only single band data. Each band can have one or more slices. Each slice has the following structure:
    Slice data size
    Macroblocks info data
    Blocks data
"Slice data size" indicates the size of the slice data. "Macroblocks info data" contains information like MB type, quant delta, motion vector for all macroblocks in a particular slice. "Blocks data" contains huffman encoded transform coefficients for all blocks in a slice.
Bitstream format description
The bits of the indeo4 bitstream are in the order LSB to MSB. Bytes are in the sequence "byte 0, 1, 2, ...". Therefore, the first bit of the bitstream is byte 0 bit 0, followed by byte 0 bit 1, up to byte 0 bit 7, and then byte 1 bit 0.
The following functions are used in order to parse the indeo4 bitstream:
    readbits(x)  - reads x bits from the compressed bitstream
    skipbits(x)  - discards x bits from the compressed bitstream
    align2byte() - aligns the bitstream pointer to the byte boundary
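A minimal sketch of these three functions is given below. The class name `BitReader` is invented for the example. The bit order within the stream follows the description above. That the first bit read becomes the least significant bit of a multi-bit value is an assumption, since the text only specifies the order of the bits in the stream.

```python
class BitReader:
    """LSB-first bit reader over a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits from the start of the data

    def readbits(self, n):
        """Read n bits; the first bit read ends up in bit 0 of the result (assumed)."""
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value

    def skipbits(self, n):
        """Discard n bits."""
        self.pos += n

    def align2byte(self):
        """Advance to the next byte boundary (no-op if already aligned)."""
        self.pos = (self.pos + 7) & ~7
```

This bit-by-bit loop favours clarity over speed. A production decoder would normally read whole words and shift.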
Picture header
The picture header of indeo4 has the following format:
- Picture Start Code (PSC) (18 bits)
PSC is an 18-bit word. Its value is 11 1111 1111 1111 1000b (0x3FFF8). It shall be byte aligned.
- Frame type (3 bits)
Indicates how this frame is coded:
Value | Frame type |
---|---|
0 | INTRA (key frame) |
1 | INTER |
2 | INTER (bi-directional ?) |
3 | INTER |
4 | unknown |
5 | NULL (empty frame) |
6 | NULL (empty frame) |
7 | Forbidden |
- Transparency status (1 bit)
The value of "1" indicates that transparency mode is enabled; otherwise it is disabled.
- Marker bit (1 bit)
This bit shall be "0". Some decoder versions generate an error if this bit has value of "1".
- Data size indicator (1 bit)
The value of "1" indicates that the data size field (following this indicator) is present. Otherwise the data size field is not present in the picture header. Data size is considered to be "0" in this case.
- Data size (24 bits, optional)
The size of the bitstream in bytes. Only present in the bitstream if indicated by the "Data size indicator" bit (see above).
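The fields described so far can be parsed as sketched below. This is only an illustration built from the field list above. The helper `make_reader`, the function `parse_header_start` and the returned dictionary keys are invented names, and the convention that the first bit read is the least significant bit of a field is an assumption.

```python
PSC = 0x3FFF8            # picture start code, 18 bits
NULL_FRAME_TYPES = (5, 6)


def make_reader(data):
    """Return a readbits(n) function reading LSB-first from data."""
    pos = 0

    def readbits(n):
        nonlocal pos
        value = 0
        for i in range(n):
            value |= ((data[pos >> 3] >> (pos & 7)) & 1) << i
            pos += 1
        return value

    return readbits


def parse_header_start(data):
    """Parse the picture header fields up to and including the data size."""
    readbits = make_reader(data)
    if readbits(18) != PSC:
        raise ValueError("picture start code not found")
    frame_type = readbits(3)
    if frame_type == 7:
        raise ValueError("forbidden frame type")
    transparency = bool(readbits(1))
    if readbits(1) != 0:
        raise ValueError("marker bit is set")
    # The data size field is only present if its indicator bit is set.
    data_size = readbits(24) if readbits(1) else 0
    return {
        "frame_type": frame_type,
        "transparency": transparency,
        "data_size": data_size,
        "is_null": frame_type in NULL_FRAME_TYPES,
    }
```

Under these assumptions a frame would start with the bytes F8 FF, followed by a byte whose two lowest bits are set.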
Null frames contain nothing but the fields above. The following fields are present only in frames other than "NULL":
- Key lock status (1 bit)
The value of "1" indicates that the present bitstream is protected with an embedded "password" (access key).
- Key lock (32 bits, optional)
Only present in the bitstream if indicated by the "Key lock status" bit (see above). It represents a kind of "hash" used to verify whether the password entered by the user is correct, and thus whether a particular user is authorized to decode the protected clip. For a description of how to use this field see Access key protection.
- Picture size index (3 bits)
This is an index into the table of standard picture sizes:
Index value | Width | Height |
---|---|---|
0 | 640 | 480 |
1 | 320 | 240 |
2 | 160 | 120 |
3 | 704 | 480 |
4 | 352 | 240 |
5 | 352 | 288 |
6 | 176 | 144 |
7 | Custom | Custom |
The value of "7" ("custom picture size") indicates that the size of the picture is explicitly encoded in the picture header using "Picture width ex" and "Picture height ex" fields (see below).
- Picture height ex (16 bits, optional)
Only present in the picture header if the "Picture size index" (see above) has the value of "7". Indicates the custom height of the coded frame, which should be a multiple of 4.
- Picture width ex (16 bits, optional)
Only present in the picture header if the "Picture size index" (see above) has the value of "7". Indicates the custom width of the coded frame, which should be a multiple of 4.
- Slice size flag (1 bit)
The value "0" of this flag indicates that there is only one slice in this frame. Its size is as big as the size of an entire plane. The value "1" indicates custom slice size given using the fields "slice height index" and "slice width index" (see below).
- Slice height index (4 bits, optional)
- Slice width index (4 bits, optional)
These fields specify the slice size explicitly. They are only present in the bitstream if the "Slice size flag" (see above) has the value of "1". The table below shows how to interpret their values:
Index value(both fields) | Corresponding slice height | Corresponding slice width |
---|---|---|
0 | 32 | 32 |
1 | 64 | 64 |
2 | 96 | 96 |
3 | 128 | 128 |
4 | 160 | 160 |
5 | 192 | 192 |
6 | 224 | 224 |
7 | 256 | 256 |
8 | 288 | 288 |
9 | 320 | 320 |
10 | 352 | 352 |
11 | 384 | 384 |
12 | 416 | 416 |
13 | 448 | 448 |
14 | 480 | 480 |
15 | same as picture height | same as picture width |
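The table above follows a simple rule: index values 0 to 14 map to (index + 1) * 32, and 15 selects the full picture dimension. A small sketch, with a made-up function name:

```python
def slice_size(height_index, width_index, pic_width, pic_height):
    """Translate the two 4-bit slice size indexes into (width, height) in pixels."""

    def decode(index, full_size):
        if not 0 <= index <= 15:
            raise ValueError("slice size index must fit into 4 bits")
        # Index 15 means "same as the picture dimension".
        return full_size if index == 15 else (index + 1) * 32

    return decode(width_index, pic_width), decode(height_index, pic_height)
```

Note that the height index precedes the width index in the bitstream, which is why it comes first in the argument list.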
- Subsampling format index (2 bits)
Indicates the format of the coded frame:
Index value | Horizontal subsampling factor | Vertical subsampling factor | Format |
---|---|---|---|
0 | 4 | 4 | YVU9 |
1 | 2 | 2 | YV12 |
2 | 2 | 1 | YUY2(?) |
3 | Custom | Custom | Custom |
The value of "3" ("custom format") indicates the use of a custom format specified by two subsampling factors "Horizontal factor" and "Vertical factor" following this index (see below).
- Horizontal factor (2 bits, optional)
- Vertical factor (2 bits, optional)
These fields specify the horizontal and vertical subsampling factors respectively. They are only present if the "Subsampling format index" (see above) has the value of "3".
Please note: all known Indeo4 decoders support only the YVU9 format. If the "Subsampling format index" field has any value other than "0", an error will be generated! The description of the fields "Horizontal factor" and "Vertical factor" is included for completeness only.
To be continued...
Annex A: Huffman coding
Huffman codes used in Indeo4 are of the form [string of k 1's][0][some additional bits]. The [string of k 1's][0] is the code prefix and the additional bits are the code bits. The same form of Huffman coding was already used in the Indeo 2 codec. Below is an example of such a codebook:
    0x
    10xx
    110xxx
    1110xxxx
    11110xxxxx
    111110xxxxxx
    111111xxxxxxx*
Codebooks like this one can be completely specified by saying for each k how many "additional bits x" there are. Thus the codebook above can be defined using the following short descriptor:
    numRows = 7
    xbits[numRows] = 1, 2, 3, 4, 5, 6, 7*
The asterisk denotes that the "0" at the end of the prefix in the last row is replaced with an "x" bit.
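The expansion of a short descriptor into actual codewords can be sketched as follows. The function reproduces the rows of the example codebook: row k consists of k ones, a terminating zero (omitted in the last row) and xbits[k] additional bits. Codewords are returned as strings of '0' and '1' in the order they appear in the bitstream. The function name and the choice to number symbols sequentially in row order are assumptions of this sketch.

```python
def expand_descriptor(xbits):
    """Expand a codebook descriptor into a list of codeword strings.

    The list index of a codeword serves as its symbol number (assumed).
    """
    num_rows = len(xbits)
    codes = []
    for row, nbits in enumerate(xbits):
        is_last = row == num_rows - 1
        # The last row has no terminating "0" in its prefix.
        prefix = "1" * row + ("" if is_last else "0")
        for value in range(1 << nbits):
            suffix = format(value, "0%db" % nbits) if nbits else ""
            codes.append(prefix + suffix)
    return codes
```

For the descriptor 1, 2, 3, 4, 5, 6, 7 this yields 254 codewords, from "00" up to a run of thirteen ones. How the additional bits map onto multi-bit reads of an LSB-first bitstream is an implementation detail not covered here.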
All Indeo4 codebooks are specified using the short descriptors described above. Because these descriptors are so compact, it is possible to change codebooks several times per frame (for example, each band can have its own codebook that represents its signals more compactly than a predefined one).
Codebook descriptors in the bitstream
Indeo4 huffman codebook descriptor has the following format:
- Codebook selector (3 bits)
Specifies the number of the desired codebook. Values "0...6" select one of the predefined codebooks. The value of "7" indicates that a custom codebook is explicitly specified by the descriptor below.
- Number of rows (4 bits, optional)
Specifies the number of rows in the codebook descriptor (see above). Only present if the "Codebook selector" has the value of "7" (custom codebook).
- X bits (array of 4-bit words, variable length, optional)
Specifies the number of additional "x" bits for each row of the codebook (see above); the array contains "Number of rows" entries. Only present if the "Codebook selector" has the value of "7" (custom codebook).
The pseudocode Decoding huffman descriptors shows how to parse the codebook descriptors properly. For the predefined sets of huffman codebooks for both macroblock and block signals see Predefined huffman codebooks.
Annex B: Run-length coding
Run-length coding in Indeo4 is performed using a predefined set of run-value tables. Each table consists of (skip, value) pairs, where skip is the number of zeros and value is the next non-zero component. The run-value tables are only used for AC coefficients. The DC coefficient (the first coefficient in a transform matrix) is coded using differential coding.
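The way (skip, value) pairs turn back into a coefficient vector can be sketched generically. The actual run-value tables, the scan order and the end-of-block signalling of Indeo4 are not described in this article, so the function below only illustrates the principle. Its name and the convention that index 0 is reserved for the separately coded DC coefficient are assumptions.

```python
def expand_runs(pairs, num_coeffs):
    """Expand (skip, value) pairs into a list of num_coeffs coefficients.

    Index 0 is left at zero for the DC coefficient, which is assumed
    to be reconstructed separately from its differential code.
    """
    coeffs = [0] * num_coeffs
    pos = 1  # first AC position
    for skip, value in pairs:
        pos += skip  # jump over 'skip' zero coefficients
        if pos >= num_coeffs:
            raise ValueError("run exceeds block size")
        coeffs[pos] = value
        pos += 1
    return coeffs
```

For example, the pairs (0, 5) and (2, -3) place 5 directly after the DC position and -3 after two further zeros.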
Annex C: Pseudocode examples
Decoding huffman descriptors
    cb_selector = readbits(3); /* decode codebook number */
    if (cb_selector == 7) {
        /* ok, we have a custom codebook */
        num_rows = readbits(4);
        for (i = 0; i < num_rows; i++)
            xbits[i] = readbits(4);
        /* generate the huffman table using num_rows and xbits[] */
    } else {
        /* select a predefined codebook according to cb_selector */
    }
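For experimentation, the pseudocode above might be turned into runnable Python as follows. The helper `make_reader` and the return convention (a selector plus either a list of x bits or None) are inventions of this sketch. The LSB-first field convention of the reader is an assumption as well.

```python
def make_reader(data):
    """Return a readbits(n) function reading LSB-first from data."""
    pos = 0

    def readbits(n):
        nonlocal pos
        value = 0
        for i in range(n):
            value |= ((data[pos >> 3] >> (pos & 7)) & 1) << i
            pos += 1
        return value

    return readbits


def read_codebook_descriptor(readbits):
    """Parse a codebook descriptor.

    Returns (selector, xbits): xbits is a list for a custom codebook
    (selector 7) and None when a predefined codebook is selected.
    """
    cb_selector = readbits(3)
    if cb_selector != 7:
        return cb_selector, None
    num_rows = readbits(4)
    xbits = [readbits(4) for _ in range(num_rows)]
    return cb_selector, xbits
```

With the bytes 9F 90 01 as input, the function returns selector 7 with the custom descriptor [1, 2, 3]. A single byte 02 selects predefined codebook 2.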
Annex D: Decoding tables
Predefined huffman codebooks
Macroblock huffman tables
The table below shows the predefined set of huffman codebooks for coding macroblocks (especially huffman coded quant delta and motion vector delta signals):
MB_huff_cb_0: numRows = 8; xbits[numRows] = {0, 4, 5, 4, 4, 4, 6, 6};
MB_huff_cb_1: numRows = 12; xbits[numRows] = {0, 2, 2, 3, 3, 3, 3, 5, 3, 2, 2, 2};
MB_huff_cb_2: numRows = 12; xbits[numRows] = {0, 2, 3, 4, 3, 3, 3, 3, 4, 3, 2, 2};
MB_huff_cb_3: numRows = 12; xbits[numRows] = {0, 3, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2};
MB_huff_cb_4: numRows = 13; xbits[numRows] = {0, 4, 4, 3, 3, 3, 3, 2, 3, 3, 2, 1, 1};
MB_huff_cb_5: numRows = 9; xbits[numRows] = {0, 4, 4, 4, 4, 3, 3, 3, 2};
MB_huff_cb_6: numRows = 10; xbits[numRows] = {0, 4, 4, 4, 4, 3, 3, 2, 2, 2};
MB_huff_default: numRows = 12; xbits[numRows] = {0, 4, 4, 4, 3, 3, 2, 3, 2, 2, 2, 2};
Block huffman tables
The table below shows the predefined set of huffman codebooks for coding block signals (huffman coded transform coefficients):
Block_huff_cb_0: numRows = 10; xbits[numRows] = {1, 2, 3, 4, 4, 7, 5, 5, 4, 1};
Block_huff_cb_1: numRows = 11; xbits[numRows] = {2, 3, 4, 4, 4, 7, 5, 4, 3, 3, 2};
Block_huff_cb_2: numRows = 12; xbits[numRows] = {2, 4, 5, 5, 5, 5, 6, 4, 4, 3, 1, 1};
Block_huff_cb_3: numRows = 13; xbits[numRows] = {3, 3, 4, 4, 5, 6, 6, 4, 4, 3, 2, 1, 1};
Block_huff_cb_4: numRows = 11; xbits[numRows] = {3, 4, 4, 5, 5, 5, 6, 5, 4, 2, 2};
Block_huff_cb_5: numRows = 13; xbits[numRows] = {3, 4, 5, 5, 5, 5, 6, 4, 3, 3, 2, 1, 1};
Block_huff_cb_6: numRows = 13; xbits[numRows] = {3, 4, 5, 5, 5, 6, 5, 4, 3, 3, 2, 1, 1};
Block_huff_default: numRows = 9; xbits[numRows] = {3, 4, 4, 5, 5, 5, 6, 5, 5};