Indeo 5: Difference between revisions

From MultimediaWiki
No edit summary
 
(105 intermediate revisions by 13 users not shown)
Line 1: Line 1:
* FOURCCs: IV50
* FOURCCs: IV50
* Company: [[Intel]], then [[Ligos]]
* Company: [[Intel]], then [[Ligos]]
* Samples: http://samples.mplayerhq.hu/V-codecs/IV50/
* Samples: http://ligos.com/videoclips/lions/lion_sif_ind5.zip
* Docs: http://www.ligos.com/pdf_docs/Indeo_doc.pdf
* Docs: http://www.ligos.com/pdf_docs/Indeo_FAQ.pdf
* Patent links: http://www.freepatentsonline.com/5532940.pdf
* Patent links: http://www.patentstorm.us/patents/5532940-description.html


== Introduction ==


== General description ==
Indeo5 is the latest version of Indeo Video Interactive (IVI). For an introduction to this compression algorithm see [[Indeo_4#Introduction|Indeo Video Interactive]].


=== Frame layout ===
== Indeo Video Interactive Version 5 (Indeo5) ==


The general indeo5 frame layout consists of one global header followed by the content of the three YUV planes.
For a description of the coding techniques see [[Indeo_4#Brief description of the coding techniques|Brief description of the coding techniques]].


In this document, the global header is split into 3 parts:
For a description of the interactive features see [[Indeo_4#Brief description of the interactive features|Brief description of the interactive features]].
* [[#Frame_header|Frame header]]: describes the kind of frame (I/P/B)
* [[#GOP_header|GOP header]]: data that is valid for all frames in this GOP (present only in the first (I) frame of the GOP)
* [[#More_header|More header]]: additional data that is valid only for this single frame


Each YUV plane begins with a [[#Plan_header|Plan header]] containing values that are valid only for this single plane.
The indeo5 algorithm is mostly the same as indeo4, with the following differences:


=== Encoding ===
- indeo5 uses a different bitstream format for picture and band headers that allows compressed frames to be stored more compactly.


This codec is based on the Slant transform. Other standard techniques used are Huffman coding and motion compensation.
- indeo5 utilizes only the Slant transform. The Haar transform used in indeo4 was dropped due to its low quality.
 
- indeo5 uses the Daubechies (LeGall) 5/3 wavelet for the subband decomposition/recomposition instead of the Haar wavelet used in indeo4 in order to provide a better quality for the scalability mode.
 
- the bidirectional frames mode seems to have been dropped. No indeo5 encoder supports creating such frames, and the Mac and Xanim decoders contain no code for handling them.
 
- indeo5 performs a partial encryption of the bitstream if a numeric password ("access key") was specified during encoding.
 
== Decoder specification ==
 
Indeo5 has the same picture layout and bitstream organization as indeo4. For a detailed description see [[Indeo_4#Picture layout|Indeo4 picture layout]] and [[Indeo_4#Bitstream organization|Bitstream organization]].


=== Conventions ===
=== Conventions ===
Line 25: Line 39:


Here is the meaning of each column:
Here is the meaning of each column:
* '''size''': The size of this value in bits. Bits are counted in MSB to LSB order. As an example, with the byte 01110000b, reading 3 bits then 5 bits will return 011b then 10000b.
* '''size''': The size of this value in bits. Bits are counted in LSB to MSB order. As an example, with the byte 01110000b, reading 3 bits then 5 bits will return 000b then 01110b. Reading more than 8 bits thus reads as a little-endian value. Think of the get_bits function as filling up the return value from its LSB, using the bits from each byte starting from their LSB.
* '''name''': Kind of variable name, used to reference the value. When a value is named valueX, it generally means we don't know its purpose. Lines named alignmentX mean that the bit reader needs to skip bits until the next byte boundary.
* '''name''': Kind of variable name, used to reference the value. When a value is named valueX, it generally means we don't know its purpose. Lines named alignmentX mean that the bit reader needs to skip bits until the next byte boundary.
* '''condition''': The value is present in the frame only if this condition is matched. No condition means that the value is always present.
* '''condition''': The value is present in the frame only if this condition is matched. No condition means that the value is always present.
* '''nb times''': How many times the value is repeated.
* '''value(s)''': Description of constant values and their meaning.
* '''comments''': Some details about the content of the value. It may also explain that a value is repeated until a certain condition is reached.
* '''comments''': Some details about the content of the value.
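
The LSB-first bit order described in the '''size''' entry can be sketched as a small bit reader. This is only an illustration, not code taken from any indeo5 decoder: the names BitReader, br_init and align_bits are hypothetical, and get_bits merely follows the function name mentioned in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader: bits are taken from each byte starting
 * at its least significant bit, and the result is filled from its LSB. */
typedef struct {
    const uint8_t *buf;  /* input bytes */
    size_t size;         /* number of bytes in buf */
    size_t bit_pos;      /* index of the next bit to read */
} BitReader;

static void br_init(BitReader *br, const uint8_t *buf, size_t size)
{
    br->buf = buf;
    br->size = size;
    br->bit_pos = 0;
}

/* Read n bits (0..32). Bits past the end of the buffer read as zero. */
static uint32_t get_bits(BitReader *br, unsigned n)
{
    uint32_t val = 0;
    assert(n <= 32);
    for (unsigned i = 0; i < n; i++, br->bit_pos++) {
        size_t byte = br->bit_pos >> 3;
        if (byte < br->size) {
            uint32_t bit = (br->buf[byte] >> (br->bit_pos & 7)) & 1u;
            val |= bit << i;
        }
    }
    return val;
}

/* Skip to the next byte boundary (the alignmentX rows in the tables). */
static void align_bits(BitReader *br)
{
    br->bit_pos = (br->bit_pos + 7) & ~(size_t)7;
}
```

With the byte 01110000b from the example above, get_bits(&br, 3) returns 000b and a following get_bits(&br, 5) returns 01110b. A 16-bit read across two bytes returns them as a little-endian value.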
 
=== Picture header ===


Picture header of indeo5 consists of three parts:


== Headers ==
Picture_start_code, frame_type, frame_number
[GOP header]
Frame header


=== Frame header ===
The first two bytes of a frame tell the decoder how the following data should be interpreted. These include three fields:


{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
|- bgcolor="#f0f0f0" |
|- bgcolor="#f0f0f0" |
! size !! name !! condition !! nb times !! comments
! size in bits !! name !! value(s) !! comments
|-
|-
| || frame_flags || || ||
| align="center" | 5 || PSC || always = 0x1F || indeo5 picture start code
* frame_flags & 0x01 => backward predictive
* frame_flags & 0x02 => forward predictive
* frame_flags & 0x04 => null frame
common values:
* 0 => I frame
* 1 => P frame
* 3 => B frame
|-
|-
| 5 || const1 || || || = 0x1F always
| align="center" | || frame_type ||
* 0 => INTRA (key) frame
* 1 => INTER frame
* 2 => droppable INTER frame (scalability mode only)
* 3 => droppable INTER frame
* 4 => NULL frame
* 5...7 are illegal
|| frame type
|-
|-
| 8 || id_in_gop || || || frame number in GOP (0 for I frame)
| align="center" | 8 || frame_number || 0...0xFF || frame number in GOP (0 for I frame)
|}
|}


Null frames contain nothing but this header.
Null frames contain nothing but this header.
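
As a rough sketch, the first two bytes could be split as follows under the LSB-first bit order. The table leaves the size of frame_type blank. The 3 bits used here are an assumption, inferred from the two-byte total (5 + 3 + 8 bits) and from the value range 0...7. The names PicStart, parse_pic_start and the FRAME_* constants are hypothetical.

```c
#include <stdint.h>

/* Frame types as listed in the table (constant names are made up here). */
enum {
    FRAME_INTRA          = 0,
    FRAME_INTER          = 1,
    FRAME_DROPPABLE_SCAL = 2,  /* droppable INTER, scalability mode only */
    FRAME_DROPPABLE      = 3,
    FRAME_NULL           = 4
};

typedef struct {
    unsigned frame_type;
    unsigned frame_number;
} PicStart;

/* Parse the two leading bytes of a frame.
 * Returns 0 on success, -1 on a bad start code or an illegal frame type. */
static int parse_pic_start(const uint8_t hdr[2], PicStart *out)
{
    unsigned psc  = hdr[0] & 0x1F;  /* first 5 bits read: PSC */
    unsigned type = hdr[0] >> 5;    /* next 3 bits (assumed width) */

    if (psc != 0x1F || type > FRAME_NULL)
        return -1;
    out->frame_type   = type;
    out->frame_number = hdr[1];     /* next 8 bits */
    return 0;
}
```

A decoder would stop after this step when frame_type is FRAME_NULL, since such frames carry no further data.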


=== GOP header ===
==== GOP header ====


This header is present in I frames only. The values in this header are valid during the whole GOP starting at this frame.
This header is present in INTRA (key) frames only. It carries general information (e.g. the picture layout) that changes rarely or never during a video sequence. The values in this header are valid for all frames in the GOP.


{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
|- bgcolor="#f0f0f0" |
|- bgcolor="#f0f0f0" |
! size !! name !! condition !! nb times !! comments
! size in bits !! name !! condition !! value(s) !! comments
|-
| align="center" |  8 || <span id="gop_flags">gop_flags</span> || ||
* bit 0 => 1 - [[#gop_hdr_size|gop_hdr_size]] field is present
* bit 1 => subsampling format: 0 - YVU9, 1 - YV12
* bit 2 => unknown meaning
* bit 3 => transparency status?
* bit 4 =>
* bit 5 => access key protection: 1 - enabled
* bit 6 => local decoding: 1 - enabled
* bit 7 =>
|| GOP header flags (bit 0 is the LSB).
|-
| align="center" | 16 || <span id="gop_hdr_size">gop_hdr_size</span> || [[#gop_flags|gop_flags]] & 0x01 || || Size of this header in bytes. Only present in the bitstream if indicated by the [[#gop_flags|gop_flags]] bit 0.
|-
| align="center" | 32 || <span id="lock_word">lock_word</span> || [[#gop_flags|gop_flags]] & 0x20 || ||
Only present in the bitstream if "access key protection" is active (as indicated by the bit 5 of the [[#gop_flags|gop_flags]]). For a description of how to use this field see [[#Access key protection|Access key protection]].
|-
| align="center" |  2 || <span id="slice_size_id">slice_size_id</span> || [[#gop_flags|gop_flags]] & 0x40 ||
* 0 =>  64 x 64
* 1 => 128 x 128
* 2 => 256 x 256
* 3 => unused
|| ID of slice size. Only present if "local decoding mode" is enabled (indicated by the bit 6 of the [[#gop_flags|gop_flags]]).
|-
| align="center" |  2 || <span id="luma_levels">luma_levels</span> || ||
* 0 => no decomposition
* 1 => 1 level
* 2 => 2 levels
* 3 => forbidden
|| Number of wavelet decomposition levels for the luma plane. Number of resulting wavelet subbands can be calculated using the following equation: num_bands = [[#luma_levels|luma_levels]] * 3 + 1.
|-
| align="center" |  1 || <span id="chroma_levels">chroma_levels</span> || ||
* 0 => no decomposition
* 1 => forbidden
|| Number of wavelet decomposition levels for the chrominance planes. The value of "1" is forbidden because no known indeo5 software performs any decomposition of the chrominance planes.
|-
| align="center" |  4 || <span id="pic_size_id">pic_size_id</span> || || || Index into the table of the  [[#Standard_picture_sizes|standard picture sizes]]. If the picture has dimensions not listed in the table then this field contains the value of "15" and the actual picture size will be coded using [[#pic_height|pic_height]] and [[#pic_width|pic_width]] fields.
|-
|-
| align="right" | 8 || <span id="gh_flags">gh_flags</span> || || || [[#gh_flags|gh_flags]] & 0x02 => YV12 (default YVU9)
| align="center" | 13 || <span id="pic_height">pic_height</span> || rowspan="2" | [[#pic_size_id|pic_size_id]] == 15 || || Non-standard picture height.
|-
|-
| align="right" | 16 || <span id="value1">value1</span> || [[#gh_flags|gh_flags]] & 0x01 || ||
| align="center" | 13 || <span id="pic_width">pic_width</span> || || Non-standard picture width.
|-
|-
| align="right" | 32 || <span id="value2">value2</span> || [[#gh_flags|gh_flags]] & 0x20 || ||
| align="center" | variable || <span id="band_info_luma">band_info_luma</span> || || || Array of the [[#Band_info structure|Band_info structures]] describing each luminance band. For a description how to calculate the number of the luminance bands see here: [[#luma_levels|luma_levels]].
|-
|-
| align="right" |  2 || <span id="value3">value3</span> || [[#gh_flags|gh_flags]] & 0x40 || ||
| align="center" |  6-8 || <span id="band_info_chroma">band_info_chroma</span> || || || Array of the [[#Band_info structure|Band_info structures]] describing each chrominance band. Because the chrominance planes are never decomposed by the existing indeo5 software, there is only one band per chrominance plane and therefore only one descriptor of this type.
|-
|-
| align="right" |  3 || <span id="value4">value4</span> || || ||
| align="center" |  3 || <span id="alignment1">alignment1</span> || rowspan="3" | [[#gop_flags|gop_flags]] & 0x08 || always == 0 || Alignment bits. Must be zero.
|-
|-
| align="right" |  4 || <span id="res_id">res_id</span> || || || see [[#Resolution_table|Resolution table]]
| align="center" |  1 || <span id="color_flg">color_flg</span> || || This flag indicates if the [[#transp_color|transp_color]] field is present.
|-
|-
| align="right" | 13 || <span id="height">height</span> || rowspan="2" | [[#res_id|res_id]] == 15 || || frame height
| align="center" | 24 || <span id="transp_color">transp_color</span> || || Transparency fill color.
|-
|-
| align="right" | 13 || <span id="width">width</span> || || frame width
| align="center" | ?? || <span id="alignment2">alignment2</span> || || || Align the bitreader on the next byte.
|-
|-
| align="right" | 6 || <span id="value5">value5</span> || || rowspan="2" | 2*n (n == 1 always?) ||
| align="center" | 8 || <span id="value1">value1</span> || || || Unused.
|-
|-
| align="right" | 2 || <span id="value6">value6</span> || [[#value5|value5]] >> 3 || need to be = 0
| align="center" | 8 || <span id="value2">value2</span> || || || Unused.
|-
|-
| align="right" | 4 || <span id="value7">value7</span> || rowspan="2" | [[#gh_flags|gh_flags]] & 0x08 || ||
| align="center" | 3 || <span id="value3">value3</span> || || || Unused.
|-
|-
| align="right" | 24 || <span id="value8">value8</span> || ||
| align="center" | 4 || <span id="value4">value4</span> || || ||
|-
|-
| align="right" | ?? || <span id="alignment1">alignment1</span> || || || align bits reader on next byte
| align="center" | 1 || <span id="gop_ext_flg">gop_ext_flg</span> || || || This flag indicates if the [[#gop_ext|gop_ext]] field is present.
|-
|-
| align="right" | 24 || <span id="value9">value9</span> || || ||
| align="center" | variable || <span id="gop_ext">gop_ext</span> || [[#gop_ext_flg|gop_ext_flg]] == 1 ||
do { val = getbits(16);
} while(val &0x8000);
|| GOP header extension.
|-
|-
| align="right" | 16 || <span id="value10">value10</span> || [[#value9|value9]] & 0x800000 || || loops while value10 & 0x8000 (probably some kind of VLC ?)
| align="center" | ?? || <span id="alignment3">alignment3</span> || || || Align the bitreader on the next byte.
|}
|}
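
The flag bits and the band-count formula from the table above can be illustrated with a short sketch. It decodes only the gop_flags bits whose meaning the table states, and it marks bit 3 as uncertain just as the table does. The names GopFlags, decode_gop_flags and luma_band_count are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the documented gop_flags bits (bit 0 is the LSB). */
typedef struct {
    bool has_hdr_size;    /* bit 0: gop_hdr_size field is present */
    bool is_yv12;         /* bit 1: 0 - YVU9, 1 - YV12 */
    bool transparency;    /* bit 3: transparency status (uncertain) */
    bool has_lock_word;   /* bit 5: access key protection enabled */
    bool local_decoding;  /* bit 6: local decoding enabled */
} GopFlags;

static GopFlags decode_gop_flags(uint8_t f)
{
    GopFlags g;
    g.has_hdr_size   = (f & 0x01) != 0;
    g.is_yv12        = (f & 0x02) != 0;
    g.transparency   = (f & 0x08) != 0;
    g.has_lock_word  = (f & 0x20) != 0;
    g.local_decoding = (f & 0x40) != 0;
    return g;
}

/* num_bands = luma_levels * 3 + 1; the value 3 is forbidden.
 * Returns -1 for a forbidden or out-of-range value. */
static int luma_band_count(unsigned luma_levels)
{
    if (luma_levels > 2)
        return -1;
    return (int)luma_levels * 3 + 1;
}
```

For example, one decomposition level yields 4 luma bands, which matches the four wavelet bands produced in scalability mode.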


=== More header ===
==== Frame header ====


This header is present in all kinds of frame except null.
This header is present in all frame types except NULL. It is mainly used to transfer a Huffman codebook for the macroblock signals and to provide checksum information for debugging purposes.


{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
|- bgcolor="#f0f0f0" |
|- bgcolor="#f0f0f0" |
! size !! name !! condition !! nb times !! comments
! size in bits !! name !! condition !! value(s) !! comments
|-
| align="right" |  8 || <span id="mh_flags">mh_flags</span> || || ||
|-
| align="right" | 24 || <span id="frame_size">frame_size</span> || [[#mh_flags|mh_flags]] & 0x01 || || total size of frame data
|-
| align="right" | 16 || <span id="value11">value11</span> || [[#mh_flags|mh_flags]] & 0x10 || ||
|-
|-
| align="right" |  8 || <span id="counter1">counter1</span> || rowspan="2" | [[#mh_flags|mh_flags]] & 0x20 || || rowspan="2" | this whole block loops while [[#counter1|counter1]] != 0
| align="center" |  8 || <span id="frame_flags">frame_flags</span> || ||
* bit 0 => 1 - [[#pic_hdr_size|pic_hdr_size]] field is present
* bit 1 =>
* bit 2 =>
* bit 3 =>
* bit 4 => 1 - [[#frm_checksum|frm_checksum]] field is present.
* bit 5 => 1 - [[#frm_hdr_ext|frm_hdr_ext]] field is present.
* bit 6 => 1 - [[#mb_huff_desc|mb_huff_desc]] field is present. Otherwise select the default macroblock huffman codebook.
* bit 7 => 1 - [[#band_data_size|band_data_size]] field is present.
|| Frame flags (bit 0 is the LSB).
|-
|-
| align="right" | 8 || <span id="value12">value12</span> || [[#counter1|counter1]]
| align="center" | 24 || <span id="pic_hdr_size">pic_hdr_size</span> || [[#frame_flags|frame_flags]] & 0x01 || || Size of the entire picture header in bytes. Only present in the bitstream if indicated by the [[#frame_flags|frame_flags]] bit 0.
|-
|-
| align="right" | 3 || <span id="value13">value13</span> || [[#mh_flags|mh_flags]] & 0x40 || ||
| align="center" | 16 || <span id="frm_checksum">frm_checksum</span> || [[#frame_flags|frame_flags]] & 0x10 || || Frame checksum for debugging purposes. Only present in the bitstream if indicated by the [[#frame_flags|frame_flags]] bit 4.
|-
|-
| align="right" | 4 || <span id="counter2">counter2</span> || rowspan="2" | [[#value13|value13]] == 7 || ||
| align="center" | variable || <span id="frm_hdr_ext">frm_hdr_ext</span> || [[#frame_flags|frame_flags]] & 0x20 ||
To skip it, do the following:
do {
    len = getbits(8);
    for (i=0; i < len; i++) skipbits(8);
} while(len);
|| Unknown frame header extension. Its content will be ignored by the known indeo5 decoders. Only present in the bitstream if indicated by the [[#frame_flags|frame_flags]] bit 5.
|-
|-
| align="right" | 4 || <span id="value14">value14</span> || [[#counter2|counter2]] ||
| align="center" | variable || <span id="mb_huff_desc">mb_huff_desc</span> || [[#frame_flags|frame_flags]] & 0x40 || || Macroblock huffman codebook descriptor. Only present in the bitstream if indicated by the [[#frame_flags|frame_flags]] bit 6. For a description of the format of the huffman codebook descriptors see [[Indeo_4#Codebook descriptors in the bitstream|Codebook descriptors in the bitstream]].
|-
|-
| align="right" | 3 || <span id="value15">value15</span> || || ||
| align="center" | 3 || <span id="value5">value5</span> || || || Unused.
|-
|-
| align="right" | ?? || <span id="alignment2">alignment2</span> || || || align bits reader on next byte
| align="center" | ?? || <span id="alignment4">alignment4</span> || || || Align the bitreader on the next byte.
|}
|}
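
The skip loop given for frm_hdr_ext in the table above might look like this once bounds checks are added. The sketch works on whole bytes, which assumes the reader is byte-aligned when the extension starts. The table does not state this. The function name skip_hdr_ext and the SIZE_MAX error convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Skip a header extension made of length-prefixed chunks:
 * each chunk is one length byte followed by that many data bytes,
 * and a zero length byte terminates the extension.
 * Returns the position after the extension, or SIZE_MAX if it is truncated. */
static size_t skip_hdr_ext(const uint8_t *buf, size_t size, size_t pos)
{
    uint8_t len;

    do {
        if (pos >= size)
            return SIZE_MAX;          /* no room for the length byte */
        len = buf[pos++];
        if (len > size - pos)
            return SIZE_MAX;          /* chunk runs past the buffer */
        pos += len;
    } while (len);

    return pos;
}
```

Checking the remaining size before each skip keeps a corrupt length byte from running the reader past the end of the frame.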


=== Plan header ===
=== Band header ===


This header is present at the beginning of every plan.
This header describes a wavelet band.


{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
|- bgcolor="#f0f0f0" |
|- bgcolor="#f0f0f0" |
! size !! name !! condition !! nb times !! comments
! size in bits !! name !! condition !! value(s) !! comments
|-
|-
| align="right" |  8 || <span id="ph_flags">ph_flags</span> || || ||
| align="center" |  8 || <span id="band_flags">band_flags</span> || ||
* bit 0 => 1 - this band is empty (doesn't contain any coded data).
* bit 1 => 1 - motion vector inheritance mode is enabled.
* bit 2 => 1 - qdelta parameter is present.
* bit 3 => 1 - qdelta inheritance mode is enabled.
* bit 4 => 1 - [[#rv_tab_corr|rv_tab_corr]] array is present.
* bit 5 => 1 - [[#band_hdr_ext|band_hdr_ext]] field is present.
* bit 6 => 1 - [[#rv_tab_sel|rv_tab_sel]] field is present. Otherwise use the default rv_table.
* bit 7 => 1 - [[#blk_huff_desc|blk_huff_desc]] field is present. Otherwise use the default block huffman codebook.
|| Band flags (bit 0 is the LSB).
|-
|-
| align="right" | 24 || <span id="plan_size">plan_size</span> || [[#mh_flags|mh_flags]] & 0x80 || || total size of plan data
| align="center" | 24 || <span id="band_data_size">band_data_size</span> || [[#frame_flags|frame_flags]] & 0x80 || || Size of the band data in bytes. Only present in the bitstream if indicated by the [[#frame_flags|frame_flags]] bit 7.
|-
|-
| align="right" | 8 || <span id="counter3">counter3</span> || rowspan="3" | [[#ph_flags|ph_flags]] & 0x10 || || must be < 0x3E
| align="center" | 8 || <span id="num_rv_corr">num_rv_corr</span> || rowspan="2" | [[#band_flags|band_flags]] & 0x10 || || Number of rv_table correction pairs. Must be <= 61. Only present in the bitstream if indicated by the [[#band_flags|band_flags]] bit 4.
|-
|-
| align="right" | 8 || <span id="value16">value16</span> || rowspan="2" | [[#counter3|counter3]] ||
| align="center" | variable || <span id="rv_tab_corr">rv_tab_corr</span> || || Array of rv_table correction pairs. Its size is [[#num_rv_corr|num_rv_corr]] * 2 bytes. Only present in the bitstream if indicated by the [[#band_flags|band_flags]] bit 4.
|-
|-
| align="right" | 8 || <span id="value17">value17</span> ||
| align="center" | 3 || <span id="rv_tab_sel">rv_tab_sel</span> || [[#band_flags|band_flags]] & 0x40 || || Indicates which run-value table should be used for decoding. Only present in the bitstream if indicated by the [[#band_flags|band_flags]] bit 6.
|-
|-
| align="right" | 3 || <span id="value18">value18</span> || [[#ph_flags|ph_flags]] & 0x40 || ||
| align="center" | variable || <span id="blk_huff_desc">blk_huff_desc</span> || [[#band_flags|band_flags]] & 0x80 || || Block huffman codebook descriptor. Only present in the bitstream if indicated by the [[#band_flags|band_flags]] bit 7. For a description of the format of the huffman codebook descriptors see [[Indeo_4#Codebook descriptors in the bitstream|Codebook descriptors in the bitstream]].
|-
|-
| align="right" | 3 || <span id="table1_id">table1_id</span> || [[#ph_flags|ph_flags]] & 0x80 || || see [[#Table_1|Table 1]]
| align="center" | 1 || <span id="checksum_flag">checksum_flag</span> || || || If set [[#band_checksum|band_checksum]] field is present in the bitstream.
|-
|-
| align="right" | 4 || <span id="counter4">counter4</span> || rowspan="2" | [[#table1_id|table1_id]] == 7 || || rowspan="2" | used instead of [[#Table1|Table1]]
| align="center" | 16 || <span id="band_checksum">band_checksum</span> || [[#checksum_flag|checksum_flag]] || || Band checksum for debugging purposes. Only present in the bitstream if indicated by the [[#checksum_flag|checksum_flag]].
|-
|-
| align="right" | 4 || <span id="value19">value19</span> || [[#counter4|counter4]]
| align="center" | 5 || <span id="band_glob_quant">band_glob_quant</span> || || || Global quantization level for this band.
|-
|-
| align="right" | 1 || <span id="value20">value20</span> || || ||
| align="center" | ?? || <span id="alignment5">alignment5</span> || || || Align the bitreader on the next byte.
|-
|-
| align="right" | 16 || <span id="value21">value21</span> || [[#value20|value20]] || ||
| align="center" | variable || <span id="band_hdr_ext">band_hdr_ext</span> || [[#band_flags|band_flags]] & 0x20 ||
To skip it, do the following:
do {
    len = getbits(8);
    for (i=0; i < len; i++) skipbits(8);
} while(getbits(1));
|| Unknown band header extension. Its content will be ignored by the known indeo5 decoders. Only present in the bitstream if indicated by the [[#band_flags|band_flags]] bit 5.
|-
|-
| align="right" | 5 || <span id="value22">value22</span> || || ||
| align="center" | ?? || <span id="alignment6">alignment6</span> || || || Align the bitreader on the next byte.
|}
 
== Scalability mode ==
 
This special feature of Indeo5 allows the decoder to adapt playback to the processor power of the particular machine being used for playback. Indeo5 offers both spatial and temporal scalability. Read more about that technique here: [[Scalable Video Coding]].
 
=== Spatial scalability ===
 
Spatial scalability works by dividing the image into a number of frequency bands using wavelet decomposition. These bands represent the image at a different level of sharpness. All bands are necessary to perfectly recreate the original image. But if there is not enough processor power available, the decoder can decompress fewer bands of each frame, rather than simply dropping frames. This produces blurry images, but preserves the motion.
 
The scalability mode is controlled by the user during encoding. If this mode is disabled the encoder acts like a usual block-based transform compression algorithm: each of the three color planes is processed using the Slant transform, quantization and Huffman coding.
If the scalability mode is enabled the encoder first performs subband decomposition using the [[Discrete Wavelet Transform]] (DWT). Although each color plane could theoretically be decomposed, Indeo5 does so only for the luminance plane. This decomposition results in four wavelet bands, each one-fourth of the original picture size. These bands are then compressed using the Slant transform, quantization and Huffman coding.
 
==== Wavelet transform ====
 
The wavelet used in Indeo5 for decomposition/recomposition purposes is referred to as the CDF 5/3 or LeGall wavelet. It is used, in a slightly different form, in many other compression algorithms such as [[JPEG 2000]] or [[Snow]]. The coefficients for the analysis filters (encoder) are:
 
  h0 = {-1, 2, 6, 2, -1} * 1/8
  h1 = {1, -2, 1} * 1/4
 
where "h0" is the low-pass filter and "h1" is the high-pass filter.
 
The coefficients for the synthesis filters (decoder) are:
 
  h0 = {1, 2, 1} * 1/2
  h1 = {1, 2, -6, 2, 1} * 1/4
 
where "h0" is the low-pass filter and "h1" is the high-pass filter.
 
This wavelet transform has the following advantages:
 
- it allows an integer implementation
 
- a fast algorithm (lifting) exists
 
- it produces better quality images than the Haar wavelet used in [[Indeo 4]] for the same purpose
 
- it allows the perfect reconstruction of the input signal
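
The integer implementation, lifting and perfect reconstruction properties listed above can be demonstrated with a generic reversible 5/3 lifting scheme in the style used by JPEG 2000. This is not the routine used by any Indeo5 codec, whose scaling and rounding may differ. The function names and the symmetric edge handling are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Floor division by a positive divisor (C's "/" truncates toward zero). */
static int floor_div(int a, int d)
{
    return (a >= 0) ? a / d : -((-a + d - 1) / d);
}

/* Forward transform of n samples (n even, n >= 2) into n/2 low-pass (lo)
 * and n/2 high-pass (hi) coefficients. Edges use symmetric extension. */
static void lift53_forward(const int *x, size_t n, int *lo, int *hi)
{
    size_t half = n / 2;
    assert(n >= 2 && n % 2 == 0);
    /* Predict: each odd sample minus the mean of its even neighbours. */
    for (size_t i = 0; i < half; i++) {
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];
        hi[i] = x[2 * i + 1] - floor_div(x[2 * i] + right, 2);
    }
    /* Update: each even sample plus a quarter of the neighbouring details. */
    for (size_t i = 0; i < half; i++) {
        int left = (i > 0) ? hi[i - 1] : hi[0];
        lo[i] = x[2 * i] + floor_div(left + hi[i] + 2, 4);
    }
}

/* Inverse transform: undo the update step, then undo the predict step. */
static void lift53_inverse(const int *lo, const int *hi, size_t n, int *x)
{
    size_t half = n / 2;
    assert(n >= 2 && n % 2 == 0);
    for (size_t i = 0; i < half; i++) {
        int left = (i > 0) ? hi[i - 1] : hi[0];
        x[2 * i] = lo[i] - floor_div(left + hi[i] + 2, 4);
    }
    for (size_t i = 0; i < half; i++) {
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];
        x[2 * i + 1] = hi[i] + floor_div(x[2 * i] + right, 2);
    }
}
```

Because the inverse subtracts exactly the integer terms that the forward pass added, the round trip is lossless despite the rounding. A constant input produces an all-zero high-pass band.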
 
==== Wavelet bands ====
 
The [[#Wavelet transform|Wavelet transform]] produces four wavelet bands whose properties are summarized in the table below:
 
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
|- bgcolor="#f0f0f0" |
! band !! name !! dimensions !! frequency components !! transform
|-
|-
| align="right" | ?? || <span id="alignment3">alignment3</span> || || || align bits reader on next byte
| align="center" | 0 || align="center" | LL ||
* width = pic_width/2
* height = pic_height/2
|| Low freqs in both horizontal and vertical directions || 2D Slant 8x8
|-
|-
| align="right" |  8 || <span id="counter5">counter5</span> || rowspan="4" | [[#ph_flags|ph_flags]] & 0x20 || || rowspan="3" | all of this is repeated as long as [[#value23|value23]] is true
| align="center" |  1 || align="center" | HL ||
* width = pic_width/2
* height = pic_height/2
||
* Low freqs in the horizontal direction
* High freqs in the vertical direction
|| 1D Row Slant
|-
|-
| align="right" |  8 || <span id="skip1">skip1</span> || [[#counter5|counter5]]
| align="center" |  2 || align="center" | LH ||
* width = pic_width/2
* height = pic_height/2
||
* High freqs in the horizontal direction
* Low freqs in the vertical direction
|| 1D Column Slant
|-
|-
| align="right" |  1 || <span id="value23">value23</span> ||
| align="center" |  3 || align="center" | HH ||
|-
* width = pic_width/2
| align="right" | ?? || <span id="alignment4">alignment4</span> || || align bits reader on next byte
* height = pic_height/2
|| High freqs in both horizontal and vertical directions || No transform
|}
|}




== Plan data ==
The transform used to process a particular band is chosen according to its frequency content. The low-frequency image components matter most to visual perception, so the transform is selected to process the low-frequency components more efficiently than the high-frequency ones. For example, band 0 uses the two-dimensional Slant transform because it contains low-frequency components in both the horizontal and vertical directions. Band 1 contains low-frequency components only in the horizontal direction, so it uses the one-dimensional Slant transform applied to each of the 8 rows of an 8x8 block. Similarly, band 2 uses the one-dimensional Slant transform applied to each of the 8 columns of an 8x8 block. Band 3 contains only high-frequency components in both directions, so no transform is applied to its data; this band is coded using quantization and entropy coding only.
 
==== Wavelet recomposition ====
 
The following section describes the wavelet recomposition - the last stage of the indeo5 decoder reconstructing an image from a plurality of [[#Wavelet bands|wavelet bands]]. It receives up to four separate bands (labeled b0-b3) and generates recomposed plane data by performing two-dimensional wavelet synthesis.
 
=== Temporal scalability ===
 
To achieve temporal scalability Indeo5 introduces special droppable frames. The main advantage of such frames is that they can be skipped without damaging the whole video sequence. If there is not enough processor power available, the decoder can decompress fewer frames and thus display the video at a reduced frame rate.
 
 
 
== Planes ==
=== Plane data ===


This is where the actual data is, but this still needs to be reverse engineered :-(
Needs more analysis. Follows plane header.


{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
Line 183: Line 341:
|-
|-
| align="right" | 24 || <span id="value27">value27</span> || [[#value26|value26]] == 0xFF || || plan_data_size = value27
| align="right" | 24 || <span id="value27">value27</span> || [[#value26|value26]] == 0xFF || || plan_data_size = value27
|}
=== Block header ===
Each plane is split into a number of blocks in the x and y directions. The plane contains one of these headers per block, stored one after another.
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
|- bgcolor="#f0f0f0" |
! size !! name !! condition !! nb times !! comments
|-
| align="right" |  1 || <span id="value28">value28</span> || || ||
|-
| align="right" |  vlc || <span id="value29">value29</span> || value28 && plane_state17 || ||
|-
| align="right" |  1 || <span id="value30">value30</span> || !value28 && plane_state12 && plane_state1 || ||
|-
| align="right" |  4 || <span id="value31">value31</span> || !value28 && four_blocks || ||
|-
| align="right" |  1 || <span id="value32">value32</span> || !value28 && !four_blocks || ||
|-
| align="right" |  vlc || <span id="value33">value33</span> || !value28 && plane_state14 && !plane_state13 && (plane_state17 <nowiki>||</nowiki> value31/2) || ||
|-
| align="right" |  vlc || <span id="value34">value34</span> || rowspan=2 | !value28 && !(block_state4 & 2) && !plane_state12 || ||
|-
| align="right" |  vlc || <span id="value35">value35</span> || ||
|}
The 'plane_state' states come from plane parsing; they are yet to be connected to the previous data.
block_state4 is too complicated to explain here, sorry!
=== Block data ===
Follows block header. One of these for each plane that has 'plane_flags&1'. The variable 'run' starts at -1 and carries over from one coded plane to the next. I don't really know what I'm doing with vlc's so the names might not be correct... but their functional description is.
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
|- bgcolor="#f0f0f0" |
! size !! name !! condition !! nb times !! comments
|-
| align="right" |  vlc || <span id="vlc">vlc</span> || || rowspan=4 valign=top | while (vlc != vlcEnd) ||
|-
| align="right" |  vlc || <span id="run_add">run_add</span> || rowspan=3 | vlc == vlcEsc || run += run_add + 1
|-
| align="right" |  vlc || <span id="lindex_lo">lindex_lo</span> || rowspan=2 | lindex = lindex_lo <nowiki>|</nowiki> (lindex_hi<<6)
|-
|-
| align="right" | ?? || ... || || || more data to analyze
| align="right" | vlc || <span id="lindex_hi">lindex_hi</span>
|}
|}


If vlc != vlcEsc then run_add is run_table[vlc], lindex is lindex_table[vlc].
After each loop, stored coefficient is: block[ scan_table[run] ] = level_tables[run][lindex-1].
The values of vlcEnd and vlcEsc are variable, as is the vlc table itself. However, they are all fixed for all the planes in the same block. run_table, lindex_table, scan_table are also fixed-per-block. level_tables is per-plane.


== Annexes ==
== Annexes ==


=== Resolution table ===
=== Standard picture sizes ===


{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
! bgcolor="#f0f0f0" | res_id
! bgcolor="#f0f0f0" | pic_size_id
| 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11
| 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15
|-
|-
! bgcolor="#f0f0f0" | width
! bgcolor="#f0f0f0" | width
| 640 || 320 || 160 || 704 || 352 || 352 || 176 || 240 || 640 || 704 || 80 || 88
| 640 || 320 || 160 || 704 || 352 || 352 || 176 || 240 || 640 || 704 || 80 || 88 || 0 || 0 || 0 || custom
|-
|-
! bgcolor="#f0f0f0" | height
! bgcolor="#f0f0f0" | height
| 480 || 240 || 120 || 224 || 240 || 288 || 144 || 180 || 240 || 240 || 60 || 72
| 480 || 240 || 120 || 224 || 240 || 288 || 144 || 180 || 240 || 240 || 60 || 72 || 0 || 0 || 0 || custom
|}
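
A lookup based on the table above could be sketched like this. Treating the zero-sized entries 12 to 14 as invalid is an assumption, since the table does not say how a decoder should react to them. The names std_width, std_height and resolve_pic_size are hypothetical.

```c
#include <stdint.h>

/* Standard picture sizes indexed by pic_size_id 0..14 (15 means custom). */
static const uint16_t std_width[15] = {
    640, 320, 160, 704, 352, 352, 176, 240, 640, 704, 80, 88, 0, 0, 0
};
static const uint16_t std_height[15] = {
    480, 240, 120, 224, 240, 288, 144, 180, 240, 240, 60, 72, 0, 0, 0
};

/* Resolve the picture size. custom_w and custom_h are the pic_width and
 * pic_height fields, which are only meaningful when pic_size_id == 15.
 * Returns 0 on success, -1 for an out-of-range id or a zero dimension. */
static int resolve_pic_size(unsigned pic_size_id,
                            unsigned custom_w, unsigned custom_h,
                            unsigned *w, unsigned *h)
{
    if (pic_size_id > 15)
        return -1;
    if (pic_size_id == 15) {
        *w = custom_w;
        *h = custom_h;
    } else {
        *w = std_width[pic_size_id];
        *h = std_height[pic_size_id];
    }
    return (*w && *h) ? 0 : -1;
}
```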
 
 
=== Band_info structure ===
 
This structure is a part of the [[#GOP header|GOP header]] and describes a wavelet band. Its size is usually 6 bits but can be extended up to 8 bits if the [[#ext_trans|ext_trans]] field is present. The same structure is used to describe both luminance and chrominance bands.
 
{| border="1" cellpadding="5" style="border-collapse: collapse; border-style: dashed; border-color: #2f6fab;"
|- bgcolor="#f0f0f0" |
! size !! name !! condition !! value(s) !! comments
|-
| align="center" |  1 || <span id="mv_res">mv_res</span> || ||
* 0 - fullpel
* 1 - halfpel
|| Motion vector resolution.
|-
| align="center" |  1 || <span id="mb_size_id">mb_size_id</span> || ||
* 0 => double
* 1 => single
|| Macroblock size factor. The real size of the macroblock is calculated as follows: mb_size = blk_size << ![[#mb_size_id|mb_size_id]], where blk_size is the block size (8 or 4) selected by [[#blk_size_id|blk_size_id]].
|-
| align="center" |  1 || <span id="blk_size_id">blk_size_id</span> || ||
* 0 => 8x8
* 1 => 4x4
|| Block size id.
|-
| align="center" | 1 || <span id="trans_flg">trans_flg</span> || ||
* 0 => standard
* 1 => non-standard
|| If this flag is set, the [[#ext_trans|ext_trans]] field explicitly specifies the transform used to code this band. Otherwise the default transform is used.
|-
| align="center" | 2 || <span id="ext_trans">ext_trans</span> || [[#trans_flg|trans_flg]] != 0 ||
* 0 => 2D Slant
* 1 => Row Slant
* 2 => Column Slant
* 3 => No transform
|| Specifies a transform that should be used instead of the default transform for this band.
|-
| align="center" | 2 || <span id="end_marker">end_marker</span> || || always == 0 || End marker terminating this structure.
|}
|}


Line 416: Line 663:


default is used when !([[#ph_flags|ph_flags]] & 0x80)
default is used when !([[#ph_flags|ph_flags]] & 0x80)
== Games using indeo5 cutscenes ==
* Dino Crisis I: [http://en.wikipedia.org/wiki/Dino_Crisis]
* Lemmings Revolution: [http://en.wikipedia.org/wiki/Lemmings_Revolution]
* Mafia: The City of Lost Heaven [http://en.wikipedia.org/wiki/Mafia:_The_City_of_Lost_Heaven]
* Thief, Parts 1 & 2 : [http://en.wikipedia.org/wiki/Thief_(series)]




[[Category:Undiscovered Video Codecs]]
[[Category:Video Codecs]]
[[Category:Video Codecs]]

Latest revision as of 04:30, 11 November 2010

Introduction

Indeo5 is the latest version of Indeo Video Interactive (IVI). For an introduction to this compression algorithm see Indeo Video Interactive.

Indeo Video Interactive Version 5 (Indeo5)

For a description of the coding techniques see Brief description of the coding techniques.

For a description of the interactive features see Brief description of the interactive features.

Indeo5 algorithm is mostly the same as indeo4 with the following differences:

- indeo5 uses a different bitstream format for picture and band headers that allows compressed frames to be stored more compactly.

- indeo5 utilizes only the Slant transform. The Haar transform used in indeo4 was dropped due to its low quality.

- indeo5 uses the Daubechies (LeGall) 5/3 wavelet for the subband decomposition/recomposition instead of the Haar wavelet used in indeo4 in order to provide a better quality for the scalability mode.

- the bidirectional frame mode seems to have been dropped. No known indeo5 encoder can create such frames, and the Mac and Xanim decoders contain no code for handling them.

- indeo5 performs partial encryption of the bitstream if a numeric password ("access key") was specified during encoding.

Decoder specification

Indeo5 has the same picture layout and bitstream organization as indeo4. For a detailed description see Indeo4 picture layout and Bitstream organization.

Conventions

Headers are described in tables. Each row of a table describes a value which may be read from the frame. Tables and rows are presented in their order of appearance in the frame.

Here is the meaning of each column:

  • size: The size of this value in bits. Bits are counted in LSB to MSB order. As an example, with the byte 01110000b, reading 3 bits then 5 bits will return 000b then 01110b. Reading more than 8 bits thus reads as a little-endian value. Think of the get_bits function as filling up the return value from its LSB, using the bits from each byte starting from their LSB.
  • name: A variable name used to reference the value. When a value is named valueX, its purpose is generally unknown. Rows named alignmentX mean that the bit reader must skip bits until the next byte boundary.
  • condition: The value is present in the frame only if this condition is matched. No condition means that the value is always present.
  • value(s): Description of constant values and their meaning.
  • comments: Some details about the content of the value.
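
The bit order described for the size column can be sketched as a small reader. This is only an illustration of the LSB-first convention stated above, not code from any existing decoder; the names BitReader, get_bits and align_bits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader; all names are illustrative. */
typedef struct {
    const uint8_t *buf;   /* input bytes */
    size_t         size;  /* number of bytes in buf */
    size_t         pos;   /* current read position, in bits */
} BitReader;

/* Read n bits (1..32). The first bit read becomes bit 0 of the result,
 * and bits are taken from each byte starting at its LSB. */
static uint32_t get_bits(BitReader *br, unsigned n)
{
    uint32_t val = 0;
    assert(n >= 1 && n <= 32);
    for (unsigned i = 0; i < n; i++) {
        size_t   byte = br->pos >> 3;
        unsigned bit  = (unsigned)(br->pos & 7);
        assert(byte < br->size);
        val |= (uint32_t)((br->buf[byte] >> bit) & 1u) << i;
        br->pos++;
    }
    return val;
}

/* Skip to the next byte boundary (what the alignmentX rows ask for). */
static void align_bits(BitReader *br)
{
    br->pos = (br->pos + 7) & ~(size_t)7;
}
```

With the byte 01110000b as input, get_bits(&br, 3) followed by get_bits(&br, 5) yields 000b and 01110b, matching the example in the size description.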

Picture header

Picture header of indeo5 consists of three parts:

Picture_start_code, frame_type, frame_number
[GOP header]
Frame header

The first two bytes of a frame tell the decoder how the following data should be interpreted. These include three fields:

size in bits name value(s) comments
5 PSC always = 0x1F indeo5 picture start code
3 frame_type
  • 0 => INTRA (key) frame
  • 1 => INTER frame
  • 2 => droppable INTER frame (scalability mode only)
  • 3 => droppable INTER frame
  • 4 => NULL frame
  • 5...7 are illegal
frame type
8 frame_number 0...0xFF frame number in GOP (0 for I frame)

Null frames don't contain anything else than this header.
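
As a rough sketch, the two-byte start of a frame could be decoded as follows. It assumes the LSB-first bit order from the Conventions section, so PSC occupies the low 5 bits of the first byte and frame_type the high 3 bits. The function, struct and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { IV5_PSC = 0x1F };  /* picture start code from the table above */

enum {                    /* frame_type values from the table above */
    FRAME_INTRA                = 0,
    FRAME_INTER                = 1,
    FRAME_INTER_DROPPABLE_SCAL = 2,
    FRAME_INTER_DROPPABLE      = 3,
    FRAME_NULL                 = 4
};

typedef struct {
    unsigned frame_type;
    unsigned frame_number;
} PicStart;

/* Returns 0 on success, -1 if the buffer is too short, the start code
 * does not match, or the frame type is one of the illegal values 5..7. */
static int parse_pic_start(const uint8_t *buf, size_t size, PicStart *out)
{
    assert(buf && out);
    if (size < 2)
        return -1;
    if ((buf[0] & 0x1F) != IV5_PSC)       /* low 5 bits: PSC */
        return -1;
    out->frame_type = buf[0] >> 5;        /* high 3 bits: frame_type */
    if (out->frame_type > FRAME_NULL)
        return -1;
    out->frame_number = buf[1];           /* second byte: frame_number */
    return 0;
}
```

A caller would stop parsing right after this step when frame_type is FRAME_NULL, since such frames carry nothing else.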

GOP header

This header is present in INTRA (key) frames only. It is used to transfer general information (i.e. picture layout) that changes rarely or never during a video sequence. The values in this header are valid for all frames in the GOP.

size in bits name condition value(s) comments
8 gop_flags
  • bit 0 => 1 - gop_hdr_size field is present
  • bit 1 => subsampling format: 0 - YVU9, 1 - YV12
  • bit 2 => unknown meaning
  • bit 3 => transparency status?
  • bit 4 =>
  • bit 5 => access key protection: 1 - enabled
  • bit 6 => local decoding: 1 - enabled
  • bit 7 =>
GOP header flags (bit 0 is the LSB).
16 gop_hdr_size gop_flags & 0x01 Size of this header in bytes. Only present in the bitstream if indicated by the gop_flags bit 0.
32 lock_word gop_flags & 0x20

Only present in the bitstream if "access key protection" is active (as indicated by the bit 5 of the gop_flags). For a description of how to use this field see Access key protection.

2 slice_size_id gop_flags & 0x40
  • 0 => 64 x 64
  • 1 => 128 x 128
  • 2 => 256 x 256
  • 3 => unused
ID of slice size. Only present if "local decoding mode" is enabled (indicated by the bit 6 of the gop_flags).
2 luma_levels
  • 0 => no decomposition
  • 1 => 1 level
  • 2 => 2 levels
  • 3 => forbidden
Number of wavelet decomposition levels for the luma plane. Number of resulting wavelet subbands can be calculated using the following equation: num_bands = luma_levels * 3 + 1.
1 chroma_levels
  • 0 => no decomposition
  • 1 => forbidden
Number of wavelet decomposition levels for the chrominance planes. The value of "1" is forbidden because no known indeo5 software performs any decomposition of the chrominance planes.
4 pic_size_id Index into the table of the standard picture sizes. If the picture has dimensions not listed in the table then this field contains the value of "15" and the actual picture size will be coded using pic_height and pic_width fields.
13 pic_height pic_size_id == 15 Non-standard picture height.
13 pic_width Non-standard picture width.
variable band_info_luma Array of the Band_info structures describing each luminance band. For how to calculate the number of luminance bands, see luma_levels.
6-8 band_info_chroma Array of the Band_info structures describing each chrominance band. Because the existing indeo5 software never decomposes the chrominance planes, there is only one band per chrominance plane and therefore only one descriptor of this type.
3 alignment1 gop_flags & 0x08 always == 0 Alignment bits. Must be zero.
1 color_flg This flag indicates if the transp_color field is present.
24 transp_color Transparency fill color.
?? alignment2 Align the bitreader on the next byte.
8 value1 Unused.
8 value2 Unused.
3 value3 Unused.
4 value4
1 gop_ext_flg This flag indicates if the gop_ext field is present.
variable gop_ext gop_ext_flg == 1

do { val = getbits(16); } while(val &0x8000);

GOP header extension.
?? alignment3 Align the bitreader on the next byte.
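
The gop_ext loop above can be sketched on a plain byte buffer. This relies on an assumption: if the fields listed between alignment2 and gop_ext (8 + 8 + 3 + 4 + 1 = 24 bits) are always present, gop_ext starts on a byte boundary, and each getbits(16) is then a little-endian 16-bit word under the LSB-first convention. The function name skip_gop_ext is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Skips the GOP header extension starting at buf (assumed byte-aligned).
 * Returns the number of bytes it occupies, or -1 if the buffer ends
 * before a word with bit 15 cleared is found. */
static long skip_gop_ext(const uint8_t *buf, size_t size)
{
    size_t   pos = 0;
    unsigned val;

    assert(buf);
    do {
        if (size - pos < 2)
            return -1;
        /* little-endian 16-bit word, as produced by LSB-first reading */
        val  = buf[pos] | ((unsigned)buf[pos + 1] << 8);
        pos += 2;
    } while (val & 0x8000);   /* bit 15 set: another word follows */
    return (long)pos;
}
```

The bounds check is an addition for safety; the loop in the table itself does not describe any error handling.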

Frame header

This header is present in all kinds of frame except NULL. It's used mainly to transfer a huffman codebook for the macroblock signals and provide checksum information for debugging purposes.

size in bits name condition value(s) comments
8 frame_flags Frame flags (bit 0 is the LSB).
24 pic_hdr_size frame_flags & 0x01 Size of the entire picture header in bytes. Only present in the bitstream if indicated by the frame_flags bit 0.
16 frm_checksum frame_flags & 0x10 Frame checksum for debugging purposes. Only present in the bitstream if indicated by the frame_flags bit 4.
variable frm_hdr_ext frame_flags & 0x20

To skip it, do the following:

do {
   len = getbits(8);
   for (i=0; i < len; i++) skipbits(8);
} while(len);
Unknown frame header extension. Its content will be ignored by the known indeo5 decoders. Only present in the bitstream if indicated by the frame_flags bit 5.
variable mb_huff_desc frame_flags & 0x40 Macroblock huffman codebook descriptor. Only present in the bitstream if indicated by the frame_flags bit 6. For a description of the format of the huffman codebook descriptors see Codebook descriptors in the bitstream.
3 value5 Unused.
?? alignment4 Align the bit reader on the next byte.
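
The frm_hdr_ext skip loop can also be written with bounds checking. The sketch below assumes the extension starts on a byte boundary, which holds if the frame header itself starts on one, because only 8-, 24- and 16-bit fields precede it. The name skip_frm_hdr_ext is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Skips a sequence of length-prefixed chunks terminated by a chunk of
 * length 0. Returns the number of bytes consumed, or -1 if the buffer
 * ends before the terminating zero length is reached. */
static long skip_frm_hdr_ext(const uint8_t *buf, size_t size)
{
    size_t   pos = 0;
    unsigned len;

    assert(buf);
    do {
        if (pos >= size)
            return -1;
        len = buf[pos++];          /* len = getbits(8) */
        if (size - pos < len)
            return -1;
        pos += len;                /* skip len bytes of payload */
    } while (len);
    return (long)pos;
}
```

For example, the bytes 02 AA BB 01 CC 00 describe two chunks followed by the terminator, so six bytes are skipped.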

Band header

This header describes a wavelet band.

size in bits name condition value(s) comments
8 band_flags
  • bit 0 => 1 - this band is empty (doesn't contain any coded data).
  • bit 1 => 1 - motion vector inheritance mode is enabled.
  • bit 2 => 1 - qdelta parameter is present.
  • bit 3 => 1 - qdelta inheritance mode is enabled.
  • bit 4 => 1 - rv_tab_corr array is present.
  • bit 5 => 1 - band_hdr_ext field is present.
  • bit 6 => 1 - rv_tab_sel field is present. Otherwise use the default rv_table.
  • bit 7 => 1 - blk_huff_desc field is present. Otherwise use the default block huffman codebook.
Band flags (bit 0 is the LSB).
24 band_data_size frame_flags & 0x80 Size of the band data in bytes. Only present in the bitstream if indicated by the frame_flags bit 7.
8 num_rv_corr band_flags & 0x10 Number of rv_table correction pairs. Must be <= 61. Only present in the bitstream if indicated by the band_flags bit 4.
variable rv_tab_corr Array of rv_table correction pairs. Its size is num_rv_corr * 2 bytes. Only present in the bitstream if indicated by the band_flags bit 4.
3 rv_tab_sel band_flags & 0x40 Indicates which run-value table should be used for decoding. Only present in the bitstream if indicated by the band_flags bit 6.
variable blk_huff_desc band_flags & 0x80 Block huffman codebook descriptor. Only present in the bitstream if indicated by the band_flags bit 7. For a description of the format of the huffman codebook descriptors see Codebook descriptors in the bitstream.
1 checksum_flag If set, the band_checksum field is present in the bitstream.
16 band_checksum checksum_flag Band checksum for debugging purposes. Only present in the bitstream if indicated by the checksum_flag.
5 band_glob_quant Global quantization level for this band.
?? alignment5 Align the bit reader on the next byte.
variable band_hdr_ext band_flags & 0x20

To skip it, do the following:

do {
   len = getbits(8);
   for (i=0; i < len; i++) skipbits(8);
} while(getbits(1));
Unknown band header extension. Its content will be ignored by the known indeo5 decoders. Only present in the bitstream if indicated by the band_flags bit 5.
?? alignment6 Align the bit reader on the next byte.

Scalability mode

This special feature of Indeo5 allows the decoder to adapt playback to the processor power of the particular machine being used for playback. Indeo5 offers both spatial and temporal scalability. Read more about that technique here: Scalable Video Coding.

Spatial scalability

Spatial scalability works by dividing the image into a number of frequency bands using wavelet decomposition. These bands represent the image at different levels of sharpness. All bands are necessary to perfectly recreate the original image. But if there is not enough processor power available, the decoder can decompress fewer bands of each frame, rather than simply dropping frames. This produces blurry images, but preserves the motion.

The scalability mode is controlled by the user during encoding. If this mode is disabled, the encoder acts like a usual block-based transform compression algorithm: each of the three color planes is processed using the Slant transform, quantization and Huffman coding. If the scalability mode is enabled, the encoder first performs subband decomposition using the Discrete Wavelet Transform (DWT). Although each color plane could theoretically be decomposed, Indeo5 does so only on the luminance plane data. This decomposition results in four wavelet bands, each one-fourth of the original picture size. These bands are then compressed using the Slant transform, quantization and Huffman coding.

Wavelet transform

The wavelet used in Indeo5 for decomposition/recomposition is referred to as the CDF 5/3 or LeGall wavelet. It is used in a slightly different form in many other compression algorithms, such as JPEG 2000 and Snow. The coefficients for the analysis filters (encoder) are:

 h0 = {-1, 2, 6, 2, -1} * 1/8
 h1 = {1, -2, 1} * 1/4

where "h0" is the low-pass filter and "h1" is the high-pass filter.

The coefficients for the synthesis filters (decoder) are:

 h0 = {1, 2, 1} * 1/2
 h1 = {1, 2, -6, 2, 1} * 1/4

where "h0" is the low-pass filter and "h1" is the high-pass filter.

This wavelet transform has the following advantages:

- it allows an integer implementation

- a fast algorithm (lifting) exists

- it produces better quality images than the Haar wavelet used in Indeo 4 for the same purpose

- it allows the perfect reconstruction of the input signal
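
The integer, lifting and perfect-reconstruction properties can be illustrated with a one-dimensional sketch. Note the assumptions: this is the reversible 5/3 lifting scheme in the form used by JPEG 2000, with mirrored borders and even-length input. It is not claimed to match the exact filter scaling, rounding or border handling of Indeo5. The names fwd53, inv53 and floor_div are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Floor division by a positive divisor (C's '/' truncates toward zero). */
static int floor_div(int a, int b)
{
    assert(b > 0);
    int q = a / b;
    return (a % b != 0 && a < 0) ? q - 1 : q;
}

/* Forward transform: splits x[0..n-1] (n even, n >= 2) into n/2
 * low-pass samples (lo) and n/2 high-pass samples (hi). */
static void fwd53(const int *x, size_t n, int *lo, int *hi)
{
    size_t half = n / 2;
    assert(n >= 2 && n % 2 == 0);
    for (size_t i = 0; i < half; i++) {      /* predict step */
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];
        hi[i] = x[2 * i + 1] - floor_div(x[2 * i] + right, 2);
    }
    for (size_t i = 0; i < half; i++) {      /* update step */
        int left = (i > 0) ? hi[i - 1] : hi[0];
        lo[i] = x[2 * i] + floor_div(left + hi[i] + 2, 4);
    }
}

/* Inverse transform: undoes fwd53 exactly by running the two lifting
 * steps in reverse order with opposite signs. */
static void inv53(const int *lo, const int *hi, size_t n, int *x)
{
    size_t half = n / 2;
    assert(n >= 2 && n % 2 == 0);
    for (size_t i = 0; i < half; i++) {      /* undo update: even samples */
        int left = (i > 0) ? hi[i - 1] : hi[0];
        x[2 * i] = lo[i] - floor_div(left + hi[i] + 2, 4);
    }
    for (size_t i = 0; i < half; i++) {      /* undo predict: odd samples */
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];
        x[2 * i + 1] = hi[i] + floor_div(x[2 * i] + right, 2);
    }
}
```

Because the inverse subtracts exactly the integer terms that the forward pass added, the round trip is lossless whatever rounding rule is used, as long as both directions use the same one.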

Wavelet bands

The Wavelet transform produces four wavelet bands whose properties are summarized in the table below:

band name dimensions frequency components transform
0 LL
  • width = pic_width/2
  • height = pic_height/2
Low freqs in both horizontal and vertical directions 2D Slant 8x8
1 HL
  • width = pic_width/2
  • height = pic_height/2
  • Low freqs in the horizontal direction
  • High freqs in the vertical direction
1D Row Slant
2 LH
  • width = pic_width/2
  • height = pic_height/2
  • High freqs in the horizontal direction
  • Low freqs in the vertical direction
1D Column Slant
3 HH
  • width = pic_width/2
  • height = pic_height/2
High freqs in both horizontal and vertical directions No transform


The type of the transform used to process a particular band is chosen according to its frequency content. The low frequency image components are the most important components for visual sensitivity. Therefore the transform is selected so that it can process the low frequency components more efficiently than the high frequency ones. For example, the two-dimensional slant transform is used to process band 0 because it contains the low frequency components in both horizontal and vertical directions. Band 1 contains low frequency components only in the horizontal direction, which is why the one-dimensional slant transform applied to each of the 8 rows in an 8x8 block is used. Similarly, band 2 uses the one-dimensional slant transform applied to each of the 8 columns in an 8x8 block. Band 3 contains only high frequency components in both directions, so no transform is applied to its data. This band is coded using quantization and entropy coding only.

Wavelet recomposition

The following section describes the wavelet recomposition - the last stage of the indeo5 decoder reconstructing an image from a plurality of wavelet bands. It receives up to four separate bands (labeled b0-b3) and generates recomposed plane data by performing two-dimensional wavelet synthesis.

Temporal scalability

In order to achieve temporal scalability, Indeo5 introduces special droppable frames. The main advantage of such frames is that they can be skipped without damaging the whole video sequence. If there is not enough processor power available, the decoder can decompress fewer frames and thus display the video at a reduced frame rate.


Planes

Plane data

Needs more analysis. Follows plane header.

size name condition nb times comments
1 value24
1 value25 ! value24 plan_data_size = value25
8 value26 value25 == 1 plan_data_size = value26
24 value27 value26 == 0xFF plan_data_size = value27

Block header

Each plane is split into a number of blocks in the x and y directions. The plane contains one of these headers per block, stored one after another.

size name condition nb times comments
1 value28
vlc value29 value28 && plane_state17
1 value30 !value28 && plane_state12 && plane_state1
4 value31 !value28 && four_blocks
1 value32 !value28 && !four_blocks
vlc value33 !value28 && plane_state14 && !plane_state13 && (plane_state17 || value31/2)
vlc value34 !value28 && !(block_state4 & 2) && !plane_state12
vlc value35

The 'plane_state' states come from plane parsing; they are yet to be connected to the previous data.

block_state4 is too complicated to explain here, sorry!

Block data

Follows block header. One of these for each plane that has 'plane_flags&1'. The variable 'run' starts at -1 and carries over from one coded plane to the next. I don't really know what I'm doing with vlc's so the names might not be correct... but their functional description is.

size name condition nb times comments
vlc vlc while (vlc != vlcEnd)
vlc run_add vlc == vlcEsc run += run_add + 1
vlc lindex_lo lindex = lindex_lo | (lindex_hi<<6)
vlc lindex_hi


If vlc != vlcEsc then run_add is run_table[vlc], lindex is lindex_table[vlc].

After each loop, stored coefficient is: block[ scan_table[run] ] = level_tables[run][lindex-1].

The values of vlcEnd and vlcEsc are variable, as is the vlc table itself. However, they are all fixed for all the planes in the same block. run_table, lindex_table, scan_table are also fixed-per-block. level_tables is per-plane.

Annexes

Standard picture sizes

pic_size_id 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
width 640 320 160 704 352 352 176 240 640 704 80 88 0 0 0 custom
height 480 240 120 224 240 288 144 180 240 240 60 72 0 0 0 custom
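
The table can be turned into a lookup as in the sketch below. The values are copied verbatim from the table above and have not been checked against any decoder; the names PicSize, std_pic_sizes and lookup_pic_size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t width;
    uint16_t height;
} PicSize;

/* Entries for pic_size_id 0..14, copied from the table above.
 * Ids 12..14 are listed as 0 x 0 there. */
static const PicSize std_pic_sizes[15] = {
    { 640, 480 }, { 320, 240 }, { 160, 120 }, { 704, 224 },
    { 352, 240 }, { 352, 288 }, { 176, 144 }, { 240, 180 },
    { 640, 240 }, { 704, 240 }, {  80,  60 }, {  88,  72 },
    {   0,   0 }, {   0,   0 }, {   0,   0 }
};

/* Returns 1 and fills *out for a table entry. Returns 0 for id 15
 * ("custom"); the caller must then read pic_height and pic_width
 * from the GOP header instead. */
static int lookup_pic_size(unsigned pic_size_id, PicSize *out)
{
    assert(pic_size_id <= 15 && out);
    if (pic_size_id == 15)
        return 0;
    *out = std_pic_sizes[pic_size_id];
    return 1;
}
```

A decoder would probably also want to reject the 0 x 0 entries, but the table does not say how existing software treats them.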


Band_info structure

This structure is a part of the GOP header and describes a wavelet band. Its size is usually 6 bits but can be extended up to 8 bits if the ext_trans field is present. The same structure is used to describe both luminance and chrominance bands.

size name condition value(s) comments
1 mv_res
  • 0 - fullpel
  • 1 - halfpel
Motion vector resolution.
1 mb_size_id
  • 0 => double
  • 1 => single
Macroblock size factor. The real size of the macroblock is calculated as follows: mb_size = blk_size << !mb_size_id, where blk_size is the block size (8 or 4) selected by blk_size_id.
1 blk_size_id
  • 0 => 8x8
  • 1 => 4x4
Block size id.
1 trans_flg
  • 0 => standard
  • 1 => non-standard
If this flag is set, the ext_trans field explicitly specifies the transform used to code this band. Otherwise the default transform is used.
2 ext_trans trans_flg != 0
  • 0 => 2D Slant
  • 1 => Row Slant
  • 2 => Column Slant
  • 3 => No transform
Specifies a transform that should be used instead of the default transform for this band.
2 end_marker always == 0 End marker terminating this structure.
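
A possible way to decode one Band_info structure is sketched below. It takes the next bits of the header already packed LSB-first into an integer (first field in bit 0), and it reads the block size as 8 >> blk_size_id, which is an interpretation of the 8x8 / 4x4 values in the table. The names BandInfo and parse_band_info are hypothetical.

```c
#include <assert.h>

typedef struct {
    int halfpel;    /* mv_res: 0 = fullpel, 1 = halfpel */
    int blk_size;   /* 8 or 4, derived from blk_size_id */
    int mb_size;    /* blk_size << !mb_size_id */
    int transform;  /* ext_trans value, or -1 for the default transform */
} BandInfo;

/* bits holds the upcoming header bits, first field in bit 0.
 * Returns the number of bits consumed (6 or 8), or -1 if the
 * end_marker is not zero. */
static int parse_band_info(unsigned bits, BandInfo *out)
{
    int      used = 0;
    unsigned mb_size_id, blk_size_id, trans_flg, end_marker;

    assert(out);
    out->halfpel = (int)((bits >> used) & 1); used += 1;
    mb_size_id   = (bits >> used) & 1;        used += 1;
    blk_size_id  = (bits >> used) & 1;        used += 1;
    trans_flg    = (bits >> used) & 1;        used += 1;

    out->blk_size = 8 >> blk_size_id;               /* 0 => 8x8, 1 => 4x4 */
    out->mb_size  = out->blk_size << !mb_size_id;   /* 0 => double size */

    out->transform = -1;
    if (trans_flg) {                                /* ext_trans present */
        out->transform = (int)((bits >> used) & 3);
        used += 2;
    }
    end_marker = (bits >> used) & 3;
    used += 2;
    return end_marker ? -1 : used;
}
```

The return value shows the two possible sizes of the structure: 6 bits without ext_trans and 8 bits with it.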

Table 1

table1_id counter4 value19
0 10 1, 2, 3, 4, 4, 7, 5, 5, 4, 1
1 11 2, 3, 4, 4, 4, 7, 5, 4, 3, 3, 2
2 12 2, 4, 5, 5, 5, 5, 6, 4, 4, 3, 1, 1
3 13 3, 3, 4, 4, 5, 6, 6, 4, 4, 3, 2, 1, 1
4 11 3, 4, 4, 5, 5, 5, 6, 5, 4, 2, 2
5 13 3, 4, 5, 5, 5, 5, 6, 4, 3, 3, 2, 1, 1
6 13 3, 4, 5, 5, 5, 6, 5, 4, 3, 3, 2, 1, 1
default 9 3, 4, 4, 5, 5, 5, 6, 5, 5

default is used when !(ph_flags & 0x80)

Games using indeo5 cutscenes

  • Dino Crisis I: [1]
  • Lemmings Revolution: [2]
  • Mafia: The City of Lost Heaven [3]
  • Thief, Parts 1 & 2 : [4]