# On2 VP5

- FourCC: VP50
- Company: On2
- Samples: http://samples.mplayerhq.hu/V-codecs/VP5/

## Format

The aim here is to open this standard with a full description of the bitstream format and decoding process. Contributors from On2 especially encouraged here, but it is anticipated that this section will be completed through reverse engineering.

Please do not submit any copyrighted text or code here.

### Introduction

VP5 uses unidirectional ("P-frame") and intra-frame (within the current frame) prediction. Entropy coding is performed using range coding and an 8x8 iDCT is used. The format supports dynamic adjustment of encoded video resolution.

### Macroblock types

Each video frame is composed of an array of 16x16 macroblocks, just like MPEG-2, MPEG-4 parts 2 and 10. Each MB (macroblock) takes one of the following modes ("MV" means "motion vector"):

- INTRA (1): Intra MB
- INTER_NOVEC_PF (0): Inter MB, null MV, previous frame reference
- INTER_DELTA_PF (2): Inter MB, differential MV, previous frame reference
- INTER_4V (7): Inter MB, four MVs, previous frame reference
- INTER_V1_PF (3): Inter MB, MV 1, previous frame reference
- INTER_V2_PF (4): Inter MB, MV 2, previous frame reference
- INTER_NOVEC_GF (5): Inter MB, null MV, golden frame reference
- INTER_DELTA_GF (6): Inter MB, differential MV, golden frame reference
- INTER_V1_GF (8): Inter MB, MV 1, golden frame reference
- INTER_V2_GF (9): Inter MB, MV 2, golden frame reference

### Frame Header

Range coding commences at the very begining of this frame header.

Syntax | Number of bits | Probability | Semantics |
---|---|---|---|

frame_mode | 1 | 128 | 0 signifies an intra frame |

value0 | 1 | 128 | unused |

qp | 6 | 128 | Quantization parameter valid range 0..63 |

if (frame_mode == 0) { | 0 signifies intra frame | ||

version_minor | 8 | 128 | should be 0 |

version_major | 5 | 128 | should be 5 |

value3 | 2 | 128 | unknown |

interlace | 1 | 128 | true (1) means interlace will be used |

dim_y | 8 | 128 | Macroblock height of video |

dim_x | 8 | 128 | Macroblock width of video |

render_y | 8 | 128 | Displayed macroblock height of video |

render_x | 8 | 128 | Displayed macroblock width of video |

value4 | 2 | 128 | unknown |

} |

If dim_x or dim_y have values different from the previous intra frame, then the resolution of the encoded image has changed.

### Model for macroblock type parsing

This section allows to calculate a model used to parse macroblock type. This model is stored in the mb_type_model[3][10][10] table. To calculate this model, an intermediate table (mb_types_stats[3][10][2]) is extracted from the input stream. The macroblock type model is useful only for non-key frames (ie. frames which can contain macroblock type other than intra). So this procedure is not used for intra frame, which instead reset the value of mb_types_stats to the content of def_mb_types_stats.

So first here is how the input stream must be parsed:

Syntax | Number of bits | Probability | Semantics |
---|---|---|---|

for (ctx=0; ctx<3; ctx ) { | |||

if (flag1) { | 1 | 174 | |

index | 4 | 128 | |

mb_types_stats[ctx] = pre_def_mb_type_stats[index][ctx] | |||

} | |||

if (flag2) { | 1 | 254 | |

for (type=0; type<10; type ) { | |||

for(i=0; i<2; i ) { | |||

if (flag3) { | 1 | 205 | |

sign | 1 | 128 | |

if (flag4) { | 1 | 171 | |

if (flag5) { | 1 | 199 | |

delta | 7 | 128 | quater of delta... |

} else if (flag6) { | 1 | 140 | |

delta = 12 | |||

} else if (flag7) { | 1 | 125 | |

delta = 16 | |||

} else if (flag8) { | 1 | 104 | |

delta = 20 | |||

} else { | |||

delta = 24 | |||

} | |||

} else { | |||

if (flag9) { | 1 | 83 | |

delta = 4 | |||

} else { | |||

delta = 8 | |||

} | |||

} | |||

if (sign) { | |||

delta = -delta | |||

} | |||

mb_types_stats[ctx][type][i] = delta | |||

} | |||

} | |||

} | |||

} | |||

} |

Then mb_type_model is derived from mb_types_stats. The first index of mb_type_model is the number of predictors (0, 1 or 2). The second index is the type of the previous macroblock. A probablity is associated to each value of the third index as follow:

- 0 : probability for MB type to be the same as previous MB type
- 1 : probability for MB type to be in {INTER_NOVEC_PF, INTER_DELTA_PF, INTER_V1_PF, INTER_V2_PF}
- 2 : probability for MB type to be in {INTER_NOVEC_PF, INTER_DELTA_PF} knowing it's in {INTER_NOVEC_PF, INTER_DELTA_PF, INTER_V1_PF, INTER_V2_PF}
- 3 : probability for MB type to be in {INTRA, INTER_4V} knowing it's in {INTRA, INTER_NOVEC_GF, INTER_DELTA_GF, INTER_4V, INTER_V1_GF, INTER_V2_GF}
- 4 : probability for MB type to be in {INTER_NOVEC_PF} knowing it's in {INTER_NOVEC_PF, INTER_DELTA_PF}
- 5 : probability for MB type to be in {INTER_V1_PF} knowing it's in {INTER_V1_PF, INTER_V2_PF}
- 6 : probability for MB type to be in {INTRA} knowing it's in {INTRA, INTER_4V}
- 7 : probability for MB type to be in {INTER_NOVEC_GF, INTER_DELTA_GF} knowing it's in {INTER_NOVEC_GF, INTER_DELTA_GF, INTER_V1_GF, INTER_V2_GF}
- 8 : probability for MB type to be in {INTER_NOVEC_GF} knowing it's in {INTER_NOVEC_GF, INTER_DELTA_GF}
- 9 : probability for MB type to be in {INTER_V1_GF} knowing it's in {INTER_V1_GF, INTER_V2_GF}

mb_type_model[ctx][type][0] is calculated with the following formula:

255 - (255 * mb_types_stats[ctx][type][0]) / (1 mb_types_stats[ctx][type][0] mb_types_stats[ctx][type][1])

All the other probabilities are conditional probabilities noted P(A|B) and are calculated with:

1 255 * P(A) / (1 P(B))

For each macroblock type, it's probability is noted P(A) and calculated with:

100 * mb_types_stats[ctx][type][1]

with one exception: when calculating mb_type_model[ctx][type][i], P(type) is 0.

### Models for macroblock type parsing

This header contains various models used to parse motion vectors. It is only present in non-key frames. For key frames, all those modeles are filled with 0x80 instead, except vector_model_pdi[comp][0] which is filled with 0x55.

Syntax | Number of bits | Probability | Semantics |
---|---|---|---|

for (comp=0; comp<2; comp ) { | |||

if (flag1) { | 1 | vmc_pct[comp][0] | |

vector_model_dct[comp] | 7 | 128 | |

} | |||

if (flag2) { | 1 | vmc_pct[comp][1] | |

vector_model_sig[comp] | 7 | 128 | |

} | |||

if (flag3) { | 1 | vmc_pct[comp][2] | |

vector_model_pdi[comp][0] | 7 | 128 | |

} | |||

if (flag4) { | 1 | vmc_pct[comp][3] | |

vector_model_pdi[comp][1] | 7 | 128 | |

} | |||

} | |||

for (comp=0; comp<2; comp ) { | |||

for (node=0; node<7; node ) { | |||

if (flag5) { | 1 | vmc_pct[comp][4 node] | |

vector_model_pdv[comp][node] | 7 | 128 | |

} | |||

} | |||

} |

All the value in thoses models must be multiplied by 2. Those models can't contain a 0 value, so every 0 must then be replaced by a 1.

### Models for DCT coefficient parsing

This header contains various models used to parse DCT coefficient. Theses parsing are using a default_probability table which is initially filled with 0x80. The value read from the bitstream must be multiplied by 2. If the value is 0, it must be replaced by 1. The last two models are derived from the first two.

#### DC coefficient model

Syntax | Number of bits | Probability | Semantics |
---|---|---|---|

for (pt=0; pt<2; pt ) { | |||

for (node=0; node<11; node ) { | |||

if (flag1) { | 1 | dccv_pct[pt][node] | |

coeff_model_dccv[pt][node] = default_probability[node] | 7 | 128 | |

} else if (frame_mode == 0) { | |||

coeff_model_dccv[pt][node] = default_probability[node] | |||

} | |||

} | |||

} |

#### AC coefficient model

The value read from the bitstream must be multiplied by 2. If the value is 0, it must be replaced by 1.

Syntax | Number of bits | Probability | Semantics |
---|---|---|---|

for (ct=0; ct<3; ct ) { | |||

for (pt=0; pt<2; pt ) { | |||

for (cg=0; cg<6; cg ) { | |||

for (node=0; node<11; node ) { | |||

if (flag1) { | 1 | ract_pct[ct][pt][cg][node] | |

coeff_model_ract[pt][ct][cg][node] = default_probability[node] | 7 | 128 | |

} else if (frame_mode == 0) { | |||

coeff_model_ract[pt][ct][cg][node] = default_probability[node] | |||

} | |||

} | |||

} | |||

} | |||

} |

#### DC coefficient coding type model

Every value in this model is calculated with the folowing formula:

coeff_model_dcct[pt][ctx][node] = ceil((coeff_model_dccv[pt][node] * dccv_lc[node][ctx][0]) / 256) dccv_lc[node][ctx][1];

All the values in this model must be clipped between 1 and 254.

#### AC coefficient coding type model

Every value in this model is calculated with the folowing formula:

coeff_model_act[pt][ct][cg][ctx][node] = ceil((coeff_model_ract[pt][ct][cg][node] * ract_lc[ct][cg][node][ctx][0]) / 256) ract_lc[ct][cg][node][ctx][1];

All the values in this model must be clipped between 1 and 254.

### Macroblocks

Here starts the main macroblocks parsing loop. Macroblocks are coded from left to right and from top to bottom of the final picture.

#### MB type

For non-key frames, MB parsing begin with the decoding of the type of the MB.

First we need to get the ctx value from vector_predictor with frame_type set to FRAME_PREVIOUS. Then we select the proper model which is mb_type_model[ctx][prev_type] (prev_type is the type of the MB which was parsed just before this one). The MB type parsing itself is done according to the selected mb_type_model which defines a binary tree of probabilities. This tree is described right below. In this tree, each [x] means get 1 bit in the input stream using mb_type_model[x] as probability for this bit. If bit is 0, go to the top branch, if it's 1, go to the bottom branch. The leaf indicate the final type of this MB.

-- INTER_NOVEC_PF | -- [4] -- | | | -- INTER_DELTA_PF | -- [2] -- | | | | -- INTER_V1_PF | | | | -- [5] -- | | | -- INTER_V2_PF | -- [1] -- | | | | -- INTRA | | | | | -- [6] -- | | | | | | | -- INTER_4V | | | | -- [3] -- | | | | -- INTER_NOVEC_GF | | | | | -- [8] -- | | | | | | | -- INTER_DELTA_GF | | | | -- [7] -- | | | | -- INTER_V1_GF | | | | -- [9] -- | | | -- INTER_V2_GF | -- [0] -- | -- prev_type

#### Motion vector

The motion vector applies to all the blocks in the MB except in the case of four motion vectors There are different kind of MV decoding depending on MB type :

##### Straight vector candidate

Apply to the following MB types: INTER_V1_PF, INTER_V2_PF, INTER_V1_GF, INTER_V2_GF

The motion vector is the first (for V1) or second (for V2) candidate from vector_predictor with frame_type set either to FRAME_PREVIOUS (for PF) or to FRAME_GOLDEN (for GF).

##### Delta vector

Apply to the following MB types: INTER_DELTA_PF, INTER_DELTA_GF

Default delta, di1, di2 and neg value is 0. You then need to apply the following parsing first with comp = 0 (for component x of the vector) and then with comp = 1 (for component y of the vector):

Syntax | Number of bits | Probability | Semantics |
---|---|---|---|

if (flag1) { | 1 | vector_model_dct[comp] | |

neg | 1 | vector_model_sig[comp] | |

di1 | 1 | vector_model_pdi[comp][0] | |

di2 | 1 | vector_model_pdi[comp][1] | |

delta | here a binary tree parsing happens (see below) | ||

} |

The delta parsing itself is done according to vector_model_pdv[comp] which defines a binary tree of probabilities. This tree is described right below. In this tree, each [x] means get 1 bit in the input stream using vector_model_pdv[comp][x] as probability for this bit. If bit is 0, go to the top branch, if it's 1, go to the bottom branch. The leaf indicate the final value of delta.

-- 0 -- [2] -- | -- 1 -- [1] -- | | -- 2 | -- [3] -- | -- 3 -- [0] -- | -- 4 | -- [5] -- | | -- 5 -- [4] -- | -- 6 -- [6] -- -- 7

When the value of delta, di1, di2 and neg are parsed, the value of the current vector component can be calculated this way:

component = (1-2*neg) * (di1 | (di2 << 1) | (delta << 2))

##### Four motion vectors

Apply to the following MB types: INTER_4V

For this type of MB each of the 4 luma block of the MB use a different MV. So each of this 4 luma block has it's own block type. This block type can only be one which refers to PREVIOUS_FRAME (ie. INTER_V1_PF, INTER_V2_PF, INTER_DELTA_PF, INTER_NOVEC_PF).

Those 4 block types should be parsed this way:

Syntax | Number of bits | Probability | Semantics |
---|---|---|---|

for (block=0; block<4; block ) { | |||

type[block] | 2 | 128 | |

if (type[block] != INTER_NOVEC_PF) { | |||

type[block] | |||

} | |||

} |

Then for each of the 4 luma blocks it's MV is calculated the same way as if it were a full MB, according to it's block type.

The MV of the 2 chroma block is then calculated as the average of the 4 luma MV.

##### Zero vector

Apply to all the other MB types.

This is simply a null vector.

#### MB coefficients

This is the real content of the MB, ie. the data which will be feeded to the Inverse DCT.

This procedure is repeated 6 times (for each block in the MB, with block_num being the num of the block in the MB between 0 and 5). It uses several initialized variables :

- ct: code type, initialized to 1
- pt: plane type, initialized to 0 for luma blocks and 1 for chroma blocks
- ctx_last: initialized to the maximum coeff_idx of previous block on the same row and normalized to a maximal value of 24 (initialized to 24 at the start of a row).
- above_block_ndc: initialized to 0 if the DC coefficient of the block just above the current one is null, 1 else
- bi: initialized to 0 for blocks 0 and 1, 1 for blocks 2 and 3, 2 for block 4, and 3 for block 5
- coeff_ctx: this array is filled with 0 at the begining of each macroblock row

for (coeff_idx=0; coeff_idx<64; coeff_idx ) { /* select the correct models */ if (coeff_idx == 0) { /* DC coeff */ ctx = 6*coeff_ctx[bi][0] above_block_ndc; m1 = coeff_model_dccv[pt]; m2 = coeff_model_dcct[pt][ctx]; } else { /* AC coeff */ int cg = coeff_groups[coeff_idx]; ctx = coeff_ctx[bi][coeff_idx]; m1 = coeff_model_ract[pt][ct][cg]; m2 = cg > 2 ? model : coeff_model_act[pt][ct][cg][ctx]; } /* * Here applies a binary tree parssing as defined below. */ block_coeff[block_num][coeff_idx] = (coeff ^ -sign) sign; coeff_ctx[bi][coeff_idx] = cctx; } if (coeff_idx < ctx_last) { for (i=coeff_idx; i<=ctx_last; i ) coeff_ctx[bi][i] = 5; }

The following binary tree parsing allows to extract coefficients of the block plus other parameters useful for the decoding procedure itself. In this tree, each m1[x] or m2[x] means get 1 bit in the input stream using m1[x] or m2[x] as probability for this bit. If bit is 0, go to the top branch, if it's 1, go to the bottom branch. The leaf indicate the value extracted at the end of this tree parsing.

-- end of this block (break out of the loop) | - m2[1] - | | | if !ct | --------- -- ct=0, cctx=0 | - m2[0] - | | -- ct=1, cctx=1, sign=get_bit(), coeff=1 | | - m2[2] - | -- ct=2, cctx=2, sign=get_bit(), coeff=2 | | | - m2[4] - | | | -- ct=2, cctx=3, sign=get_bit(), coeff=3 | | | | | | - m2[5] - | | | | | -- ct=2, cctx=3, sign=get_bit(), coeff=4 - m2[3] - | | | -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(0) | | | - m1[7] - | | | | | -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(1) | | - m1[6] - | | -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(2) | | | - m1[9] - | | | | | -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(3) | | - m1[8] - | | -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(4) | | - m1[10] | -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(5)

The coeff_parse algorithm used by the previous binary tree parsing takes one parameter called idx and behave like this :

Syntax | Number of bits | Probability | Semantics |
---|---|---|---|

coeff = coeff_bias[idx] | |||

for (i=coeff_bit_length[idx]; i>=0; i--) { | |||

v | 1 | coeff_parse_table[idx][i] | |

coeff = v << i | |||

} |

## Algorithms

### Entropy Coding

Described here is the decoding process for the arithmetically-coded (AC) parts of the bitstream. VP5 uses a 16-bit range coding scheme to code binary symbols.

The AC decoder maintains three state variables: *code*, *mask* and *high*.

#### Initialization

At initialization, the first two bytes of the AC bitstream are shifted into *code*. The variable *high* is set to 0xff00. The variable *mask* is set to 0xffff.

#### Decoding a Binary Value

Each binary symbol has an associated probability *p* in the range 0 to 0xff.

A threshold, *t*, is computed thus:

*t*= 0x100 ( 0xff00