On2 VP5: Difference between revisions

Revision as of 07:46, 11 April 2007

FourCC: VP50
Company: On2
Samples: http://samples.mplayerhq.hu/V-codecs/VP5/

Format

The aim here is to open this standard with a full description of the bitstream format and decoding process. Contributors from On2 especially encouraged here, but it is anticipated that this section will be completed through reverse engineering.

Please do not submit any copyrighted text or code here.

Introduction

VP5 uses unidirectional ("P-frame") and intra-frame (within the current frame) prediction. Entropy coding is performed using range coding and an 8x8 iDCT is used. The format supports dynamic adjustment of encoded video resolution.

Macroblock types

Each video frame is composed of an array of 16x16 macroblocks, just like MPEG-2, MPEG-4 parts 2 and 10. Each MB (macroblock) takes one of the following modes ("MV" means "motion vector"):

INTRA (1): Intra MB
INTER_NOVEC_PF (0): Inter MB, null MV, previous frame reference
INTER_DELTA_PF (2): Inter MB, differential MV, previous frame reference
INTER_4V (7): Inter MB, four MVs, previous frame reference
INTER_V1_PF (3): Inter MB, MV 1, previous frame reference
INTER_V2_PF (4): Inter MB, MV 2, previous frame reference
INTER_NOVEC_GF (5): Inter MB, null MV, golden frame reference
INTER_DELTA_GF (6): Inter MB, differential MV, golden frame reference
INTER_V1_GF (8): Inter MB, MV 1, golden frame reference
INTER_V2_GF (9): Inter MB, MV 2, golden frame reference

Frame Header

Range coding commences at the very begining of this frame header.

Syntax	Number of bits	Probability	Semantics
frame_mode	1	128	0 signifies an intra frame
value0	1	128	unused
qp	6	128	Quantization parameter valid range 0..63
if (frame_mode == 0) {			0 signifies intra frame
version_minor	8	128	should be 0
version_major	5	128	should be 5
value3	2	128	unknown
interlace	1	128	true (1) means interlace will be used
dim_y	8	128	Macroblock height of video
dim_x	8	128	Macroblock width of video
render_y	8	128	Displayed macroblock height of video
render_x	8	128	Displayed macroblock width of video
value4	2	128	unknown
}

If dim_x or dim_y have values different from the previous intra frame, then the resolution of the encoded image has changed.

Model for macroblock type parsing

This section allows to calculate a model used to parse macroblock type. This model is stored in the mb_type_model[3][10][10] table. To calculate this model, an intermediate table (mb_types_stats[3][10][2]) is extracted from the input stream. The macroblock type model is useful only for non-key frames (ie. frames which can contain macroblock type other than intra). So this procedure is not used for intra frame, which instead reset the value of mb_types_stats to the content of def_mb_types_stats.

So first here is how the input stream must be parsed:

Syntax	Number of bits	Probability	Semantics
for (ctx=0; ctx<3; ctx ) {
if (flag1) {	1	174
index	4	128
mb_types_stats[ctx] = pre_def_mb_type_stats[index][ctx]
}
if (flag2) {	1	254
for (type=0; type<10; type ) {
for(i=0; i<2; i ) {
if (flag3) {	1	205
sign	1	128
if (flag4) {	1	171
if (flag5) {	1	199
delta	7	128	quater of delta...
} else if (flag6) {	1	140
delta = 12
} else if (flag7) {	1	125
delta = 16
} else if (flag8) {	1	104
delta = 20
} else {
delta = 24
}
} else {
if (flag9) {	1	83
delta = 4
} else {
delta = 8
}
}
if (sign) {
delta = -delta
}
mb_types_stats[ctx][type][i] = delta
}
}
}
}
}

Then mb_type_model is derived from mb_types_stats. The first index of mb_type_model is the number of predictors (0, 1 or 2). The second index is the type of the previous macroblock. A probablity is associated to each value of the third index as follow:

0 : probability for MB type to be the same as previous MB type
1 : probability for MB type to be in {INTER_NOVEC_PF, INTER_DELTA_PF, INTER_V1_PF, INTER_V2_PF}
2 : probability for MB type to be in {INTER_NOVEC_PF, INTER_DELTA_PF} knowing it's in {INTER_NOVEC_PF, INTER_DELTA_PF, INTER_V1_PF, INTER_V2_PF}
3 : probability for MB type to be in {INTRA, INTER_4V} knowing it's in {INTRA, INTER_NOVEC_GF, INTER_DELTA_GF, INTER_4V, INTER_V1_GF, INTER_V2_GF}
4 : probability for MB type to be in {INTER_NOVEC_PF} knowing it's in {INTER_NOVEC_PF, INTER_DELTA_PF}
5 : probability for MB type to be in {INTER_V1_PF} knowing it's in {INTER_V1_PF, INTER_V2_PF}
6 : probability for MB type to be in {INTRA} knowing it's in {INTRA, INTER_4V}
7 : probability for MB type to be in {INTER_NOVEC_GF, INTER_DELTA_GF} knowing it's in {INTER_NOVEC_GF, INTER_DELTA_GF, INTER_V1_GF, INTER_V2_GF}
8 : probability for MB type to be in {INTER_NOVEC_GF} knowing it's in {INTER_NOVEC_GF, INTER_DELTA_GF}
9 : probability for MB type to be in {INTER_V1_GF} knowing it's in {INTER_V1_GF, INTER_V2_GF}

mb_type_model[ctx][type][0] is calculated with the following formula:

 255 - (255 * mb_types_stats[ctx][type][0]) / (1   mb_types_stats[ctx][type][0]   mb_types_stats[ctx][type][1])

All the other probabilities are conditional probabilities noted P(A|B) and are calculated with:

 1   255 * P(A) / (1 P(B))

For each macroblock type, it's probability is noted P(A) and calculated with:

 100 * mb_types_stats[ctx][type][1]

with one exception: when calculating mb_type_model[ctx][type][i], P(type) is 0.

Models for macroblock type parsing

This header contains various models used to parse motion vectors. It is only present in non-key frames. For key frames, all those modeles are filled with 0x80 instead, except vector_model_pdi[comp][0] which is filled with 0x55.

Syntax	Number of bits	Probability
for (comp=0; comp<2; comp ) {
if (flag1) {	1	vmc_pct[comp][0]
vector_model_dct[comp]	7	128
}
if (flag2) {	1	vmc_pct[comp][1]
vector_model_sig[comp]	7	128
}
if (flag3) {	1	vmc_pct[comp][2]
vector_model_pdi[comp][0]	7	128
}
if (flag4) {	1	vmc_pct[comp][3]
vector_model_pdi[comp][1]	7	128
}
}
for (comp=0; comp<2; comp ) {
for (node=0; node<7; node ) {
if (flag5) {	1	vmc_pct[comp][4 node]
vector_model_pdv[comp][node]	7	128
}
}
}

All the value in thoses models must be multiplied by 2. Those models can't contain a 0 value, so every 0 must then be replaced by a 1.

Models for DCT coefficient parsing

This header contains various models used to parse DCT coefficient. Theses parsing are using a default_probability table which is initially filled with 0x80. The value read from the bitstream must be multiplied by 2. If the value is 0, it must be replaced by 1. The last two models are derived from the first two.

DC coefficient model

Syntax	Number of bits	Probability
for (pt=0; pt<2; pt ) {
for (node=0; node<11; node ) {
if (flag1) {	1	dccv_pct[pt][node]
coeff_model_dccv[pt][node] = default_probability[node]	7	128
} else if (frame_mode == 0) {
coeff_model_dccv[pt][node] = default_probability[node]
}
}
}

AC coefficient model

The value read from the bitstream must be multiplied by 2. If the value is 0, it must be replaced by 1.

Syntax	Number of bits	Probability
for (ct=0; ct<3; ct ) {
for (pt=0; pt<2; pt ) {
for (cg=0; cg<6; cg ) {
for (node=0; node<11; node ) {
if (flag1) {	1	ract_pct[ct][pt][cg][node]
coeff_model_ract[pt][ct][cg][node] = default_probability[node]	7	128
} else if (frame_mode == 0) {
coeff_model_ract[pt][ct][cg][node] = default_probability[node]
}
}
}
}
}

DC coefficient coding type model

Every value in this model is calculated with the folowing formula:

 coeff_model_dcct[pt][ctx][node] = ceil((coeff_model_dccv[pt][node] * dccv_lc[node][ctx][0]) / 256)
                                     dccv_lc[node][ctx][1];

All the values in this model must be clipped between 1 and 254.

AC coefficient coding type model

Every value in this model is calculated with the folowing formula:

 coeff_model_act[pt][ct][cg][ctx][node] = ceil((coeff_model_ract[pt][ct][cg][node] * ract_lc[ct][cg][node][ctx][0]) / 256)
                                            ract_lc[ct][cg][node][ctx][1];

All the values in this model must be clipped between 1 and 254.

Macroblocks

Here starts the main macroblocks parsing loop. Macroblocks are coded from left to right and from top to bottom of the final picture.

MB type

For non-key frames, MB parsing begin with the decoding of the type of the MB.

First we need to get the ctx value from vector_predictor with frame_type set to FRAME_PREVIOUS. Then we select the proper model which is mb_type_model[ctx][prev_type] (prev_type is the type of the MB which was parsed just before this one). The MB type parsing itself is done according to the selected mb_type_model which defines a binary tree of probabilities. This tree is described right below. In this tree, each [x] means get 1 bit in the input stream using mb_type_model[x] as probability for this bit. If bit is 0, go to the top branch, if it's 1, go to the bottom branch. The leaf indicate the final type of this MB.

                                        -- INTER_NOVEC_PF
                                       |
                              -- [4] -- 
                             |         |
                             |          -- INTER_DELTA_PF
                             |
                    -- [2] -- 
                   |         |
                   |         |          -- INTER_V1_PF
                   |         |         |
                   |          -- [5] -- 
                   |                   |
                   |                    -- INTER_V2_PF
                   |
          -- [1] -- 
         |         |
         |         |                    -- INTRA
         |         |                   |
         |         |          -- [6] -- 
         |         |         |         |
         |         |         |          -- INTER_4V
         |         |         |
         |          -- [3] -- 
         |                   |
         |                   |                    -- INTER_NOVEC_GF
         |                   |                   |
         |                   |          -- [8] -- 
         |                   |         |         |
         |                   |         |          -- INTER_DELTA_GF
         |                   |         |
         |                    -- [7] -- 
         |                             |
         |                             |          -- INTER_V1_GF
         |                             |         |
         |                              -- [9] -- 
         |                                       |
         |                                        -- INTER_V2_GF
         |
-- [0] -- 
         |
          -- prev_type

Motion vector

The motion vector applies to all the blocks in the MB except in the case of four motion vectors There are different kind of MV decoding depending on MB type :

Straight vector candidate

Apply to the following MB types: INTER_V1_PF, INTER_V2_PF, INTER_V1_GF, INTER_V2_GF

The motion vector is the first (for V1) or second (for V2) candidate from vector_predictor with frame_type set either to FRAME_PREVIOUS (for PF) or to FRAME_GOLDEN (for GF).

Delta vector

Apply to the following MB types: INTER_DELTA_PF, INTER_DELTA_GF

Default delta, di1, di2 and neg value is 0. You then need to apply the following parsing first with comp = 0 (for component x of the vector) and then with comp = 1 (for component y of the vector):

Syntax	Number of bits	Probability	Semantics
if (flag1) {	1	vector_model_dct[comp]
neg	1	vector_model_sig[comp]
di1	1	vector_model_pdi[comp][0]
di2	1	vector_model_pdi[comp][1]
delta			here a binary tree parsing happens (see below)
}

The delta parsing itself is done according to vector_model_pdv[comp] which defines a binary tree of probabilities. This tree is described right below. In this tree, each [x] means get 1 bit in the input stream using vector_model_pdv[comp][x] as probability for this bit. If bit is 0, go to the top branch, if it's 1, go to the bottom branch. The leaf indicate the final value of delta.

                              -- 0
                    -- [2] -- 
                   |          -- 1
          -- [1] -- 
         |         |          -- 2
         |          -- [3] -- 
         |                    -- 3
-- [0] -- 
         |                    -- 4
         |          -- [5] -- 
         |         |          -- 5
          -- [4] -- 
                   |          -- 6
                    -- [6] -- 
                              -- 7

When the value of delta, di1, di2 and neg are parsed, the value of the current vector component can be calculated this way:

 component = (1-2*neg) * (di1 | (di2 << 1) | (delta << 2))

Four motion vectors

Apply to the following MB types: INTER_4V

For this type of MB each of the 4 luma block of the MB use a different MV. So each of this 4 luma block has it's own block type. This block type can only be one which refers to PREVIOUS_FRAME (ie. INTER_V1_PF, INTER_V2_PF, INTER_DELTA_PF, INTER_NOVEC_PF).

Those 4 block types should be parsed this way:

Syntax	Number of bits	Probability
for (block=0; block<4; block ) {
type[block]	2	128
if (type[block] != INTER_NOVEC_PF) {
type[block]
}
}

Then for each of the 4 luma blocks it's MV is calculated the same way as if it were a full MB, according to it's block type.

The MV of the 2 chroma block is then calculated as the average of the 4 luma MV.

Zero vector

Apply to all the other MB types.

This is simply a null vector.

MB coefficients

This is the real content of the MB, ie. the data which will be feeded to the Inverse DCT.

This procedure is repeated 6 times (for each block in the MB, with block_num being the num of the block in the MB between 0 and 5). It uses several initialized variables :

ct: code type, initialized to 1
pt: plane type, initialized to 0 for luma blocks and 1 for chroma blocks
ctx_last: initialized to the maximum coeff_idx of previous block on the same row and normalized to a maximal value of 24 (initialized to 24 at the start of a row).
above_block_ndc: initialized to 0 if the DC coefficient of the block just above the current one is null, 1 else
bi: initialized to 0 for blocks 0 and 1, 1 for blocks 2 and 3, 2 for block 4, and 3 for block 5
coeff_ctx: this array is filled with 0 at the begining of each macroblock row

 for (coeff_idx=0; coeff_idx<64; coeff_idx  ) {
     /* select the correct models */
     if (coeff_idx == 0) {  /* DC coeff */
         ctx = 6*coeff_ctx[bi][0]   above_block_ndc;
         m1 = coeff_model_dccv[pt];
         m2 = coeff_model_dcct[pt][ctx];
     } else {               /* AC coeff */
         int cg = coeff_groups[coeff_idx];
         ctx = coeff_ctx[bi][coeff_idx];
         m1 = coeff_model_ract[pt][ct][cg];
         m2 = cg > 2 ? model : coeff_model_act[pt][ct][cg][ctx];
     }
     
     /*
      * Here applies a binary tree parssing as defined below.
      */
     
     block_coeff[block_num][coeff_idx] = (coeff ^ -sign)   sign;
     coeff_ctx[bi][coeff_idx] = cctx;
 }
 
 if (coeff_idx < ctx_last) {
     for (i=coeff_idx; i<=ctx_last; i  )
         coeff_ctx[bi][i] = 5;
 }

The following binary tree parsing allows to extract coefficients of the block plus other parameters useful for the decoding procedure itself. In this tree, each m1[x] or m2[x] means get 1 bit in the input stream using m1[x] or m2[x] as probability for this bit. If bit is 0, go to the top branch, if it's 1, go to the bottom branch. The leaf indicate the value extracted at the end of this tree parsing.

                    -- end of this block (break out of the loop)
                   |
          - m2[1] - 
         |         |
         | if !ct  |
          --------- -- ct=0, cctx=0
         |
- m2[0] - 
         |
         |          -- ct=1, cctx=1, sign=get_bit(), coeff=1
         |         |
          - m2[2] - 
                   |                    -- ct=2, cctx=2, sign=get_bit(), coeff=2
                   |                   |
                   |          - m2[4] - 
                   |         |         |          -- ct=2, cctx=3, sign=get_bit(), coeff=3
                   |         |         |         |
                   |         |          - m2[5] - 
                   |         |                   |
                   |         |                    -- ct=2, cctx=3, sign=get_bit(), coeff=4
                    - m2[3] - 
                             |
                             |
                             |                    -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(0)
                             |                   |
                             |          - m1[7] - 
                             |         |         |
                             |         |          -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(1)
                             |         |
                              - m1[6] - 
                                       |
                                       |                    -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(2)
                                       |                   |
                                       |          - m1[9] - 
                                       |         |         |
                                       |         |          -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(3)
                                       |         |
                                        - m1[8] - 
                                                 |
                                                 |          -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(4)
                                                 |         |
                                                  - m1[10]  
                                                           |
                                                            -- ct=2, cctx=4, sign=get_bit(), coeff=coeff_parse(5)

The coeff_parse algorithm used by the previous binary tree parsing takes one parameter called idx and behave like this :

Syntax	Number of bits	Probability
coeff = coeff_bias[idx]
for (i=coeff_bit_length[idx]; i>=0; i--) {
v	1	coeff_parse_table[idx][i]
coeff = v << i
}

Algorithms

Entropy Coding

Described here is the decoding process for the arithmetically-coded (AC) parts of the bitstream. VP5 uses a 16-bit range coding scheme to code binary symbols.

The AC decoder maintains three state variables: code, mask and high.

Initialization

At initialization, the first two bytes of the AC bitstream are shifted into code. The variable high is set to 0xff00. The variable mask is set to 0xffff.

Decoding a Binary Value

Each binary symbol has an associated probability p in the range 0 to 0xff.

A threshold, t, is computed thus:

t = 0x100 ( 0xff00

On2 VP5: Difference between revisions

Revision as of 07:46, 11 April 2007

Contents

Format

Introduction

Macroblock types

Frame Header

Model for macroblock type parsing

Models for macroblock type parsing

Models for DCT coefficient parsing

DC coefficient model

AC coefficient model

DC coefficient coding type model

AC coefficient coding type model

Macroblocks

MB type

Motion vector

Straight vector candidate

Delta vector

Four motion vectors

Zero vector

MB coefficients

Algorithms

Entropy Coding

Initialization

Decoding a Binary Value

Navigation menu

On2 VP5: Difference between revisions

Revision as of 07:46, 11 April 2007

Format

Introduction

Macroblock types

Frame Header

Model for macroblock type parsing

Models for macroblock type parsing

Models for DCT coefficient parsing

DC coefficient model

AC coefficient model

DC coefficient coding type model

AC coefficient coding type model

Macroblocks

MB type

Motion vector

Straight vector candidate

Delta vector

Four motion vectors

Zero vector

MB coefficients

Algorithms

Entropy Coding

Initialization

Decoding a Binary Value

Navigation menu

Search