H.264 Prediction

From MultimediaWiki

Revision as of 09:49, 21 May 2010

This page documents the various prediction methods used in H.264 and related formats such as Sorenson Video 3, RealVideo 4, and On2 VP8.

4x4 Prediction Modes

4x4 prediction modes vary between codecs. H.264 and Sorenson Video 3 use almost the same set, while RealVideo 4 orders the modes differently and some of its modes differ significantly from their H.264 counterparts: they use left predictors where H.264 does not, and down-left predictors that are not used elsewhere.

Vertical

  • H.264: mode 0
  • SVQ3: mode 0
  • RV40: mode 1
  • VP8: not used
    | T0  T1  T2  T3
---------------------
    | T0  T1  T2  T3
    | T0  T1  T2  T3
    | T0  T1  T2  T3
    | T0  T1  T2  T3

Vertical (VP8)

  • H.264: not used
  • SVQ3: not used
  • RV40: not used
  • VP8: mode 2
 LT | T0  T1  T2  T3  T4
------------------------
    |  a   b   c   d
    |  a   b   c   d
    |  a   b   c   d
    |  a   b   c   d

where:

 a = (LT + 2*T0 + T1 + 2) >> 2
 b = (T0 + 2*T1 + T2 + 2) >> 2
 c = (T1 + 2*T2 + T3 + 2) >> 2
 d = (T2 + 2*T3 + T4 + 2) >> 2
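
As a minimal sketch, the four formulas above can be transcribed into Python. The function name `vp8_vertical_4x4` and the argument layout (`lt` for LT, `top` for T0..T4) are illustrative choices for this page, not names from any codec source.

```python
def vp8_vertical_4x4(lt, top):
    """Smoothed vertical prediction; `top` holds T0..T4 (hypothetical layout)."""
    # Prepend LT so every output column can see its left and right neighbour.
    edge = [lt] + list(top[:5])
    # 1-2-1 filter with rounding: edge[i+1] is the predictor above column i.
    row = [(edge[i] + 2 * edge[i + 1] + edge[i + 2] + 2) >> 2 for i in range(4)]
    # All four rows of the block are copies of the filtered row a, b, c, d.
    return [row[:] for _ in range(4)]
```

For example, `vp8_vertical_4x4(8, [16, 4, 12, 20, 40])` fills every row with `[11, 9, 12, 23]`.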

Horizontal

  • H.264: mode 1
  • SVQ3: mode 1
  • RV40: mode 2
  • VP8: not used
    | 
---------------------
 L0 | L0  L0  L0  L0
 L1 | L1  L1  L1  L1
 L2 | L2  L2  L2  L2
 L3 | L3  L3  L3  L3

Horizontal (VP8)

  • H.264: not used
  • SVQ3: not used
  • RV40: not used
  • VP8: mode 3
 LT | 
--------------------
 L0 |  a   a   a   a
 L1 |  b   b   b   b
 L2 |  c   c   c   c
 L3 |  d   d   d   d
 L4 |

where:

 a = (LT + 2*L0 + L1 + 2) >> 2
 b = (L0 + 2*L1 + L2 + 2) >> 2
 c = (L1 + 2*L2 + L3 + 2) >> 2
 d = (L2 + 2*L3 + L4 + 2) >> 2

DC

  • H.264: mode 2
  • SVQ3: mode 2
  • RV40: mode 0
  • VP8: mode 0
    | T0  T1  T2  T3
---------------------
 L0 |  a   a   a   a
 L1 |  a   a   a   a
 L2 |  a   a   a   a
 L3 |  a   a   a   a

where:

if top and left predictors are available
  a = (T0 + T1 + T2 + T3 + L0 + L1 + L2 + L3 + 4) / 8
else if top predictors are available
  a = (T0 + T1 + T2 + T3 + 2) / 4
else if left predictors are available
  a = (L0 + L1 + L2 + L3 + 2) / 4
else
  a = 128

Note that the VP8 reference code does not make any provisions for either or both sets of predictors to be missing.
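
The availability cascade above could be sketched as follows. Passing `None` for an unavailable edge and the name `dc_4x4` are assumptions of this sketch. The divisions are written as right shifts, which give the same result for these non-negative sums.

```python
def dc_4x4(top=None, left=None):
    """Return the DC value `a`; pass None for an edge that is not available."""
    if top is not None and left is not None:
        # Eight predictors, rounded mean.
        return (sum(top[:4]) + sum(left[:4]) + 4) >> 3
    if top is not None:
        return (sum(top[:4]) + 2) >> 2
    if left is not None:
        return (sum(left[:4]) + 2) >> 2
    # Neither edge available: mid-grey for 8-bit samples.
    return 128
```

Every sample of the 4x4 block is then set to the returned value.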

Diagonal Down/Left

  • H.264: mode 3
  • SVQ3: not used
  • RV40: not used
  • VP8: mode 4
    | T0  T1  T2  T3  T4  T5  T6  T7
-------------------------------------
    |  a   b   c   d
    |  b   c   d   e
    |  c   d   e   f
    |  d   e   f   g

where:

 a = (T0 + 2*T1 + T2 + 2) / 4
 b = (T1 + 2*T2 + T3 + 2) / 4
 c = (T2 + 2*T3 + T4 + 2) / 4
 d = (T3 + 2*T4 + T5 + 2) / 4
 e = (T4 + 2*T5 + T6 + 2) / 4
 f = (T5 + 2*T6 + T7 + 2) / 4
 g = (T6 + 3*T7      + 2) / 4
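
Because every sample depends only on the anti-diagonal it lies on, the table and formulas above can be generated with a single filtered array. This is a sketch under that reading; the function name and the `top` list holding T0..T7 are illustrative.

```python
def diag_down_left_4x4(top):
    """Diagonal down/left prediction from T0..T7; returns 4 rows of 4 samples."""
    t = list(top[:8])
    # Repeat T7 once so the last tap becomes T6 + 2*T7 + T7 = T6 + 3*T7.
    ext = t + [t[7]]
    # diag[0..6] correspond to a..g in the formulas above.
    diag = [(ext[k] + 2 * ext[k + 1] + ext[k + 2] + 2) >> 2 for k in range(7)]
    # The sample in row y, column x lies on anti-diagonal x + y.
    return [[diag[x + y] for x in range(4)] for y in range(4)]
```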

Diagonal Down/Left (SVQ3)

  • H.264: not used
  • SVQ3: mode 3
  • RV40: not used
  • VP8: not used
    |     T1  T2  T3
---------------------
    |  a   b   c   c
 L1 |  b   c   c   c
 L2 |  c   c   c   c
 L3 |  c   c   c   c

where:

 a = (L1 + T1) / 2
 b = (L2 + T2) / 2
 c = (L3 + T3) / 2
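
A small sketch of the SVQ3 variant, assuming `top` and `left` are lists indexed so that `top[1]` is T1 and `left[1]` is L1 (index 0 is unused here). The averages carry no rounding term, as in the formulas above.

```python
def svq3_diag_down_left_4x4(top, left):
    """SVQ3 diagonal down/left: only T1..T3 and L1..L3 are read."""
    # vals[0..2] correspond to a, b, c.
    vals = [(left[k] + top[k]) >> 1 for k in (1, 2, 3)]
    # Anti-diagonals beyond the third all reuse c.
    return [[vals[min(x + y, 2)] for x in range(4)] for y in range(4)]
```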

Diagonal Down/Left (RV40)

  • H.264: not used
  • SVQ3: not used
  • RV40: mode 4
  • VP8: not used
    | T0  T1  T2  T3  T4  T5  T6  T7
-------------------------------------
 L0 |  a   b   c   d
 L1 |  b   c   d   e
 L2 |  c   d   e   f
 L3 |  d   e   f   g
 L4 |
 L5 |
 L6 |
 L7 |

where:

 a = (T0 + 2*T1 + T2 + L0 + 2*L1 + L2 + 4) / 8
 b = (T1 + 2*T2 + T3 + L1 + 2*L2 + L3 + 4) / 8
 c = (T2 + 2*T3 + T4 + L2 + 2*L3 + L4 + 4) / 8
 d = (T3 + 2*T4 + T5 + L3 + 2*L4 + L5 + 4) / 8
 e = (T4 + 2*T5 + T6 + L4 + 2*L5 + L6 + 4) / 8
 f = (T5 + 2*T6 + T7 + L5 + 2*L6 + L7 + 4) / 8
 g = (T6 +   T7      + L6 +   L7      + 2) / 4
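
The RV40 variant mixes the top and left edges on each anti-diagonal. The sketch below transcribes the seven formulas above; the function name and the eight-entry `top` and `left` lists (T0..T7, L0..L7) are assumptions made for illustration.

```python
def rv40_diag_down_left_4x4(top, left):
    """RV40 diagonal down/left from T0..T7 and L0..L7."""
    t, l = list(top[:8]), list(left[:8])
    # a..f: 1-2-1 filter on both edges, averaged together with rounding.
    diag = [
        (t[k] + 2 * t[k + 1] + t[k + 2] + l[k] + 2 * l[k + 1] + l[k + 2] + 4) >> 3
        for k in range(6)
    ]
    # g uses only the last two predictors of each edge.
    diag.append((t[6] + t[7] + l[6] + l[7] + 2) >> 2)
    return [[diag[x + y] for x in range(4)] for y in range(4)]
```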

Diagonal Down/Right

  • H.264: mode 4
  • SVQ3: mode 4
  • RV40: mode 3
  • VP8: mode 5
 LT | T0  T1  T2  T3
---------------------
 L0 |  d   e   f   g
 L1 |  c   d   e   f
 L2 |  b   c   d   e
 L3 |  a   b   c   d

where:

 a = (L3 + 2*L2 + L1 + 2) / 4
 b = (L2 + 2*L1 + L0 + 2) / 4
 c = (L1 + 2*L0 + LT + 2) / 4
 d = (L0 + 2*LT + T0 + 2) / 4
 e = (LT + 2*T0 + T1 + 2) / 4
 f = (T0 + 2*T1 + T2 + 2) / 4
 g = (T1 + 2*T2 + T3 + 2) / 4
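
One way to read the formulas above is as a single 1-2-1 filter run along the edge L3, L2, L1, L0, LT, T0, T1, T2, T3. The sketch below takes that view; the function name and argument layout are illustrative.

```python
def diag_down_right_4x4(lt, top, left):
    """Diagonal down/right prediction from LT, T0..T3 and L0..L3."""
    # Walk the edge from the bottom-left corner up and then to the right.
    edge = list(reversed(left[:4])) + [lt] + list(top[:4])
    # vals[0..6] correspond to a..g.
    vals = [(edge[k] + 2 * edge[k + 1] + edge[k + 2] + 2) >> 2 for k in range(7)]
    # Samples on the same down/right diagonal (constant x - y) share a value.
    return [[vals[3 + x - y] for x in range(4)] for y in range(4)]
```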

Vertical/Right

  • H.264: mode 5
  • SVQ3: mode 5
  • RV40: mode 5
  • VP8: mode 6
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   d
 L1 |  e   f   g   h
 L2 |  i   a   b   c
    |  j   e   f   g

where:

 a = (LT + T0 + 1) / 2
 b = (T0 + T1 + 1) / 2
 c = (T1 + T2 + 1) / 2
 d = (T2 + T3 + 1) / 2
 e = (L0 + 2*LT + T0 + 2) / 4
 f = (LT + 2*T0 + T1 + 2) / 4
 g = (T0 + 2*T1 + T2 + 2) / 4
 h = (T1 + 2*T2 + T3 + 2) / 4
 i = (LT + 2*L0 + L1 + 2) / 4
 j = (L0 + 2*L1 + L2 + 2) / 4
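
The ten coefficients and their placement can be written out directly. This is a plain transcription of the table and formulas above into Python, with an illustrative function name; `top` holds T0..T3 and `left` holds at least L0..L2.

```python
def vertical_right_4x4(lt, top, left):
    """Vertical/right prediction from LT, T0..T3 and L0..L2."""
    t0, t1, t2, t3 = top[:4]
    l0, l1, l2 = left[:3]
    # Two-tap averages for the first row.
    a = (lt + t0 + 1) >> 1
    b = (t0 + t1 + 1) >> 1
    c = (t1 + t2 + 1) >> 1
    d = (t2 + t3 + 1) >> 1
    # Three-tap averages for the second row and the left column.
    e = (l0 + 2 * lt + t0 + 2) >> 2
    f = (lt + 2 * t0 + t1 + 2) >> 2
    g = (t0 + 2 * t1 + t2 + 2) >> 2
    h = (t1 + 2 * t2 + t3 + 2) >> 2
    i = (lt + 2 * l0 + l1 + 2) >> 2
    j = (l0 + 2 * l1 + l2 + 2) >> 2
    # Rows 2 and 3 repeat rows 0 and 1 shifted right by one sample.
    return [[a, b, c, d], [e, f, g, h], [i, a, b, c], [j, e, f, g]]
```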

Horizontal/Down

  • H.264: mode 6
  • SVQ3: mode 6
  • RV40: mode 8
  • VP8: mode 8
 LT | T0  T1  T2  
---------------------
 L0 |  a   b   c   d
 L1 |  e   f   a   b
 L2 |  g   h   e   f
 L3 |  i   j   g   h

where:

 a = (LT + L0 + 1) / 2
 b = (L0 + 2*LT + T0 + 2) / 4
 c = (LT + 2*T0 + T1 + 2) / 4
 d = (T0 + 2*T1 + T2 + 2) / 4
 e = (L0 + L1 + 1) / 2
 f = (LT + 2*L0 + L1 + 2) / 4
 g = (L1 + L2 + 1) / 2
 h = (L0 + 2*L1 + L2 + 2) / 4
 i = (L2 + L3 + 1) / 2
 j = (L1 + 2*L2 + L3 + 2) / 4
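
A transcription of this mode into Python might look as follows. The L2/L3 average is assigned to `i`, the name the table uses for the first sample of the last row. The function name and argument layout (`top` holding T0..T2, `left` holding L0..L3) are illustrative.

```python
def horizontal_down_4x4(lt, top, left):
    """Horizontal/down prediction from LT, T0..T2 and L0..L3."""
    t0, t1, t2 = top[:3]
    l0, l1, l2, l3 = left[:4]
    a = (lt + l0 + 1) >> 1
    b = (l0 + 2 * lt + t0 + 2) >> 2
    c = (lt + 2 * t0 + t1 + 2) >> 2
    d = (t0 + 2 * t1 + t2 + 2) >> 2
    e = (l0 + l1 + 1) >> 1
    f = (lt + 2 * l0 + l1 + 2) >> 2
    g = (l1 + l2 + 1) >> 1
    h = (l0 + 2 * l1 + l2 + 2) >> 2
    i = (l2 + l3 + 1) >> 1
    j = (l1 + 2 * l2 + l3 + 2) >> 2
    # Each row repeats the previous one shifted right by two samples.
    return [[a, b, c, d], [e, f, a, b], [g, h, e, f], [i, j, g, h]]
```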

Vertical/Left

  • H.264: mode 7
  • SVQ3: mode 7
  • RV40: mode 6
  • VP8: mode 7
    | T0  T1  T2  T3  T4  T5  T6 
---------------------------------
    |  a   b   c   d
    |  f   g   h   i
    |  b   c   d   e
    |  g   h   i   j

where:

 a = (T0 + T1 + 1) / 2
 b = (T1 + T2 + 1) / 2
 c = (T2 + T3 + 1) / 2
 d = (T3 + T4 + 1) / 2
 e = (T4 + T5 + 1) / 2
 f = (T0 + 2*T1 + T2 + 2) / 4
 g = (T1 + 2*T2 + T3 + 2) / 4
 h = (T2 + 2*T3 + T4 + 2) / 4
 i = (T3 + 2*T4 + T5 + 2) / 4
 j = (T4 + 2*T5 + T6 + 2) / 4

For RV40, two coefficients differ:

 a = (2*T0 + 2*T1 + L1 + 2*L2 +   L3 +      4) / 8
 f = (  T0 + 2*T1 + T2 +   L2 + 2*L3 + L4 + 4) / 8

For VP8, 3 coefficients differ:

 c = (T2 + T3 + T4 + 2) / 4
 e = (T4 + 2*T5 + T6 + 2) / 4
 j = (T5 + 2*T6 + T7 + 2) / 4
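
The base (H.264 and SVQ3) form of this mode alternates two-tap and three-tap averages of the top edge. The sketch below covers only that base form and does not apply the RV40 or VP8 overrides listed above. The function name is illustrative and `top` is assumed to hold T0..T6.

```python
def vertical_left_4x4(top):
    """Vertical/left prediction (base H.264/SVQ3 form) from T0..T6."""
    t = list(top[:7])
    # avg2[0..4] correspond to a..e, avg3[0..4] to f..j.
    avg2 = [(t[k] + t[k + 1] + 1) >> 1 for k in range(5)]
    avg3 = [(t[k] + 2 * t[k + 1] + t[k + 2] + 2) >> 2 for k in range(5)]
    block = []
    for y in range(4):
        src = avg2 if y % 2 == 0 else avg3
        # Rows 2 and 3 repeat rows 0 and 1 shifted left by one sample.
        block.append([src[x + y // 2] for x in range(4)])
    return block
```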

Horizontal/Up

  • H.264: mode 8
  • SVQ3: mode 8
  • RV40: not used
  • VP8: mode 9
    | 
---------------------
 L0 |  a   b   c   d
 L1 |  c   d   e   f
 L2 |  e   f   g   g
 L3 |  g   g   g   g

where:

 a = (L0 + L1 + 1) / 2
 b = (L0 + 2*L1 + L2 + 2) / 4
 c = (L1 + L2 + 1) / 2
 d = (L1 + 2*L2 + L3 + 2) / 4
 e = (L2 + L3 + 1) / 2
 f = (L2 + 3*L3      + 2) / 4
 g = L3
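
In this mode each row starts two coefficients further along the sequence a..g, and positions past the end reuse g. A sketch under that reading, with an illustrative function name and `left` holding L0..L3:

```python
def horizontal_up_4x4(left):
    """Horizontal/up prediction from L0..L3."""
    l0, l1, l2, l3 = left[:4]
    vals = [
        (l0 + l1 + 1) >> 1,           # a
        (l0 + 2 * l1 + l2 + 2) >> 2,  # b
        (l1 + l2 + 1) >> 1,           # c
        (l1 + 2 * l2 + l3 + 2) >> 2,  # d
        (l2 + l3 + 1) >> 1,           # e
        (l2 + 3 * l3 + 2) >> 2,       # f
        l3,                           # g
    ]
    # Row y begins at coefficient 2*y; indices beyond g clamp to g.
    return [[vals[min(x + 2 * y, 6)] for x in range(4)] for y in range(4)]
```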

Horizontal/Up (RV40)

  • H.264: not used
  • SVQ3: not used
  • RV40: mode 7
  • VP8: not used
    | T0  T1  T2  T3  T4  T5  T6  T7
-------------------------------------
 L0 |  a   b   c   d
 L1 |  c   d   e   f
 L2 |  e   f   g   h
 L3 |  g   h   i   j
 L4 |
 L5 |
 L6 |

where:

 a = (T1 + 2*T2 + T3 + 2*L0 + 2*L1 +      4) / 8
 b = (T2 + 2*T3 + T4 +   L0 + 2*L1 + L2 + 4) / 8
 c = (T3 + 2*T4 + T5 + 2*L1 + 2*L2 +      4) / 8
 d = (T4 + 2*T5 + T6 +   L1 + 2*L2 + L3 + 4) / 8
 e = (T5 + 2*T6 + T7 + 2*L2 + 2*L3 +      4) / 8
 f = (T6 + 3*T7 +        L2 + 3*L3 +      4) / 8
 g = (T6 +   T7 +        L3 +   L4      + 2) / 4
 h = (                   L3 + 2*L4 + L5 + 2) / 4
 i = (                   L4 +   L5      + 1) / 2
 j = (                   L4 + 2*L5 + L6 + 2) / 4

TrueMotion (VP8)

  • H.264: not used
  • SVQ3: not used
  • RV40: not used
  • VP8: mode 1
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   .   .   .
 L1 |  .   b   .   .
 L2 |  .   .   c   .
 L3 |  .   .   .   d

where the samples marked on the diagonal follow this pattern:

 a = SATURATE_U8(T0 - LT + L0)
 b = SATURATE_U8(T1 - LT + L1)
 c = SATURATE_U8(T2 - LT + L2)
 d = SATURATE_U8(T3 - LT + L3)

That is, each of the 16 samples is (top predictor for its column) - (top-left predictor LT) + (left predictor for its row), saturated to the unsigned byte range 0..255.
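
The per-sample rule can be sketched in a few lines of Python. The helper `saturate_u8` stands in for SATURATE_U8, and both function names are illustrative rather than taken from the VP8 source.

```python
def saturate_u8(v):
    """Clamp an integer to the unsigned 8-bit range 0..255."""
    return max(0, min(255, v))


def vp8_truemotion_4x4(lt, top, left):
    """TrueMotion prediction: top[x] - LT + left[y], clamped per sample."""
    return [
        [saturate_u8(top[x] - lt + left[y]) for x in range(4)]
        for y in range(4)
    ]
```

With `lt = 100`, `top = [90, 110, 250, 10]` and `left = [100, 120, 0, 200]`, the second row is `[110, 130, 255, 30]`: the third sample would be 270 and is clamped to 255.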


16x16 Prediction Modes

DC

  • H.264: mode 0
  • SVQ3: mode 0
  • RV40: mode 0

Using the 16 top predictors (T0..T15) and the 16 left predictors (L0..L15), set all 256 elements to the mean, computed as:

if top and left predictors are available
  mean = (sum(T0..T15) + sum(L0..L15) + 16) / 32
else if top predictors are available
  mean = (sum(T0..T15) + 8) / 16
else if left predictors are available
  mean = (sum(L0..L15) + 8) / 16
else
  mean = 128

Vertical

  • H.264: mode 1
  • SVQ3: mode 1
  • RV40: mode 1
     | T0  T1  T2  T3  T4  ..  T15
-------------------------- .. -----
  L0 | T0  T1  T2  T3  T4  ..  T15
  L1 | T0  T1  T2  T3  T4  ..  T15
  L2 | T0  T1  T2  T3  T4  ..  T15
 ......
 L15 | T0  T1  T2  T3  T4  ..  T15

Horizontal

  • H.264: mode 2
  • SVQ3: mode 2
  • RV40: mode 2
     |  T0  T1  T2  T3  T4  ..  T15
--------------------------- .. -----
  L0 |  L0  L0  L0  L0  L0  ..   L0
  L1 |  L1  L1  L1  L1  L1  ..   L1
  L2 |  L2  L2  L2  L2  L2  ..   L2
 ......
 L15 | L15 L15 L15 L15 L15  ..  L15

Plane

  • H.264: mode 3
  • SVQ3: mode 3
  • RV40: mode 3

Note that SVQ3 and RV40 each use a slightly different method here.

Given the top predictors (T0..T15), left predictors (L0..L15) and the left-top corner predictor (LT) arranged as follows:

  LT |    T0        T1        T2     ..    T15
-----------------------------------  ..  --------
  L0 | c[ 0, 0]  c[ 1, 0]  c[ 2, 0]  ..  c[15, 0]
  L1 | c[ 0, 1]  c[ 1, 1]  c[ 2, 1]  ..  c[15, 1]
 ......
 L15 | c[ 0,15]  c[ 1,15]  c[ 2,15]  ..  c[15,15]

Compute H' and V':

H' = 1* (T8 - T6) +
     2* (T9 - T5) +
     3*(T10 - T4) +
     4*(T11 - T3) +
     5*(T12 - T2) +
     6*(T13 - T1) +
     7*(T14 - T0) +
     8*(T15 - LT)
V' = 1* (L8 - L6) +
     2* (L9 - L5) +
     3*(L10 - L4) +
     4*(L11 - L3) +
     5*(L12 - L2) +
     6*(L13 - L1) +
     7*(L14 - L0) +
     8*(L15 - LT)

For H.264, compute H and V as:

 H = (5*H' + 32) / 64
 V = (5*V' + 32) / 64

For SVQ3, compute H and V as:

 V = (5*(H'/4)) / 16
 H = (5*(V'/4)) / 16 
 (note that the two are swapped: V is computed from H' and H from V')

For RV40, compute H and V as:

 H = (5*(H' >> 2)) >> 4
 V = (5*(V' >> 2)) >> 4 
 (like SVQ3 but without the swap; it is important to use shifts here rather than divisions, because the two round negative values differently)

The final process for filling in the 16x16 block is:

 a = 16 * (L15 + T15 + 1) - 7*(V+H)
 for (j = 0..15)
   for (i = 0..15)
     b = a + V * j + H * i
     c[i,j] = SATURATE_U8(b / 32)

The SATURATE_U8() function indicates that the result of the operation should be bounded to an unsigned 8-bit range (0..255).
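
The H.264 variant of the plane mode could be sketched as below. Two assumptions are made and should be checked against the codec in question: the `/ 64` and `/ 32` in the formulas are implemented as arithmetic right shifts (which round toward minus infinity for negative gradients), and the names `plane_16x16_h264` and `saturate_u8` are illustrative. The SVQ3 and RV40 scalings of H and V are not covered.

```python
def saturate_u8(v):
    """Clamp an integer to the unsigned 8-bit range 0..255."""
    return max(0, min(255, v))


def plane_16x16_h264(lt, top, left):
    """H.264-style plane prediction from LT, T0..T15 and L0..L15."""
    # Prepend LT so that index k + 1 holds Tk (or Lk) and index 0 holds LT.
    t = [lt] + list(top[:16])
    l = [lt] + list(left[:16])
    # H' and V': weighted differences mirrored around position 7.
    h_raw = sum(k * (t[8 + k] - t[8 - k]) for k in range(1, 9))
    v_raw = sum(k * (l[8 + k] - l[8 - k]) for k in range(1, 9))
    # Assumption: shifts rather than truncating division.
    h = (5 * h_raw + 32) >> 6
    v = (5 * v_raw + 32) >> 6
    a = 16 * (left[15] + top[15] + 1) - 7 * (v + h)
    # Row j, column i of the result corresponds to c[i, j].
    return [
        [saturate_u8((a + v * j + h * i) >> 5) for i in range(16)]
        for j in range(16)
    ]
```

As a sanity check, a top edge that rises by 2 per sample from LT = 10 with a constant left edge of 10 gives H = 64 and V = 0, so every row reproduces the ramp 12, 14, .., 42.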