H.264 Prediction

From MultimediaWiki
Revision as of 13:05, 30 July 2007 by Multimedia Mike (talk | contribs) (→‎16x16 Prediction Modes: RV40 doesn't use all the 16x16 modes)

This page documents the various prediction methods used in H.264 and related formats such as Sorenson Video 3 and RealVideo 4.

4x4 Prediction Modes

Vertical

  • H.264: mode 0
  • SVQ3: mode 0
  • RV40: mode 1
 LT | T0  T1  T2  T3
---------------------
 L0 | T0  T1  T2  T3
 L1 | T0  T1  T2  T3
 L2 | T0  T1  T2  T3
 L3 | T0  T1  T2  T3

Horizontal

  • H.264: mode 1
  • SVQ3: mode 1
  • RV40: mode 2
 LT | T0  T1  T2  T3
---------------------
 L0 | L0  L0  L0  L0
 L1 | L1  L1  L1  L1
 L2 | L2  L2  L2  L2
 L3 | L3  L3  L3  L3
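
The Vertical and Horizontal modes are plain copies of the neighbouring samples. A minimal Python sketch follows; the function names and the list-of-rows block layout (`pred[y][x]`) are illustrative choices, not something defined on this page.

```python
def predict_vertical_4x4(top):
    """Every row is a copy of the four top neighbours T0..T3."""
    return [list(top[:4]) for _ in range(4)]


def predict_horizontal_4x4(left):
    """Row y is filled with its own left neighbour L[y]."""
    return [[left[y]] * 4 for y in range(4)]


top = [10, 20, 30, 40]    # T0..T3 (example values)
left = [1, 2, 3, 4]       # L0..L3 (example values)
for row in predict_vertical_4x4(top):
    print(row)
for row in predict_horizontal_4x4(left):
    print(row)
```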

DC

  • H.264: mode 2
  • SVQ3: mode 2
  • RV40: mode 0
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   a   a   a
 L1 |  a   a   a   a
 L2 |  a   a   a   a
 L3 |  a   a   a   a

where:

 a = (T0 + T1 + T2 + T3 + L0 + L1 + L2 + L3 + 4) / 8

Diagonal Down/Left

  • H.264: mode 3
  • SVQ3: not used
  • RV40: not used
 LT | T0  T1  T2  T3  T4  T5  T6  T7
-------------------------------------
 L0 |  a   b   c   d
 L1 |  b   c   d   e
 L2 |  c   d   e   f
 L3 |  d   e   f   g

where:

 a = (T0 + 2*T1 + T2 + 2) / 4
 b = (T1 + 2*T2 + T3 + 2) / 4
 c = (T2 + 2*T3 + T4 + 2) / 4
 d = (T3 + 2*T4 + T5 + 2) / 4
 e = (T4 + 2*T5 + T6 + 2) / 4
 f = (T5 + 2*T6 + T7 + 2) / 4
 g = (T6 + 3*T7      + 2) / 4
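
All seven values come from the same 3-tap (1, 2, 1) filter slid along the top row, so the table can be generated from the diagonal index x + y. The Python sketch below assumes the last value is (T6 + 3*T7 + 2) / 4, which is what the filter yields when T7 is repeated past the end of the row. The function name and the `pred[y][x]` layout are illustrative.

```python
def predict_diag_down_left_4x4(top):
    """top holds the eight top neighbours T0..T7."""
    # Repeat T7 once so the final tap becomes T6 + 2*T7 + T7 = T6 + 3*T7.
    t = list(top[:8]) + [top[7]]
    return [[(t[x + y] + 2 * t[x + y + 1] + t[x + y + 2] + 2) >> 2
             for x in range(4)]
            for y in range(4)]


top = [0, 8, 16, 24, 32, 40, 48, 56]   # example values
for row in predict_diag_down_left_4x4(top):
    print(row)
```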

Diagonal Down/Left (SVQ3)

  • H.264: not used
  • SVQ3: mode 3
  • RV40: not used
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   c
 L1 |  b   c   c   c
 L2 |  c   c   c   c
 L3 |  c   c   c   c

where:

 a = (L1 + T1) / 2
 b = (L2 + T2) / 2
 c = (L3 + T3) / 2
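
The SVQ3 variant needs only three averages, taken without a rounding term, and every position on or beyond the third diagonal receives c. A small Python sketch with an illustrative function name and `pred[y][x]` layout:

```python
def predict_diag_down_left_svq3(top, left):
    """top = T0..T3, left = L0..L3; T0 and L0 are not used by this mode."""
    a = (left[1] + top[1]) >> 1
    b = (left[2] + top[2]) >> 1
    c = (left[3] + top[3]) >> 1
    by_diagonal = [a, b] + [c] * 5      # indexed by x + y, which runs 0..6
    return [[by_diagonal[x + y] for x in range(4)] for y in range(4)]


for row in predict_diag_down_left_svq3([0, 10, 20, 30], [0, 20, 40, 60]):
    print(row)
```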

Diagonal Down/Left (RV40)

  • H.264: not used
  • SVQ3: not used
  • RV40: mode 4

to be determined

Diagonal Down/Right

  • H.264: mode 4
  • SVQ3: mode 4
  • RV40: mode 3
 LT | T0  T1  T2  T3
---------------------
 L0 |  d   e   f   g
 L1 |  c   d   e   f
 L2 |  b   c   d   e
 L3 |  a   b   c   d

where:

 a = (L3 + 2*L2 + L1 + 2) / 4
 b = (L2 + 2*L1 + L0 + 2) / 4
 c = (L1 + 2*L0 + LT + 2) / 4
 d = (L0 + 2*LT + T0 + 2) / 4
 e = (LT + 2*T0 + T1 + 2) / 4
 f = (T0 + 2*T1 + T2 + 2) / 4
 g = (T1 + 2*T2 + T3 + 2) / 4
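
The values a..g are a (1, 2, 1) filter run along the edge L3, L2, L1, L0, LT, T0, T1, T2, T3, and each output sample picks the filtered value for its diagonal. The Python sketch below takes that view; the function name and argument layout are illustrative assumptions.

```python
def predict_diag_down_right_4x4(top, left, lt):
    """top = T0..T3, left = L0..L3, lt = LT; returns pred[y][x]."""
    edge = [left[3], left[2], left[1], left[0], lt] + list(top[:4])
    # filtered[0..6] corresponds to a..g in the table above.
    filtered = [(edge[k] + 2 * edge[k + 1] + edge[k + 2] + 2) >> 2
                for k in range(7)]
    return [[filtered[3 + x - y] for x in range(4)] for y in range(4)]


for row in predict_diag_down_right_4x4([50, 60, 70, 80], [30, 20, 10, 0], 40):
    print(row)
```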

Vertical/Right

  • H.264: mode 5
  • SVQ3: mode 5
  • RV40: mode 5
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   d
 L1 |  e   f   g   h
 L2 |  i   a   b   c
 L3 |  j   e   f   g

where:

 a = (LT + T0 + 1) / 2
 b = (T0 + T1 + 1) / 2
 c = (T1 + T2 + 1) / 2
 d = (T2 + T3 + 1) / 2
 e = (L0 + 2*LT + T0 + 2) / 4
 f = (LT + 2*T0 + T1 + 2) / 4
 g = (T0 + 2*T1 + T2 + 2) / 4
 h = (T1 + 2*T2 + T3 + 2) / 4
 i = (LT + 2*L0 + L1 + 2) / 4
 j = (L0 + 2*L1 + L2 + 2) / 4
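
A direct transcription of the table into Python can help when checking an implementation against it. The function name and the `pred[y][x]` layout below are illustrative; `>>` stands in for the divisions, which is equivalent here because every operand is non-negative.

```python
def predict_vertical_right_4x4(top, left, lt):
    """top = T0..T3, left = L0..L3 (L3 is unused), lt = LT."""
    T, L = top, left
    a = (lt + T[0] + 1) >> 1
    b = (T[0] + T[1] + 1) >> 1
    c = (T[1] + T[2] + 1) >> 1
    d = (T[2] + T[3] + 1) >> 1
    e = (L[0] + 2 * lt + T[0] + 2) >> 2
    f = (lt + 2 * T[0] + T[1] + 2) >> 2
    g = (T[0] + 2 * T[1] + T[2] + 2) >> 2
    h = (T[1] + 2 * T[2] + T[3] + 2) >> 2
    i = (lt + 2 * L[0] + L[1] + 2) >> 2
    j = (L[0] + 2 * L[1] + L[2] + 2) >> 2
    return [[a, b, c, d],
            [e, f, g, h],
            [i, a, b, c],
            [j, e, f, g]]


for row in predict_vertical_right_4x4([50, 60, 70, 80], [30, 20, 10, 0], 40):
    print(row)
```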

Horizontal/Down

  • H.264: mode 6
  • SVQ3: mode 6
  • RV40: mode 8
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   d
 L1 |  e   f   a   b
 L2 |  g   h   e   f
 L3 |  i   j   g   h

where:

 a = (LT + L0 + 1) / 2
 b = (L0 + 2*LT + T0 + 2) / 4
 c = (LT + 2*T0 + T1 + 2) / 4
 d = (T0 + 2*T1 + T2 + 2) / 4
 e = (L0 + L1 + 1) / 2
 f = (LT + 2*L0 + L1 + 2) / 4
 g = (L1 + L2 + 1) / 2
 h = (L0 + 2*L1 + L2 + 2) / 4
 i = (L2 + L3 + 1) / 2
 j = (L1 + 2*L2 + L3 + 2) / 4
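
The same table written out in Python, as a sketch for cross-checking. It takes (L2 + L3 + 1) / 2 as the value named i in the table; the function name and `pred[y][x]` layout are illustrative.

```python
def predict_horizontal_down_4x4(top, left, lt):
    """top = T0..T3 (T3 is unused), left = L0..L3, lt = LT."""
    T, L = top, left
    a = (lt + L[0] + 1) >> 1
    b = (L[0] + 2 * lt + T[0] + 2) >> 2
    c = (lt + 2 * T[0] + T[1] + 2) >> 2
    d = (T[0] + 2 * T[1] + T[2] + 2) >> 2
    e = (L[0] + L[1] + 1) >> 1
    f = (lt + 2 * L[0] + L[1] + 2) >> 2
    g = (L[1] + L[2] + 1) >> 1
    h = (L[0] + 2 * L[1] + L[2] + 2) >> 2
    i = (L[2] + L[3] + 1) >> 1
    j = (L[1] + 2 * L[2] + L[3] + 2) >> 2
    return [[a, b, c, d],
            [e, f, a, b],
            [g, h, e, f],
            [i, j, g, h]]


for row in predict_horizontal_down_4x4([50, 60, 70, 80], [30, 20, 10, 0], 40):
    print(row)
```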

Vertical/Left

  • H.264: mode 7
  • SVQ3: mode 7
  • RV40: mode 6
 LT | T0  T1  T2  T3  T4  T5  T6  T7
-------------------------------------
 L0 |  a   b   c   d
 L1 |  f   g   h   i
 L2 |  b   c   d   e
 L3 |  g   h   i   j

where:

 a = (T0 + T1 + 1) / 2
 b = (T1 + T2 + 1) / 2
 c = (T2 + T3 + 1) / 2
 d = (T3 + T4 + 1) / 2
 e = (T4 + T5 + 1) / 2
 f = (T0 + 2*T1 + T2 + 2) / 4
 g = (T1 + 2*T2 + T3 + 2) / 4
 h = (T2 + 2*T3 + T4 + 2) / 4
 i = (T3 + 2*T4 + T5 + 2) / 4
 j = (T4 + 2*T5 + T6 + 2) / 4
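
Even rows of this mode use the 2-tap average and odd rows the 3-tap filter, with the starting position advancing by one sample every two rows. A compact Python sketch of that pattern, with an illustrative function name and `pred[y][x]` layout:

```python
def predict_vertical_left_4x4(top):
    """top holds T0..T7 (only T0..T6 are read)."""
    t = top
    pred = []
    for y in range(4):
        k = y // 2                      # horizontal shift of this row
        if y % 2 == 0:                  # rows 0 and 2: values a..e
            row = [(t[x + k] + t[x + k + 1] + 1) >> 1 for x in range(4)]
        else:                           # rows 1 and 3: values f..j
            row = [(t[x + k] + 2 * t[x + k + 1] + t[x + k + 2] + 2) >> 2
                   for x in range(4)]
        pred.append(row)
    return pred


for row in predict_vertical_left_4x4([0, 10, 20, 30, 40, 50, 60, 70]):
    print(row)
```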

Horizontal/Up

  • H.264: mode 8
  • SVQ3: mode 8
  • RV40: mode 7
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   d
 L1 |  c   d   e   f
 L2 |  e   f   g   g
 L3 |  g   g   g   g

where:

 a = (L0 + L1 + 1) / 2
 b = (L0 + 2*L1 + L2 + 2) / 4
 c = (L1 + L2 + 1) / 2
 d = (L1 + 2*L2 + L3 + 2) / 4
 e = (L2 + L3 + 1) / 2
 f = (L2 + 2*L3 + L3 + 2) / 4
 g = L3
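
Reading the table row by row, each row starts two entries further along the sequence a, b, c, d, e, f, g, g, g, g. The Python sketch below builds that sequence once and indexes it with x + 2*y; the function name and `pred[y][x]` layout are illustrative.

```python
def predict_horizontal_up_4x4(left):
    """left = L0..L3; this mode reads no top samples."""
    L = left
    a = (L[0] + L[1] + 1) >> 1
    b = (L[0] + 2 * L[1] + L[2] + 2) >> 2
    c = (L[1] + L[2] + 1) >> 1
    d = (L[1] + 2 * L[2] + L[3] + 2) >> 2
    e = (L[2] + L[3] + 1) >> 1
    f = (L[2] + 2 * L[3] + L[3] + 2) >> 2
    g = L[3]
    seq = [a, b, c, d, e, f, g, g, g, g]
    return [[seq[x + 2 * y] for x in range(4)] for y in range(4)]


for row in predict_horizontal_up_4x4([0, 10, 20, 30]):
    print(row)
```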

Left/DC

  • H.264: mode 9
  • SVQ3: mode 9
  • RV40: not used
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   a   a   a
 L1 |  a   a   a   a
 L2 |  a   a   a   a
 L3 |  a   a   a   a

where:

 a = (L0 + L1 + L2 + L3 + 2) / 4

Top/DC

  • H.264: mode 10
  • SVQ3: mode 10
  • RV40: not used
 LT | T0  T1  T2  T3
---------------------
 L0 |  a   a   a   a
 L1 |  a   a   a   a
 L2 |  a   a   a   a
 L3 |  a   a   a   a

where:

 a = (T0 + T1 + T2 + T3 + 2) / 4

DC-128

  • H.264: mode 11
  • SVQ3: mode 11
  • RV40: not used
 LT |  T0   T1   T2   T3
------------------------
 L0 | 128  128  128  128
 L1 | 128  128  128  128
 L2 | 128  128  128  128
 L3 | 128  128  128  128
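
This page does not say when each DC variant is chosen. In H.264 the Left/DC, Top/DC and DC-128 forms act as fallbacks when the top row, the left column, or both are unavailable (for instance at a picture or slice edge). The Python sketch below assumes that selection rule for all four DC modes; using `None` to mark a missing edge is a hypothetical convention, as is the function name.

```python
def predict_dc_4x4(top=None, left=None):
    """top / left are 4-element lists, or None when that edge is unavailable."""
    if top is not None and left is not None:
        a = (sum(top) + sum(left) + 4) >> 3    # DC
    elif left is not None:
        a = (sum(left) + 2) >> 2               # Left/DC
    elif top is not None:
        a = (sum(top) + 2) >> 2                # Top/DC
    else:
        a = 128                                # DC-128
    return [[a] * 4 for _ in range(4)]


top = [100, 102, 104, 106]   # example values
left = [90, 92, 94, 96]
print(predict_dc_4x4(top, left)[0])
print(predict_dc_4x4(left=left)[0])
print(predict_dc_4x4(top=top)[0])
print(predict_dc_4x4()[0])
```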

16x16 Prediction Modes

DC

  • H.264: mode 0
  • SVQ3: mode 0
  • RV40: mode 0

Using the 16 top predictors (T0..T15) and the 16 left predictors (L0..L15), set all 256 elements to the mean, computed as:

 mean = (sum(T0..T15) + sum(L0..L15) + 16) / 32

Vertical

  • H.264: mode 1
  • SVQ3: mode 1
  • RV40: mode 1
  LT | T0  T1  T2  T3  T4  ..  T15
------------------------- .. -----
  L0 | T0  T1  T2  T3  T4  ..  T15
  L1 | T0  T1  T2  T3  T4  ..  T15
  L2 | T0  T1  T2  T3  T4  ..  T15
 ......
 L15 | T0  T1  T2  T3  T4  ..  T15

Horizontal

  • H.264: mode 2
  • SVQ3: mode 2
  • RV40: mode 2
  LT |  T0  T1  T2  T3  T4  ..  T15
--------------------------- .. -----
  L0 |  L0  L0  L0  L0  L0  ..   L0
  L1 |  L1  L1  L1  L1  L1  ..   L1
  L2 |  L2  L2  L2  L2  L2  ..   L2
 ......
 L15 | L15 L15 L15 L15 L15  ..  L15

Plane

  • H.264: mode 3
  • SVQ3: mode 3
  • RV40: mode 3

Note that SVQ3 scales H and V differently from H.264, as described below. RV40 likely differs as well, so its treatment here should be regarded as unfinished.

Given the top predictors (T0..T15), left predictors (L0..L15) and the left-top corner predictor (LT) arranged as follows:

  LT |   T0    T1    T2  ..  T15
------------------------ .. -----
  L0 |  c0,0  c1,0  c2,0 .. c15,0
  L1 |  c0,1  c1,1  c2,1 .. c15,1
 ......
 L15 | c0,15 c1,15 c2,15 .. c15,15

Compute H and V as:

 H = 1*(T8  - T6) +
     2*(T9  - T5) +
     3*(T10 - T4) +
     4*(T11 - T3) +
     5*(T12 - T2) +
     6*(T13 - T1) +
     7*(T14 - T0) +
     8*(T15 - LT)
 V = 1*(L8  - L6) +
     2*(L9  - L5) +
     3*(L10 - L4) +
     4*(L11 - L3) +
     5*(L12 - L2) +
     6*(L13 - L1) +
     7*(L14 - L0) +
     8*(L15 - LT)

For H.264, then rescale H and V as:

 H = (5*H + 32) / 64
 V = (5*V + 32) / 64

For SVQ3, then rescale H and V as follows and swap them:

 H = (5*(H/4)) / 16
 V = (5*(V/4)) / 16 
 swap H and V

The final process for filling in the 16x16 block is:

 a = 16 * (L15 + T15 + 1) - 7*(V+H)
 for (j = 0..15)
   for (i = 0..15)
     c[i,j] = SATURATE_U8((a + j*V + i*H) / 32)

The SATURATE_U8() function clamps its argument to the unsigned 8-bit range (0..255).
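
The whole plane procedure can be sketched in Python as follows. It assumes the H.264 standard's formulation: the k-th difference in H and V is weighted by k (1..8), the H.264 rescaling is an arithmetic right shift, and sample (i, j) is (a + i*H + j*V) >> 5. The SVQ3 branch assumes C-style division that truncates toward zero. All function names, including the `c_div` helper, are illustrative.

```python
def clip_u8(v):
    """SATURATE_U8: clamp to 0..255."""
    return max(0, min(255, v))


def c_div(n, d):
    """Integer division truncating toward zero, as C's '/' does (d > 0)."""
    q = abs(n) // d
    return q if n >= 0 else -q


def predict_plane_16x16(top, left, lt, svq3=False):
    """top = T0..T15, left = L0..L15, lt = LT; returns pred[j][i] (row j, column i)."""
    def t(k):                      # index -1 stands for the corner LT
        return lt if k < 0 else top[k]

    def l(k):
        return lt if k < 0 else left[k]

    # Assumed weighting: the k-th difference counts k times (k = 1..8).
    H = sum(k * (t(7 + k) - t(7 - k)) for k in range(1, 9))
    V = sum(k * (l(7 + k) - l(7 - k)) for k in range(1, 9))

    if svq3:
        H = c_div(5 * c_div(H, 4), 16)
        V = c_div(5 * c_div(V, 4), 16)
        H, V = V, H                # SVQ3 swaps the two gradients
    else:
        H = (5 * H + 32) >> 6      # arithmetic shift: floors negative values
        V = (5 * V + 32) >> 6

    a = 16 * (left[15] + top[15] + 1) - 7 * (V + H)
    return [[clip_u8((a + j * V + i * H) >> 5) for i in range(16)]
            for j in range(16)]


# Example: a purely horizontal ramp on the top edge, constant left edge.
top = [10 + 2 * i for i in range(16)]
left = [8] * 16
pred = predict_plane_16x16(top, left, lt=8)
print(pred[0])
print(pred[15])
```

With this input the H.264 branch reproduces the ramp in every row, which is a convenient sanity check for an implementation.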

Left/DC

  • H.264: mode 4
  • SVQ3: mode 4
  • RV40: not used

Using the 16 left predictors (L0..L15), set all 256 elements to the mean, computed as:

 mean = (sum(L0..L15) + 8) / 16

Top/DC

  • H.264: mode 5
  • SVQ3: mode 5
  • RV40: not used

Using the 16 top predictors (T0..T15), set all 256 elements to the mean, computed as:

 mean = (sum(T0..T15) + 8) / 16

DC-128

  • H.264: mode 6
  • SVQ3: mode 6
  • RV40: not used

Set all 256 elements to 128.