H.264 Prediction

This page documents the various prediction methods used in H.264 and related formats such as Sorenson Video 3 and RealVideo 4.

4x4 Prediction Modes

Vertical

H.264: mode 0
SVQ3: mode 0
RV40: mode 1

 LT | T0  T1  T2  T3
---------------------
 L0 | T0  T1  T2  T3
 L1 | T0  T1  T2  T3
 L2 | T0  T1  T2  T3
 L3 | T0  T1  T2  T3

Horizontal

H.264: mode 1
SVQ3: mode 1
RV40: mode 2

 LT | T0  T1  T2  T3
---------------------
 L0 | L0  L0  L0  L0
 L1 | L1  L1  L1  L1
 L2 | L2  L2  L2  L2
 L3 | L3  L3  L3  L3
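
The Vertical and Horizontal modes above are plain copies of the neighbouring samples. A minimal sketch, assuming the block is returned as a list of four rows and the predictors are passed as short lists (the function names are hypothetical, not taken from any decoder):

```python
def pred4x4_vertical(top):
    """Copy the top predictors T0..T3 into every row of the block."""
    return [list(top[:4]) for _ in range(4)]


def pred4x4_horizontal(left):
    """Fill row y with its left predictor L[y]."""
    return [[left[y]] * 4 for y in range(4)]


T = [10, 20, 30, 40]  # T0..T3
L = [1, 2, 3, 4]      # L0..L3
print(pred4x4_vertical(T))    # four identical rows [10, 20, 30, 40]
print(pred4x4_horizontal(L))  # rows of 1s, 2s, 3s and 4s
```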

DC

H.264: mode 2
SVQ3: mode 2
RV40: mode 0

 LT | T0  T1  T2  T3
---------------------
 L0 |  a   a   a   a
 L1 |  a   a   a   a
 L2 |  a   a   a   a
 L3 |  a   a   a   a

where:

 a = (T0 + T1 + T2 + T3 + L0 + L1 + L2 + L3 + 4) / 8
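
The DC value is a rounded integer mean of the eight neighbours. As a small sketch (hypothetical function name, integer division standing in for the `/` used above):

```python
def pred4x4_dc(top, left):
    """Fill the block with the rounded mean of T0..T3 and L0..L3."""
    a = (sum(top[:4]) + sum(left[:4]) + 4) // 8
    return [[a] * 4 for _ in range(4)]


block = pred4x4_dc([10, 20, 30, 40], [1, 2, 3, 4])
print(block[0])  # [14, 14, 14, 14], since (110 + 4) // 8 == 14
```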

Diagonal Down/Left

H.264: mode 3
SVQ3: not used
RV40: not used

 LT | T0  T1  T2  T3  T4  T5  T6  T7
-------------------------------------
 L0 |  a   b   c   d
 L1 |  b   c   d   e
 L2 |  c   d   e   f
 L3 |  d   e   f   g

where:

 a = (T0 + 2*T1 + T2 + 2) / 4
 b = (T1 + 2*T2 + T3 + 2) / 4
 c = (T2 + 2*T3 + T4 + 2) / 4
 d = (T3 + 2*T4 + T5 + 2) / 4
 e = (T4 + 2*T5 + T6 + 2) / 4
 f = (T5 + 2*T6 + T7 + 2) / 4
 g = (T6 + 3*T7      + 2) / 4
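
Every value a..g is the same 3-tap filter slid along the top row, with T7 repeated once at the end so that the last value becomes (T6 + 3*T7 + 2) / 4. A sketch under that reading (the function name and the list-of-rows layout are assumptions for illustration):

```python
def pred4x4_down_left(top):
    """top holds T0..T7; returns the 4x4 block as a list of rows."""
    t = list(top[:8]) + [top[7]]  # repeat T7 so the last filter tap stays in range
    # diag[k] is the filtered value shared by all samples with x + y == k (a..g)
    diag = [(t[k] + 2 * t[k + 1] + t[k + 2] + 2) // 4 for k in range(7)]
    return [[diag[x + y] for x in range(4)] for y in range(4)]


block = pred4x4_down_left([0, 8, 16, 24, 32, 40, 48, 56])
print(block[0])  # [8, 16, 24, 32]
print(block[3])  # [32, 40, 48, 54]
```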

Diagonal Down/Left (SVQ3)

H.264: not used
SVQ3: mode 3
RV40: not used

 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   c
 L1 |  b   c   c   c
 L2 |  c   c   c   c
 L3 |  c   c   c   c

where:

 a = (L1 + T1) / 2
 b = (L2 + T2) / 2
 c = (L3 + T3) / 2
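
The SVQ3 variant only needs three averages, and every sample from the third anti-diagonal onwards takes the value c. A brief sketch of the table above (hypothetical function name; `/ 2` is taken as truncating integer division):

```python
def pred4x4_down_left_svq3(top, left):
    """Return the SVQ3 diagonal down/left block from T1..T3 and L1..L3."""
    a = (left[1] + top[1]) // 2
    b = (left[2] + top[2]) // 2
    c = (left[3] + top[3]) // 2
    values = [a, b, c]
    # Samples with x + y >= 2 all take the last value, c
    return [[values[min(x + y, 2)] for x in range(4)] for y in range(4)]


block = pred4x4_down_left_svq3([0, 10, 20, 30], [0, 2, 4, 6])
print(block[0])  # [6, 12, 18, 18]
```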

Diagonal Down/Left (RV40)

H.264: not used
SVQ3: not used
RV40: mode 4

to be determined

Diagonal Down/Right

H.264: mode 4
SVQ3: mode 4
RV40: mode 3

 LT | T0  T1  T2  T3
---------------------
 L0 |  d   e   f   g
 L1 |  c   d   e   f
 L2 |  b   c   d   e
 L3 |  a   b   c   d

where:

 a = (L3 + 2*L2 + L1 + 2) / 4
 b = (L2 + 2*L1 + L0 + 2) / 4
 c = (L1 + 2*L0 + LT + 2) / 4
 d = (L0 + 2*LT + T0 + 2) / 4
 e = (LT + 2*T0 + T1 + 2) / 4
 f = (T0 + 2*T1 + T2 + 2) / 4
 g = (T1 + 2*T2 + T3 + 2) / 4
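
The values a..g are a 3-tap filter run along the edge L3, L2, L1, L0, LT, T0..T3, and each sample picks the value on its diagonal. One way to sketch this (the function name and argument order are assumptions):

```python
def pred4x4_down_right(top, left, lt):
    """top = T0..T3, left = L0..L3, lt = corner sample LT."""
    # Walk the edge from bottom-left to top-right: L3 L2 L1 L0 LT T0 T1 T2 T3
    edge = list(reversed(left[:4])) + [lt] + list(top[:4])
    # filt[0..6] correspond to a..g in the formulas above
    filt = [(edge[k] + 2 * edge[k + 1] + edge[k + 2] + 2) // 4 for k in range(7)]
    # Samples on the same down/right diagonal (constant x - y) share a value
    return [[filt[3 + x - y] for x in range(4)] for y in range(4)]


block = pred4x4_down_right([50, 60, 70, 80], [30, 20, 10, 0], 40)
print(block[0])  # [40, 50, 60, 70]
print(block[3])  # [10, 20, 30, 40]
```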

Vertical/Right

H.264: mode 5
SVQ3: mode 5
RV40: mode 5

 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   d
 L1 |  e   f   g   h
 L2 |  i   a   b   c
 L3 |  j   e   f   g

where:

 a = (LT + T0 + 1) / 2
 b = (T0 + T1 + 1) / 2
 c = (T1 + T2 + 1) / 2
 d = (T2 + T3 + 1) / 2
 e = (L0 + 2*LT + T0 + 2) / 4
 f = (LT + 2*T0 + T1 + 2) / 4
 g = (T0 + 2*T1 + T2 + 2) / 4
 h = (T1 + 2*T2 + T3 + 2) / 4
 i = (LT + 2*L0 + L1 + 2) / 4
 j = (L0 + 2*L1 + L2 + 2) / 4
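
A direct transcription of the table and formulas above can serve as a reference when checking an optimized implementation. This is only a sketch; the function name is hypothetical:

```python
def pred4x4_vertical_right(top, left, lt):
    """top = T0..T3, left = L0..L3 (L3 is unused), lt = corner sample LT."""
    t0, t1, t2, t3 = top[:4]
    l0, l1, l2 = left[:3]
    a = (lt + t0 + 1) // 2
    b = (t0 + t1 + 1) // 2
    c = (t1 + t2 + 1) // 2
    d = (t2 + t3 + 1) // 2
    e = (l0 + 2 * lt + t0 + 2) // 4
    f = (lt + 2 * t0 + t1 + 2) // 4
    g = (t0 + 2 * t1 + t2 + 2) // 4
    h = (t1 + 2 * t2 + t3 + 2) // 4
    i = (lt + 2 * l0 + l1 + 2) // 4
    j = (l0 + 2 * l1 + l2 + 2) // 4
    return [[a, b, c, d], [e, f, g, h], [i, a, b, c], [j, e, f, g]]


print(pred4x4_vertical_right([50, 60, 70, 80], [30, 20, 10, 0], 40)[0])
# [45, 55, 65, 75]
```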

Horizontal/Down

H.264: mode 6
SVQ3: mode 6
RV40: mode 8

 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   d
 L1 |  e   f   a   b
 L2 |  g   h   e   f
 L3 |  i   j   g   h

where:

 a = (LT + L0 + 1) / 2
 b = (L0 + 2*LT + T0 + 2) / 4
 c = (LT + 2*T0 + T1 + 2) / 4
 d = (T0 + 2*T1 + T2 + 2) / 4
 e = (L0 + L1 + 1) / 2
 f = (LT + 2*L0 + L1 + 2) / 4
 g = (L1 + L2 + 1) / 2
 h = (L0 + 2*L1 + L2 + 2) / 4
 i = (L2 + L3 + 1) / 2
 j = (L1 + 2*L2 + L3 + 2) / 4
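
The same mode written out as a sketch, with i = (L2 + L3 + 1) / 2 as the bottom-left sample to match the table. The function name is an assumption for illustration:

```python
def pred4x4_horizontal_down(top, left, lt):
    """top = T0..T3 (T3 is unused), left = L0..L3, lt = corner sample LT."""
    t0, t1, t2 = top[:3]
    l0, l1, l2, l3 = left[:4]
    a = (lt + l0 + 1) // 2
    b = (l0 + 2 * lt + t0 + 2) // 4
    c = (lt + 2 * t0 + t1 + 2) // 4
    d = (t0 + 2 * t1 + t2 + 2) // 4
    e = (l0 + l1 + 1) // 2
    f = (lt + 2 * l0 + l1 + 2) // 4
    g = (l1 + l2 + 1) // 2
    h = (l0 + 2 * l1 + l2 + 2) // 4
    i = (l2 + l3 + 1) // 2
    j = (l1 + 2 * l2 + l3 + 2) // 4
    return [[a, b, c, d], [e, f, a, b], [g, h, e, f], [i, j, g, h]]


print(pred4x4_horizontal_down([50, 60, 70, 80], [30, 20, 10, 0], 40)[3])
# [5, 10, 15, 20]
```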

Vertical/Left

H.264: mode 7
SVQ3: mode 7
RV40: mode 6

 LT | T0  T1  T2  T3  T4  T5  T6  T7
-------------------------------------
 L0 |  a   b   c   d
 L1 |  f   g   h   i
 L2 |  b   c   d   e
 L3 |  g   h   i   j

where:

 a = (T0 + T1 + 1) / 2
 b = (T1 + T2 + 1) / 2
 c = (T2 + T3 + 1) / 2
 d = (T3 + T4 + 1) / 2
 e = (T4 + T5 + 1) / 2
 f = (T0 + 2*T1 + T2 + 2) / 4
 g = (T1 + 2*T2 + T3 + 2) / 4
 h = (T2 + 2*T3 + T4 + 2) / 4
 i = (T3 + 2*T4 + T5 + 2) / 4
 j = (T4 + 2*T5 + T6 + 2) / 4
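
Even rows use the 2-tap averages (a..e) and odd rows the 3-tap filtered values (f..j), each shifted right by one position every two rows. A compact sketch of that pattern (hypothetical function name):

```python
def pred4x4_vertical_left(top):
    """top holds at least T0..T6; returns the 4x4 block as a list of rows."""
    t = top
    two = [(t[k] + t[k + 1] + 1) // 2 for k in range(5)]                   # a..e
    three = [(t[k] + 2 * t[k + 1] + t[k + 2] + 2) // 4 for k in range(5)]  # f..j
    return [
        [(three if y % 2 else two)[x + y // 2] for x in range(4)]
        for y in range(4)
    ]


block = pred4x4_vertical_left([0, 8, 16, 24, 32, 40, 48, 56])
print(block[0])  # [4, 12, 20, 28]
print(block[1])  # [8, 16, 24, 32]
```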

Horizontal/Up

H.264: mode 8
SVQ3: mode 8
RV40: mode 7

 LT | T0  T1  T2  T3
---------------------
 L0 |  a   b   c   d
 L1 |  c   d   e   f
 L2 |  e   f   g   g
 L3 |  g   g   g   g

where:

 a = (L0 + L1 + 1) / 2
 b = (L0 + 2*L1 + L2 + 2) / 4
 c = (L1 + L2 + 1) / 2
 d = (L1 + 2*L2 + L3 + 2) / 4
 e = (L2 + L3 + 1) / 2
 f = (L2 + 2*L3 + L3 + 2) / 4
 g = L3
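
Laid out in the order a, b, c, d, e, f, g, the sample at column x and row y takes entry x + 2*y, and everything past the end of that list is g. A sketch under that observation (the function name is hypothetical):

```python
def pred4x4_horizontal_up(left):
    """left = L0..L3; returns the 4x4 block as a list of rows."""
    l = list(left[:4]) + [left[3]]  # repeat L3 so f = (L2 + 3*L3 + 2) / 4
    seq = []
    for k in range(3):
        seq.append((l[k] + l[k + 1] + 1) // 2)                 # a, c, e
        seq.append((l[k] + 2 * l[k + 1] + l[k + 2] + 2) // 4)  # b, d, f
    seq.append(left[3])                                        # g
    return [[seq[min(x + 2 * y, 6)] for x in range(4)] for y in range(4)]


block = pred4x4_horizontal_up([0, 8, 16, 24])
print(block[1])  # [12, 16, 20, 22]
```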

Left/DC

H.264: mode 9
SVQ3: mode 9
RV40: not used

 LT | T0  T1  T2  T3
---------------------
 L0 |  a   a   a   a
 L1 |  a   a   a   a
 L2 |  a   a   a   a
 L3 |  a   a   a   a

where:

 a = (L0 + L1 + L2 + L3 + 2) / 4

Top/DC

H.264: mode 10
SVQ3: mode 10
RV40: not used

 LT | T0  T1  T2  T3
---------------------
 L0 |  a   a   a   a
 L1 |  a   a   a   a
 L2 |  a   a   a   a
 L3 |  a   a   a   a

where:

 a = (T0 + T1 + T2 + T3 + 2) / 4

DC-128

H.264: mode 11
SVQ3: mode 11
RV40: not used

 LT |  T0   T1   T2   T3
------------------------
 L0 | 128  128  128  128
 L1 | 128  128  128  128
 L2 | 128  128  128  128
 L3 | 128  128  128  128
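
The four DC-style modes (DC, Left/DC, Top/DC and DC-128) differ only in which neighbours contribute to the mean. The sketch below folds them into one helper and assumes that the variant is picked by which neighbours are available; that selection rule is an assumption of this example, not something stated on this page, and the function name is hypothetical.

```python
def pred4x4_dc_variant(top=None, left=None):
    """Pass None for a neighbour row/column that is not available."""
    if top is not None and left is not None:
        a = (sum(top[:4]) + sum(left[:4]) + 4) // 8   # DC
    elif left is not None:
        a = (sum(left[:4]) + 2) // 4                  # Left/DC
    elif top is not None:
        a = (sum(top[:4]) + 2) // 4                   # Top/DC
    else:
        a = 128                                       # DC-128
    return [[a] * 4 for _ in range(4)]


print(pred4x4_dc_variant(left=[1, 2, 3, 4])[0])  # [3, 3, 3, 3]
print(pred4x4_dc_variant()[0])                   # [128, 128, 128, 128]
```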

16x16 Prediction Modes

DC

H.264: mode 0
SVQ3: mode 0
RV40: mode 0

Using the 16 top predictors (T0..T15) and the 16 left predictors (L0..L15), set all 256 elements to the mean, computed as:

 mean = (sum(T0..T15) + sum(L0..L15) + 16) / 32

Vertical

H.264: mode 1
SVQ3: mode 1
RV40: mode 1

  LT | T0  T1  T2  T3  T4  ..  T15
------------------------- .. -----
  L0 | T0  T1  T2  T3  T4  ..  T15
  L1 | T0  T1  T2  T3  T4  ..  T15
  L2 | T0  T1  T2  T3  T4  ..  T15
 ......
 L15 | T0  T1  T2  T3  T4  ..  T15

Horizontal

H.264: mode 2
SVQ3: mode 2
RV40: mode 2

  LT |  T0  T1  T2  T3  T4  ..  T15
--------------------------- .. -----
  L0 |  L0  L0  L0  L0  L0  ..   L0
  L1 |  L1  L1  L1  L1  L1  ..   L1
  L2 |  L2  L2  L2  L2  L2  ..   L2
 ......
 L15 | L15 L15 L15 L15 L15  ..  L15

Plane

H.264: mode 3
SVQ3: mode 3
RV40: mode 3

Note that SVQ3 uses a slightly different method here. RV40 likely differs as well, so its description should be regarded as unfinished.

Given the top predictors (T0..T15), left predictors (L0..L15) and the left-top corner predictor (LT) arranged as follows:

  LT |   T0    T1    T2  ..  T15
------------------------ .. -----
  L0 |  c_0,0  c_1,0  c_2,0 .. c_15,0
  L1 |  c_0,1  c_1,1  c_2,1 .. c_15,1
 ......
 L15 | c_0,15 c_1,15 c_2,15 .. c_15,15

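Compute H and V as weighted sums of differences taken symmetrically around the centre of the top row and the left column:

 H = 1*(T8  - T6) +
     2*(T9  - T5) +
     3*(T10 - T4) +
     4*(T11 - T3) +
     5*(T12 - T2) +
     6*(T13 - T1) +
     7*(T14 - T0) +
     8*(T15 - LT)

 V = 1*(L8  - L6) +
     2*(L9  - L5) +
     3*(L10 - L4) +
     4*(L11 - L3) +
     5*(L12 - L2) +
     6*(L13 - L1) +
     7*(L14 - L0) +
     8*(L15 - LT)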

For H.264, further compute H and V as:

 H = (5*H + 32) / 64
 V = (5*V + 32) / 64

For SVQ3, further compute H and V as:

 H = (5*(H/4)) / 16
 V = (5*(V/4)) / 16 
 swap H and V

The final process for filling in the 16x16 block is:

 a = 16 * (L15 + T15 + 1) - 7*(V+H)
 for (j = 0..15)
   for (i = 0..15)
     b = a + V*j + H*i
     c[i,j] = SATURATE_U8(b / 32)

SATURATE_U8() clamps its argument to the unsigned 8-bit range (0..255).
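
The following sketch covers the H.264 variant only. It assumes the gradient sums are weighted 1..8 by distance from the centre, as in the H.264 specification, and it performs the divisions by 64 and 32 as arithmetic right shifts, which matters when a gradient is negative. The function names are hypothetical, and the SVQ3 and RV40 variants are not covered.

```python
def saturate_u8(v):
    """Clamp v to the unsigned 8-bit range 0..255."""
    return max(0, min(255, v))


def pred16x16_plane_h264(top, left, lt):
    """top = T0..T15, left = L0..L15, lt = corner LT; returns 16 rows of 16."""
    row = [lt] + list(top[:16])   # row[0] is LT, row[k + 1] is Tk
    col = [lt] + list(left[:16])  # col[0] is LT, col[k + 1] is Lk
    # Weighted differences around the centre: k*(T[7+k] - T[7-k]), k = 1..8
    H = sum(k * (row[8 + k] - row[8 - k]) for k in range(1, 9))
    V = sum(k * (col[8 + k] - col[8 - k]) for k in range(1, 9))
    H = (5 * H + 32) >> 6
    V = (5 * V + 32) >> 6
    a = 16 * (left[15] + top[15] + 1) - 7 * (V + H)
    return [
        [saturate_u8((a + V * j + H * i) >> 5) for i in range(16)]
        for j in range(16)
    ]


# A horizontal ramp along the top edge is continued unchanged down the block.
top = [50 + 2 * k for k in range(16)]
block = pred16x16_plane_h264(top, [48] * 16, 48)
print(block[0] == top, block[15] == top)  # True True
```

With flat neighbours both gradients are zero and the block collapses to a constant, which makes a convenient sanity check.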

Left/DC

H.264: mode 4
SVQ3: mode 4
RV40: not used

Using the 16 left predictors (L0..L15), set all 256 elements to the mean, computed as:

 mean = (sum(L0..L15) + 8) / 16

Top/DC

H.264: mode 5
SVQ3: mode 5
RV40: not used

Using the 16 top predictors (T0..T15), set all 256 elements to the mean, computed as:

 mean = (sum(T0..T15) + 8) / 16

DC-128

H.264: mode 6
SVQ3: mode 6
RV40: not used

Set all 256 elements to 128.
