H.264 Prediction
This page documents the various prediction methods used in H.264 and related formats such as Sorenson Video 3 and RealVideo 4.
4x4 Prediction Modes
The 4x4 prediction modes vary between codecs. H.264 and Sorenson Video 3 use almost the same set. RealVideo 4 numbers the modes differently, and some of its modes differ significantly from their H.264 counterparts: they mix in left predictors where H.264 uses only top predictors, and they read the down-left predictors (L4..L7), which no other codec described here uses.
Vertical
- H.264: mode 0
- SVQ3: mode 0
- RV40: mode 1
LT | T0 T1 T2 T3
----------------
L0 | T0 T1 T2 T3
L1 | T0 T1 T2 T3
L2 | T0 T1 T2 T3
L3 | T0 T1 T2 T3
Horizontal
- H.264: mode 1
- SVQ3: mode 1
- RV40: mode 2
LT | T0 T1 T2 T3
----------------
L0 | L0 L0 L0 L0
L1 | L1 L1 L1 L1
L2 | L2 L2 L2 L2
L3 | L3 L3 L3 L3
DC
- H.264: mode 2
- SVQ3: mode 2
- RV40: mode 0
LT | T0 T1 T2 T3
----------------
L0 | a  a  a  a
L1 | a  a  a  a
L2 | a  a  a  a
L3 | a  a  a  a
where:
a = (T0 + T1 + T2 + T3 + L0 + L1 + L2 + L3 + 4) / 8
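The DC formula above can be sketched in a few lines. This is only an illustration: the function name `pred4x4_dc` and the list-based layout (`top` = T0..T3, `left` = L0..L3, result as four rows of four samples) are assumptions made for the example, not part of any codec's API.

```python
def pred4x4_dc(top, left):
    """Fill a 4x4 block with the rounded mean of T0..T3 and L0..L3."""
    # +4 rounds to nearest before the integer division by 8.
    a = (sum(top) + sum(left) + 4) // 8
    return [[a] * 4 for _ in range(4)]


block = pred4x4_dc([10, 20, 30, 40], [50, 60, 70, 80])
```

With these inputs the eight predictors sum to 360, so every sample becomes (360 + 4) / 8 = 45 after integer division.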
Diagonal Down/Left
- H.264: mode 3
- SVQ3: not used
- RV40: not used
LT | T0 T1 T2 T3 T4 T5 T6 T7
----------------------------
L0 | a  b  c  d
L1 | b  c  d  e
L2 | c  d  e  f
L3 | d  e  f  g
where:
a = (T0 + 2*T1 + T2 + 2) / 4
b = (T1 + 2*T2 + T3 + 2) / 4
c = (T2 + 2*T3 + T4 + 2) / 4
d = (T3 + 2*T4 + T5 + 2) / 4
e = (T4 + 2*T5 + T6 + 2) / 4
f = (T5 + 2*T6 + T7 + 2) / 4
g = (T6 + 3*T7 + 2) / 4
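Since every sample on the same anti-diagonal (constant x + y) shares one value, the table above can be generated from a single 3-tap filter. The sketch below shows this; the function name and the list layout (`top` = T0..T7) are assumptions for illustration only.

```python
def pred4x4_diag_down_left(top):
    """H.264-style Diagonal Down/Left from top = [T0..T7]."""
    # Repeat T7 once so the last filter tap reads T7 twice,
    # which reproduces g = (T6 + 3*T7 + 2) / 4.
    t = list(top) + [top[7]]
    # diag[k] holds the value of anti-diagonal k = x + y (a..g for k = 0..6).
    diag = [(t[k] + 2 * t[k + 1] + t[k + 2] + 2) // 4 for k in range(7)]
    return [[diag[x + y] for x in range(4)] for y in range(4)]


block = pred4x4_diag_down_left([0, 8, 16, 24, 32, 40, 48, 56])
```

For this linear ramp the first six diagonals reproduce the ramp shifted by one sample (8, 16, ..., 48), and only the last diagonal flattens out because T7 is reused.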
Diagonal Down/Left (SVQ3)
- H.264: not used
- SVQ3: mode 3
- RV40: not used
LT | T0 T1 T2 T3
----------------
L0 | a  b  c  c
L1 | b  c  c  c
L2 | c  c  c  c
L3 | c  c  c  c
where:
a = (L1 + T1) / 2
b = (L2 + T2) / 2
c = (L3 + T3) / 2
Diagonal Down/Left (RV40)
- H.264: not used
- SVQ3: not used
- RV40: mode 4
LT | T0 T1 T2 T3 T4 T5 T6 T7
----------------------------
L0 | a  b  c  d
L1 | b  c  d  e
L2 | c  d  e  f
L3 | d  e  f  g
L4 |
L5 |
L6 |
L7 |
where:
a = (T0 + 2*T1 + T2 + L0 + 2*L1 + L2 + 4) / 8
b = (T1 + 2*T2 + T3 + L1 + 2*L2 + L3 + 4) / 8
c = (T2 + 2*T3 + T4 + L2 + 2*L3 + L4 + 4) / 8
d = (T3 + 2*T4 + T5 + L3 + 2*L4 + L5 + 4) / 8
e = (T4 + 2*T5 + T6 + L4 + 2*L5 + L6 + 4) / 8
f = (T5 + 2*T6 + T7 + L5 + 2*L6 + L7 + 4) / 8
g = (T6 + T7 + L6 + L7 + 2) / 4
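The RV40 variant applies the same 3-tap filter to the top and the left edge at once and averages the two. A minimal sketch, assuming `top` = T0..T7 and `left` = L0..L7 are plain lists (the function name is made up for this example):

```python
def pred4x4_diag_down_left_rv40(top, left):
    """RV40-style Diagonal Down/Left using T0..T7 and L0..L7."""
    t, l = top, left
    # a..f: filtered top and filtered left are summed, then divided by 8.
    diag = [
        (t[k] + 2 * t[k + 1] + t[k + 2] + l[k] + 2 * l[k + 1] + l[k + 2] + 4) // 8
        for k in range(6)
    ]
    # g: plain average of the last two samples of each edge.
    diag.append((t[6] + t[7] + l[6] + l[7] + 2) // 4)
    return [[diag[x + y] for x in range(4)] for y in range(4)]


ramp = [0, 8, 16, 24, 32, 40, 48, 56]
block = pred4x4_diag_down_left_rv40(ramp, ramp)
```

Unlike the H.264 mode, this one needs the four down-left samples L4..L7 to be available.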
Diagonal Down/Right
- H.264: mode 4
- SVQ3: mode 4
- RV40: mode 3
LT | T0 T1 T2 T3
----------------
L0 | d  e  f  g
L1 | c  d  e  f
L2 | b  c  d  e
L3 | a  b  c  d
where:
a = (L3 + 2*L2 + L1 + 2) / 4
b = (L2 + 2*L1 + L0 + 2) / 4
c = (L1 + 2*L0 + LT + 2) / 4
d = (L0 + 2*LT + T0 + 2) / 4
e = (LT + 2*T0 + T1 + 2) / 4
f = (T0 + 2*T1 + T2 + 2) / 4
g = (T1 + 2*T2 + T3 + 2) / 4
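One way to read the seven formulas above is as a single 3-tap filter sliding along the edge L3, L2, L1, L0, LT, T0, T1, T2, T3, with each diagonal (constant x - y) picking one filter position. The sketch below follows that reading; the function name and argument layout (`top` = T0..T3, `left` = L0..L3, `lt` = LT) are illustrative assumptions.

```python
def pred4x4_diag_down_right(top, left, lt):
    """Diagonal Down/Right from T0..T3, L0..L3 and the corner LT."""
    # Walk the edge from bottom-left to top-right: L3 L2 L1 L0 LT T0 T1 T2 T3.
    edge = list(reversed(left)) + [lt] + list(top)

    def tap(c):
        # 3-tap (1, 2, 1) filter centred on edge[c], rounded.
        return (edge[c - 1] + 2 * edge[c] + edge[c + 1] + 2) // 4

    # Centre index 4 is LT (value d); moving right/up advances along the edge.
    return [[tap(4 + x - y) for x in range(4)] for y in range(4)]


block = pred4x4_diag_down_right([50, 60, 70, 80], [30, 20, 10, 0], 40)
```

Here the edge is the ramp 0, 10, ..., 80, so the filter returns the centre sample unchanged and the block is a clean diagonal gradient.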
Vertical/Right
- H.264: mode 5
- SVQ3: mode 5
- RV40: mode 5
LT | T0 T1 T2 T3
----------------
L0 | a  b  c  d
L1 | e  f  g  h
L2 | i  a  b  c
L3 | j  e  f  g
where:
a = (LT + T0 + 1) / 2
b = (T0 + T1 + 1) / 2
c = (T1 + T2 + 1) / 2
d = (T2 + T3 + 1) / 2
e = (L0 + 2*LT + T0 + 2) / 4
f = (LT + 2*T0 + T1 + 2) / 4
g = (T0 + 2*T1 + T2 + 2) / 4
h = (T1 + 2*T2 + T3 + 2) / 4
i = (LT + 2*L0 + L1 + 2) / 4
j = (L0 + 2*L1 + L2 + 2) / 4
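A direct transcription of the Vertical/Right table and coefficients might look like the following. The function name and argument layout (`top` = T0..T3, `left` = L0..L3, `lt` = LT) are assumptions made for the example.

```python
def pred4x4_vertical_right(top, left, lt):
    """Vertical/Right from T0..T3, L0..L2 and the corner LT."""
    t, l = top, left
    # Even rows use 2-tap averages of the top edge.
    a = (lt + t[0] + 1) // 2
    b = (t[0] + t[1] + 1) // 2
    c = (t[1] + t[2] + 1) // 2
    d = (t[2] + t[3] + 1) // 2
    # Odd rows use 3-tap filtered values.
    e = (l[0] + 2 * lt + t[0] + 2) // 4
    f = (lt + 2 * t[0] + t[1] + 2) // 4
    g = (t[0] + 2 * t[1] + t[2] + 2) // 4
    h = (t[1] + 2 * t[2] + t[3] + 2) // 4
    # The first column of the lower rows comes from the left edge.
    i = (lt + 2 * l[0] + l[1] + 2) // 4
    j = (l[0] + 2 * l[1] + l[2] + 2) // 4
    return [[a, b, c, d], [e, f, g, h], [i, a, b, c], [j, e, f, g]]


block = pred4x4_vertical_right([50, 60, 70, 80], [30, 20, 10, 0], 40)
```

Rows 2 and 3 are rows 0 and 1 shifted right by one sample, with i and j filling the vacated first column.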
Horizontal/Down
- H.264: mode 6
- SVQ3: mode 6
- RV40: mode 8
LT | T0 T1 T2 T3
----------------
L0 | a  b  c  d
L1 | e  f  a  b
L2 | g  h  e  f
L3 | i  j  g  h
where:
a = (LT + L0 + 1) / 2
b = (L0 + 2*LT + T0 + 2) / 4
c = (LT + 2*T0 + T1 + 2) / 4
d = (T0 + 2*T1 + T2 + 2) / 4
e = (L0 + L1 + 1) / 2
f = (LT + 2*L0 + L1 + 2) / 4
g = (L1 + L2 + 1) / 2
h = (L0 + 2*L1 + L2 + 2) / 4
i = (L2 + L3 + 1) / 2
j = (L1 + 2*L2 + L3 + 2) / 4
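The Horizontal/Down table can be transcribed the same way. In this sketch the first sample of row L3 is called i, computed as (L2 + L3 + 1) / 2; the function name and argument layout (`top` = T0..T3, `left` = L0..L3, `lt` = LT) are assumptions for illustration.

```python
def pred4x4_horizontal_down(top, left, lt):
    """Horizontal/Down from T0..T2, L0..L3 and the corner LT."""
    t, l = top, left
    a = (lt + l[0] + 1) // 2
    b = (l[0] + 2 * lt + t[0] + 2) // 4
    c = (lt + 2 * t[0] + t[1] + 2) // 4
    d = (t[0] + 2 * t[1] + t[2] + 2) // 4
    e = (l[0] + l[1] + 1) // 2
    f = (lt + 2 * l[0] + l[1] + 2) // 4
    g = (l[1] + l[2] + 1) // 2
    h = (l[0] + 2 * l[1] + l[2] + 2) // 4
    i = (l[2] + l[3] + 1) // 2
    j = (l[1] + 2 * l[2] + l[3] + 2) // 4
    # Each row is the previous row shifted right by two samples.
    return [[a, b, c, d], [e, f, a, b], [g, h, e, f], [i, j, g, h]]


block = pred4x4_horizontal_down([50, 60, 70, 80], [30, 20, 10, 0], 40)
```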
Vertical/Left
- H.264: mode 7
- SVQ3: mode 7
- RV40: mode 6
LT | T0 T1 T2 T3 T4 T5 T6 T7
----------------------------
L0 | a  b  c  d
L1 | f  g  h  i
L2 | b  c  d  e
L3 | g  h  i  j
where:
a = (T0 + T1 + 1) / 2
b = (T1 + T2 + 1) / 2
c = (T2 + T3 + 1) / 2
d = (T3 + T4 + 1) / 2
e = (T4 + T5 + 1) / 2
f = (T0 + 2*T1 + T2 + 2) / 4
g = (T1 + 2*T2 + T3 + 2) / 4
h = (T2 + 2*T3 + T4 + 2) / 4
i = (T3 + 2*T4 + T5 + 2) / 4
j = (T4 + 2*T5 + T6 + 2) / 4
For RV40 two coefficients differ:
a = (2*T0 + 2*T1 + L1 + 2*L2 + L3 + 4) / 8
f = (T0 + 2*T1 + T2 + L2 + 2*L3 + L4 + 4) / 8
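Because a and f each appear only once in the table, the RV40 variant changes just the first sample of rows L0 and L1. The sketch below handles both variants with a flag; the function name, the `rv40` parameter and the list layout (`top` = T0..T7, `left` = L0..L7) are hypothetical choices for this example.

```python
def pred4x4_vertical_left(top, left=None, rv40=False):
    """Vertical/Left from T0..T6; the RV40 variant also reads L1..L4."""
    t = top
    # a..e: 2-tap averages, f..j: 3-tap filtered values.
    avg = [(t[k] + t[k + 1] + 1) // 2 for k in range(5)]
    flt = [(t[k] + 2 * t[k + 1] + t[k + 2] + 2) // 4 for k in range(5)]
    a, b, c, d, e = avg
    f, g, h, i, j = flt
    if rv40:
        # RV40 blends the left edge into the two leftmost samples only.
        l = left
        a = (2 * t[0] + 2 * t[1] + l[1] + 2 * l[2] + l[3] + 4) // 8
        f = (t[0] + 2 * t[1] + t[2] + l[2] + 2 * l[3] + l[4] + 4) // 8
    return [[a, b, c, d], [f, g, h, i], [b, c, d, e], [g, h, i, j]]


ramp = [0, 8, 16, 24, 32, 40, 48, 56]
h264_block = pred4x4_vertical_left(ramp)
rv40_block = pred4x4_vertical_left(ramp, left=ramp, rv40=True)
```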
Horizontal/Up
- H.264: mode 8
- SVQ3: mode 8
- RV40: not used
LT | T0 T1 T2 T3
----------------
L0 | a  b  c  d
L1 | c  d  e  f
L2 | e  f  g  g
L3 | g  g  g  g
where:
a = (L0 + L1 + 1) / 2
b = (L0 + 2*L1 + L2 + 2) / 4
c = (L1 + L2 + 1) / 2
d = (L1 + 2*L2 + L3 + 2) / 4
e = (L2 + L3 + 1) / 2
f = (L2 + 2*L3 + L3 + 2) / 4
g = L3
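Horizontal/Up uses only the left edge, so a sketch needs a single input list. The function name and the `left` = L0..L3 layout are assumptions for illustration.

```python
def pred4x4_horizontal_up(left):
    """Horizontal/Up from L0..L3 only."""
    l = left
    a = (l[0] + l[1] + 1) // 2
    b = (l[0] + 2 * l[1] + l[2] + 2) // 4
    c = (l[1] + l[2] + 1) // 2
    d = (l[1] + 2 * l[2] + l[3] + 2) // 4
    e = (l[2] + l[3] + 1) // 2
    f = (l[2] + 3 * l[3] + 2) // 4  # same as (L2 + 2*L3 + L3 + 2) / 4
    g = l[3]
    # Each row is the previous one shifted left by two; the tail is padded with L3.
    return [[a, b, c, d], [c, d, e, f], [e, f, g, g], [g, g, g, g]]


block = pred4x4_horizontal_up([10, 20, 30, 40])
```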
Horizontal/Up (RV40)
- H.264: not used
- SVQ3: not used
- RV40: mode 7
LT | T0 T1 T2 T3
----------------
L0 | a  b  c  d
L1 | c  d  e  f
L2 | e  f  g  h
L3 | g  h  i  j
L4 |
L5 |
L6 |
L7 |
where:
a = (T1 + 2*T2 + T3 + 2*L0 + 2*L1 + 4) / 8
b = (T2 + 2*T3 + T4 + L0 + 2*L1 + L2 + 4) / 8
c = (T3 + 2*T4 + T5 + 2*L1 + 2*L2 + 4) / 8
d = (T4 + 2*T5 + T6 + L1 + 2*L2 + L3 + 4) / 8
e = (T5 + 2*T6 + T7 + 2*L2 + 2*L3 + 4) / 8
f = (T6 + 3*T7 + L2 + 3*L3 + 4) / 8
g = (T6 + T7 + L3 + L4 + 2) / 4
h = (L3 + 2*L4 + L5 + 2) / 4
i = (L4 + L5 + 1) / 2
j = (L4 + 2*L5 + L6 + 2) / 4
Left/DC
- H.264: mode 9
- SVQ3: mode 9
- RV40: not used
LT | T0 T1 T2 T3
----------------
L0 | a  a  a  a
L1 | a  a  a  a
L2 | a  a  a  a
L3 | a  a  a  a
where:
a = (L0 + L1 + L2 + L3 + 2) / 4
Top/DC
- H.264: mode 10
- SVQ3: mode 10
- RV40: not used
LT | T0 T1 T2 T3
----------------
L0 | a  a  a  a
L1 | a  a  a  a
L2 | a  a  a  a
L3 | a  a  a  a
where:
a = (T0 + T1 + T2 + T3 + 2) / 4
DC-128
- H.264: mode 11
- SVQ3: mode 11
- RV40: not used
LT | T0  T1  T2  T3
-------------------
L0 | 128 128 128 128
L1 | 128 128 128 128
L2 | 128 128 128 128
L3 | 128 128 128 128
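The four DC flavours (DC, Left/DC, Top/DC and DC-128) differ only in which predictors they average. The sketch below folds them into one function and picks the flavour from which edges are passed in. That selection rule is an assumption of this example (it mirrors the usual fallback when a neighbouring block is unavailable), as are the function name and the use of `None` to mark a missing edge.

```python
def pred4x4_dc_select(top=None, left=None):
    """Pick DC, Left/DC, Top/DC or DC-128 depending on the edges supplied."""
    if top is not None and left is not None:
        a = (sum(top) + sum(left) + 4) // 8   # DC
    elif left is not None:
        a = (sum(left) + 2) // 4              # Left/DC
    elif top is not None:
        a = (sum(top) + 2) // 4               # Top/DC
    else:
        a = 128                               # DC-128: mid-grey for 8-bit samples
    return [[a] * 4 for _ in range(4)]


top, left = [10, 20, 30, 40], [50, 60, 70, 80]
both = pred4x4_dc_select(top, left)
left_only = pred4x4_dc_select(left=left)
top_only = pred4x4_dc_select(top=top)
neither = pred4x4_dc_select()
```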
16x16 Prediction Modes
DC
- H.264: mode 0
- SVQ3: mode 0
- RV40: mode 0
Using the 16 top predictors (T0..T15) and the 16 left predictors (L0..L15), set all 256 elements to the mean, computed as:
mean = (sum(T0..T15) + sum(L0..L15) + 16) / 32
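As a quick illustration of the 16x16 DC mean, the following sketch fills a block from two 16-element lists. The function name and list layout are assumptions for the example.

```python
def pred16x16_dc(top, left):
    """Fill a 16x16 block with the rounded mean of T0..T15 and L0..L15."""
    mean = (sum(top) + sum(left) + 16) // 32
    return [[mean] * 16 for _ in range(16)]


block = pred16x16_dc(list(range(16)), [100] * 16)
```

Here the predictors sum to 120 + 1600 = 1720, giving (1720 + 16) / 32 = 54 after integer division.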
Vertical
- H.264: mode 1
- SVQ3: mode 1
- RV40: mode 1
LT  | T0 T1 T2 T3 T4 .. T15
---------------------------
L0  | T0 T1 T2 T3 T4 .. T15
L1  | T0 T1 T2 T3 T4 .. T15
L2  | T0 T1 T2 T3 T4 .. T15
......
L15 | T0 T1 T2 T3 T4 .. T15
Horizontal
- H.264: mode 2
- SVQ3: mode 2
- RV40: mode 2
LT  | T0  T1  T2  T3  T4  .. T15
--------------------------------
L0  | L0  L0  L0  L0  L0  .. L0
L1  | L1  L1  L1  L1  L1  .. L1
L2  | L2  L2  L2  L2  L2  .. L2
......
L15 | L15 L15 L15 L15 L15 .. L15
Plane
- H.264: mode 3
- SVQ3: mode 3
- RV40: not used
Note that SVQ3 uses a slightly different method here.
Given the top predictors (T0..T15), left predictors (L0..L15) and the left-top corner predictor (LT) arranged as follows:
LT  | T0    T1    T2    .. T15
--------------------------------
L0  | c0,0  c1,0  c2,0  .. c15,0
L1  | c0,1  c1,1  c2,1  .. c15,1
......
L15 | c0,15 c1,15 c2,15 .. c15,15
Compute H and V as:
H = 1*(T8 - T6) + 2*(T9 - T5) + 3*(T10 - T4) + 4*(T11 - T3) + 5*(T12 - T2) + 6*(T13 - T1) + 7*(T14 - T0) + 8*(T15 - LT)
V = 1*(L8 - L6) + 2*(L9 - L5) + 3*(L10 - L4) + 4*(L11 - L3) + 5*(L12 - L2) + 6*(L13 - L1) + 7*(L14 - L0) + 8*(L15 - LT)
For H.264, further compute H and V as:
H = (5*H + 32) / 64
V = (5*V + 32) / 64
For SVQ3, further compute H and V as:
H = (5*(H/4)) / 16
V = (5*(V/4)) / 16
swap H and V
The final process for filling in the 16x16 block is:
a = 16 * (L15 + T15 + 1) - 7*(V+H)
for (j = 0..15)
  for (i = 0..15)
    b = a + V * j + H * i
    c[i,j] = SATURATE_U8(b / 32)
SATURATE_U8() clamps its argument to the unsigned 8-bit range (0..255).
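The H.264 flavour of the plane procedure can be sketched end to end as follows. Two things are assumptions of this example rather than statements from the text: the k-th edge difference is weighted by k (1..8), as in the H.264 specification's plane predictor, and the divisions by 64 and 32 are done as arithmetic right shifts, which round toward negative infinity for negative values. The function names and the list layout (`top` = T0..T15, `left` = L0..L15, `lt` = LT) are also made up for illustration. The SVQ3 scaling and swap are not covered.

```python
def saturate_u8(v):
    """Clamp v to the unsigned 8-bit range 0..255."""
    return max(0, min(255, v))


def pred16x16_plane_h264(top, left, lt):
    """H.264-style 16x16 plane prediction; returns rows c[j][i]."""
    t = [lt] + list(top)    # t[0] = LT, t[k + 1] = Tk
    l = [lt] + list(left)   # l[0] = LT, l[k + 1] = Lk
    # Weighted gradients: k = 1 pairs T8/T6, ..., k = 8 pairs T15/LT.
    H = sum(k * (t[8 + k] - t[8 - k]) for k in range(1, 9))
    V = sum(k * (l[8 + k] - l[8 - k]) for k in range(1, 9))
    # Assumption: "/ 64" and "/ 32" are arithmetic shifts (>> 6, >> 5).
    H = (5 * H + 32) >> 6
    V = (5 * V + 32) >> 6
    a = 16 * (left[15] + top[15] + 1) - 7 * (V + H)
    return [
        [saturate_u8((a + V * j + H * i) >> 5) for i in range(16)]
        for j in range(16)
    ]


# Horizontal ramp on the top edge, flat left edge.
ramp_top = [102 + 2 * i for i in range(16)]
block = pred16x16_plane_h264(ramp_top, [100] * 16, 100)
flat = pred16x16_plane_h264([100] * 16, [100] * 16, 100)
```

With a linear ramp along the top and a flat left edge, H comes out as 64 and V as 0, so every row repeats the top ramp; flat edges give a flat block.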
Plane (RV40)
- H.264: not used
- SVQ3: not used
- RV40: mode 3
Left/DC
- H.264: mode 4
- SVQ3: mode 4
- RV40: not used
Using 16 left predictors (L0..L15), set all 256 elements to the mean, computed as:
mean = (sum(L0..L15) + 8) / 16
Top/DC
- H.264: mode 5
- SVQ3: mode 5
- RV40: not used
Using the 16 top predictors (T0..T15), set all 256 elements to the mean, computed as:
mean = (sum(T0..T15) + 8) / 16
DC-128
- H.264: mode 6
- SVQ3: mode 6
- RV40: not used
Set all 256 elements to 128.