H.264 Prediction
Revision as of 12:55, 15 August 2007
This page documents the various prediction methods used in H.264 and related formats such as Sorenson Video 3 and RealVideo 4.
4x4 Prediction Modes
The 4x4 prediction modes vary between codecs. They are almost identical in H.264 and Sorenson Video 3, but RealVideo 4 numbers the modes differently, and some of its modes differ significantly from their H.264 counterparts: they use left predictors where H.264 does not, and they use down-left predictors (L4..L7), which the other codecs do not use.
Vertical
- H.264: mode 0
- SVQ3: mode 0
- RV40: mode 1
    LT | T0 T1 T2 T3
    ---------------------
    L0 | T0 T1 T2 T3
    L1 | T0 T1 T2 T3
    L2 | T0 T1 T2 T3
    L3 | T0 T1 T2 T3
Horizontal
- H.264: mode 1
- SVQ3: mode 1
- RV40: mode 2
    LT | T0 T1 T2 T3
    ---------------------
    L0 | L0 L0 L0 L0
    L1 | L1 L1 L1 L1
    L2 | L2 L2 L2 L2
    L3 | L3 L3 L3 L3
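A minimal C sketch of the two copy modes, assuming 8-bit samples and a hypothetical layout in which `top[]` holds T0..T3, `left[]` holds L0..L3 and `dst[y][x]` is the sample in row y, column x. The function names are illustrative and are not taken from any particular decoder.

```c
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T3, left[] = L0..L3,
 * dst[y][x] = predicted sample in row y, column x. */
static void pred4x4_vertical(uint8_t dst[4][4], const uint8_t top[4])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = top[x];   /* each column repeats the predictor above it */
}

static void pred4x4_horizontal(uint8_t dst[4][4], const uint8_t left[4])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = left[y];  /* each row repeats the predictor to its left */
}
```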
DC
- H.264: mode 2
- SVQ3: mode 2
- RV40: mode 0
    LT | T0 T1 T2 T3
    ---------------------
    L0 | a  a  a  a
    L1 | a  a  a  a
    L2 | a  a  a  a
    L3 | a  a  a  a
where:
    if top and left predictors are available
        a = (T0 + T1 + T2 + T3 + L0 + L1 + L2 + L3 + 4) / 8
    else if top predictors are available
        a = (T0 + T1 + T2 + T3 + 2) / 4
    else if left predictors are available
        a = (L0 + L1 + L2 + L3 + 2) / 4
    else
        a = 128
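The availability fallback for DC can be sketched in C as follows (hypothetical names, 8-bit samples, same assumed `top[]`/`left[]`/`dst[y][x]` layout). Because every sum is non-negative, the divisions by 8 and 4 can be written as right shifts without changing the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* top[] = T0..T3 and left[] = L0..L3; each array is only read when its
 * availability flag is set. */
static uint8_t pred4x4_dc_value(const uint8_t top[4], bool has_top,
                                const uint8_t left[4], bool has_left)
{
    int sum = 0;
    if (has_top)
        for (int n = 0; n < 4; n++) sum += top[n];
    if (has_left)
        for (int n = 0; n < 4; n++) sum += left[n];

    if (has_top && has_left) return (uint8_t)((sum + 4) >> 3);  /* mean of 8 */
    if (has_top || has_left) return (uint8_t)((sum + 2) >> 2);  /* mean of 4 */
    return 128;                                 /* no neighbours available */
}

static void pred4x4_dc(uint8_t dst[4][4], const uint8_t top[4], bool has_top,
                       const uint8_t left[4], bool has_left)
{
    uint8_t a = pred4x4_dc_value(top, has_top, left, has_left);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = a;
}
```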
Diagonal Down/Left
- H.264: mode 3
- SVQ3: not used
- RV40: not used
    LT | T0 T1 T2 T3 T4 T5 T6 T7
    -------------------------------------
    L0 | a  b  c  d
    L1 | b  c  d  e
    L2 | c  d  e  f
    L3 | d  e  f  g
where:
    a = (T0 + 2*T1 + T2 + 2) / 4
    b = (T1 + 2*T2 + T3 + 2) / 4
    c = (T2 + 2*T3 + T4 + 2) / 4
    d = (T3 + 2*T4 + T5 + 2) / 4
    e = (T4 + 2*T5 + T6 + 2) / 4
    f = (T5 + 2*T6 + T7 + 2) / 4
    g = (T6 + 3*T7 + 2) / 4
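A sketch of this mode in C, under the same hypothetical layout with `top[]` extended to T0..T7. Every pixel on one anti-diagonal (equal x + y) shares a value, so the letter a..g is simply the index x + y, and only the last one (g) uses a different filter.

```c
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T7 (T4..T7 are the top-right predictors). */
static void pred4x4_down_left(uint8_t dst[4][4], const uint8_t top[8])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            int k = x + y;   /* anti-diagonal index: 0 = a ... 6 = g */
            if (k == 6)      /* g: T8 does not exist, so T7 is weighted 3 */
                dst[y][x] = (uint8_t)((top[6] + 3 * top[7] + 2) >> 2);
            else             /* a..f: 1-2-1 filter on T[k], T[k+1], T[k+2] */
                dst[y][x] = (uint8_t)((top[k] + 2 * top[k + 1] + top[k + 2] + 2) >> 2);
        }
}
```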
Diagonal Down/Left (SVQ3)
- H.264: not used
- SVQ3: mode 3
- RV40: not used
    LT | T0 T1 T2 T3
    ---------------------
    L0 | a  b  c  c
    L1 | b  c  c  c
    L2 | c  c  c  c
    L3 | c  c  c  c
where:
    a = (L1 + T1) / 2
    b = (L2 + T2) / 2
    c = (L3 + T3) / 2
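The SVQ3 variant only needs three distinct values. A hypothetical C sketch (same assumed array layout, 8-bit samples) that computes them once and assigns them by anti-diagonal:

```c
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T3, left[] = L0..L3, dst[y][x]. */
static void pred4x4_down_left_svq3(uint8_t dst[4][4], const uint8_t top[4],
                                   const uint8_t left[4])
{
    /* a, b and c each average one left and one top predictor. */
    uint8_t v[3];
    for (int n = 0; n < 3; n++)
        v[n] = (uint8_t)((left[n + 1] + top[n + 1]) >> 1);

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            int k = x + y;                 /* anti-diagonal index 0..6 */
            dst[y][x] = v[k < 2 ? k : 2];  /* k = 0 -> a, k = 1 -> b, else c */
        }
}
```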
Diagonal Down/Left (RV40)
- H.264: not used
- SVQ3: not used
- RV40: mode 4
    LT | T0 T1 T2 T3 T4 T5 T6 T7
    -------------------------------------
    L0 | a  b  c  d
    L1 | b  c  d  e
    L2 | c  d  e  f
    L3 | d  e  f  g
    L4 |
    L5 |
    L6 |
    L7 |
where:
    a = (T0 + 2*T1 + T2 + L0 + 2*L1 + L2 + 4) / 8
    b = (T1 + 2*T2 + T3 + L1 + 2*L2 + L3 + 4) / 8
    c = (T2 + 2*T3 + T4 + L2 + 2*L3 + L4 + 4) / 8
    d = (T3 + 2*T4 + T5 + L3 + 2*L4 + L5 + 4) / 8
    e = (T4 + 2*T5 + T6 + L4 + 2*L5 + L6 + 4) / 8
    f = (T5 + 2*T6 + T7 + L5 + 2*L6 + L7 + 4) / 8
    g = (T6 + T7 + L6 + L7 + 2) / 4
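The RV40 version applies the same 1-2-1 filter to the top edge and to the left edge (including the down-left predictors) and averages the two. A hypothetical C sketch, assuming `top[]` = T0..T7 and `left[]` = L0..L7:

```c
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T7, left[] = L0..L7 (L4..L7 are the
 * down-left predictors), dst[y][x]. */
static void pred4x4_down_left_rv40(uint8_t dst[4][4], const uint8_t top[8],
                                   const uint8_t left[8])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            int k = x + y;   /* one value per anti-diagonal, a..g */
            if (k == 6)      /* g: plain average of T6, T7, L6, L7 */
                dst[y][x] = (uint8_t)((top[6] + top[7] + left[6] + left[7] + 2) >> 2);
            else             /* a..f: 1-2-1 filter on both edges, then /8 */
                dst[y][x] = (uint8_t)((top[k] + 2 * top[k + 1] + top[k + 2] +
                                       left[k] + 2 * left[k + 1] + left[k + 2] + 4) >> 3);
        }
}
```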
Diagonal Down/Right
- H.264: mode 4
- SVQ3: mode 4
- RV40: mode 3
    LT | T0 T1 T2 T3
    ---------------------
    L0 | d  e  f  g
    L1 | c  d  e  f
    L2 | b  c  d  e
    L3 | a  b  c  d
where:
    a = (L3 + 2*L2 + L1 + 2) / 4
    b = (L2 + 2*L1 + L0 + 2) / 4
    c = (L1 + 2*L0 + LT + 2) / 4
    d = (L0 + 2*LT + T0 + 2) / 4
    e = (LT + 2*T0 + T1 + 2) / 4
    f = (T0 + 2*T1 + T2 + 2) / 4
    g = (T1 + 2*T2 + T3 + 2) / 4
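All seven values are the same 1-2-1 filter slid along the edge L3 L2 L1 L0 LT T0 T1 T2 T3. A hypothetical C sketch that lays the edge out as one array, so the diagonal x - y selects the centre tap (the main diagonal x == y is centred on LT):

```c
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T3, left[] = L0..L3, lt = LT, dst[y][x]. */
static void pred4x4_down_right(uint8_t dst[4][4], const uint8_t top[4],
                               const uint8_t left[4], uint8_t lt)
{
    /* Edge as one line: L3 L2 L1 L0 LT T0 T1 T2 T3 (LT at index 4). */
    int e[9] = { left[3], left[2], left[1], left[0], lt,
                 top[0], top[1], top[2], top[3] };
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            int c = 4 + x - y;   /* centre tap for this diagonal, 1..7 */
            dst[y][x] = (uint8_t)((e[c - 1] + 2 * e[c] + e[c + 1] + 2) >> 2);
        }
}
```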
Vertical/Right
- H.264: mode 5
- SVQ3: mode 5
- RV40: mode 5
    LT | T0 T1 T2 T3
    ---------------------
    L0 | a  b  c  d
    L1 | e  f  g  h
    L2 | i  a  b  c
    L3 | j  e  f  g
where:
    a = (LT + T0 + 1) / 2
    b = (T0 + T1 + 1) / 2
    c = (T1 + T2 + 1) / 2
    d = (T2 + T3 + 1) / 2
    e = (L0 + 2*LT + T0 + 2) / 4
    f = (LT + 2*T0 + T1 + 2) / 4
    g = (T0 + 2*T1 + T2 + 2) / 4
    h = (T1 + 2*T2 + T3 + 2) / 4
    i = (LT + 2*L0 + L1 + 2) / 4
    j = (L0 + 2*L1 + L2 + 2) / 4
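A direct C transcription of the table above, as a sketch with hypothetical names and the same assumed array layout: compute the ten distinct values once, then place them according to the diagram.

```c
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T3, left[] = L0..L3, lt = LT, dst[y][x]. */
static void pred4x4_vertical_right(uint8_t dst[4][4], const uint8_t top[4],
                                   const uint8_t left[4], uint8_t lt)
{
    int T0 = top[0], T1 = top[1], T2 = top[2], T3 = top[3];
    int L0 = left[0], L1 = left[1], L2 = left[2], LT = lt;

    /* Two-tap averages along the top edge. */
    uint8_t a = (uint8_t)((LT + T0 + 1) >> 1);
    uint8_t b = (uint8_t)((T0 + T1 + 1) >> 1);
    uint8_t c = (uint8_t)((T1 + T2 + 1) >> 1);
    uint8_t d = (uint8_t)((T2 + T3 + 1) >> 1);
    /* Three-tap 1-2-1 filters around the corner and along both edges. */
    uint8_t e = (uint8_t)((L0 + 2 * LT + T0 + 2) >> 2);
    uint8_t f = (uint8_t)((LT + 2 * T0 + T1 + 2) >> 2);
    uint8_t g = (uint8_t)((T0 + 2 * T1 + T2 + 2) >> 2);
    uint8_t h = (uint8_t)((T1 + 2 * T2 + T3 + 2) >> 2);
    uint8_t i = (uint8_t)((LT + 2 * L0 + L1 + 2) >> 2);
    uint8_t j = (uint8_t)((L0 + 2 * L1 + L2 + 2) >> 2);

    const uint8_t out[4][4] = {   /* same arrangement as the diagram */
        { a, b, c, d },
        { e, f, g, h },
        { i, a, b, c },
        { j, e, f, g },
    };
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = out[y][x];
}
```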
Horizontal/Down
- H.264: mode 6
- SVQ3: mode 6
- RV40: mode 8
    LT | T0 T1 T2 T3
    ---------------------
    L0 | a  b  c  d
    L1 | e  f  a  b
    L2 | g  h  e  f
    L3 | i  j  g  h
where:
    a = (LT + L0 + 1) / 2
    b = (L0 + 2*LT + T0 + 2) / 4
    c = (LT + 2*T0 + T1 + 2) / 4
    d = (T0 + 2*T1 + T2 + 2) / 4
    e = (L0 + L1 + 1) / 2
    f = (LT + 2*L0 + L1 + 2) / 4
    g = (L1 + L2 + 1) / 2
    h = (L0 + 2*L1 + L2 + 2) / 4
    i = (L2 + L3 + 1) / 2
    j = (L1 + 2*L2 + L3 + 2) / 4
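A hypothetical C transcription of Horizontal/Down in the same style (assumed array layout, 8-bit samples). The value in row 3, column 0 is the two-tap average of L2 and L3, named i as in the diagram.

```c
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T3, left[] = L0..L3, lt = LT, dst[y][x]. */
static void pred4x4_horizontal_down(uint8_t dst[4][4], const uint8_t top[4],
                                    const uint8_t left[4], uint8_t lt)
{
    int T0 = top[0], T1 = top[1], T2 = top[2];
    int L0 = left[0], L1 = left[1], L2 = left[2], L3 = left[3], LT = lt;

    uint8_t a = (uint8_t)((LT + L0 + 1) >> 1);
    uint8_t b = (uint8_t)((L0 + 2 * LT + T0 + 2) >> 2);
    uint8_t c = (uint8_t)((LT + 2 * T0 + T1 + 2) >> 2);
    uint8_t d = (uint8_t)((T0 + 2 * T1 + T2 + 2) >> 2);
    uint8_t e = (uint8_t)((L0 + L1 + 1) >> 1);
    uint8_t f = (uint8_t)((LT + 2 * L0 + L1 + 2) >> 2);
    uint8_t g = (uint8_t)((L1 + L2 + 1) >> 1);
    uint8_t h = (uint8_t)((L0 + 2 * L1 + L2 + 2) >> 2);
    uint8_t i = (uint8_t)((L2 + L3 + 1) >> 1);   /* row 3, column 0 */
    uint8_t j = (uint8_t)((L1 + 2 * L2 + L3 + 2) >> 2);

    const uint8_t out[4][4] = {   /* same arrangement as the diagram */
        { a, b, c, d },
        { e, f, a, b },
        { g, h, e, f },
        { i, j, g, h },
    };
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = out[y][x];
}
```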
Vertical/Left
- H.264: mode 7
- SVQ3: mode 7
- RV40: mode 6
    LT | T0 T1 T2 T3 T4 T5 T6 T7
    -------------------------------------
    L0 | a  b  c  d
    L1 | f  g  h  i
    L2 | b  c  d  e
    L3 | g  h  i  j
where:
    a = (T0 + T1 + 1) / 2
    b = (T1 + T2 + 1) / 2
    c = (T2 + T3 + 1) / 2
    d = (T3 + T4 + 1) / 2
    e = (T4 + T5 + 1) / 2
    f = (T0 + 2*T1 + T2 + 2) / 4
    g = (T1 + 2*T2 + T3 + 2) / 4
    h = (T2 + 2*T3 + T4 + 2) / 4
    i = (T3 + 2*T4 + T5 + 2) / 4
    j = (T4 + 2*T5 + T6 + 2) / 4
For RV40, two of these values differ:
    a = (2*T0 + 2*T1 + L1 + 2*L2 + L3 + 4) / 8
    f = (  T0 + 2*T1 + T2 + L2 + 2*L3 + L4 + 4) / 8
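A hypothetical C sketch covering both variants. Even rows use two-tap averages and odd rows use 1-2-1 filters, with rows 2 and 3 shifted one predictor to the right; the `rv40` flag (an assumption of this sketch) then overrides a and f, which occur only at row 0 and row 1 of column 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T7, left[] = L0..L7 (read only when
 * rv40 is true), dst[y][x]. */
static void pred4x4_vertical_left(uint8_t dst[4][4], const uint8_t top[8],
                                  const uint8_t left[8], bool rv40)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            int n = x + y / 2;    /* rows 2 and 3 repeat rows 0 and 1, shifted by one */
            if (y % 2 == 0)       /* a..e: two-tap averages */
                dst[y][x] = (uint8_t)((top[n] + top[n + 1] + 1) >> 1);
            else                  /* f..j: three-tap 1-2-1 filters */
                dst[y][x] = (uint8_t)((top[n] + 2 * top[n + 1] + top[n + 2] + 2) >> 2);
        }

    if (rv40) {                   /* RV40 mixes left predictors into a and f */
        dst[0][0] = (uint8_t)((2 * top[0] + 2 * top[1] +
                               left[1] + 2 * left[2] + left[3] + 4) >> 3);
        dst[1][0] = (uint8_t)((top[0] + 2 * top[1] + top[2] +
                               left[2] + 2 * left[3] + left[4] + 4) >> 3);
    }
}
```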
Horizontal/Up
- H.264: mode 8
- SVQ3: mode 8
- RV40: not used
    LT | T0 T1 T2 T3
    ---------------------
    L0 | a  b  c  d
    L1 | c  d  e  f
    L2 | e  f  g  g
    L3 | g  g  g  g
where:
    a = (L0 + L1 + 1) / 2
    b = (L0 + 2*L1 + L2 + 2) / 4
    c = (L1 + L2 + 1) / 2
    d = (L1 + 2*L2 + L3 + 2) / 4
    e = (L2 + L3 + 1) / 2
    f = (L2 + 3*L3 + 2) / 4
    g = L3
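Horizontal/Up uses only the left column; past L3 the predictor is simply repeated. A hypothetical C transcription (assumed array layout, 8-bit samples):

```c
#include <stdint.h>

/* Hypothetical layout: left[] = L0..L3, dst[y][x]. */
static void pred4x4_horizontal_up(uint8_t dst[4][4], const uint8_t left[4])
{
    int L0 = left[0], L1 = left[1], L2 = left[2], L3 = left[3];

    uint8_t a = (uint8_t)((L0 + L1 + 1) >> 1);
    uint8_t b = (uint8_t)((L0 + 2 * L1 + L2 + 2) >> 2);
    uint8_t c = (uint8_t)((L1 + L2 + 1) >> 1);
    uint8_t d = (uint8_t)((L1 + 2 * L2 + L3 + 2) >> 2);
    uint8_t e = (uint8_t)((L2 + L3 + 1) >> 1);
    uint8_t f = (uint8_t)((L2 + 3 * L3 + 2) >> 2);  /* L4 missing: L3 weighted 3 */
    uint8_t g = (uint8_t)L3;                       /* beyond the edge: repeat L3 */

    const uint8_t out[4][4] = {   /* same arrangement as the diagram */
        { a, b, c, d },
        { c, d, e, f },
        { e, f, g, g },
        { g, g, g, g },
    };
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = out[y][x];
}
```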
Horizontal/Up (RV40)
- H.264: not used
- SVQ3: not used
- RV40: mode 7
    LT | T0 T1 T2 T3
    ---------------------
    L0 | a  b  c  d
    L1 | c  d  e  f
    L2 | e  f  g  h
    L3 | g  h  i  j
    L4 |
    L5 |
    L6 |
    L7 |
where:
    a = (T1 + 2*T2 + T3 + 2*L0 + 2*L1 + 4) / 8
    b = (T2 + 2*T3 + T4 + L0 + 2*L1 + L2 + 4) / 8
    c = (T3 + 2*T4 + T5 + 2*L1 + 2*L2 + 4) / 8
    d = (T4 + 2*T5 + T6 + L1 + 2*L2 + L3 + 4) / 8
    e = (T5 + 2*T6 + T7 + 2*L2 + 2*L3 + 4) / 8
    f = (T6 + 3*T7 + L2 + 3*L3 + 4) / 8
    g = (T6 + T7 + L3 + L4 + 2) / 4
    h = (L3 + 2*L4 + L5 + 2) / 4
    i = (L4 + L5 + 1) / 2
    j = (L4 + 2*L5 + L6 + 2) / 4
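A hypothetical C transcription of the RV40 Horizontal/Up mode, assuming `top[]` = T0..T7 and `left[]` = L0..L7 (only L0..L6 are read). The upper values mix both edges, while h, i and j use only the down-left predictors.

```c
#include <stdint.h>

/* Hypothetical layout: top[] = T0..T7, left[] = L0..L7, dst[y][x]. */
static void pred4x4_horizontal_up_rv40(uint8_t dst[4][4], const uint8_t top[8],
                                       const uint8_t left[8])
{
    const uint8_t *T = top, *L = left;

    /* a..f: top-right filter plus left filter, divided by 8 */
    uint8_t a = (uint8_t)((T[1] + 2 * T[2] + T[3] + 2 * L[0] + 2 * L[1] + 4) >> 3);
    uint8_t b = (uint8_t)((T[2] + 2 * T[3] + T[4] + L[0] + 2 * L[1] + L[2] + 4) >> 3);
    uint8_t c = (uint8_t)((T[3] + 2 * T[4] + T[5] + 2 * L[1] + 2 * L[2] + 4) >> 3);
    uint8_t d = (uint8_t)((T[4] + 2 * T[5] + T[6] + L[1] + 2 * L[2] + L[3] + 4) >> 3);
    uint8_t e = (uint8_t)((T[5] + 2 * T[6] + T[7] + 2 * L[2] + 2 * L[3] + 4) >> 3);
    uint8_t f = (uint8_t)((T[6] + 3 * T[7] + L[2] + 3 * L[3] + 4) >> 3);
    /* g: four-way average; h..j: left/down-left column only */
    uint8_t g = (uint8_t)((T[6] + T[7] + L[3] + L[4] + 2) >> 2);
    uint8_t h = (uint8_t)((L[3] + 2 * L[4] + L[5] + 2) >> 2);
    uint8_t i = (uint8_t)((L[4] + L[5] + 1) >> 1);
    uint8_t j = (uint8_t)((L[4] + 2 * L[5] + L[6] + 2) >> 2);

    const uint8_t out[4][4] = {   /* same arrangement as the diagram */
        { a, b, c, d },
        { c, d, e, f },
        { e, f, g, h },
        { g, h, i, j },
    };
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y][x] = out[y][x];
}
```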
16x16 Prediction Modes
DC
- H.264: mode 0
- SVQ3: mode 0
- RV40: mode 0
Using the 16 top predictors (T0..T15) and the 16 left predictors (L0..L15), set all 256 elements to the mean, computed as:
    if top and left predictors are available
        mean = (sum(T0..T15) + sum(L0..L15) + 16) / 32
    else if top predictors are available
        mean = (sum(T0..T15) + 8) / 16
    else if left predictors are available
        mean = (sum(L0..L15) + 8) / 16
    else
        mean = 128
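The 16x16 DC rule has the same structure as the 4x4 one, only with 16 predictors per edge. A hypothetical C sketch, assuming 8-bit samples, `top[]` = T0..T15, `left[]` = L0..L15 and `dst[y][x]`; the shifts equal the divisions because the sums are non-negative.

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t pred16x16_dc_value(const uint8_t top[16], bool has_top,
                                  const uint8_t left[16], bool has_left)
{
    int sum = 0;
    if (has_top)
        for (int n = 0; n < 16; n++) sum += top[n];
    if (has_left)
        for (int n = 0; n < 16; n++) sum += left[n];

    if (has_top && has_left) return (uint8_t)((sum + 16) >> 5);  /* mean of 32 */
    if (has_top || has_left) return (uint8_t)((sum + 8) >> 4);   /* mean of 16 */
    return 128;                                    /* no neighbours available */
}

static void pred16x16_dc(uint8_t dst[16][16], const uint8_t top[16], bool has_top,
                         const uint8_t left[16], bool has_left)
{
    uint8_t mean = pred16x16_dc_value(top, has_top, left, has_left);
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            dst[y][x] = mean;
}
```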
Vertical
- H.264: mode 1
- SVQ3: mode 1
- RV40: mode 1
    LT  | T0 T1 T2 T3 T4 .. T15
    ---------------------------
    L0  | T0 T1 T2 T3 T4 .. T15
    L1  | T0 T1 T2 T3 T4 .. T15
    L2  | T0 T1 T2 T3 T4 .. T15
    ...
    L15 | T0 T1 T2 T3 T4 .. T15
Horizontal
- H.264: mode 2
- SVQ3: mode 2
- RV40: mode 2
    LT  | T0  T1  T2  T3  T4  .. T15
    --------------------------------
    L0  | L0  L0  L0  L0  L0  .. L0
    L1  | L1  L1  L1  L1  L1  .. L1
    L2  | L2  L2  L2  L2  L2  .. L2
    ...
    L15 | L15 L15 L15 L15 L15 .. L15
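The 16x16 Vertical and Horizontal modes are plain copies of one edge. A hypothetical C sketch, assuming `top[]` = T0..T15, `left[]` = L0..L15 and `dst[y][x]`:

```c
#include <stdint.h>

static void pred16x16_vertical(uint8_t dst[16][16], const uint8_t top[16])
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            dst[y][x] = top[x];   /* column x repeats Tx */
}

static void pred16x16_horizontal(uint8_t dst[16][16], const uint8_t left[16])
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            dst[y][x] = left[y];  /* row y repeats Ly */
}
```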
Plane
- H.264: mode 3
- SVQ3: mode 3
- RV40: mode 3
Note that SVQ3 and RV40 scale the plane gradients slightly differently from H.264.
Given the top predictors (T0..T15), left predictors (L0..L15) and the left-top corner predictor (LT) arranged as follows:
    LT  | T0       T1       T2       .. T15
    ------------------------------------------
    L0  | c[ 0, 0] c[ 1, 0] c[ 2, 0] .. c[15, 0]
    L1  | c[ 0, 1] c[ 1, 1] c[ 2, 1] .. c[15, 1]
    ...
    L15 | c[ 0,15] c[ 1,15] c[ 2,15] .. c[15,15]
First compute the intermediate gradients H' and V':
H' = 1* (T8 - T6) + 2* (T9 - T5) + 3*(T10 - T4) + 4*(T11 - T3) + 5*(T12 - T2) + 6*(T13 - T1) + 7*(T14 - T0) + 8*(T15 - LT)
V' = 1* (L8 - L6) + 2* (L9 - L5) + 3*(L10 - L4) + 4*(L11 - L3) + 5*(L12 - L2) + 6*(L13 - L1) + 7*(L14 - L0) + 8*(L15 - LT)
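For H.264, compute H and V as:
    H = (5*H' + 32) >> 6
    V = (5*V' + 32) >> 6
(The H.264 specification uses arithmetic right shifts here; an integer division by 64 would round negative values toward zero and can give a different result.)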
For SVQ3, compute H and V as:
    V = (5*(H'/4)) / 16
    H = (5*(V'/4)) / 16
(note the swap: V is computed from H', and H from V')
For RV40, compute H and V as:
    H = (5*(H' >> 2)) >> 4
    V = (5*(V' >> 2)) >> 4
(as in SVQ3 but without the swap; here it is important to use shifts rather than divisions, because the two round negative values differently)
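The gradient step for all three codecs can be sketched in C as below. The `enum codec` selector and the function name are hypothetical, and the sketch assumes that `>>` on a negative `int` is an arithmetic shift (implementation-defined in C, but true of mainstream compilers). For H.264 it uses the arithmetic right shift from the H.264 specification; for SVQ3 it uses C division, which truncates toward zero.

```c
#include <stdint.h>

enum codec { CODEC_H264, CODEC_SVQ3, CODEC_RV40 };   /* hypothetical selector */

/* top[] = T0..T15, left[] = L0..L15, lt = LT. Writes the scaled H and V. */
static void plane_gradients(const uint8_t top[16], const uint8_t left[16],
                            uint8_t lt, enum codec codec, int *H, int *V)
{
    /* H' and V': k * (sample k past the centre - sample k before it);
     * for k == 8 the "before" sample is the LT corner. */
    int h = 0, v = 0;
    for (int k = 1; k <= 8; k++) {
        int t_lo = (k < 8) ? top[7 - k] : lt;
        int l_lo = (k < 8) ? left[7 - k] : lt;
        h += k * (top[7 + k] - t_lo);
        v += k * (left[7 + k] - l_lo);
    }

    switch (codec) {
    case CODEC_H264:   /* arithmetic shift: rounds negative values down */
        *H = (5 * h + 32) >> 6;
        *V = (5 * v + 32) >> 6;
        break;
    case CODEC_SVQ3:   /* C division (toward zero), with H and V swapped */
        *V = (5 * (h / 4)) / 16;
        *H = (5 * (v / 4)) / 16;
        break;
    case CODEC_RV40:   /* shifts, no swap */
        *H = (5 * (h >> 2)) >> 4;
        *V = (5 * (v >> 2)) >> 4;
        break;
    }
}
```

With a rising top edge and a falling left edge, the three codecs already disagree on the rounding of the negative gradient (for example -44 for H.264 and RV40, but -43 for SVQ3's truncating division).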
The final process for filling in the 16x16 block is:
    a = 16 * (L15 + T15 + 1) - 7*(V+H)
    for (j = 0..15)
        for (i = 0..15)
            b = a + V * j + H * i
            c[i,j] = SATURATE_U8(b / 32)
The SATURATE_U8() function indicates that the result of the operation should be bounded to an unsigned 8-bit range (0..255).
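A hypothetical C sketch of the fill loop, taking the already-scaled H and V as inputs and storing c[i,j] as `dst[j][i]` (row j, column i). Here `b / 32` and `b >> 5` differ only when b is negative, and such values saturate to 0 either way.

```c
#include <stdint.h>

/* SATURATE_U8(): clamp to the unsigned 8-bit range 0..255. */
static uint8_t saturate_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* l15 = L15, t15 = T15, H and V = scaled gradients; dst[j][i] = c[i,j]. */
static void plane_fill(uint8_t dst[16][16], uint8_t l15, uint8_t t15, int H, int V)
{
    int a = 16 * (l15 + t15 + 1) - 7 * (V + H);   /* value at i = j = 0, times 32 */
    for (int j = 0; j < 16; j++)
        for (int i = 0; i < 16; i++) {
            int b = a + V * j + H * i;            /* linear ramp in both directions */
            dst[j][i] = saturate_u8(b / 32);
        }
}
```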