Difference between revisions of "Bink Video 2"

From MultimediaWiki
* FourCC: KB2a-KB2i (in [[Bink Container]])
* Company: [[RAD Game Tools]]
* Samples: ''ADD ME''

[[Category:Video Codecs]]
[[Category:Game Formats]]
[[Category:Formats missing in FFmpeg]]

Latest revision as of 10:21, 14 March 2019

Bink Video 2 is a successor to Bink Video.

This iteration operates on 32x32 macroblocks and employs a simplified 8x8 DCT. Bink versions before 2.2 (i.e. KB2a-KB2f) employed a floating-point IDCT; newer versions use an integer IDCT.

Overall design

Bink2 codes each frame in two slices; each slice comprises 32x32 macroblocks (16x16 for chroma). There are four possible macroblock types: intra, skip, motion only and motion plus residue. Intra data and residue are coded as 2x2 groups of 8x8 blocks, with one of two codebooks selectable per block. Each 16x16 block can have its own motion vector. DCs and motion vectors are predicted from their neighbours when available.

Bitstream format

A Bink2 frame begins with two 4-byte little-endian words: the first one holds the frame flags, the second one is the offset to the second slice data. The bitstream is LSB-coded.
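A minimal sketch of reading the two header words described above. The struct and function names are hypothetical, and what the offset is measured from is not stated here, so the value is only stored, not interpreted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two header words. */
typedef struct {
    uint32_t flags;         /* first word: frame flags */
    uint32_t slice2_offset; /* second word: offset to the second slice data */
} Bink2FrameHeader;

/* Read a 32-bit little-endian word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short for the header. */
static int bink2_read_frame_header(const uint8_t *buf, size_t size,
                                   Bink2FrameHeader *hdr)
{
    if (size < 8)
        return -1;
    hdr->flags         = read_le32(buf);
    hdr->slice2_offset = read_le32(buf + 4);
    return 0;
}
```

The first slice data would then start right after these 8 bytes.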

Keyframes contain only intra blocks, so they do not code block types. Intra macroblocks contain IDCT block data; motion blocks contain motion vector data before optional IDCT data.

Internally, CBP is represented as two bitmasks: the low bits are the actual coded block pattern, and the top bits select an alternative set of Huffman codes for ACs.

KB2f (and probably earlier)

 for each macroblock {
   unless keyframe get block type
   switch (block type) {
   case INTRA:
     intra_luma
     intra_chroma
     intra_chroma
     if (alpha)
       intra_alpha (the same coding as intra_luma)
     break;
   case MOTION:
     motion_data
     break;
   case INTER:
     motion_data
     inter_luma
     inter_chroma
     inter_chroma
     if (alpha)
       inter_alpha
     break;
   }
 }

Inter and intra coding employ the same coding methods but different scans, quantisation matrices and DC range (-1024..1023 instead of 0..2047).

All block types are coded as:

 CBP
 quantiser difference
 DCs
 AC blocks

Quantiser is coded as the difference to the previous macroblock quantiser with variable-length codes and optional sign bit. Each component has its own quantiser initially set to 8 at the beginning of every line.

Luma CBP coding

 11 - reuse previous CBP in full
 10 - reuse previous CBP low bits (no VLC selection)
 0  - decode base CBP by nibbles

Nibble decoding starts with the second nibble of the original CBP as the reference. For each nibble of the CBP one bit is read: if it is set, the nibble is kept as is; otherwise a new nibble is read.

Then, unless the previous CBP is reused in full, the VLC part is decoded. For this you iterate over the nibbles: if a nibble has non-zero bits, and either exactly one bit is set or a bit read from the bitstream is set, you read the bits of the VLC pattern corresponding to the base CBP bits (i.e. if bit 3 is set in the CBP, you read a bit to determine which VLC code to use).

Chroma CBP coding

 11 - reuse previous CBP in full
 10 - reuse previous CBP low bits (no VLC selection)
 0  - read new CBP (4 bits)

VLC part decoding is the same as for luma CBP.

DC coding

 dc_bits = get_bits(3);
 if (dc_bits == 7)
   dc_bits += get_bits(2);
 for (i = 0; i < num_dcs; i += 4) {
   for (j = 0; j < 4; j++)
     dc[i + j] = get_bits(dc_bits);
   for (j = 0; j < 4; j++)
     if (dc[i + j])
       if (get_bit())
         dc[i + j] = -dc[i + j];
 }
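The pseudocode above can be made concrete with a small bit reader. This is only a sketch: the BitReader type and the helper names are hypothetical, and it assumes that "LSB-coded" means bits are consumed starting from bit 0 of each byte, with the first bit read becoming the lowest bit of the value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader: the first bit read is bit 0 of byte 0. */
typedef struct {
    const uint8_t *buf;
    size_t size; /* buffer size in bytes */
    size_t pos;  /* read position in bits */
} BitReader;

static unsigned get_bit(BitReader *br)
{
    unsigned bit = 0;
    if (br->pos < br->size * 8) /* reads past the end yield zero bits */
        bit = (br->buf[br->pos >> 3] >> (br->pos & 7)) & 1;
    br->pos++;
    return bit;
}

/* Read n bits; the first bit read becomes the lowest bit of the result. */
static unsigned get_bits(BitReader *br, int n)
{
    unsigned val = 0;
    for (int i = 0; i < n; i++)
        val |= get_bit(br) << i;
    return val;
}

/* Decode num_dcs raw DC values (num_dcs is assumed to be a multiple of 4):
 * magnitudes come in groups of four, followed by sign bits for the
 * non-zero ones. */
static void read_dcs(BitReader *br, int *dc, int num_dcs)
{
    int dc_bits = (int)get_bits(br, 3);
    if (dc_bits == 7)
        dc_bits += (int)get_bits(br, 2);
    for (int i = 0; i < num_dcs; i += 4) {
        for (int j = 0; j < 4; j++)
            dc[i + j] = (int)get_bits(br, dc_bits);
        for (int j = 0; j < 4; j++)
            if (dc[i + j] && get_bit(br))
                dc[i + j] = -dc[i + j];
    }
}
```

With this reader the two bytes 0x8A 0x2D decode as dc_bits = 2 and the values -1, 0, 3, -2, consuming 14 bits.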

In case it's the first macroblock in the slice, an additional start value is read, with its size depending on dc_bits and the quantiser.

 add_bits = { 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6 }[quant] + dc_bits - 1;
 if (add_bits < 10) {
   len = get_bits(10 - add_bits);
   if (len > 0 && get_bit())
     len = -len;
   dc[0] += (len << dc_bits) * dc_scale;
 }

DCs use median prediction in the form min(max(A + B - C, min(A, B, C)), max(A, B, C)). When only two predictors are available it uses min(max(A, B), max(min(A, B), 2 * A - B)). When possible it tries to predict the value from top, left and top-left neighbours. DC prediction goes in order:

  0  1   4  5
  2  3   6  7
 
  8  9  12 13
 10 11  14 15

When not all neighbours are available, it uses the left/top neighbour or 1023 for the first block. For intra blocks with non-intra neighbours the average is calculated on those blocks and used instead of DC values.
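The two prediction formulas can be written down directly. The function names below are made up for illustration, and which neighbour maps to A, B and C is left to the caller since it is not spelled out here.

```c
static int min2(int a, int b) { return a < b ? a : b; }
static int max2(int a, int b) { return a > b ? a : b; }

/* Three-neighbour predictor: A + B - C clamped to [min(A,B,C), max(A,B,C)]. */
static int dc_pred3(int a, int b, int c)
{
    int lo = min2(a, min2(b, c));
    int hi = max2(a, max2(b, c));
    return min2(max2(a + b - c, lo), hi);
}

/* Two-neighbour predictor: min(max(A, B), max(min(A, B), 2 * A - B)),
 * i.e. 2 * A - B clamped to [min(A,B), max(A,B)]. */
static int dc_pred2(int a, int b)
{
    return min2(max2(a, b), max2(min2(a, b), 2 * a - b));
}
```

For example, dc_pred3(10, 20, 12) gives 18 (the gradient A + B - C is within range), while dc_pred3(10, 20, 40) is clamped up to 10 and dc_pred3(10, 20, 5) is clamped down to 20.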

Chroma blocks code only 4 values but they do it in the same manner.

AC coding

The coding is quite similar to any other DCT block coding, with the only exception that a skip value may indicate a run of 7 coded values.

 val_vlc  = (cbp & VLC_BIT) ? val_vlc2  : val_vlc1;
 skip_vlc = (cbp & VLC_BIT) ? skip_vlc2 : skip_vlc1;
 run = 0;
 idx = 1;
 do {
   val = get_vlc(val_vlc);
   if (val >= 4)
      val = (1 << (val - 3)) + get_bits(val - 3) + 2;
   if (val && get_bit())
     val = -val;
   block[scan[idx++]] = val;
   if (idx >= 64)
     break;
   run--;
   if (run <= 0) {
     skip = get_vlc(skip_vlc);
     switch (skip) {
     case 11:
       skip = get_bits(6);
       break;
     case 12:
       skip = 62;
       break;
     case 13:
       skip = 0;
       run  = 7;
       break;
     }
     idx += skip;
   }
 } while (idx < 64);
 

Motion data

It's the same as decoding 4 DCs twice: for each MV component read the size, read 4 values, then read the signs for non-zero values. For the first macroblock in the slice, a 5-bit value (with a sign bit if it is non-zero) is coded after each component; it should be multiplied by 16 and added to the first MV value.

MVs are predicted using median prediction from top, left and top-left neighbours. For intra blocks in an inter frame, motion vector data is filled in too (but not coded).

KB2g-KB2i

Bitstream format has changed somewhat but the design is the same.

 (for newer versions only) column VLC flags, row VLC flags
 for each macroblock {
   unless keyframe get block type
   switch (block type) {
   case INTRA:
     quantiser
     intra_luma
     intra_chroma
     intra_chroma
     if (alpha)
       intra_alpha (the same coding as intra_luma)
     if (another_plane)
       intra_plane (the same coding as intra_luma)
     break;
   case MOTION:
     motion_data
     break;
   case INTER:
     motion_data
     quantiser
     inter_luma
     inter_chroma
     inter_chroma
     if (alpha)
       inter_alpha (the same coding as inter_luma)
     if (another_plane)
       inter_plane (the same coding as inter_luma)
     break;
   }
 }

Block type is now coded with a truncated unary code that is an index into the block type list. The selected type is then moved one position towards the front of the list. Initial contents are { MOTION, INTER, SKIP, INTRA }.
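A sketch of the list update, assuming that "moved one position" means swapping the selected entry with its predecessor. The enum values and the function name are hypothetical; idx is the value of the truncated unary code (0..3).

```c
/* Hypothetical block type identifiers, in the initial list order. */
enum { BT_MOTION, BT_INTER, BT_SKIP, BT_INTRA };

/* Return the block type stored at position idx and move it one
 * position towards the front of the list (swap with its predecessor). */
static int bink2_pick_block_type(int list[4], int idx)
{
    int type = list[idx];
    if (idx > 0) {
        list[idx]     = list[idx - 1];
        list[idx - 1] = type;
    }
    return type;
}
```

Starting from { MOTION, INTER, SKIP, INTRA }, picking index 2 returns SKIP and leaves the list as { MOTION, SKIP, INTER, INTRA }, so frequently used types drift towards the shortest codes.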

Quantiser is now applicable to the whole macroblock instead of components and its delta is coded in this way:

 dq = get_unary(0, 4);
 if (dq == 3)
   dq += get_bit();
 else if (dq == 4)
   dq += get_bits(5) + 1;
 if (dq && get_bit())
   dq = -dq;

Initial quantiser is set to 16, the following quantisers are predicted based on neighbours. Residue macroblock quantisers are predicted and updated independently from intra ones.

Luma CBP decoding

 ones = ones_count(prev_cbp & 0xFFFF);
 if (ones >= 8) {
   ones = 16 - ones;
   mask = 0xFFFF;
 } else {
   mask = 0;
 }
 cbp = 0;
 if (!get_bit()) {
   if (ones > 3)
     cbp = get_bits(16);
   else
     for (i = 0; i < 16; i += 4)
       if (!get_bit())
         cbp |= get_bits(4) << i;
 }
 cbp ^= mask;
 if (no col/row flag set && get_bit())
   cbp |= cbp << 16; //VLC part

Chroma CBP decoding

 pattern[16] = { 0, 0, 0, 0xF, 0, 0xF, 0xF, 0xF, 0, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF };
 
 if (get_bit())
   cbp = (VLC part of prev_cbp) | pattern[prev_cbp & 0xF];
 else {
   cbp = get_bits(4);
   if (get_bit())
     VLC part = cbp;
 }

DC decoding

Now each element is decoded individually:

 dc[i] = get_unary(0, 11);
 if (dc[i] >= 4)
   dc[i] = (1 << (dc[i] - 3)) + get_bits(dc[i] - 3) + 2;
 if (dc[i] && get_bit())
   dc[i] = -dc[i];
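The escape step maps a unary prefix plus some extra bits to a magnitude; the same formula is used for AC values. Below is that mapping as a standalone helper (the name is made up), with the bit reading left out.

```c
/* Turn a unary prefix and its (prefix - 3) extra bits into a magnitude.
 * Prefixes 0..3 are the magnitude itself; larger prefixes select ranges
 * that double in size: 4 -> 4..5, 5 -> 6..9, 6 -> 10..17, and so on. */
static int expand_magnitude(int prefix, unsigned extra)
{
    if (prefix < 4)
        return prefix;
    return (1 << (prefix - 3)) + (int)extra + 2;
}
```

The ranges are contiguous, so every magnitude has exactly one representation.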

AC decoding

 idx = 1;
 esc_len = 0;
 dst[0] = dc * 8 + 32;
 while (idx < 64) {
   if (esc_len-- <= 0) {
     skip = get_code(skip_cb);
     if (skip == 11)
       skip = get_bits(6);
     else if (skip == 13) {
       skip = 0;
       esc_len = 7;
     }
   }
   prefix = get_limited_unary(12, 0) + 1;
   if (prefix >= 4)
     level = (1 << (prefix - 3)) + get_bits(prefix - 3) + 2;
   else
     level = prefix;
   if (level && get_bit())
     level = -level;
   pos = zigzag[idx];
   dst[pos] = ((level * quant[q & 3][pos] << (q >> 2)) + 0x40) >> 7;
   idx++;
 }

Motion decoding

First, a bit flag is read to determine whether we'll decode one or four MVs, then MV components are decoded.

 mv[i] = get_vlc(mv_vlc);
 if (mv[i] == esc) { //escape
   bits = get_unary(12, 1) + 4;
   mv[i] = get_bits(bits) + (1 << bits) - 1;
   if (mv[i] & 1)
     mv[i] = -(mv[i] >> 1);
   else
     mv[i] = mv[i] >> 1;
 }
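The arithmetic of the escape branch, separated from bit reading. The helper name is hypothetical; bits is the length obtained from the unary code plus 4 and raw is the value returned by get_bits(bits).

```c
/* Map an escaped motion vector code to a signed value: the code is
 * offset by (1 << bits) - 1, then the lowest bit acts as the sign. */
static int bink2_mv_escape(unsigned bits, unsigned raw)
{
    unsigned v = raw + (1u << bits) - 1;
    if (v & 1)
        return -(int)(v >> 1);
    return (int)(v >> 1);
}
```

With bits = 4 the first few codes give -7, 8, -8, 9, so escaped values continue where the codebook entries end.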

DSP algorithms

DCT

Floating-point IDCT:

 t00 =  src[2] + src[6];
 t01 = (src[2] - src[6]) * 1.4142135 - t00;
 t02 = src[0] + src[4];
 t03 = src[0] - src[4];
 t04 = src[3] + src[5];
 t05 = src[3] - src[5];
 t06 = src[1] + src[7];
 t07 = src[1] - src[7];
 t08 = t02 + t00;
 t09 = t02 - t00;
 t10 = t03 + t01;
 t11 = t03 - t01;
 t12 = t06 + t04;
 t13 = (t06 - t04) * 1.4142135;
 t14 = (t07 - t05) * 1.847759;
 t15 = t05 * 2.613126 + t14 - t12;
 t16 = t13 - t15;
 t17 = t07 * 1.0823922 - t14 + t16;
 
 dst[0] = t08 + t12;
 dst[1] = t10 + t15;
 dst[2] = t11 + t16;
 dst[3] = t09 - t17;
 dst[4] = t09 + t17;
 dst[5] = t11 - t16;
 dst[6] = t10 - t15;
 dst[7] = t08 - t12;
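The same butterfly wrapped into a function for one row or column of eight coefficients. The function name and the use of float are assumptions; any scaling between passes is not covered.

```c
/* One 8-point pass of the floating-point IDCT listed above. */
static void bink2_idct8_float(const float src[8], float dst[8])
{
    float t00 =  src[2] + src[6];
    float t01 = (src[2] - src[6]) * 1.4142135f - t00;
    float t02 = src[0] + src[4];
    float t03 = src[0] - src[4];
    float t04 = src[3] + src[5];
    float t05 = src[3] - src[5];
    float t06 = src[1] + src[7];
    float t07 = src[1] - src[7];
    float t08 = t02 + t00;
    float t09 = t02 - t00;
    float t10 = t03 + t01;
    float t11 = t03 - t01;
    float t12 = t06 + t04;
    float t13 = (t06 - t04) * 1.4142135f;
    float t14 = (t07 - t05) * 1.847759f;
    float t15 = t05 * 2.613126f + t14 - t12;
    float t16 = t13 - t15;
    float t17 = t07 * 1.0823922f - t14 + t16;

    dst[0] = t08 + t12;
    dst[1] = t10 + t15;
    dst[2] = t11 + t16;
    dst[3] = t09 - t17;
    dst[4] = t09 + t17;
    dst[5] = t11 - t16;
    dst[6] = t10 - t15;
    dst[7] = t08 - t12;
}
```

A DC-only input is a quick sanity check: with only src[0] set, every output equals src[0].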

Fixed-point IDCT:

 #define idct_mul_a(val) ((val) + ((val) >> 2))
 #define idct_mul_b(val) ((val) >> 1)
 #define idct_mul_c(val) ((val) - ((val) >> 2) - ((val) >> 4))
 #define idct_mul_d(val) ((val) + ((val) >> 2) - ((val) >> 4))
 #define idct_mul_e(val) ((val) >> 2)
 tmp00 = src[3] + src[5];
 tmp01 = src[3] - src[5];
 tmp02 = idct_mul_a(src[2]) + idct_mul_b(src[6]);
 tmp03 = idct_mul_b(src[2]) - idct_mul_a(src[6]);
 tmp0 = (src[0] + src[4]) + tmp02;
 tmp1 = (src[0] + src[4]) - tmp02;
 tmp2 = src[0] - src[4];
 tmp3 = src[1] + tmp00;
 tmp4 = src[1] - tmp00;
 tmp5 = tmp01 + src[7];
 tmp6 = tmp01 - src[7];
 tmp7 = tmp4 + idct_mul_c(tmp6);
 tmp8 = idct_mul_c(tmp4) - tmp6;
 tmp9  = idct_mul_d(tmp3) + idct_mul_e(tmp5);
 tmp10 = idct_mul_e(tmp3) - idct_mul_d(tmp5);
 tmp11 = tmp2 + tmp03;
 tmp12 = tmp2 - tmp03;
 
 dst[0] = tmp0  + tmp9;
 dst[1] = tmp11 + tmp7;
 dst[2] = tmp12 + tmp8;
 dst[3] = tmp1  + tmp10;
 dst[4] = tmp1  - tmp10;
 dst[5] = tmp12 - tmp8;
 dst[6] = tmp11 - tmp7;
 dst[7] = tmp0  - tmp9;

For the second stage, the output should be shifted right by 6.
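One pass of the integer transform as a function, as a sketch. The names are hypothetical and the multipliers are written as inline functions instead of macros. Right shifts of negative values are assumed to be arithmetic, which is implementation-defined in C but holds on common compilers.

```c
/* Shift-and-add approximations of the IDCT multipliers. */
static inline int mul_a(int v) { return v + (v >> 2); }
static inline int mul_b(int v) { return v >> 1; }
static inline int mul_c(int v) { return v - (v >> 2) - (v >> 4); }
static inline int mul_d(int v) { return v + (v >> 2) - (v >> 4); }
static inline int mul_e(int v) { return v >> 2; }

/* One 8-point pass of the fixed-point IDCT listed above. */
static void bink2_idct8_int(const int src[8], int dst[8])
{
    int tmp00 = src[3] + src[5];
    int tmp01 = src[3] - src[5];
    int tmp02 = mul_a(src[2]) + mul_b(src[6]);
    int tmp03 = mul_b(src[2]) - mul_a(src[6]);
    int tmp0  = (src[0] + src[4]) + tmp02;
    int tmp1  = (src[0] + src[4]) - tmp02;
    int tmp2  = src[0] - src[4];
    int tmp3  = src[1] + tmp00;
    int tmp4  = src[1] - tmp00;
    int tmp5  = tmp01 + src[7];
    int tmp6  = tmp01 - src[7];
    int tmp7  = tmp4 + mul_c(tmp6);
    int tmp8  = mul_c(tmp4) - tmp6;
    int tmp9  = mul_d(tmp3) + mul_e(tmp5);
    int tmp10 = mul_e(tmp3) - mul_d(tmp5);
    int tmp11 = tmp2 + tmp03;
    int tmp12 = tmp2 - tmp03;

    dst[0] = tmp0  + tmp9;
    dst[1] = tmp11 + tmp7;
    dst[2] = tmp12 + tmp8;
    dst[3] = tmp1  + tmp10;
    dst[4] = tmp1  - tmp10;
    dst[5] = tmp12 - tmp8;
    dst[6] = tmp11 - tmp7;
    dst[7] = tmp0  - tmp9;
}
```

A full 8x8 transform would run this pass over one dimension and then the other, applying the right shift by 6 to the results of the second pass.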

Luma motion compensation

 (A - 4*B + 19*C - 4*D + E + 1) >> 5

Chroma motion compensation

1/4:

 (6*A + 2*B + 1) >> 3

1/2:

 (A + B + 1) >> 1

Codebooks

KB2f

Quantiser absolute differences:

  0: 0x01, 1 bits
  1: 0x02, 2 bits
  2: 0x04, 3 bits
  3: 0x08, 4 bits
  4: 0x10, 7 bits
  5: 0x30, 7 bits
  6: 0x50, 7 bits
  7: 0x70, 7 bits
  8: 0x00, 8 bits
  9: 0x20, 8 bits
 10: 0x40, 8 bits
 11: 0x60, 8 bits
 12: 0x80, 8 bits
 13: 0xA0, 8 bits
 14: 0xC0, 8 bits
 15: 0xE0, 8 bits

For a non-zero difference, a sign bit is read afterwards.
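A sketch of how such a codebook could be matched. It assumes the codes are listed as they appear in the low bits of an LSB-first bitstream, with the first bit read in bit 0; under that reading the table above is prefix-free. The type and function names are hypothetical, and a real decoder would more likely use a lookup table than a linear search.

```c
#include <stdint.h>

typedef struct {
    uint16_t code; /* code value, first bit read in bit 0 */
    uint8_t  len;  /* code length in bits */
} VlcEntry;

/* Quantiser absolute differences codebook from the table above. */
static const VlcEntry quant_cb[16] = {
    { 0x01, 1 }, { 0x02, 2 }, { 0x04, 3 }, { 0x08, 4 },
    { 0x10, 7 }, { 0x30, 7 }, { 0x50, 7 }, { 0x70, 7 },
    { 0x00, 8 }, { 0x20, 8 }, { 0x40, 8 }, { 0x60, 8 },
    { 0x80, 8 }, { 0xA0, 8 }, { 0xC0, 8 }, { 0xE0, 8 },
};

/* Match the next bits of the stream (peek, first bit in bit 0) against
 * a codebook. Returns the symbol index and stores the number of bits
 * to consume in *len, or returns -1 if no code matches. */
static int vlc_match(const VlcEntry *tab, int n, unsigned peek, int *len)
{
    for (int i = 0; i < n; i++) {
        unsigned mask = (1u << tab[i].len) - 1;
        if ((peek & mask) == tab[i].code) {
            *len = tab[i].len;
            return i;
        }
    }
    return -1;
}
```

For instance, peeked bits 0x30 match symbol 5 with a length of 7 bits; the decoder would then skip 7 bits and read the sign bit.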

AC values codebook 1:

  0:  0x04, 3 bits
  1:  0x01, 1 bits
  2:  0x02, 2 bits
  3:  0x00, 4 bits
  4:  0x08, 5 bits
  5:  0x18, 6 bits
  6:  0xF8, 8 bits
  7: 0x178, 9 bits
  8: 0x138, 9 bits
  9:  0x38, 9 bits
 10: 0x1B8, 9 bits
 11:  0x78, 9 bits
 12:  0xB8, 9 bits

AC values codebook 2:

  0:  0x0A, 6 bits
  1:  0x01, 1 bits
  2:  0x04, 3 bits
  3:  0x08, 4 bits
  4:  0x06, 3 bits
  5:  0x00, 4 bits
  6:  0x02, 4 bits
  7:  0x1A, 5 bits
  8:  0x2A, 7 bits
  9: 0x16A, 9 bits
 10: 0x1EA, 9 bits
 11:  0x6A, 9 bits
 12:  0xEA, 9 bits

AC zero run codebook 1:

  0:  0x00, 1 bits
  1:  0x01, 3 bits
  2:  0x0D, 4 bits
  3:  0x15, 5 bits
  4:  0x45, 7 bits
  5:  0x85, 8 bits
  6:  0xA5, 8 bits
  7: 0x165, 9 bits
  8:  0x65, 9 bits
  9: 0x1E5, 9 bits
 10:  0xE5, 9 bits
 11:  0x25, 8 bits
 12:  0x03, 2 bits
 13:  0x05, 8 bits

AC zero run codebook 2:

  0:  0x00, 1 bits
  1:  0x01, 3 bits
  2:  0x03, 4 bits
  3:  0x07, 4 bits
  4:  0x1F, 5 bits
  5:  0x1B, 7 bits
  6:  0x0F, 6 bits
  7:  0x2F, 6 bits
  8:  0x5B, 8 bits
  9:  0xDB, 9 bits
 10: 0x1DB, 9 bits
 11:  0x3B, 6 bits
 12:  0x05, 3 bits
 13:  0x0B, 5 bits

KB2g

AC zero run codebook 1:

  0: 0x01, 1 bits
  1: 0x04, 3 bits
  2: 0x00, 4 bits
  3: 0x08, 4 bits
  4: 0x02, 5 bits
  5: 0x32, 7 bits
  6: 0x0A, 5 bits
  7: 0x12, 6 bits
  8: 0x3A, 7 bits
  9: 0x7A, 8 bits
 10: 0xFA, 8 bits
 11: 0x72, 7 bits
 12: 0x06, 3 bits
 13: 0x1A, 6 bits

AC zero run codebook 2:

  0:  0x01, 1 bits
  1:  0x00, 3 bits
  2:  0x04, 4 bits
  3:  0x2C, 9 bits
  4:  0x6C, 9 bits
  5:  0x0C, 7 bits
  6:  0x4C, 7 bits
  7:  0xAC, 9 bits
  8:  0xEC, 8 bits
  9: 0x12C, 9 bits
 10: 0x16C, 9 bits
 11: 0x1AC, 9 bits
 12:  0x02, 2 bits
 13:  0x1C, 5 bits

Motion vector codebook:

 0: 0x01, 1 bits
 1: 0x06, 3 bits
 2: 0x0C, 5 bits
 3: 0x1C, 5 bits
 4: 0x18, 7 bits
 5: 0x38, 7 bits
 6: 0x58, 7 bits
 7: 0x78, 7 bits
-7: 0x68, 7 bits
-6: 0x48, 7 bits
-5: 0x28, 7 bits
-4: 0x08, 7 bits
-3: 0x14, 5 bits
-2: 0x04, 5 bits
-1: 0x02, 3 bits
esc: 0x00, 4 bits