Bink Video 2

From MultimediaWiki
Revision as of 12:23, 21 June 2014 by Kostya

Bink Video 2 is a successor to Bink Video.

This iteration operates on 32x32 macroblocks and employs a simplified 8x8 DCT.

Overall design

Bink2 codes each frame as two slices.

Bitstream format

A Bink2 frame begins with two 4-byte little-endian words: the first holds flags, the second is the offset to the second slice's data. There are four possible macroblock types: intra, skip, motion only, and motion plus residue. Keyframes contain only intra blocks, so they don't code block types. A macroblock consists of 4x4 luma blocks, two sets of 2x2 chroma blocks and optional 4x4 alpha blocks (i.e. grids of 8x8 DCT blocks, which gives the 32x32 macroblock size for luma). DCs are coded separately, before the ACs.
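Reading the two header words is straightforward. A minimal sketch in C, assuming the whole frame is already in a byte buffer; `Bink2FrameHeader` and `bink2_read_frame_header` are hypothetical names, and the individual flag bits are left uninterpreted:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two leading words of a Bink2 frame. */
typedef struct {
    uint32_t flags;         /* frame flags, left uninterpreted here */
    uint32_t slice2_offset; /* offset to the second slice's data */
} Bink2FrameHeader;

/* Assemble a 32-bit little-endian word independently of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if fewer than 8 bytes are available. */
static int bink2_read_frame_header(const uint8_t *data, size_t size,
                                   Bink2FrameHeader *hdr)
{
    if (size < 8)
        return -1;
    hdr->flags = read_le32(data);
    hdr->slice2_offset = read_le32(data + 4);
    return 0;
}
```

What the second-slice offset is measured from is not stated here, so the sketch does not range-check it against the buffer.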

Internally, the CBP is represented as two bitmasks: the low bits are the actual coded block pattern, while the top bits select alternative Huffman codes for the ACs.
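A small sketch of querying the two halves, assuming the 16-bit split that the KB2g luma CBP decoder implies (`cbp |= cbp << 16`): bit i flags luma block i as coded and bit 16 + i selects the alternative codes for it. The macro and helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bit i marks block i as coded, bit (CBP_VLC_SHIFT + i)
 * selects the alternative AC Huffman codes for that block. */
#define CBP_VLC_SHIFT 16

static int cbp_is_coded(uint32_t cbp, int blk)
{
    return (int)((cbp >> blk) & 1);
}

static int cbp_uses_alt_vlc(uint32_t cbp, int blk)
{
    return (int)((cbp >> (CBP_VLC_SHIFT + blk)) & 1);
}
```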

KB2f (and probably earlier)

 for each macroblock {
   unless keyframe get block type
   switch (block type) {
   case INTRA:
     intra_luma
     intra_chroma
     intra_chroma
     if (alpha)
       intra_alpha (the same coding as intra_luma)
     break;
   case MOTION:
     motion_data
     break;
   case INTER:
     motion_data
     inter_luma
     inter_chroma
     inter_chroma
     if (alpha)
       inter_alpha
     break;
   }
 }

Inter and intra coding differ only in the quantisation matrix used and in the DC range (-1024..1023 for inter instead of 0..2047 for intra).

All intra and inter components are coded as:

 CBP
 quantiser difference
 DCs
 AC blocks

The quantiser is coded as the difference from the previous macroblock's quantiser, using variable-length codes and an optional sign bit. Each component has its own quantiser, initially set to 8 (possibly reset at the beginning of every line).

Luma CBP coding

 11 - reuse previous CBP in full
 10 - reuse previous CBP low bits (no VLC selection)
 0  - decode base CBP by nibbles (0 - keep previous nibble, 1 - read new)

Then, unless the previous CBP is reused in full, the VLC part is decoded. Iterate over the base CBP nibble by nibble: if a nibble is nonzero and either has exactly one bit set or the next bit read from the bitstream is one, read the VLC pattern bits corresponding to the set base CBP bits of that nibble.
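A sketch of this scheme in C. It is one reading of the description, not a reference implementation: it assumes an LSB-first bit reader, reads the two prefix bits in the order written, reads one VLC-selection bit per set bit of a qualifying nibble, and stores the VLC part at bit 16 + i. `BitReader`, `decode_luma_cbp_kb2f` and the other names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

static int popcount4(unsigned n)
{
    return (n & 1) + (n >> 1 & 1) + (n >> 2 & 1) + (n >> 3 & 1);
}

static uint32_t decode_luma_cbp_kb2f(BitReader *br, uint32_t prev)
{
    uint32_t base;

    if (get_bit(br)) {
        if (get_bit(br))
            return prev;              /* "11": reuse everything */
        base = prev & 0xFFFF;         /* "10": reuse coded bits only */
    } else {                          /* "0": rebuild nibble by nibble */
        base = 0;
        for (int i = 0; i < 16; i += 4) {
            unsigned nib = get_bit(br) ? get_bits(br, 4) : (prev >> i) & 0xF;
            base |= (uint32_t)nib << i;
        }
    }

    /* VLC part: one bit per coded block in each qualifying nibble. */
    uint32_t vlc = 0;
    for (int i = 0; i < 16; i += 4) {
        unsigned nib = (base >> i) & 0xF;
        if (!nib)
            continue;
        if (popcount4(nib) == 1 || get_bit(br))
            for (int j = 0; j < 4; j++)
                if ((nib >> j & 1) && get_bit(br))
                    vlc |= 1u << (i + j);
    }
    return base | vlc << 16;
}
```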

Chroma CBP coding

 11 - reuse previous CBP in full
 10 - reuse previous CBP low bits (no VLC selection)
 0  - read new CBP (4 bits)

VLC part decoding is the same as for the luma CBP.
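The chroma variant can be sketched the same way, under the same kind of assumptions: an LSB-first reader, the VLC part stored from bit 16 upwards, and one VLC bit read per set base bit. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

static int popcount4(unsigned n)
{
    return (n & 1) + (n >> 1 & 1) + (n >> 2 & 1) + (n >> 3 & 1);
}

static uint32_t decode_chroma_cbp_kb2f(BitReader *br, uint32_t prev)
{
    uint32_t base;

    if (get_bit(br)) {
        if (get_bit(br))
            return prev;            /* "11": reuse everything */
        base = prev & 0xF;          /* "10": reuse coded bits only */
    } else {
        base = get_bits(br, 4);     /* "0": fresh 4-bit pattern */
    }

    uint32_t vlc = 0;
    if (base && (popcount4(base) == 1 || get_bit(br)))
        for (int j = 0; j < 4; j++)
            if ((base >> j & 1) && get_bit(br))
                vlc |= 1u << j;
    return base | vlc << 16;
}
```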

DC coding

 dc_bits = get_bits(3);
 if (dc_bits == 7)
   dc_bits += get_bits(2);
 for (i = 0; i < num_dcs; i += 4) {
   for (j = 0; j < 4; j++)
     dc[i + j] = get_bits(dc_bits);
   for (j = 0; j < 4; j++)
     if (dc[i + j])
       if (get_bit())
         dc[i + j] = -dc[i + j];
 }

For the first macroblock in a slice, an additional start value is read; its size depends on dc_bits and the quantiser.
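The DC pseudocode turned into runnable C, assuming an LSB-first bit reader and hypothetical names. It produces the raw values (presumably the differences that the prediction is applied to) and leaves out the slice-start value, whose exact size rule is not given.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

/* num_dcs must be a multiple of 4. */
static void decode_dcs_kb2f(BitReader *br, int *dc, int num_dcs)
{
    int dc_bits = (int)get_bits(br, 3);
    if (dc_bits == 7)
        dc_bits += (int)get_bits(br, 2);
    for (int i = 0; i < num_dcs; i += 4) {
        for (int j = 0; j < 4; j++)             /* four magnitudes first */
            dc[i + j] = (int)get_bits(br, dc_bits);
        for (int j = 0; j < 4; j++)             /* then signs of nonzero ones */
            if (dc[i + j] && get_bit(br))
                dc[i + j] = -dc[i + j];
    }
}
```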

DCs use median prediction of the form min(max(A + B - C, min(A, B, C)), max(A, B, C)). The following table lists the A, B and C values used for each luma DC, where [n] denotes luma DC number n of the macroblock.

 [ 0] =    0,    0,    0
 [ 1] =  [0],    0,    0
 [ 2] =  [0],  [0],  [1]
 [ 3] =  [1],  [2],  [0]
 [ 4] =  [1],  [1],  [3]
 [ 5] =  [4],  [4],  [4]
 [ 6] =  [4],  [3],  [1]
 [ 7] =  [5],  [6],  [4]
 [ 8] =  [2],    0,  [3]
 [ 9] =  [3],  [8],  [2]
 [10] =  [8],  [8],  [9]
 [11] =  [9], [10],  [8]
 [12] =  [6],  [9],  [3]
 [13] =  [7], [12],  [6]
 [14] = [12], [11],  [9]
 [15] = [13], [14], [12]
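A sketch of the luma DC prediction in C. It assumes that the decoded differences are added to the predictions in index order (every source index in the table is smaller than the DC it predicts, so this works in one pass) and that a bare 0 in the table is a literal zero predictor; those entries presumably stand in for neighbours outside the macroblock, which the table does not cover. The names are hypothetical.

```c
#include <stdint.h>

/* Median predictor from the text: clamp A + B - C into [min, max] of A, B, C. */
static int mid_pred(int a, int b, int c)
{
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    if (c < lo) lo = c;
    if (c > hi) hi = c;
    int p = a + b - c;
    return p < lo ? lo : p > hi ? hi : p;
}

/* A, B, C sources per luma DC; -1 stands for a literal 0 table entry. */
static const int8_t luma_dc_pred[16][3] = {
    {-1, -1, -1}, { 0, -1, -1}, { 0,  0,  1}, { 1,  2,  0},
    { 1,  1,  3}, { 4,  4,  4}, { 4,  3,  1}, { 5,  6,  4},
    { 2, -1,  3}, { 3,  8,  2}, { 8,  8,  9}, { 9, 10,  8},
    { 6,  9,  3}, { 7, 12,  6}, {12, 11,  9}, {13, 14, 12},
};

/* Turn decoded DC differences into DC values, in place. */
static void predict_luma_dcs(int dc[16])
{
    for (int i = 0; i < 16; i++) {
        int v[3];
        for (int k = 0; k < 3; k++) {
            int src = luma_dc_pred[i][k];
            v[k] = src < 0 ? 0 : dc[src];
        }
        dc[i] += mid_pred(v[0], v[1], v[2]);
    }
}
```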

AC coding

The coding is quite similar to other DCT block coding schemes, with one exception: a skip value (13 in the code below) may signal a run of 7 values coded without skips in between.

 val_vlc  = (cbp & VLC_BIT) ? val_vlc2  : val_vlc1;
 skip_vlc = (cbp & VLC_BIT) ? skip_vlc2 : skip_vlc1;
 run = 0;
 idx = 1;
 do {
   val = get_vlc(val_vlc);
   if (val >= 4)
     val = (1 << (val - 3)) + get_bits(val - 3) + 2;
   if (val && get_bit())
     val = -val;
   block[scan[idx++]] = val;
   if (idx >= 64)
     break;
   run--;
   if (run <= 0) {
     skip = get_vlc(skip_vlc);
     switch (skip) {
     case 11:
       skip = get_bits(6);
       break;
     case 12:
       skip = 62;
       break;
     case 13:
       skip = 0;
       run  = 7;
       break;
     }
     idx += skip;
   }
 } while (idx < 64);
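The AC loop can be made runnable as follows. The real Bink2 value and skip code tables are not listed on this page, so this sketch takes them as caller-supplied prefix-code tables (`Vlc` and `VlcEntry` are hypothetical types) searched naively; the caller picks the first or second table set from the block's VLC bit in the CBP. An LSB-first bit reader is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

/* Hypothetical prefix-code table: code bits stored in stream order,
 * first bit in bit 0. Not the real Bink2 tables. */
typedef struct { uint16_t code; uint8_t len; uint8_t sym; } VlcEntry;
typedef struct { const VlcEntry *entries; int count; } Vlc;

/* Naive lookup: extend the code one bit at a time until an entry matches. */
static int get_vlc(BitReader *br, const Vlc *vlc)
{
    unsigned code = 0;
    for (int len = 1; len <= 16; len++) {
        code |= get_bit(br) << (len - 1);
        for (int i = 0; i < vlc->count; i++)
            if (vlc->entries[i].len == len && vlc->entries[i].code == code)
                return vlc->entries[i].sym;
    }
    return -1; /* no code matched */
}

/* Decode ACs 1..63 of one block in scan order; block must be zeroed by the
 * caller. Returns 0, or -1 on an unknown code. */
static int decode_acs_kb2f(BitReader *br, int16_t block[64],
                           const uint8_t scan[64],
                           const Vlc *val_vlc, const Vlc *skip_vlc)
{
    int run = 0, idx = 1;
    do {
        int val = get_vlc(br, val_vlc);
        if (val < 0)
            return -1;
        if (val >= 4) /* escape: 1 << (val - 3) plus val - 3 raw bits, + 2 */
            val = (1 << (val - 3)) + (int)get_bits(br, val - 3) + 2;
        if (val && get_bit(br))
            val = -val;
        block[scan[idx++]] = (int16_t)val;
        if (idx >= 64)
            break;
        if (--run <= 0) { /* not inside a 7-value run: read a skip */
            int skip = get_vlc(br, skip_vlc);
            if (skip < 0)
                return -1;
            if (skip == 11)
                skip = (int)get_bits(br, 6); /* explicit 6-bit skip */
            else if (skip == 12)
                skip = 62;                   /* jumps past the last AC */
            else if (skip == 13) {
                skip = 0;
                run = 7;                     /* next 7 values follow directly */
            }
            idx += skip;
        }
    } while (idx < 64);
    return 0;
}
```

For a quick check, tables where every symbol is a fixed 4-bit code equal to its value are enough; they are only a stand-in for the real codes.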
 

Motion data

It is coded like two groups of 4 DCs: read the size, read 4 values, then read their signs; for the first macroblock in a line, also read a 5-bit start value with a sign. Then repeat the whole sequence a second time.

KB2g-KB2i

The bitstream format has changed somewhat, but the overall design is the same.

 for each macroblock {
   unless keyframe get block type
   switch (block type) {
   case INTRA:
     quantiser
     intra_luma
     intra_chroma
     intra_chroma
     if (alpha)
       intra_alpha (the same coding as intra_luma)
     break;
   case MOTION:
     motion_data
     break;
   case INTER:
     motion_data
     quantiser
     inter_luma
     inter_chroma
     inter_chroma
     if (alpha)
       inter_alpha
     break;
   }
 }

The block type is now coded with a truncated unary code that gives an index into a block type list. The selected type is then moved one position towards the front of the list. The initial contents are { MOTION, INTER, SKIP, INTRA }.
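The list update can be sketched in C as below, taking the already decoded unary index as input so that no assumption about the unary bit polarity is needed. The enum and function names are hypothetical, and where the list is reset is not covered.

```c
#include <stdint.h>

enum { MB_MOTION, MB_INTER, MB_SKIP, MB_INTRA }; /* hypothetical names */

/* Initial list order as given in the text. */
static void block_types_init(uint8_t list[4])
{
    list[0] = MB_MOTION;
    list[1] = MB_INTER;
    list[2] = MB_SKIP;
    list[3] = MB_INTRA;
}

/* idx is the decoded truncated-unary value (0..3). Returns the block type and
 * swaps it with its predecessor, moving it one position towards the front. */
static int block_type_pick(uint8_t list[4], int idx)
{
    int type = list[idx];
    if (idx > 0) {
        list[idx] = list[idx - 1];
        list[idx - 1] = (uint8_t)type;
    }
    return type;
}
```

Frequently used types therefore migrate towards index 0, where the unary code is shortest.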

The quantiser now applies to the whole macroblock instead of individual components, and its delta is coded as follows:

 dq = get_unary(0, 4);
 if (dq == 3)
   dq += get_bit();
 else if (dq == 4)
   dq += get_bits(5) + 1;
 if (dq && get_bit())
   dq = -dq;
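A runnable version of the delta decoding, assuming that `get_unary(stop, len)` follows FFmpeg's `get_unary(gb, stop, len)` semantics (count bits until `stop` is read or `len` is reached) and an LSB-first bit reader; the function names are hypothetical. The new quantiser would then be the previous one plus this delta.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

/* Count bits different from stop, up to len; the stop bit is consumed. */
static int get_unary(BitReader *br, int stop, int len)
{
    int i = 0;
    while (i < len && get_bit(br) != (unsigned)stop)
        i++;
    return i;
}

static int decode_quant_delta(BitReader *br)
{
    int dq = get_unary(br, 0, 4);
    if (dq == 3)
        dq += (int)get_bit(br);            /* 3 or 4 */
    else if (dq == 4)
        dq += (int)get_bits(br, 5) + 1;    /* 5 .. 36 */
    if (dq && get_bit(br))
        dq = -dq;
    return dq;
}
```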

Luma CBP decoding

 ones = ones_count(prev_cbp & 0xFFFF);
 if (ones >= 8) {
   ones = 16 - ones;
   mask = 0xFFFF;
 } else {
   mask = 0;
 }
 cbp = 0;
 if (!get_bit()) {
   if (ones > 3)
     cbp = get_bits(16);
   else
     for (i = 0; i < 16; i += 4)
       if (!get_bit())
         cbp |= get_bits(4) << i;
 }
 cbp ^= mask;
 if (get_bit())
   cbp |= cbp << 16; //VLC part
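The luma CBP pseudocode as self-contained C, assuming an LSB-first bit reader; names are hypothetical. When the previous pattern had 8 or more coded blocks, the pattern is coded inverted, so a leading 1 bit (no explicit pattern) yields either all-clear or all-set.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

/* Number of set bits among the low 16 bits. */
static int popcount16(uint32_t v)
{
    int n = 0;
    for (v &= 0xFFFF; v; v &= v - 1)
        n++;
    return n;
}

static uint32_t decode_luma_cbp_kb2g(BitReader *br, uint32_t prev_cbp)
{
    int ones = popcount16(prev_cbp);
    uint32_t mask = 0;
    if (ones >= 8) {            /* mostly coded: pattern is sent inverted */
        ones = 16 - ones;
        mask = 0xFFFF;
    }
    uint32_t cbp = 0;
    if (!get_bit(br)) {
        if (ones > 3)
            cbp = get_bits(br, 16);
        else
            for (int i = 0; i < 16; i += 4)
                if (!get_bit(br))
                    cbp |= (uint32_t)get_bits(br, 4) << i;
    }
    cbp ^= mask;
    if (get_bit(br))
        cbp |= cbp << 16;       /* VLC part mirrors the coded bits */
    return cbp;
}
```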

Chroma CBP decoding

 pattern[16] = { 0, 0, 0, 0xF, 0, 0xF, 0xF, 0xF, 0, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF };
 
 if (get_bit())
   cbp = (VLC part of prev_cbp) | pattern[prev_cbp & 0xF];
 else {
   cbp = get_bits(4);
   if (get_bit())
     VLC part = cbp;
 }
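In the pattern table, an entry is 0xF exactly when the previous chroma nibble had two or more coded blocks, and 0 otherwise. A C sketch, assuming an LSB-first bit reader and that the chroma VLC part sits in bits 16..19 as for luma; names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

static const uint8_t chroma_pattern[16] = {
    0, 0, 0, 0xF, 0, 0xF, 0xF, 0xF, 0, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF
};

static uint32_t decode_chroma_cbp_kb2g(BitReader *br, uint32_t prev_cbp)
{
    if (get_bit(br))  /* keep previous VLC part, derive coded bits */
        return (prev_cbp & 0xF0000) | chroma_pattern[prev_cbp & 0xF];
    uint32_t cbp = get_bits(br, 4);
    if (get_bit(br))
        cbp |= cbp << 16;  /* assumed VLC part position */
    return cbp;
}
```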

DC decoding

Each DC is now decoded individually:

 dc[i] = get_unary(0, 11);
 if (dc[i] >= 4)
   dc[i] = (1 << (dc[i] - 3)) + get_bits(dc[i] - 3) + 2;
 if (dc[i] && get_bit())
   dc[i] = -dc[i];
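As self-contained C, assuming an LSB-first bit reader and FFmpeg-style `get_unary(stop, len)` semantics; the function names are hypothetical. Codes 0..3 are the values themselves; larger codes carry extra bits, giving magnitudes up to 513.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

/* Count bits different from stop, up to len; the stop bit is consumed. */
static int get_unary(BitReader *br, int stop, int len)
{
    int i = 0;
    while (i < len && get_bit(br) != (unsigned)stop)
        i++;
    return i;
}

static int decode_dc_kb2g(BitReader *br)
{
    int v = get_unary(br, 0, 11);
    if (v >= 4)  /* escape: 1 << (v - 3) plus v - 3 raw bits, + 2 */
        v = (1 << (v - 3)) + (int)get_bits(br, v - 3) + 2;
    if (v && get_bit(br))
        v = -v;
    return v;
}
```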

AC decoding

This resembles AC decoding in the older version(s), except that:

  • the skip part is now decoded before the value part
  • values are coded in the same way as the DCs above

Motion decoding

First, a one-bit flag is read to determine whether one or four MVs are decoded; then the MV components are decoded:

 mv[i] = get_vlc(mv_vlc);
 if (mv[i] == 8) { //escape
   bits = get_unary(0, 12) + 4;
   mv[i] = get_bits(bits) + (1 << bits) - 1;
   if (mv[i] & 1)
     mv[i] = -(mv[i] >> 1);
   else
     mv[i] = mv[i] >> 1;
 }
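The escape path in C, assuming an LSB-first bit reader and FFmpeg-style `get_unary(stop, len)`. The `mv_vlc` table is not given here, so the sketch takes the already decoded symbol as input; the function names are hypothetical. The escaped value folds its sign into the lowest bit (odd means negative).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed LSB-first bit reader; reads past the end yield 0. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = byte < br->size ? (br->buf[byte] >> (br->pos & 7)) & 1u : 0u;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

/* Count bits different from stop, up to len; the stop bit is consumed. */
static int get_unary(BitReader *br, int stop, int len)
{
    int i = 0;
    while (i < len && get_bit(br) != (unsigned)stop)
        i++;
    return i;
}

/* sym is the value already read with mv_vlc; 8 is the escape symbol. */
static int decode_mv_component(BitReader *br, int sym)
{
    if (sym != 8)
        return sym;
    int bits = get_unary(br, 0, 12) + 4;
    int v = (int)get_bits(br, bits) + (1 << bits) - 1;
    return (v & 1) ? -(v >> 1) : v >> 1;
}
```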

DSP algorithms

DCT

**TODO**

Luma motion compensation

 (A - 4*B + 19*C + 19*D - 4*E + F + 16) >> 5

Chroma motion compensation

1/4:

 (6*A + 2*B + 1) >> 3

1/2:

 (A + B + 1) >> 1
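The two chroma formulas in C. This sketch assumes A is the sample at the integer position and B the next sample in the direction of the fractional offset; the 3/4 position is not listed here and is not covered. All names, including the row helper, are hypothetical.

```c
#include <stdint.h>

/* 1/4 position: (6*A + 2*B + 1) >> 3; the maximum 2041 >> 3 fits in 8 bits. */
static uint8_t chroma_quarter(uint8_t a, uint8_t b)
{
    return (uint8_t)((6 * a + 2 * b + 1) >> 3);
}

/* 1/2 position: rounded average. */
static uint8_t chroma_half(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Half-sample interpolation of w outputs along a row (reads w + 1 samples). */
static void chroma_half_row(uint8_t *dst, const uint8_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = chroma_half(src[i], src[i + 1]);
}
```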