Midivid

From MultimediaWiki
Jump to navigation Jump to search

Midivid is a video codec developed by Jason Dorie of BlackBox Games (later acquired by Electronic Arts). It reportedly has a lot in common with MPEG-2, while claiming better compression ratios than MPEG-2.

MidiVid

First version of MidiVid codec uses LZSS compression, vector quantisation and simple hold-and-modify approach.

Frame header (all data is little-endian):

  • 32-bit frame size (can be not set)
  • 32-bit always zero?
  • 32-bit uncompressed flag (1 means the rest of frame is compressed with LZSS)

Frame data:

  • 16-bit number of vectors
  • 16-bit intra frame flag (i.e. no update mask present)
  • (inter only) 32-bit number of blocks to update
  • (inter only) update mask - one bit per each 4x4 block
  • vector data (each vector is 2x2 YUV block without any subsampling, so 12 bytes per each vector)
  • (for frames with more than 256 vector) top index bits
  • index bits (one byte per each 2x2 block)

Decoding process:

 num_vecs = get16le(src); src += 2;
 is_intra = get16le(src); src += 2;
 if (is_intra) {
   num_blocks = (w / 2) * (h / 2);
 } else {
   num_blocks = get32le(src); src += 2;
   update_mask = src;
   src += (w >> 5) * (h >> 2);
 }
 vec_data = src;
 src += num_vecs * 12;
 if (num_vecs <= 256)
   indices = src;
 else {
   idx9data = src;
   indices = idx9data + (num_blocks + 7) / 8;
 }
 
 idx9bits = 0;
 idx9val = 0;
 for (y = 0; y < h; y += 2) {
   for (x = 0; x < w; x += 2) {
     if (!is_intra && update mask bit set for x/4,y/4 block)
       continue;
     if (num_vecs <= 256) {
       idx = *indices++;
     } else {
       if (idx9bits == 0) {
         idx9val = *idx9data++;
         idx9bits = 8;
       }
       idx9bits--;
       idx = *indices++ | (((idx9val >> (7 - idx9bits)) & 1) << 8);
     }
     output vec[idx] as 2x2 block;
   }
 }

LZSS Algorithm

It's the straightforward implementation. First you have 16-bit flags and then literals or matches:

 for (;;) {
   op = src[0] | (src[1] << 8);
   src += 2;
   for (i = 0; i < 16; i++) {
     if no data left { return; }
     if (op & 1) {
       offset = ((src[0] & 0xF0) << 4) | src[1];
       length = (src[0] & 0xF) + 3;
       src += 2;
       for (j = 0; j < length; j++)
         dst[j] = dst[j - offset];
       dst += length;
     } else {
       *dst++ = *src++;
     }
     op >>= 1;
   }
 }

MidiVid Lossless

This codec combines several compression methods and each frame can have a different one. Known types (determined by 32-bit little-endian word at the start):

  • BWTZ - data is compressed with method borrowed from some BTW compressor;
  • BWTF - the same but data is split into several chunks, each preceded by 32-bit chunk size;
  • LZSS - data is compressed with the same LZSS scheme the original MidiVid codec uses.

Also the top bit may be set to indicate that next 32-bit word is frame size (otherwise frame data follows immediately).

BWT* Compression

Data header:

  • 32-bit decompressed size
  • 32-bit checksum
  • 16-bit compression flags

Compression flags meaning, LSB first:

  • bit 0 - RLE present
  • bits 1-3 - static preprocessing mode
  • bits 4-6 - table preprocessing mode
  • bit 7 - n-grams present
  • bit 8 - first value is stored as delta?
  • bit 15 - compressed data

Decoding order:

  • decode small values
  • decode (and add) large values
  • undo MTF
  • undo BWT
  • if bit 8 is set undo delta
  • if bit 7 is set expand n-grams
  • if bits 1-3 are set undo some preprocessing
  • if bits 4-6 are set undo table prediction
  • if bit 0 is set undo RLE

Data is split into two partitions: small values (0, 1 and 2 only) and large values (2-255).

Small Values Coding

Values are coded using classic 16-bit arithmetic coder and adaptive model using last three values for determining which context to use.

Model selection index:

 {{ 0,  1,  7 }, { 2,  3, 8 }, { 4, 10,  9 }},
 {{ 2, 12, 11 }, { 2,  3, 8 }, { 5,  5,  9 }},
 {{ 4,  7,  8 }, { 4, 10, 9 }, { 6,  6, 13 }}

Rescale thresholds:

 0x100, 0x60, 0x80, 0x80, 0x90, 0x90, 0x90,
 0x90, 0x90, 0x90, 0x90, 0x80, 0x80, 0x100

Large Values Coding

Large values coding uses the same arithmetic coder and symbol coding split into exponent (single model) and mantissa (several models dependent on exponent value), so actual decoding looks more like:

 for (i = 0; i < len; i++) {
   if (dst[i] == 2) {
     exp = ari_decode_model(exponent_model);
     if (exp) {
       val = ari_decode_model(mantissa_model[exp]);
       dst[i] += (1 << exp) + val - 1;
     }
   }
 }

Move-To-Front Preprocessing

The usual move-to-from preprocessing but with 1-2 coding extension often seen in BTW compressors (for preserving longer runs):

 val = *src++;
 lit = mtf_buf[val];
 *dst++ = lit;
 if (val < 2) {
   if (lastval && val) {
     mtf_buf[1] = mtf_buf[0];
     mtf_buf[0] = lit;
   }
 } else {
   move literal value to mtf_buf[1] while shifting all others down
 }
 lastval = val;

N-grams

Simple substitution for some input byte values into longer sequences

  • 0x7b -> 0x76 0x61
  • 0x7c -> 0x63 0x6e
  • 0x7d -> 0x6b 0x61 0x77
  • 0x7e -> 0x61 0x70 0x68
  • 0x7f -> 0x6a 0x64 0x6c
  • 0x80 -> 0x79 0x64 0x65
  • 0x81 -> 0x75 0x6b 0x62

Fixed Preprocessing

  • mode 0 - no preprocessing
  • mode 1 - replace some values using fixed translation table
  • mode 3 - reverse whole buffer
  • mode 4 - reverse whole buffer and replace 32-bit absolute after 0xE8 value with an offset

Table Prediction

  • mode 0 - no preprocessing
  • mode 1 - restore values from deltas to the previous byte
  • mode 2 - restore values from deltas to the previous byte interlaced by two
  • mode 3 - restore values from deltas to the previous byte interlaced by three
  • mode 4 - restore values from deltas to the previous 16-bit word
  • mode 5 - restore values from deltas to the previous 16-bit word interlaced by two


RLE

RLE is triggered after a run of eight identical values and codes run length in some form to resemble that symbol.

Checksum Calculation

 crc = 0xFFFFFFFF;
 for (i = 0; i < size; i++)
   crc = (crc >> 8) ^ (((crc & 0xFF) ^ *src++) << 24);

MidiVid 3

This is a DCT-based codec with full-pixel motion compensation.

Frame header:

  • 8 bits - intra blocks quantiser
  • 8 bits - inter blocks quantiser (stored as difference to intra quantiser)
  • 16 bits - inter frame flag
  • 16 bits - block modes size
  • 16 bits (inter only) - number of motion vectors

Frame data is split into several parts:

  • (inter only) inter macroblock mask
  • block modes
  • (inter only) motion vectors (packed the same way as block coefficients; those vectors are in pixel precision)
  • macroblock row data

Inter macroblock mask is stored as two flags per each macroblock packed into byte with low nibble (LSB first) having inter MB flags (i.e. if bit is set the macroblock is inter) and top nibble having residue not coded flags (i.e. if the bit is set whole macroblock is motion compensated without any residue coded).

Block modes are two-bit values packed into bytes LSB first. Their meaning is explained below.

Macroblock row data consists of 16-bit number of coefficients coded, 8-bit number of macroblocks coded in this row (inter-only) and coded coefficients array. DCs for each block (except for fill/skip blocks) are stored as the difference to the previous DC of the block in this component.

Intra modes:

  • 0 - fill 8x8 block with 128
  • 1 - put DC value
  • 2 - do reduced DCT on block with only top left 2x2 block filled
  • 3 - do full DCT

Inter modes (when change mask is set):

  • 0 - do just motion compensation
  • 1 - do MC and add DC value
  • 2 - do MC and add 2x2 IDCT block
  • 3 - do MC and add full IDCT block

Values Coding

All values are coded using static codebook:

 00
 01
 100
 101
 110
 1110
 11110
 111110
 1111110
 11111110
 111111110
 111111111

Zero value serves as a signal for a zero run, so actual codes are

 000xxx     -> x + 1 zeroes
 001xxxxxx  -> x + 9 zeroes

Values 1-9 are followed by sign bit and 0-8 bits of mantissa, so actual value is sign * ((1 << (cb_val - 1)) + get_bits(cb_val - 1)). Last value is used as an escape.

IDCT

 t0 = c[0] + c[4];
 t1 = c[0] - c[4];
 t2 = c[2] + c[6];
 t3 = (((c[2] - c[6]) * 362) >> 8) - t2;
 t4 = t0 + t2;
 t5 = t0 - t2;
 t6 = t1 + t3;
 t7 = t1 - t3;
 t8 = src[5] + src[3];
 t9 = src[5] - src[3];
 tA = src[1] + src[7];
 tB = src[1] - src[7];
 tC = t8 + tA;
 tD = (tB + t9) * 473 >> 8;
 tE = ((t9 * -669 >> 8) - tC) + tD;
 tF = ((tA - t8) * 362 >> 8) - tE;
 t10 = ((tB * 277 >> 8) - tD) + tF;
 dst[0] = t4 + tC;
 dst[1] = t6 + tE;
 dst[2] = t7 + tF;
 dst[3] = t5 - t10;
 dst[4] = t5 + t10;
 dst[5] = t7 - tF;
 dst[6] = t6 - tE;
 dst[7] = t4 - tC;

Quantisation

Luma base:

   12, 12, 15, 19, 25, 34, 40, 48,
   12, 12, 18, 22, 27, 44, 47, 46,
   17, 18, 21, 26, 35, 46, 52, 47,
   18, 20, 24, 28, 40, 61, 59, 51,
   20, 24, 32, 43, 50, 72, 72, 63,
   25, 31, 42, 48, 58, 72, 81, 75,
   38, 46, 54, 61, 71, 84, 88, 85,
   50, 61, 65, 68, 79, 78, 86, 91

Chroma base:

   12, 16, 24, 47, 99, 99, 99, 99,
   16, 21, 26, 66, 99, 99, 99, 99,
   24, 26, 56, 99, 99, 99, 99, 99,
   47, 66, 99, 99, 99, 99, 99, 99,
   99, 99, 99, 99, 99, 99, 99, 99,
   99, 99, 99, 99, 99, 99, 99, 99,
   99, 99, 99, 99, 99, 99, 99, 99,
   99, 99, 99, 99, 99, 99, 99, 99

Common:

   16384, 22725, 21407, 19266, 16384, 12873,  8867,  4520,
   22725, 31521, 29692, 26722, 22725, 17855, 12299,  6270,
   21407, 29692, 27969, 25172, 21407, 16819, 11585,  5906,
   19266, 26722, 25172, 22654, 19266, 15137, 10426,  5315,
   16384, 22725, 21407, 19266, 16384, 12873,  8867,  4520,
   12873, 17855, 16819, 15137, 12873, 10114,  6967,  3552,
    8867, 12299, 11585, 10426,  8867,  6967,  4799,  2446,
    4520,  6270,  5906,  5315,  4520,  3552,  2446,  1247

Quantisation matrix calculation:

 quant = quality < 50 ? 5000 / MAX(quality, 1) : 200 - MIN(quality, 100) * 2;
 qmat_{inter/intra}_{y/c}[i] = (clip(((base_qmat_{y/c}[i] * quant + 50) / 100), 1, 0x7FFF) * common_qmat[i] + 0x800) >> 12;

MidiVid 4.x

This is a 8x8 DCT-based codec that codes planes independently and has no motion compensation beside simple "update current block" mode.

The frame data is organised as following:

  • offsets to various parts of frame data;
  • skip flags (inter frame only);
  • coefficients types for luma;
  • coefficients for luma;
  • coefficient types for chroma;
  • coefficients for chroma.

Skip flags tell whether this block in luma plane should be intra or inter (chroma blocks in inter frames are always inter).

Coefficient types are Huffman-coded types telling which kind of decoding should be applied to coefficients. First there's code definition stored and then the actual data. All coefficients are stored as continuous stream, some opcodes are used to signal zero runs (either fixed size or with additional bits to read) and how to read non-zero values. E.g. mode 1 means 24 + get_bits(2) * 2) zeroes, mode 5 means three zeroes, mode 15 means +-6, mode 18 means sign bit plus 13 + get_bits(3) value. This is similar to On2 VPx DCT token stream.

After decoding coefficients are dequantised, transformed with 8x8 integer IDCT and put/added to the destination frame.

MidiVid Archival

This is a simple codec based on Huffman coding or deflate plus median prediction.

Frame format:

 4 bytes - compression type (HUFY - Huffman coding, LZVY - deflate)
 4 bytes - source size
 data

For LZVY it's deflated data, for Huffman-coded it contains:

 3 bytes - decompressed size
 1 byte  - start symbol
 1 byte  - number of symbols minus one
 tree weights

Weights can be zero for non-present symbols and coded as either 1 plus 12 bits or 0 plus 3 bits.

Decompressed data is in YUV420 format, Y plane data first, then U plane data and V plane data.

Games Using Midivid