Midivid: Difference between revisions

From MultimediaWiki
Jump to navigation Jump to search
Line 108: Line 108:
Compression flags meaning, LSB first:
Compression flags meaning, LSB first:
* bit 0    - RLE present
* bit 0    - RLE present
* bits 1-3 -  
* bits 1-3 - static preprocessing mode
* bits 4-6 -
* bits 4-6 - table preprocessing mode
* bit 7    - n-grams present
* bit 7    - n-grams present
* bit 8    - first value is stored as delta?
* bit 8    - first value is stored as delta?
Line 118: Line 118:
* decode (and add) large values
* decode (and add) large values
* undo MTF
* undo MTF
* undo distance coding(?)
* undo BWT
* if bit 8 is set undo delta
* if bit 8 is set undo delta
* if bit 7 is set expand n-grams
* if bit 7 is set expand n-grams
Line 173: Line 173:
   lastval = val;
   lastval = val;


==== Distance Coding (?) ====
==== N-grams ====
==== N-grams ====
Simple substitution for some input byte values into longer sequences
* 0x7b -> 0x76 0x61
* 0x7c -> 0x63 0x6e
* 0x7d -> 0x6b 0x61 0x77
* 0x7e -> 0x61 0x70 0x68
* 0x7f -> 0x6a 0x64 0x6c
* 0x80 -> 0x79 0x64 0x65
* 0x81 -> 0x75 0x6b 0x62
==== Fixed Preprocessing ====
==== Fixed Preprocessing ====
* mode 0 - no preprocessing
* mode 1 - replace some values using fixed translation table
* mode 3 - reverse whole buffer
* mode 4 - reverse whole buffer and replace 32-bit absolute after <code>0xE8</code> value with an offset
==== Table Prediction ====
==== Table Prediction ====
* mode 0 - no preprocessing
* mode 1 - restore values from deltas to the previous byte
* mode 2 - restore values from deltas to the previous byte interlaced by two
* mode 3 - restore values from deltas to the previous byte interlaced by three
* mode 4 - restore values from deltas to the previous 16-bit word
* mode 5 - restore values from deltas to the previous 16-bit word interlaced by two
==== RLE ====
==== RLE ====
RLE is triggered after a run of eight identical values and codes run length in some form to resemble that symbol.


==== Checksum Calculation ====
==== Checksum Calculation ====

Revision as of 05:30, 1 October 2019

Midivid is a video codec developed by Jason Dorie of BlackBox Games (later acquired by Electronic Arts). It reportedly has a lot in common with MPEG-2, while claiming better compression ratios than MPEG-2.

MidiVid

First version of MidiVid codec uses LZSS compression, vector quantisation and simple hold-and-modify approach.

Frame header (all data is little-endian):

  • 32-bit frame size (can be not set)
  • 32-bit always zero?
  • 32-bit uncompressed flag (1 means the rest of frame is compressed with LZSS)

Frame data:

  • 16-bit number of vectors
  • 16-bit intra frame flag (i.e. no update mask present)
  • (inter only) 32-bit number of blocks to update
  • (inter only) update mask - one bit per each 4x4 block
  • vector data (each vector is 2x2 YUV block without any subsampling, so 12 bytes per each vector)
  • (for frames with more than 256 vector) top index bits
  • index bits (one byte per each 2x2 block)

Decoding process:

 num_vecs = get16le(src); src += 2;
 is_intra = get16le(src); src += 2;
 if (is_intra) {
   num_blocks = (w / 2) * (h / 2);
 } else {
   num_blocks = get32le(src); src += 2;
   update_mask = src;
   src += (w >> 5) * (h >> 2);
 }
 vec_data = src;
 src += num_vecs * 12;
 if (num_vecs <= 256)
   indices = src;
 else {
   idx9data = src;
   indices = idx9data + (num_blocks + 7) / 8;
 }
 
 idx9bits = 0;
 idx9val = 0;
 for (y = 0; y < h; y += 2) {
   for (x = 0; x < w; x += 2) {
     if (!is_intra && update mask bit set for x/4,y/4 block)
       continue;
     if (num_vecs <= 256) {
       idx = *indices++;
     } else {
       if (idx9bits == 0) {
         idx9val = *idx9data++;
         idx9bits = 8;
       }
       idx9bits--;
       idx = *indices++ | (((idx9val >> (7 - idx9bits)) & 1) << 8);
     }
     output vec[idx] as 2x2 block;
   }
 }

LZSS Algorithm

It's the straightforward implementation. First you have 16-bit flags and then literals or matches:

 for (;;) {
   op = src[0] | (src[1] << 8);
   src += 2;
   for (i = 0; i < 16; i++) {
     if no data left { return; }
     if (op & 1) {
       offset = ((src[0] & 0xF0) << 4) | src[1];
       length = (src[0] & 0xF) + 3;
       src += 2;
       for (j = 0; j < length; j++)
         dst[j] = dst[j - offset];
       dst += length;
     } else {
       *dst++ = *src++;
     }
     op >>= 1;
   }
 }

MidiVid Lossless

This codec combines several compression methods and each frame can have a different one. Known types (determined by 32-bit little-endian word at the start):

  • BWTZ - data is compressed with method borrowed from some BTW compressor;
  • BWTF - the same but data is split into several chunks, each preceded by 32-bit chunk size;
  • LZSS - data is compressed with the same LZSS scheme the original MidiVid codec uses.

Also the top bit may be set to indicate that next 32-bit word is frame size (otherwise frame data follows immediately).

BWT* Compression

Data header:

  • 32-bit decompressed size
  • 32-bit checksum
  • 16-bit compression flags

Compression flags meaning, LSB first:

  • bit 0 - RLE present
  • bits 1-3 - static preprocessing mode
  • bits 4-6 - table preprocessing mode
  • bit 7 - n-grams present
  • bit 8 - first value is stored as delta?
  • bit 15 - compressed data

Decoding order:

  • decode small values
  • decode (and add) large values
  • undo MTF
  • undo BWT
  • if bit 8 is set undo delta
  • if bit 7 is set expand n-grams
  • if bits 1-3 are set undo some preprocessing
  • if bits 4-6 are set undo table prediction
  • if bit 0 is set undo RLE

Data is split into two partitions: small values (0, 1 and 2 only) and large values (2-255).

Small Values Coding

Values are coded using classic 16-bit arithmetic coder and adaptive model using last three values for determining which context to use.

Model selection index:

 {{ 0,  1,  7 }, { 2,  3, 8 }, { 4, 10,  9 }},
 {{ 2, 12, 11 }, { 2,  3, 8 }, { 5,  5,  9 }},
 {{ 4,  7,  8 }, { 4, 10, 9 }, { 6,  6, 13 }}

Rescale thresholds:

 0x100, 0x60, 0x80, 0x80, 0x90, 0x90, 0x90,
 0x90, 0x90, 0x90, 0x90, 0x80, 0x80, 0x100

Large Values Coding

Large values coding uses the same arithmetic coder and symbol coding split into exponent (single model) and mantissa (several models dependent on exponent value), so actual decoding looks more like:

 for (i = 0; i < len; i++) {
   if (dst[i] == 2) {
     exp = ari_decode_model(exponent_model);
     if (exp) {
       val = ari_decode_model(mantissa_model[exp]);
       dst[i] += (1 << exp) + val - 1;
     }
   }
 }

Move-To-Front Preprocessing

The usual move-to-from preprocessing but with 1-2 coding extension often seen in BTW compressors (for preserving longer runs):

 val = *src++;
 lit = mtf_buf[val];
 *dst++ = lit;
 if (val < 2) {
   if (lastval && val) {
     mtf_buf[1] = mtf_buf[0];
     mtf_buf[0] = lit;
   }
 } else {
   move literal value to mtf_buf[1] while shifting all others down
 }
 lastval = val;

N-grams

Simple substitution for some input byte values into longer sequences

  • 0x7b -> 0x76 0x61
  • 0x7c -> 0x63 0x6e
  • 0x7d -> 0x6b 0x61 0x77
  • 0x7e -> 0x61 0x70 0x68
  • 0x7f -> 0x6a 0x64 0x6c
  • 0x80 -> 0x79 0x64 0x65
  • 0x81 -> 0x75 0x6b 0x62

Fixed Preprocessing

  • mode 0 - no preprocessing
  • mode 1 - replace some values using fixed translation table
  • mode 3 - reverse whole buffer
  • mode 4 - reverse whole buffer and replace 32-bit absolute after 0xE8 value with an offset

Table Prediction

  • mode 0 - no preprocessing
  • mode 1 - restore values from deltas to the previous byte
  • mode 2 - restore values from deltas to the previous byte interlaced by two
  • mode 3 - restore values from deltas to the previous byte interlaced by three
  • mode 4 - restore values from deltas to the previous 16-bit word
  • mode 5 - restore values from deltas to the previous 16-bit word interlaced by two


RLE

RLE is triggered after a run of eight identical values and codes run length in some form to resemble that symbol.

Checksum Calculation

 crc = 0xFFFFFFFF;
 for (i = 0; i < size; i++)
   crc = (crc >> 8) ^ (((crc & 0xFF) ^ *src++) << 24);

MidiVid 3

Games Using Midivid