Midivid
- FourCCs: MVDV, MVLZ, MV30, MV43, MVHA
- Company: BlackBox Games
- Website: http://midivid.com/
- Samples:
Midivid is a video codec developed by Jason Dorie of BlackBox Games (later acquired by Electronic Arts). It reportedly has a lot in common with MPEG-2, while claiming better compression ratios than MPEG-2.
MidiVid
First version of MidiVid codec uses LZSS compression, vector quantisation and simple hold-and-modify approach.
Frame header (all data is little-endian):
- 32-bit frame size (can be not set)
- 32-bit always zero?
- 32-bit uncompressed flag (1 means the rest of frame is compressed with LZSS)
Frame data:
- 16-bit number of vectors
- 16-bit intra frame flag (i.e. no update mask present)
- (inter only) 32-bit number of blocks to update
- (inter only) update mask - one bit per each 4x4 block
- vector data (each vector is 2x2 YUV block without any subsampling, so 12 bytes per each vector)
- (for frames with more than 256 vector) top index bits
- index bits (one byte per each 2x2 block)
Decoding process:
num_vecs = get16le(src); src += 2; is_intra = get16le(src); src += 2; if (is_intra) { num_blocks = (w / 2) * (h / 2); } else { num_blocks = get32le(src); src += 2; update_mask = src; src += (w >> 5) * (h >> 2); } vec_data = src; src += num_vecs * 12; if (num_vecs <= 256) indices = src; else { idx9data = src; indices = idx9data + (num_blocks + 7) / 8; } idx9bits = 0; idx9val = 0; for (y = 0; y < h; y += 2) { for (x = 0; x < w; x += 2) { if (!is_intra && update mask bit set for x/4,y/4 block) continue; if (num_vecs <= 256) { idx = *indices++; } else { if (idx9bits == 0) { idx9val = *idx9data++; idx9bits = 8; } idx9bits--; idx = *indices++ | (((idx9val >> (7 - idx9bits)) & 1) << 8); } output vec[idx] as 2x2 block; } }
LZSS Algorithm
It's the straightforward implementation. First you have 16-bit flags and then literals or matches:
for (;;) { op = src[0] | (src[1] << 8); src += 2; for (i = 0; i < 16; i++) { if no data left { return; } if (op & 1) { offset = ((src[0] & 0xF0) << 4) | src[1]; length = (src[0] & 0xF) + 3; src += 2; for (j = 0; j < length; j++) dst[j] = dst[j - offset]; dst += length; } else { *dst++ = *src++; } op >>= 1; } }
MidiVid Lossless
This codec combines several compression methods and each frame can have a different one. Known types (determined by 32-bit little-endian word at the start):
BWTZ
- data is compressed with method borrowed from some BTW compressor;BWTF
- the same but data is split into several chunks, each preceded by 32-bit chunk size;LZSS
- data is compressed with the same LZSS scheme the original MidiVid codec uses.
Also the top bit may be set to indicate that next 32-bit word is frame size (otherwise frame data follows immediately).
BWT* Compression
Data header:
- 32-bit decompressed size
- 32-bit checksum
- 16-bit compression flags
Compression flags meaning, LSB first:
- bit 0 - RLE present
- bits 1-3 - static preprocessing mode
- bits 4-6 - table preprocessing mode
- bit 7 - n-grams present
- bit 8 - first value is stored as delta?
- bit 15 - compressed data
Decoding order:
- decode small values
- decode (and add) large values
- undo MTF
- undo BWT
- if bit 8 is set undo delta
- if bit 7 is set expand n-grams
- if bits 1-3 are set undo some preprocessing
- if bits 4-6 are set undo table prediction
- if bit 0 is set undo RLE
Data is split into two partitions: small values (0, 1 and 2 only) and large values (2-255).
Small Values Coding
Values are coded using classic 16-bit arithmetic coder and adaptive model using last three values for determining which context to use.
Model selection index:
{{ 0, 1, 7 }, { 2, 3, 8 }, { 4, 10, 9 }}, {{ 2, 12, 11 }, { 2, 3, 8 }, { 5, 5, 9 }}, {{ 4, 7, 8 }, { 4, 10, 9 }, { 6, 6, 13 }}
Rescale thresholds:
0x100, 0x60, 0x80, 0x80, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x80, 0x80, 0x100
Large Values Coding
Large values coding uses the same arithmetic coder and symbol coding split into exponent (single model) and mantissa (several models dependent on exponent value), so actual decoding looks more like:
for (i = 0; i < len; i++) { if (dst[i] == 2) { exp = ari_decode_model(exponent_model); if (exp) { val = ari_decode_model(mantissa_model[exp]); dst[i] += (1 << exp) + val - 1; } } }
Move-To-Front Preprocessing
The usual move-to-from preprocessing but with 1-2 coding extension often seen in BTW compressors (for preserving longer runs):
val = *src++; lit = mtf_buf[val]; *dst++ = lit; if (val < 2) { if (lastval && val) { mtf_buf[1] = mtf_buf[0]; mtf_buf[0] = lit; } } else { move literal value to mtf_buf[1] while shifting all others down } lastval = val;
N-grams
Simple substitution for some input byte values into longer sequences
- 0x7b -> 0x76 0x61
- 0x7c -> 0x63 0x6e
- 0x7d -> 0x6b 0x61 0x77
- 0x7e -> 0x61 0x70 0x68
- 0x7f -> 0x6a 0x64 0x6c
- 0x80 -> 0x79 0x64 0x65
- 0x81 -> 0x75 0x6b 0x62
Fixed Preprocessing
- mode 0 - no preprocessing
- mode 1 - replace some values using fixed translation table
- mode 3 - reverse whole buffer
- mode 4 - reverse whole buffer and replace 32-bit absolute after
0xE8
value with an offset
Table Prediction
- mode 0 - no preprocessing
- mode 1 - restore values from deltas to the previous byte
- mode 2 - restore values from deltas to the previous byte interlaced by two
- mode 3 - restore values from deltas to the previous byte interlaced by three
- mode 4 - restore values from deltas to the previous 16-bit word
- mode 5 - restore values from deltas to the previous 16-bit word interlaced by two
RLE
RLE is triggered after a run of eight identical values and codes run length in some form to resemble that symbol.
Checksum Calculation
crc = 0xFFFFFFFF; for (i = 0; i < size; i++) crc = (crc >> 8) ^ (((crc & 0xFF) ^ *src++) << 24);
MidiVid 3
This is a DCT-based codec with full-pixel motion compensation.
Frame header:
- 8 bits - intra blocks quantiser
- 8 bits - inter blocks quantiser (stored as difference to intra quantiser)
- 16 bits - inter frame flag
- 16 bits - block modes size
- 16 bits (inter only) - number of motion vectors
Frame data is split into several parts:
- (inter only) inter macroblock mask
- block modes
- (inter only) motion vectors (packed the same way as block coefficients; those vectors are in pixel precision)
- macroblock row data
Inter macroblock mask is stored as two flags per each macroblock packed into byte with low nibble (LSB first) having inter MB flags (i.e. if bit is set the macroblock is inter) and top nibble having residue not coded flags (i.e. if the bit is set whole macroblock is motion compensated without any residue coded).
Block modes are two-bit values packed into bytes LSB first. Their meaning is explained below.
Macroblock row data consists of 16-bit number of coefficients coded, 8-bit number of macroblocks coded in this row (inter-only) and coded coefficients array. DCs for each block (except for fill/skip blocks) are stored as the difference to the previous DC of the block in this component.
Intra modes:
- 0 - fill 8x8 block with 128
- 1 - put DC value
- 2 - do reduced DCT on block with only top left 2x2 block filled
- 3 - do full DCT
Inter modes (when change mask is set):
- 0 - do just motion compensation
- 1 - do MC and add DC value
- 2 - do MC and add 2x2 IDCT block
- 3 - do MC and add full IDCT block
Values Coding
All values are coded using static codebook:
00 01 100 101 110 1110 11110 111110 1111110 11111110 111111110 111111111
Zero value serves as a signal for a zero run, so actual codes are
000xxx -> x + 1 zeroes 001xxxxxx -> x + 9 zeroes
Values 1-9 are followed by sign bit and 0-8 bits of mantissa, so actual value is sign * ((1 << (cb_val - 1)) + get_bits(cb_val - 1))
. Last value is used as an escape.
IDCT
t0 = c[0] + c[4]; t1 = c[0] - c[4]; t2 = c[2] + c[6]; t3 = (((c[2] - c[6]) * 362) >> 8) - t2; t4 = t0 + t2; t5 = t0 - t2; t6 = t1 + t3; t7 = t1 - t3; t8 = src[5] + src[3]; t9 = src[5] - src[3]; tA = src[1] + src[7]; tB = src[1] - src[7]; tC = t8 + tA; tD = (tB + t9) * 473 >> 8; tE = ((t9 * -669 >> 8) - tC) + tD; tF = ((tA - t8) * 362 >> 8) - tE; t10 = ((tB * 277 >> 8) - tD) + tF; dst[0] = t4 + tC; dst[1] = t6 + tE; dst[2] = t7 + tF; dst[3] = t5 - t10; dst[4] = t5 + t10; dst[5] = t7 - tF; dst[6] = t6 - tE; dst[7] = t4 - tC;
Quantisation
Luma base:
12, 12, 15, 19, 25, 34, 40, 48, 12, 12, 18, 22, 27, 44, 47, 46, 17, 18, 21, 26, 35, 46, 52, 47, 18, 20, 24, 28, 40, 61, 59, 51, 20, 24, 32, 43, 50, 72, 72, 63, 25, 31, 42, 48, 58, 72, 81, 75, 38, 46, 54, 61, 71, 84, 88, 85, 50, 61, 65, 68, 79, 78, 86, 91
Chroma base:
12, 16, 24, 47, 99, 99, 99, 99, 16, 21, 26, 66, 99, 99, 99, 99, 24, 26, 56, 99, 99, 99, 99, 99, 47, 66, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99
Common:
16384, 22725, 21407, 19266, 16384, 12873, 8867, 4520, 22725, 31521, 29692, 26722, 22725, 17855, 12299, 6270, 21407, 29692, 27969, 25172, 21407, 16819, 11585, 5906, 19266, 26722, 25172, 22654, 19266, 15137, 10426, 5315, 16384, 22725, 21407, 19266, 16384, 12873, 8867, 4520, 12873, 17855, 16819, 15137, 12873, 10114, 6967, 3552, 8867, 12299, 11585, 10426, 8867, 6967, 4799, 2446, 4520, 6270, 5906, 5315, 4520, 3552, 2446, 1247
Quantisation matrix calculation:
quant = quality < 50 ? 5000 / MAX(quality, 1) : 200 - MIN(quality, 100) * 2; qmat_{inter/intra}_{y/c}[i] = (clip(((base_qmat_{y/c}[i] * quant + 50) / 100), 1, 0x7FFF) * common_qmat[i] + 0x800) >> 12;
MidiVid 4.x
This is a 8x8 DCT-based codec that codes planes independently and has no motion compensation beside simple "update current block" mode.
The frame data is organised as following:
- offsets to various parts of frame data;
- skip flags (inter frame only);
- coefficients types for luma;
- coefficients for luma;
- coefficient types for chroma;
- coefficients for chroma.
Skip flags tell whether this block in luma plane should be intra or inter (chroma blocks in inter frames are always inter).
Coefficient types are Huffman-coded types telling which kind of decoding should be applied to coefficients. First there's code definition stored and then the actual data. All coefficients are stored as continuous stream, some opcodes are used to signal zero runs (either fixed size or with additional bits to read) and how to read non-zero values. E.g. mode 1 means 24 + get_bits(2) * 2)
zeroes, mode 5 means three zeroes, mode 15 means +-6, mode 18 means sign bit plus 13 + get_bits(3)
value. This is similar to On2 VPx DCT token stream.
After decoding coefficients are dequantised, transformed with 8x8 integer IDCT and put/added to the destination frame.
MidiVid Archival
This is a simple codec based on Huffman coding or deflate plus median prediction.
Frame format:
4 bytes - compression type (HUFY - Huffman coding, LZVY - deflate) 4 bytes - source size data
For LZVY
it's deflated data, for Huffman-coded it contains:
3 bytes - decompressed size 1 byte - start symbol 1 byte - number of symbols minus one tree weights
Weights can be zero for non-present symbols and coded as either 1
plus 12 bits or 0
plus 3 bits.
Decompressed data is in YUV420 format, Y plane data first, then U plane data and V plane data.