VGM Video

From MultimediaWiki

Java player: http://www.ila-ila.com/xvd-hist/sites/lab1454/eng/products/jpl_dm2.htm

There are several codecs in this family:

  • VT
  • Domen
  • V2K aka BigBits
  • V2K-II
  • XVD

VT

The first codec in the family. It uses variable-length codes to code macroblock types, and a frame consists either of 8x8 IDCT blocks (coded in triplets, by component) or of quantised deltas.

Domen

This codec is the real base for all subsequent video codecs, as it already has most of the features present in the later ones. It uses codebooks and so-called RL-coding, a binary run-length coding. In this mode the length of a run of identical bit values is read using a static codebook, and the run is then used either to decode a whole bitplane or to supply a flag for the current (macro)block. When decoding reaches the end of the run, the next length is read and the bit value is flipped.
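The run expansion described above can be sketched in Python. This is only an illustration: `decode_rl_bitplane` and its parameters are hypothetical names, and `run_lengths` stands in for the values a real decoder would read with the static codebook, which is not modelled here.

```python
from typing import Iterable, List


def decode_rl_bitplane(run_lengths: Iterable[int], first_bit: int, count: int) -> List[int]:
    """Expand binary runs into `count` flags.

    `run_lengths` is a hypothetical stand-in for codebook-decoded run lengths.
    """
    flags: List[int] = []
    bit = first_bit
    for run in run_lengths:
        # never emit more flags than the bitplane holds
        take = min(run, count - len(flags))
        flags.extend([bit] * take)
        if len(flags) == count:
            break
        bit ^= 1  # end of run: the next run carries the opposite bit value
    return flags


print(decode_rl_bitplane([3, 2, 4], first_bit=0, count=8))
# prints [0, 0, 0, 1, 1, 0, 0, 0]
```

The same routine serves both uses mentioned above: called once for a whole bitplane, or consumed one flag at a time while walking the (macro)blocks.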

Frame format:

4 bits - quality (used to derive quantiser, codebooks to use and maximum number of coefficients in the block)
1 bit  - intra frame flag
RL-coded bitplane for macroblock flags
if not intra frame {
 1 bit - has global MV
 if has global MV {
  4 bits signed - global MV x + 15
  4 bits signed - global MV y + 15
 }
 1 bit - initial value for MV present RL map
 for each 8x8 block {
  if RL bit is set {
   decode MV value symbol using codebook
   if (sym != 420) {
    blk.mv.x = sym / 29 - 14;
    blk.mv.y = sym % 29 - 14;
   } else {
    blk.mv.x = get_bits(6) - 31;
    blk.mv.y = get_bits(6) - 31;
   }
  }
 }
}

RL-coded map for coded block flags
2 bits - codebook index
Y DCs
2 bits - codebook index
Y ACs 1-2
Y ACs 3-
UV DCs
U ACs
V ACs
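
The motion vector symbol mapping from the frame format above can be sketched in Python as follows. The function name and the `read_bits` callback are hypothetical; the callback stands in for the bitstream reader and is only consulted for the escape symbol.

```python
ESCAPE_SYM = 420  # equals 14 * 29 + 14, the slot that would otherwise map to (0, 0)


def unpack_mv(sym, read_bits):
    """Map an MV codebook symbol to (x, y) as in the frame format listing.

    `read_bits(n)` is a hypothetical callback returning the next n bits
    as an unsigned integer.
    """
    if sym != ESCAPE_SYM:
        # regular symbol: 29 x 29 grid covering -14..14 in each direction
        return sym // 29 - 14, sym % 29 - 14
    # escape: two raw 6-bit values, x first, each biased by 31
    x = read_bits(6) - 31
    y = read_bits(6) - 31
    return x, y


print(unpack_mv(0, None))    # prints (-14, -14)
print(unpack_mv(840, None))  # prints (14, 14)
```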

Luma coefficients 0-2 are coded per macroblock: for each macroblock and each of these coefficients, a pattern is first decoded (using a codebook) that tells which of the four blocks have this coefficient coded, and then, for each block that does, the sign and the actual coefficient value are coded as well.

Luma coefficients 3-63 use RL-coding to tell which blocks have the coefficient coded.

Chroma coefficients are coded in a similar way but with simpler coding: a unary prefix for small codes and a fixed-width bitfield for large codes.

V2K

This version of the codec dropped RL-coding and started to use an arithmetic coder with a single static model per data type for coding frame data.

The first frame presumably contains a run-level map consisting of 49 entries in the following format:

  • 1 bit - end-of-block flag
  • 4 bits - run value
  • 4 bits - level value
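
A minimal Python sketch of unpacking such a map is given below. It assumes the 9-bit entries are tightly packed, stored MSB-first and laid out in the field order listed above; none of this is confirmed by the description. `RLEntry` and `parse_rl_map` are hypothetical names.

```python
from typing import List, NamedTuple


class RLEntry(NamedTuple):
    eob: int
    run: int
    level: int


def parse_rl_map(data: bytes, entries: int = 49) -> List[RLEntry]:
    """Unpack `entries` 9-bit records (1 + 4 + 4 bits) from `data`.

    Assumes MSB-first bit order and tightly packed records (unconfirmed).
    """
    total_bits = entries * 9
    if len(data) * 8 < total_bits:
        raise ValueError("not enough data for the run-level map")
    # treat the buffer as one big integer and drop the unused trailing bits
    value = int.from_bytes(data, "big") >> (len(data) * 8 - total_bits)
    table = []
    for i in range(entries):
        rec = (value >> (9 * (entries - 1 - i))) & 0x1FF
        table.append(RLEntry(eob=rec >> 8, run=(rec >> 4) & 0xF, level=rec & 0xF))
    return table
```

Under these assumptions the full 49-entry map occupies 441 bits, i.e. 56 bytes with 7 padding bits.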

Frame format:

8 bits - flags
if (flags & 4) {
 8 bits - fade speed
}
if (flags & 1) {
 skip frame - fade the previous frame if needed, do nothing else
}
8 bits  - quantiser
if (flags & 0x10) {
 8 bits - altquant difference
 if (flags & 0x20) {
  read bit plane with flags telling which quantiser macroblock should use
 }
}
if (flags & 2) { // intra frame
} else { // inter frame
 if (flags & 8) {
  8 bits signed - global MV x
  8 bits signed - global MV y
 }
 if (!(flags & 0x40)) {
  read MB intra flags bitplane
 }
 for all macroblocks {
  if not intra MB {
   decode compound MV value using arithmetic coder
   mv.x = val / (radius * 4 + 1) - radius * 2;
   mv.y = val % (radius * 4 + 1) - radius * 2;
   (radius is coded in the codec extradata and most likely it is 1)
  }
 }
}
decode Y plane blocks
decode U plane blocks
decode V plane blocks
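
The compound motion vector value can be split as in the following Python sketch. It assumes the horizontal component is the quotient and the vertical one the remainder, by analogy with the Domen symbol mapping, and it assumes valid values lie in a square of side `radius * 4 + 1`. The function name is hypothetical.

```python
def unpack_compound_mv(val: int, radius: int = 1):
    """Split a compound MV value into (x, y).

    The quotient/remainder split and the range check are assumptions;
    radius=1 is the value the text calls most likely.
    """
    side = radius * 4 + 1  # number of distinct values per component
    if not 0 <= val < side * side:
        raise ValueError("compound MV value out of range")
    return val // side - radius * 2, val % side - radius * 2


print(unpack_compound_mv(12))  # prints (0, 0)
print(unpack_compound_mv(0))   # prints (-2, -2)
```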

Block plane decoding:

for each 8x8 block {
 coded = ac_get_sym(2);
 if block is intra {
  blk[0] = ac_get_sym(256);
  idx = 1;
 } else {
  idx = 0;
 }
 if (coded) {
  while (idx < 64) {
   sym = ac_get_mdl(COEF_MODEL);
   if (sym < 49) {
    level = RL_MAP[sym].level;
    run   = RL_MAP[sym].run;
    eob   = RL_MAP[sym].eob;
    if (ac_get_sym(2))
     level = -level;
   } else {
    level = ac_get_sym(254) - 127;
    if (level >= 0)
     level++;
    run = ac_get_sym(64);
    eob = ac_get_sym(2);
   }
   if (level > 0)
    level *= quant * 2 + 1;
   else
    level *= quant * 2 - 1;
   idx += run;
   blk[zigzag[idx++]] = level;
   if (eob)
    break;
  }
 }
}
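
The scaling and placement part of this loop can be sketched in Python, taking already decoded `(run, level, eob)` triples as input so that the arithmetic coder is left out. Several points are assumptions: the conventional zigzag scan is used because the codec's own table is not given, the scaling is read the way C would parse `level *= quant * 2 + 1`, and `eob` is taken to end the block after the coefficient is stored. All names are hypothetical.

```python
def zigzag_order(n: int = 8):
    """Conventional zigzag scan (assumed; the codec's own table is not given)."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # walk the anti-diagonals, alternating direction on each one
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [r * n + c for r, c in cells]


def place_coefficients(triples, quant, start=0):
    """Fill a 64-entry block from decoded (run, level, eob) triples.

    Use start=1 for intra blocks, whose DC is read separately.
    """
    zz = zigzag_order()
    blk = [0] * 64
    idx = start
    for run, level, eob in triples:
        # positive and negative levels use different multipliers, as in the listing
        level *= (quant * 2 + 1) if level > 0 else (quant * 2 - 1)
        idx += run
        if idx >= 64:
            raise ValueError("run overflows the block")
        blk[zz[idx]] = level
        idx += 1
        if eob:
            break
    return blk


blk = place_coefficients([(0, 2, 0), (1, -1, 1)], quant=3)
print(blk[0], blk[8])  # prints 14 -5
```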

V2K-II

This version of the codec adds wavelet coding as an alternative coding mode for intra frames and uses context-adaptive arithmetic coding (i.e. usually the top, left and top-left elements are used to select a model for decoding, and then the same values along with the newly decoded value are used to derive the actual output value). Alternatively, codebooks can be used to code coefficients.
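
To illustrate what selecting a model from the top, left and top-left neighbours can look like, here is a small Python sketch for binary flags. The way the three neighbours are packed into an index, and the treatment of neighbours outside the frame as 0, are purely illustrative choices and not taken from the codec; `neighbour_context` is a hypothetical name.

```python
def neighbour_context(flags, x, y):
    """Return a 0..7 model index from the top, left and top-left neighbours.

    The bit packing is illustrative only; missing neighbours count as 0.
    """
    def at(xx, yy):
        return flags[yy][xx] if xx >= 0 and yy >= 0 else 0

    return at(x, y - 1) | (at(x - 1, y) << 1) | (at(x - 1, y - 1) << 2)


flags = [[1, 0],
         [1, 0]]
print(neighbour_context(flags, 1, 1))  # prints 6
```

A decoder built this way would keep one adaptive model per index and pick it before decoding each flag.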

Frame format:

16 bits - flags
if (flags & 8) {
 8 bits - fade speed
}
if (flags & 1) {
 skip frame - fade the previous frame if needed, do nothing else
}
if (flags & 0x20) {
 4 bits - unknown
 4 bits - unknown
}
8 bits  - quantiser
if (flags & 0x100) {
 8 bits - altquant difference
 if (flags & 0x200) {
  read bit plane with flags telling which quantiser macroblock should use
 }
}
if (flags & 2) { // intra frame
 reset state
} else {
 if (flags & 0x40) {
  8 bits - number of default MVs (0-3)
  if (num_def_mv == 0) {
   8 bits signed - mv_x
   8 bits signed - mv_y
  } else {
   read num_def_mv MVs in the same format as above
   decode per-macroblock default motion vector index using arithmetic coder with top/left/topleft context
  }
 }
 if (flags & 0x400) {
  all macroblocks are inter
 } else {
  decode intra-MB flags using arithmetic coder with top/left/topleft/previous value context
 }
 decode MVs using arithmetic coder with top/left/topleft/previous value context
 add corresponding full-pel default per-macroblock MV to each halfpel block MV if applicable
}
if (!(flags & 4)) {
 decode Y blocks
 decode U blocks
 decode V blocks
} else {
 decode wavelet picture
}

Plane decoding with arithmetic coder and codebooks:

decode block uncoded flags using arithmetic coder and top/left/topleft context
for each 8x8 block {
 if (intra block) {
  read 8-bit DC
  if (!uncoded block) {
   decode coefficients 1-63 for a block
  }
 } else if (!uncoded block) {
  decode coefficients 0-63 for a block
 }
}

Wavelet decoding seems to be based on the LGT 5/3 wavelet, discarding the HH band and coding data in bit-slicing mode (i.e. all top bits first, then the next-to-top bits, and so on) using binary runs very similar to RL-coding in Domen.
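
For reference, one level of a 5/3 transform in its common integer lifting form (the reversible variant used by JPEG 2000) can be sketched in Python as below. Whether this codec uses the same rounding, boundary extension and band layout is unknown, so this only shows the general technique on a one-dimensional, even-length signal. The function names are hypothetical.

```python
def fwd_53(x):
    """One level of the reversible 5/3 lifting transform (even-length input)."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length signal")
    half = n // 2
    # predict: high-pass = odd sample minus the mean of its even neighbours
    d = [x[2 * i + 1] - ((x[2 * i] + x[min(2 * i + 2, n - 2)]) >> 1)
         for i in range(half)]
    # update: low-pass = even sample plus a quarter of the adjacent high-pass values
    s = [x[2 * i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def inv_53(s, d):
    """Undo fwd_53 by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    out = []
    for i in range(half):
        nxt = even[min(i + 1, half - 1)]  # mirrored at the right edge
        out += [even[i], d[i] + ((even[i] + nxt) >> 1)]
    return out


x = [10, 12, 9, 7, 3, 200, 5, 5]
print(inv_53(*fwd_53(x)) == x)  # prints True
```

In two dimensions the transform is applied along rows and columns, which yields the LL, LH, HL and HH bands; discarding HH means the decoder treats that band as all zeros.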

XVD

This is the last instalment in the VGM Video series. The codec is now DCT-only and uses either a context-adaptive binary coder, an arithmetic coder, or a mix of an arithmetic coder and variable-length codes. There is still one halfpel-precision motion vector per 8x8 block.

Extradata format

4 bytes - width
4 bytes - height
4 bytes - bitrate?
4 bytes - FPS
4 bytes - edge size (always 4?)
4 bytes - MV radius (always 1?)
4 bytes - flags

Flags meaning:

  • bit 8 - probably interlaced coding
  • bit 9 - use DC prediction
  • bit 10 - use MV prediction
  • bit 11 - use binary coder
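
The extradata layout and flag bits above can be read with a few lines of Python. The byte order is not stated, so little-endian is assumed here, as is counting flag bits from the least significant bit (bit 8 = 0x100). `XvdExtradata`, `parse_extradata` and the property names are hypothetical.

```python
import struct
from typing import NamedTuple


class XvdExtradata(NamedTuple):
    width: int
    height: int
    bitrate: int      # marked as uncertain in the layout above
    fps: int
    edge_size: int
    mv_radius: int
    flags: int

    @property
    def interlaced(self) -> bool:
        return bool(self.flags & (1 << 8))

    @property
    def use_dc_pred(self) -> bool:
        return bool(self.flags & (1 << 9))

    @property
    def use_mv_pred(self) -> bool:
        return bool(self.flags & (1 << 10))

    @property
    def use_bincoder(self) -> bool:
        return bool(self.flags & (1 << 11))


def parse_extradata(buf: bytes) -> XvdExtradata:
    """Parse seven 32-bit fields; little-endian byte order is an assumption."""
    if len(buf) < 28:
        raise ValueError("extradata too short")
    return XvdExtradata(*struct.unpack_from("<7I", buf))


sample = struct.pack("<7I", 320, 240, 500000, 25, 4, 1, (1 << 9) | (1 << 11))
info = parse_extradata(sample)
print(info.width, info.height, info.use_dc_pred, info.use_bincoder)
# prints 320 240 True True
```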

Frame format

16 bits - flags
if (flags & 1) {
 this is a skip frame, do nothing else
}
8 bits  - quantiser
if (flags & 0x100) {
 8 bits - altquant difference
 if (flags & 0x200) {
  read bit plane with flags telling which quantiser macroblock should use
 }
}
if (flags & 2) { // intra frame
 reset binary coder state
} else {
 if (flags & 0x40) {
  8 bits - number of default MVs (0-3)
  if (num_def_mv == 0) {
   8 bits signed - mv_x
   8 bits signed - mv_y
  } else {
   read num_def_mv MVs in the same format as above
   decode per-macroblock default motion vector index using arithmetic coder with top/left/topleft context
  }
 }

 if (flags & 0x400) {
  all macroblocks are inter
 } else if (use_bincoder) {
  decode intra-MB flags using binary coder with top/left/topleft/previous value context
 } else {
  decode intra-MB flags using arithmetic coder with top/left/topleft/previous value context
 }

 if (!(flags & 0x800)) {
  decode MVs using arithmetic coder with top/left/topleft/previous value context
 } else if (use_bincoder) {
  decode x component using binary coder with top/left/topleft/previous value context
  for values >= 3 read that amount of bits as actual value; read sign bits for component
  decode y component using binary coder with top/left/topleft/previous value context
  for values >= 3 read that amount of bits as actual value; read sign bits for component
  median-predict MVs
 } else {
  decode MV present flags using arithmetic coder with top/left/topleft/previous value context
  decode actual MVs using MV codebook and apply prediction on them if codec flags say so
 }
 add corresponding full-pel default per-macroblock MV to each halfpel block MV if applicable
}
decode Y plane using either binary coder or arithmetic coder and codebooks
decode U plane using either binary coder or arithmetic coder and codebooks
decode V plane using either binary coder or arithmetic coder and codebooks
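
The frame flag bits tested in the listing above can be collected into a Python `IntFlag` for easier debugging. The member names are invented for illustration; only the bit values come from the listing, and bits not tested there are left out.

```python
from enum import IntFlag


class XvdFrameFlags(IntFlag):
    # names are hypothetical; values are the masks tested in the frame format
    SKIP = 0x001           # skip frame
    INTRA = 0x002          # intra frame
    DEFAULT_MVS = 0x040    # default MV data follows
    ALTQUANT = 0x100       # altquant difference present
    ALTQUANT_MAP = 0x200   # per-macroblock quantiser selection bitplane present
    ALL_INTER = 0x400      # no intra-MB flags, all macroblocks are inter
    ALT_MV_CODING = 0x800  # MVs use the binary coder or the MV codebook


def describe(flags: int):
    """List the names of the known bits set in a 16-bit frame flags value."""
    return [f.name for f in XvdFrameFlags if flags & f]


print(describe(0x402))  # prints ['INTRA', 'ALL_INTER']
```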

DC prediction uses gradient prediction from neighbouring intra-coded blocks.

Plane decoding with arithmetic coder and codebooks:

decode block uncoded flags using arithmetic coder and top/left/topleft context
for each 8x8 block {
 if (intra block) {
  if (!use_dc_pred) {
   read 8-bit DC
  } else if (block has no intra-block top neighbours) {
   read DC using raw DC codebook
  } else {
   read DC difference using DC difference codebook
   add predicted DC value
  }
  if (!uncoded block) {
   decode coefficients 1-63 for a block
  }
 } else if (!uncoded block) {
  decode coefficients 0-63 for a block
 }
}

In this case coefficients are decoded with a simple run-length inter or intra codebook.

Plane decoding with binary coder:

for each 8x8 block {
 decode block uncoded flag for current block using top/left/topleft context
 if (intra block) {
  decode DC difference: unary-coded length N of the value, then N bypass bits holding the DC difference value
  add DC prediction (use 128 when it is not available)
  if (!uncoded block) {
   decode coefficients 1-63 for a block
  }
 } else if (!uncoded block) {
  decode coefficients 0-63 for a block
 }
}

Coefficients are decoded by reading a unary-coded number of coefficients N and then N run-length pairs, using position-adaptive models.