Motion Wavelets

Motion Wavelets is a rather simple intra-only wavelet codec that uses static codebooks.

Extradata format

Somewhat surprisingly, extradata values are stored in big-endian order. The extradata contains the following fields:

 4 bytes - extradata length including the header
 4 bytes - always seems to be 7
 4 bytes - version
 4 bytes - width
 4 bytes - height
 2 bytes - unknown
 2 bytes - bits per pixel
 4 bytes - source format (e.g. 0 - RGB, '024I' for YUV420)
 4 bytes - raw image size
 4 bytes - always zero?
 4 bytes - always zero?
 4 bytes - always zero?
 4 bytes - always zero?
 4 bytes - codec flags (0x100 seems to mean grayscale)
 3x2x2 bytes - vertical and horizontal transform levels for each YUV component (usually 4, 4, 3, 3, 3, 3)
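
The field list above adds up to 64 bytes, so it can be unpacked in one go. The following is only a sketch: the names `Extradata` and `parse_extradata` are made up for illustration, the 2-byte fields are read as unsigned, and the order of the six transform levels (vertical then horizontal for Y, U, V) is an assumption based on the wording of the last field.

```python
import struct
from typing import NamedTuple

# Big-endian layout following the field list: 5 x u32, 2 x u16, 4-byte source
# format, u32 raw size, 4 x u32 (apparently zero), u32 flags, 6 x u16 levels.
EXTRADATA = struct.Struct(">5I2H4sI4II6H")  # 64 bytes in total


class Extradata(NamedTuple):
    length: int
    version: int
    width: int
    height: int
    bits_per_pixel: int
    source_format: bytes
    raw_image_size: int
    flags: int
    levels: tuple  # six values; ordering is assumed, see the note above


def parse_extradata(buf: bytes) -> Extradata:
    if len(buf) < EXTRADATA.size:
        raise ValueError("extradata too short")
    f = EXTRADATA.unpack_from(buf)
    # f[1] is the constant 7, f[5] is unknown, f[9:13] are the zero fields.
    return Extradata(f[0], f[2], f[3], f[4], f[6], f[7], f[8], f[13], f[14:20])
```

With this, `bool(parse_extradata(buf).flags & 0x100)` would give the (apparent) grayscale flag.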

Frame format

A Motion Wavelets frame consists of tags, each starting with the bytes FF FF FF FF followed by the tag ID.

Known tags are:

  • 0xAA - frame header
  • 0xAB - alternative frame header
  • 0xD1 - wavelet band data
  • 0xD2 - alternative band data?
  • 0xDA - alternative band data?
  • 0xDD - unknown meaning
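
A naive way to locate tags is to scan for the marker. The sketch below assumes that the tag ID is a single byte and that the marker never occurs inside a tag payload; neither is confirmed here, and `find_tags` is a hypothetical helper name.

```python
TAG_MARKER = b"\xff\xff\xff\xff"


def find_tags(frame: bytes):
    """Yield (tag_id, payload_offset) for every marker found in the frame."""
    pos = frame.find(TAG_MARKER)
    while pos != -1 and pos + 4 < len(frame):
        # The byte right after the marker is taken as the tag ID (assumption).
        yield frame[pos + 4], pos + 5
        pos = frame.find(TAG_MARKER, pos + 5)
```

A more robust parser would probably rely on the tag size field where one exists instead of rescanning the payload.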

0xAA tag (frame header)

 4 bytes - tag size
 2 bytes - default Y plane bias?
 (non-grayscale) 2 bytes - default U plane bias?
 (non-grayscale) 2 bytes - default V plane bias?
 (for version > 1) 1 byte - unknown
 (for version > 4) 4 bytes - total packed frame size
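
Because of the conditional fields, the header length depends on the codec version and on the grayscale flag. Here is a rough sketch of reading it. The byte order of frame data is not stated on this page, so `byteorder` is a parameter and its little-endian default is purely an assumption; reading the bias fields as unsigned is a guess as well, and the function name and dictionary keys are invented for illustration.

```python
def parse_frame_header(payload: bytes, version: int, grayscale: bool,
                       byteorder: str = "little") -> dict:
    pos = 0

    def take(n: int) -> int:
        # Read n bytes as an unsigned integer and advance the cursor.
        nonlocal pos
        if pos + n > len(payload):
            raise ValueError("frame header truncated")
        value = int.from_bytes(payload[pos:pos + n], byteorder)
        pos += n
        return value

    hdr = {"tag_size": take(4), "bias_y": take(2)}
    if not grayscale:
        hdr["bias_u"] = take(2)
        hdr["bias_v"] = take(2)
    if version > 1:
        hdr["unknown"] = take(1)
    if version > 4:
        hdr["packed_size"] = take(4)
    hdr["header_len"] = pos  # number of bytes consumed
    return hdr
```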

0xD1 tag (wavelet band data)

This tag contains data for one wavelet band. Bands are stored in interleaved order (Y LL band, U LL band, V LL band, Y LH band, U LH band...) with their dimensions implicitly derived from the image size and the number of transform levels.

Pixels in bands are coded in boustrophedon order, i.e. the first line goes left to right, the next line right to left, the one after that left to right again, and so on.
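
The scan order can be illustrated with a small generator (the name `boustrophedon_order` is made up for this example):

```python
def boustrophedon_order(width: int, height: int):
    """Yield (x, y) positions in the order band coefficients are coded."""
    for y in range(height):
        # Even lines run left to right, odd lines right to left.
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        for x in xs:
            yield x, y
```

For a 3x2 band this visits (0,0), (1,0), (2,0), then (2,1), (1,1), (0,1).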

Old tag header (before version 3):

 4 bytes - band size?
 4 bytes - unknown
 4 bytes - band quantiser multiplied by 32768 and stored as integer

New tag header (version 3 and later):

 1 byte  - band mode (0 means the band is not coded and no further data is present)
 4 bytes - band quantiser multiplied by 32768 and stored as integer (for non-empty bands)
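
A minimal sketch of reading the new-style header follows. As with other frame data, the byte order of the quantiser is not documented here, so the little-endian default is an assumption, as is treating the stored integer as unsigned; `parse_band_header` is a hypothetical name.

```python
def parse_band_header(payload: bytes, byteorder: str = "little"):
    """Return (mode, quantiser or None, offset of the coded band data)."""
    if not payload:
        raise ValueError("empty band tag")
    mode = payload[0]
    if mode == 0:
        # Uncoded band: nothing follows the mode byte.
        return mode, None, 1
    if len(payload) < 5:
        raise ValueError("band header truncated")
    raw = int.from_bytes(payload[1:5], byteorder)
    # The quantiser is stored as a fixed-point value scaled by 32768.
    return mode, raw / 32768.0, 5
```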

The following band modes are known:

  • 0 - empty uncoded band
  • 5 - LL band (coefficients are coded as differences to the previous ones, no bias)
  • 1 - same coding as mode 5 but for coefficients instead of deltas, without quantisation bias
  • 9 - same as mode 1 but with quantisation bias
  • 2 - alternative band coding, no quantisation bias
  • 10 - alternative band coding, with quantisation bias
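
For a decoder it may be convenient to turn the list above into a lookup table. The sketch below does that; the field names are invented for illustration. Incidentally, the mode numbers look like a bit field (8 for bias, 4 for delta coding, 1 and 2 for the coding method), but that is only an observation drawn from the six known values.

```python
from typing import NamedTuple


class BandMode(NamedTuple):
    coded: bool        # False only for the empty band
    alternative: bool  # True if the alternative band coding is used
    delta: bool        # True if values are differences to the previous ones
    bias: bool         # True if quantisation bias is applied


BAND_MODES = {
    0: BandMode(coded=False, alternative=False, delta=False, bias=False),
    5: BandMode(coded=True, alternative=False, delta=True, bias=False),
    1: BandMode(coded=True, alternative=False, delta=False, bias=False),
    9: BandMode(coded=True, alternative=False, delta=False, bias=True),
    2: BandMode(coded=True, alternative=True, delta=False, bias=False),
    10: BandMode(coded=True, alternative=True, delta=False, bias=True),
}
```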

Band data coding

Ordinary band coding:

 get code from codebook 1
 switch (code) {
   case 0: read 8 bits of escape code, remap to -0xFB..-0x7C, 0x7C..0xFB range
   case 1: read 12 bits of escape code, remap to -0x8FB..-0xFC, 0xFC..0x8FB range
   case 0x80: read 16 bits of escape code, remap to -0x88FB..-0x8FC, 0x8FC..0x88FB range
   case 0xFC/0xFD/0xFE/0xFF: zero run of length 1/2/3/4
   case 2/3/4: read 4/8/12 bits and add 5/21/277 in order to obtain zero run value (or repeat count for mode 5)
   default: integer value (or delta for mode 5) is equal to code - 0x80
 }
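
The switch above can be sketched in Python as follows. `read_bits` stands for a hypothetical bit reader callback and codebook lookup is left out. Since the exact mapping of escape bits onto the signed ranges is not spelled out, the sketch returns the raw escape bits instead of guessing the remap.

```python
ESCAPE_BITS = {0x00: 8, 0x01: 12, 0x80: 16}             # code -> bits to read
ZERO_RUN_EXTRA = {2: (4, 5), 3: (8, 21), 4: (12, 277)}  # code -> (bits, offset)


def decode_ordinary(code: int, read_bits):
    """Return ("escape", raw_bits), ("run", length) or ("value", v)."""
    if code in ESCAPE_BITS:
        # Remapping into the signed range is left to the caller.
        return "escape", read_bits(ESCAPE_BITS[code])
    if code >= 0xFC:
        return "run", code - 0xFB            # 0xFC..0xFF -> run of 1..4
    if code in ZERO_RUN_EXTRA:
        bits, offset = ZERO_RUN_EXTRA[code]
        return "run", read_bits(bits) + offset
    return "value", code - 0x80              # a delta in mode 5
```

In mode 5 a "run" result is a repeat count and not a zero run. Note how the run lengths tile without gaps: 1..4 from the short codes, 5..20 from 4 bits, 21..276 from 8 bits, and 277 upwards from 12 bits.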

Alternative band coding:

 get code from codebook 2
 if (code <= 0x40) {
   output code - 32
 } else if (code <= 0x7B) {
   output zero run of (code - 0x40)
 } else if (code >= 0xEF && code <= 0xFA) {
   read 16/14/10/9/8/7/6/5/4/3/2/1 bits, add 0x483A/0x83A/0x43A/0x23A/0x13A/0xBA/0x7A/0x5A/0x4A/0x42/0x3E/0x3C
   output zero run of the resulting length
 } else if (code == 0xFE || code == 0xFF) {
   read 14/10 bits for escape value and remap it to -0x2220..-0x221,0x221..0x2220/-0x220..-0x21,0x21..0x220 range
 } // other codes should not be present
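
The same logic as a Python sketch, again with a hypothetical `read_bits` callback and without the codebook itself. It assumes that the bit counts and offsets are listed in ascending code order (0xEF takes 16 bits, 0xFA takes 1 bit, 0xFE takes 14 bits, 0xFF takes 10 bits), and it returns raw escape bits because the exact remapping is not described.

```python
LONG_RUN_BITS = (16, 14, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
LONG_RUN_BASE = (0x483A, 0x83A, 0x43A, 0x23A, 0x13A, 0xBA,
                 0x7A, 0x5A, 0x4A, 0x42, 0x3E, 0x3C)


def decode_alternative(code: int, read_bits):
    """Return ("value", v), ("run", length) or ("escape", raw_bits)."""
    if code <= 0x40:
        return "value", code - 32
    if code <= 0x7B:
        return "run", code - 0x40
    if 0xEF <= code <= 0xFA:
        i = code - 0xEF  # assumed: table entries follow ascending code order
        return "run", read_bits(LONG_RUN_BITS[i]) + LONG_RUN_BASE[i]
    if code in (0xFE, 0xFF):
        return "escape", read_bits(14 if code == 0xFE else 10)
    raise ValueError(f"unexpected code 0x{code:02X}")
```

The offsets are consistent with this reading: short runs cover 1..0x3B, and each long-run range starts exactly where the previous one ends.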

Dequantisation without bias is simply value * scale; with bias it is value > 0 ? (value + 0.5) * scale : (value - 0.5) * scale. Band quantisers on upper levels should be multiplied by a power of 2 (i.e. for the smallest bands the multiplier is 1.0, for the next level it is 2.0, for the one after that it is 4.0, and so on).
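
Put together with the fixed-point quantiser from the band header, this could look like the sketch below. The function names are hypothetical, and `level_index` counts from 0 for the smallest bands. The bias formula is implemented literally, so a zero value takes the negative branch; whether a real decoder special-cases zero is not stated here.

```python
def band_scale(raw_quantiser: int, level_index: int) -> float:
    """Scale for a band: stored quantiser / 32768, doubled per level."""
    return raw_quantiser / 32768.0 * (2.0 ** level_index)


def dequantise(value: int, scale: float, bias: bool = False) -> float:
    if not bias:
        return value * scale
    # Literal reading of the formula; zero falls into the second branch.
    return (value + 0.5) * scale if value > 0 else (value - 0.5) * scale
```

For example, a stored quantiser of 49152 gives a scale of 1.5 for the smallest bands and 3.0 one level up.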

Reconstruction uses a simple lifting scheme:

 dst[2n]   = (lo[n] + hi[n]) / 16.0 + (lo[n-1] - lo[n+2]) / 128.0
 dst[2n+1] = (lo[n] - hi[n]) / 16.0 - (lo[n-1] - lo[n+2]) / 128.0

After vertical reconstruction, band values should be multiplied by 128.
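
A one-dimensional version of the reconstruction step might be sketched like this. The indices are used exactly as in the formulas above; how out-of-range indices (lo[-1], lo[n+2] near the end) are handled is not described, so clamping to the nearest valid sample is purely an assumption, and `reconstruct_1d` is an invented name. The multiplication by 128 after the vertical pass is not included.

```python
def reconstruct_1d(lo, hi):
    """Merge one low-pass and one high-pass row (or column) into a signal."""
    count = len(lo)
    if len(hi) != count:
        raise ValueError("lo and hi must have the same length")

    def lo_at(i):
        # Edge handling is assumed: clamp to the first/last coefficient.
        return lo[min(max(i, 0), count - 1)]

    dst = [0.0] * (2 * count)
    for n in range(count):
        corr = (lo_at(n - 1) - lo_at(n + 2)) / 128.0
        dst[2 * n] = (lo[n] + hi[n]) / 16.0 + corr
        dst[2 * n + 1] = (lo[n] - hi[n]) / 16.0 - corr
    return dst
```

A full decoder would apply this along one dimension and then the other at every level, starting from the smallest bands.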