Motion Pixels

From MultimediaWiki

Introduction

Motion Pixels is a family of codecs that started with its custom MVI Container format (with MVI extension as well) and later developed into VfW codecs MVI1 and MVI2.

Motion Pixels version 1 (MVI1) was used in a few video games while version 2 (MVI2) was used in a number of Movie CDs. All of these items were published by Sirius Publishing and the Motion Pixels codec is believed to still be owned by the company's CEO, Richard Gnant.

MVI belongs to the "old-school" family of video codecs and relies on interframe differences and adaptive delta-coding for horizontal lines of the picture. Delta coefficients are additionally Huffman-packed. For better compression at the cost of picture quality additional colorspace downsampling may be used. MVI2 adds smoother delta-coding and the ability to dynamically change downsampling for each frame.

MVI Codec

Common things and differences

Essentially there are four (or even five) flavours of the codec:

  • DOS version in MVI Container. Format is set in the header flags. Frames are treated either as intra or inter (depending on the flag set).
  • MVI1 codec. Format is set in the flags. The only known flags include flipped frames, single-field coding, and golden frame.
  • MVI2.0 codec. Format is set in the flags. In addition to MVI1 flags it also supports low-resolution mode and frame cropping.
  • MVI2.1 codec. This one is signalled by a certain flag, actual coding format is set in intra frame header. Compared to MVI2.0 it also enables smooth delta coding mode.
  • MVI2.2 codec. This one is signalled by a different flag but the behaviour seems to be identical to MVI2.1

All variations of MVI use the same coding principle: first, rectangular areas that require special handling are marked on the frame; these areas are processed, and the remaining pixels are decoded using Huffman-compressed deltas. For better compression, data is in most cases represented in YUV colourspace with some subsampling. See Appendix A for the code to convert between 15-bit RGB and YUV.

Supported colourspace formats are:

  • 0 - RGB
  • 1 - YUV with one chroma pair per 2x1 block
  • 2 - YUV with one chroma pair per 2x2 block
  • 3 - YUV with one chroma pair per 4x2 block
  • 4 - YUV with one chroma pair per 4x4 block
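
The list above can be captured as a small lookup table. This is only a minimal sketch: the names `MviChromaBlock` and `mvi_chroma_block` are hypothetical, and giving format 0 (RGB) a 1x1 block is an illustrative choice rather than something the format defines.

```c
#include <stddef.h>

/* Hypothetical helper type: how many pixels share one chroma pair. */
typedef struct {
    int width;   /* pixels sharing one chroma pair, horizontally */
    int height;  /* pixels sharing one chroma pair, vertically */
} MviChromaBlock;

/* Returns 0 on success, -1 if the format value is not in the list. */
static int mvi_chroma_block(unsigned format, MviChromaBlock *out)
{
    static const MviChromaBlock blocks[] = {
        { 1, 1 },  /* 0: RGB, no chroma sharing (1x1 is an assumption) */
        { 2, 1 },  /* 1: one chroma pair per 2x1 block */
        { 2, 2 },  /* 2: one chroma pair per 2x2 block */
        { 4, 2 },  /* 3: one chroma pair per 4x2 block */
        { 4, 4 },  /* 4: one chroma pair per 4x4 block */
    };

    if (format >= sizeof(blocks) / sizeof(blocks[0]))
        return -1;
    *out = blocks[format];
    return 0;
}
```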

Overall bitstream format looks like this:

  • intra frame flag (not present in DOS version)
  • coding parameters like colourspace, lowres mode and smooth delta coding (only in intra frames and only in MVI2.1/MVI2.2)
  • skip map (not for intra frames)
  • golden frame map (only when the feature is enabled)
  • fill map
  • low-resolution areas map (only when the feature is enabled)
  • Huffman codes description for deltas
  • top left pixel RGB value (if there are delta codes and top left pixel is marked for normal delta decoding)
  • number of delta codes for vertical prediction and first field (if there are delta codes)
  • number of delta codes for the second field (ditto)
  • Huffman-coded deltas

Bit reading is done MSB first using 32-bit little-endian words (e.g. 00 00 10 80 is represented as 0x80100000 and first read bit will be 1).
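
The bit order described above can be sketched as follows. The `MviBitReader` type and both function names are hypothetical, and returning zero bits past the end of the buffer is an illustrative choice, not documented decoder behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over 32-bit little-endian words. */
typedef struct {
    const uint8_t *data;
    size_t size;     /* buffer size in bytes */
    size_t bit_pos;  /* number of bits consumed so far */
} MviBitReader;

/* Reads one bit; yields 0 past the end of the buffer (an assumption). */
static unsigned mvi_get_bit(MviBitReader *br)
{
    size_t word = br->bit_pos / 32;                      /* 32-bit word index */
    unsigned bit_in_word = (unsigned)(br->bit_pos % 32); /* 0 = MSB of word */
    /* In a little-endian word the most significant byte is stored last. */
    size_t byte = word * 4 + (3 - bit_in_word / 8);
    unsigned shift = 7 - bit_in_word % 8;

    br->bit_pos++;
    if (byte >= br->size)
        return 0;
    return (br->data[byte] >> shift) & 1u;
}

/* Reads up to 32 bits, most significant bit first. */
static uint32_t mvi_get_bits(MviBitReader *br, unsigned count)
{
    uint32_t value = 0;

    while (count--)
        value = (value << 1) | mvi_get_bit(br);
    return value;
}
```

With the bytes 00 00 10 80 from the example, the first call to `mvi_get_bit` yields 1, and reading all 32 bits at once yields 0x80100000.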

Compression Flags

When video is packaged inside AVI files, all vital codec parameters are carried in the biCompression field of the video stream's BITMAPINFOHEADER. The actual codec type should be retrieved from the fccHandler field of the AVI stream header. The first two bytes of biCompression form a 16-bit little-endian flags value; the last two bytes should be "i1" for MVI1 and "i2" for MVI2.

Flag bits meaning:

  • 0..3 (0x000F) - colourspace format (see the list of supported colourspace formats above)
  • 4 (0x0010) - seems to be recognised but not used by the decoder (MVI2.2 only)
  • 5 (0x0020) - alternative output mode (MVI2.2 only)
  • 6 (0x0040) - seems to be intra-only mode where only intra frames are decoded and inter frames merely repeat them (MVI2.2 only)
  • 7 (0x0080) - MVI2.2 mode
  • 8 (0x0100) - still-frame (or golden frame) mode for MVI2.2
  • 9 (0x0200) - MVI2.1 mode
  • 10 (0x0400) - frame cropping enabled (MVI2 only)
  • 11 (0x0800) - low-resolution mode (MVI2 only)
  • 12 (0x1000) - still-frame (or golden frame) mode
  • 13 (0x2000) - frames are coded upside-down (instead of bottom-up by default)
  • 14 (0x4000) - video was encoded using trial version of the codec (this flag tells original decoder to put watermark to the frame)
  • 15 (0x8000) - image is single-field, only odd lines are carried
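
A minimal sketch of splitting biCompression into the flags value and the codec generation, following the byte layout described above. All identifiers here (`MVI_FLAG_*`, `MviCompression`, `mvi_parse_compression`) are hypothetical names chosen for illustration, and only a subset of the flag bits is named.

```c
#include <stdint.h>

/* Hypothetical names for some of the flag bits listed above. */
enum {
    MVI_FLAG_COLOURSPACE_MASK = 0x000F,
    MVI_FLAG_MVI22            = 0x0080,
    MVI_FLAG_MVI21            = 0x0200,
    MVI_FLAG_CROPPING         = 0x0400,
    MVI_FLAG_LOWRES           = 0x0800,
    MVI_FLAG_GOLDEN           = 0x1000,
    MVI_FLAG_UPSIDE_DOWN      = 0x2000,
    MVI_FLAG_TRIAL            = 0x4000,
    MVI_FLAG_SINGLE_FIELD     = 0x8000
};

typedef struct {
    int version;     /* 1 for "i1" (MVI1), 2 for "i2" (MVI2) */
    uint16_t flags;  /* 16-bit little-endian flags value */
} MviCompression;

/*
 * Takes the four bytes of biCompression in file order.
 * Returns 0 on success, -1 if the last two bytes are neither "i1" nor "i2".
 */
static int mvi_parse_compression(const uint8_t fourcc[4], MviCompression *out)
{
    if (fourcc[2] != 'i' || (fourcc[3] != '1' && fourcc[3] != '2'))
        return -1;
    out->version = fourcc[3] - '0';
    out->flags = (uint16_t)(fourcc[0] | (fourcc[1] << 8));
    return 0;
}
```

For example, the bytes 02 12 'i' '2' give version 2 and flags 0x1202, i.e. colourspace format 2 with the MVI2.1 and golden frame bits set.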

Golden frame mode

When a video with few changes is encoded (e.g. 'talking heads' or some static scenery), a special trick may be used. The very first frame contains all the scenery, while all subsequent frames take this first frame as a base and add the necessary changes to it. The tricky part is that the first frame contains such a 'change-frame' as well, so there are two logical frames in one physical frame. The offset to the base frame is located at (biSizeImage - 4) in the encoded data buffer. The change-frame is located at the start of that buffer.

Please note that both of the parts in the first frame still contain the golden frame map.
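
Locating the base frame inside the first physical frame might look like the sketch below. Several things here are assumptions rather than documented facts: that the value at (biSizeImage - 4) is a 32-bit little-endian integer, that it is counted from the start of the buffer, and the function name `mvi_golden_base_offset` itself.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reads the base-frame offset stored in the last four bytes of the buffer.
 * Assumes a 32-bit little-endian value relative to the buffer start.
 * Returns 0 on success, -1 if the buffer is too small or the offset
 * points outside the payload.
 */
static int mvi_golden_base_offset(const uint8_t *buf, size_t size_image,
                                  uint32_t *offset)
{
    const uint8_t *p;
    uint32_t value;

    if (size_image < 4)
        return -1;
    p = buf + size_image - 4;
    value = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
            ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    if (value > size_image - 4)
        return -1;
    *offset = value;
    return 0;
}
```

Under these assumptions the change-frame would occupy the bytes before the returned offset and the base frame the bytes from that offset up to the trailing four-byte field.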

Delta decoding modes

Deltas are coded as values in the range -7..7. Since larger values are sometimes needed, there are two ways to produce them. Motion Pixels before 2.1 used the rule that if a delta value is -7 or 7, then the next delta value is doubled before use (and if that next value was itself -7 or 7 before doubling, the value after it is doubled as well, and so on). Smooth delta mode operates differently: if a delta value is -7 or 7, then the doubled next delta value is added to the result (and this continues while the additional delta values are -7 or 7).
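
The two rules can be contrasted with a short sketch. It reads the description literally: the doubling factor is always 2 and does not accumulate, and a single stream of raw deltas is used for simplicity. Both points are assumptions, as are the names `MviLegacyState`, `mvi_legacy_delta` and `mvi_smooth_delta`.

```c
#include <stddef.h>

/* State for the pre-2.1 rule: scale applied to the next raw delta. */
typedef struct {
    int scale;  /* start at 1; becomes 2 right after a raw value of -7 or 7 */
} MviLegacyState;

/* Pre-2.1 rule: each raw delta yields one output, possibly doubled. */
static int mvi_legacy_delta(MviLegacyState *st, int raw)
{
    int out = raw * st->scale;

    st->scale = (raw == 7 || raw == -7) ? 2 : 1;
    return out;
}

/*
 * Smooth rule: one output may consume several raw deltas. Starting at
 * raw[*pos], keep adding the doubled next value while the value just
 * consumed was -7 or 7. Advances *pos past everything consumed.
 * The caller must ensure *pos < count on entry.
 */
static int mvi_smooth_delta(const int *raw, size_t count, size_t *pos)
{
    int d = raw[(*pos)++];
    int result = d;

    while ((d == 7 || d == -7) && *pos < count) {
        d = raw[(*pos)++];
        result += 2 * d;
    }
    return result;
}
```

For the raw sequence 7, 3, 2 the first function produces 7, 6, 2. For the raw sequence 7, 7, 3, 1 the second function produces 27 (7 + 14 + 6) and then 1.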

Low-resolution mode

In this mode half of the pixels are coded using interpolation: when only luma changes are coded, the first (interpolated) pixel uses half of the delta value added to the previous pixel, and the second pixel uses the full delta. If exactly one pixel is left between low-res and skip areas, it is interpolated in the same way (luma delta is calculated and half of it is added to the predictor value). If exactly one pixel is left between a low-res area and a normal area, it uses half of the first delta from the first pixel of that area.

Depending on the coding mode and line number, the positions of full and interpolated pixels may be swapped (e.g. for even lines it may be "interpolated full interpolated full" while for odd lines it is "full interpolated full interpolated"). Additionally, in smooth delta mode luma deltas use a limited range unless the very first pixel is a full one.

Please note that during vertical prediction stage low-resolution mode pixels are treated as skip pixels.

Decompression

Frame Flags

Each frame begins with sequence of local flag bits. These bits are:

# of bits  Present if                Name          Description
1          always                    KeyframeFlag  non-zero if this frame is a key frame
3          flags bit 9               ColorSpace    colourspace format for this group of frames
1          flags bit 9               LowRes        low-resolution coding is enabled
1          flags bit 9               SmoothDeltas  use smooth delta-coding
1          flags bit 9               HaveSerial    (see next field)
20         flags bit 9 + HaveSerial  SerialNumber  encoder's serial number
1          flags bit 9               Cropping      frame cropping is enabled
8          flags bit 10 or Cropping  CropX         x offset of the frame start
8          flags bit 10 or Cropping  CropY         y offset of the frame start

Maps

Map blocks contain rectangles which should be marked as special on the frame. All versions prior to MVI2.1 use the same format:

  • 12 bits number of large rectangles
  • 12 bits number of small rectangles
  • large rectangles
  • small rectangles

Rectangles are stored as <offset, width-1, height-1> with offset taking ceil(log2(frame width * frame height)) bits and width and height using 8 or 4 bits depending on rectangle size. Rectangles for fill map additionally contain RGB555 fill value.
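
The width of the offset field follows directly from the frame size. A minimal sketch of the ceil(log2(frame width * frame height)) computation, using integer arithmetic only; the function name `mvi_offset_bits` is hypothetical.

```c
#include <stdint.h>

/*
 * Number of bits used for a rectangle offset: the smallest n such that
 * 2^n >= width * height, i.e. ceil(log2(width * height)).
 */
static unsigned mvi_offset_bits(uint16_t width, uint16_t height)
{
    uint32_t pixels = (uint32_t)width * height;
    unsigned bits = 0;

    /* 64-bit shift so that bits == 32 cannot overflow the comparison. */
    while (((uint64_t)1 << bits) < pixels)
        bits++;
    return bits;
}
```

For a 320x240 frame (76800 pixels) this gives 17 bits, and for a 256x256 frame (65536 pixels) exactly 16 bits.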

MVI2.1/MVI2.2 use a different format: they split the image into chunks of 8192 bytes, and for each chunk use 8-bit numbers of large and small rectangles as well as 13-bit offsets relative to the start of that chunk.

Deltas Bundle

Deltas start with a 4-bit code telling how many unique delta values are being coded. Zero means no deltas are present at all; one means the next four bits contain the delta value plus seven and all deltas in the frame are the same. Otherwise there is a 4-bit value telling the suggested number of bits to look up during Huffman decoding, followed by the symbol values (delta value plus seven) and the bit-coded Huffman tree shape (0 - this is a leaf; 1 - node has children, read description for it).

Then there are two delta counts, one per field, using 18 or 19 bits each (depending on whether image is larger than 320*240 or not), and the actual delta values (when there are two or more unique delta values).
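
The tree shape bits can be turned into code lengths with a simple recursive walk. The sketch below assumes a binary tree in which the description of one child is stored completely before the description of the other; it records only the depth of each leaf, so it does not depend on which child is assigned the 0 or 1 branch. The `ShapeBits` type and `mvi_tree_lengths` are hypothetical, and the bits are supplied one per array element to keep the example self-contained.

```c
#include <stddef.h>

/* Hypothetical cursor over tree-shape bits, one bit per array element. */
typedef struct {
    const unsigned char *bits;
    size_t count;
    size_t pos;
} ShapeBits;

/*
 * Walks the tree description and stores the depth (code length) of every
 * leaf in the order the leaves appear. Returns 0 on success, -1 if the
 * bits run out or more than max_leaves leaves are described.
 */
static int mvi_tree_lengths(ShapeBits *sb, int depth, int *lengths,
                            size_t max_leaves, size_t *num_leaves)
{
    if (sb->pos >= sb->count)
        return -1;
    if (sb->bits[sb->pos++] == 0) {
        /* 0: this is a leaf, its code length is the current depth */
        if (*num_leaves >= max_leaves)
            return -1;
        lengths[(*num_leaves)++] = depth;
        return 0;
    }
    /* 1: node has children, read the description of each in turn */
    if (mvi_tree_lengths(sb, depth + 1, lengths, max_leaves, num_leaves) < 0)
        return -1;
    return mvi_tree_lengths(sb, depth + 1, lengths, max_leaves, num_leaves);
}
```

For the shape bits 1 0 1 0 0 this yields three leaves with code lengths 1, 2 and 2.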

Assembling the Frame

  • leave areas marked as "skip" (but not "lowres") unchanged
  • copy image data from the golden frame to the areas marked as such (there is no motion displacement)
  • fill areas marked as "fill" with the provided colour
  • perform vertical prediction for the first column (skip if there are no deltas)
  • perform prediction for odd lines using the rest of deltas from the first part (skip if there are no deltas)
  • perform prediction for even lines using deltas from the second part (skip if there are no deltas)

Please note that while old DOS format used sequential line numbers so field 0 consisted of lines 0,2,4,... newer formats decode lines in permuted order. So for most formats field 0 is lines 1,3,5,... and for 4x4 mode lines for field 0 are decoded in this order: 3,1,7,5,11,9,...
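
The line orders given above for field 0 can be expressed as a function of the decoding index. This sketch covers only field 0, since that is the only order spelled out here; the enum and the function name `mvi_field0_line` are hypothetical.

```c
/* Hypothetical labels for the three orderings described above. */
enum MviLineOrder {
    MVI_ORDER_DOS,      /* old DOS format: 0, 2, 4, ... */
    MVI_ORDER_DEFAULT,  /* most formats:   1, 3, 5, ... */
    MVI_ORDER_4X4       /* 4x4 mode:       3, 1, 7, 5, 11, 9, ... */
};

/* Returns the frame line decoded as the n-th line (0-based) of field 0. */
static int mvi_field0_line(enum MviLineOrder order, int n)
{
    switch (order) {
    case MVI_ORDER_DOS:
        return 2 * n;
    case MVI_ORDER_4X4:
        /* odd lines with each consecutive pair swapped */
        return 2 * (n ^ 1) + 1;
    default:
        return 2 * n + 1;
    }
}
```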

Delta prediction is simple: for each line (or column), skip the pixels that are not predicted, taking the last of them as the new predictor; convert the predictor from RGB to YUV, add the delta for Y and optionally for the U and V components, then convert the result back to RGB and put it on the frame. Pixels in the same block use the same chroma values. The rules for which pixel is supposed to have U and V deltas depend on pixel position, subsampling mode, whether low-res mode is employed, and codec version.

Games Using Motion Pixels

Appendix A

Code for generating RGB to YUV table (it is done with YUV to RGB conversion, which is used during decoding as well):

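  /* rgb2yuv[] must start zeroed: an all-zero entry means "not yet assigned". */

  /* Pass 1: run the YUV to RGB conversion over every YUV triple and record
     the first triple that lands on each representable RGB555 value. */
  for (y = 0; y <= 31; y++) {
      for (v = -31; v <= 31; v++) {
          for (u = -31; u <= 31; u++) {
              r = (y * 1000 + v * 701) / 1000;
              g = (y * 1000 - 357 * v - 172 * u) / 1000;
              b = (y * 1000 + 886 * u) / 1000;
              if (r >= 0 && r < 32 && g >= 0 && g < 32 && b >= 0 && b < 32) {
                  pix = (r << 10) | (g << 5) | b;
                  if (!rgb2yuv[pix][0] && !rgb2yuv[pix][1] && !rgb2yuv[pix][2]) {
                      rgb2yuv[pix][0] = y;
                      rgb2yuv[pix][1] = u;
                      rgb2yuv[pix][2] = v;
                  }
              }
          }
      }
  }

  /* Pass 2: fill entries that pass 1 left unassigned by copying from a
     neighbouring blue value. i selects the red/green pair, k is the blue
     component; repeating over j lets values propagate across gaps. */
  for (i = 0; i < 32768; i += 32) {
      for (j = 0; j < 31; j++) {
          /* copy from the next lower blue value */
          for (k = 31; k > j; k--) {
              pix = i + k;
              if (!rgb2yuv[pix][0] && !rgb2yuv[pix][1] && !rgb2yuv[pix][2]) {
                  rgb2yuv[pix][0] = rgb2yuv[pix - 1][0];
                  rgb2yuv[pix][1] = rgb2yuv[pix - 1][1];
                  rgb2yuv[pix][2] = rgb2yuv[pix - 1][2];
              }
          }
          /* copy from the next higher blue value */
          for (k = 0; k < 31 - j; k++) {
              pix = i + k;
              if (!rgb2yuv[pix][0] && !rgb2yuv[pix][1] && !rgb2yuv[pix][2]) {
                  rgb2yuv[pix][0] = rgb2yuv[pix + 1][0];
                  rgb2yuv[pix][1] = rgb2yuv[pix + 1][1];
                  rgb2yuv[pix][2] = rgb2yuv[pix + 1][2];
              }
          }
      }
  }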