Interplay Video

From MultimediaWiki

Interplay Video is encapsulated inside Interplay MVE files. It comes in both 8- and 16-bit flavors, tracking the evolution of PC graphics capabilities in the mid-to-late 1990s. This page primarily describes the 8-bit variant; the known differences in 16-bit mode are summarized at the end.

Credits

The technical description on this page was originally based on an anonymous and thorough description of the format, published at a time when Interplay was proactive about pursuing individuals who tried to understand their data format. "BG" refers to the PC game Baldur's Gate, the apparent focus of the author's analysis.

Decoding Flow

First, the data is processed in 8x8 pixel blocks. Each block has 4 bits associated with it that select the encoding used for that block, for a total of 16 possible encodings. These 4-bit values come from the decoding map defined by the most recent 0xf opcode in the Interplay MVE data stream. All 16 possible encodings appear to be used (or at least supported by the player). Every 8x8 block in the frame should be represented by a 4-bit nibble in the decoding map, so the decoding map has a predictable size:

 ((width * height)pixels/frame / (8 * 8)pixels/block) * (1/2)bytes/block = bytes/frame

When decoding the block map, the bottom 4 bits (bits 3-0) are used first, then the top 4 bits.
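The size formula and the nibble order can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the width and height are multiples of 8; rounding up for an odd block count is also an assumption, since this page does not say how such a map would be padded.

```python
# Hypothetical helper names; a sketch, not a reference implementation.
def decoding_map_size(width: int, height: int) -> int:
    """Bytes in the decoding map: one 4-bit nibble per 8x8 block."""
    blocks = (width // 8) * (height // 8)
    # Two nibbles per byte; rounding up for an odd count is an assumption.
    return (blocks + 1) // 2


def block_encodings(decoding_map: bytes):
    """Yield the 4-bit encoding of each block in map order."""
    for byte in decoding_map:
        yield byte & 0x0F          # bits 3-0 describe the first block
        yield (byte >> 4) & 0x0F   # bits 7-4 describe the next block


# A 320x200 frame has 40 * 25 = 1000 blocks, so its map is 500 bytes.
print(decoding_map_size(320, 200))                 # 500
print(list(block_encodings(bytes([0x21, 0xFE]))))  # [1, 2, 14, 15]
```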

I'll go over each of the 16 encoding types. The rendering process keeps track of the most recent frame in a separate buffer and uses this double-buffering technique in the usual way for animation: the current frame's data is used in the construction of the next frame. In the following description, "current frame" refers to the most recently displayed frame, and "new frame" refers to the frame currently being constructed for display. "Map stream" refers to the data taken from the 0xf opcode, and "data stream" refers to the data taken from the 0x11 opcode.

Encoding 0x0

The block is copied from the corresponding block in the current frame (i.e. this block is unchanged).

Encoding 0x1

The block is left unmodified. This appears to mean that it keeps the value it had 2 frames ago, but the net effect is that nothing is done to this block of 8x8 pixels.

Encoding 0x2

The block is copied from a nearby position (to the right or below) within the new frame. The offset within the buffer from which to grab the 8x8 patch of pixels is given by a byte B from the data stream, which is broken into an x and a y offset (y is never negative) according to the following mapping:

               if B < 56:
                   x = 8 + (B % 7)
                   y = B / 7
               else
                   x = -14 + ((B - 56) % 29)
                   y =   8 + ((B - 56) / 29)

(where % is the modulo operator and / is integer division)

If you draw the region this represents, you'll see it looks like:

                      oooooooo#######
                      oooooooo#######
                      oooooooo#######
                      oooooooo#######
                      oooooooo#######
                      oooooooo#######
                      oooooooo#######
                      oooooooo#######
        #############################
        #############################
        #############################
        #############################
        #############################
        #############################
        ##########################

Where 'o' are the pixels in the destination frame, and # are the locations where the source frame could start.
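A minimal Python sketch of this mapping (the function name `motion_0x2` is hypothetical) also makes the shape of the diagram easy to check: every byte value maps to a distinct offset, and no offset has a negative y.

```python
# Hypothetical helper name; a sketch of the encoding 0x2 motion mapping.
def motion_0x2(b: int) -> tuple[int, int]:
    """Map the motion byte B (0-255) of encoding 0x2 to an (x, y) offset."""
    if b < 56:
        # 8 rows (y = 0..7) of 7 start positions (x = 8..14), right of the block
        return 8 + b % 7, b // 7
    # 7 rows (y = 8..14) of up to 29 start positions (x = -14..14), below it
    return -14 + (b - 56) % 29, 8 + (b - 56) // 29


print(motion_0x2(0))    # (8, 0)
print(motion_0x2(56))   # (-14, 8)
print(motion_0x2(255))  # (11, 14), the last '#' in the bottom row
```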

Encoding 0x3

This is the same as encoding 0x2, except that the x and y offsets are negated, giving:

               if B < 56:
                   x = -(8 + (B % 7))
                   y = -(B / 7)
               else
                   x = -(-14 + ((B - 56) % 29))
                   y = -(  8 + ((B - 56) / 29))

(where % is the modulo operator and / is integer division)

If you draw the region this represents, you'll see it looks like:

                  ##########################
               #############################
               #############################
               #############################
               #############################
               #############################
               #############################
               #######oooooooo
               #######oooooooo
               #######oooooooo
               #######oooooooo
               #######oooooooo
               #######oooooooo
               #######oooooooo
               #######oooooooo
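Since encoding 0x3 only negates the encoding 0x2 offsets, a sketch (hypothetical name `motion_0x3`) can reuse the same arithmetic. The assertion checks a property of the region: each source block lies either entirely in earlier block rows (y <= -8) or entirely in columns to the left of the destination block (x <= -8). This suggests, as an inference rather than something stated on this page, that the copied pixels have already been decoded when blocks are processed left to right, top to bottom.

```python
# Hypothetical helper name; a sketch of the encoding 0x3 motion mapping.
def motion_0x3(b: int) -> tuple[int, int]:
    """Encoding 0x3: the encoding 0x2 mapping with both offsets negated."""
    if b < 56:
        x, y = 8 + b % 7, b // 7
    else:
        x, y = -14 + (b - 56) % 29, 8 + (b - 56) // 29
    return -x, -y


# Every source block sits in earlier block rows or entirely to the left.
assert all(y <= -8 or x <= -8 for x, y in map(motion_0x3, range(256)))
print(motion_0x3(0))    # (-8, 0)
print(motion_0x3(255))  # (-11, -14)
```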

Encoding 0x4

Similar to 0x2 and 0x3, except that this method copies from the "current" frame rather than the "new" frame, and instead of the lopsided mapping they use, this one uses a mapping that is nearly symmetric (offsets from -8 to +7 in each direction) and centered around the top-left corner of the block. It still uses only 1 byte, so the range is smaller, since all directions must be encoded in a single byte. Call the byte pulled from the data stream B, its highest 4 bits BH, and its lowest 4 bits BL. Then the offset from which to copy the data is:

               x = -8 + BL
               y = -8 + BH
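In code, the nibble split is a mask and a shift. This is a hedged sketch with a hypothetical function name; it returns the (x, y) offset relative to the top-left corner of the block.

```python
# Hypothetical helper name; a sketch of the encoding 0x4 motion mapping.
def motion_0x4(b: int) -> tuple[int, int]:
    """Encoding 0x4: low nibble gives x, high nibble gives y, both biased by -8."""
    bl = b & 0x0F          # BL, the lowest 4 bits of B
    bh = (b >> 4) & 0x0F   # BH, the highest 4 bits of B
    return -8 + bl, -8 + bh


print(motion_0x4(0x88))  # (0, 0): the same position in the current frame
print(motion_0x4(0x3A))  # (2, -5)
```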

Encoding 0x5

Similar to 0x4, but instead of one byte for the offset, this uses two bytes to encode a larger range, the first being the x offset as a signed 8-bit value, and the second being the y offset as a signed 8-bit value.
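The two signed bytes can be read with Python's standard `struct` module (format character `b` is a signed char). The function name is hypothetical, and the input is assumed to start at the first of the two offset bytes.

```python
import struct


# Hypothetical helper name; a sketch of reading the encoding 0x5 offsets.
def motion_0x5(data: bytes) -> tuple[int, int]:
    """Encoding 0x5: two signed 8-bit offsets, x first and then y."""
    x, y = struct.unpack("bb", data[:2])
    return x, y


print(motion_0x5(bytes([0xFF, 0x05])))  # (-1, 5)
print(motion_0x5(bytes([0x80, 0x7F])))  # (-128, 127)
```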

Encoding 0x6

I can't figure out how any file containing a block of this type could still be playable, since it appears that it would leave the internal bookkeeping in an inconsistent state in the BG player code. Ahh, well. Perhaps it was a bug in the BG player code that just didn't happen to be exposed by any of the included movies. Anyway, this skips the next two blocks, doing nothing to them. Note that if you've reached the end of a row, this means going on to the next row.

Encoding 0x7

Ok, here's where it starts to get really...interesting. This is, incidentally, the part where they started using self-modifying code. So, most of the following encodings are "patterned" blocks, where we are given a number of pixel values and then bitmapped values to specify which pixel values belong to which squares. For this encoding, we are given the following in the data stream:

               P0 P1

These are pixel values (i.e. 8-bit indices into the palette). If P0 <= P1, we then get 8 more bytes from the data stream, one for each row in the block:

               B0 B1 B2 B3 B4 B5 B6 B7

For each row, the rightmost pixel is represented by the low-order bit, and the leftmost by the high-order bit. Use your imagination in between. If a bit is set, the pixel value is P1 and if it is unset, the pixel value is P0.

If, on the other hand, P0 > P1, we get two more bytes from the data stream:

               B0 B1

Each of these bytes holds two 4-bit rows of the pattern (high nibble first), forming a 4x4 pattern that works exactly like the 8-byte pattern above, except that each bit represents a 2x2 pixel region.

So, for example, if we had:

               11 22 ff 81 81 81 81 81 81 ff

This would represent the following layout:

               22 22 22 22 22 22 22 22     ; ff == 11111111
               22 11 11 11 11 11 11 22     ; 81 == 10000001
               22 11 11 11 11 11 11 22     ; ..
               22 11 11 11 11 11 11 22
               22 11 11 11 11 11 11 22
               22 11 11 11 11 11 11 22
               22 11 11 11 11 11 11 22     ; 81 == 10000001
               22 22 22 22 22 22 22 22     ; ff == 11111111

If, on the other hand, we had:

               22 11 ff 81

The output would be:

               11 11 11 11 11 11 11 11     ; f == 1 1 1 1
               11 11 11 11 11 11 11 11     ;
               11 11 11 11 11 11 11 11     ; f == 1 1 1 1
               11 11 11 11 11 11 11 11     ;
               11 11 22 22 22 22 22 22     ; 8 == 1 0 0 0
               11 11 22 22 22 22 22 22     ;
               22 22 22 22 22 22 11 11     ; 1 == 0 0 0 1
               22 22 22 22 22 22 11 11     ;
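The following Python sketch decodes an 0x7 block from the bytes that follow the encoding selection, starting at P0, and returns 8 rows of palette indices; the name and return format are illustrative assumptions. It follows the bit order described on this page (high-order bit = leftmost pixel, set bit = P1). That convention is worth verifying against real MVE files before relying on it, since FFmpeg's interplayvideo decoder appears to consume the flag bits lowest bit first.

```python
# Hypothetical helper name; a sketch following this page's bit order.
def decode_0x7(data: bytes) -> list[list[int]]:
    """Decode a two-color 0x7 block, starting at P0, into 8 rows of 8 pixels."""
    p0, p1 = data[0], data[1]
    block = [[0] * 8 for _ in range(8)]
    if p0 <= p1:
        # B0..B7: one byte per row, one bit per pixel, high bit leftmost
        for y in range(8):
            row_bits = data[2 + y]
            for x in range(8):
                block[y][x] = p1 if row_bits & (0x80 >> x) else p0
    else:
        # B0 B1: 16 bits, one per 2x2 region; each nibble is one row of
        # four regions, high nibble first
        pattern = (data[2] << 8) | data[3]
        for i in range(16):
            color = p1 if pattern & (0x8000 >> i) else p0
            top, left = (i // 4) * 2, (i % 4) * 2
            for dy in range(2):
                for dx in range(2):
                    block[top + dy][left + dx] = color
    return block


for row in decode_0x7(bytes.fromhex("22 11 ff 81")):
    print(" ".join(f"{p:02x}" for p in row))
```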

Encoding 0x8

Ok, this one is basically like encoding 0x7, only more complicated. Again, we start out by getting two bytes from the data stream:

               P0 P1

If P0 <= P1, then we get the following from the data stream:

                     B0 B1
               P2 P3 B2 B3
               P4 P5 B4 B5
               P6 P7 B6 B7

P0 P1 and B0 B1 are used for the top-left corner, P2 P3 B2 B3 for the bottom-left corner, P4 P5 B4 B5 for the top-right, P6 P7 B6 B7 for the bottom-right. (So, each codes for a 4x4 pixel array.) Since we have 16 bits in B0 B1, there is one bit for each pixel in the array. The convention for the bit-mapping is, again, left to right and top to bottom.

So, basically, the top-left quarter of the block is an arbitrary pattern with 2 pixels, the bottom-left a different arbitrary pattern with 2 different pixels, and so on. I'll go through a few examples of this after I discuss the other forms for the data in this encoding.

If P0 > P1, then we get 10 more bytes from the data stream:

               B0 B1 B2 B3 P2 P3 B4 B5 B6 B7

Now, if P2 <= P3, then [P0 P1 B0 B1 B2 B3] represent the left half of the block and [P2 P3 B4 B5 B6 B7] represent the right half.

If P2 > P3, [P0 P1 B0 B1 B2 B3] represent the top half of the block and [P2 P3 B4 B5 B6 B7] represent the bottom half.

In these last two cases, each bit represents a 1x1 pixel. Just to work through an example of each case:

               00 22 f9 9f 11 33 cc 33 44 55 aa 55 66 77 01 ef
               22 22 22 22 | 55 44 55 44     ; f = 1111, a = 1010
               22 00 00 22 | 55 44 55 44     ; 9 = 1001, a = 1010
               22 00 00 22 | 44 55 44 55     ; 9 = 1001, 5 = 0101
               22 22 22 22 | 44 55 44 55     ; f = 1111, 5 = 0101
               ------------+------------
               33 33 11 11 | 66 66 66 66     ; c = 1100, 0 = 0000
               33 33 11 11 | 66 66 66 77     ; c = 1100, 1 = 0001
               11 11 33 33 | 77 77 77 66     ; 3 = 0011, e = 1110
               11 11 33 33 | 77 77 77 77     ; 3 = 0011, f = 1111

I've added a dividing line in the above to clearly delineate the quadrants.

Now, for a block split into left and right halves (P2 <= P3):

               22 00 01 37 f7 31 11 66 8c e6 73 31
               22 22 22 22 66 11 11 11
               22 22 22 00 66 66 11 11
               22 22 00 00 66 66 66 11
               22 00 00 00 11 66 66 11
               00 00 00 00 11 66 66 66
               22 00 00 00 11 11 66 66
               22 22 00 00 11 11 66 66
               22 22 22 00 11 11 11 66

Finally, for a block split into top and bottom halves (P2 > P3):

               22 00 cc 66 33 19 66 11 18 24 42 81
               00 00 22 22 00 00 22 22
               22 00 00 22 22 00 00 22
               22 22 00 00 22 22 00 00
               22 22 22 00 00 22 22 00
               66 66 66 11 11 66 66 66
               66 66 11 66 66 11 66 66
               66 11 66 66 66 66 11 66
               11 66 66 66 66 66 66 11
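A sketch of encoding 0x8 under the conventions described on this page (high-order bit first, a set bit selects the second color of the pair) follows. `decode_0x8` and its inner helper are hypothetical names; the input starts at P0 and the result is 8 rows of 8 palette indices. The quadrant order TL, BL, TR, BR comes from the description above.

```python
# Hypothetical helper name; a sketch following this page's bit order.
def decode_0x8(data: bytes) -> list[list[int]]:
    """Decode an 0x8 block, starting at P0, into 8 rows of 8 pixels."""
    block = [[0] * 8 for _ in range(8)]

    def paint(p0, p1, pattern, nbits, left, top, width):
        # Write nbits one-bit pixels (most significant bit first) into a
        # region `width` pixels wide; a set bit selects p1.
        for i in range(nbits):
            bit = (pattern >> (nbits - 1 - i)) & 1
            block[top + i // width][left + i % width] = p1 if bit else p0

    if data[0] <= data[1]:
        # Four quadrants of P P B B, ordered TL, BL, TR, BR.
        for q, (left, top) in enumerate([(0, 0), (0, 4), (4, 0), (4, 4)]):
            p0, p1, b0, b1 = data[4 * q:4 * q + 4]
            paint(p0, p1, (b0 << 8) | b1, 16, left, top, 4)
    else:
        # P0 P1 B0..B3 P2 P3 B4..B7: two halves of 32 pixels each.
        first = int.from_bytes(data[2:6], "big")
        second = int.from_bytes(data[8:12], "big")
        if data[6] <= data[7]:       # P2 <= P3: left and right halves
            paint(data[0], data[1], first, 32, 0, 0, 4)
            paint(data[6], data[7], second, 32, 4, 0, 4)
        else:                        # P2 > P3: top and bottom halves
            paint(data[0], data[1], first, 32, 0, 0, 8)
            paint(data[6], data[7], second, 32, 0, 4, 8)
    return block


block = decode_0x8(bytes.fromhex("22 00 cc 66 33 19 66 11 18 24 42 81"))
for row in block:
    print(" ".join(f"{p:02x}" for p in row))
```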

Encoding 0x9

Similar to the previous 2 encodings, only more complicated. And it will get worse before it gets better. No longer are we dealing with patterns over two pixel values. Now we are dealing with patterns over 4 pixel values with 2 bits assigned to each pixel (or block of pixels).

So, first on the data stream are our 4 pixel values:

               P0 P1 P2 P3

Now, if P0 <= P1 AND P2 <= P3, we get 16 bytes of pattern, each 2 bits representing a 1x1 pixel (00=P0, 01=P1, 10=P2, 11=P3). The ordering is again left to right and top to bottom. The most significant bits represent the left side at the top, and so on.

If P0 <= P1 AND P2 > P3, we get 4 bytes of pattern, each 2 bits representing a 2x2 pixel group. Ordering is left to right and top to bottom.

If P0 > P1 AND P2 <= P3, we get 8 bytes of pattern, each 2 bits representing a 2x1 pixel group (i.e. 2 pixels wide and 1 high).

If P0 > P1 AND P2 > P3, we get 8 bytes of pattern, each 2 bits representing a 1x2 pixel group (i.e. 1 pixel wide and 2 high).
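The four 0x9 variants differ only in the size of the pixel group each 2-bit code covers and in how many pattern bytes follow, so a sketch can pick a group size and share one loop. It assumes, as described on this page, that the most significant bits describe the leftmost pixel group; the function name is hypothetical.

```python
# Hypothetical helper name; a sketch following this page's bit order.
def decode_0x9(data: bytes) -> list[list[int]]:
    """Decode a four-color 0x9 block, starting at P0, into 8 rows of 8 pixels.

    Each 2-bit code selects P0..P3 (00, 01, 10, 11).
    """
    p = list(data[:4])
    if p[0] <= p[1] and p[2] <= p[3]:
        w, h, nbytes = 1, 1, 16   # 64 codes, one per pixel
    elif p[0] <= p[1]:
        w, h, nbytes = 2, 2, 4    # 16 codes, one per 2x2 group
    elif p[2] <= p[3]:
        w, h, nbytes = 2, 1, 8    # 32 codes, one per 2-wide, 1-high group
    else:
        w, h, nbytes = 1, 2, 8    # 32 codes, one per 1-wide, 2-high group
    pattern = int.from_bytes(data[4:4 + nbytes], "big")
    ncodes = nbytes * 4           # four 2-bit codes per byte
    per_row = 8 // w              # groups across one row of groups
    block = [[0] * 8 for _ in range(8)]
    for i in range(ncodes):
        code = (pattern >> (2 * (ncodes - 1 - i))) & 0x3
        left, top = (i % per_row) * w, (i // per_row) * h
        for dy in range(h):
            for dx in range(w):
                block[top + dy][left + dx] = p[code]
    return block


# P0 <= P1 and P2 <= P3: one code per pixel; 0x1b = 00 01 10 11
print(decode_0x9(bytes([10, 20, 30, 40]) + bytes([0x1B]) * 16)[0])
# [10, 20, 30, 40, 10, 20, 30, 40]
```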

Encoding 0xa

Similar to the previous, only a little more complicated. We are still dealing with patterns over 4 pixel values with 2 bits assigned to each pixel (or block of pixels).

So, first on the data stream are our 4 pixel values:

               P0 P1 P2 P3

Now, if P0 <= P1, the block is divided into 4 quadrants, ordered (as with opcode 0x8) TL, BL, TR, BR. In this case the next data in the data stream should be:

                               B0  B1  B2  B3
               P4  P5  P6  P7  B4  B5  B6  B7
               P8  P9  P10 P11 B8  B9  B10 B11
               P12 P13 P14 P15 B12 B13 B14 B15

Each 2 bits represent a 1x1 pixel (00=P0, 01=P1, 10=P2, 11=P3). The ordering is again left to right and top to bottom. The most significant bits represent the left side at the top, and so on.

If P0 > P1 then the next data on the data stream is:

                           B0 B1 B2  B3  B4  B5  B6  B7
               P4 P5 P6 P7 B8 B9 B10 B11 B12 B13 B14 B15

Now, in this case, if P4 <= P5, [P0 P1 P2 P3 B0 B1 B2 B3 B4 B5 B6 B7] represent the left half of the block and the other bytes represent the right half. If P4 > P5, then [P0 P1 P2 P3 B0 B1 B2 B3 B4 B5 B6 B7] represent the top half of the block and the other bytes represent the bottom half.
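Encoding 0xa combines the four-color codes of 0x9 with the layouts of 0x8. The sketch below (hypothetical name `decode_0xa`, input starting at P0, 2-bit codes read most significant first as described on this page) shows how the bytes are consumed in each case.

```python
# Hypothetical helper name; a sketch following this page's bit order.
def decode_0xa(data: bytes) -> list[list[int]]:
    """Decode an 0xa block, starting at P0, into 8 rows of 8 pixels."""
    block = [[0] * 8 for _ in range(8)]

    def paint(colors, pattern, ncodes, left, top, width):
        # Write ncodes 2-bit codes (most significant first) into a region
        # `width` pixels wide whose top-left corner is (left, top).
        for i in range(ncodes):
            code = (pattern >> (2 * (ncodes - 1 - i))) & 0x3
            block[top + i // width][left + i % width] = colors[code]

    if data[0] <= data[1]:
        # Four quadrants, each 4 colors + 4 pattern bytes, ordered TL, BL, TR, BR.
        for q, (left, top) in enumerate([(0, 0), (0, 4), (4, 0), (4, 4)]):
            chunk = data[8 * q:8 * q + 8]
            paint(chunk[:4], int.from_bytes(chunk[4:], "big"), 16, left, top, 4)
    else:
        # Two halves, each 4 colors + 8 pattern bytes.
        first, second = data[0:12], data[12:24]
        if second[0] <= second[1]:   # P4 <= P5: left and right halves
            halves = [(0, 0, 4), (4, 0, 4)]
        else:                        # P4 > P5: top and bottom halves
            halves = [(0, 0, 8), (0, 4, 8)]
        for half, (left, top, width) in zip((first, second), halves):
            paint(half[:4], int.from_bytes(half[4:], "big"), 32, left, top, width)
    return block


quad = bytes([0, 1, 2, 3, 0x1B, 0x1B, 0x1B, 0x1B,
              10, 11, 12, 13, 0, 0, 0, 0,
              20, 21, 22, 23, 0xFF, 0xFF, 0xFF, 0xFF,
              30, 31, 32, 33, 0, 0, 0, 0])
print(decode_0xa(quad)[0])  # [0, 1, 2, 3, 23, 23, 23, 23]
```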

Encoding 0xb

In this encoding we get raw pixel data in the data stream -- 64 bytes of pixel data. 1 byte for each pixel, and in the standard order (l->r, t->b).

Encoding 0xc

In this encoding we get raw pixel data in the data stream -- 16 bytes of pixel data. 1 byte for each block of 2x2 pixels, and in the standard order (l->r, t->b).

Encoding 0xd

In this encoding we get raw pixel data in the data stream -- 4 bytes of pixel data. 1 byte for each block of 4x4 pixels, and in the standard order (l->r, t->b).
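Encodings 0xb, 0xc and 0xd differ only in how many pixels each byte covers, so one hypothetical helper with a `cell` parameter (1, 2 or 4) can sketch all three.

```python
# Hypothetical helper name; a sketch covering encodings 0xb, 0xc and 0xd.
def decode_raw(data: bytes, cell: int) -> list[list[int]]:
    """One palette index per cell x cell region, left to right, top to bottom.

    cell=1 for 0xb (64 bytes), 2 for 0xc (16 bytes), 4 for 0xd (4 bytes).
    """
    per_row = 8 // cell  # regions across one row of the block
    return [[data[(y // cell) * per_row + x // cell] for x in range(8)]
            for y in range(8)]


for row in decode_raw(bytes([1, 2, 3, 4]), 4):  # an 0xd block
    print(row)
```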

Encoding 0xe

This encoding represents a solid block. We get 1 byte of pixel data from the data stream, and every pixel in the block is set to it.

Encoding 0xf

This encoding represents a "dithered" block, which is checkerboarded with alternating pixels of two colors. We get 2 bytes of pixel data from the data stream, and these bytes are alternated:

               P0 P1 P0 P1 P0 P1 P0 P1
               P1 P0 P1 P0 P1 P0 P1 P0
               ...
               P0 P1 P0 P1 P0 P1 P0 P1
               P1 P0 P1 P0 P1 P0 P1 P0
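Both of these encodings are simple fills. In the sketch below (hypothetical names), the dither places P0 on pixels where the row and column indices have the same parity, which reproduces the pattern above.

```python
# Hypothetical helper names; sketches of encodings 0xe and 0xf.
def decode_solid(p: int) -> list[list[int]]:
    """Encoding 0xe: every pixel of the block is P."""
    return [[p] * 8 for _ in range(8)]


def decode_dither(p0: int, p1: int) -> list[list[int]]:
    """Encoding 0xf: P0 where row + column is even, P1 where it is odd."""
    return [[p0 if (x + y) % 2 == 0 else p1 for x in range(8)]
            for y in range(8)]


print(decode_dither(0x10, 0x20)[:2])
```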

Frame Accounting

The video coding method technically keeps track of 3 frames: the current frame, the last frame, and the second-last frame (the frame from 2 frames ago). A decoder will typically be implemented with 2 buffers, one for the current frame and one for the last frame. When decoding of a new frame begins, the decoder actually draws on the buffer holding the frame that was current 2 frames ago.

Encoding modes 0x1 and 0x2 can be a little confusing without this knowledge. Mode 0x1 uses the block from the same position in the frame from 2 frames ago. Since the decoder is drawing on the buffer that holds that frame, the block can simply be skipped to leave it unchanged. Similarly, mode 0x2 uses a motion vector that points to the right of or below the current block and copies "from the current frame", i.e. the frame being drawn. Those positions have not been drawn yet in the new frame, so, due to the assumed double-buffering scheme, the data actually comes from the second-last frame.
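The two-buffer scheme can be sketched with two `bytearray`s of palette indices that swap roles after every frame. The class and method names are hypothetical and the sketch does no bounds checking; it only illustrates why skipping a block (0x1) leaves the second-last frame's pixels in place while 0x0 copies from the last frame.

```python
# Hypothetical class; a sketch of the double-buffering scheme, not a full decoder.
class FrameBuffers:
    """Two buffers of palette indices; each new frame is drawn over the
    buffer that holds the frame from two frames ago."""

    def __init__(self, width: int, height: int):
        self.width = width
        self.last = bytearray(width * height)  # most recently finished frame
        self.back = bytearray(width * height)  # second-last frame, drawn over next

    def copy_block(self, src: bytearray, bx: int, by: int, dx: int = 0, dy: int = 0) -> None:
        """Copy the 8x8 block at (bx + dx, by + dy) in src to (bx, by) in the new frame."""
        w = self.width
        for row in range(8):
            s = (by + dy + row) * w + bx + dx
            d = (by + row) * w + bx
            self.back[d:d + 8] = src[s:s + 8]

    def finish_frame(self) -> bytearray:
        """Swap roles: the new frame becomes 'last', the old 'last' the next target."""
        self.last, self.back = self.back, self.last
        return self.last


fb = FrameBuffers(16, 8)
fb.back[:] = bytes([1]) * 128   # frame 1: all pixels 1
fb.finish_frame()
fb.back[:] = bytes([2]) * 128   # frame 2: all pixels 2
fb.finish_frame()
# Frame 3: block (0, 0) uses encoding 0x1 (skip), block (8, 0) uses 0x0.
fb.copy_block(fb.last, 8, 0)
out = fb.finish_frame()
print(out[0], out[8])  # 1 2
```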

16-Bit Data

Later versions of the format support 16-bit RGB data. There are some differences between the 8- and 16-bit modes that are not yet documented (and are unlikely to be in the near future).

The format is mostly the same, with these major changes:

  • pixels are 15-bit values stored as 16-bit little-endian words
  • the data offset is 16, not 14; the two additional bytes give the offset within the data at which the motion values for opcodes 0x2-0x4 are stored
  • instead of comparing a pair of pixel values, the high bit of the first pixel is used (i.e. P0 > P1 becomes P0 & 0x8000)
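For the 16-bit mode, a sketch (hypothetical names) of reading the pixel words and of the high-bit test that replaces the 8-bit P0 > P1 comparison might look like this. It does not unpack the 15-bit color into channels, because the bit layout is not described on this page.

```python
import struct


# Hypothetical helper names; a sketch of the 16-bit pixel handling.
def read_pixels_16(data: bytes, count: int) -> list[int]:
    """Read `count` pixels stored as 16-bit little-endian words."""
    return list(struct.unpack(f"<{count}H", data[:2 * count]))


def p0_le_p1_branch(p0: int) -> bool:
    """True where 8-bit mode would take its P0 <= P1 branch.

    In 16-bit mode a set high bit (P0 & 0x8000) plays the role of P0 > P1.
    """
    return not (p0 & 0x8000)


p0, p1 = read_pixels_16(b"\x34\x12\x00\x80", 2)
print(hex(p0), hex(p1))      # 0x1234 0x8000
print(p0_le_p1_branch(p0))   # True
print(p0_le_p1_branch(p1))   # False
```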

Here are the codes whose meaning differs from the 8-bit version:

code 0x6

The same block copy as for code 0x5, but from a different reference frame.

code 0xF

The same as code 0x1 (do nothing on block).