VMD

From MultimediaWiki
This page is about the VMD format used in Sierra computer games. See the Internet Wave page for information about a web reference format with the extension VMD.

This page is based on the document 'Description of the Sierra Video and Music Data (VMD) Format' by Mike Melanson and Vladimir "VAG" Gneushev at http://multimedia.cx/vmd-format.txt.

VMD is the file extension of a multimedia file format used in a number of Sierra CD-ROM computer games. The extension stands for Video and Music Data. The format is most notable for its use in Sierra's beloved 7-CD classic, Phantasmagoria, and is also used in other multimedia-heavy Sierra titles.

File Format

All multi-byte numbers are stored in little-endian format.

A VMD file starts with the following 816- (0x330-)byte header:

 bytes 0-1      length of header, not including this length field; this
                length should be 0x32E (814)
 bytes 2-3      placeholder for VMD handle
 bytes 4-5      unknown
 bytes 6-7      number of blocks in table of contents
 bytes 8-9      top corner coordinate of video frame
 bytes 10-11    left corner coordinate of video frame
 bytes 12-13    width of video frame
 bytes 14-15    height of video frame
 bytes 16-17    flags
 bytes 18-19    frames per block
 bytes 20-23    absolute file offset of multimedia data
 bytes 24-27    unknown (Urban Runner samples contain "iv32" there)
 bytes 28-795   256 RGB palette entries, 3 bytes/entry in R-G-B order
 bytes 796-799  recommended size (bytes) of data frame load buffer
 bytes 800-803  recommended size (bytes) of unpack buffer for video decoding
 bytes 804-805  audio sample rate
 bytes 806-807  audio frame length/sample resolution
 bytes 808-809  number of sound buffers
 bytes 810-811  audio flags
 bytes 812-815  absolute file offset of table of contents
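The header layout above can be read with Python's struct module. The following is a minimal sketch rather than a reference implementation: the function name and dictionary keys are hypothetical, and the audio frame length is read as a signed value only because the audio section says it may be negative.

```python
import struct

HEADER_SIZE = 0x330  # 816 bytes, including the 2-byte length field


def parse_vmd_header(data):
    """Unpack the documented fields of a VMD header into a dict."""
    if len(data) < HEADER_SIZE:
        raise ValueError("a VMD header needs at least 816 bytes")
    # Bytes 0-19 are ten 16-bit fields; bytes 20-23 are a 32-bit offset.
    (header_length, _handle, _unknown, block_count, top, left, width,
     height, flags, frames_per_block, data_offset) = struct.unpack_from(
        "<10HI", data, 0)
    # Bytes 796-815: two buffer sizes, four 16-bit audio fields, TOC offset.
    # The audio frame length is read as signed (assumption, see lead-in).
    (load_buffer_size, unpack_buffer_size, sample_rate, audio_frame_length,
     sound_buffers, audio_flags, toc_offset) = struct.unpack_from(
        "<IIHhHHI", data, 796)
    return {
        "header_length": header_length,
        "block_count": block_count,
        "top": top,
        "left": left,
        "width": width,
        "height": height,
        "flags": flags,
        "frames_per_block": frames_per_block,
        "data_offset": data_offset,
        "palette": bytes(data[28:796]),  # 256 raw 6-bit R-G-B triplets
        "load_buffer_size": load_buffer_size,
        "unpack_buffer_size": unpack_buffer_size,
        "sample_rate": sample_rate,
        "audio_frame_length": audio_frame_length,
        "sound_buffers": sound_buffers,
        "audio_flags": audio_flags,
        "toc_offset": toc_offset,
    }
```

A caller would typically check that header_length equals 0x32E before trusting the other fields.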

Note that the RGB color components are 6-bit VGA palette components which means that they range from 0..63. The components need to be scaled if they are to be used in rendering typical RGB images where the components are 8 bits.
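As an illustration of the scaling step, the sketch below expands each 6-bit component to 8 bits by replicating its top bits, so that 63 maps to 255. This particular formula is one common choice and an assumption here; multiplying by 4 is another. The function name is hypothetical.

```python
def scale_vga_palette(raw):
    """Convert raw 6-bit R-G-B triplets into a list of 8-bit (r, g, b) tuples."""
    def expand(component):
        component &= 0x3F  # keep only the 6 significant bits
        return (component << 2) | (component >> 4)

    return [
        (expand(raw[i]), expand(raw[i + 1]), expand(raw[i + 2]))
        for i in range(0, len(raw) - 2, 3)
    ]
```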

A VMD file has a table of contents describing all of the file's block and frame information. The absolute file offset of the table of contents is given in the file header and usually points to the end of the file. The table of contents contains 2 parts: the block offset table and the frame information table.

Blocks and frames in VMD are different concepts. A frame contains audio or video. A block contains both a video frame and an audio frame.

The block offset table consists of a series of 6-byte records. Each record has the following format:

 bytes 0-1      unknown
 bytes 2-5      absolute file offset of block

The number of entries in this table is specified by bytes 6-7 in the file header. After the block offset table is the frame information table. The frame information table consists of a series of 16-byte records with the following format:

 byte 0         frame data type
   1 = audio frame
   2 = video frame
 byte 1         unknown
 bytes 2-5      frame data length
 The meaning of the frame's remaining data depends on the frame type.
 if this is an audio frame:
   byte 6       audio flags
   bytes 7-15   unknown
 if this is a video frame:
   bytes 6-7    left coordinate of video frame
   bytes 8-9    top coordinate of video frame
   bytes 10-11  right coordinate of video frame
   bytes 12-13  bottom coordinate of video frame
   byte 14      unknown
   byte 15      bit 1 (byte[15] & 0x02) indicates a new palette

Generally, a frame information record needs to be made available to the audio or video decoder units of a VMD decoding application as the information is relevant to the decoding process.
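The two record layouts can be unpacked as sketched below. The function names and dictionary keys are hypothetical. The sketch decodes one frame record at a time and leaves it to the caller to decide how many records to read, since the text above does not state the count explicitly.

```python
import struct


def parse_block_offsets(data, toc_offset, block_count):
    """Return the absolute file offset stored in each 6-byte block record."""
    return [
        struct.unpack_from("<HI", data, toc_offset + 6 * i)[1]
        for i in range(block_count)
    ]


def parse_frame_record(rec):
    """Decode one 16-byte frame information record."""
    if len(rec) != 16:
        raise ValueError("a frame record is exactly 16 bytes")
    info = {"type": rec[0], "length": struct.unpack_from("<I", rec, 2)[0]}
    if info["type"] == 1:    # audio frame
        info["audio_flags"] = rec[6]
    elif info["type"] == 2:  # video frame
        left, top, right, bottom = struct.unpack_from("<4H", rec, 6)
        info.update(left=left, top=top, right=right, bottom=bottom,
                    new_palette=bool(rec[15] & 0x02))
    return info
```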

Video Format

The VMD video coding method uses the Lempel-Ziv (LZ77) algorithm, run length encoding (RLE), and interframe differencing to compress 8-bit palettized video data.

VMD video embodies both intraframes (a.k.a. keyframes) and interframes. Intraframes update the entire frame. Interframes only update portions of the frame that have changed from the previous frame. The first video frame of a VMD file is implicitly intracoded (the first frame has to paint the entire viewing area). The successive frames are all intercoded.

The frame record for a video frame specifies the frame coordinates of the rectangular region that will be updated. For example, if the file header specifies that the video is 200 pixels wide and 100 pixels high, the left, top, right, and bottom coordinates of the rectangular update region will be 0, 0, 199, and 99, respectively, if the entire frame is to be updated. A subsequent interframe may choose to leave much of the previous frame unchanged and only update the block from (100, 10) -> (150, 40). In this case, the coordinates 100, 10, 150, and 40 would be encoded in the frame record.
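The coordinates appear to be inclusive, since a 200x100 frame spans 0..199 and 0..99 in the example above. Under that reading, the size of an update region can be computed as in this small sketch (the function name is hypothetical):

```python
def update_region_size(left, top, right, bottom):
    """Return (width, height) of an update region with inclusive corners."""
    return right - left + 1, bottom - top + 1
```

For the two regions in the example, this gives 200x100 for the full frame and 51x31 for the partial update.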

The initial palette for decoding VMD video is transported in the main VMD file header. If bit 1 of frame record byte 15 (byte[15] & 0x02) is set to 1, the compressed video data chunk begins with a new palette. The palette data is transported as 770 bytes. The first byte contains the first palette index to modify. The second byte contains the number of palette entries to change. The remaining 768 bytes are 256 R-G-B palette triplets. Again, these are stored as 6-bit VGA DAC values and should be scaled accordingly.
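A sketch of separating the palette data from the rest of a video chunk follows. It only splits the 770 bytes into their documented parts and does not decide how the start index and count should be applied, since the description above leaves that open. The function name is hypothetical.

```python
PALETTE_CHUNK_SIZE = 770  # 2 control bytes + 256 R-G-B triplets


def split_palette_chunk(chunk):
    """Return (first_index, count, raw_palette, remaining_video_data)."""
    if len(chunk) < PALETTE_CHUNK_SIZE:
        raise ValueError("palette data needs 770 bytes")
    first_index = chunk[0]     # first palette index to modify
    count = chunk[1]           # number of palette entries to change
    raw_palette = chunk[2:PALETTE_CHUNK_SIZE]  # 6-bit VGA DAC values
    return first_index, count, raw_palette, chunk[PALETTE_CHUNK_SIZE:]
```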

The compressed video data begins with a byte describing how the data is encoded. The byte specifies the following information:

 bit 7      specifies that data chunk is LZ compressed
 bits 6-0   specifies 1 of 3 rendering methods

If bit 7 is 1, the data chunk must be passed through the LZ decoder before progressing to the rendering phase. If bit 7 is 0, the data chunk is passed directly to the rendering phase.
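Splitting the coding method byte is a simple mask operation, sketched here with a hypothetical function name:

```python
def split_coding_byte(value):
    """Return (is_lz_compressed, rendering_method) for the coding byte."""
    return bool(value & 0x80), value & 0x7F
```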

The VMD LZ decoding algorithm takes the compressed video buffer (after the coding method byte described above) as input and outputs a buffer of decoded bytes. The output buffer must be as large as indicated in bytes 800-803 of the main VMD header. The VMD LZ decoding algorithm operates as follows:

 allocate a circular queue of 4096 (0x1000) bytes and initialize all
   elements to 0x20; note that the queue is addressed with a 12-bit
   number
 initialize variable dataleft as the first 4 bytes in the block, read as a
   32-bit little-endian number
 if the next 4 bytes are (0x34 0x12 0x78 0x56)
   advance stream over the 4 marker bytes
   initialize queue position (qpos) to 0x111
   initialize special chain length (speclen) to 18
 else
   initialize qpos to 0xFEE
   initialize speclen to nothing (any value above 18 will suffice in
     this example)
 proceed to main decode loop...
 while there is more data left (dataleft > 0)
   tag = the next byte in the stream
   if (tag is 0xFF) and (dataleft > 8)
     take the next 8 bytes from the stream and place them in both the
       output buffer and the circular queue
     subtract 8 from dataleft
   else
     foreach bit in tag byte, reading from right -> left
       if (bit is 1)
         take the next byte from the stream and place it in both the
           output buffer and the circular queue
         decrement dataleft
       else
         move a chain of bytes from the circular queue to the output
         get the length and beginning offset of the chain from the next
           2 bytes in the stream:
           byte 0: bits 7-0: lower 8 bits of beginning offset
           byte 1: bits 7-4: upper 4 bits of beginning offset
                   bits 3-0: length of chain, minus 3
         thus, add 3 to the length to obtain the actual length
         if (length is equal to speclen)
           length is 18 (max ordinary speclen value) + next byte in
             stream
         copy the byte chain from the circular queue to the output;
           in the process, add the chain back into the queue
         subtract length from dataleft
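The LZ procedure can be sketched in Python as follows. This is an illustrative reading of the steps above, not a verified decoder: the function name is hypothetical, dataleft is read as a 32-bit little-endian value, the chain copy runs for every chain (not only extended ones), and the sketch stops processing a tag byte as soon as dataleft reaches zero, which is an assumption needed for streams whose output length is not a multiple of 8.

```python
import struct

QUEUE_SIZE = 0x1000
QUEUE_MASK = QUEUE_SIZE - 1


def lz_unpack(src):
    """Decode a VMD LZ chunk (the bytes after the coding method byte)."""
    queue = bytearray([0x20]) * QUEUE_SIZE
    out = bytearray()
    dataleft = struct.unpack_from("<I", src, 0)[0]
    pos = 4
    if src[pos:pos + 4] == b"\x34\x12\x78\x56":
        pos += 4
        qpos = 0x111
        speclen = 18
    else:
        qpos = 0xFEE
        speclen = None  # never equals a real chain length

    def emit(byte):
        # Write one byte to both the output and the circular queue.
        nonlocal qpos
        out.append(byte)
        queue[qpos] = byte
        qpos = (qpos + 1) & QUEUE_MASK

    while dataleft > 0:
        tag = src[pos]
        pos += 1
        if tag == 0xFF and dataleft > 8:
            for byte in src[pos:pos + 8]:
                emit(byte)
            pos += 8
            dataleft -= 8
            continue
        for _bit in range(8):  # least significant bit first
            if dataleft <= 0:  # assumption: stop mid-tag when done
                break
            if tag & 1:
                emit(src[pos])
                pos += 1
                dataleft -= 1
            else:
                offset = src[pos] | ((src[pos + 1] & 0xF0) << 4)
                length = (src[pos + 1] & 0x0F) + 3
                pos += 2
                if length == speclen:
                    length = 18 + src[pos]
                    pos += 1
                for _ in range(length):
                    emit(queue[offset & QUEUE_MASK])
                    offset += 1
                dataleft -= length
            tag >>= 1
    return bytes(out)
```

Truncated input raises IndexError in this sketch; a production decoder would also bound the output by the unpack buffer size from the header.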

There are 3 rendering methods that a frame can use to paint the raw or LZ-decoded data (referred to as the video data buffer) onto the final output frame.

Method 1 iterates through each line in the output frame, as indicated by the dimensions specified in the frame information record. For each line:

 offset = 0
 repeat
   length = next byte in video data buffer
   if (bit 7 of length byte is 1)
     mask off bit 7 of length and add 1 (length = (length & 0x7F) + 1)
     copy length bytes from the video data buffer to the output frame
     advance offset by length
   else
     increment length
     copy length bytes from the same position in the previous frame to
       the current frame
     advance offset by length
 while (offset < frame width)
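A sketch of method 1 follows. It assumes the current and previous frames are flat bytearrays with stride bytes per row and that the region corners are inclusive; the function and parameter names are hypothetical. The offset is advanced after both kinds of run, since the loop could not otherwise terminate on runs copied from the previous frame.

```python
def render_method1(data, cur, prev, stride, left, top, right, bottom):
    """Paint the update region of cur from data; return bytes consumed."""
    width = right - left + 1
    pos = 0
    for y in range(top, bottom + 1):
        row = y * stride + left
        offset = 0
        while True:
            length = data[pos]
            pos += 1
            start = row + offset
            if length & 0x80:
                # Run of new pixels taken from the video data buffer.
                length = (length & 0x7F) + 1
                cur[start:start + length] = data[pos:pos + length]
                pos += length
            else:
                # Run of unchanged pixels taken from the previous frame.
                length += 1
                cur[start:start + length] = prev[start:start + length]
            offset += length
            if offset >= width:
                break
    return pos
```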

Method 2 simply copies the entire video data buffer onto the output frame. This is the simplest rendering method, but be sure to take into account the frame's specific decoding dimensions as specified in the frame record.

Method 3 operates just like method 1 except for one small change. When bit 7 of the length byte is 1 and the length byte has been masked and incremented, the next byte in the video data buffer is examined. If the byte is not 0xFF, perform the same copy as in method 1. If the byte is 0xFF, apply an RLE decoding algorithm to unpack the data from the video data buffer into the output frame. The RLE unpacking algorithm operates as follows:

 if the length is odd, copy the next byte from the video data buffer to
   the output frame and decrement length
 divide length by 2
 while (length > 0)
   fetch the next byte from the video data buffer
   if the top bit of the byte is 1 (byte & 0x80)
     drop the top bit of the byte and shift left by 1:
       byte = (byte & 0x7F) << 1
     copy (byte) bytes from video data buffer to output frame
     subtract (byte / 2) from length
   else
     read the next 2 bytes (A and B) from the video data buffer and
       write them to the output frame (byte) times: ABABABAB...
     subtract byte from length
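The RLE unpacking step can be sketched as below. The function name is hypothetical, and the loop bookkeeping is an assumption: the sketch treats the halved length as a count of 2-byte pairs and lets each code consume as many pairs as it writes.

```python
def rle_unpack(data, pos, length):
    """Unpack `length` output bytes starting at data[pos].

    Returns (unpacked_bytes, new_pos).
    """
    out = bytearray()
    if length & 1:
        # Odd length: the first byte is copied through unchanged.
        out.append(data[pos])
        pos += 1
        length -= 1
    pairs = length // 2
    while pairs > 0:
        code = data[pos]
        pos += 1
        if code & 0x80:
            count = code & 0x7F  # number of literal 2-byte pairs
            out += data[pos:pos + count * 2]
            pos += count * 2
        else:
            count = code         # repeat the next pair this many times
            out += data[pos:pos + 2] * count
            pos += 2
        pairs -= count           # assumption, see lead-in
    return bytes(out), pos
```

In method 3, the caller would invoke this after seeing the 0xFF byte and then advance the row offset by the run length.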

Audio Format

8-bit audio is stored as raw PCM samples, while all 16-bit audio is 2:1 DPCM-encoded. First, check whether the VMD file contains any sound by testing bit 12 (& 0x1000) of the file header's flags field; a non-zero bit indicates that the file has sound. The audio_sample_rate and audio_frame_length fields contain the playback rate and the size of a single (compressed) sound block, respectively. A negative audio frame length indicates 16-bit sound data; in this case, negate the field to get the actual block length. The audio flags field holds other important flags: bit 15 (& 0x8000) indicates old-style stereo sound, while bit 9 (& 0x200) indicates the new stereo sound format (introduced in the game Shivers 2). The two formats differ slightly in the meaning of several fields, so the original playback core is not backward compatible: Shivers 2 cannot play old videos properly. The best approach is to check bit 15 first and, if it is zero, additionally check bit 9 to determine the number of channels. The main difference between the old and new formats is that old VMD files treat the audio frame length field as the number of samples for both channels, whereas the new version treats it as the number of samples for a single channel (i.e., you need to multiply it by 2 for stereo sound).
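The header checks described above might be combined as in the following sketch. The function name and dictionary keys are hypothetical, and audio_frame_length is assumed to have been read as a signed 16-bit value.

```python
def audio_params(flags, audio_frame_length, audio_flags):
    """Derive basic audio parameters from the three header fields."""
    params = {"has_sound": bool(flags & 0x1000), "bits": 8,
              "channels": 1, "new_stereo": False}
    if audio_frame_length < 0:
        # A negative length marks 16-bit (DPCM-coded) sound.
        params["bits"] = 16
        audio_frame_length = -audio_frame_length
    if audio_flags & 0x8000:
        # Old-style stereo: the length already covers both channels.
        params["channels"] = 2
    elif audio_flags & 0x0200:
        # New-style stereo: the length covers a single channel.
        params["channels"] = 2
        params["new_stereo"] = True
        audio_frame_length *= 2
    params["frame_length"] = audio_frame_length
    return params
```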

When you encounter a frame information record of type 1, proceed to audio decoding. First, examine the frame's audio_flags byte. It may be one of the following:

  • 1 - normal sound block

Decompress a single audio block and continue to the next frame.

  • 2 - multiple sound and silence blocks

Read the next 4 bytes of the frame's data. These are the sound mask bits. Starting from bit 0, each non-zero bit indicates a silence block, and each zero bit indicates a normal audio block. Iterate number_of_sound_buffer times to fill the chain of sound and/or silence blocks.

  • 3 - single silence block

Fill the whole block with silence.
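The three cases can be sketched as a function that lists, in order, which blocks of an audio frame are sound and which are silence. The function name and the string labels are hypothetical, and sound_buffers is assumed to be the number of sound buffers from the file header.

```python
def audio_block_plan(audio_flags, frame_data, sound_buffers):
    """Return a list of "sound"/"silence" labels for one audio frame."""
    if audio_flags == 1:
        return ["sound"]
    if audio_flags == 3:
        return ["silence"]
    if audio_flags == 2:
        # The mask is a 32-bit little-endian value; a set bit means silence.
        mask = int.from_bytes(frame_data[:4], "little")
        return ["silence" if (mask >> i) & 1 else "sound"
                for i in range(sound_buffers)]
    raise ValueError("unexpected audio_flags value: %d" % audio_flags)
```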

The decompression scheme is quite simple. As stated above, it is used only if the file contains 16-bit audio. First, obtain one or two initial samples, depending on the number of channels. For mono and new-style stereo sound, the initial samples are the first word(s) of the audio stream data. Old-style stereo sound has no stored initial samples; instead, they are initialized to zero at the beginning of playback and carried over between successive frames. This may cause difficulties with random seeking in such files. Using these samples, decode the rest of the chunk's bytes with this formula:

if code & 0x80  sample = sample - Table[code & 0x7F]
else            sample = sample + Table[code & 0x7F]

Here, code is each successive byte of the packed data. For stereo sound, decoding alternates between the left and right channels.

The delta table is:

   0,    8,   16,   32,   48,   64,   80,   96,  112,  128,  144,  160,  176,  192,  208,  224
 240,  256,  272,  288,  304,  320,  336,  352,  368,  384,  400,  416,  432,  448,  464,  480
 496,  512,  520,  528,  536,  544,  552,  560,  568,  576,  584,  592,  600,  608,  616,  624
 632,  640,  648,  656,  664,  672,  680,  688,  696,  704,  712,  720,  728,  736,  744,  752
 760,  768,  776,  784,  792,  800,  808,  816,  824,  832,  840,  848,  856,  864,  872,  880
 888,  896,  904,  912,  920,  928,  936,  944,  952,  960,  968,  976,  984,  992, 1000, 1008
1016, 1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920
1984, 2048, 2304, 2560, 2816, 3072, 3328, 3584, 3840, 4096, 5120, 6144, 7168, 8192,12288,16384
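Putting the formula and the table together, a decoder for mono and new-style stereo chunks might look like the sketch below. Several details are assumptions not stated above: the initial samples are read as signed 16-bit little-endian words and emitted as the first output samples, and results are clamped to the 16-bit range. The function name is hypothetical. Old-style stereo would instead carry its running samples across frames.

```python
DELTA_TABLE = [
       0,    8,   16,   32,   48,   64,   80,   96,  112,  128,  144,  160,  176,  192,  208,  224,
     240,  256,  272,  288,  304,  320,  336,  352,  368,  384,  400,  416,  432,  448,  464,  480,
     496,  512,  520,  528,  536,  544,  552,  560,  568,  576,  584,  592,  600,  608,  616,  624,
     632,  640,  648,  656,  664,  672,  680,  688,  696,  704,  712,  720,  728,  736,  744,  752,
     760,  768,  776,  784,  792,  800,  808,  816,  824,  832,  840,  848,  856,  864,  872,  880,
     888,  896,  904,  912,  920,  928,  936,  944,  952,  960,  968,  976,  984,  992, 1000, 1008,
    1016, 1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920,
    1984, 2048, 2304, 2560, 2816, 3072, 3328, 3584, 3840, 4096, 5120, 6144, 7168, 8192, 12288, 16384,
]


def decode_dpcm(data, channels=1):
    """Decode one DPCM chunk into a list of interleaved 16-bit samples."""
    # One initial sample per channel, stored as the first word(s).
    samples = [
        int.from_bytes(data[2 * ch:2 * ch + 2], "little", signed=True)
        for ch in range(channels)
    ]
    out = list(samples)  # assumption: initial samples are also output
    ch = 0
    for code in data[2 * channels:]:
        delta = DELTA_TABLE[code & 0x7F]
        samples[ch] += -delta if code & 0x80 else delta
        # Assumption: clamp to the signed 16-bit range.
        samples[ch] = max(-32768, min(32767, samples[ch]))
        out.append(samples[ch])
        ch = (ch + 1) % channels  # alternate channels for stereo
    return out
```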

Games Using VMD

These are some of the Sierra computer games that are known to use the VMD file format: