This page is based on the document 'Description of the Sierra Video and Music Data (VMD) Format' by Mike Melanson and Vladimir "VAG" Gneushev at http://multimedia.cx/vmd-format.txt.
VMD is the file extension of a multimedia file format used in a number of Sierra CD-ROM computer games. The extension stands for Video and Music Data. The format is best known for its use in Phantasmagoria, Sierra's 7-CD classic, and is also used in other multimedia-heavy Sierra titles.
All multi-byte numbers are stored in little-endian format.
A VMD file starts with the following 816-byte (0x330-byte) header:
bytes 0-1       length of header, not including this length field; this length should be 0x32E (814)
bytes 2-3       placeholder for VMD handle
bytes 4-5       unknown
bytes 6-7       number of blocks in table of contents
bytes 8-9       top corner coordinate of video frame
bytes 10-11     left corner coordinate of video frame
bytes 12-13     width of video frame
bytes 14-15     height of video frame
bytes 16-17     flags
bytes 18-19     frames per block
bytes 20-23     absolute file offset of multimedia data
bytes 24-27     unknown
bytes 28-795    256 RGB palette entries, 3 bytes/entry in R-G-B order
bytes 796-799   recommended size (bytes) of data frame load buffer
bytes 800-803   recommended size (bytes) of unpack buffer for video decoding
bytes 804-805   audio sample rate
bytes 806-807   audio frame length/sample resolution
bytes 808-809   number of sound buffers
bytes 810-811   audio flags
bytes 812-815   absolute file offset of table of contents
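As an illustration, the header layout above can be read with Python's `struct` module. This is a minimal sketch: the names `VmdHeader` and `parse_vmd_header` are hypothetical, the unknown fields and the handle placeholder are simply discarded, and the audio frame length is read as a signed 16-bit value on the assumption (based on the audio section of this document) that negative values flag 16-bit audio.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 0x330  # 816 bytes, including the 2-byte length field


class VmdHeader(NamedTuple):
    header_length: int
    num_blocks: int
    top: int
    left: int
    width: int
    height: int
    flags: int
    frames_per_block: int
    data_offset: int
    palette: bytes            # 768 bytes of 6-bit R-G-B triplets
    load_buffer_size: int
    unpack_buffer_size: int
    sample_rate: int
    audio_frame_length: int   # signed: negative means 16-bit audio
    num_sound_buffers: int
    audio_flags: int
    toc_offset: int


def parse_vmd_header(buf: bytes) -> VmdHeader:
    """Decode the fixed 816-byte VMD header (all fields little-endian)."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("VMD header must be 816 bytes long")
    # Bytes 0-27: ten 16-bit fields followed by two 32-bit fields.
    (hdr_len, _handle, _unk1, num_blocks, top, left, width, height,
     flags, frames_per_block, data_offset, _unk2) = struct.unpack_from("<10H2I", buf, 0)
    palette = bytes(buf[28:796])
    # Bytes 796-815: buffer sizes, audio parameters and TOC offset.
    (load_size, unpack_size, rate, frame_len, nbufs, aflags,
     toc_offset) = struct.unpack_from("<IIHhHHI", buf, 796)
    return VmdHeader(hdr_len, num_blocks, top, left, width, height, flags,
                     frames_per_block, data_offset, palette, load_size,
                     unpack_size, rate, frame_len, nbufs, aflags, toc_offset)
```

Using `unpack_from` with explicit offsets keeps the code aligned with the byte ranges in the table, which makes it easy to check each field against the specification.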
Note that the RGB color components are 6-bit VGA palette components, so they range from 0 to 63. The components need to be scaled before they are used to render typical RGB images with 8-bit components.
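One common way to perform this scaling is to shift each component left by 2 and copy its top 2 bits into the freed low bits, so that 0 maps to 0 and 63 maps to 255. The sketch below uses that approach; the function name `scale_vga_palette` is hypothetical, and a plain `v * 4` would also work if a maximum of 252 is acceptable.

```python
def scale_vga_palette(raw: bytes) -> list[tuple[int, int, int]]:
    """Convert 6-bit VGA R-G-B triplets (0..63) to 8-bit RGB (0..255)."""
    if len(raw) % 3:
        raise ValueError("palette length must be a multiple of 3")

    def widen(v: int) -> int:
        v &= 0x3F                    # defensively keep only the 6 significant bits
        return (v << 2) | (v >> 4)   # replicate the top bits so 63 -> 255

    return [(widen(raw[i]), widen(raw[i + 1]), widen(raw[i + 2]))
            for i in range(0, len(raw), 3)]
```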
A VMD file has a table of contents describing all of the file's block and frame information. The absolute file offset of the table of contents is given in the file header (bytes 812-815) and usually points to the end of the file. The table of contents has two parts: the block offset table and the frame information table. Blocks and frames are different concepts in VMD: a frame contains either audio or video data, while a block contains both a video frame and an audio frame. The block offset table consists of a series of 6-byte records, each with the following format:
bytes 0-1   unknown
bytes 2-5   absolute file offset of block
The number of entries in this table is specified by bytes 6-7 in the file header. After the block offset table is the frame information table. The frame information table consists of a series of 16-byte records with the following format:
byte 0      frame data type
              1 = audio frame
              2 = video frame
byte 1      unknown
bytes 2-5   frame data length

The meaning of the frame's remaining data depends on the frame type. If this is an audio frame:

byte 6      audio flags
bytes 7-15  unknown

If this is a video frame:

bytes 6-7   left coordinate of video frame
bytes 8-9   top coordinate of video frame
bytes 10-11 right coordinate of video frame
bytes 12-13 bottom coordinate of video frame
byte 14     unknown
byte 15     bit 1 (byte & 0x02) indicates a new palette
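The two tables can be walked with a small parser. This sketch assumes, as an interpretation not stated explicitly above, that the frame information table holds `frames_per_block` records (header bytes 18-19) for each block, i.e. `num_blocks * frames_per_block` records in total, stored directly after the block offset table. The names `FrameRecord` and `parse_toc` are hypothetical.

```python
import struct
from typing import NamedTuple


class FrameRecord(NamedTuple):
    frame_type: int   # 1 = audio frame, 2 = video frame
    length: int       # frame data length (bytes 2-5)
    raw: bytes        # the complete 16-byte record

    @property
    def audio_flags(self) -> int:
        return self.raw[6]

    @property
    def rect(self) -> tuple[int, int, int, int]:
        """(left, top, right, bottom) of a video frame's update region."""
        return struct.unpack_from("<4H", self.raw, 6)

    @property
    def new_palette(self) -> bool:
        return bool(self.raw[15] & 0x02)


def parse_toc(toc: bytes, num_blocks: int, frames_per_block: int):
    """Return (block_offsets, frame_records) from the raw table of contents."""
    # Block offset table: 6-byte records, file offset in bytes 2-5.
    block_offsets = [struct.unpack_from("<HI", toc, 6 * i)[1]
                     for i in range(num_blocks)]
    base = 6 * num_blocks
    frames = []
    # Assumption: frames_per_block records per block follow the block table.
    for i in range(num_blocks * frames_per_block):
        rec = toc[base + 16 * i: base + 16 * (i + 1)]
        if len(rec) < 16:
            raise ValueError("truncated frame information table")
        ftype, _unknown, length = struct.unpack_from("<BBI", rec, 0)
        frames.append(FrameRecord(ftype, length, bytes(rec)))
    return block_offsets, frames
```

Keeping the raw 16 bytes in each record lets the audio and video decoders interpret the type-specific fields themselves.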
Each frame information record should be made available to the audio or video decoder of a VMD decoding application, since its fields (such as the update rectangle, the new-palette flag, and the audio flags) are needed during decoding.
The VMD video coding method uses the Lempel-Ziv (LZ77) algorithm, run length encoding (RLE), and interframe differencing to compress 8-bit palettized video data.
VMD video uses both intraframes (a.k.a. keyframes) and interframes. Intraframes update the entire frame, while interframes only update the portions of the frame that have changed since the previous frame. The first video frame of a VMD file is implicitly intracoded, since it has to paint the entire viewing area; all subsequent frames are intercoded.
The frame record for a video frame specifies the frame coordinates of the rectangular region that will be updated. For example, if the file header specifies that the video is 200 pixels wide and 100 pixels high, the left, top, right, and bottom coordinates of the rectangular update region will be 0, 0, 199, and 99, respectively, if the entire frame is to be updated. A subsequent interframe may choose to leave much of the previous frame unchanged and only update the region from (100, 10) to (150, 40). In this case, the coordinates 100, 10, 150, and 40 would be encoded in the frame record.
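Because the coordinates are inclusive (a 200-pixel-wide frame spans 0 to 199), the size of the update region is one more than each coordinate difference. A tiny sketch of that arithmetic, using a hypothetical helper name:

```python
def update_region_size(left: int, top: int, right: int, bottom: int) -> tuple[int, int]:
    """Width and height of an inclusive update rectangle from a video frame record."""
    if right < left or bottom < top:
        raise ValueError("malformed update rectangle")
    return right - left + 1, bottom - top + 1
```

For the two examples in the text this gives a 200 x 100 full-frame update and a 51 x 31 partial update.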
The initial palette for decoding VMD video is transported in the main VMD file header. If bit 1 of frame record byte 15 (byte & 0x02) is set to 1, the compressed video data chunk begins with a new palette. The palette data is transported as 770 bytes. The first byte contains the first palette index to modify. The second byte contains the number of palette entries to change. The remaining 768 bytes are 256 R-G-B palette triplets. Again, these are stored as 6-bit VGA DAC values and should be scaled accordingly.
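A sketch of applying such a palette chunk follows. The text does not say how the 256 triplets relate to the start index and count, so this code assumes that triplet `i` holds the color for palette index `i` and that only indices `first` through `first + count - 1` are modified; treat that mapping, and the function name `apply_palette_chunk`, as hypothetical.

```python
def apply_palette_chunk(palette: list, chunk: bytes) -> int:
    """Update `palette` in place from a 770-byte chunk; return bytes consumed.

    Assumption: triplet i in the chunk holds the color for palette index i.
    """
    if len(chunk) < 770:
        raise ValueError("palette chunk must be 770 bytes")
    first, count = chunk[0], chunk[1]
    triplets = chunk[2:770]                     # 256 R-G-B triplets
    for index in range(first, min(first + count, 256)):
        r, g, b = triplets[3 * index: 3 * index + 3]
        palette[index] = (r, g, b)              # still 6-bit values; scale for display
    return 770
```

The return value lets the caller skip past the palette before reading the coding method byte.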
The compressed video data begins with a byte describing how the data is encoded. The byte specifies the following information:
bit 7       specifies that the data chunk is LZ compressed
bits 6-0    specify 1 of 3 rendering methods
If bit 7 is 1, the data chunk must be passed through the LZ decoder before progressing to the rendering phase. If bit 7 is 0, the data chunk is passed directly to the rendering phase.
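Splitting the coding method byte is a matter of two masks. In this sketch (hypothetical function name) an unknown rendering method raises an error, which is a design choice rather than something the format specifies.

```python
def split_method_byte(method: int) -> tuple[bool, int]:
    """Return (is_lz_compressed, rendering_method) from the coding method byte."""
    is_lz = bool(method & 0x80)   # bit 7: run the LZ decoder first
    render = method & 0x7F        # bits 6-0: rendering method 1, 2 or 3
    if render not in (1, 2, 3):
        raise ValueError(f"unknown rendering method {render}")
    return is_lz, render
```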
The VMD LZ decoding algorithm takes the compressed video buffer (everything after the coding method byte described above) as input and outputs a buffer of decoded bytes. The output buffer must be at least as large as the size given in bytes 800-803 of the main VMD header. The VMD LZ decoding algorithm operates as follows:
allocate a circular queue of 4096 (0x1000) bytes and initialize all elements to 0x20; note that the queue is addressed with a 12-bit number
initialize variable dataleft from the first 4 bytes of the block (a 32-bit little-endian number)
if the next 4 bytes are (0x34 0x12 0x78 0x56)
    advance stream over the 4 marker bytes
    initialize queue position (qpos) to 0x111
    initialize special chain length (speclen) to 18
else
    initialize qpos to 0xFEE
    initialize speclen to a value that can never match (any value above 18 will suffice)
proceed to main decode loop...
while there is more data left (dataleft > 0)
    tag = the next byte in the stream
    if (tag is 0xFF) and (dataleft > 8)
        take the next 8 bytes from the stream and place them in both the output buffer and the circular queue
        subtract 8 from dataleft
    else
        foreach bit in tag byte, reading from right -> left (bit 0 first)
            if dataleft is 0, stop processing this tag
            if (bit is 1)
                take the next byte from the stream and place it in both the output buffer and the circular queue
                decrement dataleft
            else
                move a chain of bytes from the circular queue to the output
                get the length and beginning offset of the chain from the next 2 bytes in the stream:
                    byte 0: bits 7-0: lower 8 bits of beginning offset
                    byte 1: bits 7-4: upper 4 bits of beginning offset
                            bits 3-0: length of chain, minus 3
                thus, add 3 to the length to obtain the actual length
                if (length is equal to speclen)
                    length is 18 (the maximum ordinary chain length) + next byte in stream
                copy the byte chain from the circular queue to the output; in the process, add the chain back into the queue
                subtract length from dataleft
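A minimal Python sketch of the LZ decoder, following the pseudocode step by step. The function name `vmd_lz_unpack` is hypothetical; `src` is assumed to be the data that follows the coding method byte, and truncated input simply raises an exception rather than being recovered.

```python
import struct

QUEUE_SIZE = 0x1000
QUEUE_MASK = QUEUE_SIZE - 1   # the queue is addressed with 12 bits


def vmd_lz_unpack(src: bytes) -> bytes:
    """Decode a VMD LZ stream; truncated input raises ValueError or IndexError."""
    if len(src) < 4:
        raise ValueError("truncated LZ stream")
    dataleft = struct.unpack_from("<I", src, 0)[0]
    pos = 4
    queue = bytearray(b"\x20" * QUEUE_SIZE)
    if src[pos:pos + 4] == b"\x34\x12\x78\x56":
        pos += 4
        qpos, speclen = 0x111, 18
    else:
        qpos, speclen = 0xFEE, 100   # > 18, so the special length never matches
    out = bytearray()

    def emit(value: int) -> None:
        """Write one byte to both the output and the circular queue."""
        nonlocal qpos
        out.append(value)
        queue[qpos] = value
        qpos = (qpos + 1) & QUEUE_MASK

    while dataleft > 0 and pos < len(src):
        tag = src[pos]
        pos += 1
        if tag == 0xFF and dataleft > 8:
            run = src[pos:pos + 8]
            if len(run) < 8:
                raise ValueError("truncated literal run")
            for value in run:
                emit(value)
            pos += 8
            dataleft -= 8
            continue
        for _bit in range(8):           # bit 0 first
            if dataleft == 0:
                break
            if tag & 1:                 # literal byte
                emit(src[pos])
                pos += 1
                dataleft -= 1
            else:                       # back-reference into the queue
                lo, hi = src[pos], src[pos + 1]
                pos += 2
                chainofs = lo | ((hi & 0xF0) << 4)
                chainlen = (hi & 0x0F) + 3
                if chainlen == speclen:
                    chainlen = 18 + src[pos]
                    pos += 1
                for _i in range(chainlen):
                    emit(queue[chainofs & QUEUE_MASK])   # read before writing
                    chainofs += 1
                dataleft -= chainlen
            tag >>= 1
    return bytes(out)
```

Reading each chain byte before writing it back into the queue is what allows overlapping chains (an offset just behind `qpos`) to act as a run-length repeat.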
There are 3 rendering methods that a frame can use to paint the raw or LZ-decoded data (referred to as the video data buffer) onto the final output frame.
Method 1 iterates through each line in the output frame, as indicated by the dimensions specified in the frame information record. For each line:
offset = 0
repeat
    length = next byte in video data buffer
    if (bit 7 of length byte is 1)
        mask off bit 7 of length and add 1 (length = (length & 0x7F) + 1)
        copy length bytes from the video data buffer to the output frame
    else
        increment length
        copy length bytes from the same position in the previous frame to the current frame
    advance offset by length
while (offset < width of the update region)
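A sketch of method 1 in Python, assuming both frames are stored as flat 8-bit buffers with `stride` bytes per line and that the loop runs over every line of the inclusive update rectangle from the frame record. The function name and parameter layout are hypothetical. The line offset advances after either kind of copy.

```python
def render_method1(data: bytes, cur: bytearray, prev: bytes, stride: int,
                   left: int, top: int, right: int, bottom: int) -> int:
    """Paint the update rectangle into `cur`; return data bytes consumed."""
    width = right - left + 1
    pos = 0
    for y in range(top, bottom + 1):
        row = y * stride + left
        ofs = 0
        while ofs < width:
            length = data[pos]
            pos += 1
            if length & 0x80:                       # literal pixels
                length = (length & 0x7F) + 1
                if ofs + length > width or pos + length > len(data):
                    raise ValueError("literal run overflows line or data")
                cur[row + ofs: row + ofs + length] = data[pos: pos + length]
                pos += length
            else:                                   # unchanged pixels
                length += 1
                if ofs + length > width:
                    raise ValueError("skip run overflows line")
                cur[row + ofs: row + ofs + length] = prev[row + ofs: row + ofs + length]
            ofs += length
    return pos
```

The bounds checks matter in Python because assigning a shorter slice to a `bytearray` would silently change the frame's length.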
Method 2 simply copies the entire video data buffer onto the output frame. This is the simplest rendering method, but be sure to take into account the frame's specific decoding dimensions as specified in the frame record.
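Under the same flat-buffer assumptions as a line-based renderer (hypothetical function name, `stride` bytes per frame line), method 2 reduces to copying one region-width slice per line:

```python
def render_method2(data: bytes, cur: bytearray, stride: int,
                   left: int, top: int, right: int, bottom: int) -> int:
    """Copy raw pixels into the inclusive update rectangle; return bytes used."""
    width = right - left + 1
    height = bottom - top + 1
    if len(data) < width * height:
        raise ValueError("not enough video data for the update region")
    for i in range(height):
        row = (top + i) * stride + left
        cur[row: row + width] = data[i * width: (i + 1) * width]
    return width * height
```

Using the region width rather than the full frame width is exactly the "decoding dimensions" caveat from the text.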
Method 3 operates just like method 1 except for one small change. When bit 7 of the length byte is 1 and the length byte has been masked and incremented, the next byte in the video data buffer is examined. If the byte is not 0xFF, it is not consumed, and the same copy as in method 1 is performed. If the byte is 0xFF, skip over it and apply an RLE decoding algorithm to unpack length pixels from the video data buffer into the output frame. The RLE unpacking algorithm operates as follows:
if the length is odd, copy the next byte from the video data buffer to the output frame and decrement length
divide length by 2 (length now counts 2-byte pairs)
while (length > 0)
    fetch the next byte from the video data buffer
    if the top bit of the byte is 1 (byte & 0x80)
        drop the top bit of the byte and shift left by 1: byte = (byte & 0x7F) << 1
        copy (byte) bytes from video data buffer to output frame
        subtract (byte / 2) from length
    else
        fetch the next 2 bytes (A and B) from the video data buffer and write them to the output frame (byte) times: ABABAB...
        subtract (byte) from length
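A self-contained sketch of method 3 together with its RLE helper, under the same flat-buffer assumptions used for method 1 (hypothetical names, `stride` bytes per line, inclusive update rectangle). It assumes that a 0xFF marker is consumed while any other peeked byte is left in place, and it keeps exactly `length` unpacked pixels per run.

```python
def rle_unpack(data: bytes, pos: int, length: int) -> tuple[bytes, int]:
    """Unpack `length` pixels starting at data[pos]; return (pixels, new_pos)."""
    out = bytearray()
    if length & 1:                     # odd count: one leading literal byte
        out.append(data[pos])
        pos += 1
        length -= 1
    pairs = length // 2
    while pairs > 0:
        b = data[pos]
        pos += 1
        if b & 0x80:                   # literal run of (b & 0x7F) pairs
            n = (b & 0x7F) << 1
            chunk = data[pos: pos + n]
            if len(chunk) < n:
                raise ValueError("truncated RLE literal run")
            out += chunk
            pos += n
            pairs -= n // 2
        else:                          # repeat pair A, B b times
            pair = data[pos: pos + 2]
            if len(pair) < 2:
                raise ValueError("truncated RLE repeat")
            out += pair * b
            pos += 2
            pairs -= b
    return bytes(out), pos


def render_method3(data: bytes, cur: bytearray, prev: bytes, stride: int,
                   left: int, top: int, right: int, bottom: int) -> int:
    """Method 1 with an optional RLE branch; return data bytes consumed."""
    width = right - left + 1
    pos = 0
    for y in range(top, bottom + 1):
        row = y * stride + left
        ofs = 0
        while ofs < width:
            length = data[pos]
            pos += 1
            if length & 0x80:
                length = (length & 0x7F) + 1
                if ofs + length > width:
                    raise ValueError("run overflows line")
                if data[pos] == 0xFF:                 # marker is consumed
                    pixels, pos = rle_unpack(data, pos + 1, length)
                else:                                 # plain copy, as in method 1
                    pixels = data[pos: pos + length]
                    pos += length
                if len(pixels) < length:
                    raise ValueError("not enough pixel data")
                cur[row + ofs: row + ofs + length] = pixels[:length]
            else:                                     # copy from previous frame
                length += 1
                if ofs + length > width:
                    raise ValueError("skip run overflows line")
                cur[row + ofs: row + ofs + length] = prev[row + ofs: row + ofs + length]
            ofs += length
    return pos
```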
8-bit audio is stored as raw PCM samples; all 16-bit audio is 2:1 DPCM-encoded. First, make sure the VMD file contains sound by checking bit 12 (& 0x1000) of the flags field in the file header. If the bit is set, the file has sound. The audio_sample_rate (bytes 804-805) and audio_frame_length (bytes 806-807) fields contain the playback rate and the size of a single (compressed) sound block, respectively. A negative audio frame length indicates 16-bit sound data; in this case, negate the field to get the actual block length.

The audio flags field (bytes 810-811) holds other important flags: bit 15 (& 0x8000) indicates old-style stereo sound, while bit 9 (& 0x200) indicates the new stereo sound format introduced in the game Shivers 2. The two formats differ in the meaning of several fields, so the original playback engine is not backward compatible: Shivers 2 cannot play old videos properly. The best approach is to check bit 15 first and, if it is zero, check bit 9 to determine the number of channels. The main difference between the old and new formats is that old VMDs treat the audio frame length field as the number of samples for both channels, while the new version treats it as the number of samples for a single channel (i.e. it must be multiplied by 2 for stereo sound).
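The checks above can be collected into one helper. This is a sketch with a hypothetical name and return layout; it takes the raw unsigned 16-bit audio frame length field and reinterprets it as signed, since a negative value marks 16-bit audio.

```python
def vmd_audio_params(header_flags: int, frame_length_raw: int, audio_flags: int) -> dict:
    """Derive basic audio parameters from VMD header fields."""
    has_sound = bool(header_flags & 0x1000)          # bit 12 of header flags
    # Reinterpret the unsigned 16-bit field as signed.
    length = frame_length_raw - 0x10000 if frame_length_raw & 0x8000 else frame_length_raw
    bits = 16 if length < 0 else 8                   # negative length => 16-bit DPCM
    length = abs(length)
    if audio_flags & 0x8000:                         # bit 15: old-style stereo
        channels, stereo_format = 2, "old"
    elif audio_flags & 0x0200:                       # bit 9: new (Shivers 2) stereo
        channels, stereo_format = 2, "new"
    else:
        channels, stereo_format = 1, None
    # New-style stereo counts samples for one channel only.
    total_samples = length * 2 if stereo_format == "new" else length
    return {"has_sound": has_sound, "bits": bits, "block_length": length,
            "channels": channels, "stereo_format": stereo_format,
            "total_samples": total_samples}
```

Checking bit 15 before bit 9 mirrors the order recommended in the text.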
When you encounter a frame information record of type 1, proceed to audio decoding. First, examine the frame's audio flags byte (byte 6 of the frame record). It may be one of:
- 1 - normal sound block
Decompress a single audio block and continue to the next frame.
- 2 - multiple sound and silence blocks
Read the next 4 bytes of the frame's data; these form a sound mask. Starting from bit 0, each set bit indicates a silence block and each clear bit indicates a normal audio block. Iterate number_of_sound_buffers times (header bytes 808-809) to fill the chain of sound and/or silence blocks.
- 3 - single silence block
Fill the whole block with silence.
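The three cases can be turned into a plan of sound and silence blocks before any decompression happens. In this sketch (hypothetical function name) the 4-byte mask is read as a little-endian 32-bit number, following the document's general byte-order rule; mask bits beyond bit 31 are treated as sound blocks, which is an assumption.

```python
import struct


def audio_block_plan(audio_flags: int, frame_data: bytes, num_sound_buffers: int) -> list[str]:
    """List the block kinds ("sound"/"silence") an audio frame produces."""
    if audio_flags == 1:                 # normal sound block
        return ["sound"]
    if audio_flags == 3:                 # single silence block
        return ["silence"]
    if audio_flags == 2:                 # mask of sound and silence blocks
        mask = struct.unpack_from("<I", frame_data, 0)[0]
        return ["silence" if (mask >> i) & 1 else "sound"
                for i in range(num_sound_buffers)]
    raise ValueError(f"unknown audio flags {audio_flags}")
```

In the mask case, the compressed sound data for the "sound" entries would follow the 4 mask bytes in the frame data.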
DECOMPRESSION - TODO
Games Using VMD
These are some of the Sierra computer games that are known to use the VMD file format: