Gremlin Digital Video
This document is based on the document 'Description of the Gremlin Digital Video (GDV) Format' by Mike Melanson and Vladimir "VAG" Gneushev found at http://multimedia.cx/gdv-format.txt.
This document is still a work in progress and is known to be incomplete.
Gremlin Digital Video (GDV) is a multimedia file format used in a number of CD-ROM computer games developed by a company named Gremlin Interactive. The extension stands for Gremlin Digital Video. The format is most notable for its use in the title Realms of the Haunting.
The file format is apparently capable of transporting palettized 8-bit video, or 15-, 16-, or 24-bit data, though only 8-bit data has been observed in games using GDV. The audio format is 8- or 16-bit PCM or DPCM.
File Format
All multi-byte numbers are stored in little endian format.
The general file format is laid out as follows:
  GDV header
  initial palette (only for 8-bit video data)
  chunk 0
  chunk 1
  ...
Each frame has the following structure:
  sound samples (if sound is present)
  frame header
  video data
The GDV header has the following structure:
  bytes 0-3    magic number/file signature (should be 0x94 0x19 0x11 0x29)
  bytes 4-5    size ID (see the Size ID Table section for this little-used feature)
  bytes 6-7    number of frames in file
  bytes 8-9    framerate (frames/second)
  bytes 10-11  sound flags
                 bit 3  packed data (1 = DPCM, 0 = PCM)
                 bit 2  sample width (1 = 16-bit, 0 = 8-bit)
                 bit 1  channels (1 = stereo, 0 = mono)
                 bit 0  audio present (1 = file has audio, 0 = silence)
  bytes 12-13  sound playback frequency
  bytes 14-15  image type
                 bits 2-0  video depth:
                             1 = 8 bits/pixel (palettized)
                             2 = 15 bits/pixel
                             3 = 16 bits/pixel
                             4 = 24 bits/pixel
  bytes 16-17  frame size (maximum compressed frame size);
               if this field is 0 then there is no video in the file
  byte 18      unknown
  byte 19      lossiness
  bytes 20-21  frame width
  bytes 22-23  frame height
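As an illustration of the layout above, the following is a minimal sketch of reading the 24-byte header from a memory buffer. The struct, its field names, and the function names (GdvHeader, rd16, gdv_parse_header) are hypothetical and chosen only for this example; the bit fields inside the sound flags and image type are left undecoded.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical container for the fields of the 24-byte GDV header. */
typedef struct {
    uint16_t size_id, frame_count, fps, sound_flags, sample_rate;
    uint16_t image_type, max_frame_size, width, height;
    uint8_t  unknown, lossiness;
} GdvHeader;

/* Read a little-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or the
 * signature does not match. */
int gdv_parse_header(const uint8_t *buf, size_t len, GdvHeader *h)
{
    static const uint8_t magic[4] = { 0x94, 0x19, 0x11, 0x29 };

    if (len < 24)
        return -1;
    for (int i = 0; i < 4; i++)
        if (buf[i] != magic[i])
            return -1;

    h->size_id        = rd16(buf + 4);
    h->frame_count    = rd16(buf + 6);
    h->fps            = rd16(buf + 8);
    h->sound_flags    = rd16(buf + 10);
    h->sample_rate    = rd16(buf + 12);
    h->image_type     = rd16(buf + 14);
    h->max_frame_size = rd16(buf + 16);
    h->unknown        = buf[18];
    h->lossiness      = buf[19];
    h->width          = rd16(buf + 20);
    h->height         = rd16(buf + 22);
    return 0;
}
```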
If the video is palettized (image type indicates 8 bits/pixel), the header is followed immediately by an initial 768-byte palette. There are 256 3-byte palette entries where the first byte represents the blue component, the second byte is green, and the third byte is red. Each of the palette components is a 6-bit VGA value ranging from 0..63. Thus, the component values must be scaled if rendering to more common 24- or 32-bit RGB formats.
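One possible way to scale the 6-bit components is sketched below. Replicating the top two bits into the low bits is an assumption of this example (it maps 63 to 255); a plain left shift by 2 is a simpler alternative that tops out at 252. The function names are hypothetical, and the component order is left exactly as stored.

```c
#include <stdint.h>

/* Expand one 6-bit VGA component (0..63) to 8 bits (0..255). */
uint8_t vga6_to_8(uint8_t v)
{
    v &= 0x3F;
    return (uint8_t)((v << 2) | (v >> 4));
}

/* Scale all 768 palette bytes; entries keep their stored order. */
void gdv_scale_palette(const uint8_t src[768], uint8_t dst[768])
{
    for (int i = 0; i < 768; i++)
        dst[i] = vga6_to_8(src[i]);
}
```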
Following the GDV header and possible palette, the GDV file stores a series of chunks. Each chunk has the following layout:
  audio data (if audio present)
  video frame header (if video present)
  encoded video data (if video present)
The amount of audio data in a chunk is determined as:
amount_of_audio_data = (sample_rate / frames_per_second) * (number_of_channels) * (bits_per_sample / 8)
Further, if the audio is packed then divide amount_of_audio_data by 2.
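The calculation above can be sketched in C using the sound flags from the GDV header. The function name is hypothetical, and returning 0 when the framerate field is 0 is a defensive assumption of this example rather than documented behavior.

```c
#include <stdint.h>

/* Bytes of audio stored in each chunk.
 * sound_flags is the 16-bit field at header bytes 10-11. */
uint32_t gdv_audio_bytes_per_chunk(uint16_t sound_flags,
                                   uint16_t sample_rate,
                                   uint16_t fps)
{
    if (!(sound_flags & 0x01) || fps == 0)
        return 0;                           /* no audio (or guard) */

    uint32_t channels         = (sound_flags & 0x02) ? 2 : 1;
    uint32_t bytes_per_sample = (sound_flags & 0x04) ? 2 : 1;
    uint32_t size = (uint32_t)(sample_rate / fps) * channels * bytes_per_sample;

    if (sound_flags & 0x08)                 /* packed (DPCM) data */
        size /= 2;
    return size;
}
```

For example, 22050 Hz 8-bit mono PCM at 15 frames/second gives 1470 bytes of audio per chunk.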
A video frame header has the following structure:
  bytes 0-1    magic number/frame signature (should be 0x05 0x13)
  bytes 2-3    total size of frame
  bytes 4-7    frame type and flags
                 bits 31-8  number of pixels to skip before decoding video data
                            (applies to coding methods 5, 6, and 8)
                 bit 7      unknown
                 bit 6      keyframe (1 = intraframe, 0 = interframe)
                 bit 5      vertical scaling is needed (only half of lines are coded)
                 bit 4      horizontal scaling is needed (only half width is coded)
                 bits 3-0   frame coding method
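A minimal sketch of unpacking this 8-byte header follows. The struct and function names are hypothetical; the field meanings are taken directly from the layout above.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical holder for the decoded video frame header. */
typedef struct {
    uint16_t size;      /* total size of frame */
    uint32_t skip;      /* bits 31-8 of the flags field */
    int keyframe;       /* bit 6 */
    int half_height;    /* bit 5: only half of the lines are coded */
    int half_width;     /* bit 4: only half of the width is coded */
    int method;         /* bits 3-0: frame coding method */
} GdvFrameHeader;

/* Returns 0 on success, -1 on a short buffer or bad signature. */
int gdv_parse_frame_header(const uint8_t *buf, size_t len, GdvFrameHeader *fh)
{
    if (len < 8 || buf[0] != 0x05 || buf[1] != 0x13)
        return -1;

    fh->size = (uint16_t)(buf[2] | (buf[3] << 8));

    uint32_t flags = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
                     ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);

    fh->skip        = flags >> 8;
    fh->keyframe    = (int)((flags >> 6) & 1);
    fh->half_height = (int)((flags >> 5) & 1);
    fh->half_width  = (int)((flags >> 4) & 1);
    fh->method      = (int)(flags & 0x0F);
    return 0;
}
```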
Thus far, only details of the 8-bit compression format have been determined. A frame's header indicates the coding method used.
Video frame coding
These are the known coding methods for 8-bit data:
  0: new palette, no frame change
  1: new palette, clear frame
  2: basic LZ-like unpacking
  3: frame unchanged from the previous frame
  5: enhanced LZ-like unpacking
  6: advanced LZ-like unpacking
  8: most complicated version, mix of everything possible
The various coding methods are described in detail in the following sections.
General Notes about Video Coding
Frames are usually coded with an LZ77-style scheme in which the decoder either copies runs of already decoded data (this may be data decoded earlier in the current frame, or data further along in the framebuffer left over from the previously decoded frame), skips over unchanged pixels, or reads new pixel data. Back offsets are in the range -4096..-1. A special area of exactly 4096 bytes sits immediately before the frame start and holds runs of pixels that may be referenced when decoding the first pixels of a frame. Initially it contains each pixel value repeated 8 times, with that 2048-byte table stored twice; coding method 2 fills it differently.
Since this is an LZ77-style scheme, the source and destination areas may overlap, so an offset of -1 is often used to signal that an area should be filled with the previous pixel.
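The two ideas above, the pre-filled 4096-byte area and the overlapping back copy, can be sketched as follows. The names gdv_init_prefix and gdv_lz_copy are hypothetical, and the back offset is passed as a positive distance (1..4096) purely for convenience in this example.

```c
#include <stdint.h>
#include <stddef.h>

#define GDV_PREFIX 4096

/* Fill the 4096-byte area that precedes the frame.
 * run = 8 gives the default layout (256 values x 8, stored twice);
 * run = 16 gives the layout used by coding method 2. */
void gdv_init_prefix(uint8_t prefix[GDV_PREFIX], int run)
{
    for (int i = 0; i < GDV_PREFIX; i++)
        prefix[i] = (uint8_t)((i / run) & 0xFF);
}

/* Copy len pixels to buf[pos...] from dist pixels back.
 * Copying one byte at a time, front to back, lets the source and
 * destination overlap: dist = 1 repeats the previous pixel.
 * Returns the new write position. */
size_t gdv_lz_copy(uint8_t *buf, size_t pos, size_t dist, size_t len)
{
    for (size_t i = 0; i < len; i++, pos++)
        buf[pos] = buf[pos - dist];
    return pos;
}
```

A library routine such as memcpy is not suitable here because its behavior is undefined for overlapping regions, while the fill-by-overlap trick depends on each byte being written before it is read again.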
Video Coding Method 0
Coding method 0 simply packs a new 768-byte palette structure into the payload of the video frame. The palette structure is identical to the initial palette stored after the GDV file header in an 8-bit GDV file.
The video frame remains unchanged from the previous frame.
Video Coding Method 1
Coding method 1 packs a new 768-byte palette structure into the payload of the video frame. The palette structure is identical to the initial palette stored after the GDV file header in an 8-bit GDV file.
This coding method also wipes the entire frame. If all of the top 24 bits of the frame header's frame type field (bits 31-8) are 0 or the video bit depth is not 8 bits/pixel, wipe the frame with all 0 values. Otherwise, wipe the frame with all 0xFF values.
Video Coding Method 2
This method initializes the 4096-byte area before the frame (used for back copies) differently: it holds a run of 16 pixels for each pixel value (i.e. 00 00 ... 00 01 01 ... 01 02 ... FF FF).
Coding method 2 embodies a basic LZ-like scheme. To decode the encoded bytestream, begin by reading the first byte. This byte is a set of 4 2-bit instruction tags laid out as:
  bits  76 54 32 10
        aa bb cc dd
- If aa is 0 then paint a single pixel by copying the next byte from the encoded bytestream into the decoded image.
- If aa is 1 then copy a run of pixels from the area of the image that has already been painted. First, the source offset and run length must be decoded from the bytestream. For the next 2 bytes in the bytestream, byte_a followed by byte_b:
  byte_a    byte_b
  76543210  76543210
The length of the run to be copied is defined as 3 more than bits 3-0 of byte_a. In C notation, this is expressed as:
length = (byte_a & 0x0F) + 3;
The beginning run offset is defined as the 12-bit quantity specified by the top 4 bits of byte_a combined with byte_b. In C notation, this is expressed as:
offset = ((byte_a & 0xF0) << 4) | byte_b;
The starting offset from which to copy is defined as the current offset in the output image minus the quantity (4096 - offset).
- If aa is 2 then the next pixels in the decoded frame are unchanged from the previous frame. The length of the unchanged pixel run is defined by the next byte in the encoded bytestream, plus 2. This gives the range of 2..257 unchanged pixels.
- If aa is 3 then the frame decode operation is finished. Presumably, a decoder should also stop decoding when it runs out of bytes in the encoded bytestream buffer.
After tag aa is decoded, decode tag bb using the same process as tag aa, then tag cc, followed by tag dd. After decoding tag dd, fetch the next byte from the encoded bytestream as the next tag byte and repeat the decoding process until a tag of 3 is encountered or until the encoded bytestream buffer is exhausted.
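The whole method 2 loop can be sketched as below. This is an illustration under stated assumptions, not a reference decoder: the function name and buffer layout are hypothetical, the buffer is assumed to hold the 4096-byte area followed by the frame (still containing the previous frame's pixels), and all bounds checks are additions of this example.

```c
#include <stdint.h>
#include <stddef.h>

#define GDV_PREFIX 4096

/* Decode a method 2 payload. buf holds GDV_PREFIX bytes of run table
 * followed by frame_size pixels from the previous frame.
 * Returns how far the output position advanced, in pixels. */
size_t gdv_decode_method2(const uint8_t *src, size_t src_len,
                          uint8_t *buf, size_t frame_size)
{
    size_t in = 0;
    size_t pos = GDV_PREFIX, end = GDV_PREFIX + frame_size;

    while (in < src_len) {
        uint8_t tags = src[in++];
        /* Tags are consumed from the top bits down: aa, bb, cc, dd. */
        for (int shift = 6; shift >= 0; shift -= 2) {
            int tag = (tags >> shift) & 3;
            if (tag == 0) {                       /* single new pixel */
                if (in >= src_len || pos >= end)
                    return pos - GDV_PREFIX;
                buf[pos++] = src[in++];
            } else if (tag == 1) {                /* back copy */
                if (in + 2 > src_len)
                    return pos - GDV_PREFIX;
                uint8_t a = src[in++];
                uint8_t b = src[in++];
                size_t len    = (size_t)(a & 0x0F) + 3;
                size_t offset = ((size_t)(a & 0xF0) << 4) | b;
                size_t from   = pos - (4096 - offset);
                for (size_t i = 0; i < len && pos < end; i++)
                    buf[pos++] = buf[from++];
            } else if (tag == 2) {                /* unchanged run */
                if (in >= src_len)
                    return pos - GDV_PREFIX;
                pos += (size_t)src[in++] + 2;
                if (pos > end)
                    pos = end;
            } else {                              /* tag 3: finished */
                return pos - GDV_PREFIX;
            }
        }
    }
    return pos - GDV_PREFIX;
}
```

Because unchanged runs simply advance the output position, the sketch relies on the caller leaving the previous frame in place in buf between calls.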
Video Coding Method 3
This method does not affect the video data; it is usually used to carry the next chunk of sound.
Video Coding Method 5
Coding method 5 is similar to method 2 but extends it with additional ways to code data.
Before decoding skip n bytes after the frame header, where n is defined in the frame header.
To decode the encoded bytestream, begin by reading the first byte. This byte is a set of 4 2-bit instruction tags laid out as:
  bits  76 54 32 10
        aa bb cc dd
- If aa is 0 then paint a single pixel by copying the next byte from the encoded bytestream into the decoded image.
- If aa is 1 then either copy a run of pixels from the area of the image that has already been painted, or fill a run of pixels with a constant pixel. First, the source offset and run length must be decoded from the bytestream. For the next 2 bytes in the bytestream, byte_a followed by byte_b:
  byte_a    byte_b
  76543210  76543210
The length of the run to be copied is defined as 3 more than bits 3-0 of byte_a. In C notation, this is expressed as:
length = (byte_a & 0x0F) + 3;
The beginning run offset is defined as the 12-bit quantity specified by the top 4 bits of byte_a combined with byte_b. In C notation, this is expressed as:
offset = 4096 - (((byte_a & 0xF0) << 4) | byte_b);
Copy length pixels to the current position in the output image from the back position -offset.
- If aa is 2 then either the next pixels in the decoded frame are unchanged from the previous frame, or the frame decode is finished. Decode the next byte from the bytestream as the length. If the length is 0 then the frame decode is finished. If the length is 0xFF then decode the next 16-bit value from the bytestream as the length. This length indicates the number of pixels from the current offset in the decoded frame that remain unchanged from the previous frame.
- If aa is 3 then copy pixels from the near area of the image that has already been painted. First, the source offset and run length must be decoded from the bytestream. For the next byte in the bytestream:
  byte
  76543210
Bits 7-2 define the 6-bit offset. Bits 1-0 plus 2 define the length. In C notation, this is expressed as:
offset = (byte >> 2) + 1;
length = (byte & 0x03) + 2;
Copy length pixels into the current position in the output image using the back position of -offset.
After tag aa is decoded, decode tag bb using the same process as tag aa, then tag cc, followed by tag dd. After decoding tag dd, fetch the next byte from the encoded bytestream as the next tag byte and repeat the decoding process until a tag of 2 is encountered with an associated length of 0, or until the encoded bytestream buffer is exhausted.
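The two operand decodings that method 5 adds over method 2 (the escaped unchanged-run length of tag 2 and the one-byte near copy of tag 3) can be sketched as small helpers. The function names are hypothetical, reading the 16-bit length as little endian follows the general rule stated for this format, and treating a truncated stream as end of frame is an assumption of this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Tag 2: returns the unchanged-run length, or 0 when the frame is
 * finished. Advances *in past the bytes consumed. */
unsigned gdv_m5_skip_length(const uint8_t *src, size_t src_len, size_t *in)
{
    if (*in >= src_len)
        return 0;
    unsigned len = src[(*in)++];
    if (len == 0xFF) {                  /* escape: 16-bit length follows */
        if (*in + 2 > src_len)
            return 0;
        len = (unsigned)src[*in] | ((unsigned)src[*in + 1] << 8);
        *in += 2;
    }
    return len;
}

/* Tag 3: split one byte into a back offset (1..64) and length (2..5). */
void gdv_m5_near_copy(uint8_t byte, int *offset, int *length)
{
    *offset = (byte >> 2) + 1;
    *length = (byte & 0x03) + 2;
}
```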
Bit Reading Procedure For Video Coding Methods 6 and 8
Video coding methods 6 and 8 treat the encoded bytestream as a sequence of packed bits and bytes. The best way to illustrate the method is to jump in with an example bytestream:
0x2D 0xAA 0x5A 0x7F 0x26 0x53 0xB1 ...
The bit reader maintains a 32-bit bit queue and a queue size (qsize). Initialize the queue with the first 4 bytes in the bytestream interpreted as a little endian 32-bit number, and initialize the queue size to 16:
  queue = 0x7F5AAA2D
  qsize = 16
Reading bits entails reading the least significant bits from the queue. Reading 4 bits in this example will yield 0xD. Afterwards, the bit reading state variables will be:
  queue = 0x07F5AAA2
  qsize = 12
As an example, assume the coding mode dictates that the next 3 bits shall be read (010 = 2) followed by the next 1 bit (0) and these codes indicate that the decoder should read the next byte from the encoded bytestream. This next byte is 0x26 in this example. The state variables after reading the next (3 + 1 = 4) bits will be:
  queue = 0x007F5AAA
  qsize = 8
Assume the next decode operation is to read 16 bits from the stream (0x5AAA). The state variables are now:
  queue = 0x0000007F
  qsize = -8
Since qsize is less than or equal to 0, fetch the next 16-bit value from the encoded bytestream (0xB153 in this example) and bitwise OR it into the queue immediately above the remaining bits. Then add 16 to qsize:
  queue = 0x00B1537F
  qsize = 8
The descriptions of video coding method 6 and 8 will use the phrase "read the next n bits from the bit queue." This indicates that the next n bits should be shifted off of the rightmost part of the bit queue, the qsize should be decreased by n, and if qsize is less than or equal to 0 refresh the bit queue and increase qsize as described previously.
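The bit reading procedure can be sketched in C as follows. The type and function names are hypothetical, and returning 0 for reads past the end of the buffer is an assumption made to keep the example safe. The refill shifts the new 16-bit value left by qsize + 16, which reproduces the state values in the worked example above.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    const uint8_t *src;   /* encoded bytestream */
    size_t len, pos;      /* total length and read position */
    uint32_t queue;       /* bit queue */
    int qsize;            /* queue size as used in the text */
} GdvBits;

/* Next whole byte from the bytestream (0 past the end). */
uint8_t gdv_bits_byte(GdvBits *b)
{
    return b->pos < b->len ? b->src[b->pos++] : 0;
}

/* Load the first 4 bytes as a little-endian 32-bit queue. */
void gdv_bits_init(GdvBits *b, const uint8_t *src, size_t len)
{
    b->src = src;
    b->len = len;
    b->pos = 0;
    b->queue = 0;
    for (int i = 0; i < 4; i++)
        b->queue |= (uint32_t)gdv_bits_byte(b) << (8 * i);
    b->qsize = 16;
}

/* Read the next n bits (1..16) from the bit queue. */
uint32_t gdv_bits_read(GdvBits *b, int n)
{
    uint32_t value = b->queue & ((1u << n) - 1);

    b->queue >>= n;
    b->qsize -= n;
    if (b->qsize <= 0) {
        uint32_t lo = gdv_bits_byte(b);
        uint32_t hi = gdv_bits_byte(b);
        b->queue |= (lo | (hi << 8)) << (b->qsize + 16);
        b->qsize += 16;
    }
    return value;
}
```

Note that bit reads and whole-byte reads share one read position, so a byte fetched with gdv_bits_byte is never seen by the bit queue.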
Video Coding Method 6
Coding method 6 embodies techniques similar to those of coding methods 2 and 5. The most significant difference is that the bytestream is decoded with the bit reading procedure described in the previous section.
To reach the start of the encoded bitstream, first skip n bytes after the frame header, where n is defined in the frame header. Initialize the bit queue at that point.
To decode the frame, read the next 2 bits from the bit queue as the instruction tag.
- If the tag is 0 then read the next bit from the bit queue. If the bit is 0 then copy the next byte in the bytestream into the output frame as the next pixel. If the bit is 1 then copy a series of pixels from the encoded bytestream into the output frame. The length of the pixel run to copy is obtained by the following process:
  length = 2
  count = 0
  do
    count++
    step = read (count) bits from bit queue
    length = length + step
  while (step == ((1 << count) - 1))
- If the tag is 1 then the next series of pixels in the output frame are unchanged from the previous frame. To determine precisely how many pixels are unchanged, read the next bit from the bit queue. If the bit is 0 then read the next 4 bits from the bit queue. These 4 bits plus 2 represent the number of pixels that are unchanged, which is in the range of 2..17.
If the bit is 1 then read the next byte from the bytestream as the length. If the top bit of the length is 0 then the actual length of the unchanged pixel run is length + 18 which yields a range of 18..145 pixels. If the top bit of the decoded length byte is 1 then read the next byte from the bytestream and perform the following calculation:
length = (((length & 0x7F) << 8) | next_byte) + 146;
- If the tag is 2 read the next 2 bits from the bit queue as the sub-tag.
If the sub-tag is 3 then either copy a run of pixels from the portion of the image that has already been decoded into the current offset, or fill a run of pixels with a constant pixel value. Read the next byte from the bytestream as the offset. If the most significant bit of the offset byte (bit 7) is set then the length of the next pixel operation is 3; otherwise, the length is 2. Next clear bit 7 of the offset. If offset is 0 then take the most recent pixel in the decoded frame and fill the next (length) pixels with that value. If offset is non-zero then copy (length) pixels from the current offset - (offset - 1) from the decoded image to the current offset.
If the sub-tag is not 3 then read the next 4 bits from the bit queue. These bits comprise bits 11-8 of a 12-bit offset quantity. The bottom 8 bits of the offset quantity come from the next byte read from the bytestream.
If the sub-tag is 0 and the offset is 0xFFF then the frame decode operation is complete. If the sub-tag is 0 and the offset is greater than 0xF80 then read a pair of pixels from the portion of the output image already decoded and place the pair into the output frame a specified number of times. The length of the pixel run is 2 more than the bottom 4 bits of the 12-bit offset quantity. The actual offset is defined as bits 6-4 of the 12-bit offset quantity. In C notation length and offset are computed from offset as:
length = (offset & 0x00F) + 2; // applied then for pairs of pixels
offset = (offset >> 4) & 7;
The pair of pixels are retrieved from the decoded image at the current offset - (offset - 1).
If the sub-tag is not 0 or the offset is less than or equal to 0xF80 then add 3 to the length. If the offset is 0xFFF, take the last pixel output into the decoded image and copy it (length) times into the decoded image at the current offset. If the offset is not equal to 0xFFF then copy a run of pixels from the area of the image that has already been painted. The starting offset from which to copy is defined as the current offset in the output image minus the quantity (4096 - offset).
- If the tag is 3 then either copy a run of pixels from the area of the image that has already been painted, or fill a run of pixels with a constant pixel. Read the next byte in the bytestream as the offset. The length is the top 4 bits of this byte (bits 7-4). If the length is 15 then read the next byte from the bytestream and add it to the length. Add 6 more to the length. Read the next byte from the bytestream and make it the bottom 8 bits of the offset quantity. If the offset is 0xFFF then take the previous pixel from the decoded image and copy it into the decoded image for (length) iterations. If offset is other than 0xFFF then move length pixels into the decoded image at the current offset from the current offset + (offset - 4096).
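Some of the operand calculations for tags 1 and 2 above can be sketched as pure functions on values that have already been read from the bytestream or bit queue. The function names are hypothetical; the sketch covers only the arithmetic and not the surrounding decode loop.

```c
#include <stdint.h>

/* Tag 1, bit = 1: unchanged-run length from the length byte and,
 * when its top bit is set, the byte that follows it.
 * Yields 18..145 for the short form; next is ignored in that case. */
uint32_t gdv_m6_long_skip(uint8_t first, uint8_t next)
{
    if (!(first & 0x80))
        return first + 18u;
    return ((((uint32_t)first & 0x7F) << 8) | next) + 146u;
}

/* Tag 2, sub-tag 3: length is 3 when bit 7 is set, otherwise 2;
 * the offset is the byte with bit 7 cleared (0 means "fill with the
 * most recent pixel"). */
void gdv_m6_near_op(uint8_t byte, int *length, int *offset)
{
    *length = (byte & 0x80) ? 3 : 2;
    *offset = byte & 0x7F;
}

/* Tag 2, sub-tag 0, 12-bit offset above 0xF80 and below 0xFFF:
 * number of pixel pairs to write and the offset field (bits 6-4). */
void gdv_m6_pair_op(unsigned offset12, int *pairs, int *offset)
{
    *pairs  = (int)(offset12 & 0x00F) + 2;
    *offset = (int)((offset12 >> 4) & 7);
}
```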
Video Coding Method 8
Video coding method 8 is precisely the same as video coding method 6 except for the procedure for decoding tag 3. To decode tag 3, decode the next byte in the bytestream (first_byte).
If the top 2 bits of first_byte are set (i.e. 11xxxxxx) then the required operation is to copy a run of pixels from the previous frame to the current offset of the current frame. The length of the run is the bottom 6 bits of first_byte plus 8. The 12-bit offset quantity is formed from the next 4 bits in the bit queue (top 4 bits of the quantity) combined with the next byte from the bytestream (bottom 8 bits). Move (length) bytes from the previous frame at the current offset plus the offset quantity + 1 into the current offset of the current frame.
In the other cases the required operation is to either copy a run of previous pixels to the current position or repeat the previous pixel for a number of pixels. If the top bit of first_byte is 0 (i.e. 0xxxxxxx) then the length of the pixel run is defined as the quantity 6 plus bits 6-4 of first_byte (this yields a range of 6..13). The offset is defined as the 12-bit quantity formed by combining bits 3-0 of first_byte (top 4 bits of quantity) and the next byte in the bytestream (bottom 8 bits). If the top two bits of first_byte are 10 (i.e. 10xxxxxx) then the length of the pixel run is defined as the quantity 14 plus bits 5-0 of first_byte (this yields a range of 14..77). The offset is defined as the 12-bit quantity formed by combining the next 4 bits from the bit queue (top 4 bits of quantity) and the next byte from the bytestream (bottom 8 bits). In both of these cases, if the offset is 0xFFF then take the previous pixel from the decoded image and copy it into the decoded image for (length) iterations. If offset is other than 0xFFF then move length pixels into the decoded image at the current offset from the current offset + (offset - 4096).
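The three forms of first_byte can be told apart with a small classifier such as the sketch below. The struct and function names are hypothetical. The sketch only derives the run length and says where the top 4 offset bits come from; fetching them from the bit queue and performing the copy are left to the caller.

```c
#include <stdint.h>

typedef struct {
    int from_prev_frame;  /* 1 for 11xxxxxx: copy from the previous frame */
    int length;           /* run length in pixels */
    int hi_in_byte;       /* 1 when the top 4 offset bits are bits 3-0 of
                             first_byte; 0 when they come from the bit queue */
    int offset_hi;        /* meaningful only when hi_in_byte is 1 */
} GdvM8Tag3;

GdvM8Tag3 gdv_m8_tag3(uint8_t first_byte)
{
    GdvM8Tag3 op = { 0, 0, 0, 0 };

    if ((first_byte & 0xC0) == 0xC0) {          /* 11xxxxxx */
        op.from_prev_frame = 1;
        op.length = (first_byte & 0x3F) + 8;
    } else if ((first_byte & 0x80) == 0) {      /* 0xxxxxxx */
        op.length = ((first_byte >> 4) & 0x07) + 6;
        op.hi_in_byte = 1;
        op.offset_hi = first_byte & 0x0F;
    } else {                                    /* 10xxxxxx */
        op.length = (first_byte & 0x3F) + 14;
    }
    return op;
}
```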
Scaling
TODO
Audio Format
Audio may be stored as either uncompressed PCM, or simple DPCM-packed data.
DPCM decompression operates as follows:
Initialize two 16-bit state variables to zero at the beginning of playback.
Read the next byte of compressed data. Update the first state variable as state = state + DeltaTable[packed_byte]. Output the new state value as an unpacked sample (for stereo sound, this is a left channel sample). Read the next byte and perform the same steps using the second state variable (right channel). Repeat until the whole chunk of samples has been decoded. Note that exactly the same scheme is used even for mono sound, i.e. two state variables are still used to decompress the data. Decompression always produces 16-bit output. To convert a sample to 8-bit unsigned, take its high 8 bits and invert the top bit.
DeltaTable contains 256 delta values calculated using the following algorithm:
  DeltaTable[0] = 0
  delta = 0
  code = 64
  step = 45
  repeat 127 times
    delta = delta + (code >> 5)
    code = code + step
    step = step + 2
    DeltaTable[1] = delta
    DeltaTable[2] = - delta
    Advance to the next pair in DeltaTable
  DeltaTable[255] = delta + (code >> 5)
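The table construction and the decode loop can be sketched in C as below. The function names are hypothetical. Two points are assumptions of this example rather than statements from the format description: the table is held in 32-bit integers to sidestep any overflow question, and the 16-bit state is assumed to wrap around on overflow.

```c
#include <stdint.h>
#include <stddef.h>

/* Build the 256-entry delta table: entry 0, then 127 (+delta, -delta)
 * pairs, then one final positive entry. */
void gdv_build_delta_table(int32_t table[256])
{
    int32_t delta = 0, code = 64, step = 45;

    table[0] = 0;
    for (int i = 0; i < 127; i++) {
        delta += code >> 5;
        code  += step;
        step  += 2;
        table[2 * i + 1] = delta;
        table[2 * i + 2] = -delta;
    }
    table[255] = delta + (code >> 5);
}

/* Decode n packed bytes into n 16-bit samples. state[0] and state[1]
 * alternate byte by byte and must be zeroed once, at start of playback. */
void gdv_dpcm_decode(const uint8_t *src, size_t n, const int32_t table[256],
                     uint16_t state[2], int16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t *s = &state[i & 1];
        *s = (uint16_t)(*s + (uint32_t)table[src[i]]);  /* wraps mod 65536 */
        int32_t v = *s;
        out[i] = (int16_t)(v >= 32768 ? v - 65536 : v);
    }
}

/* Convert a 16-bit sample to 8-bit unsigned: high byte, top bit inverted. */
uint8_t gdv_sample_to_u8(int16_t s)
{
    return (uint8_t)((((uint16_t)s) >> 8) ^ 0x80);
}
```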
Size ID Table
If the width and height fields in the GDV header are 0 then the size ID field determines the frame dimensions. Note that this feature does not appear to be used in production GDV movies but the code is still in place. The following table lists the possible size IDs and their corresponding frame dimensions:
struct {
    short tag;    /* size ID */
    short width;
    short height;
} FixedSize[] = {
    { 0, 320, 200}, { 1, 640, 200}, { 2, 320, 167},
    { 3, 320, 180}, { 4, 320, 400}, { 5, 320, 170},
    { 6, 160,  85}, { 7, 160,  83}, { 8, 160,  90},
    { 9, 280, 128}, {10, 320, 240}, {11, 320, 201},
    {16, 640, 400}, {17, 640, 200}, {18, 640, 180},
    {19, 640, 167}, {20, 640, 170}, {21, 320, 240}
};
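A lookup over this table might look like the following sketch. The function name gdv_lookup_size and its return convention are hypothetical; the table contents are copied from the listing above.

```c
#include <stddef.h>

static const struct {
    short tag;    /* size ID */
    short width;
    short height;
} FixedSize[] = {
    { 0, 320, 200}, { 1, 640, 200}, { 2, 320, 167},
    { 3, 320, 180}, { 4, 320, 400}, { 5, 320, 170},
    { 6, 160,  85}, { 7, 160,  83}, { 8, 160,  90},
    { 9, 280, 128}, {10, 320, 240}, {11, 320, 201},
    {16, 640, 400}, {17, 640, 200}, {18, 640, 180},
    {19, 640, 167}, {20, 640, 170}, {21, 320, 240}
};

/* Resolve a size ID to frame dimensions.
 * Returns 0 on success, -1 if the ID is not in the table. */
int gdv_lookup_size(int size_id, int *width, int *height)
{
    for (size_t i = 0; i < sizeof FixedSize / sizeof FixedSize[0]; i++) {
        if (FixedSize[i].tag == size_id) {
            *width  = FixedSize[i].width;
            *height = FixedSize[i].height;
            return 0;
        }
    }
    return -1;
}
```

A decoder would only consult this table when both the width and height fields in the GDV header are 0.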
Games Using The GDV Format
These games are known to use the GDV file format for their FMV: