SGA

From MultimediaWiki
Revision as of 03:21, 30 August 2014

SGA is a chunk-based multimedia file format used primarily in games by Digital Pictures for the Sega CD console system. Early versions of the format store uncompressed video frames, while later revisions add features such as LZ compression, tile maps, and overlapping chunks. The extension for SGA files is usually ".SGA", but some files from Supreme Warrior have the extensions ".CLP" and ".PT1", and some files from Slam City with Scottie Pippen have extensions that include numbers (e.g. ".SG0", ".S37", etc.). Variations of the format are found in versions of Digital Pictures games for DOS PCs and the 3DO. In those versions all video and audio is stored in large, monolithic files rather than individual files.

File Format

All multi-byte numbers in an SGA file are big-endian, since the Sega Genesis and Sega CD both use big-endian Motorola 68000 CPUs.

There are three known methods of storing data in SGA files:

(1) In the majority of files, each chunk is stored in 2048-byte sectors due to the nature of CD storage. The first sector in the file contains 2048 bytes of data. Each subsequent sector contains a 2-byte header specifying how much of the chunk is left, followed by 2046 bytes of data. If the value of a sector's header is zero, skip the next 2046 bytes and check the next header. (This presumably has something to do with padding the CD for faster loading.)

(2) Some files, in particular the audio-only files in Night Trap (all variations), do not use length indicators at the start of sectors. The files contain strictly chunk headers and data.

(3) Later versions of the SGA format, as seen in Slam City with Scottie Pippen, Prize Fighter, and Supreme Warrior, introduced a scheme in which chunks can overlap or interrupt other chunks. In this type of file, each sector begins with a two-byte sector header whose top four bits are set to a non-zero value, which we might call the chunk index. If the remaining twelve bits are zero, the sector marks the start of a new chunk using the current index. If they are non-zero, they hold a length value, similar to previous formats. A decoder written to handle this type of file has to keep track of multiple chunks simultaneously. Videos in Slam City use a container chunk (type F1), which generally contains a video chunk followed by an audio chunk. Videos in Supreme Warrior begin with what appears to be a global header (type F0) in addition to the F1 container chunk, but at present it is unknown how to interpret its metadata. There is also another chunk type, F2, whose use is presently unknown. F2 chunks can also exist in separate files with the extension ".F2". So far, all that is known about F2 chunks is that they often contain what appear to be filenames.
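The following is a minimal Python sketch of how sector headers might be handled for storage methods (1) and (3). It assumes the file has been read into memory as raw bytes. The function names are hypothetical.

The method (1) routine assumes that the full 2046-byte data area of every non-skipped sector belongs to the chunk stream. That may not hold for a sector at the end of a chunk. The method (3) routine only splits the header into its two fields.

```python
SECTOR_SIZE = 2048

def strip_sector_headers(raw: bytes) -> bytes:
    """Method (1): drop the 2-byte header of every sector after the first.

    Assumption: the whole 2046-byte payload of each sector is kept, and
    sectors whose header is zero are skipped entirely.
    """
    out = bytearray(raw[:SECTOR_SIZE])  # first sector: 2048 data bytes, no header
    for start in range(SECTOR_SIZE, len(raw), SECTOR_SIZE):
        sector = raw[start:start + SECTOR_SIZE]
        remaining = int.from_bytes(sector[:2], "big")  # how much of the chunk is left
        if remaining == 0:
            continue  # skip the next 2046 bytes and check the next header
        out += sector[2:]
    return bytes(out)

def split_sector_header(header: int) -> tuple[int, int]:
    """Method (3): split a 16-bit sector header into (chunk index, length).

    A length of 0 means the sector starts a new chunk for that index.
    """
    return header >> 12, header & 0x0FFF
```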

Most SGA files do not contain a global header, but have headers for each chunk. In files of type (1) and (2), headers are never split across sector boundaries. In files of type (3), chunk headers can be split across sector boundaries as long as they are contained within a chunk of type F1. Chunks are word-aligned and can contain video or audio. The basic header is 4 bytes long and is shared by all chunk types.

Byte  Description
----  -----------
0     Chunk type 
1     Stream index (Sewer Shark in particular uses this for its branching path-based gameplay)
2-3   Payload length

This is followed by more metadata:

Byte  Description
----  -----------
4-7   Time in SMPTE format
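As a sketch, the shared 8-byte prefix of a chunk (the 4-byte basic header plus the SMPTE time) can be read with big-endian integer conversions. `ChunkHeader` and `parse_chunk_header` are hypothetical names.

The SMPTE bytes are returned raw, because their internal layout is not described here. The sketch also makes no assumption about whether the payload length counts the header bytes.

```python
from typing import NamedTuple

class ChunkHeader(NamedTuple):
    chunk_type: int      # byte 0, e.g. 0xC1 or 0xA1
    stream_index: int    # byte 1
    payload_length: int  # bytes 2-3, big-endian
    smpte: bytes         # bytes 4-7, kept raw

def parse_chunk_header(buf: bytes, offset: int = 0) -> ChunkHeader:
    if len(buf) < offset + 8:
        raise ValueError("need at least 8 bytes for a chunk header")
    return ChunkHeader(
        chunk_type=buf[offset],
        stream_index=buf[offset + 1],
        payload_length=int.from_bytes(buf[offset + 2:offset + 4], "big"),
        smpte=bytes(buf[offset + 4:offset + 8]),
    )
```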

Video chunks for the Sega CD contain the following metadata:

Byte  Description
----  -----------
8     Data flags
9     Palette count (1-4)

A value of 1 for the topmost bit of byte 8 means that the chunk uses a tilemap. Data for the tilemap immediately follows the tile data. Each entry in the tilemap is two bytes long and consists of a tile index and flags for features such as vertical and horizontal flipping. Bit 3 and the lowermost bit, bit 1, are usually set to 1, although their exact purpose is unknown. There may be some connection between bit 1 and tilemaps, though, since tilemap behavior seems to change when that bit is not set.

Metadata in video chunks for the Sega CD 32X are as follows:

Byte  Description
----  -----------
8     Palette start offset (usually 1)
9     Palette update size (a value of 0 means use existing palette)

Then more metadata (both Sega CD and Sega CD 32X):

Byte  Description
----  -----------
10    tiles per column
11    tiles per row
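A possible sketch for pulling the Sega CD (non-32X) video fields out of a whole-chunk buffer is shown below. `parse_scd_video_metadata` is a hypothetical helper.

It is meant for ordinary video chunks, not D1/D4 overlay chunks, where bytes 10 and 11 mean something else. The tile data length uses the 32-bytes-per-tile formula given in the $C1 section.

```python
def parse_scd_video_metadata(chunk: bytes) -> dict:
    """Read bytes 8-11 of a Sega CD video chunk (chunk starts at index 0)."""
    flags = chunk[8]
    tiles_per_column = chunk[10]
    tiles_per_row = chunk[11]
    return {
        "flags": flags,
        "palette_count": chunk[9],          # 1-4
        "has_tilemap": bool(flags & 0x80),  # topmost bit of byte 8
        "tiles_per_column": tiles_per_column,
        "tiles_per_row": tiles_per_row,
        # each 8x8 tile at 4 bits per pixel takes 32 bytes
        "tile_data_length": tiles_per_column * tiles_per_row * 32,
    }
```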

In overlay chunk types D1 (and possibly D4), byte 10 indicates the number of tiles in the chunk, and byte 11 indicates the length of the layout data for the tiles. Since each tile takes up 32 bytes, the length of the tile data is equal to the value in byte 10 * 32. Tile data is immediately followed by layout codes and a palette map.

Certain types of video chunks contain additional metadata, which is detailed below in the sections related to the various chunk types.

Audio chunks have the following metadata:

Byte  Description
----  -----------
8-9   Sample rate
10    (Probably) Number of channels / bytes per sample (usually 1)
11    Unknown (usually 0)

Chunk Types

Some known chunk types are:

  • $81: encoded video (used in most Sega CD 32X games by Digital Pictures)
  • $A1: audio, sign/magnitude 8-bit PCM
  • $C1: uncompressed video (used in Night Trap SCD, Sewer Shark, Corpse Killer SCD, and others)
  • $C2: compressed video (used in Corpse Killer SCD, Slam City with Scottie Pippen, and others?)
  • $C4: compressed? video (used in Slam City with Scottie Pippen)
  • $C6: compressed video (used in Night Trap SCD "DPLOGO.SGA", Sewer Shark, Make My Video C&C, and others)
  • $C7: compressed video (used in Prize Fighter and others)
  • $C8: compressed video (used in Sewer Shark, Make My Video C&C, and others)
  • $CB: compressed video (used in Prize Fighter, Double Switch "DPLOGO.SGA", Corpse Killer 32X)
  • $CD: compressed video (used in Double Switch)
  • $D1: uncompressed overlay video (used in Sewer Shark when killing a Ratigator(TM), etc.)
  • $D4: compressed? overlay video (used in Corpse Killer SCD/32X for the zombies)
  • $E7: (un)compressed video (used in the Make My Video series)
  • $E8: compressed video keyframe (used in Ground Zero Texas)
  • $E9: compressed video interframe (used in Ground Zero Texas)
  • $F0: container for other chunks (used in Slam City)
  • $F1: container for other chunks (used in Slam City)
  • $F2: metadata of unknown use

As of this writing, the formats and/or compression schemes of the following types are generally understood: 81, A1, C1, C6, C7, C8, CB, CD, D1, E7, E8, E9, and F1.

Encoded Video $81

Video frames of this type are represented with a series of codes that indicate the size, color, and pixel layout of the graphics to be drawn, rather than, for example, LZSS encoded byte streams as in previous versions of SGA files.

The encoding scheme has been completely reverse-engineered, and a functional decoder has been written. A detailed description of the codes will be added at a later time.

Audio $A1

Audio sample rate can be determined in the following way (for NTSC systems only?):

SamplesPerSecond = ((Byte 8 << 8) + Byte 9) * SEGA_CD_PCM_INCREMENT

SEGA_CD_PCM_INCREMENT is ~15.894572, and is calculated as SEGA_CD_PCM_FREQUENCY_MAX / 2048

SEGA_CD_PCM_FREQUENCY_MAX is ~32552.084, and is calculated as SEGA_CD_CPU_FREQUENCY / 384

SEGA_CD_CPU_FREQUENCY is 12500000, and is calculated as SEGA_CD_CRYSTAL_FREQUENCY / 4

SEGA_CD_CRYSTAL_FREQUENCY is 50000000

The sample rate can also be used to determine the frame rate of video by using the formula SamplesPerSecond / NumSamples, where NumSamples is the length of the audio data in the audio chunk.
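The constant chain and the two formulas above can be written directly in Python, as in the sketch below. The function names are hypothetical. The same "NTSC only?" caveat applies, since the constants assume the 50 MHz crystal.

```python
SEGA_CD_CRYSTAL_FREQUENCY = 50_000_000
SEGA_CD_CPU_FREQUENCY = SEGA_CD_CRYSTAL_FREQUENCY / 4     # 12,500,000
SEGA_CD_PCM_FREQUENCY_MAX = SEGA_CD_CPU_FREQUENCY / 384   # ~32552.083
SEGA_CD_PCM_INCREMENT = SEGA_CD_PCM_FREQUENCY_MAX / 2048  # ~15.894572

def audio_sample_rate(byte8: int, byte9: int) -> float:
    """Samples per second from bytes 8-9 of an $A1 audio chunk."""
    return ((byte8 << 8) + byte9) * SEGA_CD_PCM_INCREMENT

def video_frame_rate(samples_per_second: float, num_samples: int) -> float:
    """Frame rate, where num_samples is the length of one audio chunk's data."""
    return samples_per_second / num_samples
```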

Uncompressed Video $C1

For compatibility with the Genesis' video hardware, video frames in this format (and all its derivatives) are made up of linear 8x8 pixel tiles. Each pixel consists of a 4-bit (one nibble) palette index, thus each tile takes up 32 bytes. The length of the tile data can be calculated as tilesPerColumn * tilesPerRow * 32.
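Unpacking a single 32-byte tile into palette indices might look like the sketch below. It assumes the usual Genesis VDP pattern layout: rows stored top to bottom, 4 bytes per row, and the high nibble of each byte holding the left pixel. The text above does not state the nibble order explicitly, and `decode_tile` is a hypothetical name.

```python
TILE_BYTES = 32  # 8x8 pixels, 4 bits each

def decode_tile(tile: bytes) -> list[list[int]]:
    """Unpack one tile into 8 rows of 8 palette indices (0-15).

    Assumption: high nibble = left pixel of each pair (Genesis convention).
    """
    if len(tile) != TILE_BYTES:
        raise ValueError("a tile is exactly 32 bytes")
    rows = []
    for r in range(8):
        row = []
        for byte in tile[r * 4:(r + 1) * 4]:
            row.extend((byte >> 4, byte & 0x0F))
        rows.append(row)
    return rows
```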

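Palette data immediately follows the tile data. Each palette is 18 bytes long. Palettes are stored in an unusual format. The Genesis normally uses RGB or BGR values stored in nibbles (even though only the top 3 bits of each nibble are used). SGA palettes instead store each of the three bits of each color component as a separate 16-bit plane, with one bit per color. Three components of three planes each give 3 × 3 × 16 bits = 18 bytes. The planes can be decoded as follows: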

bitmap={1,2,4}

 For bit=0 to 2
    for color=0 to 15
        red[color]+=Top Most Bit of Data *bitmap[bit]
    next
 next

Repeat for green and blue.
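A runnable version of the pseudocode above is sketched below. It assumes the planes appear in the order red, green, blue (as "Repeat for green and blue" suggests) and that bits are consumed from the most significant bit of the first byte onward. `decode_palette` is a hypothetical name.

```python
def decode_palette(pal: bytes) -> list[tuple[int, int, int]]:
    """Decode one 18-byte palette into 16 (red, green, blue) values, each 0-7.

    For each component, three 16-bit planes with weights 1, 2, 4 are read,
    one bit per color, most significant bit first.
    """
    if len(pal) != 18:
        raise ValueError("a palette is exactly 18 bytes")
    bits = int.from_bytes(pal, "big")
    bit_pos = 18 * 8  # bits not yet consumed
    components = []
    for _ in range(3):                 # red, then green, then blue
        values = [0] * 16
        for weight in (1, 2, 4):       # bitmap={1,2,4}
            for color in range(16):
                bit_pos -= 1
                values[color] += ((bits >> bit_pos) & 1) * weight
        components.append(values)
    red, green, blue = components
    return list(zip(red, green, blue))
```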

Two bits are read for each tile to determine which of the 4 palettes it uses:

 For Row=0 to NumRows-1
    For Col=0 to NumCols-1
        PalMap[Row*NumCols+Col]=Top 2 Bits of data
    Next
 Next

When drawing a tile, select its palette based on PalMap. Note that palette maps for frames with 2 palettes use only 1 bit per entry, versus 2 bits per entry for frames with 3 or 4 palettes.
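A sketch of a palette map reader that handles both entry widths is shown below. It reads bits most significant first. `read_palette_map` is a hypothetical name.

The handling of one-palette frames is an assumption, since the text does not say whether they carry a palette map at all. Here every tile is simply given palette 0.

```python
def read_palette_map(data: bytes, num_tiles: int, palette_count: int) -> list[int]:
    """Return one palette index per tile.

    2-palette frames use 1 bit per entry; 3- and 4-palette frames use 2 bits.
    Assumption: a 1-palette frame has no map, so every tile uses palette 0.
    """
    if palette_count == 1:
        return [0] * num_tiles
    bits_per_entry = 1 if palette_count == 2 else 2
    mask = (1 << bits_per_entry) - 1
    total_bits = len(data) * 8
    if num_tiles * bits_per_entry > total_bits:
        raise ValueError("palette map data is too short")
    value = int.from_bytes(data, "big")
    entries = []
    for i in range(num_tiles):
        shift = total_bits - (i + 1) * bits_per_entry  # MSB-first position
        entries.append((value >> shift) & mask)
    return entries
```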

If a frame has a tilemap, the tilemap immediately follows the palette data. Each tilemap entry is 16 bits long, and so the length of the tilemap in bytes is tilesPerColumn * tilesPerRow * 2. Tilemaps also contain palette information, and so frames with tilemaps do not contain a separate palette map. The top 4 bits of a tilemap entry indicate the palette index of the tile, and the remaining 12 bits indicate the index of the tile itself.
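Following the description above, each tilemap entry can be split into its two fields as in the sketch below. `parse_tilemap` is a hypothetical name. It does not try to separate out the flipping flags mentioned earlier, since their bit positions are not given.

```python
def parse_tilemap(data: bytes, tiles_per_column: int, tiles_per_row: int) -> list[tuple[int, int]]:
    """Split each big-endian 16-bit entry into (palette index, tile index)."""
    count = tiles_per_column * tiles_per_row
    if len(data) < count * 2:
        raise ValueError("tilemap data is too short")
    entries = []
    for i in range(count):
        entry = int.from_bytes(data[2 * i:2 * i + 2], "big")
        entries.append((entry >> 12, entry & 0x0FFF))  # top 4 bits, low 12 bits
    return entries
```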

Compressed Video $C2

Not much is currently known about this type of chunk, except that it contains four additional bytes of metadata and that, in contrast to other chunk types, palette data immediately follows the metadata instead of following the tile data. The four additional bytes of metadata contain two 16-bit values, which are presumed to represent the lengths of different types of data stored in the chunk, but their exact meaning is as yet unknown.

Compressed Video $C6, $C7, $C8

Chunks in this format contain the same basic elements as chunks of type C1 (tiles, palette, palette map, tilemap, etc.), but are compressed in LZSS format. The compressed data consist of a series of 34-byte LZSS blocks. Each block contains a one-word (2-byte) tag of compression flags followed by 16 words of data.

The tag is read in bits, starting with the most significant (left most) bit. For each bit set to 0, there is an uncompressed word literal. For each bit set to 1, the following word is a displacement/length reference in the following format:

LLLD DDDD DDDD DDDD

L = Number of words to copy (the number of bytes copied is L * 2)
D = Displacement

This may be calculated as:

for count = 0 to (top 3 bits of LZ word * 2) - 1
   data[current + count] = data[current + count - last 13 bits of LZ word]
next

Note that the displacement, unlike the copy amount, is based on bytes, not words. Also, the displacement does not have to be word aligned.

After the entire tag is read, the next flag block is read, and the process continues. The sequence ends when a reference word's top three bits are all zeros.
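The C6/C8 procedure above can be sketched in Python as follows. `lzss_decompress_c6` is a hypothetical name.

The copy is done byte by byte, so overlapping references and odd (non-word-aligned) displacements behave as the pseudocode describes. The C7 "minus one" variant is not handled, because its end-of-stream condition is not described here.

```python
def lzss_decompress_c6(data: bytes) -> bytes:
    """Decompress a C6/C8-style stream: LLLD DDDD DDDD DDDD references.

    Each block is a 16-bit tag followed by up to 16 words. Tag bits are read
    MSB first: 0 = literal word, 1 = back-reference. L == 0 ends the stream.
    """
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data):
        tag = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        for bit in range(15, -1, -1):
            if pos + 2 > len(data):
                return bytes(out)             # ran out of input
            word = data[pos:pos + 2]
            pos += 2
            if not (tag >> bit) & 1:
                out += word                   # uncompressed word literal
                continue
            ref = int.from_bytes(word, "big")
            length_words = ref >> 13
            displacement = ref & 0x1FFF       # in bytes, need not be even
            if length_words == 0:
                return bytes(out)             # end-of-stream marker
            if displacement == 0 or displacement > len(out):
                raise ValueError("invalid back-reference")
            for _ in range(length_words * 2):
                out.append(out[len(out) - displacement])
    return bytes(out)
```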

In C7 format chunks, the top three bits represent the amount of words to copy minus one. Any decoder that implements C7 decoding must take this into account.

Most chunks in C6 format, and all chunks in C8 and E7 formats, require that adjacent pixels in even-numbered lines be swapped for frames to be displayed correctly. The exact reason for this is unclear. Sega CD video often contains a lot of checkerboard dithering, and swapping pixels would eliminate the checkerboard patterns and lead to more identical/redundant data, and thus to more efficient compression. There is currently no known way to determine which C6 chunks require pixel swapping; however, it seems that pixel swapping only occurs in C6 frames that use fewer than 3 palettes and do not use a tilemap.
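Undoing the swap on a decoded frame could look like the sketch below. It assumes the frame is held as a list of pixel rows and that lines are counted from 0, so the first line is swapped. The text does not say whether numbering starts at 0 or 1. `unswap_even_lines` is a hypothetical name.

```python
def unswap_even_lines(pixels: list[list[int]]) -> list[list[int]]:
    """Swap pixel pairs (0<->1, 2<->3, ...) on even-numbered lines.

    Assumption: line numbering starts at 0.
    """
    result = []
    for y, line in enumerate(pixels):
        line = list(line)  # do not modify the caller's rows
        if y % 2 == 0:
            for x in range(0, len(line) - 1, 2):
                line[x], line[x + 1] = line[x + 1], line[x]
        result.append(line)
    return result
```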

Compressed Video $CB, $CD, $E7

Like other compressed chunk formats, chunks in this format consist of a series of 34-byte LZSS blocks. Each block contains a one-word (2-byte) tag of compression flags followed by 16 words of data.

For each bit set to 1, the following word is a displacement/length reference in the following format:

LLLL DDDD DDDD DDDD

L = Number of words to copy minus one (the number of bytes copied is (L + 1) * 2)
D = Displacement

This may be calculated as:

for count = 0 to ((top 4 bits of LZ word + 1) * 2) - 1
   data[current + count] = data[current + count - last 12 bits of LZ word]
next
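A sketch of the CB/CD/E7 variant is shown below. `lzss_decompress_cb` is a hypothetical name, and two assumptions are made.

First, tag bits are read most significant first, with 0 meaning a literal word as in the C6 format. Second, decoding simply stops when the input is exhausted, since no end marker is described for these chunks.

```python
def lzss_decompress_cb(data: bytes) -> bytes:
    """Decompress a CB/CD/E7-style stream: LLLL DDDD DDDD DDDD references.

    L is the number of words to copy minus one; D is a byte displacement.
    Assumption: no end marker, so decoding runs until the input is used up.
    """
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data):
        tag = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        for bit in range(15, -1, -1):
            if pos + 2 > len(data):
                break
            word = data[pos:pos + 2]
            pos += 2
            if not (tag >> bit) & 1:
                out += word                         # literal word
                continue
            ref = int.from_bytes(word, "big")
            byte_count = ((ref >> 12) + 1) * 2      # (L + 1) words
            displacement = ref & 0x0FFF
            if displacement == 0 or displacement > len(out):
                raise ValueError("invalid back-reference")
            for _ in range(byte_count):
                out.append(out[len(out) - displacement])
    return bytes(out)
```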

Chunks of type E7 are used in the Make My Video series and contain three subframes, which are stacked on top of each other to complete the whole frame. E7 chunks have an additional six bytes of metadata containing three 16-bit values. The topmost bit of each value indicates whether the data for that subframe is raw (1) or compressed (0). The remaining 15 bits represent the length of the data for the subframe, with an apparent maximum value of 0x1500. The data for each subframe immediately follows the metadata in order, and is followed by 180 bytes of palette data in the usual format.
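The subframe descriptors can be turned into data offsets as sketched below. `parse_e7_subframes` is a hypothetical name. The caller supplies `offset`, the position of the six extra metadata bytes within the chunk, because the text does not say at which byte they start.

```python
def parse_e7_subframes(chunk: bytes, offset: int) -> tuple[list[dict], int]:
    """Read the three 16-bit subframe descriptors of an E7 chunk.

    Returns the subframe list and the offset where palette data begins.
    """
    subframes = []
    data_start = offset + 6  # subframe data follows the metadata, in order
    for i in range(3):
        value = int.from_bytes(chunk[offset + 2 * i:offset + 2 * i + 2], "big")
        length = value & 0x7FFF
        subframes.append({
            "raw": bool(value & 0x8000),  # 1 = raw, 0 = compressed
            "start": data_start,
            "length": length,
        })
        data_start += length
    return subframes, data_start  # 180 bytes of palette data follow
```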

Uncompressed Overlay Video $D1

As far as is currently known, chunks of type D1 are only used in the game Sewer Shark. The purpose of these chunks is to allow the game to overlay tiles onto the current video stream in order to display enemy death animations in a way that is both space-efficient and also compatible with the streaming nature of FMV games.

Chunks of type D1 are basically the same as chunks of type C1, except for a few crucial differences. In chunks of this type, byte 10 indicates the number of tiles in the chunk, and byte 11 indicates the length of the data for what are assumed to be codes that specify the layout of the tiles. Since each tile takes up 32 bytes, the length of the tile data is equal to the value in byte 10 * 32. Tile data is immediately followed by layout codes and a palette map. The chunks contain no palette data of their own, but instead inherit the palette data of the normal video frame they most immediately precede. All chunks in Sewer Shark use SMPTE time codes, so a decoder written to handle video overlay chunks can use these codes to ensure that video is being laid over the correct frame.

Layout codes consist of one or two bytes. The most commonly seen codes are $D1, $D000, $Cy0x, $8x, and $4y0x. Code D1 is basically a line break + carriage return. D000 starts a new line and places one tile at column 17 (i.e. the right edge). Cy0x starts a new line and places x+1 tiles starting from column y+1. Code 8x places x+1 tiles starting from column 0. Code 4y0x places x+1 tiles offset y+1 columns to the right of where the last tile was placed. Tiles are laid in the order in which they are stored in the chunk.

Another way to look at it is this: The topmost bit indicates a line break and carriage return. The next most significant bit indicates that the low nibble of the current byte indicates the zero-based number of columns from the left that the next tile should be offset, and that the following byte indicates the zero-based number of tiles to be placed. The fourth bit indicates that the next tile should be placed 16(+1) columns from the left edge. The third bit is unused, although it is possible that it could be used to indicate that the next tile should be placed 32(+1) columns from the edge.

Palette maps in chunks of this type use a single bit to indicate the palette of each tile, since in the case of Sewer Shark, most of the video seen during gameplay uses only two palettes, and so the palette map only needs one bit for each entry (0 = palette #1, and 1 = palette #2).

Compressed Video $E8, $E9

Frames of type E8 and E9 are found in Ground Zero Texas. Video in this game is double-buffered (i.e. there are two frames of video in VRAM at any given time) and runs at 12 frames per second. Each second of video footage in the game consists of a single E8 keyframe followed by eleven E9 interframes. Frames of type E9 can inherit some or all of their frame data (tile, palette, and palette map data) from the frame which most immediately precedes them. Once decompressed, these chunks behave the same as "normal" chunks of type C1.

Frames of type E8 use intraframe compression. Tiles are stored as a series of variable-length chunks. At the beginning of each chunk is a single byte code word. Code word $00 indicates that the tile is raw (i.e. the next 32 bytes represent raw tile data). Code words $01 through $03 indicate compressed tiles. These code words are followed by 32 bits of compression flags, followed by literal data bytes (if any). A bit value of 0 in a compression flag indicates that a literal byte should be copied from data that follows the code word and compression flags. A bit value of 1 indicates that a byte should be copied from already decompressed data at the offset indicated by the code word.
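Decoding one E8 tile chunk could look like the sketch below, using the distances listed for codes $01 to $03 in the summary table. `decode_e8_tile` and `E8_OFFSETS` are hypothetical names.

The sketch assumes the 32 flag bits are big-endian and read most significant first, one flag per output byte. It also assumes copies are taken relative to the position of the byte being written in the frame decoded so far. $FF handling is left out.

```python
E8_OFFSETS = {0x01: 1, 0x02: 4, 0x03: 8}  # code word -> bytes back in current frame

def decode_e8_tile(data: bytes, pos: int, out: bytearray) -> int:
    """Decode the tile chunk at data[pos], appending 32 bytes to out.

    Returns the position just past the chunk.
    """
    code = data[pos]
    pos += 1
    if code == 0x00:                          # raw tile: next 32 bytes
        out += data[pos:pos + 32]
        return pos + 32
    if code not in E8_OFFSETS:
        raise ValueError(f"unhandled code word ${code:02X}")
    back = E8_OFFSETS[code]
    flags = int.from_bytes(data[pos:pos + 4], "big")
    pos += 4
    for i in range(32):
        if flags & (1 << (31 - i)):           # 1: copy already decompressed byte
            if len(out) < back:
                raise ValueError("copy before start of frame")
            out.append(out[len(out) - back])
        else:                                 # 0: literal byte
            out.append(data[pos])
            pos += 1
    return pos
```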

Frames of type E9 use both intraframe and interframe compression with motion compensation. In addition to code words $00 through $03, they use nine additional code words: $05 to $0D. These code words indicate that data should be copied from the immediately preceding frame starting at a specified offset from the end of the data currently being decompressed.

Below is a summary of all the compression code words found in chunks of type E8 and E9:

Code   Frame      Offset
----   -----      ------
$01    current    -1
$02    current    -4
$03    current    -8
$05    previous   0
$06    previous   -1
$07    previous   +1
$08    previous   -4
$09    previous   +4
$0A    previous   -8
$0B    previous   +8
$0C    previous   -32
$0D    previous   +32

There is also a code word $FF, which is generally found in two places: when the length of the data currently being decompressed reaches 10240 bytes, and immediately before the last 32 bytes of data (which are uncompressed). If an $FF is found at an odd offset, then the following byte should be skipped (for proper word alignment). If an $FF is found at an even offset, then it is not necessary to skip the following byte. It is unclear why this code word appears at 10240 bytes, although it may have something to do with RAM addressing, etc.
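The code word table and the $FF alignment rule can be expressed as two small helpers, sketched below. `COPY_SOURCES`, `copy_source` and `skip_ff` are hypothetical names. Two interpretations are assumptions.

- **Copy offsets:** they are taken relative to `len(out)`, the end of the data decompressed so far.
- **$FF offset:** the "odd/even offset" is measured from the start of the chunk's compressed data, i.e. `pos` in `data`.

```python
# (frame, offset) for each code word, from the table above
COPY_SOURCES = {
    0x01: ("current", -1), 0x02: ("current", -4), 0x03: ("current", -8),
    0x05: ("previous", 0), 0x06: ("previous", -1), 0x07: ("previous", +1),
    0x08: ("previous", -4), 0x09: ("previous", +4), 0x0A: ("previous", -8),
    0x0B: ("previous", +8), 0x0C: ("previous", -32), 0x0D: ("previous", +32),
}

def copy_source(code: int, out: bytes, prev: bytes) -> int:
    """Return the byte that a set flag copies for the given code word.

    out is the current frame decoded so far; prev is the preceding frame.
    """
    frame, offset = COPY_SOURCES[code]
    index = len(out) + offset
    source = out if frame == "current" else prev
    if not 0 <= index < len(source):
        raise IndexError("copy source out of range")
    return source[index]

def skip_ff(data: bytes, pos: int) -> int:
    """Step past an $FF at data[pos], plus one padding byte if pos is odd."""
    if data[pos] != 0xFF:
        return pos
    return pos + 2 if pos % 2 == 1 else pos + 1
```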

Games Using SGA

  • Night Trap
  • Sewer Shark
  • Power Factory Featuring C&C Music Factory
  • Make My Video series
  • Ground Zero: Texas
  • Prize Fighter
  • Double Switch
  • Slam City with Scottie Pippen
  • Corpse Killer
  • Supreme Warrior

Decoders/Converters

  • A decoder has been written that can currently decode SGA files from Night Trap (SCD and 32X), Sewer Shark (including stream selection), Corpse Killer (SCD and 32X), Prize Fighter, Double Switch, as well as various other games, and convert them to AVI format files. It has yet to be released to the public.