SGA

SGA is a chunk-based multimedia file format used primarily in games by Digital Pictures for the Sega CD console system. Early versions of the format store uncompressed video frames, while later revisions add features such as LZ compression, tile maps, and overlapping chunks. The extension for SGA files is usually ".SGA", but some files from Supreme Warrior have the extensions ".CLP" and ".PT1", and some files from Slam City with Scottie Pippen have extensions that include numbers (e.g. ".SG0", ".S37", etc.). Variations of the format are found in versions of Digital Pictures games for DOS PCs and the 3DO. In those versions all video and audio is stored in large, monolithic files rather than individual files.

File Format

All multi-byte numbers in an SGA file are big-endian, since the Sega Genesis and Sega CD both use big-endian Motorola 68000 CPUs.

There are three known methods of storing data in SGA files:

(1) In the majority of files, each chunk is stored in 2048-byte sectors, reflecting the sector size of CD storage. The first sector in the file contains 2048 bytes of data; each subsequent sector contains a 2-byte header specifying how much of the chunk is left, followed by 2046 bytes of data. If the value of the header is zero, skip the next 2046 bytes and check the next header. (This presumably has something to do with padding the CD for faster loading.)
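The sector layout of method (1) can be sketched in Python as follows. This is a minimal illustration, not a tested decoder: the function name iter_sector_payloads is hypothetical, and the code only separates sector headers from payloads. It does not interpret the "how much is left" value beyond skipping sectors whose header is zero.

```python
import struct

SECTOR_SIZE = 2048  # CD sector size used by SGA files of type (1)


def iter_sector_payloads(data: bytes):
    """Yield (remaining, payload) for each sector of a type (1) file.

    `remaining` is None for the first sector, which carries no header.
    Sectors whose 2-byte header is zero are skipped entirely.
    """
    for offset in range(0, len(data), SECTOR_SIZE):
        sector = data[offset:offset + SECTOR_SIZE]
        if offset == 0:
            yield None, sector  # first sector: 2048 bytes of data
            continue
        if len(sector) < 2:
            break  # truncated file, nothing left to read
        (remaining,) = struct.unpack_from(">H", sector, 0)  # big-endian
        if remaining == 0:
            continue  # padding sector: skip 2046 bytes, check next header
        yield remaining, sector[2:]
```

A caller would concatenate the yielded payloads and then parse chunk headers from the resulting stream.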

(2) Some files, in particular the audio-only files in Night Trap (all variations), do not use length indicators at the start of sectors. The files contain strictly chunk headers and data.

(3) Later versions of the SGA format, as seen in Slam City with Scottie Pippen, Prize Fighter, and Supreme Warrior, introduced a scheme in which chunks can overlap/interrupt other chunks. In this type of file, each sector begins with a two-byte sector header, with the top four bits set to a non-zero value, which we might call the chunk index. If the following twelve bits are set to zero, then we are at the start of a new chunk using the current index. If the following twelve bits are non-zero, then we are looking at a length value, similar to previous formats. A decoder written to handle this type of file would have to keep track of multiple chunks simultaneously. Videos used in Slam City use a container chunk (type F1) which generally contains a video chunk followed by an audio chunk. Videos in Supreme Warrior begin with what appears to be a global header (type F0) in addition to the F1 container chunk, but at present it is unknown how to interpret the metadata. There is also another chunk type, F2, whose use is presently unknown. F2 chunks can also exist in separate files with the extension ".F2". So far, all that is known about F2 chunks is that they often contain what appear to be filenames.
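As an illustration of the sector header used by method (3), the following Python sketch splits the two-byte header into the chunk index and the twelve-bit field. The function name parse_overlap_sector_header and the tuple layout are assumptions made for this example.

```python
def parse_overlap_sector_header(header: bytes):
    """Split a type (3) sector header into its fields.

    Returns (chunk_index, starts_new_chunk, length):
      chunk_index      -- top four bits of the big-endian header word
      starts_new_chunk -- True when the low twelve bits are all zero
      length           -- the low twelve bits (a length value when non-zero)
    """
    word = int.from_bytes(header[:2], "big")
    chunk_index = word >> 12
    low12 = word & 0x0FFF
    return chunk_index, low12 == 0, low12
```

A decoder could keep one buffer per chunk index and append each sector's data to the buffer selected by the header.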

Most SGA files do not contain a global header, but have headers for each chunk. In files of type (1) and (2), headers are never split across sector boundaries. In files of type (3), chunk headers can be split across sector boundaries as long as they are contained within a chunk of type F1. Chunks are word-aligned and can contain video or audio. The basic header is 4 bytes long, and is shared by all chunk types.

Byte  Description
----  -----------
0     Chunk type 
1     Stream index (Sewer Shark in particular uses this for its branching path-based gameplay)
2-3   Payload length

This is followed by more metadata:

Byte  Description
----  -----------
4-7   Time in SMPTE format
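A minimal Python sketch of reading the first eight header bytes described above is shown here. The names ChunkHeader and parse_chunk_header are hypothetical. The four time code bytes are returned raw, since their internal layout is not specified on this page.

```python
import struct
from typing import NamedTuple


class ChunkHeader(NamedTuple):
    chunk_type: int      # byte 0
    stream_index: int    # byte 1
    payload_length: int  # bytes 2-3, big-endian
    timecode: bytes      # bytes 4-7, left uninterpreted


def parse_chunk_header(buf: bytes, offset: int = 0) -> ChunkHeader:
    """Parse the basic 4-byte header plus the 4 time code bytes."""
    chunk_type, stream_index, payload_length = struct.unpack_from(
        ">BBH", buf, offset)
    timecode = bytes(buf[offset + 4:offset + 8])
    return ChunkHeader(chunk_type, stream_index, payload_length, timecode)
```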

Video chunks for the Sega CD contain the following metadata:

Byte  Description
----  -----------
8     Data flags
9     Palette count (1-4)

The data flag bits in byte 8 are used as follows:

 MSB                 LSB
 d7 d6 d5 d4 d3 d2 d1 d0

 d7 = data contains a tile map
 d6 = unknown
 d5 = tile map is compressed
 d4 = unknown
 d3 = unknown
 d2 = unknown, usually set
 d1 = unknown
 d0 = unknown, usually set

There may be some connection between bit d0 and tilemaps, since tilemap behavior seems to change when the bit is not set.
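The known flag bits can be tested with simple masks. The sketch below (function name and dictionary keys are illustrative only) exposes the two understood bits, and also reports d2 and d0 without assigning them a meaning.

```python
def decode_video_flags(flags: int) -> dict:
    """Decode byte 8 of a Sega CD video chunk header."""
    return {
        "has_tile_map": bool(flags & 0x80),         # d7
        "tile_map_compressed": bool(flags & 0x20),  # d5
        "d2": bool(flags & 0x04),                   # unknown, usually set
        "d0": bool(flags & 0x01),                   # unknown, usually set
    }
```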

Metadata in video chunks for the Sega CD 32X are as follows:

Byte  Description
----  -----------
8     Palette start offset (usually 1)
9     Palette update size (a value of 0 means use existing palette)

Then more metadata (both Sega CD and Sega CD 32X):

Byte  Description
----  -----------
10    tiles per column
11    tiles per row

In overlay chunk types D1 (and possibly D4), byte 10 indicates the number of tiles in the chunk, and byte 11 indicates the length of the layout data for the tiles. Since each tile takes up 32 bytes, the length of the tile data is equal to the value in byte 10 * 32. Tile data is immediately followed by layout codes and a palette map.

Certain types of video chunks contain additional metadata, which is detailed below in the sections related to the various chunk types.

Audio chunks have the following metadata:

Byte  Description
----  -----------
8-9   Sample rate
10    (Probably) Number of channels / bytes per sample (usually 1)
11    Unknown (usually 0)

Chunk Types

Some known chunk types are:

  • $81: encoded video (used in most Sega CD 32X games by Digital Pictures)
  • $A1: audio, sign/magnitude 8-bit PCM
  • $C1: uncompressed video (used in Night Trap SCD, Sewer Shark, Corpse Killer SCD, and others)
  • $C2: compressed video (used in Corpse Killer SCD, Slam City with Scottie Pippen, and others?)
  • $C4: compressed video (used in Slam City with Scottie Pippen)
  • $C6: compressed video (used in Night Trap SCD "DPLOGO.SGA", Sewer Shark, Make My Video C&C, and others)
  • $C7: compressed video (used in Prize Fighter and others)
  • $C8: compressed video (used in Sewer Shark, Make My Video C&C, and others)
  • $CB: compressed video (used in Prize Fighter, Double Switch "DPLOGO.SGA", Corpse Killer 32X)
  • $CD: compressed video (used in Double Switch)
  • $D1: uncompressed overlay video (used in Sewer Shark when killing a Ratigator(TM), etc)
  • $D4: compressed overlay video (used in Corpse Killer SCD/32X for the zombies)
  • $E7: (un)compressed video (used in the Make My Video series)
  • $E8: compressed video keyframe (used in Ground Zero Texas)
  • $E9: compressed video interframe (used in Ground Zero Texas)
  • $F0: container for other chunks (used in Slam City and other games)
  • $F1: container for other chunks (used in Slam City and other games)
  • $F2: metadata of unknown use

As of this writing, the formats and/or compression schemes of the following types are generally understood: 81, A1, C1, C6, C7, C8, CB, CD, D1, D4, E7, E8, E9, and F1.

Encoded Video $81

Video frames of this type are represented with a series of codes that indicate the size, color, and pixel layout of the graphics to be drawn, rather than, for example, LZSS encoded byte streams as in previous versions of SGA files.

The encoding scheme has been completely reverse-engineered, and a functional decoder has been written. A detailed description of the codes will be added at a later time.

Audio $A1

Audio sample rate can be determined in the following way (for NTSC systems only?):

SamplesPerSecond = ((Byte 8 << 8) + Byte 9) * SEGA_CD_PCM_INCREMENT

SEGA_CD_PCM_INCREMENT is ~15.8945723, and is calculated as SEGA_CD_PCM_FREQUENCY_MAX / 2048

SEGA_CD_PCM_FREQUENCY_MAX is ~32552.084, and is calculated as SEGA_CD_CPU_FREQUENCY / 384

SEGA_CD_CPU_FREQUENCY is 12500000, and is calculated as SEGA_CD_CRYSTAL_FREQUENCY / 4

SEGA_CD_CRYSTAL_FREQUENCY is 50000000

The sample rate can also be used to determine the frame rate of video by using the formula SamplesPerSecond / NumSamples, where NumSamples is the length of the audio data in the audio chunk.
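The constants and formulas above translate directly into Python. The function names are chosen for this example; the frame rate helper assumes one audio chunk per video frame, as the formula above implies.

```python
SEGA_CD_CRYSTAL_FREQUENCY = 50_000_000
SEGA_CD_CPU_FREQUENCY = SEGA_CD_CRYSTAL_FREQUENCY / 4      # 12500000
SEGA_CD_PCM_FREQUENCY_MAX = SEGA_CD_CPU_FREQUENCY / 384    # ~32552.084
SEGA_CD_PCM_INCREMENT = SEGA_CD_PCM_FREQUENCY_MAX / 2048   # ~15.8945723


def samples_per_second(byte8: int, byte9: int) -> float:
    """Sample rate from bytes 8 and 9 of an audio chunk header."""
    return ((byte8 << 8) + byte9) * SEGA_CD_PCM_INCREMENT


def frames_per_second(byte8: int, byte9: int, num_samples: int) -> float:
    """Video frame rate, given the length of the audio data in one chunk."""
    return samples_per_second(byte8, byte9) / num_samples
```

For example, a header value of 0x0400 corresponds to roughly 16276 samples per second.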

Uncompressed Video $C1

For compatibility with the Genesis' video hardware, video frames in this format (and all its derivatives) are made up of linear 8x8 pixel tiles. Each pixel consists of a 4-bit (one nibble) palette index, thus each tile takes up 32 bytes. The length of the tile data can be calculated as tilesPerColumn * tilesPerRow * 32.
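Unpacking one 32-byte tile into palette indices can be sketched as below. The function name is hypothetical, and the nibble order is an assumption: the high nibble of each byte is taken as the left pixel, following the usual Genesis tile layout.

```python
def decode_tile(tile: bytes) -> list:
    """Unpack a 32-byte tile into 8 rows of 8 palette indices (0-15).

    Assumes 4 bytes per row and that the high nibble is the left pixel.
    """
    if len(tile) != 32:
        raise ValueError("a tile is exactly 32 bytes")
    rows = []
    for y in range(8):
        row = []
        for value in tile[y * 4:(y + 1) * 4]:
            row.append(value >> 4)    # left pixel (assumed)
            row.append(value & 0x0F)  # right pixel (assumed)
        rows.append(row)
    return rows
```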

Palette data immediately follows the tile data. Each palette is 18 bytes long. Palettes are stored in an unusual format: the Genesis normally uses RGB or BGR stored in nibbles (even though only the top 3 bits of each nibble are used), whereas here each color channel is stored as three bit planes of 16 bits each, one bit per color.

bitmap = {1, 2, 4}

 For bit = 0 to 2
    For color = 0 to 15
        red[color] += (next bit of data, topmost bit first) * bitmap[bit]
    Next
 Next

Repeat for green and blue.
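The pseudocode above can be written as a runnable Python sketch. The function name decode_palette is hypothetical. The sketch assumes the channels are stored in the order red, green, blue and that bits are consumed from the most significant bit of each byte first.

```python
def decode_palette(data: bytes) -> list:
    """Decode one 18-byte palette into 16 (r, g, b) tuples, each 0-7."""
    if len(data) < 18:
        raise ValueError("a palette is 18 bytes long")
    bits = int.from_bytes(data[:18], "big")  # 144 bits in stream order

    def bit_at(index: int) -> int:
        # index 0 is the first bit in the stream (top bit of byte 0)
        return (bits >> (143 - index)) & 1

    palette = [[0, 0, 0] for _ in range(16)]
    pos = 0
    for channel in range(3):        # assumed order: red, green, blue
        for plane in range(3):      # plane weights 1, 2, 4
            for color in range(16):
                palette[color][channel] += bit_at(pos) << plane
                pos += 1
    return [tuple(entry) for entry in palette]
```

The resulting 3-bit components can then be scaled to 8 bits for display.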

For each tile, 2 bits are read to determine which of the 4 palettes to use.

 For Row = 0 to tilesPerColumn - 1
    For Col = 0 to tilesPerRow - 1
        PalMap[Row * tilesPerRow + Col] = next 2 bits of data (topmost bits first)
    Next
 Next

When drawing a tile you would select the palette based upon PalMap. Note that palette maps for 2 palette frames use only 1 bit per palette map entry vs 2 bits for 3 or 4 palettes.
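Reading a palette map can be sketched as follows. The function name read_palette_map is hypothetical, and the code assumes entries are packed from the most significant bit of each byte downward. Frames with a single palette are not described above, so the sketch rejects them rather than guessing.

```python
def read_palette_map(data: bytes, tile_count: int, palette_count: int) -> list:
    """Return one palette number per tile from a packed palette map."""
    if palette_count == 2:
        bits_per_entry = 1
    elif palette_count in (3, 4):
        bits_per_entry = 2
    else:
        raise ValueError("palette map layout not described for this count")
    mask = (1 << bits_per_entry) - 1
    entries = []
    for i in range(tile_count):
        bit_pos = i * bits_per_entry
        value = data[bit_pos // 8]
        shift = 8 - bits_per_entry - (bit_pos % 8)  # MSB-first (assumed)
        entries.append((value >> shift) & mask)
    return entries
```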

If a frame has a tile map, the tile map immediately follows the palette data. Each entry in a raw tile map is 16 bits long, and so the length of the tilemap in bytes is tilesPerColumn * tilesPerRow * 2. Tile maps also contain palette maps, so frames with tile maps do not contain a separate palette map. Below is the structure of a tile map entry:

MSB    LSB
d15 ... d0

Bit        Purpose
---        -------
d15        unknown / reserved (usually set to 0)
d14~13     palette index
d12        flip on Y axis
d11        flip on X axis
d10        unknown / reserved (usually set to 0)
d9~0       tile index (base value of 1)
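The fields of a raw tile map entry can be extracted with shifts and masks, as in this sketch. The function name and dictionary keys are illustrative. The tile index is returned as stored, without adjusting for its base value of 1.

```python
def decode_tile_map_entry(entry: int) -> dict:
    """Split a 16-bit raw tile map entry into its known fields."""
    return {
        "palette": (entry >> 13) & 0x3,   # d14~13
        "flip_y": bool(entry & 0x1000),   # d12
        "flip_x": bool(entry & 0x0800),   # d11
        "tile_index": entry & 0x03FF,     # d9~0, base value of 1
    }
```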

Tile maps can also be compressed, as seen in "DPLOGO.SGA" in Corpse Killer for the Sega CD. Compressed tile maps contain both regular 16-bit entries and compressed 8-bit entries. They can be distinguished using the topmost bit (d7) of the first byte of each entry. If the topmost bit of a given byte is unset, then the entry is not compressed. If the topmost bit is set, then the byte is laid out as follows:

Bit        Purpose
---        -------
d7         compression flag bit
d6~5       palette index
d4~0       unknown / reserved (usually set to 0)

The five lowermost bits of the byte are usually left unset. It's possible that the bits could be used to store information about flipping and repeat counts, etc. but such behavior has not been observed as of this writing.

Note that compressed tile map entries do not contain a tile index. The tile index can be determined by adding up the number of compressed tiles so far encountered in the tile map, as frames that use compressed tile maps store their tiles in order.
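A compressed tile map could be walked as in the sketch below. The function name and tuple layout are hypothetical. For compressed entries the sketch reports the zero-based count of compressed entries seen so far; whether a base value of 1 must be added to obtain the tile index is not stated above and is left to the caller.

```python
def decode_compressed_tile_map(data: bytes, entry_count: int) -> list:
    """Decode a mix of 16-bit raw and 8-bit compressed tile map entries.

    Returns a list of tuples:
      ("raw", entry16)                  -- regular 16-bit entry
      ("compressed", palette, ordinal)  -- ordinal counts compressed entries
    """
    entries = []
    pos = 0
    ordinal = 0
    while len(entries) < entry_count:
        first = data[pos]
        if first & 0x80:  # d7 set: compressed one-byte entry
            entries.append(("compressed", (first >> 5) & 0x3, ordinal))
            ordinal += 1
            pos += 1
        else:             # d7 clear: regular big-endian 16-bit entry
            entries.append(("raw", (first << 8) | data[pos + 1]))
            pos += 2
    return entries
```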

Compressed Video $C2, $C4

Not much is currently known about these types of chunk except that they contain four additional bytes of metadata, and in contrast to other chunk types, palette data immediately follows the metadata instead of following the tile data. The four additional bytes of metadata contain two 16-bit values that indicate the lengths of two blocks of data contained within the chunk. The first block of data contains LZSS-like compression codes used for decompressing the tile data in the second block. An additional block of compressed data (whose length can be derived from the length of the chunk minus the lengths of the first and second data blocks) may follow the second block depending on the total size of the compressed data.

Compressed Video $C6, $C7, $C8

Chunks in this format contain the same basic elements as chunks of type C1 (tiles, palette, palette map, tilemap, etc), but are compressed in LZSS format. The compressed data consist of several 34-byte LZSS blocks. Each block contains a one-word (2-byte) tag of compression flags followed by 16 words of data.

The tag is read in bits, starting with the most significant (left most) bit. For each bit set to 0, there is an uncompressed word literal. For each bit set to 1, the following word is a displacement/length reference in the following format:

LLLD DDDD DDDD DDDD

L = Number of words to copy (the number of bytes copied is L * 2)
D = Displacement

This may be calculated as:

for count = 0 to (top 3 bits of LZ word * 2) - 1
   data[current + count] = data[current + count - last 13 bits of LZ word]
next

Note that the displacement, unlike the copy amount, is based on bytes, not words. Also, the displacement does not have to be word aligned.

After all 16 bits of the tag have been processed, the next block is read, and the process continues. The sequence ends when a reference word's top three bits are all zeros.
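The scheme described so far can be sketched as a small Python decoder. This is an illustration written from the description above, not the code of any existing tool. The function name lzss_decode_c6 is hypothetical, and the sketch covers only the 3-bit length / 13-bit displacement layout.

```python
def lzss_decode_c6(src: bytes) -> bytes:
    """Decode LZSS data with 16-bit tags and LLLD DDDD DDDD DDDD references."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(src):
        tag = int.from_bytes(src[pos:pos + 2], "big")
        pos += 2
        for bit in range(15, -1, -1):       # most significant tag bit first
            if pos + 2 > len(src):
                return bytes(out)           # input exhausted
            word = src[pos:pos + 2]
            pos += 2
            if not (tag >> bit) & 1:
                out += word                 # literal word
                continue
            ref = int.from_bytes(word, "big")
            words = ref >> 13               # top 3 bits: words to copy
            displacement = ref & 0x1FFF     # low 13 bits: distance in bytes
            if words == 0:
                return bytes(out)           # end-of-data marker
            if displacement == 0 or displacement > len(out):
                raise ValueError("reference outside decoded data")
            for _ in range(words * 2):      # copy byte by byte
                out.append(out[-displacement])
    return bytes(out)
```

Copying one byte at a time lets a reference overlap the bytes it is producing, which the unaligned displacements appear to require.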

In C7 format chunks, the top three bits represent the amount of words to copy minus one. Any decoder that implements C7 decoding must take this into account.

Most chunks in C6 format, and all chunks in C8 and E7 formats, require that adjacent pixels in even-numbered lines be swapped for frames to be displayed correctly. The exact reason for this is unclear, although since Sega CD video often contains a lot of checkerboard dithering, swapping pixels would eliminate the checkerboard patterns and produce more identical/redundant data, thus leading to more efficient compression. There is currently no known way to determine which C6 chunks require pixel swapping; however, it seems that pixel swapping only occurs in C6 frames that use fewer than 3 palettes and that do not use a tilemap.

Compressed Video $CB, $CD, $E7

Like other compressed chunk formats, chunks in this format consist of several 34 byte LZSS blocks. Each block contains a one word (2 bytes) tag of compression flags followed by 16 words of data.

For each bit set to 1, the following word is a displacement/length reference in the following format:

LLLL DDDD DDDD DDDD

L = Number of words to copy minus one (the number of bytes copied is (L + 1) * 2)
D = Displacement

This may be calculated as:

for count = 0 to ((top 4 bits of LZ word + 1) * 2) - 1
   data[current + count] = data[current + count - last 12 bits of LZ word]
next
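As a small illustration of the reference word layout used by these chunk types, the following sketch splits a word into the number of bytes to copy and the displacement. The function name split_reference_cb is an assumption for this example.

```python
def split_reference_cb(word: int):
    """Split an LLLL DDDD DDDD DDDD reference word.

    Returns (bytes_to_copy, displacement). The top four bits store the
    number of words to copy minus one, so one to sixteen words are copied.
    """
    words = (word >> 12) + 1
    displacement = word & 0x0FFF
    return words * 2, displacement
```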

Chunks of type E7 are used in the Make My Video series, and contain three subframes, which are stacked on top of each other to complete the whole frame. E7 chunks have an additional six bytes of metadata containing three 16-bit values. The topmost bit of each value indicates whether the data for that subframe is raw (1) or compressed (0). The remaining 15 bits represent the length of the data for the subframe, with an apparent maximum value of 0x1500. The data for each subframe immediately follows the metadata in order, which is followed by 180 bytes of palette data in the usual format.
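The six bytes of subframe metadata described above can be read as in this sketch. The function name parse_e7_subframes is hypothetical, and the caller is assumed to pass a buffer that starts at the first of the six bytes.

```python
import struct


def parse_e7_subframes(meta: bytes) -> list:
    """Parse three 16-bit subframe descriptors.

    Returns a list of (is_raw, length) tuples, one per subframe.
    """
    values = struct.unpack_from(">3H", meta, 0)  # three big-endian words
    return [(bool(v & 0x8000), v & 0x7FFF) for v in values]
```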

Uncompressed Overlay Video $D1

As far as is currently known, chunks of type D1 are only used in the game Sewer Shark. The purpose of these chunks is to allow the game to overlay tiles onto the current video stream in order to display enemy death animations in a way that is both space-efficient and also compatible with the streaming nature of FMV games.

Chunks of type D1 are basically the same as chunks of type C1, except for a few crucial differences. In chunks of this type, byte 10 indicates the number of tiles in the chunk, and byte 11 indicates the length of the data for what are assumed to be codes that specify the layout of the tiles. Since each tile takes up 32 bytes, the length of the tile data is equal to the value in byte 10 * 32. Tile data is immediately followed by layout codes and a palette map. The chunks contain no palette data of their own, but instead inherit the palette data of the normal video frame they most immediately precede. All chunks in Sewer Shark use SMPTE time codes, so a decoder written to handle video overlay chunks can use these codes to ensure that video is being laid over the correct frame.

Layout codes consist of one or two bytes. The most commonly seen codes are $D1, $D000, $Cy0x, $8x, and $4y0x. Code D1 is basically a line break + carriage return. D000 starts a new line and places one tile at column 17 (i.e. the right edge). Cy0x starts a new line and places x+1 tiles starting from column y+1. Code 8x places x+1 tiles starting from column 0. Code 4y0x places x+1 tiles offset y+1 columns to the right of where the last tile was placed. Tiles are laid in the order in which they are stored in the chunk.

Another way to look at it is this: The topmost bit (0x80) indicates a line break and carriage return. The next most significant bit (0x40) indicates that the low nibble of the current byte gives the zero-based number of columns from the left that the next tile should be offset, and that the following byte gives the zero-based number of tiles to be placed. The fourth most significant bit (0x10) indicates that the next tile should be placed 16(+1) columns from the left edge. The third most significant bit (0x20) is unused, although it is possible that it could be used to indicate that the next tile should be placed 32(+1) columns from the edge.

Palette maps in chunks of this type use a single bit to indicate the palette of each tile, since in the case of Sewer Shark, most of the video seen during gameplay uses only two palettes, and so the palette map only needs one bit for each entry (0 = palette #1, and 1 = palette #2).

Compressed Overlay Video $D4

Chunks of this type are used in Corpse Killer (both versions) for storing animated sprites (flying zombies, etc). Files containing these chunks usually contain multiple animations. These chunks contain four extra bytes of metadata. The first byte indicates a horizontal pixel offset, and the next byte is a vertical pixel offset. The next two bytes are a 16-bit value that appears to be an ID for the animation. Due to the unusual way in which tiles are laid out, frames always have a width and height that is a multiple of two.

The first frame of each animation contains a single palette in the usual format, but instead of following the tile data (as in other chunk types such as C1), the palette immediately precedes it. Tile data is LZSS compressed in the same way as C6 chunks, except that the copy count is represented by the top four bits of the block header, and the offset is represented by the remaining bits + 1.

Tiles are stored in an unusual order. For example, the figure below shows the difference between the stored order and the layout order of the first two rows of tiles in a frame that is six tiles wide.

Original order
--------------
00 01 02 03 04 05
06 07 08 09 0A 0B

Display order
-------------
00 06 01 07 02 08
03 09 04 0A 05 0B

Tiles can be laid out using the following algorithm. Note that the backslash symbol ("\") means integer division, i.e. the quotient rounded down, which is how integer division behaves for non-negative operands in C and other programming languages.

For row = 0 To heightInTiles - 1
    For col = 0 To widthInTiles - 1
       pos.X = ((col * 2) Mod widthInTiles) + (row Mod 2)
       pos.Y = ((row \ 2) * 2) + (col \ (widthInTiles \ 2))
       rearrangedTileIndex(pos.X, pos.Y) = originalTileIndex(col, row)
    Next
Next
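The layout step can be checked against the six-tile-wide figure above with a short Python sketch. The function name rearrange_tiles is hypothetical. The sketch treats (col, row) as a position in the stored order and (x, y) as the position in the display order, which is the direction that reproduces the figure. It assumes an even width and height.

```python
def rearrange_tiles(original: list, width: int, height: int) -> list:
    """Map tiles from stored order to display order.

    `original` is a row-major list of width * height tiles in stored order.
    """
    arranged = [None] * (width * height)
    for row in range(height):
        for col in range(width):
            x = ((col * 2) % width) + (row % 2)
            y = ((row // 2) * 2) + (col // (width // 2))
            arranged[y * width + x] = original[row * width + col]
    return arranged


# Stored tiles 00..0B, six tiles wide and two tiles high
display = rearrange_tiles(list(range(12)), 6, 2)
```

Here display holds the tiles in the order 00 06 01 07 02 08 / 03 09 04 0A 05 0B, matching the "Display order" figure.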

Compressed Video $E8, $E9

Frames of type E8 and E9 are found in Ground Zero Texas. Video in this game is double-buffered (i.e. there are two frames of video in VRAM at any given time) and runs at 12 frames per second. Each second of video footage in the game consists of a single E8 keyframe followed by eleven E9 interframes. Frames of type E9 can inherit some or all of their frame data (tile, palette, and palette map data) from the frame which most immediately precedes them. When uncompressed, these chunks behave the same as "normal" chunks of type C1.

Frames of type E8 use intraframe compression. Tiles are stored as a series of variable-length chunks. At the beginning of each chunk is a single byte code word. Code word $00 indicates that the tile is raw (i.e. the next 32 bytes represent raw tile data). Code words $01 through $03 indicate compressed tiles. These code words are followed by 32 bits of compression flags, followed by literal data bytes (if any). A bit value of 0 in a compression flag indicates that a literal byte should be copied from data that follows the code word and compression flags. A bit value of 1 indicates that a byte should be copied from already decompressed data at the offset indicated by the code word.

Frames of type E9 use both intraframe and interframe compression with motion compensation. In addition to code words $00 through $03, they use nine additional code words: $05 to $0D. These code words indicate that data should be copied from the immediately preceding frame starting at a specified offset from the end of the data currently being decompressed.

Below is a summary of all the compression code words found in chunks of type E8 and E9:

Code   Frame      Offset
----   -----      ------
$01    current    -1
$02    current    -4
$03    current    -8
$05    previous   0
$06    previous   -1
$07    previous   +1
$08    previous   -4
$09    previous   +4
$0A    previous   -8
$0B    previous   +8
$0C    previous   -32
$0D    previous   +32
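The table above can be expressed as a lookup, together with a sketch of decoding a single E8 tile (code words $00 through $03). Several details here are assumptions not confirmed above: the 32 flag bits are read as a big-endian value from the most significant bit down, and copy offsets are taken relative to the end of the data decoded so far. The names CODE_WORDS and decode_e8_tile are hypothetical, and code word $FF is not handled.

```python
CODE_WORDS = {
    0x01: ("current", -1), 0x02: ("current", -4), 0x03: ("current", -8),
    0x05: ("previous", 0), 0x06: ("previous", -1), 0x07: ("previous", 1),
    0x08: ("previous", -4), 0x09: ("previous", 4), 0x0A: ("previous", -8),
    0x0B: ("previous", 8), 0x0C: ("previous", -32), 0x0D: ("previous", 32),
}


def decode_e8_tile(src: bytes, pos: int, out: bytearray) -> int:
    """Decode one tile at src[pos], append 32 bytes to out, return new pos."""
    code = src[pos]
    pos += 1
    if code == 0x00:                      # raw tile: 32 literal bytes
        out += src[pos:pos + 32]
        return pos + 32
    frame, offset = CODE_WORDS[code]
    if frame != "current":
        raise NotImplementedError("interframe codes are not sketched here")
    flags = int.from_bytes(src[pos:pos + 4], "big")
    pos += 4
    for bit in range(31, -1, -1):         # assumed: MSB-first flag order
        if (flags >> bit) & 1:
            out.append(out[offset])       # copy from already decoded data
        else:
            out.append(src[pos])          # literal byte
            pos += 1
    return pos
```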

There is also a code word $FF, which is generally found in two places: when the length of the data currently being decompressed reaches 10240 bytes, and immediately before the last 32 bytes of data (which are uncompressed). If an $FF is found at an odd offset, then the following byte should be skipped (for proper word alignment). If an $FF is found on an even offset, then it is not necessary to skip the following byte. It is unclear why this code word appears at 10240 bytes, although it may have something to do with RAM addressing, etc.

Games Using SGA

  • Night Trap
  • Sewer Shark
  • Power Factory Featuring C&C Music Factory
  • Make My Video series
  • Ground Zero: Texas
  • Prize Fighter
  • Double Switch
  • Slam City with Scottie Pippen
  • Corpse Killer
  • Supreme Warrior

Decoders/Converters

  • A decoder called SCAT (SGA Conversion and Analysis Tool) has been written that can currently decode SGA files from Night Trap (SCD and 32X), Sewer Shark (including stream selection), Corpse Killer (SCD and 32X), Prize Fighter, Double Switch, as well as various other games, and convert them to AVI format files. It is available for download at SourceForge.