Actimagine Video Codec

From MultimediaWiki

A codec family created by Actimagine, aimed at consumer electronics (CE) devices such as mobile phones and handheld gadgets. It is part of the official Nintendo Game Boy Advance and Nintendo DS SDK.

Videos in this format can be extracted with the ndstool application; a frontend for it is available at http://l33t.spod.org/ratx/DS/dslazy/ . The video files seem to use the extension .vx and start with the signature VXDS in the first 4 bytes.

File format

Some of the information here was taken from https://github.com/xoreos/xoreos/blob/master/src/video/actimagine.cpp

Header (all 32-bit little endian words):

 * magic "VXDS"
 * number of frames
 * width
 * height
 * unknown
 * frame rate
 * audio sample rate
 * number of audio streams
 * max video frame size
 * audio extradata offset
 * video stream information offset
 * number of video streams
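
A minimal sketch of reading the twelve header words listed above. The field names, the VXHeader type and the parse_vx_header function are illustrative labels, not names from any SDK, and treating every word as unsigned is an assumption.

```python
import struct
from collections import namedtuple

# Illustrative field names for the twelve 32-bit words, in file order.
VXHeader = namedtuple("VXHeader", [
    "magic", "num_frames", "width", "height", "unknown", "frame_rate",
    "audio_sample_rate", "num_audio_streams", "max_video_frame_size",
    "audio_extradata_offset", "video_stream_info_offset", "num_video_streams",
])

# 4-byte magic followed by eleven little-endian 32-bit words (48 bytes).
# Reading the words as unsigned is an assumption.
HEADER_STRUCT = struct.Struct("<4s11I")


def parse_vx_header(data):
    """Parse the fixed-size header at the start of a .vx file."""
    if len(data) < HEADER_STRUCT.size:
        raise ValueError("file too short for a VXDS header")
    header = VXHeader._make(HEADER_STRUCT.unpack_from(data, 0))
    if header.magic != b"VXDS":
        raise ValueError("missing VXDS signature")
    return header
```

The frame rate word is returned raw; how it should be scaled is not covered here.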

The rest of the file consists of video and audio data packed together, audio extradata (3124 bytes long), and video stream information (two 32-bit words of unknown meaning and the video data start position).

Each frame starts with a 16-bit size and a 16-bit number of audio frames (a single audio frame is only 10-20 bytes long, so several of them may be packed together). Audio data is stored immediately after the video data, aligned to 16 bits, so you need to decode the video frame first in order to decode the audio.
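
A small sketch of reading the per-frame prefix described above. Little-endian byte order is assumed here because the file header is little endian; the text does not state it for frames. The helper names are hypothetical, and whether the 16-bit alignment is relative to the file start or the frame start is also not stated.

```python
import struct


def read_frame_header(data, offset):
    """Return (size, num_audio_frames, payload_offset) for the frame at offset.

    Assumes both 16-bit fields are little endian and unsigned.
    """
    size, num_audio_frames = struct.unpack_from("<HH", data, offset)
    return size, num_audio_frames, offset + 4


def align16(pos):
    """Round a byte position up to the next 16-bit (2-byte) boundary."""
    return (pos + 1) & ~1
```

After the video decoder reports where its data ended, align16 of that position would give the start of the packed audio frames under these assumptions.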

Video codec description

This codec is based on ITU H.264 and uses Elias gamma' codes in most places. Data is coded in 16x16 macroblocks. Each macroblock starts with a gamma' code giving its coding mode:

  0 - partition into 8x16 subblocks
  1 - copy 16x16 block
  2 - partition into 16x8 subblocks
  3 - delta 16x16 with MV
  4 - copy 16x16 block with MV from reference 1
  5 - copy 16x16 block with MV from reference 2
  6 - copy 16x16 block with MV from reference 3
  7 - delta 16x16
  8 - partition into 8x16 subblocks, add residue afterwards
  9 - copy 16x16 block from reference 2
 10 - delta 16x16 with MV, add residue afterwards
 11 - add full-block intra prediction
 12 - copy 16x16 block from reference 1, add residue afterwards
 13 - partition into 16x8 subblocks, add residue afterwards
 14 - copy 16x16 block from reference 3
 15 - intra prediction in 4x4 subblocks
 16 - copy 16x16 block with MV from reference 1, add residue afterwards
 17 - copy 16x16 block with MV from reference 2, add residue afterwards
 18 - copy 16x16 block with MV from reference 3, add residue afterwards
 19 - intra prediction in 4x4 subblocks, add residue afterwards
 20 - copy 16x16 block from reference 2, add residue afterwards
 21 - copy 16x16 block from reference 3, add residue afterwards
 22 - add full-block intra prediction and then residue
 23 - delta 16x16, add residue afterwards
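
The list above can be folded into a lookup table that separates the base operation from the reference index and the MV and residue flags. This is only one possible way to organise a decoder; the Mode tuple and the operation names are hypothetical. A reference of None means the list does not name one for that mode.

```python
from collections import namedtuple

# op: base operation; ref: reference number (None where the list gives none);
# has_mv: a motion vector delta is read; has_residue: residue is added afterwards.
Mode = namedtuple("Mode", ["op", "ref", "has_mv", "has_residue"])

MODES = {
    0: Mode("split_8x16", None, False, False),
    1: Mode("copy", None, False, False),  # reference not stated in the list
    2: Mode("split_16x8", None, False, False),
    3: Mode("delta", None, True, False),
    4: Mode("copy", 1, True, False),
    5: Mode("copy", 2, True, False),
    6: Mode("copy", 3, True, False),
    7: Mode("delta", None, False, False),
    8: Mode("split_8x16", None, False, True),
    9: Mode("copy", 2, False, False),
    10: Mode("delta", None, True, True),
    11: Mode("intra_full", None, False, False),
    12: Mode("copy", 1, False, True),
    13: Mode("split_16x8", None, False, True),
    14: Mode("copy", 3, False, False),
    15: Mode("intra_4x4", None, False, False),
    16: Mode("copy", 1, True, True),
    17: Mode("copy", 2, True, True),
    18: Mode("copy", 3, True, True),
    19: Mode("intra_4x4", None, False, True),
    20: Mode("copy", 2, False, True),
    21: Mode("copy", 3, False, True),
    22: Mode("intra_full", None, False, True),
    23: Mode("delta", None, False, True),
}
```

Laid out this way, the 24 modes split evenly: twelve without residue and twelve that repeat the same operations with residue added.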

"with MV" means that first you read two signed gamma' codes for motion vector delta.

"delta 16x16" means that data is copied with an offset added to each component value (i.e. you read three signed gamma' values, copy block from elsewhere and add delta value to each copied pixel).

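The "delta 16x16" operation can be sketched for a single plane as below. Planes are plain lists of rows here; the function name and parameters are hypothetical, and clamping the result to 0..255 is an assumption, since the text only says the delta is added to each copied pixel. The size parameter is there because the chroma block dimensions are not stated.

```python
def delta_copy_block(dst, ref, x, y, src_x, src_y, delta, size=16):
    """Copy a size x size block from ref at (src_x, src_y) into dst at (x, y),
    adding delta to every copied sample.

    Clamping to the 8-bit range is an assumption, not taken from the format notes.
    """
    for row in range(size):
        for col in range(size):
            value = ref[src_y + row][src_x + col] + delta
            dst[y + row][x + col] = max(0, min(255, value))
```

A decoder would call this once per component, each with its own delta value.
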
Full-block intra prediction reads two methods (one for luma and one for chroma). Luma prediction methods are top, left, DC and possibly plane; chroma prediction methods are DC, left, top and possibly plane.

4x4 block intra prediction seems to predict the first luma mode from neighbours; the following ones use a 3-bit field (plus a bit read to skip the actual prediction in each case). There are 9 methods in this case (top, left, DC and various angles). Chroma is handled the same way as in the previous case.

Residue is coded as groups of 8x8 semi-macroblocks made of 4x4 blocks. First comes the CBP, remapped from a gamma' code; its top bit is for chroma blocks. Then come the coefficients for the 4x4 blocks.

CBP table:

 0x00, 0x08, 0x04, 0x02, 0x01, 0x1F, 0x0F, 0x0A,
 0x05, 0x0C, 0x03, 0x10, 0x0E, 0x0D, 0x0B, 0x07,
 0x09, 0x06, 0x1E, 0x1B, 0x1A, 0x1D, 0x17, 0x15,
 0x18, 0x12, 0x11, 0x1C, 0x14, 0x13, 0x16, 0x19
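
A sketch of applying the table above: the gamma' code value indexes the table, and the resulting 5-bit pattern is split into the chroma flag (top bit, 0x10) and four lower bits. Reading the lower four bits as one flag per 4x4 luma block of the 8x8 group, and the bit-to-block order, are assumptions; decode_cbp is a hypothetical name.

```python
CBP_TABLE = [
    0x00, 0x08, 0x04, 0x02, 0x01, 0x1F, 0x0F, 0x0A,
    0x05, 0x0C, 0x03, 0x10, 0x0E, 0x0D, 0x0B, 0x07,
    0x09, 0x06, 0x1E, 0x1B, 0x1A, 0x1D, 0x17, 0x15,
    0x18, 0x12, 0x11, 0x1C, 0x14, 0x13, 0x16, 0x19,
]


def decode_cbp(code):
    """Map a gamma' code value (0..31) to (chroma_coded, luma_coded).

    luma_coded holds four flags taken from bits 0..3; which 4x4 block each
    bit refers to is an assumption.
    """
    cbp = CBP_TABLE[code]
    chroma_coded = bool(cbp & 0x10)
    luma_coded = [bool(cbp & (1 << i)) for i in range(4)]
    return chroma_coded, luma_coded
```

The table is a permutation of 0..31, so every 5-bit pattern has exactly one code, with the single-block patterns getting the shortest ones.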

Coefficients are coded per 4x4 block. First, a context-dependent static Huffman codebook gives the coding mode, which tells how many non-zero coefficients there are and how many of them form a tail of plus-minus ones. Then a context-dependent code tells how many zeroes are at the end of the block. For the tail of ones only the signs are coded; the rest of the coefficients are coded as (gamma() << level) | get_bits(level) followed by a sign, where level is initially zero and is increased by one when the absolute coefficient value is greater than the limit for this level. Between coefficients there is a zero run coded with a context-dependent code.

Level limits:

 2, 5, 11, 23, 47, 32768
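
The adaptive level scheme can be sketched as follows, covering only the coefficients outside the tail of ones and ignoring zero runs. The bit reader is passed in as two callables, read_gamma() and read_bits(n), which are hypothetical stand-ins (read_bits(0) must return 0). Treating a sign bit of 1 as negative is an assumption, as is using the formula without any extra offset.

```python
LEVEL_LIMITS = [2, 5, 11, 23, 47, 32768]


def read_level_coded_coefficients(count, read_gamma, read_bits):
    """Read count coefficients coded as (gamma() << level) | get_bits(level) plus sign.

    level starts at zero and grows by one whenever the absolute value just
    read exceeds the limit for the current level.
    """
    level = 0
    coeffs = []
    for _ in range(count):
        magnitude = (read_gamma() << level) | read_bits(level)
        negative = read_bits(1)  # assumption: 1 means negative
        coeffs.append(-magnitude if negative else magnitude)
        # Guard the index so level never runs past the end of the table.
        if level < len(LEVEL_LIMITS) - 1 and magnitude > LEVEL_LIMITS[level]:
            level += 1
    return coeffs
```

The effect is that after a large coefficient, later ones spend more raw bits and a shorter gamma' prefix.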

Audio codec description

Audio seems to be some flavour of ADPCM.