CNM
- Company: Arxel Tribe
- Extension: cnm, ci2
- Samples: http://samples.mplayerhq.hu/game-formats/ring-cnm/
CNM is a multimedia format used in the computer game Ring: The Legend of the Nibelungen. CI2 is the next iteration of CNM, with slightly different compression, and is used in Faust: The Seven Games of the Soul.
Container format
Container has the following structure:
- magic: CNM UNR\0
- header
- frame offsets table (video and audio interleaved, audio offsets are zero when audio is not present)
- frames
Header format (all values are little-endian):
    4 bytes   - number of frames
    4 bytes   - unknown
    1 byte    - unknown
    4 bytes   - image width
    4 bytes   - image height
    2 bytes   - unknown
    1 byte    - number of audio tracks
    4 bytes   - number of video frames?
    4 bytes   - number of frames repeated?
    4 bytes   - size of offsets table (v1 only)
    152 bytes - always zero?
  when audio is present, for each track:
    1 byte    - number of channels
    1 byte    - bits per sample
    4 bytes   - audio rate
    10 bytes  - unused?
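The header layout above can be read with a fixed little-endian structure. The following is a minimal sketch rather than a reference parser. It assumes that the header follows the 8-byte magic directly, that the offsets table size field is always present (the text marks it as v1 only) and that the per-track descriptors follow the 152-byte block immediately. The names `parse_cnm_header`, `HEADER_FMT` and `TRACK_FMT` are made up for this example.

```python
import struct

MAGIC = b"CNM UNR\x00"
# Fixed header fields in the order listed above; "<" = little-endian, no padding.
HEADER_FMT = "<IIBIIHBIII"
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 32 bytes
ZERO_PAD = 152                              # the "always zero?" block
TRACK_FMT = "<BBI10s"                       # channels, bits, rate, 10 unused bytes
TRACK_SIZE = struct.calcsize(TRACK_FMT)     # 16 bytes


def parse_cnm_header(data):
    """Return (fields, audio_tracks, offset_of_first_byte_after_header)."""
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("missing CNM magic")
    pos = len(MAGIC)
    (nframes, _unk1, _unk2, width, height, _unk3,
     ntracks, nvframes, nframes2, offs_size) = struct.unpack_from(HEADER_FMT, data, pos)
    pos += HEADER_SIZE + ZERO_PAD           # assumption: track info follows the padding
    tracks = []
    for _ in range(ntracks):
        channels, bits, rate, _unused = struct.unpack_from(TRACK_FMT, data, pos)
        tracks.append({"channels": channels, "bits": bits, "rate": rate})
        pos += TRACK_SIZE
    fields = {
        "frames": nframes,
        "width": width,
        "height": height,
        "audio_tracks": ntracks,
        "video_frames": nvframes,
        "frames_repeated": nframes2,
        "offsets_table_size": offs_size,    # documented as meaningful for v1 only
    }
    return fields, tracks, pos
```

The returned offset is where the frame offsets table would start under these assumptions. The layout of the table entries is not described here, so the sketch stops at that point.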
Each frame is prefixed by a byte containing its type. Known frame types:
- 0x41 - audio data
- 0x42 - audio data
- 0x53 - image
- 0x54 - tile data
- 0x55 - image (v2)
- 0x5A - audio data
Audio data is PCM prefixed by a 32-bit data size. Video frames are described below.
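Reading an audio frame can be sketched as follows. This assumes that the 32-bit size is little-endian like the container header, that it follows the type byte directly and that it counts only the PCM bytes. None of this is stated explicitly above, and `read_audio_frame` is a hypothetical helper name.

```python
import struct

AUDIO_TYPES = {0x41, 0x42, 0x5A}   # frame types listed above as audio data


def read_audio_frame(data, pos):
    """Return (pcm_bytes, next_pos) for an audio frame whose type byte is at pos."""
    frame_type = data[pos]
    if frame_type not in AUDIO_TYPES:
        raise ValueError("not an audio frame: 0x%02X" % frame_type)
    (size,) = struct.unpack_from("<I", data, pos + 1)   # assumed little-endian
    start = pos + 5                                     # type byte + 32-bit size
    return data[start:start + size], start + size
```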
Video compression for version 1
Each frame is an independently compressed image (in bottom-up format) split into tiles. Frame header:
    4 bytes - payload size (not counting the header)
    4 bytes - offset to the colour data
    2 bytes - number of tiles
    2 bytes - tile data size
    4 bytes - width
    4 bytes - height
    4 bytes - unknown
    4 bytes - unknown
    3 bytes - unused?
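The 31-byte frame header can be unpacked in one call. This is a sketch that assumes the fields are little-endian like the container header (not stated for frames) and that the header starts right after the frame type byte. `V1FrameHeader` and `parse_v1_frame_header` are illustrative names.

```python
import struct
from collections import namedtuple

V1FrameHeader = namedtuple(
    "V1FrameHeader",
    "payload_size colour_offset num_tiles tile_data_size width height unknown1 unknown2",
)
# 3x skips the three trailing "unused?" bytes.
V1_FRAME_FMT = "<IIHHIIII3x"
V1_FRAME_HEADER_SIZE = struct.calcsize(V1_FRAME_FMT)   # 31 bytes


def parse_v1_frame_header(data, pos=0):
    """Unpack the version 1 video frame header located at data[pos:]."""
    return V1FrameHeader(*struct.unpack_from(V1_FRAME_FMT, data, pos))
```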
Colour data may contain either raw tile pixels (32-bit BGR0) or packed data. In the packed case the tile data size is set to 4 or 2 and the deltas are stored right after it. The overall tile restoration algorithm is the following:
    copy 16 bytes (4x1 tile) from the stream
    for (tile = 1; tile < num_tiles; tile++) {
        tile_data[tile] = tile_data[tile - 1];
        bits = get_bits(3) + 1; // the same bit reading as below, bits=8 should not happen
        for (i = 0; i < 16; i++) {
            delta = get_bits(bits);
            if (delta && get_bit())
                delta = -delta;
            tile_data[tile][i] += delta;
        }
    }
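The restoration loop above translates fairly directly into runnable code. The sketch below makes several assumptions that the description does not confirm: the bitstream starts at the byte right after the first raw tile, bits are taken MSB first from each byte, and byte values wrap modulo 256 when a delta is added. `MSBBitReader` and `restore_tiles_v1` are names invented for this example.

```python
class MSBBitReader:
    """Reads bits from a byte string, most significant bit of each byte first."""

    def __init__(self, data, byte_pos=0):
        self.data = data
        self.bitpos = byte_pos * 8

    def get_bit(self):
        byte = self.data[self.bitpos >> 3]
        bit = (byte >> (7 - (self.bitpos & 7))) & 1
        self.bitpos += 1
        return bit

    def get_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.get_bit()
        return value


def restore_tiles_v1(data, num_tiles):
    """Rebuild num_tiles 16-byte tiles from packed colour data."""
    tiles = [list(data[:16])]                 # tile 0 is stored raw
    br = MSBBitReader(data, 16)               # assumption: deltas follow immediately
    for _ in range(1, num_tiles):
        tile = list(tiles[-1])                # start from the previous tile
        bits = br.get_bits(3) + 1
        for i in range(16):
            delta = br.get_bits(bits)
            if delta and br.get_bit():        # sign bit only follows non-zero deltas
                delta = -delta
            tile[i] = (tile[i] + delta) & 0xFF   # assumption: 8-bit wraparound
        tiles.append(tile)
    return tiles
```

For instance, the three bytes 0x18 0x00 0x00 after a raw tile encode a bit width of 1, a delta of -1 for the first byte and zero deltas for the remaining fifteen.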
Tile control data is compressed using a variable number of bits; bits are stored MSB first. The width of a tile index depends on the number of tiles: if the index can fit into 10 bits then it is 10 bits, if it can fit into 11 bits then it is 11 bits, otherwise it is 12 bits.
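The index width rule can be written as a small helper. The exact boundary is an assumption: here "fits into 10 bits" is taken to mean that the largest index, `num_tiles - 1`, is below 1024. The function name is made up for this sketch.

```python
def tile_index_bits(num_tiles):
    """Number of bits used for a tile index, given the tile count of the frame."""
    if num_tiles <= 1 << 10:      # indices 0..1023
        return 10
    if num_tiles <= 1 << 11:      # indices 0..2047
        return 11
    return 12
```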
Single tile decoding flow:
    if (!get_bit()) {
        offset = get_bits(tile_index_bits);
        copy tile data from the colour data using offset*16
    } else {
        // copy existing tile
        decode motion vector, copy the tile it points to
        (e.g. -1,0 means previous tile and 0,-1 means top tile)
    }
Motion vector codebook:
    1      -  0,-1
    0100   - -1, 0
    0101   - -1,-1
    0110   -  1,-1
    0111   -  0,-2
    000000 - -2,-3
    000001 -  2,-3
    000010 - -1,-4
    000011 -  1,-4
    000100 - -1,-2
    000101 -  1,-2
    000110 -  0,-3
    000111 -  0,-4
    001000 - -2, 0
    001001 - -2,-1
    001010 -  1,-1
    001011 - -2,-2
    001100 -  2,-2
    001101 - -1,-3
    001110 -  1,-3
    001111 -  0,-5
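Since the codebook is a prefix code, a vector can be decoded by accumulating bits until the collected string matches an entry. The sketch below transcribes the table exactly as listed (including the vector 1,-1 appearing under two codes) and assumes that the bits arrive in stream order, MSB first. `decode_mv` and `source_tile` are hypothetical helper names.

```python
MV_CODEBOOK = {
    "1": (0, -1),
    "0100": (-1, 0), "0101": (-1, -1), "0110": (1, -1), "0111": (0, -2),
    "000000": (-2, -3), "000001": (2, -3), "000010": (-1, -4), "000011": (1, -4),
    "000100": (-1, -2), "000101": (1, -2), "000110": (0, -3), "000111": (0, -4),
    "001000": (-2, 0), "001001": (-2, -1), "001010": (1, -1), "001011": (-2, -2),
    "001100": (2, -2), "001101": (-1, -3), "001110": (1, -3), "001111": (0, -5),
}


def decode_mv(bits):
    """Consume bits (an iterator of 0/1 values) until one codebook entry matches."""
    code = ""
    for bit in bits:
        code += str(bit)
        if code in MV_CODEBOOK:
            return MV_CODEBOOK[code]
        if len(code) >= 6:          # longest code is 6 bits
            break
    raise ValueError("truncated or invalid motion vector code: %r" % code)


def source_tile(x, y, mv):
    """Tile position referenced by mv, e.g. (-1, 0) is the previous tile."""
    return x + mv[0], y + mv[1]
```

Passing the same iterator to successive `decode_mv` calls decodes consecutive vectors, because each call stops right after the last bit of its code.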
Actual image may be interlaced, i.e. only half of the lines are decoded.
Video compression for version 2
In this version tile data is usually stored separately, in chunk type 0x54. The bitstream format has also changed to LSB-first, little-endian.
Tile format
Chunk type 0x54 starts with the usual header: 32-bit data size, 16-bit number of tiles and 16-bit tile size. Tile data is packed almost but not exactly like in version 1:
    read raw data for tile 0
    for each tile {
        copy previous tile data
        for each component of tile { // i.e. all Rs, Gs, Bs and As
            bits = get_bits(3);
            if (bits < 7) {
                for (i = 0; i < tile_size; i++) {
                    delta = get_bits(bits); // get_bits(0)=0
                    if (delta && get_bit(1))
                        delta = -delta;
                    tile[component][i] += delta;
                }
            } else {
                for (i = 0; i < tile_size; i++) {
                    tile[component][i] = get_bits(8);
                }
            }
        }
    }
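The per-component step of this loop can be sketched on its own, which avoids guessing how the raw data of tile 0 is laid out. The bit reader below assumes that bits are taken from the least significant bit of each byte upwards and that the first bit read becomes the lowest bit of a multi-bit value; values are also assumed to wrap modulo 256. `LSBBitReader` and `unpack_component` are names chosen for this example.

```python
class LSBBitReader:
    """Reads bits from a byte string, least significant bit of each byte first."""

    def __init__(self, data):
        self.data = data
        self.bitpos = 0

    def get_bits(self, n):
        value = 0
        for i in range(n):                      # get_bits(0) returns 0
            byte = self.data[self.bitpos >> 3]
            value |= ((byte >> (self.bitpos & 7)) & 1) << i
            self.bitpos += 1
        return value


def unpack_component(br, prev, tile_size):
    """Decode one colour component of a tile from the previous tile's component."""
    bits = br.get_bits(3)
    if bits == 7:                               # raw mode: 8 bits per value
        return [br.get_bits(8) for _ in range(tile_size)]
    out = []
    for i in range(tile_size):
        delta = br.get_bits(bits)
        if delta and br.get_bits(1):            # sign bit only after non-zero delta
            delta = -delta
        out.append((prev[i] + delta) & 0xFF)    # assumption: 8-bit wraparound
    return out
```

A full tile would call `unpack_component` once per component (all Rs, then Gs, Bs and As) with the same reader, feeding in the matching component of the previous tile.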
Frame format
The frame is now packed using several LRU lists: tile indices are restored first and are afterwards replaced with the actual tile data. Frame data is coded in groups of 8 tiles using a bit prefix: 1 - copy 8 tile indices from the previous line, 0 - switch to individual tile index decoding. Individual tile indices are coded in several ways (depending on the code):
- 1 -- copy index from the top line
- 000 -- get ceil(log2(tile_size)) bits for a new tile index, add it to LRU list (see below)
- 100 -- get 4-bit delta value, a sign bit, add that to top index value, output and add it to LRU list
- 010 -- form a list of 0-4 context-dependent values (see below), select one using 0-2 bits, output and add it to LRU list
- 110 -- get 4-bit index, output value retrieved from LRU list using that index
LRU list
The decoder keeps a context-dependent cyclic list of the last 15 values, i.e. one list for each possible tile index. The actual buffer is selected using the top tile index (so it is not in use for the first line). Initially each list contains all zeroes.
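One possible shape for this structure is a ring buffer per context. The sketch below is speculative on two points that the description leaves open: new values are assumed to overwrite the oldest slot in a round-robin fashion, and the 4-bit index is assumed to address a slot of the buffer directly rather than counting back from the most recent entry. `ContextLRU` is an illustrative name.

```python
LRU_SIZE = 15


class ContextLRU:
    """One cyclic list of 15 values per context (the top tile index)."""

    def __init__(self):
        self.lists = {}                         # top index -> [values, write position]

    def _ctx(self, top):
        # Lists are created lazily and start out as all zeroes.
        return self.lists.setdefault(top, [[0] * LRU_SIZE, 0])

    def add(self, top, value):
        ctx = self._ctx(top)
        ctx[0][ctx[1]] = value
        ctx[1] = (ctx[1] + 1) % LRU_SIZE        # cyclic: wrap after 15 entries

    def get(self, top, idx):
        # Assumption: idx is a direct slot number; idx 15 would be out of range.
        return self._ctx(top)[0][idx]
```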
Context-dependent list
For one of the modes, such a list is formed and then used as the pixel source:
    // list forming
    list = (empty);
    top = y > 0 ? top tile index : NONE;
    for left, top-left, top-right and top-top positions {
        idx = tile index at the search position
        if (!contains(list, idx) && (top == NONE || top != idx)) {
            push(list, idx)
        }
    }
    // decoding
    if (length(list) < 2) {
        new_idx = list[0]; // it should be empty
    } else if (length(list) == 2) {
        new_idx = list[get_bit()];
    } else {
        new_idx = list[get_bits(2)];
    }
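The list forming and selection steps can be sketched as below. The neighbour coordinates are an interpretation (left = x-1,y; top-left = x-1,y-1; top-right = x+1,y-1; top-top = x,y-2), and positions outside the frame are simply skipped, which the pseudocode does not spell out. An empty list yields `None` here because the pseudocode leaves that case unclear. `form_context_list` and `select_index` are hypothetical names, and `get_bits` stands for any function returning the next n bits of the stream.

```python
def form_context_list(indices, x, y):
    """indices[row][col] holds the tile indices decoded so far."""
    top = indices[y - 1][x] if y > 0 else None
    candidates = [(x - 1, y), (x - 1, y - 1), (x + 1, y - 1), (x, y - 2)]
    out = []
    for cx, cy in candidates:
        if cy < 0 or cx < 0 or cx >= len(indices[cy]):
            continue                      # assumption: skip positions outside the frame
        idx = indices[cy][cx]
        if idx not in out and idx != top:
            out.append(idx)
    return out


def select_index(ctx_list, get_bits):
    """Pick the new tile index using 0, 1 or 2 bits depending on the list length."""
    if len(ctx_list) < 2:
        return ctx_list[0] if ctx_list else None
    if len(ctx_list) == 2:
        return ctx_list[get_bits(1)]
    return ctx_list[get_bits(2)]          # like the pseudocode, no range check here
```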