CNM
- Company: Arxel Tribe
- Extension: cnm, ci2
- Samples: http://samples.mplayerhq.hu/game-formats/ring-cnm/
CNM is a multimedia format used in the computer game Ring: The Legend of the Nibelungen. CI2 is the next iteration of CNM, with slightly different compression, and is used in Faust: The Seven Games of the Soul.
Container format
The container has the following structure:
- magic: CNM UNR\0
- header
- frame offsets table (video and audio interleaved, audio offsets are zero when audio is not present; completely zero in CI2)
- frames
Header format (all values are little-endian):
    4 bytes   - number of frames
    4 bytes   - unknown
    1 byte    - unknown
    4 bytes   - image width
    4 bytes   - image height
    2 bytes   - unknown
    1 byte    - number of audio tracks
    4 bytes   - number of video frames?
    4 bytes   - number of frames repeated?
    4 bytes   - size of offsets table (v1 only)
    152 bytes - always zero?
    when audio is present, for each track:
        1 byte   - number of channels
        1 byte   - bits per sample
        4 bytes  - audio rate
        10 bytes - unused?
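As an illustration, the header layout above could be read with Python's struct module roughly as follows. This is a minimal sketch, not a reference parser: it assumes the header starts right after the 8-byte magic, that the per-track descriptors directly follow the 152 zero bytes, and the names parse_header, FIXED and TRACK are made up for this example.

```python
import struct

MAGIC = b"CNM UNR\x00"
# "<" selects little-endian with no padding; "152x" skips the zero block.
FIXED = struct.Struct("<IIBIIHBIII152x")   # 184 bytes
TRACK = struct.Struct("<BBI10x")           # 16 bytes per audio track

def parse_header(buf):
    """Return (header dict, offset of the first byte after the header)."""
    if buf[:len(MAGIC)] != MAGIC:
        raise ValueError("missing CNM magic")
    pos = len(MAGIC)
    (nframes, _unk1, _unk2, width, height, _unk3, ntracks,
     nvframes, nrepeat, offsets_size) = FIXED.unpack_from(buf, pos)
    pos += FIXED.size
    tracks = []
    for _ in range(ntracks):               # assumption: one record per track
        channels, bits, rate = TRACK.unpack_from(buf, pos)
        tracks.append({"channels": channels, "bits": bits, "rate": rate})
        pos += TRACK.size
    header = {"frames": nframes, "width": width, "height": height,
              "video_frames": nvframes, "offsets_size": offsets_size,
              "tracks": tracks}
    return header, pos
```

The unknown fields are read and discarded so that the offsets of the known fields stay correct.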
Each frame is prefixed by a byte containing its type. Known frame types:
- 0x41 - audio data
- 0x42 - audio data
- 0x53 - image
- 0x54 - tile data
- 0x55 - image (v2)
- 0x5A - audio data
Audio data is PCM prefixed by a 32-bit data size; video frames are described below.
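A hedged sketch of reading one audio frame is shown below. It assumes the 32-bit size is little-endian like the header fields and counts only the PCM bytes that follow it; read_audio_chunk and AUDIO_TYPES are illustrative names, not part of the format.

```python
import struct

AUDIO_TYPES = {0x41, 0x42, 0x5A}   # frame types listed above as audio data

def read_audio_chunk(buf, pos):
    """Return (pcm_bytes, position of the next frame)."""
    ftype = buf[pos]
    if ftype not in AUDIO_TYPES:
        raise ValueError("not an audio frame: 0x%02X" % ftype)
    # Assumption: little-endian size that excludes the type and size bytes.
    (size,) = struct.unpack_from("<I", buf, pos + 1)
    start = pos + 5
    return buf[start:start + size], start + size
```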
Video compression for version 1
Each frame is an independently compressed image (stored bottom-up) split into tiles. Frame header:
    4 bytes - payload size (not counting the header)
    4 bytes - offset to the colour data
    2 bytes - number of tiles
    2 bytes - tile data size
    4 bytes - width
    4 bytes - height
    4 bytes - unknown
    4 bytes - unknown
    3 bytes - unused?
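For illustration only, the 31-byte frame header could be unpacked like this. The sketch assumes little-endian fields, as in the container header, and that pos already points past the frame type byte; parse_frame_header is a hypothetical name.

```python
import struct

# 4+4+2+2+4+4+4+4 bytes of fields followed by 3 unused bytes = 31 bytes.
FRAME_HDR = struct.Struct("<IIHHIIII3x")

def parse_frame_header(buf, pos=0):
    (payload_size, colour_offset, num_tiles, tile_data_size,
     width, height, _unk1, _unk2) = FRAME_HDR.unpack_from(buf, pos)
    return {"payload_size": payload_size, "colour_offset": colour_offset,
            "num_tiles": num_tiles, "tile_data_size": tile_data_size,
            "width": width, "height": height}
```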
Colour data may either contain raw tile pixels (32-bit BGR0) or be packed. In the packed case the tile data size is set to 4 or 2 and the deltas are stored right after it. The overall tile restoration algorithm is the following:
    copy 16 bytes (4x1 tile) from the stream
    for (tile = 1; tile < num_tiles; tile++) {
        tile_data[tile] = tile_data[tile - 1];
        bits = get_bits(3) + 1; // the same bit reading as below, bits=8 should not happen
        for (i = 0; i < 16; i++) {
            delta = get_bits(bits);
            if (delta && get_bit())
                delta = -delta;
            tile_data[tile][i] += delta;
        }
    }
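The restoration loop above can be sketched in runnable form as follows. Several points are assumptions rather than facts from the format: the bit reader starts on the byte after the 16 raw bytes, sums wrap modulo 256, and the names MSBReader and restore_tiles are invented for this example.

```python
class MSBReader:
    """Reads bits most-significant-bit first, as version 1 does."""
    def __init__(self, data):
        self.data = data
        self.pos = 0                       # position in bits

    def get_bits(self, n):
        val = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            val = (val << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return val

def restore_tiles(first_tile, num_tiles, br):
    """first_tile: the 16 raw bytes of tile 0; br: reader over the deltas."""
    tiles = [list(first_tile)]
    for _ in range(1, num_tiles):
        tile = list(tiles[-1])             # start from the previous tile
        bits = br.get_bits(3) + 1
        for i in range(16):
            delta = br.get_bits(bits)
            if delta and br.get_bits(1):   # sign bit only for non-zero deltas
                delta = -delta
            tile[i] = (tile[i] + delta) & 0xFF   # assumption: byte wrap-around
        tiles.append(tile)
    return tiles
```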
Tile control data is compressed using a variable number of bits, stored MSB first. The width of a tile index depends on the number of tiles: if the index fits into 10 bits it is read as 10 bits, if it fits into 11 bits it is read as 11 bits, otherwise it is read as 12 bits.
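A small sketch of the index width rule follows. The text does not say whether "fits" refers to the tile count or to the largest index; this example assumes the largest index, num_tiles - 1, and the function name is hypothetical.

```python
def tile_index_bits(num_tiles):
    # Assumption: the width is chosen so that the largest index fits.
    largest = max(num_tiles - 1, 0)
    if largest < (1 << 10):
        return 10
    if largest < (1 << 11):
        return 11
    return 12
```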
Single tile decoding flow:
    if (!get_bit()) {
        offset = get_bits(tile_index_bits);
        copy tile data from the colour data using offset*16
    } else { // copy existing tile
        decode motion vector, copy tile to which it points to
        (e.g. -1,0 means previous tile and 0,-1 means top tile)
    }
Motion vector codebook:
    1      -  0,-1
    0100   - -1, 0
    0101   - -1,-1
    0110   -  1,-1
    0111   -  0,-2
    000000 - -2,-3
    000001 -  2,-3
    000010 - -1,-4
    000011 -  1,-4
    000100 - -1,-2
    000101 -  1,-2
    000110 -  0,-3
    000111 -  0,-4
    001000 - -2, 0
    001001 - -2,-1
    001010 -  2,-1
    001011 - -2,-2
    001100 -  2,-2
    001101 - -1,-3
    001110 -  1,-3
    001111 -  0,-5
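The codebook is a prefix code with three groups: 1, 01xx and 00xxxx. A possible decoder is sketched below, assuming the leftmost digit of each codeword is the first bit read (consistent with MSB-first storage); msb_bits and read_mv are illustrative names.

```python
def msb_bits(data):
    """Yield the bits of data, most significant bit of each byte first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

SHORT = [(-1, 0), (-1, -1), (1, -1), (0, -2)]            # codes 01xx
LONG = [(-2, -3), (2, -3), (-1, -4), (1, -4),            # codes 00xxxx
        (-1, -2), (1, -2), (0, -3), (0, -4),
        (-2, 0), (-2, -1), (2, -1), (-2, -2),
        (2, -2), (-1, -3), (1, -3), (0, -5)]

def read_mv(bits):
    """Decode one (dx, dy) motion vector from an iterator of bits."""
    if next(bits):
        return (0, -1)                     # codeword 1
    table, nbits = (SHORT, 2) if next(bits) else (LONG, 4)
    idx = 0
    for _ in range(nbits):
        idx = (idx << 1) | next(bits)
    return table[idx]
```

For example, the bytes 0xB1 0xE0 hold the codewords 1, 0110 and 001111 back to back.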
The actual image may be interlaced, i.e. only half of the lines are decoded.
Video compression for version 2
In this version frames are coded in small groups (usually of four), with the common tile data (chunk 0x54) preceding the keyframe (chunk 0x55) and the inter frames (chunk 0x53).
Also note that in this version the bitstream is little-endian, LSB first.
Tile format
Chunk type 0x54 starts with the usual header: 32-bit data size, 16-bit number of tiles and 16-bit tile size. Tile data is packed almost but not exactly like in version 1:
    read raw data for tile 0
    for each tile {
        copy previous tile data
        for each component of tile { // i.e. all Rs, Gs, Bs and As
            bits = get_bits(3);
            if (bits < 7) {
                for (i = 0; i < tile_size; i++) {
                    delta = get_bits(bits); // get_bits(0)=0
                    if (delta && get_bit(1))
                        delta = -delta;
                    tile[component][i] += delta;
                }
            } else {
                for (i = 0; i < tile_size; i++) {
                    tile[component][i] = get_bits(8);
                }
            }
        }
    }
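A runnable sketch of the version 2 tile unpacking is given below. It rests on assumptions that the description leaves open: the first bit read becomes the lowest bit of a multi-bit value, the loop covers tiles 1 and up, tile 0 has already been split into four per-component lists, and sums wrap modulo 256. LSBReader and decode_tiles_v2 are hypothetical names.

```python
class LSBReader:
    """Reads bits least-significant-bit first, as version 2 does."""
    def __init__(self, data):
        self.data = data
        self.pos = 0                       # position in bits

    def get_bits(self, n):
        val = 0
        for i in range(n):                 # n == 0 yields 0 without reading
            bit = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
            val |= bit << i
            self.pos += 1
        return val

def decode_tiles_v2(tile0, num_tiles, tile_size, br):
    """tile0: list of 4 component lists, each tile_size values long."""
    tiles = [[list(c) for c in tile0]]
    for _ in range(1, num_tiles):
        tile = [list(c) for c in tiles[-1]]      # copy previous tile
        for comp in tile:
            bits = br.get_bits(3)
            if bits < 7:
                for i in range(tile_size):
                    delta = br.get_bits(bits)
                    if delta and br.get_bits(1):
                        delta = -delta
                    comp[i] = (comp[i] + delta) & 0xFF   # assumed wrap-around
            else:                                # 7 means raw 8-bit values
                for i in range(tile_size):
                    comp[i] = br.get_bits(8)
        tiles.append(tile)
    return tiles
```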
Frame format
The frame is now packed using various prediction methods operating on tile indices. In an inter frame, tile index 0 means an unchanged area.
Frame data is split into regions of eight tiles, and a bit is transmitted for each region. Bit 1 means the whole region should be copied from above; bit 0 means that each individual tile index needs to be treated separately.
Individual tile indices have the following mode codewords:
- 1 -- copy index from the top line
- 000 -- get ceil(log2(tile_size)) bits for a new tile index, add it to context list (see below)
- 100 -- get 4-bit delta value, a sign bit, add/subtract delta+1 to/from top index value, output and add it to the context list
- 010 -- form a list of 1-4 unique neighbour values (see below), select one using 0-2 bits, output and add it to the context list
- 110 -- get 4-bit index in the corresponding context list and output it (without updating the list)
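Read left to right, codeword 1 would be a prefix of 100 and 110, so the codewords only form a prefix code if the rightmost digit is the first bit read. The sketch below assumes exactly that, which matches the LSB-first bitstream of this version; the mode names and helper functions are made up for illustration.

```python
def lsb_bits(data):
    """Yield the bits of data, least significant bit of each byte first."""
    for byte in data:
        for shift in range(8):
            yield (byte >> shift) & 1

# Keyed by the two bits that follow a leading 0, first-read bit lowest.
MODES = {0b00: "new_index",        # codeword 000
         0b10: "delta_from_top",   # codeword 100
         0b01: "neighbour_list",   # codeword 010
         0b11: "context_list"}     # codeword 110

def read_mode(bits):
    """Decode one mode codeword from an iterator of bits."""
    if next(bits):
        return "copy_top"          # codeword 1
    low = next(bits)
    high = next(bits)
    return MODES[low | (high << 1)]
```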
Context list
The decoder keeps a context-dependent cyclic list (i.e. one list for each possible tile index) of the last 16 values that had that index as their top neighbour. Initially each list contains all zeroes.
For all but one single-index operations the list should be updated:
    if (y > 0) { // not the first line
        top_idx = frame[cur_pos - stride];
        contexts[top_idx].list[contexts[top_idx].pos] = cur_idx;
        contexts[top_idx].pos = (contexts[top_idx].pos + 1) & 15;
    }
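The update rule translates almost directly into Python. In this sketch the frame is a flat list of tile indices with a row length of stride, and the Context class and update_context function are names chosen for the example.

```python
class Context:
    """Cyclic list of the last 16 indices seen below one top-neighbour value."""
    def __init__(self):
        self.list = [0] * 16       # initially all zeroes
        self.pos = 0               # next slot to overwrite

def update_context(contexts, frame, cur_pos, stride, y, cur_idx):
    if y > 0:                      # the first line has no top neighbour
        ctx = contexts[frame[cur_pos - stride]]
        ctx.list[ctx.pos] = cur_idx
        ctx.pos = (ctx.pos + 1) & 15
```

Because the position is masked with 15, the seventeenth update for the same top value overwrites the oldest entry.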
Context-dependent list
For one of the modes such a list is formed and then used as the pixel source:
    // list forming
    list = (empty);
    top = y > 0 ? top tile index : NONE;
    for left, top-left, top-right and top-top positions {
        idx = tile index at the search position
        if (!contains(list, idx) && (top == NONE || top != idx)) {
            push(list, idx)
        }
    }
    // decoding
    if (length(list) < 2) {
        new_idx = list[0]; // it should not be empty
    } else if (length(list) == 2) {
        new_idx = list[get_bit()];
    } else {
        new_idx = list[get_bits(2)];
    }
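The list forming step might look like this in Python. The sketch takes "top-top" to mean the tile two rows up and skips positions that fall outside the frame; both are assumptions, as are the names neighbour_list and selector_bits.

```python
def neighbour_list(frame, x, y, width):
    """frame: list of rows of tile indices decoded so far."""
    top = frame[y - 1][x] if y > 0 else None
    out = []
    # left, top-left, top-right, top-top (assumed to be two rows up)
    for dx, dy in ((-1, 0), (-1, -1), (1, -1), (0, -2)):
        nx, ny = x + dx, y + dy
        if not (0 <= nx < width and ny >= 0):
            continue               # assumption: skip out-of-frame positions
        idx = frame[ny][nx]
        if idx not in out and idx != top:
            out.append(idx)
    return out

def selector_bits(length):
    """Number of bits used to pick an entry from a list of this length."""
    if length < 2:
        return 0
    return 1 if length == 2 else 2
```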