CNM

From MultimediaWiki

CNM is a multimedia format used in the computer game Ring: The Legend of the Nibelungen. CI2 is the next iteration of CNM, with slightly different compression, and is used in Faust: The Seven Games of the Soul.

Container format

The container has the following structure:

  • magic CNM UNR\0
  • header
  • frame offsets table (video and audio interleaved, audio offsets are zero when audio is not present; completely zero in CI2)
  • frames

Header format (all values are little-endian):

 4 bytes - number of frames
 4 bytes - unknown
 1 byte  - unknown
 4 bytes - image width
 4 bytes - image height
 2 bytes - unknown
 1 byte  - number of audio tracks
 4 bytes - number of video frames?
 4 bytes - number of frames repeated?
 4 bytes - size of offsets table (v1 only)
 152 bytes - always zero?
 when audio is present for each track:
   1 byte  - number of channels
   1 byte  - bits per sample
   4 bytes - audio rate
   10 bytes - unused?
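
The layout above can be read with a few fixed-size records. The following is a minimal sketch, not a reference parser: the function and key names are made up for illustration, and it assumes that the fields are packed without padding, that the "size of offsets table" field is physically present in both versions, and that the per-track records directly follow the 152 zero bytes.

```python
import struct

MAGIC = b"CNM UNR\x00"
# Fixed fields after the magic: little-endian, no padding (32 bytes).
HEADER = struct.Struct("<IIBIIHBIII")
# One audio track record: channels, bits per sample, rate, 10 unused bytes.
TRACK = struct.Struct("<BBI10s")
ZERO_PAD = 152  # block described as "always zero?"


def parse_cnm_header(data):
    """Return (header dict, offset of the first byte after the header)."""
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("missing CNM UNR magic")
    pos = len(MAGIC)
    (nframes, _unk1, _unk2, width, height, _unk3, ntracks,
     nvideo, nrepeated, offsets_size) = HEADER.unpack_from(data, pos)
    pos += HEADER.size + ZERO_PAD
    tracks = []
    for _ in range(ntracks):  # no records at all when audio is absent
        channels, bits, rate, _unused = TRACK.unpack_from(data, pos)
        tracks.append({"channels": channels, "bits": bits, "rate": rate})
        pos += TRACK.size
    header = {
        "nframes": nframes,
        "width": width,
        "height": height,
        "nvideo": nvideo,          # "number of video frames?"
        "nrepeated": nrepeated,    # "number of frames repeated?"
        "offsets_size": offsets_size,
        "tracks": tracks,
    }
    return header, pos
```

The returned offset should point at the frame offsets table if the assumptions hold.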

Each frame is prefixed by a byte containing its type. Known frame types:

  • 0x41 - audio data
  • 0x42 - audio data
  • 0x53 - image
  • 0x54 - tile data
  • 0x55 - image (v2)
  • 0x5A - audio data

Audio data is PCM prefixed by a 32-bit data size; video frames are described below.
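
As an illustration, an audio frame could be read as shown here. This is only a sketch: the helper name is hypothetical, the size field is assumed to be little-endian like the header values, and it is assumed to count only the PCM bytes that follow it.

```python
import struct

AUDIO_TYPES = {0x41, 0x42, 0x5A}  # frame types listed as audio data


def read_audio_frame(data, pos):
    """Return (frame type, PCM bytes, offset just past the frame)."""
    frame_type = data[pos]
    if frame_type not in AUDIO_TYPES:
        raise ValueError("not an audio frame: 0x%02X" % frame_type)
    # Assumption: 32-bit little-endian size right after the type byte.
    (size,) = struct.unpack_from("<I", data, pos + 1)
    start = pos + 5
    pcm = data[start:start + size]
    if len(pcm) != size:
        raise ValueError("truncated audio frame")
    return frame_type, pcm, start + size
```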

Video compression for version 1

Each frame is an independently compressed image (stored bottom-up) split into tiles. Frame header:

 4 bytes - payload size (not counting the header)
 4 bytes - offset to the colour data
 2 bytes - number of tiles
 2 bytes - tile data size
 4 bytes - width
 4 bytes - height
 4 bytes - unknown
 4 bytes - unknown
 3 bytes - unused?
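
The frame header is 31 bytes long when packed without padding. A minimal parsing sketch follows; the names are illustrative, little-endian byte order is assumed as in the container header, and `pos` is expected to point just past the frame type byte.

```python
import struct

# payload size, colour data offset, tile count, tile data size,
# width, height, two unknown fields, three unused bytes
V1_FRAME_HEADER = struct.Struct("<IIHHIIII3s")


def parse_v1_frame_header(data, pos):
    """Return (header dict, offset of the payload that follows)."""
    (payload_size, colour_offset, num_tiles, tile_data_size,
     width, height, _unk1, _unk2, _unused) = V1_FRAME_HEADER.unpack_from(data, pos)
    header = {
        "payload_size": payload_size,    # does not count this header
        "colour_offset": colour_offset,
        "num_tiles": num_tiles,
        "tile_data_size": tile_data_size,
        "width": width,
        "height": height,
    }
    return header, pos + V1_FRAME_HEADER.size
```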

Colour data may contain either raw tile pixels (32-bit BGR0) or packed data. In the packed case the tile data size is set to 4 or 2 and the deltas are stored right after it. The overall tile restoration algorithm is as follows:

 copy 16 bytes (4x1 tile) from the stream
 for (tile = 1; tile < num_tiles; tile++) {
   tile_data[tile] = tile_data[tile - 1];
   bits = get_bits(3) + 1; //the same bit reading as below, bits=8 should not happen
   for (i = 0; i < 16; i++) {
     delta = get_bits(bits);
     if (delta && get_bit())
       delta = -delta;
     tile_data[tile][i] += delta;
   }
 }
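
The same restoration loop as a runnable sketch. Several details are assumptions rather than facts from the description: the bit reader starts at the byte right after the first 16-byte tile, the sign flag is a single bit, and sums wrap around modulo 256. The class and function names are illustrative.

```python
class MsbBitReader:
    """Reads bits from a byte string, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits

    def get_bits(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def restore_tiles(data, num_tiles):
    """Return a list of tiles, each a list of 16 byte values."""
    tiles = [list(data[:16])]        # first tile is stored raw
    br = MsbBitReader(data[16:])     # assumption: deltas follow directly
    for _ in range(1, num_tiles):
        tile = list(tiles[-1])       # start from the previous tile
        bits = br.get_bits(3) + 1
        for i in range(16):
            delta = br.get_bits(bits)
            if delta and br.get_bits(1):
                delta = -delta
            tile[i] = (tile[i] + delta) & 0xFF  # assumption: byte wrap-around
        tiles.append(tile)
    return tiles
```

For example, with a first tile of sixteen bytes of value 10 followed by the bytes 0x10 0x00 0x00, the second tile differs only in its first byte, which becomes 11.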


Tile control data is compressed using a variable number of bits, stored MSB first. The width of a tile index depends on the number of tiles: ten bits if the index fits into 10 bits, 11 bits if it fits into 11 bits, and 12 bits otherwise.
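
The index width rule can be written as a small helper. This sketch assumes that "fits" refers to the largest valid index, i.e. the number of tiles minus one; the exact boundary is not stated in the description, and the function name is illustrative.

```python
def tile_index_bits(num_tiles):
    """Return the number of bits used for a tile index (10, 11 or 12)."""
    # Assumption: the largest index is num_tiles - 1.
    if num_tiles <= 1 << 10:
        return 10
    if num_tiles <= 1 << 11:
        return 11
    return 12
```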

Single tile decoding flow:

 if (!get_bit()) { // new tile
   offset = get_bits(tile_index_bits);
   copy tile data from the colour data using offset*16
 } else { // copy existing tile
   decode motion vector, copy the tile it points to
   (e.g. -1,0 means previous tile and 0,-1 means top tile)
 }

Motion vector codebook:

 1      -  0,-1
 0100   - -1, 0
 0101   - -1,-1
 0110   -  1,-1
 0111   -  0,-2
 000000 - -2,-3
 000001 -  2,-3
 000010 - -1,-4
 000011 -  1,-4
 000100 - -1,-2
 000101 -  1,-2
 000110 -  0,-3
 000111 -  0,-4
 001000 - -2, 0
 001001 - -2,-1
 001010 -  2,-1
 001011 - -2,-2
 001100 -  2,-2
 001101 - -1,-3
 001110 -  1,-3
 001111 -  0,-5
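
The codebook is a complete prefix code with three code lengths (1, 4 and 6 bits), so it can be decoded with two small tables. A minimal sketch follows; `get_bits` stands for any MSB-first bit reading function, and `bit_source` is a hypothetical helper used only to feed test bits.

```python
# Codes of the form 01xx, indexed by the two trailing bits.
SHORT_CODES = [(-1, 0), (-1, -1), (1, -1), (0, -2)]
# Codes of the form 00xxxx, indexed by the four trailing bits.
LONG_CODES = [
    (-2, -3), (2, -3), (-1, -4), (1, -4),
    (-1, -2), (1, -2), (0, -3), (0, -4),
    (-2, 0), (-2, -1), (2, -1), (-2, -2),
    (2, -2), (-1, -3), (1, -3), (0, -5),
]


def decode_motion_vector(get_bits):
    """Decode one (dx, dy) pair using an MSB-first get_bits(n) callable."""
    if get_bits(1):
        return (0, -1)                   # codeword "1"
    if get_bits(1):
        return SHORT_CODES[get_bits(2)]  # codewords "01xx"
    return LONG_CODES[get_bits(4)]       # codewords "00xxxx"


def bit_source(bits):
    """Hypothetical helper: turn a string such as "0110" into get_bits."""
    it = iter(bits)

    def get_bits(n):
        value = 0
        for _ in range(n):
            value = (value << 1) | int(next(it))
        return value

    return get_bits
```

For instance, `decode_motion_vector(bit_source("0110"))` yields `(1, -1)`.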

The actual image may be interlaced, i.e. only half of the lines are decoded.

Video compression for version 2

In this version frames are coded in small groups (usually of four), with the common tile data (chunk 0x54) preceding the keyframe (chunk 0x55) and the inter frames (chunk 0x53).

Also note that in this version the bitstream format is little-endian, LSB first.
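
A bit reader for this convention might look like the following sketch. It assumes that bits are taken from the least significant bit of each byte in file order and that the first bit read becomes the lowest bit of a multi-bit value; the class name is illustrative.

```python
class LsbBitReader:
    """Reads bits from a byte string, least significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits

    def get_bits(self, n):
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            # Assumption: the first bit read is the lowest bit of the value.
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value
```

With the byte 0b10110101, reading 3 bits and then 5 bits gives 5 and 22.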

Tile format

Chunk type 0x54 starts with the usual header: 32-bit data size, 16-bit number of tiles and 16-bit tile size. Tile data is packed almost but not exactly like in version 1:

 read raw data for tile 0
 for each subsequent tile {
   copy previous tile data
   for each component of tile { // i.e. all Rs, Gs, Bs and As
     bits = get_bits(3);
     if (bits < 7) {
       for (i = 0; i < tile_size; i++) {
         delta = get_bits(bits); // get_bits(0)=0
         if (delta && get_bit())
           delta = -delta;
         tile[component][i] += delta;
       }
     } else {
       for (i = 0; i < tile_size; i++) {
         tile[component][i] = get_bits(8);
       }
     }
   }
 }
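
A runnable sketch of the unpacking loop above. To avoid guessing the raw layout of tile 0, the first tile is passed in already split into components. Other assumptions: the loop covers tiles 1 and up, the sign flag is one bit, sums wrap modulo 256, and the bit reader is LSB first with the first bit read as the lowest bit of a value. All names are illustrative.

```python
class LsbBitReader:
    """Reads bits from a byte string, least significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits

    def get_bits(self, n):
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def unpack_tiles_v2(first_tile, num_tiles, tile_size, br):
    """first_tile is a list of components, each a list of tile_size values."""
    tiles = [[list(comp) for comp in first_tile]]
    for _ in range(1, num_tiles):
        tile = [list(comp) for comp in tiles[-1]]  # copy previous tile
        for comp in tile:
            bits = br.get_bits(3)
            for i in range(tile_size):
                if bits < 7:
                    delta = br.get_bits(bits)      # get_bits(0) returns 0
                    if delta and br.get_bits(1):
                        delta = -delta
                    comp[i] = (comp[i] + delta) & 0xFF  # assumption: wrap
                else:
                    comp[i] = br.get_bits(8)       # raw value
        tiles.append(tile)
    return tiles
```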

Frame format

Frames are now packed using various methods of prediction operating on tile indices. In an inter frame, tile index 0 means an unchanged area.

Frame data is split into regions of eight tiles, and one bit is transmitted for each region. Bit 1 means the whole region should be copied from above; bit 0 means that each tile index in the region is coded separately.

Individual tile indices have the following mode codewords:

  •   1 -- copy index from the top line
  • 000 -- get ceil(log2(tile_size)) bits for a new tile index, add it to context list (see below)
  • 100 -- get 4-bit delta value, a sign bit, add/subtract delta+1 to/from top index value, output and add it to the context list
  • 010 -- form a list of 1-4 unique neighbour values (see below), select one using 0-2 bits, output and add it to the context list
  • 110 -- get 4-bit index in the corresponding context list and output it (without updating the list)
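
Read left to right, codeword 1 would be a prefix of 100 and 110, so the list cannot be a left-to-right prefix code. The sketch below therefore assumes that the codewords are printed as binary numbers whose rightmost bit is read first, which matches an LSB-first bitstream. This is an interpretation, not something the description states, and the mode names are made up for illustration.

```python
# Three-bit codewords as printed, keyed by their numeric value.
MODES = {
    0b000: "new_index",
    0b100: "top_delta",
    0b010: "neighbour_list",
    0b110: "context_list",
}


def read_mode(get_bit):
    """Return the mode name, reading bits with a get_bit() callable."""
    if get_bit():
        return "copy_top"  # codeword "1"
    # Assumption: the bit just read was the rightmost digit of the codeword;
    # the next two bits are the middle and leftmost digits.
    middle = get_bit()
    left = get_bit()
    return MODES[(left << 2) | (middle << 1)]
```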

Context list

The decoder keeps a context-dependent cyclic list (i.e. one list for each possible tile index) of the last 16 values that had that index as their top neighbour. Initially every list contains all zeroes.

For all but one single-index operations the list should be updated:

 if (y > 0) { // not the first line
   top_idx = frame[cur_pos - stride];
   contexts[top_idx].list[contexts[top_idx].pos] = cur_idx;
   contexts[top_idx].pos = (contexts[top_idx].pos + 1) & 15;
 }
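
The update rule can be modelled with one 16-entry ring buffer per tile index. A minimal sketch, with illustrative names:

```python
class ContextLists:
    """One cyclic list of 16 values per possible top-neighbour index."""

    def __init__(self, num_indices):
        self.lists = [[0] * 16 for _ in range(num_indices)]  # all zeroes
        self.pos = [0] * num_indices

    def update(self, top_idx, cur_idx):
        """Record cur_idx as a value seen below top_idx."""
        p = self.pos[top_idx]
        self.lists[top_idx][p] = cur_idx
        self.pos[top_idx] = (p + 1) & 15  # wrap after 16 entries
```

After 17 updates for the same top index, the oldest entry (slot 0) has been overwritten by the newest one.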

Context-dependent list

For one of the modes (codeword 010) such a list is formed and then used as the source of the new tile index:

 // list forming
 list = (empty);
 top = y > 0 ? top tile index : NONE;
 for left, top-left, top-right and top-top positions {
   idx = tile index at the search position
   if (!contains(list, idx) && (top == NONE || top != idx)) {
     push(list, idx)
   }
 }
 //decoding
 if (length(list) < 2) {
   new_idx = list[0]; // it should not be empty
 } else if (length(list) == 2) {
   new_idx = list[get_bit()];
 } else {
   new_idx = list[get_bits(2)];
 }
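
The list forming and selection steps as a runnable sketch. The frame is modelled as a list of rows in decoding order, and "top-top" is taken to mean the position two rows up. Skipping neighbours that fall outside the frame is an assumption, since the description does not say how edges are handled; the function names are illustrative.

```python
def neighbour_candidates(frame, x, y):
    """Return the list of unique neighbour indices for position (x, y)."""
    width = len(frame[0])
    top = frame[y - 1][x] if y > 0 else None
    candidates = []
    # left, top-left, top-right, top-top
    for dx, dy in ((-1, 0), (-1, -1), (1, -1), (0, -2)):
        nx, ny = x + dx, y + dy
        if not (0 <= nx < width and ny >= 0):
            continue  # assumption: out-of-frame neighbours are skipped
        idx = frame[ny][nx]
        if idx not in candidates and (top is None or idx != top):
            candidates.append(idx)
    return candidates


def select_neighbour(candidates, get_bits):
    """Pick an entry, reading 0, 1 or 2 bits with get_bits(n)."""
    if len(candidates) < 2:
        return candidates[0]  # the list should not be empty
    if len(candidates) == 2:
        return candidates[get_bits(1)]
    return candidates[get_bits(2)]
```

With a single candidate no bits are consumed at all, which is why the selector takes the bit reading function rather than a pre-read value.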