CNM

From MultimediaWiki
Revision as of 06:32, 13 December 2022 by Kostya (talk | contribs) (mention CI2 game)

CNM is a multimedia format used in the computer game Ring: The Legend of the Nibelungen. CI2 is the next iteration of CNM, with slightly different compression, and is used in Faust: The Seven Games of the Soul.

Container format

The container has the following structure:

  • magic CNM UNR\0
  • header
  • frame offsets table (video and audio interleaved, audio offsets are zero when audio is not present)
  • frames

Header format (all values are little-endian):

 4 bytes - number of frames
 4 bytes - unknown
 1 byte  - unknown
 4 bytes - image width
 4 bytes - image height
 2 bytes - unknown
 1 byte  - number of audio tracks
 4 bytes - number of video frames?
 4 bytes - number of frames repeated?
 4 bytes - size of offsets table (v1 only)
 152 bytes - always zero?
 when audio is present for each track:
   1 byte  - number of channels
   1 byte  - bits per sample
   4 bytes - audio rate
   10 bytes - unused?
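
The header layout above can be read with Python's `struct` module. This is a minimal sketch, not a reference parser: the function name `parse_header` and the returned dictionary keys are made up for illustration, and it assumes that the "size of offsets table" field is physically present in every file (and only meaningful in v1) and that the per-track records directly follow the 152 zero bytes.

```python
import struct

MAGIC = b"CNM UNR\x00"
# number of frames, unknown, unknown, width, height, unknown,
# audio tracks, video frames?, frames repeated?, offsets table size
HEADER = struct.Struct("<IIBIIHBIII")   # 32 bytes, little-endian, no padding
TRACK = struct.Struct("<BBI10s")        # channels, bits, rate, 10 unused bytes
ZERO_PAD = 152

def parse_header(buf):
    """Parse the magic and header; return (info dict, offset past header)."""
    if buf[:len(MAGIC)] != MAGIC:
        raise ValueError("not a CNM file")
    pos = len(MAGIC)
    (nframes, _unk1, _unk2, width, height, _unk3,
     ntracks, nvideo, nrepeat, offsets_size) = HEADER.unpack_from(buf, pos)
    pos += HEADER.size + ZERO_PAD
    tracks = []
    for _ in range(ntracks):
        channels, bits, rate, _unused = TRACK.unpack_from(buf, pos)
        tracks.append({"channels": channels, "bits": bits, "rate": rate})
        pos += TRACK.size
    info = {"frames": nframes, "width": width, "height": height,
            "video_frames": nvideo, "offsets_size": offsets_size,
            "tracks": tracks}
    return info, pos
```

The returned offset is where the frame offsets table would start under the assumptions above.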

Each frame is prefixed by a byte containing its type. Known frame types:

  • 0x41 - audio data
  • 0x42 - audio data
  • 0x53 - image
  • 0x54 - tile data
  • 0x55 - image (v2)
  • 0x5A - audio data

Audio data is PCM prefixed by a 32-bit data size. Video frames are described below.
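
As an illustration of the frame type byte and the audio prefix, here is a small sketch that extracts the PCM payload of one audio frame. The names `FRAME_TYPES` and `read_audio_frame` are hypothetical, and the sketch assumes the 32-bit size is little-endian like the header fields and counts only the PCM bytes that follow it.

```python
import struct

FRAME_TYPES = {
    0x41: "audio data",
    0x42: "audio data",
    0x53: "image",
    0x54: "tile data",
    0x55: "image (v2)",
    0x5A: "audio data",
}
AUDIO_TYPES = {0x41, 0x42, 0x5A}

def read_audio_frame(buf, offset):
    """Return the PCM bytes of the audio frame starting at offset."""
    frame_type = buf[offset]
    if frame_type not in AUDIO_TYPES:
        raise ValueError("not an audio frame: 0x%02X" % frame_type)
    # Assumption: little-endian size that excludes the 5 prefix bytes.
    (size,) = struct.unpack_from("<I", buf, offset + 1)
    start = offset + 5
    return buf[start:start + size]
```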

Video compression for version 1

Each frame is an independently compressed image (stored bottom-up) split into tiles. Frame header:

 4 bytes - payload size (not counting the header)
 4 bytes - offset to the colour data
 2 bytes - number of tiles
 2 bytes - tile data size
 4 bytes - width
 4 bytes - height
 4 bytes - unknown
 4 bytes - unknown
 3 bytes - unused?

Colour data may contain either raw tile pixels (32-bit BGR0) or packed tile pixels. In the packed case the tile data size is set to 4 or 2 and the deltas are stored right after it. The overall tile restoration algorithm is the following:

 copy 16 bytes (4x1 tile) from the stream
 for (tile = 1; tile < num_tiles; tile++) {
   tile_data[tile] = tile_data[tile - 1];
   bits = get_bits(3) + 1; //the same bit reading as below, bits=8 should not happen
   for (i = 0; i < 16; i++) {
     delta = get_bits(bits);
     if (delta && get_bit())
       delta = -delta;
     tile_data[tile][i] += delta;
   }
 }
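
The restoration loop above can be made runnable as follows. This is a sketch under stated assumptions rather than a verified decoder: the class `MsbBitReader` and function `unpack_tiles_v1` are illustrative names, the bitstream is assumed to start right after the 16 raw bytes of the first tile, and byte values are assumed to wrap modulo 256 when a delta is added.

```python
class MsbBitReader:
    """Reads bits MSB first, starting at a given byte offset."""
    def __init__(self, data, byte_pos=0):
        self.data = data
        self.pos = byte_pos * 8  # position in bits

    def get_bits(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def get_bit(self):
        return self.get_bits(1)

def unpack_tiles_v1(data, num_tiles):
    """Return a list of 16-byte tiles restored from packed colour data."""
    tiles = [bytearray(data[:16])]           # first tile is stored raw
    br = MsbBitReader(data, 16)              # assumption: deltas follow it
    for _ in range(1, num_tiles):
        tile = bytearray(tiles[-1])          # start from the previous tile
        bits = br.get_bits(3) + 1            # bits == 8 should not happen
        for i in range(16):
            delta = br.get_bits(bits)
            if delta and br.get_bit():       # sign bit only for non-zero
                delta = -delta
            tile[i] = (tile[i] + delta) & 0xFF   # assumption: wraps mod 256
        tiles.append(tile)
    return tiles
```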


Tile control data is compressed using a variable number of bits, stored MSB first. The width of a tile index depends on the number of tiles: 10 bits if it fits into 10 bits, 11 bits if it fits into 11 bits, and 12 bits otherwise.
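
The index width rule can be written as a small helper. The function name is hypothetical, and the sketch assumes that "fits" means the largest index, `num_tiles - 1`, is representable in that many bits; the text does not spell out the exact boundary.

```python
def tile_index_bits(num_tiles):
    """Return the number of bits used for a tile index (10, 11 or 12)."""
    # Assumption: indices run 0..num_tiles-1, so 1024 tiles still fit in 10 bits.
    if num_tiles <= 1 << 10:
        return 10
    if num_tiles <= 1 << 11:
        return 11
    return 12
```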

Single tile decoding flow:

 if (!get_bit()) {
   offset = get_bits(tile_index_bits);
   copy tile data from the colour data using offset*16
 } else { // copy existing tile
   decode motion vector, copy the tile it points to
   (e.g. -1,0 means previous tile and 0,-1 means top tile)
 }

Motion vector codebook:

 1      -  0,-1
 0100   - -1, 0
 0101   - -1,-1
 0110   -  1,-1
 0111   -  0,-2
 000000 - -2,-3
 000001 -  2,-3
 000010 - -1,-4
 000011 -  1,-4
 000100 - -1,-2
 000101 -  1,-2
 000110 -  0,-3
 000111 -  0,-4
 001000 - -2, 0
 001001 - -2,-1
 001010 -  1,-1
 001011 - -2,-2
 001100 -  2,-2
 001101 - -1,-3
 001110 -  1,-3
 001111 -  0,-5
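
Because the codebook is a prefix code with three code lengths (1, 4 and 6 bits), it can be decoded with two prefix bits and a table lookup. The sketch below assumes the codes are read left to right from the MSB-first bitstream; `make_bit_source` and `decode_motion_vector` are illustrative names, and the tables are copied verbatim from the list above.

```python
# Vectors for codes 0100..0111, indexed by the last two bits.
MV_SHORT = [(-1, 0), (-1, -1), (1, -1), (0, -2)]
# Vectors for codes 000000..001111, indexed by the last four bits.
MV_LONG = [
    (-2, -3), (2, -3), (-1, -4), (1, -4),
    (-1, -2), (1, -2), (0, -3), (0, -4),
    (-2, 0), (-2, -1), (1, -1), (-2, -2),  # 001010 is listed as 1,-1, same as 0110
    (2, -2), (-1, -3), (1, -3), (0, -5),
]

def make_bit_source(data):
    """Return a get_bits(n) function that reads data MSB first."""
    pos = 0
    def get_bits(n):
        nonlocal pos
        value = 0
        for _ in range(n):
            value = (value << 1) | ((data[pos >> 3] >> (7 - (pos & 7))) & 1)
            pos += 1
        return value
    return get_bits

def decode_motion_vector(get_bits):
    """Decode one (dx, dy) pair, measured in tiles."""
    if get_bits(1):              # code "1"
        return (0, -1)
    if get_bits(1):              # codes "01xx"
        return MV_SHORT[get_bits(2)]
    return MV_LONG[get_bits(4)]  # codes "00xxxx"
```

For example, the bit sequence `1 0100 001000 001111` decodes to (0,-1), (-1,0), (-2,0) and (0,-5).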

The actual image may be interlaced, i.e. only half of the lines are decoded.

Video compression for version 2

In this version tile data is usually stored separately, in chunk type 0x54. The bitstream format has also changed to LSB-first little-endian.

Tile format

Chunk type 0x54 starts with the usual header: 32-bit data size, 16-bit number of tiles and 16-bit tile size. Tile data is packed almost, but not exactly, as in version 1:

 read raw data for tile 0
 for each tile {
   copy previous tile data
   for each component of tile { // i.e. all Rs, Gs, Bs and As
     bits = get_bits(3);
     if (bits < 7) {
       for (i = 0; i < tile_size; i++) {
         delta = get_bits(bits); // get_bits(0)=0
          if (delta && get_bit())
           delta = -delta;
         tile[component][i] += delta;
       }
     } else {
       for (i = 0; i < tile_size; i++) {
         tile[component][i] = get_bits(8);
       }
     }
   }
 }
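
A runnable sketch of the version 2 tile unpacking follows. Several details are assumptions and not confirmed by the description: multi-bit values are assembled with the first bit read as the least significant one, component values wrap modulo 256, the loop covers tiles 1 and up, and tile 0 is passed in already split into four component lists so that its raw layout does not need to be guessed. `LsbBitReader` and `unpack_tiles_v2` are illustrative names.

```python
class LsbBitReader:
    """Reads bits LSB first; the first bit read becomes bit 0 of the value."""
    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits

    def get_bits(self, n):
        value = 0
        for i in range(n):           # get_bits(0) returns 0
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value

    def get_bit(self):
        return self.get_bits(1)

def unpack_tiles_v2(first_tile, br, num_tiles, tile_size):
    """first_tile is a list of 4 component lists, each tile_size long."""
    tiles = [[list(comp) for comp in first_tile]]
    for _ in range(1, num_tiles):
        tile = [list(comp) for comp in tiles[-1]]   # copy previous tile
        for comp in tile:
            bits = br.get_bits(3)
            if bits < 7:                            # delta-coded component
                for i in range(tile_size):
                    delta = br.get_bits(bits)
                    if delta and br.get_bit():
                        delta = -delta
                    comp[i] = (comp[i] + delta) & 0xFF   # assumption: wraps
            else:                                   # raw 8-bit values
                for i in range(tile_size):
                    comp[i] = br.get_bits(8)
        tiles.append(tile)
    return tiles
```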

Frame format

Frames are now packed using several LRU lists. Tile indices are restored first and afterwards replaced with the actual tile data. Frame data is coded in groups of 8 tiles using a bit prefix: 1 - copy 8 tile indices from the previous line, 0 - switch to individual tile index decoding. Individual tile indices are coded in several ways (depending on the code):

  •   1 -- copy index from the top line
  • 000 -- get ceil(log2(tile_size)) bits for a new tile index, add it to LRU list (see below)
  • 100 -- get 4-bit delta value, a sign bit, add that to top index value, output and add it to LRU list
  • 010 -- form a list of 0-4 context-dependent values (see below), select one using 0-2 bits, output and add it to LRU list
  • 110 -- get 4-bit index, output value retrieved from LRU list using that index

LRU list

The decoder keeps a context-dependent cyclic list of the last 15 values (i.e. one list for each possible tile index). The list is selected using the top tile index (so it is not used for the first line). Initially each list contains all zeroes.
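
One possible shape for such a structure is sketched below. The class name `ContextLru` is hypothetical, and two points are assumptions the description leaves open: lookups address the physical slot directly (rather than counting back from the most recent entry), and nothing special is done for a 4-bit index of 15, which would fall outside a 15-entry list.

```python
class ContextLru:
    """One cyclic list of the last 15 values per top tile index."""
    SIZE = 15

    def __init__(self):
        self.lists = {}  # top tile index -> [values, write position]

    def _context(self, top_idx):
        # Lists are created lazily and start out filled with zeroes.
        return self.lists.setdefault(top_idx, [[0] * self.SIZE, 0])

    def add(self, top_idx, value):
        ctx = self._context(top_idx)
        values, write_pos = ctx
        values[write_pos] = value
        ctx[1] = (write_pos + 1) % self.SIZE   # wrap around cyclically

    def get(self, top_idx, index):
        # Assumption: index addresses a slot directly.
        return self._context(top_idx)[0][index]
```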

Context-dependent list

For one of the modes, such a list is formed and then used as the pixel source:

 // list forming
 list = (empty);
 top = y > 0 ? top tile index : NONE;
 for left, top-left, top-right and top-top positions {
   idx = tile index at the search position
   if (!contains(list, idx) && (top == NONE || top != idx)) {
     push(list, idx)
   }
 }
 //decoding
 if (length(list) < 2) {
   new_idx = list[0]; // it should be empty
 } else if (length(list) == 2) {
   new_idx = list[get_bit()];
 } else {
   new_idx = list[get_bits(2)];
 }