CNM

Latest revision as of 09:00, 9 November 2023

Company: Arxel Tribe
Extension: cnm, ci2
Samples: http://samples.mplayerhq.hu/game-formats/ring-cnm/

CNM is a multimedia format used in the computer game Ring: The Legend of the Nibelungen. CI2 is the next iteration of CNM, with slightly different compression, and is used in Faust: The Seven Games of the Soul.

Container format

The container has the following structure:

magic CNM UNR\0
header
frame offsets table (video and audio interleaved, audio offsets are zero when audio is not present; completely zero in CI2)
frames

Header format (all values are little-endian):

 4 bytes - number of frames
 4 bytes - unknown
 1 byte  - unknown
 4 bytes - image width
 4 bytes - image height
 2 bytes - unknown
 1 byte  - number of audio tracks
 4 bytes - number of video frames?
 4 bytes - number of frames repeated?
 4 bytes - size of offsets table (v1 only)
 152 bytes - always zero?
 when audio is present for each track:
   1 byte  - number of channels
   1 byte  - bits per sample
   4 bytes - audio rate
   10 bytes - unused?
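
The header layout above can be sketched as a small parser. This is only an illustration under assumptions: the names `parse_header`, `CnmHeader`, `FIXED` and `TRACK` are hypothetical, the "size of offsets table" field is assumed to occupy its 4 bytes in every file even though it is only meaningful for version 1, and the per-track records are assumed to follow the 152 zero bytes directly.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

MAGIC = b"CNM UNR\x00"
# frames, unknown, unknown, width, height, unknown, audio tracks,
# video frames?, frames repeated?, offsets table size (32 bytes in total)
FIXED = struct.Struct("<IIBIIHBIII")
# channels, bits per sample, audio rate (followed by 10 unused bytes)
TRACK = struct.Struct("<BBI")


@dataclass
class CnmHeader:
    num_frames: int
    width: int
    height: int
    num_audio_tracks: int
    offsets_table_size: int
    tracks: List[Tuple[int, int, int]]  # (channels, bits per sample, rate)
    end: int  # offset of the first byte after the header


def parse_header(data: bytes) -> CnmHeader:
    if data[:8] != MAGIC:
        raise ValueError("not a CNM file")
    pos = 8
    (nframes, _unk1, _unk2, width, height, _unk3, ntracks,
     _nvideo, _nrepeat, table_size) = FIXED.unpack_from(data, pos)
    pos += FIXED.size + 152  # skip the block that is always zero
    tracks = []
    for _ in range(ntracks):
        tracks.append(TRACK.unpack_from(data, pos))
        pos += TRACK.size + 10  # skip the unused bytes
    return CnmHeader(nframes, width, height, ntracks, table_size, tracks, pos)
```

The `end` field is where the frame offsets table would start if the assumptions above hold.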

Each frame is prefixed by a byte containing its type. Known frame types:

0x41 - audio data
0x42 - audio data
0x53 - image
0x54 - tile data
0x55 - image (v2)
0x5A - audio data

Audio data is PCM prefixed by a 32-bit data size; video frames are described below.
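
As a rough sketch of the chunk framing, the following reads one audio chunk: a type byte, a 32-bit size and then the PCM payload. The function names are hypothetical, and the size is assumed to be little-endian like the header fields.

```python
import struct

AUDIO_TYPES = {0x41, 0x42, 0x5A}
VIDEO_TYPES = {0x53: "image", 0x54: "tile data", 0x55: "image (v2)"}


def chunk_kind(ctype: int) -> str:
    """Map a frame type byte to the description from the list above."""
    if ctype in AUDIO_TYPES:
        return "audio data"
    return VIDEO_TYPES.get(ctype, "unknown")


def read_audio_chunk(data: bytes, pos: int):
    """Return (pcm_bytes, next_pos) for the audio chunk starting at pos."""
    ctype = data[pos]
    if ctype not in AUDIO_TYPES:
        raise ValueError(f"chunk type 0x{ctype:02X} is not audio")
    (size,) = struct.unpack_from("<I", data, pos + 1)  # assumed little-endian
    start = pos + 5
    return data[start:start + size], start + size
```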

Video compression for version 1

Each frame is an independently compressed image (in bottom-up format) split into tiles. Frame header:

 4 bytes - payload size (not counting the header)
 4 bytes - offset to the colour data
 2 bytes - number of tiles
 2 bytes - tile data size
 4 bytes - width
 4 bytes - height
 4 bytes - unknown
 4 bytes - unknown
 3 bytes - unused?

Colour data may contain either raw tile pixels (32-bit BGR0) or packed tiles. In the packed case the tile data size is set to 4 or 2 and the deltas are stored right after it. The overall tile restoration algorithm is the following:

 copy 16 bytes (4x1 tile) from the stream
 for (tile = 1; tile < num_tiles; tile++) {
   tile_data[tile] = tile_data[tile - 1];
   bits = get_bits(3) + 1; //the same bit reading as below, bits=8 should not happen
   for (i = 0; i < 16; i++) {
     delta = get_bits(bits);
     if (delta && get_bit())
       delta = -delta;
     tile_data[tile][i] += delta;
   }
 }
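
A runnable sketch of the restoration loop above might look as follows. `MsbBitReader` and `restore_tiles_v1` are hypothetical names, the first raw tile and the delta bitstream are passed in separately for simplicity, and the byte arithmetic is assumed to wrap around modulo 256 (the description does not say).

```python
class MsbBitReader:
    """Reads bits from a byte string, most significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def get_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def restore_tiles_v1(first_tile: bytes, delta_stream: bytes, num_tiles: int):
    """Rebuild num_tiles 16-byte tiles from tile 0 and the packed deltas."""
    tiles = [bytearray(first_tile)]
    br = MsbBitReader(delta_stream)
    for _ in range(1, num_tiles):
        tile = bytearray(tiles[-1])  # start from the previous tile
        bits = br.get_bits(3) + 1    # bits == 8 should not happen
        for i in range(16):
            delta = br.get_bits(bits)
            if delta and br.get_bits(1):
                delta = -delta
            tile[i] = (tile[i] + delta) & 0xFF  # assumed wrap-around
        tiles.append(tile)
    return tiles
```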

Tile control data is compressed using a variable number of bits, stored MSB first. The width of a tile index depends on the number of tiles: 10 bits if the index fits into 10 bits, 11 bits if it fits into 11 bits, and 12 bits otherwise.
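
The width rule can be written as a tiny helper. The name `tile_index_bits` matches the variable used in the decoding flow, but the function itself is only a sketch; it assumes that "fits" refers to the largest possible index, i.e. `num_tiles - 1`.

```python
def tile_index_bits(num_tiles: int) -> int:
    """Number of bits used to code a tile index in version 1."""
    largest = max(num_tiles - 1, 0)  # assumption: the largest index must fit
    if largest < (1 << 10):
        return 10
    if largest < (1 << 11):
        return 11
    return 12
```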

Single tile decoding flow:

 if (!get_bit()) {
   offset = get_bits(tile_index_bits);
   copy tile data from the colour data using offset*16
 } else { // copy existing tile
   decode motion vector, copy the tile it points to
   (e.g. -1,0 means previous tile and 0,-1 means top tile)
 }

Motion vector codebook:

 1      -  0,-1
 0100   - -1, 0
 0101   - -1,-1
 0110   -  1,-1
 0111   -  0,-2
 000000 - -2,-3
 000001 -  2,-3
 000010 - -1,-4
 000011 -  1,-4
 000100 - -1,-2
 000101 -  1,-2
 000110 -  0,-3
 000111 -  0,-4
 001000 - -2, 0
 001001 - -2,-1
 001010 -  2,-1
 001011 - -2,-2
 001100 -  2,-2
 001101 - -1,-3
 001110 -  1,-3
 001111 -  0,-5
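
Since the codebook has only three code lengths (1, 4 and 6 bits), it can be decoded without a generic prefix-code table. The sketch below assumes a caller-supplied `get_bit` function that returns the next bit of the MSB-first stream; `decode_mv`, `SHORT_CODES` and `LONG_CODES` are hypothetical names.

```python
# Codes 0100..0111, indexed by their last two bits.
SHORT_CODES = [(-1, 0), (-1, -1), (1, -1), (0, -2)]
# Codes 000000..001111, indexed by their last four bits.
LONG_CODES = [
    (-2, -3), (2, -3), (-1, -4), (1, -4),
    (-1, -2), (1, -2), (0, -3), (0, -4),
    (-2, 0), (-2, -1), (2, -1), (-2, -2),
    (2, -2), (-1, -3), (1, -3), (0, -5),
]


def decode_mv(get_bit):
    """Decode one motion vector (dx, dy) using the codebook above."""
    if get_bit():                 # code "1"
        return (0, -1)
    if get_bit():                 # codes "01xx"
        table, extra = SHORT_CODES, 2
    else:                         # codes "00xxxx"
        table, extra = LONG_CODES, 4
    idx = 0
    for _ in range(extra):
        idx = (idx << 1) | get_bit()
    return table[idx]
```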

The actual image may be interlaced, i.e. only half of the lines are decoded.

Video compression for version 2

In this version frames are coded in small groups (usually of four), with the common tile data (chunk 0x54) preceding the keyframe (chunk 0x55) and the inter frames (chunk 0x53).

Also note that in this version the bitstream format is little-endian, LSB first.
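
A minimal LSB-first bit reader could look like this. The class name is hypothetical, and the sketch assumes that multi-bit values are assembled with the first bit read as the least significant one, which is the usual convention for such streams but is not stated explicitly here.

```python
class LsbBitReader:
    """Reads bits from a byte string, least significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def get_bits(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value  # get_bits(0) reads nothing and returns 0
```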

Tile format

Chunk type 0x54 starts with the usual header: 32-bit data size, 16-bit number of tiles and 16-bit tile size. Tile data is packed almost but not exactly like in version 1:

 read raw data for tile 0
 for each tile {
   copy previous tile data
   for each component of tile { // i.e. all Rs, Gs, Bs and As
     bits = get_bits(3);
     if (bits < 7) {
       for (i = 0; i < tile_size; i++) {
         delta = get_bits(bits); // get_bits(0)=0
          if (delta && get_bit())
           delta = -delta;
         tile[component][i] += delta;
       }
     } else {
       for (i = 0; i < tile_size; i++) {
         tile[component][i] = get_bits(8);
       }
     }
   }
 }
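
The unpacking loop above can be sketched in runnable form. Several details are assumptions made for illustration: the names `unpack_tiles_v2` and `LsbBitReader`, the representation of a tile as one list per component, the layout of the raw tile 0 (passed in already split), and modulo-256 wrap-around when adding deltas.

```python
class LsbBitReader:
    """Reads bits from a byte string, least significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def get_bits(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def unpack_tiles_v2(br, first_tile, num_tiles):
    """first_tile is a list of component lists, each tile_size values long."""
    tiles = [[list(comp) for comp in first_tile]]
    for _ in range(1, num_tiles):
        tile = [list(comp) for comp in tiles[-1]]  # copy previous tile data
        for comp in tile:
            bits = br.get_bits(3)
            for i in range(len(comp)):
                if bits < 7:
                    delta = br.get_bits(bits)  # get_bits(0) == 0
                    if delta and br.get_bits(1):
                        delta = -delta
                    comp[i] = (comp[i] + delta) & 0xFF  # assumed wrap-around
                else:
                    comp[i] = br.get_bits(8)  # raw value
        tiles.append(tile)
    return tiles
```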

Frame format

Frames are now packed using various prediction methods operating on tile indices. In inter frames, tile index 0 means an unchanged area.

Frame data is split into regions of eight tiles, and one bit is transmitted for each region. Bit 1 means the whole region should be copied from the line above; bit 0 means that each tile index in the region is coded individually.

Individual tile indices have the following mode codewords:

1 -- copy index from the top line
000 -- get ceil(log2(tile_size)) bits for a new tile index, add it to context list (see below)
100 -- get 4-bit delta value, a sign bit, add/subtract delta+1 to/from top index value, output and add it to the context list
010 -- form a list of 1-4 unique neighbour values (see below), select one using 0-2 bits, output and add it to the context list
110 -- get 4-bit index in the corresponding context list and output it (without updating the list)
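
The mode selection can be sketched as below. The codewords are taken to be written with the first transmitted bit on the right, which is the only reading in which the one-bit code `1` is not a prefix of `100` and `110`. The function names, the mode labels and the polarity of the sign bit (1 meaning subtract) are assumptions; `get_bit` is a caller-supplied function returning the next bit of the LSB-first stream.

```python
# Indexed by the two bits that follow the leading 0 bit (LSB first).
MODES = ["new_index", "neighbour_list", "top_delta", "context_list"]


def read_mode(get_bit) -> str:
    """Read one mode codeword and return a descriptive label."""
    if get_bit():
        return "copy_top"               # codeword 1
    code = get_bit() | (get_bit() << 1)
    return MODES[code]                  # 000, 010, 100, 110


def read_top_delta(get_bit, top_idx: int) -> int:
    """Mode 100: 4-bit delta, then a sign bit, applied as delta + 1."""
    delta = 0
    for i in range(4):
        delta |= get_bit() << i
    if get_bit():                       # assumed: 1 means subtract
        return top_idx - (delta + 1)
    return top_idx + (delta + 1)
```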

Context list

The decoder keeps a context-dependent cyclic list (i.e. one list for each possible tile index) of the last 16 values that had that index as their top neighbour. Initially each list contains all zeroes.

For all but one of the single-index operations the list should be updated:

 if (y > 0) { // not the first line
   top_idx = frame[cur_pos - stride];
   contexts[top_idx].list[contexts[top_idx].pos] = cur_idx;
   contexts[top_idx].pos = (contexts[top_idx].pos + 1) & 15;
 }
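
The update rule translates almost directly into a small class. `ContextLists` is a hypothetical name, and the sketch assumes that the frame is a flat list of tile indices with `cur_pos = y * stride + x`, so that `cur_pos < stride` identifies the first line.

```python
class ContextLists:
    """One 16-entry cyclic list per possible top-neighbour tile index."""

    def __init__(self, num_indices: int):
        self.lists = [[0] * 16 for _ in range(num_indices)]
        self.pos = [0] * num_indices

    def update(self, frame, cur_pos: int, stride: int, cur_idx: int) -> None:
        if cur_pos < stride:  # first line: there is no top neighbour
            return
        top_idx = frame[cur_pos - stride]
        self.lists[top_idx][self.pos[top_idx]] = cur_idx
        self.pos[top_idx] = (self.pos[top_idx] + 1) & 15
```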

Context-dependent list

For one of the modes such a list is formed and then used as the source of the new tile index:

 // list forming
 list = (empty);
 top = y > 0 ? top tile index : NONE;
 for left, top-left, top-right and top-top positions {
   idx = tile index at the search position
   if (!contains(list, idx) && (top == NONE || top != idx)) {
     push(list, idx)
   }
 }
 //decoding
 if (length(list) < 2) {
   new_idx = list[0]; // it should not be empty
 } else if (length(list) == 2) {
   new_idx = list[get_bit()];
 } else {
   new_idx = list[get_bits(2)];
 }
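
A runnable sketch of the list forming and selection is given below. The function names are hypothetical, the frame is assumed to be a flat list of tile indices with `width` entries per line, and search positions that fall outside the frame are assumed to be skipped, which the description does not specify.

```python
def neighbour_candidates(frame, x: int, y: int, width: int):
    """Collect unique neighbour indices that differ from the top index."""
    top = frame[(y - 1) * width + x] if y > 0 else None
    candidates = []
    # left, top-left, top-right and top-top positions
    for dx, dy in ((-1, 0), (-1, -1), (1, -1), (0, -2)):
        nx, ny = x + dx, y + dy
        if nx < 0 or nx >= width or ny < 0:
            continue  # assumption: positions outside the frame are skipped
        idx = frame[ny * width + nx]
        if idx not in candidates and idx != top:
            candidates.append(idx)
    return candidates


def pick_neighbour(candidates, get_bits):
    """Select an entry using 0, 1 or 2 bits depending on the list length."""
    if len(candidates) < 2:
        return candidates[0]  # the list should not be empty
    if len(candidates) == 2:
        return candidates[get_bits(1)]
    return candidates[get_bits(2)]
```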

