GoToMeeting Codec

From MultimediaWiki

Latest revision as of 01:27, 3 February 2014

This is a codec used to save recordings in GoToMeeting. The codec also calls itself GoToWebinar (see http://www.gotowebinar.com/).

Win32 binary decoder available here: http://www.gotomeeting.com/codec

According to samples, all G2M video frames begin with the characters 'G2M[2-4]', followed by a series of chunks. Each chunk has the following layout:

bytes 0-3    length of chunk payload, not including this length field
byte 4       type of chunk
bytes 5..    remainder of payload, format depends on the chunk type

Supported chunk types are 0xC8-0xCD.

It appears that the minimum size for a G2M frame (possibly a no-change frame) is 14 bytes. This includes the 4 signature bytes, a 4-byte length field holding the value 6, and a 6-byte payload consisting of the chunk type byte 0xCA followed by 5 more bytes.
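
The chunk layout above can be walked with a simple loop. The following is a minimal sketch in C, not code from any existing decoder: the name `g2m_count_chunks` is hypothetical, and the length field is assumed to be little-endian, which this page does not state. It accepts the 14-byte minimal frame described above (signature, length 6, type 0xCA plus 5 bytes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value; little-endian order is an assumption here. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the number of chunks in one frame,
 * or -1 if the signature, a chunk length, or a chunk type is invalid. */
static int g2m_count_chunks(const uint8_t *buf, size_t size)
{
    size_t pos = 4;
    int count = 0;

    if (size < 4 || memcmp(buf, "G2M", 3) != 0 || buf[3] < '2' || buf[3] > '4')
        return -1;
    while (pos < size) {
        uint32_t len;
        uint8_t type;

        if (size - pos < 5)             /* need the length field and a type byte */
            return -1;
        len = rd_le32(buf + pos);       /* counts the type byte and what follows */
        pos += 4;
        if (len < 1 || len > size - pos)
            return -1;
        type = buf[pos];
        if (type < 0xC8 || type > 0xCD) /* only 0xC8-0xCD are supported */
            return -1;
        pos += len;                     /* skip type byte and rest of payload */
        count++;
    }
    return count;
}
```

A real decoder would dispatch on the type byte instead of only counting chunks.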

G2M3 is the same as G2M2. G2M4 introduces a new compression method, but the structure remains the same.

In general, a frame is divided into a number of tiles, and each tile is coded separately. The usual tile size is 192x128 pixels.
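
As a worked example of the tiling, the number of tiles along one dimension can be computed by rounding up. This sketch assumes that tiles at the right and bottom edges may be partial, which this page does not state; `tiles_along` is a hypothetical name.

```c
/* Number of tiles needed to cover one image dimension, rounding up
 * so that a partial tile at the edge is still counted (assumption). */
static unsigned tiles_along(unsigned image_size, unsigned tile_size)
{
    return (image_size + tile_size - 1) / tile_size;
}
```

Under that assumption a 1024x768 picture with 192x128 tiles needs 6 tiles per row and 6 tiles per column, i.e. 36 tiles.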

Frame structure

Chunk C8

Display information.

Chunk contents (all values are big-endian):

  4 bytes  image width
  4 bytes  image height
  4 bytes  compression mode (should be 2 or 3)
  4 bytes  tile width
  4 bytes  tile height
  1 byte   colour depth (4, 8, 16, 24 or 32)
  for 4/8bpp there is a palette in standard RGBTUPLE format
  for 16-32bpp there are four bitmasks for each field
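
The fixed part of this chunk could be read as follows. This is only a sketch: the struct and function names are hypothetical, and it assumes the fields start right after the chunk type byte. The palette or bitmasks that follow the colour depth are not parsed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed fields of chunk C8. */
struct g2m_display_info {
    uint32_t width, height, compression, tile_width, tile_height;
    uint8_t  depth;
};

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parses the 21 fixed bytes; returns 0 on success, -1 on short or bad data. */
static int g2m_parse_display_info(const uint8_t *p, size_t size,
                                  struct g2m_display_info *di)
{
    if (size < 21)
        return -1;
    di->width       = rd_be32(p);
    di->height      = rd_be32(p + 4);
    di->compression = rd_be32(p + 8);   /* should be 2 or 3 */
    di->tile_width  = rd_be32(p + 12);
    di->tile_height = rd_be32(p + 16);
    di->depth       = p[20];
    switch (di->depth) {
    case 4: case 8: case 16: case 24: case 32:
        return 0;
    default:
        return -1;
    }
}
```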

Chunk C9

Image update.

 1 byte tile position in row
 1 byte tile position in column
 ... compressed data

Chunk CA

Mouse cursor position.

 2 bytes cursor position X
 2 bytes cursor position Y
 1 byte  seems to be always 1

Chunk CB

Mouse cursor shape:

 4 bytes data size
 1 byte  width
 1 byte  height
 1 byte  hotspot x
 1 byte  hotspot y
 ...     cursor bitmask and its inverse (in Microsoft cursor format)

Chunk CC

Possibly a resync chunk; it is supposed to contain only a 4-byte value equal to 2000.

Chunk CD

One dword, something to do with time.

Video compression methods

Compression method 1 (ELS image)

The vanilla augmented ELS coder is used (see "The ELS-coder: a rapid entropy coder", http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=582144) with 36 jots per byte.

ELS values

Unsigned values are coded using Exponential Golomb notation:

 unary prefix + stop bit + remainder, where number of bits in the prefix = number of bits in the remainder

The unary prefix is coded as n zero bits followed by a one bit (the stop bit), and the remainder takes another n bits. E.g. number 5 will be coded as 00 1 10.

Here is the decoding algorithm:

 read bits from the arithmetic ELS decoder until a "1" (the stop bit) is encountered
 n = number of zero bits read before the stop bit, e.g. "001" ==> n = 2
 read the remainder r as a plain binary number of n bits: "10" ==> 2
 value = 2^n - 1 + r = 2^2 - 1 + 2 = 5
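
The steps above can be sketched in C. To keep the example self-contained, the bits come from a plain array (`struct bit_source`, a hypothetical stand-in) rather than from the ELS arithmetic decoder, and the remainder is assumed to be read most significant bit first, as in the "10" ==> 2 example.

```c
#include <stddef.h>

/* Stand-in bit source: one bit per array element. In the real codec
 * every bit would come from the ELS arithmetic decoder instead. */
struct bit_source {
    const unsigned char *bits;
    size_t pos, count;
};

/* Returns the next bit; returns 1 past the end so that loops terminate. */
static int next_bit(struct bit_source *s)
{
    return s->pos < s->count ? s->bits[s->pos++] : 1;
}

static unsigned decode_unsigned(struct bit_source *s)
{
    unsigned n = 0, r = 0, i;

    while (n < 31 && !next_bit(s))   /* count zero bits up to the stop bit */
        n++;
    for (i = 0; i < n; i++)          /* n-bit remainder, MSB first */
        r = (r << 1) | (unsigned)next_bit(s);
    return (1u << n) - 1 + r;        /* value = 2^n - 1 + r */
}
```

With the bit sequence 0 0 1 1 0 this gives n = 2, r = 2 and the value 5.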

Signed values are coded as unsigned ones where the LSB indicates the sign:

 if (val & 1)
   val = - ((val + 1) >> 1);
 else
   val = val >> 1;
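
Written as a standalone C function (the name `to_signed` is hypothetical), the mapping above sends the unsigned values 0, 1, 2, 3, 4 to 0, -1, 1, -2, 2:

```c
/* Odd values map to negative numbers, even values to non-negative ones. */
static int to_signed(unsigned val)
{
    if (val & 1)
        return -(int)((val + 1) >> 1);
    return (int)(val >> 1);
}
```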

To improve compression, every decoded bit uses a context-dependent state, so every bit is decoded this way:

 bit = els_decode_bit(ctx, &ctx->current_state->rung);
 if (bit) {
   if (!ctx->current_state->next1) {
       ctx->current_state->next1 = new State();
       ctx->current_state->next1->rung = 0;
   }
   ctx->current_state = ctx->current_state->next1;
 } else {
   if (!ctx->current_state->next0) {
       ctx->current_state->next0 = new State();
       ctx->current_state->next0->rung = 0;
   }
   ctx->current_state = ctx->current_state->next0;
 }
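
The state handling above amounts to a binary tree of contexts that grows on demand. A minimal C sketch of just the tree part follows; the ELS bit decoding itself is left out, and the names `struct state`, `advance_state` and `free_states` are hypothetical.

```c
#include <stdlib.h>

struct state {
    unsigned rung;           /* adaptive probability index of this context */
    struct state *next[2];   /* child states for a decoded 0 and a decoded 1 */
};

/* Moves to the child state selected by `bit`, allocating it on first use
 * with rung = 0. Returns NULL if the allocation fails. */
static struct state *advance_state(struct state *cur, int bit)
{
    bit = !!bit;
    if (!cur->next[bit])
        cur->next[bit] = calloc(1, sizeof(*cur->next[bit]));
    return cur->next[bit];
}

/* Releases a whole tree of states. */
static void free_states(struct state *s)
{
    if (!s)
        return;
    free_states(s->next[0]);
    free_states(s->next[1]);
    free(s);
}
```

Each sequence of previously decoded bits therefore ends up with its own rung, which is what makes the probability model context-dependent.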

Single pixel coding

A single pixel is decoded like this:

 if (!x && !y) {
   R = decode_vlc();
   G = decode_vlc();
   B = decode_vlc();
 } else if (!x || !y) {
   if (!y) {
     pR = rgb[x - 1, y].R;
     pG = rgb[x - 1, y].G;
     pB = rgb[x - 1, y].B;
   } else {
     pR = rgb[x, y - 1].R;
     pG = rgb[x, y - 1].G;
     pB = rgb[x, y - 1].B;
   }
   R = pR + decode_vlc_signed();
   G = pG + decode_vlc_signed();
   B = pB + decode_vlc_signed();
 } else {
   G = decode_pred(rgb[x - 1, y].G, rgb[x, y - 1].G, rgb[x - 1, y - 1].G);
   R = G + decode_pred(rgb[x - 1, y    ].R - rgb[x - 1, y    ].G,
                       rgb[x,     y - 1].R - rgb[x,     y - 1].G,
                       rgb[x - 1, y - 1].R - rgb[x - 1, y - 1].G);
   B = G + decode_pred(rgb[x - 1, y    ].B - rgb[x - 1, y    ].G,
                       rgb[x,     y - 1].B - rgb[x,     y - 1].G,
                       rgb[x - 1, y - 1].B - rgb[x - 1, y - 1].G);
 }

Where decode_pred(A, B, C) looks like this:

 diff = decode_vlc_signed();
 if (B < max(A, C)) {
   if (B > min(A, C)) {
     return A - B + C - diff;
   } else {
     return max(A, C) - diff;
   }
 } else {
   return min(A, C) - diff;
 }
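
The prediction part of decode_pred, without the decoded difference, can be isolated as a pure function. In this C sketch the name `predict` is hypothetical; the decoded value is then `predict(A, B, C) - diff`.

```c
static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Prediction from the three neighbour values, before the decoded
 * difference is subtracted. */
static int predict(int A, int B, int C)
{
    if (B < imax(A, C)) {
        if (B > imin(A, C))
            return A - B + C;    /* B lies strictly between A and C */
        return imax(A, C);       /* B is not above the smaller of A and C */
    }
    return imin(A, C);           /* B is not below the larger of A and C */
}
```

For example, A = 10, B = 20, C = 30 predicts 20, while A = 10, B = 5, C = 30 predicts 30. The result is always the median of A, C and A - B + C.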

Image decoding

Image decoding is context-dependent: the decoder always checks some flags first and tries to retrieve the pixel value from the cache instead of decoding it directly.

Overall decoding scheme:

 for (y = 0; y < height; y++) {
  x = 0;
  while (x < width) {
   if (x > 1 && y > 0 &&
     rgb[x - 1, y] != rgb[x - 2, y] &&
     rgb[x - 1, y] != rgb[x,     y - 1] &&
     rgb[x - 1, y] != rgb[x - 1, y - 1] &&
     rgb[x - 1, y] != rgb[x - 2, y - 1] &&
     !pixel_in_cache(rgb[x - 1, y])) {
    rgb[x, y] = decode_pixel_with_prediction(x, y);
    x++;
    continue;
   }
   decode_run(x, y, &run_length, &pix);
   if (run_length > 0) {
    // pixel value may get changed here
    reuse_top_neighbours_if_possible(x, y, run_length, &pix);
    while (run_length--) {
     rgb[x, y] = pix;
     x++;
    }
   } else if (x > 0 && decode_from_list(rgb[x - 1, y], &pix)) {
    rgb[x, y] = pix;
    x++;
   } else {
    rgb[x, y] = decode_pixel_with_prediction(x, y);
    if (x)
      add_to_list(pixel_list[rgb[x - 1, y]], rgb[x, y]);
    x++;
   }
  }
 }

Run decoding:

 run_length = 0;
 
 if (x > 1 && x < width - 1 && y > 1) {
   L   = rgb[x - 1, y];
   LL  = rgb[x - 2, y];
   TR  = rgb[x + 1, y - 1];
   T   = rgb[x,     y - 1];
   TL  = rgb[x - 1, y - 1];
   TLL = rgb[x - 2, y - 1];
   TTR = rgb[x + 1, y - 2];
   TT  = rgb[x,     y - 2];
   TTL = rgb[x - 1, y - 2];
   
   if (x != ctx->last_run_end) {
     idx = (TTL != TL) << 0 |
           (TT  != T)  << 1 |
           (TTR != TR) << 2 |
           (TLL != TL) << 3 |
           (TL  != T)  << 4 |
           (TR  != T)  << 5 |
           (TL  != L)  << 6 |
           (LL  != L)  << 7;
     flag = els_decode_bit(ctx->left_context[idx]);
   } else {
     flag = 1;
   }
   if (flag)
     add_to_cache(L);
   else
     pixel_val = L;
   for (;;) {
     if (flag) {
       // not perfect
       idx = (TTL != TL) << 0 |
             (TT  != T)  << 1 |
             (TTR != TR) << 2 |
             (TLL != TL) << 3 |
             (TL  != T)  << 4 |
             (TR  != T)  << 5 |
             (TL  != L)  << 6 |
             (LL  != L)  << 7;
       if (els_decode_bit(ctx->top_context[idx])) {
         pixel_val = T;
         flag2 = 0;
       } else {
         if (!pixel_in_cache(T))
           add_to_cache(T);
         flag2 = 1;
       }
     } else {
       flag2 = (pixel_val != T);
     }
     x++;
     if (x >= width - 1)
       break;
     update L, LL, TR, T, TL, TLL, TTR, TT and TTL;
     if (!flag2 && TL == T && T == TR) {
       if (!decode_run_length(&x))
         break;
       update L, LL, TR, T, TL, TLL, TTR, TT and TTL;
     }
     idx = (TTL != TL) << 0 |
           (TT  != T)  << 1 |
           (TTR != TR) << 2 |
           (TLL != TL) << 3 |
           (TL  != T)  << 4 |
           (TR  != T)  << 5 |
           (TL  != L)  << 6 |
           (LL  != L)  << 7;
     if (els_decode_bit(ctx->left_context[idx]))
       break;
   }
   ctx->last_run_end = x;
   run_length = x - old_x;
   return !flag;
 } 
 if (x > 0) {
   if (!els_decode_bit(ctx->left_flag_ctx)) {
     pixel_val = rgb[x - 1, y];
     run_length = 1;
   } else {
     add_to_cache(rgb[x - 1, y]);
   }
 }
 if (y > 0) {
   top_pix = rgb[x, y - 1];
   if (empty_pixel_cache() || first_pixel_in_cache() != top_pix) {
     if (!els_decode_bit(ctx->top_flag_ctx)) {
       pixel_val = top_pix;
       run_length = 1;
     } else {
       add_to_cache(top_pix);
     }
   }
 }

Decoding run length (the run on the line above is used as a reference: either its length is returned immediately, or a value not greater than it is decoded):

 pos_R  = x + 1;
 pos_RR = x + 2;
 while (pos_RR < width && rgb[pos_RR, y - 1] == rgb[pos_R, y]) {
   pos_R++;
   pos_RR++;
 }
 bits = log2_int(pos_R - x);
 if (els_decode_bit(ctx->dist_context[bits]))
   return pos_R - x;
 flag = 0;
 bit = 1 << (bits - 1);
 mask = 0;
 run_length = 0;
 while (bits >= 0) {
   if (((run_length & mask) | bit) < pos_R - x) {
     if (els_decode_bit(flag ? ctx->one_context : ctx->length_context[bits])) {
       flag = 1;
       run_length |= 1 << bits;
     }
   }
   mask |= bit;
   bit >>= 1;
   bits--;
 }

Reuse neighbours if possible:

 if (x > 0 && y > 0) {
   TL = rgb[x - 1, y - 1];
   L  = rgb[x - 1, y];
   T  = rgb[x,     y - 1];
   if (TL != L && TL != T && !pixel_is_in_cache(TL)) {
     if (els_decode_bit(ctx->TL_context[TL])) {
       modify current pixel value to be TL
       return
     }
     add_to_cache(TL);
   }
 }
 if (x + run_size < width - 1 && y > 0) {
   TR = rgb[x + 1, y - 1];
   T  = rgb[x,     y - 1];
   if (T != TR && !pixel_is_in_cache(TR)) {
     if (els_decode_bit(ctx->TR_context[TR])) {
       modify current pixel value to be TR
       return
     }
     add_to_cache(TR);
   }
 }

Decoding from list:

 list = get_list_for_pixel(rgb[x - 1, y]);
 while (list) {
   if (!pixel_is_in_cache(list->pix_val)) {
     if (els_decode_bit(list->rung)) {
       output_pixel_value = list->pix_val;
       remove current entry from the list;
       return success;
     }
     add_to_cache(list->pix_val);
   }
   list = list->next;
 }
 return fail;

Compression method 2 (ELS image + JPEG)

This enhances compression method 1 by separating the image into two pictures: one with sharp details and one with smooth details. The former is compressed as in compression method 1, the latter is coded as a JPEG image. Either of the layers can be absent in a tile.

Overall coding is quite simple: the ELS layer is coded as a 1x1 image first, containing the value that will be used as the transparent colour (i.e. the value that should be replaced with JPEG data), followed by the whole picture.

JPEG data consists of raw scan data for baseline JPEG with the standard quantisation matrix and VLCs. Only the macroblocks covering ELS image blocks with transparency are coded (or the whole image when ELS data is not present).

 ELS-coded data size
 ELS-coded data for transparency pixel
 ELS-coded data for whole image
 JPEG data
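
Once both layers are decoded, combining them follows from the description above: every ELS pixel equal to the transparent colour is replaced with the JPEG pixel at the same position. A minimal C sketch, assuming both layers are already available as arrays of packed 0xRRGGBB values (that layout and the name `composite_layers` are assumptions made for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Replaces every transparent pixel of the ELS layer with the pixel at
 * the same position in the decoded JPEG layer. */
static void composite_layers(uint32_t *els, const uint32_t *jpeg,
                             size_t count, uint32_t transparent)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (els[i] == transparent)
            els[i] = jpeg[i];
}
```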

ELS-coded data size:

 0xxxxxxx
 10xxxxxx xxxxxxxx
 110xxxxx xxxxxxxx xxxxxxxx
 111xxxxx xxxxxxxx xxxxxxxx xxxxxxxx
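
The size prefix above can be decoded as in the following C sketch. It assumes that the x bits are concatenated most significant first, which this page does not state; `read_els_size` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes the variable-length ELS data size. Returns the number of bytes
 * consumed (1 to 4) and stores the size in *out, or returns 0 if the
 * buffer is too short. */
static size_t read_els_size(const uint8_t *p, size_t size, uint32_t *out)
{
    size_t extra, i;
    uint32_t v;

    if (size < 1)
        return 0;
    if (!(p[0] & 0x80))      { extra = 0; v = p[0]; }          /* 0xxxxxxx */
    else if (!(p[0] & 0x40)) { extra = 1; v = p[0] & 0x3F; }   /* 10xxxxxx */
    else if (!(p[0] & 0x20)) { extra = 2; v = p[0] & 0x1F; }   /* 110xxxxx */
    else                     { extra = 3; v = p[0] & 0x1F; }   /* 111xxxxx */
    if (size < 1 + extra)
        return 0;
    for (i = 1; i <= extra; i++)
        v = (v << 8) | p[i];     /* append the following bytes */
    *out = v;
    return 1 + extra;
}
```

Under this reading the four forms carry 7, 14, 21 and 29 payload bits respectively.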

Compression method 3 (deflated image + JPEG)

This method resembles compression method 2, except that the ELS image is replaced with a simple deflated image, and the macroblock map (which blocks in the image are coded) is stored explicitly too.

 compression subtype (1 byte)
 transparent pixel value (3 bytes)
 number of palette entries minus one (1 byte)
 palette (3-byte entries)
 deflated data size (2 bytes big-endian)
 deflated data
 JPEG macroblock map
 JPEG data

Compression subtype (top 3 bits) indicates which parts are present and how they should be decoded.

  • 0 - fill block with the following pixel value
  • 1 - decode JPEG only, only JPEG data is present
  • 2 - decode only deflated data, no transparent pixel or JPEG data present
  • 3 - all features are present

Deflated image data describes a palettised mask image (or "synthetic layer"). The image is compressed further by using the minimal number of bits for palette indices (e.g. only 2 bits for 3- or 4-colour images), and any line can be skipped instead of being coded.

 for (y = 0; y < height; y++, dst += stride) {
   if (get_bits(8)) // nonzero flag byte: this line is skipped, not coded
       continue;
   for (x = 0; x < width; x++)
       dst[x] = get_bits(bits_per_index); // palette index of the pixel
 }
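
A self-contained C version of the loop above needs a bit reader. The sketch below follows the pseudocode literally on already inflated data. It assumes bits are read most significant bit first and that there is no padding between lines; neither is stated on this page, and the names `struct bitreader` and `unpack_lines` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct bitreader {
    const uint8_t *buf;
    size_t size_bits;   /* total number of bits available */
    size_t pos;         /* current position in bits */
};

/* Reads n bits, MSB first (assumption); bits past the end read as 0. */
static unsigned get_bits(struct bitreader *br, unsigned n)
{
    unsigned v = 0;

    while (n--) {
        unsigned bit = 0;
        if (br->pos < br->size_bits)
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Fills dst with palette indices; skipped lines are left untouched. */
static void unpack_lines(struct bitreader *br, uint8_t *dst, size_t stride,
                         unsigned width, unsigned height,
                         unsigned bits_per_index)
{
    unsigned x, y;

    for (y = 0; y < height; y++, dst += stride) {
        if (get_bits(br, 8))     /* nonzero flag byte: line is skipped */
            continue;
        for (x = 0; x < width; x++)
            dst[x] = (uint8_t)get_bits(br, bits_per_index);
    }
}
```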

The JPEG macroblock map consists of a byte with the number of coded macroblocks minus one and an array of flags packed into bytes LSB first. A zero bit means that the next macroblock should be skipped; a set bit means that the next decoded macroblock should be put here. This array continues until all coded macroblocks are flagged. The actual JPEG data is stored right after this information.
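
Reading the map as described could look like the following C sketch. It stops as soon as the announced number of coded macroblocks has been flagged and assumes that the JPEG data starts at the next byte boundary; both points are an interpretation of the text above, and `read_mb_map` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parses the macroblock map into flags[0..total_mbs-1] (1 = coded).
 * Returns the number of bytes consumed, or 0 on malformed input. */
static size_t read_mb_map(const uint8_t *p, size_t size,
                          uint8_t *flags, size_t total_mbs)
{
    size_t coded, found = 0, i;

    if (size < 1)
        return 0;
    coded = (size_t)p[0] + 1;            /* stored as count minus one */
    if (coded > total_mbs)
        return 0;
    memset(flags, 0, total_mbs);
    for (i = 0; found < coded; i++) {
        if (i >= total_mbs || 1 + (i >> 3) >= size)
            return 0;
        flags[i] = (p[1 + (i >> 3)] >> (i & 7)) & 1;   /* LSB first */
        found += flags[i];
    }
    return 1 + (i + 7) / 8;              /* count byte plus flag bytes */
}
```

For the bytes 0x02 0x0D the count is 3 and the flags read 1, 0, 1, 1, so macroblocks 0, 2 and 3 are coded and the map occupies 2 bytes.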