Microsoft Screen Codec

Also known as Windows Media Screen Codec.

* FourCCs: MSS1, MSS2, MSA1
* Samples:
** http://samples.mplayerhq.hu/V-codecs/MSS1/
** http://samples.mplayerhq.hu/V-codecs/MSS2/
 
MSA1 is created by Live Meeting 2007.
 
 
== Some details about format ==
 
Both MSS1 and MSS2 are quite close (and are thus decoded with a single decoder). They employ arithmetic coding - a real one, with probability modelling. This coding is used with several adaptive models, which look a bit like PPM.
 
=== MSS1 details ===
 
MSS1 (aka Windows Media Screen V7 codec) compresses only palettised images.
 
==== Extradata format ====
(for some reason, data in .wmv is stored in big-endian order)
 
   4- 7  header length
   8-11  major version (1 for MSS1, 2 for MSS2)
  12-15  minor version
  16-19  display width
  20-23  display height
  24-27  coded width
  28-31  coded height
  32-35  frames per second (float)
  36-39  bitrate
  40-43  max lead time (float)
  44-47  max lag time (float)
  48-51  max seek time (float)
  52-55  nFreeColors
  56-823 palette (256 RGB triplets)
Only for MSS2:
 824-827 threadingSplit (domain: -1, 0, 1..codedH)
 828-831 numSymbolsEscapeModel (domain: 0..256)
 
Both width and height must be in the range 1..4096.
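
As a quick illustration of the layout above, here is a minimal Python sketch that unpacks these fields (the `parse_extradata` helper and its field names are made up for this example; everything is read big-endian as noted, and bytes 0-3 are skipped since they are not documented here):

```python
import struct

def parse_extradata(buf):
    # All fields are stored big-endian; offsets follow the table above.
    hdr = {}
    (hdr["header_len"], hdr["major_ver"], hdr["minor_ver"],
     hdr["disp_w"], hdr["disp_h"], hdr["coded_w"], hdr["coded_h"]) = \
        struct.unpack(">7I", buf[4:32])
    hdr["fps"], = struct.unpack(">f", buf[32:36])
    hdr["bitrate"], = struct.unpack(">I", buf[36:40])
    hdr["max_lead"], hdr["max_lag"], hdr["max_seek"] = \
        struct.unpack(">3f", buf[40:52])
    hdr["free_colors"], = struct.unpack(">I", buf[52:56])
    hdr["palette"] = [tuple(buf[56 + 3 * i:59 + 3 * i]) for i in range(256)]
    if hdr["major_ver"] == 2 and len(buf) >= 832:   # MSS2-only fields
        hdr["threading_split"], hdr["num_syms_escape"] = \
            struct.unpack(">2i", buf[824:832])
    if not (1 <= hdr["coded_w"] <= 4096 and 1 <= hdr["coded_h"] <= 4096):
        raise ValueError("coded dimensions out of range 1..4096")
    return hdr
```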
 
==== Frame format ====
 
Codec uses arithmetic decoders for all operations and adaptive models. All the code for them is suspiciously similar to the one in the [http://www.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf 1987 paper by Witten, Neal and Cleary].
 
Codec uses delta compression and can change top palette entries with every intra frame:
 
  is_inter = coder->decode_bit();
  if (!is_inter) {
      if (nFreeColors) {
          num_entries = coder->decode_number(nFreeColors + 1);
          for (i = 0; i < num_entries; i++) {
              pal[(256 - nFreeColors) + i].R = coder->decode_bits(8);
              pal[(256 - nFreeColors) + i].G = coder->decode_bits(8);
              pal[(256 - nFreeColors) + i].B = coder->decode_bits(8);
          }
      }
      recursive_decode_intra(0, 0, width, height);
  } else {
      recursive_decode_inter(0, 0, width, height);
  }
 
Frame coding is done by recursively partitioning picture horizontally or vertically and coding partitions in some way:
 
  recursive_decode_intra(x, y, width, height) {
      mode = coder->decode_model(split_mode_model);
      switch (mode) {
      case 0:
          pivot = decode_pivot(height);
          recursive_decode_intra(x, y, width, pivot);
          recursive_decode_intra(x, y + pivot, width, height - pivot);
          break;
      case 1:
          pivot = decode_pivot(width);
          recursive_decode_intra(x, y, pivot, height);
          recursive_decode_intra(x + pivot, y, width - pivot, height);
          break;
      case 2:
          mode = coder->decode_model(intra_decode_model);
          if (!mode) {
              pix = decode_pixel();
              fill_rect(x, y, width, height, pix);
          } else {
              decode_area(x, y, width, height);
          }
          break;
      }
  }
 
  recursive_decode_inter(x, y, width, height) {
      mode = coder->decode_model(split_mode_model);
      switch (mode) {
      case 0:
          pivot = decode_pivot(height);
          recursive_decode_inter(x, y, width, pivot);
          recursive_decode_inter(x, y + pivot, width, height - pivot);
          break;
      case 1:
          pivot = decode_pivot(width);
          recursive_decode_inter(x, y, pivot, height);
          recursive_decode_inter(x + pivot, y, width - pivot, height);
          break;
      case 2:
          mode = coder->decode_model(inter_decode_model);
          if (!mode) {
              pix = decode_pixel();
              // same meaning as mask values, see below
              // for MSS2, pix == 4 means a motion compensated rectangle
              if (pix != 0xFF) {
                  copy_rect(x, y, width, height, pix);
              } else {
                  mode = coder->decode_model(intra_decode_model);
                  if (!mode) {
                      pix = decode_pixel();
                      fill_rect(x, y, width, height, pix);
                  } else {
                      decode_area(x, y, width, height);
                  }
              }
          } else {
              // this decodes the change mask first and then
              // checks it: if a mask value is 0xFF then decode a pixel,
              // otherwise copy it from the previous frame
              mask = decode_area(x, y, width, height);
              decode_area_masked(x, y, width, height);
          }
          break;
      }
  }
 
Mask values:
{| border="1"
! Type !! Value in MSS1 !! Value in MSS2
|-
| copy from same location || 0x80 || 0x02
|-
| copy motion compensated || N/A || 0x04
|-
| decode new || 0xFF || 0x01
|-
|}
 
In decode_area_masked(), decode new pixels as described in "Context modeller" even if the neighboring pixels were copied.
 
==== Other decoding routines ====
 
Decoding pivot point:
 
  decode_pivot(ref_value) {
      edge  = coder->decode_model(edge_model);
      coord = coder->decode_model(pivot_model) + 1;
      if (coord > 2)
          coord = coder->decode_number((ref_value + 1) / 2 - 2) + 3;
      if (edge)
          return ref_value - coord;
      else
          return coord;
  }
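
The mapping above can be sketched in Python as follows (`ReplayCoder` is a stub that just replays pre-decoded model values, purely for illustration of how `edge` and `coord` combine into a pivot):

```python
def decode_pivot(coder, ref_value):
    # Mirrors the pseudocode above: a short pivot (1 or 2) comes from
    # pivot_model; larger values are escape-coded as an offset from 3.
    edge = coder.decode_model("edge_model")
    coord = coder.decode_model("pivot_model") + 1
    if coord > 2:
        coord = coder.decode_number((ref_value + 1) // 2 - 2) + 3
    return ref_value - coord if edge else coord

class ReplayCoder:
    """Stub that replays a fixed list of 'decoded' values."""
    def __init__(self, values):
        self.values = list(values)
    def decode_model(self, name):
        return self.values.pop(0)
    def decode_number(self, n):
        v = self.values.pop(0)
        assert 0 <= v < n
        return v
```

With `edge` set, the pivot is counted back from the far edge, which is why both halves of a split can be coded cheaply.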
 
Decoding pixels is not that trivial: the codec uses the neighbouring pixels (left, top-left, top, top-right)
together with a move-to-front cache of recently decoded values and several models to restore each pixel.
 
==== Models ====
 
Models are reinitialised at every intra frame. Initially all symbols have weight = 1.
With every update the weight is increased by one, and when the weights grow too large they get rescaled.
 
Rescaling weights is performed when total cumulative probability is bigger than threshold, which can be static or adaptive.
Static threshold is calculated as <code>num_symbols * symbol_threshold</code>, adaptive one is recalculated every time as
<code>min(0x3FFF, ((2 * weights[num_symbols] - 1) / 2 + 4 * cumulative_probability[0]) / (2 * weights[num_symbols] - 1))</code>.
 
Scaling weights is simply <code>weight' = (weight + 1) >> 1</code>.
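
As a toy illustration of the static-threshold case (the `update_model` helper is invented here; the adaptive-threshold variant would substitute the formula above for the fixed product):

```python
def update_model(weights, sym, symbol_threshold):
    # Increment the symbol's weight; rescale all weights when the total
    # exceeds the static threshold num_symbols * symbol_threshold,
    # using weight' = (weight + 1) >> 1 as described above.
    weights[sym] += 1
    if sum(weights) > len(weights) * symbol_threshold:
        weights[:] = [(w + 1) >> 1 for w in weights]
    return weights
```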
 
Main models:
 
{| border="1"
! Name !! Purpose !! Number of symbols !! Threshold per symbol
|-
| intra_decode_model || region decoding mode for intra (solid fill or not) || 2 || adaptive
|-
| inter_decode_model || region decoding mode for inter (full region decoder or masked) || 2 || adaptive
|-
| split_mode_model || region split mode (horizontal/vertical/none) || 3 || 50
|-
| edge_model || signals from which edge pivot point is decoded  || 2 || 50
|-
| pivot_model || rough coordinates for pivot point (1, 2, escape) || 3 || 15
|-
|}
 
==== Context modeller ====
 
Context modeller is used for modelling pixel context by using its neighbours and caching last decoded values.
There are two context modellers used by decoder — one for decoding picture data (in both kinds of frames),
another one is used solely for decoding mask in interframes.
 
Modeller components (values in {brackets} are for MSS2):
 
* last decoded pixels cache (8 for picture data, 2 {3} for mask), initially filled with 0, 1, 2... and reset to that every intraframe
* primary model for decoding pixel (<code>(cache_size + 1)</code> symbols, <code>15</code> symbol threshold)
* escape model for decoding pixel value not in cache (<code>256</code> {<code>numSymbolsEscapeModel</code>} symbols, <code>50</code> symbol threshold)
* secondary models for context-modelled pixels, four layers of models for different combinations of non-equal neighbours:
** first layer - 1x4 models (<code>2</code> symbols, adaptive symbol threshold)
** second layer - 7x4 models (<code>3</code> symbols, <code>15</code> symbol threshold)
** third layer - 6x4 models (<code>4</code> symbols, <code>15</code> symbol threshold)
** fourth layer - 1x4 models (<code>5</code> symbols, <code>15</code> symbol threshold)
 
Decoding top left pixel (for it no neighbourhood is provided):
 
  val = coder->decode_model(modeller->primary_model);
  if (val < modeller->cache_size) {
      pix = modeller->cache[val];
      if pix is found in the provided neighbourhood, insert it to the first position in the cache
        (it doesn't matter if it's already in the cache)
      else move it to the first position shifting other values by one
  } else {
      pix = coder->decode_model(modeller->escape_model);
      if pix is found in cache, move it to the first position shifting other values by one
      else just insert it at the first position in cache
  }
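
The cache update rules can be sketched in Python as follows (an illustrative `cache_promote` helper; the real cache also keeps extra entries, as described under "Last decoded pixels cache use" below):

```python
def cache_promote(cache, pix, in_neighbourhood, size):
    """Move-to-front update for the last-decoded-pixels cache.

    Rules from the pseudocode above: if the pixel also occurs in the
    neighbourhood, insert it at the front even if it is already cached
    (duplicates are allowed); otherwise move it to the front, or insert
    it if absent, shifting the other entries down by one.
    """
    if not in_neighbourhood and pix in cache:
        cache.remove(pix)
    cache.insert(0, pix)
    del cache[size:]          # drop anything pushed past the cache size
    return cache
```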
 
Decoding other pixels:
 
  get neighbourhood (left, top, top-right and top-left pixels)
  select secondary model depending on neighbourhood
  if decoded value is less than number of neighbours, pick corresponding neighbour
  else decode pixel like top left one but provide neighbourhood for the reference this time
 
Determine neighborhood as:
If top pixel isn't available (first row): top = top-right = top-left = left
    (left is available, as it was decoded above)
If right pixel isn't available (last column): top-right = top
If left pixel isn't available (first column): left = top-left = top
If neither right nor left are available (single column):  top-right = top-left = left = top
note: pixels outside the current area aren't considered available
 
Determine secondary model as:
 
layer = number of distinct neighbour values (1 if all equal, 4 if all different, 2 for patterns
like ABBB, AABB or ABBA, 3 for patterns like ABCC, ABBC or ABCA)
sublayer = identifies which neighbours are equal to each other. For example:
if layer == 1:    # all equal
    sublayer = 0
if layer == 2:    # 2-2 or 3-1
    if top == topLeft:
        if topRight == topLeft:
            sublayer = 3
        elsif left == topLeft:
            sublayer = 2
        else:
            sublayer = 4
    elsif topRight == topLeft:
        if left == topLeft:
            sublayer = 1
        else:
            sublayer = 5
    else
        if left == topLeft:
            sublayer = 6
        else:
            sublayer = 0
if layer == 3:    # 2-1-1
    if top == topLeft:
        sublayer = 0
    elsif topRight == topLeft:
        sublayer = 1
    elsif left == topLeft:
        sublayer = 2
    elsif topRight == top:
        sublayer = 3
    elsif left == top:
        sublayer = 4
    else
        sublayer = 5
if layer == 4:    # all different
    sublayer = 0
subsublayer = 0
if left-left pixel is available (column >= 2) and its value is equal to the left pixel:
    subsublayer += 1
if top-top pixel is available (row >= 2) and its value is equal to the top pixel:
    subsublayer += 2
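
The layer/sublayer selection above, transcribed to Python (an illustrative helper; subsublayer is omitted since it needs the left-left and top-top pixels):

```python
def select_model(top_left, top, top_right, left):
    # layer = number of distinct neighbour values
    layer = len({top_left, top, top_right, left})
    if layer == 1:            # all equal
        sublayer = 0
    elif layer == 2:          # 2-2 or 3-1
        if top == top_left:
            if top_right == top_left:
                sublayer = 3
            elif left == top_left:
                sublayer = 2
            else:
                sublayer = 4
        elif top_right == top_left:
            sublayer = 1 if left == top_left else 5
        else:
            sublayer = 6 if left == top_left else 0
    elif layer == 3:          # 2-1-1
        if top == top_left:
            sublayer = 0
        elif top_right == top_left:
            sublayer = 1
        elif left == top_left:
            sublayer = 2
        elif top_right == top:
            sublayer = 3
        elif left == top:
            sublayer = 4
        else:
            sublayer = 5
    else:                     # all different
        sublayer = 0
    return layer, sublayer
```

This matches the worked example below: neighbours 140, 134, 140, 136 give layer 3, sublayer 1.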
 
Last decoded pixels cache use:
 
This cache internally has 4 more entries (12 total for picture data, 6 {7} for mask). The extra entries are to skip neighboring colors which we already know aren't the ones we're looking for.
 
Example:
 
Get the neighbourhood pixels, in this order: topLeft = 140, top = 134, topRight = 140, left = 136
Remove duplicates: [140, 134, 136]
We have 3 unique colors, therefore we use the third layer in the secondary model.
Since topRight == topLeft, we use sublayer 1. The subsublayer doesn't matter for the sake of this example.
 
Now we fetch a value x using the corresponding secondary model:
 
if x == 0, output 140
 
if x == 1, output 134
 
if x == 2, output 136
 
if x == 3, the secondary model can't code the color. Fall back to the primary model to try and decode it from the cache.
 
Assume the cache contents are [25, 140, 136, 134, 50, 23, ...]
 
If the primary model returned 0, output 25
 
If it returned 1, since we know the color isn't 134, 136, or 140, output 50
 
If it returned 2, output 23, and so on, up to 8, which means the color isn't in the cache either and we have to fall back to the escape model. In this example the last cache entry was unreachable. For the top-left pixel there are zero neighbors, so the last 4 entries are unreachable.
 
=== MSS2 (Windows Media Video 9 Screen codec) details ===
 
In MSS2, the frame header, RLE modes, palette updates, motion vector coding
and WMV9 data are not arithmetic coded, whereas the rectangle info data and
paletted recursive subdivision modes are. Each block is byte-aligned and
consumes an integral number of bytes. The coders have to be re-initialized
between blocks, even if they are of the same type.
 
alignByteStream() {
    < Align to byte boundary discarding any partially read bytes.
      When using VLC decoding, use
    (get_bits_count() + 7) >> 3;
      to determine the number of consumed bytes. For the AC
      portions, use
    ac2_get_consumed_bytes(); >
}
 
RLE555Decode(x, y, w, h) {
    //outputs RGB555
   
    if (!isIntra) {
        x = get_bits(12);
        w = get_bits(12) - x + 1;
        y = get_bits(12);
        h = get_bits(12) - y + 1;
    }
    for each pixel in the (x, y, w, h) rectangle:
    read a byte, and switch:
        0..127, 134..255:
            read another byte b and put the color ((byte << 8) | b)
            #note that if the byte was >133, this will generate colors >32767
        128: copy from prev line
        129: copy from prev frame (leave unchanged)
        130..133:
            r = 0
            repeat (value-130) times:
                r = (r << 8) + (read another byte)
            r += 1
            repeat the previous decoded symbol r times
            #the previous symbol can be a color or a copy instruction
}
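
The inner loop can be transcribed as a toy Python function (no rectangle handling or previous-line/frame copies are actually performed, the stream is just classified into operations; `rle555_symbols` is an illustrative name):

```python
def rle555_symbols(data):
    """Classify a raw RLE555 byte stream into a list of operations:
    ('px', color), ('prev_line',) or ('prev_frame',). Runs (130..133)
    are expanded by repeating the previously decoded symbol."""
    ops, i, prev = [], 0, None
    while i < len(data):
        b = data[i]; i += 1
        if b == 128:
            prev = ('prev_line',)
        elif b == 129:
            prev = ('prev_frame',)
        elif 130 <= b <= 133:
            r = 0
            for _ in range(b - 130):      # read (b - 130) length bytes
                r = (r << 8) + data[i]; i += 1
            for _ in range(r + 1):        # repeat the previous symbol
                ops.append(prev)
            continue
        else:                             # 0..127 and 134..255: a color
            prev = ('px', (b << 8) | data[i]); i += 1
        ops.append(prev)
    return ops
```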
 
RLEDecode(x, y, w, h) {
    if (!isIntra) {
        x = get_bits(12);
        y = get_bits(12);
        w = get_bits(12) + 1;
        h = get_bits(12) + 1;
    }
    // This mode uses a single tree of VLC codes.
    // It is built using the code lengths, which are read as follows:
    usedCodes = 0
    currentCodeLength = 1
    loop:
        remainingCodes = (1 << currentCodeLength) - usedCodes
        codesOfThisLength = get_bits(ceil_log2(remainingCodes + 1))
        if codesOfThisLength == remainingCodes:
            we're done, all of the remaining codes have the current length
        otherwise, for each codesOfThisLength:
            x = get 8 bits
            if x < 190:
                addcode(x, currentCodeLength)
            elsif x < 204 - isIntra:
                addcode(get_bits1() + (x << 1) - 190, currentCodeLength)
            else:
                addcode(x + 14 - isIntra, currentCodeLength)
        usedCodes = (usedCodes + codesOfThisLength) << 1
        currentCodeLength++
    // main decoding loop
    for each pixel in the (x, y, w, h) rectangle:
    read a VLC code using the tree generated above, and switch:
        0-255: put that color
        256-267:
            q = value - 256
            if q == 11:
                q = get_bits(4) + 10
            if !q: r=1
            else: r = get_bits(q) + 1
            while q--:
                r += 1 << q
            repeat the previous symbol r times
        268: copy from prev line
        269: copy from prev frame (leave unchanged)
    alignByteStream();
}
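
The run-length branch (codes 256..267) maps to repeat counts as sketched here (`run_length` is an illustrative helper; the `q` extra bits are passed in as an argument, and `q` may already have been extended through the `q == 11` escape):

```python
def run_length(q, extra_bits):
    """Compute the repeat count for run code 256+q, following the
    pseudocode above: r starts at extra_bits + 1, then the remaining
    powers of two are folded in, so runs of length 2^q .. 2^(q+1)-1
    are representable (q = 0 always means a run of 1)."""
    if q == 0:
        return 1
    r = extra_bits + 1
    while q:
        q -= 1
        r += 1 << q
    return r
```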
 
SubDivDecode(contextSet, x, y, w, h) {
    < load the corresponding contextSet, which includes
      all the 5 main models plus both color contexts >
    if (isIntra) {
        reset_contextSet();
        recursive_decode_intra(x, y, w, h); // same as MSS1, with the differences outlined above
    } else
        recursive_decode_inter(x, y, w, h); // same as MSS1, with the differences outlined above
    alignByteStream();
}
 
The WMV9 rectangle coordinates are read like a tree, but
we're only interested in the list that will be placed in root_rect.children.
There shouldn't be any grandchildren. levelIsPal determines whether
the root rect is a paletted subdivision node and its children are WMV9 nodes
or vice versa. In practice it should always be 1. There's a limit of 20 WMV9
rectangles per frame.
 
WMV9RectRecursive(rect, offset, depth, levelIsPal, flagsRead) {
    int n = 0;
    while (ac2_get_bit()) {
        if (!n)
            new_rect.x = ac2_get_number(rect.w);
        else
            new_rect.x = ac2_get_number(rect.w - rect.children[n-1].x) +
                rect.children[n-1].x;
        new_rect.y = ac2_get_number(rect.h);
        new_rect.w = ac2_get_number(rect.w - new_rect.x);
        new_rect.h = ac2_get_number(rect.h - new_rect.y);
        new_rect.children = [];
        rect.children += new_rect; // append to list
        n++;
    }
    if (!levelIsPal && !flagsRead) {
        if (offset == 0) {
            if (maskWMV9 = ac2_get_bit())
                maskWMV9color = ac2_get_number(256);
        }
        WMV9RectIsCoded[offset] = ac2_get_number(2);
        flagsRead = 1;
    }
    n = 0;
    foreach crect in rect.children {
        WMV9RectRecursive(crect, n, depth + 1, !levelIsPal, flagsRead);
        n++;
    }
}
 
WMV9RectInfoDecode() {
    topLevelIsPal = ac2_get_bit();
   
    root_rect.x = 0;
    root_rect.y = 0;
    root_rect.w = codedWidth;
    root_rect.h = codedHeight;
    root_rect.children = [];
    WMV9RectRecursive(root_rect, 0, 0, topLevelIsPal, 0);
    alignByteStream();
}
 
decodeWMV9Rect(rect) {
    < Initialize a WMV9 decoder. The sequence header, which is usually
    stored in the extradata for WM9 files, is implicit here, as follows:
    codec tag = MKTAG('W', 'M', 'V', '9')
    codedWidth/Height = rect.w/h (might be odd, round up if necessary)
   
    profile = PROFILE_MAIN
    res_y411 = 0
    res_sprite = 0
    frmrtq_postproc = 7
    bitrtq_postproc = 31
    s.loop_filter = 1
    res_x8 = 0
    multires = 0
    res_fasttx = 1
    fastuvmc = 0
    extended_mv = 0
    dquant = 1
    vstransform = 1
    res_transtab = 0
    overlap = 0
    s.resync_marker = 0
    rangered = 0
    s.max_b_frames = 0
    quantizer_mode = 0
    finterpflag = 0
    res_rtm_flag = 1
    Then read the frame header (ff_vc1_parse_frame_header()) and
    frame data (vc1_decode_i_blocks()).
   
    If maskWMV9 is set, blit the resulting picture only over the pixels
    that had maskWMV9color. >
}
 
==== Main frame decoding ====
 
===== Header =====
 
    isIntra = get_bits1();
    if (isIntra)
        skip_bits(7);
    hasWMV9 = get_bits1();
    hasMotionVector = isIntra ? 0 : get_bits1();
    isRLE = get_bits1();
    isRLE555 = isRLE && get_bits1();
    if (threadingSplit > 0)
        splitPosition = threadingSplit;
    else if (threadingSplit == -1) {
        if (get_bits1()) {
            if (get_bits1()) {
                if (get_bits1())
                    splitPosition = get_bits(16);
                else
                    splitPosition = get_bits(12);
            } else splitPosition = get_bits(8) << 4;
        } else {
            splitPosition = isIntra ? codedHeight / 2 : oldSplitPosition;
        }
        oldSplitPosition = splitPosition;
    }
    if (threadingSplit) {
        rectangle1 = {0, 0, codedWidth, splitPosition};
        rectangle2 = {0, splitPosition, codedWidth, codedHeight - splitPosition};
    } else {
        rectangle1 = {0, 0, codedWidth, codedHeight};
    }
    alignByteStream();
 
===== Frame =====
 
    if (isRLE555) {
        RLE555Decode(rectangle1);
        if (threadingSplit)
            RLE555Decode(rectangle2);
        return;
    }
    if (hasWMV9) WMV9RectInfoDecode();
    if (isIntra)
        getPalette(); /* similar to MSS1, but here both the number of
                        changed colors and the colors themselves are
                        directly read as bytes */
    else {
        if (hasMotionVector) {
            mvX = get_bits(16) - codedWidth;
            mvY = get_bits(16) - codedHeight;
        } else {
            mvX = mvY = 0;
        }
    }
    if (isRLE) {
        RLEDecode(rectangle1);
        if (threadingSplit)
            RLEDecode(rectangle2);
    } else {
        SubDivDecode(contextSet1, rectangle1);
        if (threadingSplit)
            SubDivDecode(contextSet2, rectangle2);
    }
    if (hasWMV9) {
        int i = 0;
        foreach rect in root_rect.children {
            if (WMV9RectIsCoded[i]) {
                WMV9codedFrameSize = get_le24();
                decodeWMV9Rect(rect);
            } else {
                fillGrey(rect); // fill with 128,128,128
            }
            i++;
        }
    }
 
==== V2 Arithmetic Coder ====
 
void ac2_init(AC2 *c, GetByteContext *gb)
{
    c->low  = 0;
    c->high  = 0xFFFFFF;
    c->value = bytestream2_get_be24(gb);
    c->gb    = gb;
}
 
void ac2_renorm(AC2 *c)
{
    while ((c->high >> 15) - (c->low >> 15) < 2) {
        if ((c->low ^ c->high) & 0x10000) {
            c->high  ^= 0x8000;
            c->value ^= 0x8000;
            c->low  ^= 0x8000;
        }
        c->high  = c->high  << 8 & 0xFFFFFF | 0xFF;
        c->value = c->value << 8 & 0xFFFFFF | bytestream2_get_byte(c->gb);
        c->low  = c->low  << 8 & 0xFFFFFF;
    }
}
 
int ac2_get_bit(AC2 *c);
// Identical to its MSS1 counterpart, except it renormalizes using ac2_renorm()
 
/* decodes a number dividing the range into two linear pieces: one
    whose values have probability 1 and another whose values have
    probability 2, so that it maps to n values ( range/2 < n <= range ) */
int ac2_get_scaled_value(int value, int n, int range) {
    split = (n << 1) - range;
    if (value > split)
      return split + (value - split >> 1);
    else
      return value;
}
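
A direct Python transcription of ac2_get_scaled_value shows the piecewise behaviour: values up to `split` map 1:1, and every output above `split` covers two input values, so n outputs fit into the range whenever range/2 < n <= range:

```python
def ac2_get_scaled_value(value, n, range_):
    # Piecewise-linear mapping from `range_` input values down to
    # n output values, as in the C code above.
    split = (n << 1) - range_
    if value > split:
        return split + ((value - split) >> 1)
    return value
```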
 
/* rescales the interval considering the piecewise linear division */
void ac2_rescale_interval(AC2 *c, int range,
                                int low, int high, int n) {
    split = (n << 1) - range;
   
    if (high > split)
        c->high = split + (high - split << 1);
    else
        c->high = high;
    c->high += c->low;
    if (low > split)
        c->low += split + (low - split << 1);
    else
        c->low += low;
}
 
int ac2_get_number(AC2 *c, int n)
{
    int range = c->high - c->low + 1;
    int scale = av_log2(range) - av_log2(n);
    int val;
   
    if ( n << scale > range )
        scale--;
   
    n <<= scale;
   
    val = ac2_get_scaled_value(c->value - c->low, n, range) >> scale;
   
    ac2_rescale_interval(c, range, val << scale, (val + 1) << scale, n);
   
    ac2_renorm(c);
    return val;
}
 
int ac2_get_prob(AC2 *c, int *probs)
{
    int range = c->high - c->low + 1, n = *probs;
    int scale = av_log2(range) - av_log2(n);
    int i = 0, val;
    if ( n << scale > range )
        scale--;
    n <<= scale;
    val = ac2_get_scaled_value(c->value - c->low, n, range) >> scale;
    while (probs[++i] > val) ;
    ac2_rescale_interval(c, range, probs[i] << scale, probs[i-1] << scale, n);
    return i;
}
 
int ac2_get_model_sym(AC2 *c, Model *m);
// Identical to its MSS1 counterpart, except it gets the symbol index
// using ac2_get_prob() and renormalizes using ac2_renorm()
 
int ac2_get_consumed_bytes(AC2 *c)
{
    int diff = (c->high >> 16) - (c->low >> 16);
    int bp  = bytestream2_tell(c->gb) - 3 << 3;
    int bits = 1;
    while (!(diff & 0x80)) {
        bits++;
        diff <<= 1;
    }
    return (bits + bp + 7 >> 3) + ((c->low >> 16) + 1 == c->high >> 16);
}
 
=== MSA1 Details ===
 
Internally it calls itself MS ATC Screen codec and MSS3.
 
The codec has several coding possibilities: fill area, decode area with prediction (somewhat like MSS1), decode with Haar transform, decode with 8x8 DCT.
 
==== Frame header ====
 
   0- 3 frame type (0x301 - intra, 0x300 - inter)
   4    should be always 1?
   5    should be always 0
   6- 9 should be 0x380
  10-11 probably x offset for the frame
  12-13 probably y offset for the frame
  14-15 probably frame width
  16-17 probably frame height
  18-21 ignored
  22    quality (used in image decoders, 0-100)
  23-26 seems to be always 1
 
The rest is range-coded data.
 
==== Codec organisation ====
 
MSA1 codes YUV 4:2:0 planes in 16x16 macroblocks.
 
  for (mb_y = 0; mb_y < mb_height; mb_y++) {
      for (mb_x = 0; mb_x < mb_width; mb_x++) {
          btype = block_info[0]->get_type(acoder);
          coders[0][btype]->decode_block(acoder, Y, mb_x, mb_y);
          btype = block_info[1]->get_type(acoder);
          coders[1][btype]->decode_block(acoder, U, mb_x, mb_y);
          btype = block_info[2]->get_type(acoder);
          coders[2][btype]->decode_block(acoder, V, mb_x, mb_y);
      }
  }
 
Possible coders (starting from zero):
* solid fill (aka "smooth block")
* predicted image (aka "text block")
* DCT-coded block (aka "image block")
* Haar wavelet block (aka "hybrid block")
* skipped block
 
===== Range coder =====
 
 
Normalisation:
 
    for (;;) {
        c->range <<= 8;
        c->low  <<= 8;
        if (c->src < c->src_end) {
            c->low |= *c->src++;
        } else if (!c->low) {
            return error;
        }
        if (c->range >= 0x01000000)
            return 0;
    }
 
Reading bits:
 
    c->range >>= nbits;
    val = c->low / c->range;
    c->low -= c->range * val;
   
    if (c->range < RAC_BOTTOM)
        rac_normalise(c);
   
    return val;
 
Obtaining symbol from model:
 
    prob       = 0;
    prob2      = c->range;
    c->range >>= MODEL_SCALE;
    val        = 0;
    end        = model->num_syms >> 1;
    end2       = model->num_syms;
    do {
        helper = model->freqs[end] * c->range;
        if (helper <= c->low) {
            val  = end;
            prob  = helper;
        } else {
            end2  = end;
            prob2 = helper;
        }
        end = (end2 + val) >> 1;
    } while (end != val);
    c->low  -= prob;
    c->range = prob2 - prob;
    if (c->range < 0x01000000)
        rac_normalise(c);
   
    model_update(model, val);
   
    return val;
 
===== Model =====
 
 
Models used by coders:
 
{| border="1"
! Coder !! Designation !! Number!! Number of symbols
|-
| Block type || block type || 5 || 5
|-
| Fill block || number of bits for fill value || 1 || 12
|-
| Image block || cache size || 1 || 3
|-
| Image block || cache entry || 1 || 256
|-
| Image block || escape value || 1 || 256
|-
| Image block || cache value || 125 || 5
|-
| DCT block || coded AC element || 1 || 256
|-
| DCT block || DC length || 1 || 12
|-
| Haar block || some coefficients' length || 1 || 12
|-
| Haar block || other coefficients || 1 || 256
|-
|}
 
The main difference from plain models (like in MSS1) is that they update frequencies only after some iterations
and the frequencies are not simple cumulative weights:
 
 
  void model_update(Model *m, int val)
  {
      m->weights[val]++;
      m->times_till_update--;
      if (m->times_till_update)
          return;
      m->total_weight += m->update_value;
      if (m->total_weight > ...)
          // rescale weights
     
      scale = 0x80000000u / m->total_weight;
      for (i = 0; i < m->num_symbols; i++)
          m->freqs[i] = m->weights[i] * scale >> (31 - m->scale);
      m->update_value = (m->update_value * 5) >> 2;
      if (m->update_value > 8 * m->num_symbols + 48)
          m->update_value = 8 * m->num_symbols + 48;
      m->times_till_update = m->update_value;
  }
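
The update interval grows geometrically and saturates, which can be sketched as follows (an illustrative helper; the initial update_value and the rescale threshold are not documented above, so the starting value in the example is arbitrary):

```python
def next_update_value(uv, num_symbols):
    # update_value' = min(update_value * 5 / 4, 8 * num_symbols + 48),
    # matching the tail of model_update() above.
    uv = (uv * 5) >> 2
    return min(uv, 8 * num_symbols + 48)
```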
 
There is also a special case for the sign model - it has only two values and uses <code>scale=13</code> instead of the <code>15</code> used by other models.
 
===== Block type decoding =====
 
The block type coder uses one of five models to decode the block type; which model to use is selected by the previously decoded block type (or 4 for the first block).
 
===== Solid fill =====
 
This coder decodes a difference from the last fill value, uses it to restore the new fill value and fills the region.
The difference is coded exactly like the DC coefficient in a DCT block.
 
===== Predicted image =====
 
This coder decodes a cache index value with one of 125 cache models (<code>index = top_left_cache_val * 25 + top_cache_val * 5 + left_cache_val</code>). If the decoded cache value is 5 then a pixel is decoded from a model, otherwise it is retrieved from that cache position.
 
===== DCT-coded image =====
 
DC coefficient coding:
 
  code_len = get_model(acoder, dc_model);
  if (code_len > 0) {
      sign = get_bit(acoder);
      val  = (1 << (code_len - 1)) + get_bits(acoder, code_len - 1);
      if (sign)
          val = -val;
  } else {
      val  = 0;
  }
 
Then decoded value is added to the predicted DC value.
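
The length-prefixed magnitude scheme can be sketched as follows (an illustrative `dc_value` helper taking the already-decoded fields as arguments):

```python
def dc_value(code_len, sign, extra_bits):
    # A code length of 0 means value 0; otherwise the magnitude is
    # (1 << (code_len - 1)) + extra_bits, negated when the sign is set,
    # as in the pseudocode above.
    if code_len == 0:
        return 0
    val = (1 << (code_len - 1)) + extra_bits
    return -val if sign else val
```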
 
The remaining coefficients (the sign is coded with a special binary model):
 
  val  = decode_model(acoder, coef_model);
  skip = (val == 0xF0) ? 16 : (val >> 4);
  coef = val & 0xF;
  if (!coef && val != 0xF0) break; // last coded coefficient
  sign = get_binary_model(acoder, sign_model);
  coef = (1 << (coef - 1)) + get_bits(acoder, coef - 1);
  if (sign)
      coef = -coef;
 
Scan order is normal zigzag. There are two quantisation matrices (for luma and for chroma).
 
  if (quality >= 50)
      q = 200 - 2 * quality;
  else
      q = 5000 / quality;
  for (i = 0; i < 64; i++)
      quant[i] = 65536 / MAX((qmatrix[zigzag[i]] * q + 50) / 100, 1);
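
The quantiser derivation mirrors the familiar JPEG-style quality scaling; as a sketch (illustrative `make_quant` helper with a caller-supplied base matrix, reciprocals stored as 16.16 fixed point as above):

```python
def make_quant(qmatrix, zigzag, quality):
    # q follows the JPEG-style quality curve (quality in 1..100),
    # then each reciprocal quantiser is stored as 65536 / step.
    if quality >= 50:
        q = 200 - 2 * quality
    else:
        q = 5000 // quality
    return [65536 // max((qmatrix[zigzag[i]] * q + 50) // 100, 1)
            for i in range(len(zigzag))]
```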
 
===== Wavelet-coded image =====
 
This coder codes quantised coefficients and restores them as:
 
  A1 a2 ... B1 b2 ...       ((A1 + B1) + (C1 + D1)) ((A1 - B1) + (C1 - D1)) ...
  ...       ...         ->  ((A1 + B1) - (C1 + D1)) ((A1 - B1) - (C1 - D1)) ...
  C1 c2 ... D1 d2 ...       ...                     ...
  ...       ...             ...                     ...
 
Coefficients are coded either like DC coefficients or with a model.
The quantiser is calculated as <code>17 - 7 * quality / 50</code>.
 
[[Category:Video Codecs]]
[[Category:Screen Capture Video Codecs]]
[[Category:Incomplete Video Codecs]]

Latest revision as of 09:48, 1 July 2012

Also known as Windows Media Screen Codec.

MSA1 is created by Live Meeting 2007


Some details about format

Both MSS1 and MSS2 are quite close (thus are decoded with single decoder). They employ arithmetic coding - real one, with probability coding. This coding is used with several adaptive models, which look a bit like PPM.

MSS1 details

MSS1 (aka Windows Media Screen V7 codec) compresses only palettised images.

Extradata format

(for some reason, data in .wmv is stored in big-endian order)

 4- 7  header length
 8-11  major version (1 for MSS1, 2 for MSS2)
12-15  minor version
16-19  display width
20-23  display height
24-27  coded width
28-31  coded height
32-35  frames per second (float)
36-39  bitrate
40-43  max lead time (float)
44-47  max lag time (float)
48-51  max seek time (float)
52-55  nFreeColors
56-823 palette (256 RGB triplets)

Only for MSS2:

824-827 threadingSplit (domain: -1, 0, 1..codedH)
828-831 numSymbolsEscapeModel (domain: 0..256)

Both width and height must be in the range 1..4096.

Frame format

Codec uses arithmetic decoders for all operations and adaptive models. All code for them is suspiciously similar to the one in | 1987 paper by Witten, Neal and Cleary.

Codec uses delta compression and can change top palette entries with every intra frame:

 is_inter = coder->decode_bit();
 if (!is_inter) {
     if (nFreeColors) {
         num_entries = coder->decode_number(nFreeColors + 1);
         for (i = 0; i < num_entries; i++) {
             pal[(256 - nFreeColors) + i].R = coder->decode_bits(8);
             pal[(256 - nFreeColors) + i].G = coder->decode_bits(8);
             pal[(256 - nFreeColors) + i].B = coder->decode_bits(8);
         }
     }
     recursive_decode_intra(0, 0, width, height);
 } else {
     recursive_decode_inter(0, 0, width, height);
 }

Frame coding is done by recursively partitioning picture horizontally or vertically and coding partitions in some way:

 recursive_decode_intra(x, y, width, height) {
     mode = coder->decode_model(split_mode_model);
     switch (mode) {
     case 0:
         pivot = decode_pivot(height);
         recursive_decode_intra(x, y, width, pivot);
         recursive_decode_intra(x, y + pivot, width, height - pivot);
         break;
     case 1:
         pivot = decode_pivot(width);
         recursive_decode_intra(x, y, pivot, height);
         recursive_decode_intra(x + pivot, y, width - pivot, height);
         break;
     case 2:
         mode = coder->decode_model(intra_decode_model);
         if (!mode) {
             pix = decode_pixel();
             fill_rect(x, y, width, height, pix);
         } else {
             decode_area(x, y, width, height);
         }
         break;
     }
 }
 
 recursive_decode_inter(x, y, width, height) {
     mode = coder->decode_model(split_mode_model);
     switch (mode) {
     case 0:
         pivot = decode_pivot(height);
         recursive_decode_inter(x, y, width, pivot);
         recursive_decode_inter(x, y + pivot, width, height - pivot);
         break;
     case 1:
         pivot = decode_pivot(width);
         recursive_decode_inter(x, y, pivot, height);
         recursive_decode_inter(x + pivot, y, width - pivot, height);
         break;
     case 2:
         mode = coder->decode_model(inter_decode_model);
         if (!mode) {
             pix = decode_pixel();
             // same meaning as mask values, see below
             // for MSS2, pix == 4 means a motion compensated rectangle
             if (pix != 0xFF) {
                 copy_rect(x, y, width, height, pix);
             } else {
                 mode = coder->decode_model(intra_decode_model);
                 if (!mode) {
                     pix = decode_pixel();
                     fill_rect(x, y, width, height, pix);
                 } else {
                     decode_area(x, y, width, height);
                 }
             }
         } else {
             // this decodes the change mask first and then
             // checks it - if the mask value is 0xFF then a new pixel
             // is decoded, otherwise it is copied from the previous frame
             mask = decode_area(x, y, width, height);
             decode_area_masked(x, y, width, height);
         }
         break;
     }
 }

Mask values:

{| class="wikitable"
! Type !! Value in MSS1 !! Value in MSS2
|-
| copy from same location || 0x80 || 0x02
|-
| copy motion compensated || N/A || 0x04
|-
| decode new || 0xFF || 0x01
|}

In decode_area_masked(), decode new pixels as described in "Context modeller" even if the neighboring pixels were copied.

==== Other decoding routines ====

Decoding pivot point:

 decode_pivot(ref_value) {
     edge  = coder->decode_model(edge_model);
     coord = coder->decode_model(pivot_model) + 1;
     if (coord > 2)
         coord = coder->decode_number((ref_value + 1) / 2 - 2) + 3;
     if (edge)
         return ref_value - coord;
     else
         return coord;
 }

Decoding pixels is not that trivial: the codec uses the neighbouring pixels (left, top-left, top, top-right) to form a cache, which is used along with a cached move-to-front queue and several models to restore the pixel.

==== Models ====

Models are reinitialised at every intra frame. Initially all symbols have weight 1; with every update the coded symbol's weight is increased by one, and when the weights grow too large they are rescaled.

Rescaling of weights is performed when the total cumulative probability exceeds a threshold, which can be static or adaptive. The static threshold is calculated as num_symbols * symbol_threshold; the adaptive one is recalculated every time as min(0x3FFF, ((2 * weights[num_symbols] - 1) / 2 + 4 * cumulative_probability[0]) / (2 * weights[num_symbols] - 1)).

Scaling weights is simply weight' = (weight + 1) >> 1.
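A minimal Python sketch of this update rule with a static threshold (the flat weight list is an assumption of this sketch; the real decoder keeps cumulative frequency tables):

```python
class AdaptiveModel:
    """Adaptive model as described above: weights start at 1, the decoded
    symbol's weight is incremented on update, and all weights are halved
    (rounding up) once the total exceeds the static threshold
    num_symbols * symbol_threshold."""
    def __init__(self, num_syms, threshold_per_sym):
        self.weights = [1] * num_syms
        self.threshold = num_syms * threshold_per_sym

    def update(self, sym):
        self.weights[sym] += 1
        if sum(self.weights) > self.threshold:
            # rescale: weight' = (weight + 1) >> 1
            self.weights = [(w + 1) >> 1 for w in self.weights]
```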

Main models:

{| class="wikitable"
! Name !! Purpose !! Number of symbols !! Threshold per symbol
|-
| intra_decode_model || region decoding mode for intra (solid fill or not) || 2 || adaptive
|-
| inter_decode_model || region decoding mode for inter (full region decode or masked) || 2 || adaptive
|-
| split_mode_model || region split mode (horizontal/vertical/none) || 3 || 50
|-
| edge_model || signals from which edge the pivot point is decoded || 2 || 50
|-
| pivot_model || rough coordinates for pivot point (1, 2, escape) || 3 || 15
|}

==== Context modeller ====

The context modeller models a pixel's context using its neighbours and caches the last decoded values. The decoder uses two context modellers: one for decoding picture data (in both kinds of frames), and another solely for decoding the mask in inter frames.

Modeller components (values in {brackets} are for MSS2):

* last decoded pixels cache (8 entries for picture data, 2 {3} for the mask), initially filled with 0, 1, 2, ... and reset to that every intra frame
* primary model for decoding a pixel ((cache_size + 1) symbols, symbol threshold 15)
* escape model for decoding a pixel value not in the cache (256 {numSymbolsEscapeModel} symbols, symbol threshold 50)
* secondary models for context-modelled pixels, four layers of models for different combinations of non-equal neighbours:
** first layer - 1x4 models (2 symbols, adaptive symbol threshold)
** second layer - 7x4 models (3 symbols, symbol threshold 15)
** third layer - 6x4 models (4 symbols, symbol threshold 15)
** fourth layer - 1x4 models (5 symbols, symbol threshold 15)

Decoding the top-left pixel (no neighbourhood is provided for it):

 val = coder->decode_model(modeller->primary_model);
 if (val < modeller->cache_size) {
     pix = modeller->cache[val];
     if pix is found in the provided neighbourhood, insert it to the first position in the cache
       (it doesn't matter if it's already in the cache)
     else move it to the first position shifting other values by one
 } else {
     pix = coder->decode_model(modeller->escape_model);
     if pix is found in cache, move it to the first position shifting other values by one
     else just insert it at the first position in cache
 }

Decoding other pixels:

 get neighbourhood (left, top, top-right and top-left pixels)
 select secondary model depending on neighbourhood
 if decoded value is less than number of neighbours, pick corresponding neighbour
 else decode pixel like top left one but provide neighbourhood for the reference this time

Determine neighborhood as:

If top pixel isn't available (first row): top = top-right = top-left = left
    (left is available, as it was decoded above)

If right pixel isn't available (last column): top-right = top

If left pixel isn't available (first column): left = top-left = top

If neither right nor left are available (single column):  top-right = top-left = left = top

note: pixels outside the current area aren't considered available
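The substitution rules above can be sketched as follows (the top-left pixel of the area, which has no neighbours at all, is handled separately as described earlier; the function signature is this sketch's own):

```python
def neighbourhood(get, x, y, x0, y0, x1):
    """Return (left, top, top_right, top_left) for pixel (x, y) inside an
    area spanning columns x0..x1 and starting at row y0, applying the
    substitution rules above. `get(x, y)` returns a decoded pixel."""
    has_left  = x > x0
    has_right = x < x1
    left = get(x - 1, y) if has_left else None
    if y == y0:
        # first row: top, top-right and top-left all become left
        return left, left, left, left
    top = get(x, y - 1)
    top_right = get(x + 1, y - 1) if has_right else top   # last column
    top_left  = get(x - 1, y - 1) if has_left else top    # first column
    if not has_left:
        left = top                                        # first column
    return left, top, top_right, top_left
```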

Determine secondary model as:

layer = number of different neighbour values (1 if all equal, 4 if all different, 2 if ABBB or AABB
or ABBA or any other such combination, 3 if ABCC or ABBC or ABCA or any other such combination)

sublayer = identify which neighbours are equal to each other. For example:

if layer == 1:    # all equal
    sublayer = 0

if layer == 2:    # 2-2 or 3-1
    if top == topLeft:
        if topRight == topLeft:
            sublayer = 3
        elsif left == topLeft:
            sublayer = 2
        else:
            sublayer = 4
    elsif topRight == topLeft:
        if left == topLeft:
            sublayer = 1
        else:
            sublayer = 5
    else
        if left == topLeft:
            sublayer = 6
        else:
            sublayer = 0

if layer == 3:    # 2-1-1
    if top == topLeft:
        sublayer = 0
    elsif topRight == topLeft:
        sublayer = 1
    elsif left == topLeft:
        sublayer = 2
    elsif topRight == top:
        sublayer = 3
    elsif left == top:
        sublayer = 4
    else
        sublayer = 5

if layer == 4:    # all different
    sublayer = 0

subsublayer = 0
if left-left pixel is available (column >= 2) and its value is equal to the left pixel:
    subsublayer += 1
if top-top pixel is available (row >= 2) and its value is equal to the top pixel:
    subsublayer += 2

Last decoded pixels cache use:

This cache internally has 4 more entries (12 total for picture data, 6 {7} for mask). The extra entries are to skip neighboring colors which we already know aren't the ones we're looking for.

Example:

Get the neighbourhood pixels, in this order: topLeft = 140, top = 134, topRight = 140, left = 136. Remove duplicates: [140, 134, 136]. We have 3 unique colors, therefore we use the third layer in the secondary model. Since topRight == topLeft, we use sublayer 1. The subsublayer doesn't matter for the sake of this example.

Now we fetch a value x using the corresponding secondary model:

if x == 0, output 140

if x == 1, output 134

if x == 2, output 136

if x == 3, the secondary model can't code the color. Fall back to the primary model to try and decode it from the cache.

Assume the cache contents are [25, 140, 136, 134, 50, 23, ...

If the primary model returned 0, output 25

If it returned 1, since we know the color isn't 134, 136, or 140, output 50

If it returned 2, output 23, and so on, until 8 which means the color isn't in the cache either and we have to fall back to the escape model. In this example, the last cache entry was unreachable. For the top-left pixel, there are zero neighbors and the last 4 entries are unreachable.
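The cache walk in this example can be sketched as a lookup that skips the colors the secondary model has already ruled out (the function name is this sketch's own):

```python
def cache_lookup(cache, index, ruled_out):
    """Return the index-th cache entry, skipping entries whose color is one
    of the unique neighbour colors already rejected by the secondary model.
    Returns None when the walk falls off the cache (escape model needed)."""
    n = 0
    for color in cache:
        if color in ruled_out:
            continue
        if n == index:
            return color
        n += 1
    return None
```

With the cache [25, 140, 136, 134, 50, 23, ...] and neighbours {140, 134, 136} from the example, indices 0, 1, 2 yield 25, 50 and 23 as described.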

=== MSS2 (Windows Media Video 9 Screen codec) details ===

In MSS2, the frame header, RLE modes, palette updates, motion vector coding and WMV9 data are not arithmetic coded, whereas the rectangle info data and the paletted recursive subdivision modes are. Each block is byte-aligned and consumes an integral number of bytes. The coders have to be re-initialised between blocks, even if they are of the same type.

alignByteStream() {
   < Align to byte boundary discarding any partially read bytes.
     When using VLC decoding, use
   get_bits_count() + 7 >> 3;
     to determine the number of consumed bytes. For the AC
     portions, use
   ac2_get_consumed_bytes(); >
}
RLE555Decode(x, y, w, h) {
   //outputs RGB555
   
   if (!isIntra) {
       x = get_bits(12);
       w = get_bits(12) - x + 1;
       y = get_bits(12);
       h = get_bits(12) - y + 1;
   }

   for each pixel in the (x, y, w, h) rectangle:
   read a byte, and switch:
       0..127, 134..255:
           read another byte b and put the color ((byte << 8) | b)
           #note that if the first byte was >133, this will generate colors >32767
       128: copy from prev line
       129: copy from prev frame (leave unchanged)
       130..133:
           r = 0
           repeat (value-130) times:
               r = (r << 8) + (read another byte)
           r += 1
           repeat the previous decoded symbol r times
           #the previous symbol can be a color or a copy instruction
}
RLEDecode(x, y, w, h) {
   if (!isIntra) {
       x = get_bits(12);
       y = get_bits(12);
       w = get_bits(12) + 1;
       h = get_bits(12) + 1;
   }

   // This mode uses a single tree of VLC codes.
   // It is built using the code lengths, which are read as follows:
   usedCodes = 0
   currentCodeLength = 1
   loop:
       remainingCodes = (1 << currentCodeLength) - usedCodes
       codesOfThisLength = get_bits(ceil_log2(remainingCodes + 1))
       if codesOfThisLength == remainingCodes:
           we're done, all of the remaining codes have the current length
       otherwise, for each codesOfThisLength:
           x = get 8 bits
           if x < 190:
               addcode(x, currentCodeLength)
           elif x < 204 - isIntra:
               addcode(get_bits1() + (x << 1) - 190, currentCodeLength)
           else:
               addcode(x + 14 - isIntra, currentCodeLength)
       usedCodes = (usedCodes + codesOfThisLength) << 1
       currentCodeLength++

   // main decoding loop
   for each pixel in the (x, y, w, h) rectangle:
   read a VLC code using the tree generated above, and switch:
       0-255: put that color
       256-267:
           q = value - 256
           if q == 11:
               q = get_bits(4) + 10
           if !q: r=1
           else: r = get_bits(q) + 1
           while q--:
               r += 1 << q
           repeat the previous symbol r times
       268: copy from prev line
       269: copy from prev frame (leave unchanged)

   alignByteStream();
}
SubDivDecode(contextSet, x, y, w, h) {
   < load the corresponding contextSet, which includes
     all the 5 main models plus both color contexts    >
   if (isIntra) {
       reset_contextSet();
       recursive_decode_intra(x, y, w, h); // same as MSS1, with the differences outlined above
   } else
       recursive_decode_inter(x, y, w, h); // same as MSS1, with the differences outlined above
   alignByteStream();
}

The WMV9 rectangle coordinates are read like a tree, but we're only interested in the list that will be placed in root_rect.children. There shouldn't be any grandchildren. levelIsPal determines whether the root rect is a paletted subdivision node and its children are WMV9 nodes, or vice versa; in practice it should always be 1. There's a limit of 20 WMV9 rectangles per frame.

WMV9RectRecursive(rect, offset, depth, levelIsPal, flagsRead) {
   int n = 0;
   while (ac2_get_bit()) {
       if (!n)
           new_rect.x = ac2_get_number(rect.w);
       else
           new_rect.x = ac2_get_number(rect.w - rect.children[n-1].x) +
               rect.children[n-1].x;
       new_rect.y = ac2_get_number(rect.h);
       new_rect.w = ac2_get_number(rect.w - new_rect.x);
       new_rect.h = ac2_get_number(rect.h - new_rect.y);
       new_rect.children = [];
       rect.children += new_rect; // append to list
       n++;
   }
   if (!levelIsPal && !flagsRead) {
       if (offset == 0) {
           if (maskWMV9 = ac2_get_bit())
               maskWMV9color = ac2_get_number(256);
       }
       WMV9RectIsCoded[offset] = ac2_get_number(2);
       flagsRead = 1;
   }
   n = 0;
   foreach crect in rect.children {
       WMV9RectRecursive(crect, n, depth + 1, !levelIsPal, flagsRead);
       n++;
   }
}
WMV9RectInfoDecode() {

   topLevelIsPal = ac2_get_bit();
   
   root_rect.x = 0;
   root_rect.y = 0;
   root_rect.w = codedWidth;
   root_rect.h = codedHeight;
   root_rect.children = [];

   WMV9RectRecursive(root_rect, 0, 0, topLevelIsPal, 0);

   alignByteStream();
}
decodeWMV9Rect(rect) {
   < Initialize a WMV9 decoder. The sequence header, which is usually
   stored in the extradata for WM9 files, is implicit here, as follows:

   codec tag = MKTAG('W', 'M', 'V', '9')
   codedWidth/Height = rect.w/h (might be odd, round up if necessary)
   
   profile = PROFILE_MAIN
   res_y411 = 0
   res_sprite = 0
   frmrtq_postproc = 7
   bitrtq_postproc = 31
   s.loop_filter = 1
   res_x8 = 0
   multires = 0
   res_fasttx = 1
   fastuvmc = 0
   extended_mv = 0
   dquant = 1
   vstransform = 1
   res_transtab = 0
   overlap = 0
   s.resync_marker = 0
   rangered = 0
   s.max_b_frames = 0
   quantizer_mode = 0
   finterpflag = 0
   res_rtm_flag = 1

   Then read the frame header (ff_vc1_parse_frame_header()) and
   frame data (vc1_decode_i_blocks()).
   
   If maskWMV9 is set, blit the resulting picture only over the pixels
   that had maskWMV9color. >
}

==== Main frame decoding ====

===== Header =====
   isIntra = get_bits1();
   if (isIntra)
       skip_bits(7);
   hasWMV9 = get_bits1();
   hasMotionVector = isIntra ? 0 : get_bits1();
   isRLE = get_bits1();
   isRLE555 = isRLE && get_bits1();
   if (threadingSplit > 0)
       splitPosition = threadingSplit;
   else if (threadingSplit == -1) {
       if (get_bits1()) {
           if (get_bits1()) {
               if (get_bits1())
                   splitPosition = get_bits(16);
               else
                   splitPosition = get_bits(12);
           } else
               splitPosition = get_bits(8) << 4;
       } else {
           splitPosition = isIntra ? codedHeight / 2 : oldSplitPosition;
       }
       oldSplitPosition = splitPosition;
   }

   if (threadingSplit) {
       rectangle1 = {0, 0, codedWidth, splitPosition};
       rectangle2 = {0, splitPosition, codedWidth, codedHeight - splitPosition};
   } else {
       rectangle1 = {0, 0, codedWidth, codedHeight};
   }

   alignByteStream();
===== Frame =====
   if (isRLE555) {
       RLE555Decode(rectangle1);
       if (threadingSplit)
           RLE555Decode(rectangle2);
       return;
   }

   if (hasWMV9) WMV9RectInfoDecode();
   if (isIntra)
       getPalette(); /* similar to MSS1, but here both the number of
                        changed colors and the colors themselves are
                        directly read as bytes */
   else {
       if (hasMotionVector) {
           mvX = get_bits(16) - codedWidth;
           mvY = get_bits(16) - codedHeight;
       } else {
           mvX = mvY = 0;
       }
   }

   if (isRLE) {
       RLEDecode(rectangle1);
       if (threadingSplit)
           RLEDecode(rectangle2);
   } else {
       SubDivDecode(contextSet1, rectangle1);
       if (threadingSplit)
           SubDivDecode(contextSet2, rectangle2);
   }

   if (hasWMV9) {
       int i = 0;
       foreach rect in root_rect.children {
           if (WMV9RectIsCoded[i]) {
               WMV9codedFrameSize = get_le24();
               decodeWMV9Rect(rect);
           } else {
               fillGrey(rect); // fill with 128,128,128
           }
           i++;
       }
   }

==== V2 Arithmetic Coder ====

void ac2_init(AC2 *c, GetByteContext *gb)
{
   c->low   = 0;
   c->high  = 0xFFFFFF;
   c->value = bytestream2_get_be24(gb);
   c->gb    = gb;
}
void ac2_renorm(AC2 *c)
{
   while ((c->high >> 15) - (c->low >> 15) < 2) {
       if ((c->low ^ c->high) & 0x10000) {
           c->high  ^= 0x8000;
           c->value ^= 0x8000;
           c->low   ^= 0x8000;
       }
       c->high  = c->high  << 8 & 0xFFFFFF | 0xFF;
       c->value = c->value << 8 & 0xFFFFFF | bytestream2_get_byte(c->gb);
       c->low   = c->low   << 8 & 0xFFFFFF;
   }
}
int ac2_get_bit(AC2 *c);
// Identical to its MSS1 counterpart, except it renormalizes using ac2_renorm()
/* decodes a number dividing the range into two linear pieces: one
   whose values have probability 1 and another whose values have
   probability 2, so that it maps to n values ( range/2 < n <= range ) */
int ac2_get_scaled_value(int value, int n, int range) {
   split = (n << 1) - range;
   if (value > split)
     return split + (value - split >> 1);
   else
     return value;
}
/* rescales the interval considering the piecewise linear division */
void ac2_rescale_interval(AC2 *c, int range,
                               int low, int high, int n) {
   split = (n << 1) - range;
   
   if (high > split)
       c->high = split + (high - split << 1);
   else
       c->high = high;

   c->high += c->low;

   if (low > split)
       c->low += split + (low - split << 1);
   else
       c->low += low;
}
int ac2_get_number(AC2 *c, int n)
{
   int range = c->high - c->low + 1;
   int scale = av_log2(range) - av_log2(n);
   int val;
   
   if ( n << scale > range )
       scale--;
   
   n <<= scale;
   
   val = ac2_get_scaled_value(c->value - c->low, n, range) >> scale;
   
   ac2_rescale_interval(c, range, val << scale, (val + 1) << scale, n);
   
   ac2_renorm(c);

   return val;
}
int ac2_get_prob(AC2 *c, int *probs)
{
   int range = c->high - c->low + 1, n = *probs;
   int scale = av_log2(range) - av_log2(n);
   int i = 0, val;

   if ( n << scale > range )
       scale--;

   n <<= scale;

   val = ac2_get_scaled_value(c->value - c->low, n, range) >> scale;
   while (probs[++i] > val) ;

   ac2_rescale_interval(c, range, probs[i] << scale, probs[i-1] << scale, n);

   return i;
}
int ac2_get_model_sym(AC2 *c, Model *m);
// Identical to its MSS1 counterpart, except it gets the symbol index
// using ac2_get_prob() and renormalizes using ac2_renorm()
int ac2_get_consumed_bytes(AC2 *c)
{
   int diff = (c->high >> 16) - (c->low >> 16);
   int bp   = bytestream2_tell(c->gb) - 3 << 3;
   int bits = 1;

   while (!(diff & 0x80)) {
       bits++;
       diff <<= 1;
   }

   return (bits + bp + 7 >> 3) + ((c->low >> 16) + 1 == c->high >> 16);
}

=== MSA1 details ===

Internally it calls itself MS ATC Screen codec and MSS3.

The codec has several coding possibilities: fill area, decode area with prediction (somewhat like MSS1), decode with Haar transform, decode with 8x8 DCT.

==== Frame header ====

  0- 3 frame type (0x301 - intra, 0x300 - inter)
  4    should be always 1?
  5    should be always 0
  6- 9 should be 0x380
 10-11 probably x offset for the frame
 12-13 probably y offset for the frame
 14-15 probably frame width
 16-17 probably frame height
 18-21 ignored
 22    quality (used in image decoders, 0-100)
 23-26 seems to be always 1

The rest is range-coded data.

==== Codec organisation ====

MSA1 codes YUV 4:2:0 planes in 16x16 macroblocks.

 for (mb_y = 0; mb_y < mb_height; mb_y++) {
     for (mb_x = 0; mb_x < mb_width; mb_x++) {
         btype = block_info[0]->get_type(acoder);
         coders[0][btype]->decode_block(acoder, Y, mb_x, mb_y);
         btype = block_info[1]->get_type(acoder);
         coders[1][btype]->decode_block(acoder, U, mb_x, mb_y);
         btype = block_info[2]->get_type(acoder);
         coders[2][btype]->decode_block(acoder, V, mb_x, mb_y);
     }
 }

Possible coders (starting from zero):

* solid fill (aka "smooth block")
* predicted image (aka "text block")
* DCT-coded block (aka "image block")
* Haar wavelet block (aka "hybrid block")
* skipped block
==== Range coder ====

Normalisation:

   for (;;) {
       c->range <<= 8;
       c->low   <<= 8;
       if (c->src < c->src_end) {
           c->low |= *c->src++;
       } else if (!c->low) {
           return error;
       }
       if (c->range >= 0x01000000)
           return 0;
   }

Reading bits:

   c->range >>= nbits;
   val = c->low / c->range;
   c->low -= c->range * val;
   
   if (c->range < RAC_BOTTOM)
       rac_normalise(c);
   
   return val;

Obtaining symbol from model:

   prob       = 0;
   prob2      = c->range;
   c->range >>= MODEL_SCALE;
   val        = 0;
   end        = model->num_syms >> 1;
   end2       = model->num_syms;
   do {
       helper = model->freqs[end] * c->range;
       if (helper <= c->low) {
           val   = end;
           prob  = helper;
       } else {
           end2  = end;
           prob2 = helper;
       }
       end = (end2 + val) >> 1;
   } while (end != val);
   c->low  -= prob;
   c->range = prob2 - prob;
   if (c->range < 0x01000000)
       rac_normalise(c);
   
   model_update(model, val);
   
   return val;
==== Model ====

Models used by coders:

{| class="wikitable"
! Coder !! Designation !! Number !! Number of symbols
|-
| Block type || block type || 5 || 5
|-
| Fill block || number of bits for fill value || 1 || 12
|-
| Image block || cache size || 1 || 3
|-
| Image block || cache entry || 1 || 256
|-
| Image block || escape value || 1 || 256
|-
| Image block || cache value || 125 || 5
|-
| DCT block || coded AC element || 1 || 256
|-
| DCT block || DC length || 1 || 12
|-
| Haar block || some coefficients' length || 1 || 12
|-
| Haar block || other coefficients || 1 || 256
|}

The main difference from plain models (like in MSS1) is that they update frequencies only after some iterations and the frequencies are not simple cumulative weights:


 void model_update(Model *m, int val)
 {
     m->weights[val]++;
     m->times_till_update--;
     if (m->times_till_update)
         return;
     m->total_weight += m->update_value;
     if (m->total_weight > ...)
         // rescale weights
     
     scale = 0x80000000u / m->total_weight;
     for (i = 0; i < m->num_symbols; i++)
         m->freqs[i] = m->weights[i] * scale >> (31 - m->scale);
     m->update_value = (m->update_value * 5) >> 2;
     if (m->update_value > 8 * m->num_symbols + 48)
         m->update_value = 8 * m->num_symbols + 48;
     m->times_till_update = m->update_value;
 }

There is also a special case for the sign model: it has only two values and uses scale = 13 instead of the 15 used by other models.

==== Block type decoding ====

The block type coder uses one of five models to decode the block type; the model to use is selected by the previously decoded block type (or 4 for the first block).

==== Solid fill ====

This coder decodes the difference from the last fill value, uses it to restore the new fill value and fills the region. The difference is coded exactly like the DC coefficient in a DCT block.

==== Predicted image ====

This coder decodes a cache index value with one of 125 cache models (index = top_left_cache_val * 25 + top_cache_val * 5 + left_cache_val). If the decoded cache value is 5 then a pixel is decoded from the model, otherwise it is retrieved from that cache position.
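The model selection formula can be sketched directly (a minimal sketch; the function name is this sketch's own):

```python
def cache_model_index(top_left, top, left):
    """MSA1 predicted-image model selection as described above: the three
    neighbour cache values (0..4) form a base-5 number, selecting one of
    the 125 cache models."""
    for v in (top_left, top, left):
        assert 0 <= v <= 4, "cache values are base-5 digits"
    return top_left * 25 + top * 5 + left
```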

==== DCT-coded image ====

DC coefficient coding:

 code_len = get_model(acoder, dc_model);
 if (code_len > 0) {
     sign = get_bit(acoder);
     val  = (1 << (code_len - 1)) + get_bits(acoder, code_len - 1);
     if (sign)
         val = -val;
 } else {
     val  = 0;
 }

Then the decoded value is added to the predicted DC value.

The rest of the coefficients (the sign is coded with a special binary model):

 val  = decode_model(acoder, coef_model);
 skip = (val == 0xF0) ? 16 : (val >> 4);
 coef = val & 0xF;
 if (!coef && val != 0xF0) break; // last coded coefficient
 sign = get_binary_model(acoder, sign_model);
 coef = (1 << (coef - 1)) + get_bits(acoder, coef - 1);
 if (sign)
     coef = -coef;

Scan order is the normal zigzag. There are two quantisation matrices (one for luma, one for chroma).

 if (quality >= 50)
     q = 200 - 2 * quality;
 else
     q = 5000 / quality;
 for (i = 0; i < 64; i++)
     quant[i] = 65536 / MAX((qmatrix[zigzag[i]] * q + 50) / 100, 1);
==== Wavelet-coded image ====

This coder codes quantised coefficients and restores them as:

 A1 a2 ... B1 b2 ...         ((A1 + B1) + (C1 + D1)) ((A1 - B1) + (C1 - D1)) ...
 ...       ....         ->   ((A1 + B1) - (C1 + D1)) ((A1 - B1) - (C1 - D1)) ...
 C1 c2 ... D1 d2 ...         ....                    ....
 ...       ....              ....                    ....

Coefficients are either coded like DC coefficients or decoded from a model. The quantiser is calculated as 17 - 7 * quality / 50.
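A sketch of the two quantiser formulas (the DCT scale from the code above, the Haar quantiser from this paragraph), assuming C-style integer division:

```python
def dct_q(quality):
    """DCT quantiser scale from the code above (quality is 0-100)."""
    return 200 - 2 * quality if quality >= 50 else 5000 // quality

def haar_quantiser(quality):
    """Haar-block quantiser: 17 - 7 * quality / 50, integer division assumed."""
    return 17 - 7 * quality // 50
```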