Microsoft Screen Codec: Difference between revisions

[[Category:Video Codecs]]
[[Category:Screen Capture Video Codecs]]
[[Category:Undiscovered Video Codecs]]

Revision as of 07:04, 10 June 2012

Also known as Windows Media Screen Codec.

MSA1 is created by Live Meeting 2007.


Some details about the format

MSS1 and MSS2 are quite close (and thus are decoded with a single decoder). They employ true arithmetic coding with probability modelling, used with several adaptive models that look a bit like PPM.

MSS1 details

MSS1 (aka Windows Media Screen codec) compresses only palettised images.

Extradata format

(for some reason, data in .wmv files is stored in big-endian order)

 4- 7  header length
 8-11  major version
12-15  minor version
16-19  display width
20-23  display height
24-27  coded width
28-31  coded height
32-35  frames per second (float)
36-39  bitrate
40-43  max lead time (float)
44-47  max lag time
48-51  max seek time
52-55  nFreeColors
56-... palette (256 RGB triplets)
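As a sketch, the header above can be parsed like this in Python (field names come from the table; treating max lag/seek time as integers is an assumption, since only fps and max lead time are marked as floats, and bytes 0-3 are not documented here so they are skipped):

```python
import struct

def parse_mss1_extradata(buf):
    """Parse the big-endian MSS1 extradata header laid out above (sketch)."""
    keys = ("header_length", "major_version", "minor_version",
            "display_width", "display_height", "coded_width", "coded_height",
            "fps", "bitrate", "max_lead_time", "max_lag_time",
            "max_seek_time", "free_colors")
    # 13 big-endian 32-bit fields starting at byte 4; fps and max lead time
    # are floats, the integer types for lag/seek are a guess
    values = struct.unpack_from(">IIIIIIIfIfIII", buf, 4)
    hdr = dict(zip(keys, values))
    # 256 RGB triplets follow at byte 56, when present
    if len(buf) >= 56 + 256 * 3:
        hdr["palette"] = [tuple(buf[56 + 3 * i:56 + 3 * i + 3])
                          for i in range(256)]
    else:
        hdr["palette"] = None
    return hdr
```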

Frame format

The codec uses arithmetic decoders with adaptive models for all operations. All code for them is suspiciously similar to the code in the 1987 paper by Witten, Neal and Cleary.

The codec uses delta compression and can change the top palette entries with every intra frame:

 is_inter = coder->decode_bit();
 if (!is_inter) {
     if (nFreeColors) {
         num_entries = coder->decode_number(nFreeColors + 1);
         for (i = 0; i < num_entries; i++) {
             pal[(256 - nFreeColors) + i].R = coder->decode_bits(8);
             pal[(256 - nFreeColors) + i].G = coder->decode_bits(8);
             pal[(256 - nFreeColors) + i].B = coder->decode_bits(8);
         }
     }
     recursive_decode_intra(0, 0, width, height);
 } else {
     recursive_decode_inter(0, 0, width, height);
 }

Frame coding is done by recursively partitioning the picture horizontally or vertically and coding the partitions in some way:

 recursive_decode_intra(x, y, width, height) {
     mode = coder->decode_model(split_mode_model);
     switch (mode) {
     case 0:
         pivot = decode_pivot(height);
         recursive_decode_intra(x, y, width, pivot);
         recursive_decode_intra(x, y + pivot, width, height - pivot);
         break;
     case 1:
         pivot = decode_pivot(width);
         recursive_decode_intra(x, y, pivot, height);
         recursive_decode_intra(x + pivot, y, width - pivot, height);
         break;
     case 2:
         mode = coder->decode_model(intra_decode_model);
         if (!mode) {
             pix = decode_pixel();
             fill_rect(x, y, width, height, pix);
         } else {
             decode_area(x, y, width, height);
         }
         break;
     }
 }
 
 recursive_decode_inter(x, y, width, height) {
     mode = coder->decode_model(split_mode_model);
     switch (mode) {
     case 0:
         pivot = decode_pivot(height);
         recursive_decode_inter(x, y, width, pivot);
         recursive_decode_inter(x, y + pivot, width, height - pivot);
         break;
     case 1:
         pivot = decode_pivot(width);
         recursive_decode_inter(x, y, pivot, height);
         recursive_decode_inter(x + pivot, y, width - pivot, height);
         break;
     case 2:
         mode = coder->decode_model(inter_decode_model);
         if (!mode) {
             pix = decode_pixel();
             if (pix != 0xFF) {
                 copy_rect(x, y, width, height, pix);
             } else {
                 mode = coder->decode_model(intra_decode_model);
                 if (!mode) {
                     pix = decode_pixel();
                     fill_rect(x, y, width, height, pix);
                 } else {
                     decode_area(x, y, width, height);
                 }
             }
         } else {
             // this decodes the change mask first and then checks it:
             // if the mask value is 0xFF then decode the pixel,
             // otherwise copy it from the previous frame
             mask = decode_area(x, y, width, height);
             decode_area_masked(x, y, width, height);
         }
         break;
     }
 }
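The recursion above simply splits the frame into leaf rectangles before each leaf is coded. A minimal Python sketch of that splitting skeleton, with `next_mode`/`next_pivot` standing in for the arithmetic-coded decisions (these callables are illustrative, not part of the format):

```python
def split_regions(x, y, w, h, next_mode, next_pivot):
    """Collect the leaf rectangles produced by the recursive split (sketch)."""
    mode = next_mode()
    if mode == 0:  # horizontal split: top part is `pivot` rows high
        p = next_pivot(h)
        return (split_regions(x, y, w, p, next_mode, next_pivot) +
                split_regions(x, y + p, w, h - p, next_mode, next_pivot))
    if mode == 1:  # vertical split: left part is `pivot` columns wide
        p = next_pivot(w)
        return (split_regions(x, y, p, h, next_mode, next_pivot) +
                split_regions(x + p, y, w - p, next_mode, next_pivot))
    return [(x, y, w, h)]  # mode 2: leaf region, coded directly
```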

Other decoding routines

Decoding pivot point:

 decode_pivot(ref_value) {
     edge  = coder->decode_model(edge_model);
     coord = coder->decode_model(pivot_model) + 1;
     if (coord > 2)
         coord = coder->decode_number((ref_value + 1) / 2 - 2) + 3;
     if (edge)
         return ref_value - coord;
     else
         return coord;
 }

Decoding pixels is not that trivial. Codec uses neighbour pixels (left, top-left, top, top-right) to form a cache which is used along with cached move-to-front queue and several models to restore pixel.

Models

Models are reinitialised at every intraframe. Initially all symbols have weight = 1. With every update the symbol's weight is increased by one, and when the weights grow too large they get rescaled.

Rescaling weights is performed when the total cumulative probability is bigger than a threshold, which can be static or adaptive. The static threshold is calculated as num_symbols * symbol_threshold; the adaptive one is recalculated every time as min(0x3FFF, ((2 * weights[num_symbols] - 1) / 2 + 4 * cumulative_probability[0]) / (2 * weights[num_symbols] - 1)).

Scaling weights is simply weight' = (weight + 1) >> 1.
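A minimal Python sketch of the static-threshold update described above (names are illustrative; the adaptive-threshold variant is omitted):

```python
def rescale(weights):
    # weight' = (weight + 1) >> 1, as described above
    return [(w + 1) >> 1 for w in weights]

def update_model(weights, symbol, symbol_threshold):
    """Bump the decoded symbol's weight by one; rescale all weights once the
    total exceeds the static threshold num_symbols * symbol_threshold."""
    weights[symbol] += 1
    if sum(weights) > len(weights) * symbol_threshold:
        weights = rescale(weights)
    return weights
```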

Main models:

 Name                 Purpose                                                         Number of symbols  Threshold per symbol
 intra_decode_model   region decoding mode for intra (solid fill or not)              2                  adaptive
 inter_decode_model   region decoding mode for inter (full region decoder or masked)  2                  adaptive
 split_mode_model     region split mode (horizontal/vertical/none)                    3                  50
 edge_model           signals from which edge the pivot point is decoded              2                  50
 pivot_model          rough coordinates for pivot point (1, 2, escape)                3                  15

Context modeller

The context modeller models a pixel's context using its neighbours and caches the last decoded values. The decoder uses two context modellers: one for decoding picture data (in both kinds of frames), and another used solely for decoding the mask in interframes.

Modeller components:

  • last decoded pixels (8 for picture data, 2 for mask)
  • primary model for decoding pixel ((cache_size + 1) symbols, 15 symbol threshold)
  • escape model for decoding pixel value not in cache (256 symbols, 50 symbol threshold)
  • secondary models for context-modelled pixels, four layers of models for different combinations of non-equal neighbours:
    • first layer - 1x4 models (2 symbols, adaptive symbol threshold)
    • second layer - 7x4 models (3 symbols, 15 symbol threshold)
    • third layer - 6x4 models (4 symbols, 15 symbol threshold)
    • fourth layer - 1x4 models (5 symbols, 15 symbol threshold)

Decoding the top-left pixel (no neighbourhood is provided for it):

 val = coder->decode_model(modeller->primary_model);
 if (val < modeller->cache_size) {
     pix = modeller->cache[val];
     if pix is found in the provided neighbourhood, insert it at the first position in the cache
       (it doesn't matter if it's already in the cache)
     else move it to the first position, shifting other values by one
 } else {
     pix = coder->decode_model(modeller->escape_model);
     if pix is found in cache, move it to the first position shifting other values by one
     else just insert it at the first position in cache
 }
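The cache behaviour above is essentially a move-to-front queue with one twist for pixels found in the neighbourhood. A small Python sketch (the function name and list representation are illustrative):

```python
def cache_update(cache, pix, in_neighbourhood):
    """Return the cache after decoding `pix` (sketch of the rules above)."""
    if in_neighbourhood:
        # plain insert at the front, duplicates allowed; last entry falls off
        return [pix] + cache[:-1]
    if pix in cache:
        # classic move-to-front: shift earlier entries down by one
        i = cache.index(pix)
        return [pix] + cache[:i] + cache[i + 1:]
    # not cached yet: insert at the front, last entry falls off
    return [pix] + cache[:-1]
```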

Decoding other pixels:

 get the neighbourhood (left, top, top-right and top-left pixels)
 select a secondary model depending on the neighbourhood
 if the decoded value is less than the number of distinct neighbours, pick the corresponding neighbour
 else decode the pixel like the top-left one, but provide the neighbourhood for reference this time
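The neighbour-matching step can be sketched like this in Python (`decoded_value` is the symbol read from the selected secondary model, and `decode_escape` stands for the top-left-style cache/escape path; both names are illustrative):

```python
def pick_pixel(neighbours, decoded_value, decode_escape):
    """Sketch: neighbours is (left, top, top-right, top-left). If the
    secondary-model symbol indexes one of the distinct neighbour values,
    reuse that value; otherwise fall back to the cache/escape path."""
    uniq = []
    for p in neighbours:
        if p not in uniq:
            uniq.append(p)          # distinct neighbour values, in order
    if decoded_value < len(uniq):
        return uniq[decoded_value]  # pixel equals one of its neighbours
    return decode_escape()          # decode like the top-left pixel
```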