Microsoft Screen Codec
Revision as of 02:01, 12 June 2012
Also known as Windows Media Screen Codec.
- FourCCs: MSS1, MSS2, MSA1
- Samples:
MSA1 is created by Live Meeting 2007
Some details about format
MSS1 and MSS2 are quite close (and thus are decoded with a single decoder). They employ real arithmetic coding with probability modelling. This coding is used with several adaptive models, which look a bit like PPM.
MSS1 details
MSS1 (aka Windows Media Screen V7 codec) compresses only palettised images.
Extradata format
(for some reason, data in .wmv is stored in big-endian order)
 4-  7   header length
 8- 11   major version (1 for MSS1, 2 for MSS2)
12- 15   minor version
16- 19   display width
20- 23   display height
24- 27   coded width
28- 31   coded height
32- 35   frames per second (float)
36- 39   bitrate
40- 43   max lead time (float)
44- 47   max lag time (float)
48- 51   max seek time (float)
52- 55   nFreeColors
56-823   palette (256 RGB triplets)
Only for MSS2:
824-827  threadingSplit (domain: -1, 0, 1..codedH)
828-831  numSymbolsEscapeModel (domain: 0..256)
Both width and height must be in the range 1..4096.
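As a rough illustration of the layout above, the extradata could be parsed as follows. This is a sketch, not decoder source: the field names are my own, and the meaning of bytes 0-3 (before "header length") isn't described here.

```python
import struct

def parse_mss_extradata(buf):
    """Parse the big-endian MSS1/MSS2 extradata layout sketched above."""
    hdr_len, major, minor = struct.unpack_from(">3I", buf, 4)
    disp_w, disp_h, coded_w, coded_h = struct.unpack_from(">4I", buf, 16)
    fps, = struct.unpack_from(">f", buf, 32)
    bitrate, = struct.unpack_from(">I", buf, 36)
    max_lead, max_lag, max_seek = struct.unpack_from(">3f", buf, 40)
    n_free_colors, = struct.unpack_from(">I", buf, 52)
    # 256 RGB triplets at offsets 56..823
    palette = [tuple(buf[56 + 3 * i : 59 + 3 * i]) for i in range(256)]
    info = dict(header_length=hdr_len, major_version=major,
                minor_version=minor, display_width=disp_w,
                display_height=disp_h, coded_width=coded_w,
                coded_height=coded_h, fps=fps, bitrate=bitrate,
                max_lead=max_lead, max_lag=max_lag, max_seek=max_seek,
                n_free_colors=n_free_colors, palette=palette)
    if major == 2 and len(buf) >= 832:
        # MSS2-only fields; threadingSplit may be negative, hence ">2i"
        split, n_esc = struct.unpack_from(">2i", buf, 824)
        info.update(threading_split=split, num_symbols_escape_model=n_esc)
    return info
```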
Frame format
The codec uses arithmetic decoders with adaptive models for all operations. All the code for them is suspiciously similar to the one in the 1987 paper by Witten, Neal and Cleary.
Codec uses delta compression and can change top palette entries with every intra frame:
is_inter = coder->decode_bit();
if (!is_inter) {
    if (nFreeColors) {
        num_entries = coder->decode_number(nFreeColors + 1);
        for (i = 0; i < num_entries; i++) {
            pal[(256 - nFreeColors) + i].R = coder->decode_bits(8);
            pal[(256 - nFreeColors) + i].G = coder->decode_bits(8);
            pal[(256 - nFreeColors) + i].B = coder->decode_bits(8);
        }
    }
    recursive_decode_intra(0, 0, width, height);
} else {
    recursive_decode_inter(0, 0, width, height);
}
Frame coding is done by recursively partitioning picture horizontally or vertically and coding partitions in some way:
recursive_decode_intra(x, y, width, height)
{
    mode = coder->decode_model(split_mode_model);
    switch (mode) {
    case 0:
        pivot = decode_pivot(height);
        recursive_decode_intra(x, y, width, pivot);
        recursive_decode_intra(x, y + pivot, width, height - pivot);
        break;
    case 1:
        pivot = decode_pivot(width);
        recursive_decode_intra(x, y, pivot, height);
        recursive_decode_intra(x + pivot, y, width - pivot, height);
        break;
    case 2:
        mode = coder->decode_model(intra_decode_model);
        if (!mode) {
            pix = decode_pixel();
            fill_rect(x, y, width, height, pix);
        } else {
            decode_area(x, y, width, height);
        }
        break;
    }
}

recursive_decode_inter(x, y, width, height)
{
    mode = coder->decode_model(split_mode_model);
    switch (mode) {
    case 0:
        pivot = decode_pivot(height);
        recursive_decode_inter(x, y, width, pivot);
        recursive_decode_inter(x, y + pivot, width, height - pivot);
        break;
    case 1:
        pivot = decode_pivot(width);
        recursive_decode_inter(x, y, pivot, height);
        recursive_decode_inter(x + pivot, y, width - pivot, height);
        break;
    case 2:
        mode = coder->decode_model(inter_decode_model);
        if (!mode) {
            pix = decode_pixel();
            // same meaning as the mask values, see below
            // for MSS2, pix == 4 means a motion compensated rectangle
            if (pix != 0xFF) {
                copy_rect(x, y, width, height, pix);
            } else {
                mode = coder->decode_model(intra_decode_model);
                if (!mode) {
                    pix = decode_pixel();
                    fill_rect(x, y, width, height, pix);
                } else {
                    decode_area(x, y, width, height);
                }
            }
        } else {
            // this decodes the change mask first and then checks:
            // if a mask value is 0xFF then decode the pixel,
            // otherwise copy it from the previous frame
            mask = decode_area(x, y, width, height);
            decode_area_masked(x, y, width, height);
        }
        break;
    }
}
Mask values:
Type                    | Value in MSS1 | Value in MSS2
------------------------|---------------|--------------
copy from same location | 0x80          | 0x02
copy motion compensated | N/A           | 0x04
decode new              | 0xFF          | 0x01
In decode_area_masked(), decode new pixels as described in "Context modeller" even if the neighboring pixels were copied.
Other decoding routines
Decoding pivot point:
decode_pivot(ref_value)
{
    edge  = coder->decode_model(edge_model);
    coord = coder->decode_model(pivot_model) + 1;
    if (coord > 2)
        coord = coder->decode_number((ref_value + 1) / 2 - 2) + 3;
    if (edge)
        return ref_value - coord;
    else
        return coord;
}
Decoding pixels is not that trivial. The codec uses the neighbouring pixels (left, top-left, top, top-right) together with a move-to-front cache of last decoded values and several models to restore each pixel.
Models
Models are reinitialised at every intra frame. Initially all symbols have weight = 1. With every update the coded symbol's weight is increased by one, and when the weights grow too large they get rescaled.
Rescaling is performed when the total cumulative probability exceeds a threshold, which can be static or adaptive. The static threshold is calculated as num_symbols * symbol_threshold; the adaptive one is recalculated every time as

    min(0x3FFF, ((2 * weights[num_symbols] - 1) / 2 + 4 * cumulative_probability[0]) / (2 * weights[num_symbols] - 1))

Scaling a weight is simply weight' = (weight + 1) >> 1.
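A minimal sketch of such a model, using the static threshold only (the real decoder also maintains cumulative probabilities for the arithmetic coder, which are omitted here):

```python
class AdaptiveModel:
    """Sketch of an adaptive frequency model with a static threshold."""

    def __init__(self, num_symbols, symbol_threshold):
        self.weights = [1] * num_symbols          # all symbols start at 1
        self.threshold = num_symbols * symbol_threshold

    def update(self, symbol):
        # each update bumps the coded symbol's weight by one
        self.weights[symbol] += 1
        # rescale when the total exceeds the threshold:
        # weight' = (weight + 1) >> 1
        if sum(self.weights) > self.threshold:
            self.weights = [(w + 1) >> 1 for w in self.weights]
```

Note that the halving rounds up, so a weight never drops to zero and every symbol stays codable.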
Main models:
Name               | Purpose                                                        | Number of symbols | Threshold per symbol
-------------------|----------------------------------------------------------------|-------------------|---------------------
intra_decode_model | region decoding mode for intra (solid fill or not)             | 2                 | adaptive
inter_decode_model | region decoding mode for inter (full region decode or masked)  | 2                 | adaptive
split_mode_model   | region split mode (horizontal/vertical/none)                   | 3                 | 50
edge_model         | signals from which edge the pivot point is decoded             | 2                 | 50
pivot_model        | rough coordinate of the pivot point (1, 2, escape)             | 3                 | 15
Context modeller
Context modeller is used for modelling pixel context by using its neighbours and caching last decoded values. There are two context modellers used by decoder — one for decoding picture data (in both kinds of frames), another one is used solely for decoding mask in interframes.
Modeller components (values in {brackets} are for MSS2):
- last decoded pixels cache (8 entries for picture data, 2 {3} for mask), initially filled with 0, 1, 2... and reset to that at every intra frame
- primary model for decoding a pixel (cache_size + 1 symbols, symbol threshold 15)
- escape model for decoding a pixel value not in the cache (256 {numSymbolsEscapeModel} symbols, symbol threshold 50)
- secondary models for context-modelled pixels, four layers of models for different combinations of non-equal neighbours:
  - first layer - 1x4 models (2 symbols, adaptive symbol threshold)
  - second layer - 7x4 models (3 symbols, symbol threshold 15)
  - third layer - 6x4 models (4 symbols, symbol threshold 15)
  - fourth layer - 1x4 models (5 symbols, symbol threshold 15)
Decoding the top-left pixel (no neighbourhood is provided for it):
val = coder->decode_model(modeller->primary_model);
if (val < modeller->cache_size) {
    pix = modeller->cache[val];
    if pix is found in the provided neighbourhood,
        insert it at the first position in the cache
        (it doesn't matter whether it's already in the cache)
    else
        move it to the first position, shifting the other values by one
} else {
    pix = coder->decode_model(modeller->escape_model);
    if pix is found in the cache,
        move it to the first position, shifting the other values by one
    else
        just insert it at the first position in the cache
}
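The cache update rules above can be sketched like this (a fixed-size list standing in for the cache; function and argument names are my own):

```python
def cache_update(cache, pix, neighbours):
    """Update the last-decoded-pixels cache after decoding pix."""
    if pix in neighbours:
        # a pixel present in the neighbourhood is re-inserted at the
        # front even if it is already cached (a duplicate is allowed)
        cache.insert(0, pix)
        cache.pop()                    # keep the cache size constant
    elif pix in cache:
        # plain move-to-front, shifting earlier entries down by one
        cache.remove(pix)
        cache.insert(0, pix)
    else:
        # escape path: a brand-new colour pushes the last entry out
        cache.insert(0, pix)
        cache.pop()
    return cache
```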
Decoding other pixels:
get the neighbourhood (left, top, top-right and top-left pixels)
select a secondary model depending on the neighbourhood
if the decoded value is less than the number of unique neighbours,
    pick the corresponding neighbour
else
    decode the pixel like the top-left one, but provide the
    neighbourhood for reference this time
Determine neighborhood as:
If the top pixel isn't available (first row):
    top = top-right = top-left = left  (left is available, as it was decoded above)
If the top-right pixel isn't available (last column):
    top-right = top
If the left pixel isn't available (first column):
    left = top-left = top
If neither left nor top-right are available (single column):
    top-right = top-left = left = top

Note: pixels outside the current area aren't considered available.
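A sketch of these fallback rules, assuming the picture is a dict mapping (x, y) to a pixel value and the current area spans columns x0..x1-1 starting at row y0 (all names are illustrative; the single-column case falls out of the first-column and last-column rules combined):

```python
def neighbourhood(pic, x, y, x0, y0, x1):
    """Return the effective (left, top, top-left, top-right) values."""
    if y == y0:
        # first row: left was decoded just before this pixel,
        # and everything else mirrors it
        left = pic[(x - 1, y)]
        return dict(left=left, top=left, top_left=left, top_right=left)
    top = pic[(x, y - 1)]
    # first column: left and top-left fall back to top;
    # last column: top-right falls back to top
    top_left = pic[(x - 1, y - 1)] if x > x0 else top
    top_right = pic[(x + 1, y - 1)] if x + 1 < x1 else top
    left = pic[(x - 1, y)] if x > x0 else top
    return dict(left=left, top=top, top_left=top_left, top_right=top_right)
```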
Determine secondary model as:
layer = number of different neighbour values
        (1 if all equal, 4 if all different,
         2 for ABBB, AABB, ABBA or any other such combination,
         3 for ABCC, ABBC, ABCA or any other such combination)

sublayer identifies which neighbours are equal to each other:

if layer == 1:  # all equal
    sublayer = 0
if layer == 2:  # 2-2 or 3-1
    if top == topLeft:
        if topRight == topLeft:
            sublayer = 3
        elif left == topLeft:
            sublayer = 2
        else:
            sublayer = 4
    elif topRight == topLeft:
        if left == topLeft:
            sublayer = 1
        else:
            sublayer = 5
    elif left == topLeft:
        sublayer = 6
    else:
        sublayer = 0
if layer == 3:  # 2-1-1
    if top == topLeft:
        sublayer = 0
    elif topRight == topLeft:
        sublayer = 1
    elif left == topLeft:
        sublayer = 2
    elif topRight == top:
        sublayer = 3
    elif left == top:
        sublayer = 4
    else:
        sublayer = 5
if layer == 4:  # all different
    sublayer = 0

subsublayer = 0
if the left-left pixel is available (column >= 2) and its value equals the left pixel:
    subsublayer += 1
if the top-top pixel is available (row >= 2) and its value equals the top pixel:
    subsublayer += 2
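The layer/sublayer branching above condenses into a small function. This is a direct transcription of the rules, not decoder source:

```python
def select_secondary_model(top_left, top, top_right, left):
    """Return (layer, sublayer) for the secondary-model lookup."""
    layer = len({top_left, top, top_right, left})   # unique neighbours
    sublayer = 0
    if layer == 2:                                  # 2-2 or 3-1 split
        if top == top_left:
            if top_right == top_left:
                sublayer = 3
            elif left == top_left:
                sublayer = 2
            else:
                sublayer = 4
        elif top_right == top_left:
            sublayer = 1 if left == top_left else 5
        elif left == top_left:
            sublayer = 6
    elif layer == 3:                                # 2-1-1 split
        if top == top_left:
            sublayer = 0
        elif top_right == top_left:
            sublayer = 1
        elif left == top_left:
            sublayer = 2
        elif top_right == top:
            sublayer = 3
        elif left == top:
            sublayer = 4
        else:                                       # left == top_right
            sublayer = 5
    return layer, sublayer
```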
Last decoded pixels cache use:
This cache internally has 4 more entries (12 total for picture data, 6 {7} for mask). The extra entries are to skip neighboring colors which we already know aren't the ones we're looking for.
Example:
Get the neighbourhood pixels, in this order: topLeft = 140, top = 134, topRight = 140, left = 136.
Remove duplicates: [140, 134, 136].
We have 3 unique colours, therefore we use the third layer of the secondary model.
Since topRight == topLeft, we use sublayer 1. The subsublayer doesn't matter for the sake of this example.
Now we fetch a value x using the corresponding secondary model:
if x == 0, output 140
if x == 1, output 134
if x == 2, output 136
if x == 3, the secondary model can't code the color. Fall back to the primary model to try and decode it from the cache.
Assume the cache contents are [25, 140, 136, 134, 50, 23, ...].
If the primary model returned 0, output 25.
If it returned 1, output 50, since we know the colour isn't 134, 136 or 140.
If it returned 2, output 23, and so on, up to 8, which means the colour isn't in the cache either and we have to fall back to the escape model. In this example the last cache entry was unreachable. For the top-left pixel there are zero neighbours, so the last 4 entries are unreachable.
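The skip rule from this example can be sketched as a lookup that walks the cache, passing over colours the secondary model has already ruled out (function name is my own):

```python
def resolve_cache_index(cache, idx, ruled_out):
    """Map a primary-model index onto the cache, skipping colours
    already rejected by the secondary model."""
    remaining = idx
    for colour in cache:
        if colour in ruled_out:
            continue          # this colour was already tried and rejected
        if remaining == 0:
            return colour
        remaining -= 1
    return None               # index walked past the usable entries: escape
```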
MSS2 details
MSS2 (aka Windows Media Video 9 Screen codec) seems to be largely the same as MSS1, with the following differences:
- RGB24 compression instead of palettised images (maybe)
- motion compensation in interframes
- different header format