ClearVideo

From MultimediaWiki

ClearVideo is a video codec ostensibly based on fractals. It is alleged to be the basis of certain RealVideo codecs, and also alleged to be patented: a patent search with "iterated" as the assignee name indeed turns up over 20 patents, many of which mention "fractals" and "image compression" in the title. The same decoder core is used by the VfW, QT and Real decoders.

One of the peculiarities of the codec is that the codebooks are provided to it externally; they are usually stored as resources in the decoder (or encoder) wrapper. Those tables include:

  • CVLHUFF — single large codebook for the 64x64 tile mode in inter frames (the first 1/5 of it is 8-bit code lengths, the next 2/5 is 16-bit code values, the rest is 16-bit code symbols)
  • HUFF — codebooks for interframe tree information coding.
  • DCT — intraframe coefficients codebooks
  • VQ — probably suggested block configurations for the encoder, not used by decoder
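Given the stated layout, a CVLHUFF blob holding n codes should be 5*n bytes. A minimal sketch of splitting such a resource (little-endian byte order is an assumption here):

```python
import struct

def split_cvlhuff(blob):
    # Split a CVLHUFF resource into (lengths, codes, symbols).
    # For n codes the blob is 5*n bytes: n bytes of code lengths,
    # then n 16-bit code values, then n 16-bit symbols.
    assert len(blob) % 5 == 0
    n = len(blob) // 5
    lengths = list(blob[:n])
    codes = list(struct.unpack('<%dH' % n, blob[n:3 * n]))
    syms = list(struct.unpack('<%dH' % n, blob[3 * n:]))
    return lengths, codes, syms
```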

Extradata

Extradata differs between container formats. The AVI variant stores values as little-endian words and half-words, the RealMedia variant stores the same values in big-endian order, and QuickTime uses a completely different format that resembles neither.

AVI/RM format looks like this:

  • the usual BITMAPINFOHEADER (present even in RealMedia, right after the RM-specific stream information, but also in big-endian order)
  • byte sequence 00 01 00 01 (reads as two sixteen-bit values 0x0001 0x0001 in big-endian order, probably there to detect endianness)
  • 8 zero bytes
  • image width and height (two 32-bit words)
  • unused 32-bit word
  • block version parameter 1 (32-bit word)
  • tile size (usually 16 or 32)
  • unused 32-bit word
  • block version parameter 2 (32-bit word)
  • block version parameter 3 (32-bit word)
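Assuming the field list above, the part of the extradata that follows BITMAPINFOHEADER can be parsed as below (field names are descriptive guesses, not official ones):

```python
import struct

def parse_cv_extradata(buf, big_endian=False):
    # Parse the ClearVideo extradata that follows BITMAPINFOHEADER.
    # AVI stores little-endian values, RealMedia big-endian ones.
    if buf[:4] != b'\x00\x01\x00\x01':
        raise ValueError('marker 00 01 00 01 not found')
    # 4 marker bytes + 8 zero bytes, then eight 32-bit words
    (width, height, _unused1, block_ver1, tile_size,
     _unused2, block_ver2, block_ver3) = struct.unpack(
        ('>' if big_endian else '<') + '8I', buf[12:44])
    return {'width': width, 'height': height, 'tile_size': tile_size,
            'block_versions': (block_ver1, block_ver2, block_ver3)}
```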

Block version parameter 1 seems to be a flag that should take only the values 0 and 1 and affects the choice of the block decoding variant for inter frames. Block version parameter 2 selects the block handler depending on version; valid values are 0, 1, 2 and 6 (or 4, 1, 5 and 6 if the previously described flag is set), with 5 being the most common value. Block version parameter 3 seems to duplicate that meaning and is not really used by the decoder.

The quadtree version is also derived from the block version: block versions 0-5 map to quadtree version 0 and block version 6 maps to quadtree version 2, while for the quadtree codes, block versions 0-2 map to codes version 0 (unpacked), 3-5 map to codes version 1 (default) and 6 maps to codes version 3.
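These two mappings can be sketched as a small helper:

```python
def quadtree_versions(block_version):
    # Returns (quadtree version, quadtree codes version) for a given
    # block version, following the mapping described in the text.
    if not 0 <= block_version <= 6:
        raise ValueError('unknown block version %d' % block_version)
    tree_version = 2 if block_version == 6 else 0
    if block_version <= 2:
        codes_version = 0   # unpacked
    elif block_version <= 5:
        codes_version = 1   # default
    else:
        codes_version = 3
    return tree_version, codes_version
```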

All of this is needed only for decoding inter frames; for intra frames only the codebooks and the frame dimensions have to be provided.

Intraframe decoding

Codebooks

Codebooks are stored in the DCT table as arrays of (32-bit word for code value, 32-bit word for code length) pairs, with zero codes for unused entries. The following tables are stored one after another:

  • AC table 1, 24 entries
  • AC table 2, 100 entries
  • AC table 3, 6 entries
  • AC table 4, 40 entries
  • 112*8 bytes — some lookup table
  • 96*8 bytes — some lookup table
  • 120*8 bytes — some lookup table
  • DC table, 127 entries
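Assuming each codebook entry is an 8-byte (value, length) pair as described above, the byte offsets of the tables inside the DCT resource work out as follows:

```python
def dct_resource_layout():
    # Byte offsets and sizes of the tables inside the DCT resource,
    # assuming 8 bytes per codebook entry (32-bit value + 32-bit length).
    parts = [
        ('ac1', 24 * 8), ('ac2', 100 * 8), ('ac3', 6 * 8), ('ac4', 40 * 8),
        ('lut1', 112 * 8), ('lut2', 96 * 8), ('lut3', 120 * 8),
        ('dc', 127 * 8),
    ]
    layout, off = {}, 0
    for name, size in parts:
        layout[name] = (off, size)
        off += size
    return layout, off  # off is the total expected resource size
```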

Bitstream

The bitreader uses 32-bit little-endian words and reads bits starting from the MSB of each word.
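A minimal bitreader matching this description (32-bit little-endian words, bits consumed MSB-first) might look like:

```python
class BitReader:
    # Reads the stream as 32-bit little-endian words and hands out
    # bits starting from the MSB of each word.
    def __init__(self, data):
        self.data = data
        self.pos = 0    # byte position of the next 32-bit word
        self.cache = 0  # current word
        self.left = 0   # bits left in the cache

    def _refill(self):
        word = self.data[self.pos:self.pos + 4].ljust(4, b'\x00')
        self.cache = int.from_bytes(word, 'little')
        self.pos += 4
        self.left = 32

    def get_bits(self, n):
        val = 0
        while n > 0:
            if self.left == 0:
                self._refill()
            take = min(n, self.left)
            shift = self.left - take
            val = (val << take) | ((self.cache >> shift) & ((1 << take) - 1))
            self.left -= take
            n -= take
        return val

    def get_bit(self):
        return self.get_bits(1)
```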

At least one of the coding methods is based on DCT and works in YUV420 colourspace:

 ac_quant = get_bits(8); // DC quantisers are always 32
 for (all macroblocks) {
    for (i = 0; i < 6; i++)
      block_coded[i] = get_bit();
    for (i = 0; i < 6; i++) {
      if (!block_coded[i]) continue;
      idx = 0;
      while (idx < 64) {
        val = get_code();
        if (val != ESCAPE) {
          unpack val into last, value and skip
          if (get_bit())
            value = -value;
        } else {
          last = get_bit();
          skip = get_bits(6);
          value = get_bits(8); // signed value
        }
        blk[idx] = value;
        idx += skip;
        if (last)
          break;
      }
      unquantise block (with separate quantiser for DC)
      IDCT();
      put_block_clamped();
    }
 }

Interframe decoding

Interframes are coded as larger tiles (usually 32x32), with the tile information stored as a quadtree with the following parameters: present children, motion vector and bias value. The decoding process consists of reading the quadtree information, restoring the motion vector and copying the block from the referenced position, optionally adding the bias value (if it's non-zero). There seem to be four ways to code the information, but only variant 1 has been observed in practice (variant 2 may be present in QuickTime):

  • variant 0 — uncoded trees
  • variant 1 — coded trees with context-dependent static codebooks
  • variant 2 — the same but with fewer levels possible
  • variant 3 — single codebook for all trees

Codebook description

For variants 1 and 2 the codebook is stored in an external resource, usually named HUFF, with data in little-endian order. The first 32-bit word is zero, the next 32-bit word is the number of tables, and then the table data follows, each table starting with this header: 8 bits — ID, 32 bits — number of elements.

Each table first stores one byte of code length per element, then one 16-bit code value per element, and then one 8- or 16-bit symbol value per element (8-bit symbols are used only for the flags table). For 16-bit symbols the largest signed value is an escape meaning that 16 more bits have to be read to obtain the actual value.
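A sketch of a parser for such a resource, assuming little-endian data throughout and that the flags table can be recognised by its ID (0x53, per the list below):

```python
import struct

def parse_huff_resource(data):
    # Parse a HUFF resource: 32-bit zero, 32-bit table count, then per
    # table an 8-bit ID, a 32-bit element count, and three arrays of
    # code lengths, code values and symbols.
    _zero, num_tables = struct.unpack_from('<II', data, 0)
    pos = 8
    tables = []
    for _ in range(num_tables):
        tid = data[pos]
        (n,) = struct.unpack_from('<I', data, pos + 1)
        pos += 5
        lengths = list(data[pos:pos + n]); pos += n
        codes = list(struct.unpack_from('<%dH' % n, data, pos)); pos += 2 * n
        if tid == 0x53:  # child-flags table has 8-bit symbols (assumption)
            syms = list(data[pos:pos + n]); pos += n
        else:            # everything else has 16-bit symbols
            syms = list(struct.unpack_from('<%dH' % n, data, pos)); pos += 2 * n
        tables.append((tid, lengths, codes, syms))
    return tables
```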

Known IDs:

  • 0x58 — motion vector data (pair of bytes for motion vectors)
  • 0x53 — child flags (stored as the range 1-16 in the tables, but the range 0-15 is used everywhere instead)
  • 0x51 — bias value

The tables are stored sequentially with the MV, bias, flags order kept within each level (some tables may be absent for a given level).

Version 1 has this order:

  • Y level 0 — MV, flags
  • Y level 1 — MV, bias, flags
  • Y level 2 — MV, bias, flags
  • Y level 3 — MV, bias
  • U level 0 — flags
  • U level 1 — MV, bias, flags
  • U level 2 — MV, bias
  • V level 0 — flags
  • V level 1 — MV, bias, flags
  • V level 2 — MV, bias

Version 2 has this order:

  • Y level 0 — MV, flags
  • Y level 1 — MV, bias, flags
  • Y level 2 — MV, bias
  • U level 0 — flags
  • U level 1 — MV, bias
  • V level 0 — flags
  • V level 1 — MV, bias

Decoding process

The following should be true for block versions 0-5; version 6 uses a different coding process and still needs to be figured out.
 for each tile {
   if (get_bit()) {
     // empty tile
     restore motion vector
     restore block
   } else {
     read quadtree for luma
     restore luma motion vector
     restore luma block
     read quadtrees for chroma
     use scaled down luma MV as base for chroma blocks and restore them
   }
 }

The quadtree is decoded recursively: first the elements are decoded in the order flags, MV, bias (if the codebook for the current context is not present, the value is zero); then, if flags are present, the corresponding quadtree children are decoded, with each child's own children decoded before its next sibling (i.e. depth-first).
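The recursion can be sketched like this; read_value stands in for the real codebook decoder, and the child iteration order is an assumption:

```python
def decode_quadtree(read_value, level, max_level):
    # Depth-first quadtree decode: read flags, MV and bias for the
    # node, then recurse into each flagged child before moving on to
    # the next sibling. read_value(kind, level) is a stand-in for the
    # per-context codebook decoder (returning zero when the codebook
    # for the current context is absent).
    node = {
        'flags': read_value('flags', level) if level < max_level else 0,
        'mv':    read_value('mv', level),
        'bias':  read_value('bias', level),
        'children': {},
    }
    for bit in (1, 2, 4, 8):  # child order is an assumption here
        if node['flags'] & bit:
            node['children'][bit] = decode_quadtree(read_value, level + 1, max_level)
    return node
```

Fed with the values from the worked example further below (flags 0xC at the root, etc.), this reproduces the same tree shape.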

Quadtree children correspond to the following flags:

  1 | 4
 ---+---
  2 | 8

For top-level luma blocks there is motion vector prediction: the left neighbour is used for the top row, the top neighbour for the first and last blocks of a row, and the median of the left, top and top-right neighbours for the remaining blocks. If the predicted motion vector makes the block reference an area outside the frame, it is clipped before use.
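A sketch of that prediction rule, assuming already-decoded vectors are kept in a (row, col) -> (x, y) map:

```python
def predict_mv(mvs, row, col, cols):
    # MV prediction for top-level luma blocks: left neighbour on the
    # top row, top neighbour for the first and last blocks of other
    # rows, otherwise the component-wise median of the left, top and
    # top-right neighbours.
    if row == 0:
        return mvs.get((row, col - 1), (0, 0))
    if col == 0 or col == cols - 1:
        return mvs[(row - 1, col)]
    cand = [mvs[(row, col - 1)], mvs[(row - 1, col)], mvs[(row - 1, col + 1)]]
    med = lambda vals: sorted(vals)[1]
    return (med([c[0] for c in cand]), med([c[1] for c in cand]))
```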

An example:

  • for luma level 0 (root) we decode flags = 0xC and mv = 0,3 - that means we have two right children
    • we decode top-right child as level 1 and get flags = 0x1, mv = 0,2 and bias = 4
      • we decode top-left child of that as level 2 and get flags = 0x0, mv = 0,5 and bias = 0
    • we decode bottom-right child as level 1 and get flags 0x0, mv = 0,-1 and bias = 0

Now we get MV for the whole luma block — let it be 42,42. We can restore the block now:

  • luma level 0 is split so we need to operate on subblocks
    • blocks 1 and 2 are not coded so we apply parent MV+prediction (0,3 + 42,42 = 42,45)
    • block 4 is split — descending
      • subblock 1 is coded and has no children, we apply its MV+prediction (0,5 + 42,42 = 42,47)
      • subblocks 2, 4 and 8 are not coded, we apply parent MV+prediction (0,2 + 42,42 = 42,44) and bias 4
    • block 8 has no children, we apply its MV+prediction (0,-1 + 42,42 = 42,41)

The final luma block MV is 42,42 + 0,3 = 42,45 (prediction + top-level MV), so 21,22 is used as the prediction MV for the chroma blocks.
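The chroma scaling in this example is consistent with a plain halving of the final luma MV, though the rounding behaviour for negative components is an assumption:

```python
def chroma_mv(luma_mv):
    # Halve the final luma MV for 4:2:0 chroma. The worked example
    # (42,45 -> 21,22) matches an arithmetic right shift; rounding of
    # negative components is an assumption of this sketch.
    return (luma_mv[0] >> 1, luma_mv[1] >> 1)
```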