RealVideo 6

From MultimediaWiki
Revision as of 11:16, 3 December 2018 by Kostya (talk | contribs) (Fill the rest of RV6 info)
  • FourCC: rv60
  • Company: Real

RealVideo 6 is a combination of coding technologies from RealVideo 4 and HEVC. The codec is designed for speed and simplicity and thus has only a limited set of features: I/P/B frames with no complex reference lists, 64x64 coding blocks that can be split down to 8x8 blocks, and variable-length codes.

Frame structure

A frame is composed of rows of 64x64 blocks (slices), with each slice coded independently (relying on top neighbours only for reconstruction). A frame consists of a frame header, slice sizes and coded slice data.

Frame header

  • sync (2 bits) - always 3
  • profile (2 bits) - always 0
  • unknown (4 bits)
  • frame type (2 bits) - I, P, B or some special frame type (probably preview frame)
  • quantiser (6 bits)
  • marker (1 bit) - always 0
  • toolset (2 bits) - always 0
  • osvquant (2 bits) - quantiser selection mode (for coefficient coding)
  • unknown flag (1 bit)
  • unknown field (2 bits)
  • picture number (24 bits)
  • width code (11 bits) - width is (code + 1) * 4
  • height code (11 bits) - height is code * 4
  • unknown flag (1 bit)

These two fields are present only in inter frames:

    • some flag (1 bit) - if set then three unknown flags follow it
    • use two forward references (1 bit)
  • luma QP difference? (1-2 bits) - coded as 0, 10 or 11
  • chroma QP difference (the same coding scheme) - should be 0
  • QP offset type (also 012 coding) - defines how QP difference for each slice is coded
  • deblock flag (1 bit)
  • do not deblock chroma flag (1 bit, present only when deblock flag is set)
  • optional message present flag (1 bit)

Optional message format

  • message chunks length - 2 bits
  • data

A message can be coded in 0-3 chunks that have lengths of 2, 4 and 16 bytes.

Slice sizes

There are always (height + 63) >> 6 slices, so the number of slices is calculated from the transmitted frame height.

Sizes are coded as an array of differences from the previous size.

  • size difference length minus one - 5 bits
  • array of size change flags (1 bit per flag, 0 - add to the previous size, 1 - subtract from the previous size)
  • array of size differences (the first size is equal to the first difference; each difference takes the number of bits signalled above)
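
The slice count and the size reconstruction can be sketched as follows. This is only an illustration: the function names are made up, the flags and differences are assumed to be already read from the bitstream, and it assumes that the first size is taken directly from the first difference with its flag ignored.

```rust
/// Number of slices (rows of 64x64 blocks) for a given frame height.
fn slice_count(height: usize) -> usize {
    (height + 63) >> 6
}

/// Rebuilds slice sizes from the size change flags and size differences.
/// `subtract[i]` is the i-th flag (true = subtract from the previous size).
/// Assumption: the first size is `diffs[0]` and its flag is ignored.
fn reconstruct_slice_sizes(subtract: &[bool], diffs: &[u32]) -> Option<Vec<u32>> {
    if subtract.len() != diffs.len() {
        return None;
    }
    let mut sizes = Vec::with_capacity(diffs.len());
    let mut prev = 0u32;
    for (i, (&sub, &diff)) in subtract.iter().zip(diffs.iter()).enumerate() {
        let size = if i == 0 {
            diff
        } else if sub {
            prev.checked_sub(diff)? // a negative size means damaged data
        } else {
            prev.checked_add(diff)?
        };
        sizes.push(size);
        prev = size;
    }
    Some(sizes)
}
```

For a 720-line frame `slice_count` gives 12 slices, and differences 100, 20, 50 with flags add, add, subtract give sizes 100, 120, 70.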

Slice data

Slice data consists of QP difference for the whole slice and data for each individual 64x64 macroblock.

QP difference is read depending on the mode in frame header (QP offset type):

  • 0 - no difference
  • 1 - 0 = 0, 10 = +1, 11 = -1
  • 2 - 0 = 0, 100 = +1, 101 = +2, 110 = -1, 111 = -2
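
A minimal sketch of reading the slice QP difference for the three modes listed above. The `read_bit` callback is a hypothetical stand-in for whatever bit reader the decoder uses; returning `None` for other mode values is an assumption of this sketch.

```rust
/// Reads the per-slice QP difference.
/// `offset_type` is the "QP offset type" field from the frame header,
/// `read_bit` is a hypothetical callback returning the next bit.
fn read_slice_qp_diff(offset_type: u8, read_bit: &mut impl FnMut() -> bool) -> Option<i8> {
    match offset_type {
        // mode 0: nothing is coded
        0 => Some(0),
        // mode 1: 0 = 0, 10 = +1, 11 = -1
        1 => {
            if !read_bit() {
                Some(0)
            } else if !read_bit() {
                Some(1)
            } else {
                Some(-1)
            }
        }
        // mode 2: 0 = 0, then a sign bit and a magnitude bit
        2 => {
            if !read_bit() {
                return Some(0);
            }
            let negative = read_bit();
            let magnitude = if read_bit() { 2 } else { 1 };
            Some(if negative { -magnitude } else { magnitude })
        }
        // other values are not described in the text
        _ => None,
    }
}
```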

Coded block header

Coded blocks are coded recursively in the same way as in HEVC. If part of the block lies outside the frame, the block is split and every part that is at least partly inside the frame is processed recursively. Otherwise a bit is read to decide whether the block should be split, unless the block size is 8x8, in which case it is not split any further.
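
The split decision can be sketched as a recursive walk that collects the resulting leaf blocks. The function name, the `read_bit` callback and the sub-block visiting order (top left, top right, bottom left, bottom right) are assumptions of this sketch; it also assumes that an 8x8 block is never split even when it crosses the frame edge.

```rust
/// Walks the quadtree of one 64x64 block and collects leaves as (x, y, size).
/// `read_bit` is a hypothetical bit-reading callback; `width` and `height`
/// are the frame dimensions used for the inside/outside test.
fn collect_coded_blocks(
    x: usize,
    y: usize,
    size: usize,
    width: usize,
    height: usize,
    read_bit: &mut impl FnMut() -> bool,
    leaves: &mut Vec<(usize, usize, usize)>,
) {
    if x >= width || y >= height {
        return; // completely outside the frame, nothing is coded
    }
    let crosses_edge = x + size > width || y + size > height;
    let split = if size <= 8 {
        false // 8x8 blocks are not split any further
    } else if crosses_edge {
        true // forced split, no bit is read
    } else {
        read_bit()
    };
    if split {
        let half = size / 2;
        for &(dx, dy) in &[(0, 0), (half, 0), (0, half), (half, half)] {
            collect_coded_blocks(x + dx, y + dy, half, width, height, read_bit, leaves);
        }
    } else {
        leaves.push((x, y, size));
    }
}
```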

  • block type (intra for I-frame, otherwise 2 bits) - types are intra, inter with motion vector, skip block and inter block without coded motion vector

If it is an 8x8 intra block, another bit is read that signals whether the block is coded as a split one (i.e. four 4x4 subblocks with individual intra prediction and transform for luma instead of a single 8x8 block). Intra prediction direction (four entries for a split 8x8 block, one otherwise) follows.

An inter block with coded MV has the prediction unit type (2 bits for an 8x8 block, 3 bits otherwise) and motion vector data (1-4 motion vectors depending on the prediction unit type).

An inter block without MV or a skip block has just the skip candidate index coded as 0 = 0, 10 = 1, 110 = 2, 111 = 3.
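
The skip candidate code is a unary code truncated at three bits. A minimal sketch, with `read_bit` as a hypothetical bit-reading callback:

```rust
/// Reads the skip candidate index: 0 = 0, 10 = 1, 110 = 2, 111 = 3.
/// `read_bit` is a hypothetical callback returning the next bit.
fn read_skip_candidate(read_bit: &mut impl FnMut() -> bool) -> u8 {
    let mut idx = 0;
    // count leading one bits, stopping after at most three of them
    while idx < 3 && read_bit() {
        idx += 1;
    }
    idx
}
```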

Prediction unit types:

  • full
  • horizontal split into equal halves
  • vertical split into equal halves
  • split into quarters
  • horizontal split into 1/4 and 3/4
  • vertical split into 1/4 and 3/4
  • horizontal split into 3/4 and 1/4
  • vertical split into 3/4 and 1/4
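
The list above can be modelled as an enumeration. The sketch below assumes that the coded value is simply the position in the list (which would fit the 2-bit code of 8x8 blocks covering the first four types); the type and method names are invented for illustration.

```rust
/// Prediction unit types in the order they are listed in the text.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PuType {
    Full,
    HorHalves,
    VerHalves,
    Quarters,
    HorQuarterFirst, // horizontal split into 1/4 and 3/4
    VerQuarterFirst, // vertical split into 1/4 and 3/4
    HorQuarterLast,  // horizontal split into 3/4 and 1/4
    VerQuarterLast,  // vertical split into 3/4 and 1/4
}

impl PuType {
    /// Assumption: the coded value is the position in the list above.
    fn from_index(idx: u8) -> Option<Self> {
        Some(match idx {
            0 => PuType::Full,
            1 => PuType::HorHalves,
            2 => PuType::VerHalves,
            3 => PuType::Quarters,
            4 => PuType::HorQuarterFirst,
            5 => PuType::VerQuarterFirst,
            6 => PuType::HorQuarterLast,
            7 => PuType::VerQuarterLast,
            _ => return None,
        })
    }

    /// Number of motion vectors: one per partition.
    fn num_mvs(self) -> usize {
        match self {
            PuType::Full => 1,
            PuType::Quarters => 4,
            _ => 2,
        }
    }
}

/// Width of the prediction unit type field for a given block size.
fn pu_type_bits(block_size: usize) -> u32 {
    if block_size == 8 { 2 } else { 3 }
}
```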


Intra prediction direction coding

For blocks larger than 16x16 intra prediction is coded with a single bit: 0 - plane mode, 1 - DC mode.

For the rest of block sizes it uses:

  • coding mode (1 bit)
    • (if coding mode = 1) shortlist index (012 coding)
    • (if coding mode = 0) raw mode (5 bits)

Intra mode is reconstructed as in HEVC. First a short list of unique modes is formed from the top, left and top left neighbours, with modes from the default list (0, 1, 10, 26, 18, 2) added if needed. Then either the item at the provided index is taken or, for a raw mode, the list is sorted in ascending order and the raw mode is increased by one for each list element that is less than or equal to it.

For a split 8x8 block, prediction neighbours are always taken from the neighbouring blocks instead of the current one.
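
A sketch of this reconstruction, assuming a short list of three entries (which matches the three values of the 012-coded index) and neighbour modes passed in the order top, left, top left with `None` for unavailable ones. All names here are illustrative.

```rust
const DEFAULT_MODES: [u8; 6] = [0, 1, 10, 26, 18, 2];

/// How the mode was signalled: coding mode 1 (index) or 0 (raw 5-bit value).
enum CodedIntraMode {
    ShortlistIndex(usize),
    Raw(u8),
}

/// Builds the list of three unique candidate modes.
fn build_shortlist(neighbours: [Option<u8>; 3]) -> [u8; 3] {
    let mut list: Vec<u8> = Vec::with_capacity(3);
    let candidates = neighbours
        .iter()
        .flatten()
        .copied()
        .chain(DEFAULT_MODES.iter().copied());
    for mode in candidates {
        if list.len() == 3 {
            break;
        }
        if !list.contains(&mode) {
            list.push(mode);
        }
    }
    // the six distinct default modes guarantee three entries
    [list[0], list[1], list[2]]
}

fn reconstruct_intra_mode(neighbours: [Option<u8>; 3], coded: CodedIntraMode) -> Option<u8> {
    let mut list = build_shortlist(neighbours);
    match coded {
        CodedIntraMode::ShortlistIndex(idx) => list.get(idx).copied(),
        CodedIntraMode::Raw(raw) => {
            list.sort_unstable();
            let mut mode = raw;
            // skip over the modes that are already covered by the short list
            for &cand in list.iter() {
                if cand <= mode {
                    mode += 1;
                }
            }
            Some(mode)
        }
    }
}
```

With neighbours 10, 10 and an unavailable third one the short list becomes 10, 0, 1, so raw value 0 maps to mode 2 and raw value 8 maps to mode 11.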

Chroma is predicted using the first (or only for non-split blocks) luma prediction mode.

Motion vector coding

For each motion vector its reference is read first:

  • for P-frames without two references flag set the reference is always the first one
  • for P-frames with that flag set there is a bit telling which one of two references to use
  • for B-frames:
    • if block size is 8 or the block is divided or there's a bit set - read a bit to decide whether it's a forward or backward reference
    • otherwise it is a bidirectionally predicted block and two MV differences are present

Motion vector differences are coded with a signed Elias gamma code (exactly like in RealVideo 3 and 4).

Determining which references should be used for P-frames is left as an exercise to the reader.

Coefficients coding

Coefficient data is present for all block types except skip.

Depending on block size and type one of three transform types is selected:

  • for 8x8 blocks with some split mode (including intra) it is 4x4 transform type
  • for full type 8x8 blocks 8x8 transform is employed
  • for 16x16 inter blocks with split mode 4x4 transform type is employed
  • for all other block types 16x16 transform type is employed
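
The selection rules above can be written as a small function. In this sketch `is_split` is assumed to mean the split flag for 8x8 intra blocks and any prediction unit type other than full for inter blocks; the names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum TransformType {
    T4x4,
    T8x8,
    T16x16,
}

/// Picks the transform type from block size, intra/inter type and split state.
fn select_transform(block_size: usize, is_intra: bool, is_split: bool) -> TransformType {
    match (block_size, is_split) {
        // 8x8 blocks with some split mode (including intra)
        (8, true) => TransformType::T4x4,
        // full 8x8 blocks
        (8, false) => TransformType::T8x8,
        // 16x16 inter blocks with split mode
        (16, true) if !is_intra => TransformType::T4x4,
        // everything else
        _ => TransformType::T16x16,
    }
}
```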

For 8x8/16x16 coded block patterns and for coefficients one of multiple static codebooks is selected depending on intra/inter mode and QP+osvquant.

  • osvquant = 0 - use QP
  • osvquant = 1 - for QP <= 25 use QP + 5, otherwise QP
  • osvquant = 2 - for QP <= 18 use QP + 10, for 19 <= QP <= 25 use QP + 5, otherwise use QP
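
The QP value used for codebook selection can be sketched like this. The function name is made up, and treating any other osvquant value like 0 is an assumption since the text lists only the values 0 to 2.

```rust
/// QP used for selecting the static codebooks.
fn codebook_qp(qp: u8, osvquant: u8) -> u8 {
    match osvquant {
        1 if qp <= 25 => qp + 5,
        2 if qp <= 18 => qp + 10,
        2 if qp <= 25 => qp + 5,
        // osvquant = 0, large QP values and (by assumption) anything else
        _ => qp,
    }
}
```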

4x4 coding mode for 8x8 blocks:

  • cbp8x8 coded with subset 0 for intra or 2 for inter
  • four 4x4 luma blocks (optional, signalled by cbp8x8)
  • two 4x4 chroma blocks (optional, signalled by cbp8x8)

4x4 coding mode for 16x16 blocks:

  • data present flag (if 0 nothing else is coded)
  • cbp16x16 coded with subset 0 for intra or 2 for inter
  • sixteen 4x4 luma blocks (optional, signalled by cbp16x16)
  • two sets of four 4x4 chroma blocks (optional, signalled by cbp16x16)

8x8 coding mode:

  • cbp8x8 coded with subset 1 for intra or 3 for inter
  • single 8x8 luma block made from four 4x4 subblocks (subblocks presence is signalled by cbp8x8)
  • two 4x4 chroma blocks (optional, signalled by cbp8x8)

16x16 coding mode splits block into 16x16 tiles and has the following data:

  • tiles coded flags (1 bit for 16x16, 4 for 32x32)
  • for each coded tile:
    • cbp16x16 coded with subset 1 for intra or 3 for inter
    • single 16x16 luma block made from sixteen 4x4 subblocks (subblocks presence is signalled by cbp16x16)
    • two 8x8 chroma blocks made from four 4x4 subblocks

Actual 4x4 coefficient decoding is performed in the same way as in RealVideo 3 and 4.

Reconstruction

Transforms

The 4x4 transform is the same as in RealVideo 4 (see RealVideo 4#ITransform4x4).

8x8 transform is defined by the following matrix:

    37,  37,  37,  37,  37,  37,  37,  37,
    51,  43,  29,  10, -10, -29, -43, -51,
    48,  20, -20, -48, -48, -20,  20,  48,
    43, -10, -51, -29,  29,  51,  10, -43,
    37, -37, -37,  37,  37, -37, -37,  37,
    29, -51,  10,  43, -43, -10,  51, -29,
    20, -48,  48, -20, -20,  48, -48,  20,
    10, -29,  43, -51,  51, -43,  29, -10

16x16 transform is defined by the following matrix:

    26,  26,  26,  26,  26,  26,  26,  26,  26,  26,  26,  26,  26,  26,  26,  26,
    37,  35,  32,  28,  23,  17,  11,   4,  -4, -11, -17, -23, -28, -32, -35, -37,
    36,  31,  20,   7,  -7, -20, -31, -36, -36, -31, -20,  -7,   7,  20,  31,  36,
    35,  23,   4, -17, -32, -37, -28, -11,  11,  28,  37,  32,  17,  -4, -23, -35,
    34,  14, -14, -34, -34, -14,  14,  34,  34,  14, -14, -34, -34, -14,  14,  34,
    32,   4, -28, -35, -11,  23,  37,  17, -17, -37, -23,  11,  35,  28,  -4, -32,
    31,  -7, -36, -20,  20,  36,   7, -31, -31,   7,  36,  20, -20, -36,  -7,  31,
    28, -17, -35,   4,  37,  11, -32, -23,  23,  32, -11, -37,  -4,  35,  17, -28,
    26, -26, -26,  26,  26, -26, -26,  26,  26, -26, -26,  26,  26, -26, -26,  26,
    23, -32, -11,  37,  -4, -35,  17,  28, -28, -17,  35,   4, -37,  11,  32, -23,
    20, -36,   7,  31, -31,  -7,  36, -20, -20,  36,  -7, -31,  31,   7, -36,  20,
    17, -37,  23,  11, -35,  28,   4, -32,  32,  -4, -28,  35, -11, -23,  37, -17,
    14, -34,  34, -14, -14,  34, -34,  14,  14, -34,  34, -14, -14,  34, -34,  14,
    11, -28,  37, -32,  17,   4, -23,  35, -35,  23,  -4, -17,  32, -37,  28, -11,
     7, -20,  31, -36,  36, -31,  20,  -7,  -7,  20, -31,  36, -36,  31, -20,   7,
     4, -11,  17, -23,  28, -32,  35, -37,  37, -35,  32, -28,  23, -17,  11,  -4

Transforms are done on columns first and then on rows, using the same rounded shift by 7 in both stages (for both 8x8 and 16x16).
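
A rough sketch of the 8x8 inverse transform under two assumptions not stated in the text: each matrix row is one basis function (so row k is weighted by coefficient k), and "rounded shift by 7" means (x + 64) >> 7. The 16x16 case would look the same with the larger matrix.

```rust
const RV60_TX8: [[i32; 8]; 8] = [
    [37,  37,  37,  37,  37,  37,  37,  37],
    [51,  43,  29,  10, -10, -29, -43, -51],
    [48,  20, -20, -48, -48, -20,  20,  48],
    [43, -10, -51, -29,  29,  51,  10, -43],
    [37, -37, -37,  37,  37, -37, -37,  37],
    [29, -51,  10,  43, -43, -10,  51, -29],
    [20, -48,  48, -20, -20,  48, -48,  20],
    [10, -29,  43, -51,  51, -43,  29, -10],
];

/// Inverse 8x8 transform; `coefs[row][col]` holds dequantised coefficients.
fn inverse_transform_8x8(coefs: &[[i32; 8]; 8]) -> [[i32; 8]; 8] {
    // first stage: transform every column
    let mut tmp = [[0i32; 8]; 8];
    for x in 0..8 {
        for y in 0..8 {
            let sum: i32 = (0..8).map(|k| RV60_TX8[k][y] * coefs[k][x]).sum();
            tmp[y][x] = (sum + 64) >> 7;
        }
    }
    // second stage: transform every row of the intermediate result
    let mut out = [[0i32; 8]; 8];
    for y in 0..8 {
        for x in 0..8 {
            let sum: i32 = (0..8).map(|k| RV60_TX8[k][x] * tmp[y][k]).sum();
            out[y][x] = (sum + 64) >> 7;
        }
    }
    out
}
```

Under these assumptions a block with only the DC coefficient set to 128 produces a flat block of value 11.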

Intra prediction

This is done essentially like in H.265 though plane mode prediction might be a bit different.

Motion compensation

Motion compensation uses the same quarter-pel interpolation as RealVideo 4. Motion vector prediction is done from the neighbouring top, left and top right blocks. For three candidates a median prediction is used, for two candidates it is (A + B) >> 1.

Blocks without a coded motion vector form a list of unique motion vectors from the neighbours (i.e. a motion vector that is already in the list is not added a second time), pad it to the required length with zero MVs and take the Nth motion vector from the list, where N is the coded skip candidate number.
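
The prediction rule can be sketched as below. The `Mv` type and function names are illustrative, and the handling of one candidate (use it as is) or none (zero vector) is an assumption because the text covers only the two- and three-candidate cases.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Mv {
    x: i16,
    y: i16,
}

/// Median of three values.
fn median3(a: i16, b: i16, c: i16) -> i16 {
    a.max(b).min(a.min(b).max(c))
}

/// (a + b) >> 1 computed in 32 bits to avoid overflow.
fn avg(a: i16, b: i16) -> i16 {
    ((i32::from(a) + i32::from(b)) >> 1) as i16
}

/// Predicts a motion vector from the available neighbour candidates.
fn predict_mv(candidates: &[Mv]) -> Mv {
    match candidates {
        [a, b, c] => Mv { x: median3(a.x, b.x, c.x), y: median3(a.y, b.y, c.y) },
        [a, b] => Mv { x: avg(a.x, b.x), y: avg(a.y, b.y) },
        [a] => *a,              // assumption: a single candidate is used directly
        _ => Mv::default(),     // assumption: no candidates means zero MV
    }
}
```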

Skip candidate selection order:

  • top
  • left
  • top right
  • left down
  • just above left down block
  • just left to the top right block

For B-frames the averaging is done with (A + B) >> 1 formula.

Deblocking

Deblocking is performed recursively on each 64x64 coded block with strength set to 2 for edges and 1 for coded blocks. Vertical edges are deblocked first then horizontal ones.

Sample deblocking code:

   for (i = 0; i < 4; i++) {
       diff_q1q0[i] = dst[i * stride - 2] - dst[i * stride - 1];
       diff_p1p0[i] = dst[i * stride + 1] - dst[i * stride];
   }
   // for chroma it's just first diff_q1q0/diff_p1p0 < lim2
   str_p = (diff_q1q0[0] + diff_q1q0[1] + diff_q1q0[2] + diff_q1q0[3]) < lim2 ? 3 : 1;
   str_q = (diff_p1p0[0] + diff_p1p0[1] + diff_p1p0[2] + diff_p1p0[3]) < lim2 ? 3 : 1;
   if (str_p + str_q > 2) {
       msum = (mode1 + mode2 + str_q + str_p) >> 1;
       if ((str_q == 1) || (str_p == 1)) {
           maxprod = 512;
           weak = 1;
       } else {
           maxprod = 384;
           weak = 0;
       }
       for (y = 0; y < 4; y++) {
           diff_p0q0 = dst[0] - dst[-1];
           if ((diff_p0q0 != 0) && (lim1 * ABS(diff_p0q0) < maxprod)) {
               diff_q1q2 = dst[-2] - dst[-3];
               diff_p1p2 = dst[1] - dst[2];
               if (weak) {
                   delta = clip_symm((diff_p0q0 + 1) >> 1, msum >> 1);
               } else {
                   diff_strg = (dst[-2] - dst[1] + 4 * diff_p0q0 + 4) >> 3;
                   delta = clip_symm(diff_strg, msum);
               }
               dst[-1] = clip8(dst[-1] + delta);
               dst[ 0] = clip8(dst[ 0] - delta);
               if ((str_q != 1) && (ABS(diff_q1q2) < (lim1 >> 2))) {
                   diff = (diff_q1q0[y] + diff_q1q2 - delta) >> 1;
                   delta_q1 = weak ? clip_symm(diff, mode1 >> 1) : clip_symm(diff, mode1);
                   dst[-2] = clip8(dst[-2] - delta_q1);
               }
               if ((str_p != 1) && (ABS(diff_p1p2) < (lim1 >> 2))) {
                   diff = (diff_p1p0[y] + diff_p1p2 + delta) >> 1;
                   delta_p1 = weak ? clip_symm(diff, mode2 >> 1) : clip_symm(diff, mode2);
                   dst[1] = clip8(dst[1] - delta_p1);
               }
           }
           dst += stride;
       }
   }

Modes and limits are selected based on quantiser (first array index) and filter strength. Mode is determined by filter strength: 0 -> 0, 1 -> element 0, 2 -> element 1 (elements are counted from zero). lim1 is element 2 and lim2 is element 3 multiplied by four.

 const RV60_DEB_LIMITS: [[u8; 4]; 32] = [
   [ 0, 0, 128,  0 ], [ 0, 0, 128,  0 ], [ 0, 0, 128,  0 ], [ 0, 0, 128,  0 ],
   [ 0, 0, 128,  0 ], [ 0, 0, 128,  0 ], [ 0, 0, 128,  0 ], [ 0, 0, 128,  0 ],
   [ 0, 0, 128,  3 ], [ 0, 1, 128,  3 ], [ 0, 1, 122,  3 ], [ 1, 1,  96,  4 ],
   [ 1, 1,  75,  4 ], [ 1, 1,  59,  4 ], [ 1, 1,  47,  6 ], [ 1, 1,  37,  6 ],
   [ 1, 1,  29,  6 ], [ 1, 2,  23,  7 ], [ 1, 2,  18,  8 ], [ 1, 2,  15,  8 ],
   [ 1, 2,  13,  9 ], [ 2, 3,  11,  9 ], [ 2, 3,  10, 10 ], [ 2, 3,   9, 10 ],
   [ 2, 4,   8, 11 ], [ 3, 4,   7, 11 ], [ 3, 5,   6, 12 ], [ 3, 5,   5, 13 ],
   [ 3, 5,   4, 14 ], [ 4, 7,   3, 15 ], [ 5, 8,   2, 16 ], [ 5, 9,   1, 17 ]
 ];

An example: if we have QP=19 then we use sub-array [ 1, 2, 15, 8 ]. For filter strength 2 we select the element at index 1 (= 2) as the mode and use lim1 = 15 and lim2 = 8 * 4 = 32.
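
The lookup can be sketched as a small helper around the table. The struct and function names are invented for illustration, and returning `None` for a quantiser outside the 32 table rows or for an unknown strength is an assumption of this sketch.

```rust
const RV60_DEB_LIMITS: [[u8; 4]; 32] = [
    [0, 0, 128, 0], [0, 0, 128, 0], [0, 0, 128, 0], [0, 0, 128, 0],
    [0, 0, 128, 0], [0, 0, 128, 0], [0, 0, 128, 0], [0, 0, 128, 0],
    [0, 0, 128, 3], [0, 1, 128, 3], [0, 1, 122, 3], [1, 1, 96, 4],
    [1, 1, 75, 4], [1, 1, 59, 4], [1, 1, 47, 6], [1, 1, 37, 6],
    [1, 1, 29, 6], [1, 2, 23, 7], [1, 2, 18, 8], [1, 2, 15, 8],
    [1, 2, 13, 9], [2, 3, 11, 9], [2, 3, 10, 10], [2, 3, 9, 10],
    [2, 4, 8, 11], [3, 4, 7, 11], [3, 5, 6, 12], [3, 5, 5, 13],
    [3, 5, 4, 14], [4, 7, 3, 15], [5, 8, 2, 16], [5, 9, 1, 17],
];

#[derive(Debug, PartialEq)]
struct DeblockParams {
    mode: u8,
    lim1: u8,
    lim2: u16,
}

/// Looks up mode and limits for a quantiser and filter strength (0, 1 or 2).
fn deblock_params(qp: usize, strength: u8) -> Option<DeblockParams> {
    let row = RV60_DEB_LIMITS.get(qp)?;
    let mode = match strength {
        0 => 0,
        1 => row[0],
        2 => row[1],
        _ => return None,
    };
    Some(DeblockParams {
        mode,
        lim1: row[2],
        lim2: u16::from(row[3]) * 4,
    })
}
```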