SpeedHQ

From MultimediaWiki

SpeedHQ is a simple intermediate codec that supports several chroma subsampling modes and optional alpha. It is generally very similar to intra-only MPEG-2. Several variants exist:

  • SHQ0/SHQ1 are 4:2:0 without/with alpha.
  • SHQ2/SHQ3 are 4:2:2 without/with alpha.
  • SHQ4/SHQ5 are 4:4:4 without/with alpha.
  • SHQ7 is 4:2:2 with alpha coded the same way as luma.
  • SHQ9 is the same for 4:4:4.
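The variant list can be captured as a small lookup table. The following is only a sketch: the type and function names (`AlphaMode`, `ShqVariant`, `shq_lookup`) are made up for illustration, and the alpha assignment follows the Alpha section of this page, which describes SHQ1/3/5 and SHQ7/9 as the alpha-carrying variants.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative names only; none of these identifiers come from a real API. */
typedef enum {
    ALPHA_NONE,      /* no alpha plane */
    ALPHA_RUNLEVEL,  /* non-DCT run/level alpha (SHQ1/3/5) */
    ALPHA_DCT        /* alpha coded the same way as luma (SHQ7/9) */
} AlphaMode;

typedef struct {
    const char *tag;          /* variant name as used on this page */
    const char *subsampling;  /* chroma subsampling */
    AlphaMode alpha;
} ShqVariant;

static const ShqVariant shq_variants[] = {
    { "SHQ0", "4:2:0", ALPHA_NONE },
    { "SHQ1", "4:2:0", ALPHA_RUNLEVEL },
    { "SHQ2", "4:2:2", ALPHA_NONE },
    { "SHQ3", "4:2:2", ALPHA_RUNLEVEL },
    { "SHQ4", "4:4:4", ALPHA_NONE },
    { "SHQ5", "4:4:4", ALPHA_RUNLEVEL },
    { "SHQ7", "4:2:2", ALPHA_DCT },
    { "SHQ9", "4:4:4", ALPHA_DCT },
};

/* Returns the matching entry, or NULL for an unknown variant. */
static const ShqVariant *shq_lookup(const char *tag)
{
    size_t n = sizeof(shq_variants) / sizeof(shq_variants[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(shq_variants[i].tag, tag) == 0)
            return &shq_variants[i];
    return NULL;
}
```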

NewTek's NDI network codec uses SHQ2 and SHQ7.

A frame is always coded as two fields. Data is packed into 32-bit little-endian words and is read starting from the LSB.
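Reading 32-bit little-endian words from the LSB is equivalent to walking the bytes in order and consuming each byte from its least significant bit. A minimal sketch of such a reader follows; the `BitReader` type and function names are hypothetical, and assembling multi-bit fields with the first bit read as the lowest bit is an assumption about how fixed-width fields are laid out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader; not taken from any existing decoder. */
typedef struct {
    const uint8_t *buf;  /* input bytes */
    size_t size;         /* number of bytes in buf */
    size_t bitpos;       /* index of the next bit to read */
} BitReader;

/* Read one bit; reads past the end of the buffer return 0. */
static unsigned br_get_bit(BitReader *br)
{
    size_t byte = br->bitpos >> 3;
    unsigned bit = 0;
    if (byte < br->size)
        bit = (br->buf[byte] >> (br->bitpos & 7)) & 1u;
    br->bitpos++;
    return bit;
}

/* Read n bits (n <= 32); the first bit read becomes bit 0 of the result
 * (assumption, see the text above). */
static unsigned br_get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= br_get_bit(br) << i;
    return v;
}
```

For example, with the input bytes 0xA5 0x01, reading 1, 3 and 8 bits in turn yields 1, 2 and 0x1A.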

The first byte is the quality, and the next three bytes form an offset to the second field. (Exception: NDI can send a frame consisting of a single field, with the second field offset equal to 4, i.e., exactly overlapping with the first field.) The quantizer is 100 - quality.
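The four-byte frame header can be parsed as in the sketch below. The struct and function names are illustrative. Two points are assumptions rather than statements from this page: that the three offset bytes are little-endian (in line with the rest of the bitstream), and that the offset is counted from the start of the frame, which is what makes an offset of 4 coincide with the first field.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative container for the 4-byte frame header. */
typedef struct {
    int quality;                   /* first byte */
    int quantizer;                 /* 100 - quality */
    uint32_t second_field_offset;  /* bytes 1..3, assumed little-endian */
    int single_field;              /* 1 if both fields overlap (NDI case) */
} FrameHeader;

/* Returns 0 on success, -1 if the buffer is too short or the offset is
 * outside the frame. */
static int parse_frame_header(const uint8_t *buf, size_t size, FrameHeader *h)
{
    if (size < 4)
        return -1;
    h->quality = buf[0];
    h->quantizer = 100 - h->quality;
    h->second_field_offset = (uint32_t)buf[1]
                           | ((uint32_t)buf[2] << 8)
                           | ((uint32_t)buf[3] << 16);
    if (h->second_field_offset < 4 || h->second_field_offset >= size)
        return -1;
    h->single_field = (h->second_field_offset == 4);
    return 0;
}
```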

Field format

Each field is split into four slices that can be decoded independently. Each slice codes 16 lines, then skips 48 lines (coded by the other three slices), codes 16 more lines, and so on until the bottom of the field, which implicitly ends the slice. The first three bytes of each slice give the length of the slice in bytes (including the three length bytes). The next slice starts on the byte immediately after the end of the previous one, as given by the length (this might mean skipping 1–7 bits).
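The slice layout can be sketched as two small helpers: one mapping a field line to the slice that codes it, and one walking the four length-prefixed slices of a field. The function names are hypothetical, and the little-endian byte order of the 24-bit length is an assumption (the page does not state it explicitly).

```c
#include <stddef.h>
#include <stdint.h>

/* Which of the four slices codes a given line of the field:
 * lines 0..15 -> slice 0, 16..31 -> slice 1, ..., 64..79 -> slice 0 again. */
static int slice_for_line(int line)
{
    return (line / 16) % 4;
}

/* Locate the four slices inside one field's data. The 24-bit length is
 * assumed to be little-endian and includes its own three bytes.
 * Returns 0 on success, -1 if a length is inconsistent with the buffer. */
static int split_slices(const uint8_t *field, size_t size,
                        size_t offsets[4], size_t lengths[4])
{
    size_t pos = 0;
    for (int i = 0; i < 4; i++) {
        if (size - pos < 3)
            return -1;
        size_t len = (size_t)field[pos]
                   | ((size_t)field[pos + 1] << 8)
                   | ((size_t)field[pos + 2] << 16);
        if (len < 3 || len > size - pos)
            return -1;
        offsets[i] = pos;
        lengths[i] = len;
        pos += len;  /* next slice starts right after this one */
    }
    return 0;
}
```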

Blocks are 8x8 pixels, and macroblocks are 16x16 pixels. Luma is coded top-left, top-right, bottom-left, bottom-right. 4:4:4 is coded YYYY UV UV UV UV (chroma is top-left, bottom-left, top-right, bottom-right). 4:2:2 is coded YYYY UV UV (chroma is top, then bottom). 4:2:0 is coded YYYY UV.
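As a quick illustration of the block order, the helper below writes the sequence of non-alpha blocks in one macroblock as a string and returns the block count. The enum and function names are made up for this sketch, and alpha blocks are deliberately left out.

```c
#include <string.h>

/* Illustrative subsampling selector. */
typedef enum { SUB_420, SUB_422, SUB_444 } Subsampling;

/* Writes the coding order of the luma/chroma blocks of one macroblock into
 * out (at least 13 bytes) and returns the number of blocks. */
static int mb_block_order(Subsampling s, char out[13])
{
    /* Number of U/V block pairs per macroblock. */
    int pairs = (s == SUB_420) ? 1 : (s == SUB_422) ? 2 : 4;

    strcpy(out, "YYYY");          /* four luma blocks come first */
    for (int i = 0; i < pairs; i++)
        strcat(out, "UV");        /* then interleaved chroma pairs */
    return 4 + 2 * pairs;
}
```

This gives 6, 8 and 12 blocks per macroblock for 4:2:0, 4:2:2 and 4:4:4 respectively.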

Blocks are coded with static VLCs. DC coefficients are coded exactly as in MPEG-2 (same luma/chroma tables, same scheme for storing the coefficient and sign). DC prediction is taken from the previous block of the same component, and restarts at 1024 for each new macroblock row (not 128 as in MPEG-2). Curiously enough, one needs to _subtract_ the stored DC value from the prediction, not add it as usual.
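The inverted DC prediction can be sketched as follows. The names are hypothetical, and keeping one predictor per component (Y, Cb, Cr) is this sketch's reading of the prediction rule, modelled on MPEG-2.

```c
/* Illustrative DC predictor state: one predictor per component
 * (0 = Y, 1 = Cb, 2 = Cr). */
typedef struct {
    int last_dc[3];
} DCPredictor;

/* Called at the start of each new macroblock row. */
static void dc_reset(DCPredictor *p)
{
    for (int i = 0; i < 3; i++)
        p->last_dc[i] = 1024;  /* not 128 as in MPEG-2 */
}

/* diff is the signed DC differential decoded from the bitstream.
 * Unlike MPEG-2, it is subtracted from the prediction. */
static int dc_decode(DCPredictor *p, int component, int diff)
{
    p->last_dc[component] -= diff;
    return p->last_dc[component];
}
```

Starting from a fresh row, luma differentials of 24 and then -8 give DC values of 1000 and 1008.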

AC coefficients are coded using the same scheme as MPEG-2 (run/level with the same zigzag pattern), and with a VLC that is very similar to MPEG-2's “Table One” for AC coefficients. (It is not identical; some codewords that are illegal in MPEG-2 are used in SpeedHQ, and a few others are moved around as a consequence.) In FFmpeg format, the codes are (except that they would need to be bit-reversed due to INIT_VLC_LE demands):

static const uint16_t speedhq_vlc[123][2] = {
  {0x02, 2}, {0x06, 3}, {0x07, 4}, {0x1c, 5},
  {0x1d, 5}, {0x05, 6}, {0x04, 6}, {0x7b, 7},
  {0x7c, 7}, {0x23, 8}, {0x22, 8}, {0xfa, 8},
  {0xfb, 8}, {0xfe, 8}, {0xff, 8}, {0x1f,14},
  {0x1e,14}, {0x1d,14}, {0x1c,14}, {0x1b,14},
  {0x1a,14}, {0x19,14}, {0x18,14}, {0x17,14},
  {0x16,14}, {0x15,14}, {0x14,14}, {0x13,14},
  {0x12,14}, {0x11,14}, {0x10,14}, {0x18,15},
  {0x17,15}, {0x16,15}, {0x15,15}, {0x14,15},
  {0x13,15}, {0x12,15}, {0x11,15}, {0x10,15},
  {0x02, 3}, {0x06, 5}, {0x79, 7}, {0x27, 8},
  {0x20, 8}, {0x16,13}, {0x15,13}, {0x1f,15},
  {0x1e,15}, {0x1d,15}, {0x1c,15}, {0x1b,15},
  {0x1a,15}, {0x19,15}, {0x13,16}, {0x12,16},
  {0x11,16}, {0x10,16}, {0x18,13}, {0x17,13},
  {0x05, 5}, {0x07, 7}, {0xfc, 8}, {0x0c,10},
  {0x14,13}, {0x18,12}, {0x14,12}, {0x13,12},
  {0x10,12}, {0x1a,13}, {0x19,13}, {0x07, 5},
  {0x26, 8}, {0x1c,12}, {0x13,13}, {0x1b,12},
  {0x06, 6}, {0xfd, 8}, {0x12,12}, {0x1d,12},
  {0x07, 6}, {0x04, 9}, {0x12,13}, {0x06, 7},
  {0x1e,12}, {0x14,16}, {0x04, 7}, {0x15,12},
  {0x05, 7}, {0x11,12}, {0x78, 7}, {0x11,13},
  {0x7a, 7}, {0x10,13}, {0x21, 8}, {0x1a,16},
  {0x25, 8}, {0x19,16}, {0x24, 8}, {0x18,16},
  {0x05, 9}, {0x17,16}, {0x07, 9}, {0x16,16},
  {0x0d,10}, {0x15,16}, {0x1f,12}, {0x1a,12},
  {0x19,12}, {0x17,12}, {0x16,12}, {0x1f,13},
  {0x1e,13}, {0x1d,13}, {0x1c,13}, {0x1b,13},
  {0x1f,16}, {0x1e,16}, {0x1d,16}, {0x1c,16},
  {0x1b,16},
  {0x01,6}, /* escape */
  {0x06,4}, /* EOB */
};
static const uint8_t speedhq_level[121] = {
   1,  2,  3,  4,  5,  6,  7,  8,
   9, 10, 11, 12, 13, 14, 15, 16,
  17, 18, 19, 20, 21, 22, 23, 24,
  25, 26, 27, 28, 29, 30, 31, 32,
  33, 34, 35, 36, 37, 38, 39, 40,
   1,  2,  3,  4,  5,  6,  7,  8,
   9, 10, 11, 12, 13, 14, 15, 16,
  17, 18, 19, 20,  1,  2,  3,  4,
   5,  6,  7,  8,  9, 10, 11,  1,
   2,  3,  4,  5,  1,  2,  3,  4,
   1,  2,  3,  1,  2,  3,  1,  2,
   1,  2,  1,  2,  1,  2,  1,  2,
   1,  2,  1,  2,  1,  2,  1,  2,
   1,  2,  1,  1,  1,  1,  1,  1,
   1,  1,  1,  1,  1,  1,  1,  1,
   1,
};
static const uint8_t speedhq_run[121] = {
   0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,
   1,  1,  1,  1,  1,  1,  1,  1,
   1,  1,  1,  1,  1,  1,  1,  1,
   1,  1,  1,  1,  2,  2,  2,  2,
   2,  2,  2,  2,  2,  2,  2,  3,
   3,  3,  3,  3,  4,  4,  4,  4,
   5,  5,  5,  6,  6,  6,  7,  7,
   8,  8,  9,  9, 10, 10, 11, 11,
  12, 12, 13, 13, 14, 14, 15, 15,
  16, 16, 17, 18, 19, 20, 21, 22,
  23, 24, 25, 26, 27, 28, 29, 30,
  31,
};

Escape works similarly to MPEG-2; the next six bits contain the run (non-reversed), and the next 12 bits contain the level (non-reversed), offset by 2048. There's always an EOB at the end of each block (as in MPEG-2), and there's never a block without a DC coefficient (also as in MPEG-2). All other codes are followed by the coefficient sign (again as in MPEG-2).
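A small sketch of turning the two escape fields into a run/level pair is shown below. It takes the already-extracted 6-bit and 12-bit fields as inputs so that it does not depend on any particular bit reader; the names are illustrative, and reading "offset by 2048" as "stored value = level + 2048" is an assumption.

```c
/* Illustrative run/level pair. */
typedef struct {
    int run;
    int level;
} RunLevel;

/* run_field: the 6 bits following the escape code.
 * level_field: the 12 bits after that, assumed to hold level + 2048. */
static RunLevel decode_escape(unsigned run_field, unsigned level_field)
{
    RunLevel rl;
    rl.run = (int)(run_field & 0x3f);
    rl.level = (int)(level_field & 0xfff) - 2048;
    return rl;
}
```

Under that reading, the representable escape levels span -2048 to 2047.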

Alpha

Alpha for SHQ1/3/5 is coded in 16x8 blocks without zigzag, coming after chroma (so e.g. SHQ3 is YYYY UV UV AA), top first, then bottom. It does not use DCT, but is still coded as run/level; first it has a codeword telling how many elements to skip (or whether this is end of block) and then a codeword for nonzero element, repeat until the EOB code. The code for run is (remember that all bit strings and literals go in little-endian order, even though they're written left to right here):

0            0
10xx         xx plus 1
110          EOB
111xxxxxxx   xxxxxxx

Level codes:

1s           1 or -1 (s = sign bit)
01sxx        xx plus 2 (s = sign bit)
00xxxxxxxx   xxxxxxxx (two's complement)
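The run and level codes above can be decoded bit by bit with an LSB-first reader, as in the following sketch. All names are hypothetical. It assumes that the leftmost bit of each written code is the first bit read, that the `x` fields are assembled with the first bit read as the lowest bit, and that a set sign bit means a negative value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader. */
typedef struct {
    const uint8_t *buf;
    size_t size;
    size_t bitpos;
} BitReader;

/* Read n bits; the first bit read becomes bit 0 of the result. */
static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++, br->bitpos++) {
        size_t byte = br->bitpos >> 3;
        if (byte < br->size)
            v |= ((br->buf[byte] >> (br->bitpos & 7)) & 1u) << i;
    }
    return v;
}

#define ALPHA_EOB (-1)

/* Returns the number of elements to skip, or ALPHA_EOB. */
static int read_alpha_run(BitReader *br)
{
    if (!get_bits(br, 1))
        return 0;                         /* 0          */
    if (!get_bits(br, 1))
        return (int)get_bits(br, 2) + 1;  /* 10xx       */
    if (!get_bits(br, 1))
        return ALPHA_EOB;                 /* 110        */
    return (int)get_bits(br, 7);          /* 111xxxxxxx */
}

/* Returns the signed value of the next nonzero element. */
static int read_alpha_level(BitReader *br)
{
    if (get_bits(br, 1))                  /* 1s */
        return get_bits(br, 1) ? -1 : 1;
    if (get_bits(br, 1)) {                /* 01sxx */
        int negative = (int)get_bits(br, 1);
        int magnitude = (int)get_bits(br, 2) + 2;
        return negative ? -magnitude : magnitude;
    }
    int v = (int)get_bits(br, 8);         /* 00xxxxxxxx */
    return v >= 128 ? v - 256 : v;        /* two's complement */
}
```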

Alpha prediction happens in vectors of 16 elements; every element predicts from the previous element, and the stored value is subtracted from the prediction to yield the actual pixel value (and next prediction for this element), with wraparound. The start value, reset for each macroblock row, is 255 for all pixels. Note that subtraction happens with 8-bit precision and wraparound, so every alpha value can be encoded despite the range of only +128/-127.
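The wraparound prediction can be illustrated with the sketch below. The names are invented, and it reads the rule as each of the 16 elements predicting from the same element of the previous row, which is one interpretation of the description above.

```c
#include <stdint.h>

/* Illustrative predictor: one running value per column of the 16-wide block. */
typedef struct {
    uint8_t pred[16];
} AlphaPredictor;

/* Called at the start of each macroblock row. */
static void alpha_reset(AlphaPredictor *p)
{
    for (int i = 0; i < 16; i++)
        p->pred[i] = 255;
}

/* stored holds the 16 decoded run/level values of one row (zeros where
 * elements were skipped). Each is subtracted from its predictor modulo 256;
 * the result is both the output pixel and the next prediction. */
static void alpha_decode_row(AlphaPredictor *p, const int stored[16],
                             uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        p->pred[i] = (uint8_t)(p->pred[i] - stored[i]);
        out[i] = p->pred[i];
    }
}
```

For instance, starting from 255, a stored value of -1 wraps around to pixel value 0, and a stored value of 1 gives 254.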

Alpha for SHQ7/9 is encoded identically to luma (i.e., with DCT), and comes after chroma (so e.g. SHQ7 is YYYY UV UV AAAA). It does not use the prediction scheme from SHQ1/3/5, just the regular DC prediction as for luma.

Quantiser

Allowed quality levels:

 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 18, 20, 22, 24, 28, 32, 36, 40, 44, 48, 52, 56, 64, 72, 80, 88, 96, 104, 112

The provided quality is clipped to 99. In addition, if it is less than 38 or odd, the nearest smaller value from the default quality table is used instead.

Quantisation matrix:

  *, 16, 19, 22, 26, 27, 29, 34,
 16, 16, 22, 24, 27, 29, 34, 37,
 19, 22, 26, 27, 29, 34, 34, 38,
 22, 22, 26, 27, 29, 34, 37, 40,
 22, 26, 27, 29, 32, 35, 40, 48,
 26, 27, 29, 32, 35, 40, 48, 58,
 26, 27, 29, 34, 38, 46, 56, 69,
 27, 29, 35, 38, 46, 56, 69, 83

The DC quantizer is set to 16 no matter what; the other values are multiplied by (100 - quality). Each coefficient is multiplied with the corresponding quantization factor and then divided by 16 (no rounding), before the block goes to IDCT.
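The scaling described above can be sketched as follows. The function names are illustrative, the matrix is stored in raster order with 16 filled in at the DC position, and mapping "no rounding" to C integer division (which truncates toward zero) is an assumption. Mapping zigzag positions to raster positions is not shown.

```c
#include <stdint.h>

/* Base matrix in raster order; entry 0 is the fixed DC quantizer. */
static const uint8_t shq_base_matrix[64] = {
    16, 16, 19, 22, 26, 27, 29, 34,
    16, 16, 22, 24, 27, 29, 34, 37,
    19, 22, 26, 27, 29, 34, 34, 38,
    22, 22, 26, 27, 29, 34, 37, 40,
    22, 26, 27, 29, 32, 35, 40, 48,
    26, 27, 29, 32, 35, 40, 48, 58,
    26, 27, 29, 34, 38, 46, 56, 69,
    27, 29, 35, 38, 46, 56, 69, 83
};

/* Build the per-frame quantization factors for a given quality. */
static void build_quant_factors(int quality, int factors[64])
{
    factors[0] = 16;  /* DC quantizer is always 16 */
    for (int i = 1; i < 64; i++)
        factors[i] = shq_base_matrix[i] * (100 - quality);
}

/* Scale one decoded coefficient before the IDCT. C integer division
 * truncates toward zero (assumed to match "no rounding"). */
static int dequantize(int coeff, int factor)
{
    return coeff * factor / 16;
}
```

At quality 90, the first AC factor is 16 * 10 = 160, so a coded level of 3 becomes 30.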

As an optimization, DC-only blocks are special-cased; all pixels in the block get exactly the value (dc + 4) >> 3.
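A sketch of the DC-only shortcut is shown below; the function name and the `int16_t` output block are choices made for this example. For negative `dc` it relies on `>>` being an arithmetic shift, which is the behaviour of common compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Fill an 8x8 block whose only nonzero coefficient is the (dequantized) DC.
 * Every pixel gets (dc + 4) >> 3. */
static void fill_dc_only(int dc, int16_t block[64])
{
    int16_t value = (int16_t)((dc + 4) >> 3);
    for (int i = 0; i < 64; i++)
        block[i] = value;
}
```

With the initial DC prediction of 1024 and a zero differential, this produces a flat block of 128.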

Y'CbCr conversion uses BT.601 coefficients. Chroma positioning is center, as in MPEG-2.