Difference between revisions of "SpeedHQ"

From MultimediaWiki
(Some corrections based on information from NewTek)
(More info based on the samples)

Revision as of 16:20, 5 January 2017

  • FOURCCs: SHQ0, SHQ1, ..., SHQ5, SHQ7, SHQ9
  • Company: NewTek (note: NewTek has provided samples and support in understanding the format)

This is a simple intermediate codec with different subsampling and alpha support. It is generally very similar to MPEG-2 intraframe. Several variants exist:

  • SHQ0/SHQ1 are 4:2:0 without/with alpha.
  • SHQ2/SHQ3 are 4:2:2 without/with alpha.
  • SHQ4/SHQ5 are 4:4:4 without/with alpha.
  • SHQ7 is 4:2:2 with alpha coded as luma and chroma blocks.
  • SHQ9 is the same for the 4:4:4 format.
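
The variant list can be summarised as a small lookup. This is only a sketch: the enum and function names are hypothetical, and which FOURCC of each SHQ0..SHQ5 pair carries alpha is an assumption here (even digit = no alpha, odd digit = alpha), chosen because NDI uses SHQ2 for plain video.

```c
#include <string.h>

/* Hypothetical names for illustration; not taken from any real decoder. */
enum shq_subsampling { SHQ_420, SHQ_422, SHQ_444 };
enum shq_alpha { SHQ_ALPHA_NONE, SHQ_ALPHA_RUNLEVEL, SHQ_ALPHA_BLOCKS };

struct shq_variant {
    enum shq_subsampling subsampling;
    enum shq_alpha alpha;
};

/* Returns 0 on success, -1 if the FOURCC is not one of the listed variants.
 * ASSUMPTION: in SHQ0..SHQ5 the odd-numbered FOURCC is the one with alpha. */
static int shq_variant_from_fourcc(const char *fourcc, struct shq_variant *out)
{
    if (memcmp(fourcc, "SHQ", 3) != 0)
        return -1;
    switch (fourcc[3]) {
    case '0': out->subsampling = SHQ_420; out->alpha = SHQ_ALPHA_NONE;     return 0;
    case '1': out->subsampling = SHQ_420; out->alpha = SHQ_ALPHA_RUNLEVEL; return 0;
    case '2': out->subsampling = SHQ_422; out->alpha = SHQ_ALPHA_NONE;     return 0;
    case '3': out->subsampling = SHQ_422; out->alpha = SHQ_ALPHA_RUNLEVEL; return 0;
    case '4': out->subsampling = SHQ_444; out->alpha = SHQ_ALPHA_NONE;     return 0;
    case '5': out->subsampling = SHQ_444; out->alpha = SHQ_ALPHA_RUNLEVEL; return 0;
    case '7': out->subsampling = SHQ_422; out->alpha = SHQ_ALPHA_BLOCKS;   return 0;
    case '9': out->subsampling = SHQ_444; out->alpha = SHQ_ALPHA_BLOCKS;   return 0;
    default:  return -1; /* SHQ6 and SHQ8 are not listed */
    }
}
```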

NewTek's NDI network codec uses SHQ2 and SHQ7.

A frame is always coded as two fields (but see the NDI single-field exception below). Data is packed into 32-bit little-endian words and is read starting from the LSB.
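
Reading 32-bit little-endian words from the LSB is equivalent to walking the bytes in order and taking bit 0 of each byte first. A minimal bit reader under that reading could look like the sketch below. The struct and function names are hypothetical, and returning the first-read bit as the least significant bit of the result is a convention chosen here, not something the format description states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit reader; names are illustrative only. */
struct shq_bits {
    const uint8_t *buf;
    size_t size_bytes;
    size_t bit_pos;      /* absolute bit position from the start of buf */
};

/* Read n bits (0 <= n <= 24), least significant bit of each byte first.
 * The first bit read ends up in bit 0 of the result.
 * Bits past the end of the buffer read as zero. */
static uint32_t shq_get_bits(struct shq_bits *b, unsigned n)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t byte = b->bit_pos >> 3;
        unsigned bit = (unsigned)(b->bit_pos & 7);
        if (byte < b->size_bytes)
            v |= (uint32_t)((b->buf[byte] >> bit) & 1u) << i;
        b->bit_pos++;
    }
    return v;
}
```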

The first byte is the quality; the next three bytes form the offset to the second field. (Exception: NDI can send a frame consisting of a single field, with the second field offset equal to 4, i.e., exactly overlapping with the first field.) The quantizer is 100 - quality.
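
The four-byte frame header can be parsed as in the sketch below. The names are hypothetical. The byte order of the 24-bit offset is not stated above; little-endian is assumed here to match the rest of the bitstream, and the bounds check is an added sanity test rather than part of the format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the parsed header fields. */
struct shq_frame_header {
    int quality;                  /* first byte */
    uint32_t second_field_offset; /* next three bytes, from start of frame */
    int quantizer;                /* 100 - quality */
    int single_field;             /* NDI case: offset == 4 */
};

/* Returns 0 on success, -1 on a truncated or implausible header. */
static int shq_parse_frame_header(const uint8_t *buf, size_t size,
                                  struct shq_frame_header *h)
{
    if (size < 4)
        return -1;
    h->quality = buf[0];
    /* ASSUMPTION: the 24-bit offset is stored little-endian. */
    h->second_field_offset = (uint32_t)buf[1]
                           | ((uint32_t)buf[2] << 8)
                           | ((uint32_t)buf[3] << 16);
    h->quantizer = 100 - h->quality;
    /* An offset of 4 points straight back at the first field. */
    h->single_field = (h->second_field_offset == 4);
    /* Sanity check (not from the format description). */
    if (h->second_field_offset < 4 || h->second_field_offset >= size)
        return -1;
    return 0;
}
```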

Field format

Each field is split into four slices that can be decoded independently. Each slice codes 16 lines, then skips 48 lines (coded by the other three slices), codes 16 more lines, and so on until it reaches the bottom of the field, which implicitly ends the slice. The first three bytes of each slice give the length of the slice in bytes (including the three length bytes). The next slice starts on the byte immediately after the end of the previous one, as given by the length (this might mean skipping 1–7 bits).
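
The slice interleaving and the length-prefixed layout can be sketched as follows. The function names are hypothetical, and the three length bytes are assumed to be little-endian, which the text above does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Which of the four slices codes a given field line:
 * lines 0-15 -> slice 0, 16-31 -> slice 1, 32-47 -> slice 2,
 * 48-63 -> slice 3, 64-79 -> slice 0 again, and so on. */
static int shq_slice_for_line(int line)
{
    return (line / 16) % 4;
}

/* Walk the four length-prefixed slices of one field.
 * offsets[0..3] receive each slice's start, offsets[4] the end of the last.
 * Returns 0 on success, -1 if a length is implausible. */
static int shq_slice_offsets(const uint8_t *field, size_t field_size,
                             size_t offsets[5])
{
    size_t pos = 0;
    for (int s = 0; s < 4; s++) {
        offsets[s] = pos;
        if (field_size - pos < 3)
            return -1;
        /* ASSUMPTION: 24-bit little-endian length, including these 3 bytes. */
        size_t len = (size_t)field[pos]
                   | ((size_t)field[pos + 1] << 8)
                   | ((size_t)field[pos + 2] << 16);
        if (len < 3 || len > field_size - pos)
            return -1;
        pos += len; /* next slice is byte-aligned right after this one */
    }
    offsets[4] = pos;
    return 0;
}
```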

Blocks are 8x8 pixels, and macroblocks are 16x16 pixels. Luma is coded top-left, top-right, bottom-left, bottom-right. 4:4:4 is coded YYYY UV UV UV UV (chroma is top-left, bottom-left, top-right, bottom-right). 4:2:2 is coded YYYY UV UV (chroma is top, then bottom). 4:2:0 is coded YYYY UV.
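
The block order within a macroblock can be written out as tables. This is an illustrative sketch with hypothetical names; x and y are pixel offsets of each 8x8 block inside the macroblock's area of the given plane (16x16 for luma and 4:4:4 chroma, 8x16 for 4:2:2 chroma, 8x8 for 4:2:0 chroma).

```c
/* Position of one 8x8 block inside its macroblock, per plane. */
struct shq_block_pos {
    char plane; /* 'Y', 'U' or 'V' */
    int x, y;   /* top-left pixel offset within the macroblock's plane area */
};

/* 4:2:0: YYYY UV */
static const struct shq_block_pos shq_order_420[6] = {
    {'Y', 0, 0}, {'Y', 8, 0}, {'Y', 0, 8}, {'Y', 8, 8},
    {'U', 0, 0}, {'V', 0, 0},
};

/* 4:2:2: YYYY UV UV (chroma top, then bottom) */
static const struct shq_block_pos shq_order_422[8] = {
    {'Y', 0, 0}, {'Y', 8, 0}, {'Y', 0, 8}, {'Y', 8, 8},
    {'U', 0, 0}, {'V', 0, 0},
    {'U', 0, 8}, {'V', 0, 8},
};

/* 4:4:4: YYYY UV UV UV UV
 * (chroma top-left, bottom-left, top-right, bottom-right) */
static const struct shq_block_pos shq_order_444[12] = {
    {'Y', 0, 0}, {'Y', 8, 0}, {'Y', 0, 8}, {'Y', 8, 8},
    {'U', 0, 0}, {'V', 0, 0},
    {'U', 0, 8}, {'V', 0, 8},
    {'U', 8, 0}, {'V', 8, 0},
    {'U', 8, 8}, {'V', 8, 8},
};
```

Note that the luma order goes across first while the 4:4:4 chroma order goes down first, which is easy to get wrong when adapting an MPEG-2 decoder.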

Blocks are coded with static VLCs. DC coefficients are coded exactly as in MPEG-2 (same luma/chroma tables, same scheme for storing the coefficient and sign). DC prediction is taken from the previous block of the same component (Y, U or V), and restarts at 1024 for each new macroblock row (not 128 as in MPEG-2). Curiously enough, one needs to _subtract_ the stored DC value from the prediction, not add as usual.
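
The DC prediction rule can be sketched as below, keeping one predictor per plane (Y, U, V), which is how the "previous block" rule is read here. The names are hypothetical, and dc_diff stands for the signed differential already decoded from the MPEG-2 style DC code.

```c
/* One DC predictor per plane: index 0 = Y, 1 = U, 2 = V (hypothetical layout). */
struct shq_dc_pred {
    int last_dc[3];
};

/* Call at the start of every macroblock row. */
static void shq_dc_reset(struct shq_dc_pred *p)
{
    for (int i = 0; i < 3; i++)
        p->last_dc[i] = 1024; /* not 128 as in MPEG-2 */
}

/* Apply one decoded DC differential and return the reconstructed DC.
 * Note the subtraction: MPEG-2 would add the differential instead. */
static int shq_dc_update(struct shq_dc_pred *p, int component, int dc_diff)
{
    p->last_dc[component] -= dc_diff;
    return p->last_dc[component];
}
```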

AC coefficients are coded using the same scheme as MPEG-2 (run/level with the same zigzag pattern), and with a VLC that is very similar to MPEG-2's “Table One” for AC coefficients. (It is not identical; some codewords that are illegal in MPEG-2 are used in SpeedHQ, and a few others are moved around as a consequence.) In FFmpeg format, the codes are (except that they would need to be bit-reversed due to INIT_VLC_LE demands):

static const uint16_t speedhq_vlc[123][2] = {
  {0x02, 2}, {0x06, 3}, {0x07, 4}, {0x1c, 5},
  {0x1d, 5}, {0x05, 6}, {0x04, 6}, {0x7b, 7},
  {0x7c, 7}, {0x23, 8}, {0x22, 8}, {0xfa, 8},
  {0xfb, 8}, {0xfe, 8}, {0xff, 8}, {0x1f,14},
  {0x1e,14}, {0x1d,14}, {0x1c,14}, {0x1b,14},
  {0x1a,14}, {0x19,14}, {0x18,14}, {0x17,14},
  {0x16,14}, {0x15,14}, {0x14,14}, {0x13,14},
  {0x12,14}, {0x11,14}, {0x10,14}, {0x18,15},
  {0x17,15}, {0x16,15}, {0x15,15}, {0x14,15},
  {0x13,15}, {0x12,15}, {0x11,15}, {0x10,15},
  {0x02, 3}, {0x06, 5}, {0x79, 7}, {0x27, 8},
  {0x20, 8}, {0x16,13}, {0x15,13}, {0x1f,15},
  {0x1e,15}, {0x1d,15}, {0x1c,15}, {0x1b,15},
  {0x1a,15}, {0x19,15}, {0x13,16}, {0x12,16},
  {0x11,16}, {0x10,16}, {0x18,13}, {0x17,13},
  {0x05, 5}, {0x07, 7}, {0xfc, 8}, {0x0c,10},
  {0x14,13}, {0x18,12}, {0x14,12}, {0x13,12},
  {0x10,12}, {0x1a,13}, {0x19,13}, {0x07, 5},
  {0x26, 8}, {0x1c,12}, {0x13,13}, {0x1b,12},
  {0x06, 6}, {0xfd, 8}, {0x12,12}, {0x1d,12},
  {0x07, 6}, {0x04, 9}, {0x12,13}, {0x06, 7},
  {0x1e,12}, {0x14,16}, {0x04, 7}, {0x15,12},
  {0x05, 7}, {0x11,12}, {0x78, 7}, {0x11,13},
  {0x7a, 7}, {0x10,13}, {0x21, 8}, {0x1a,16},
  {0x25, 8}, {0x19,16}, {0x24, 8}, {0x18,16},
  {0x05, 9}, {0x17,16}, {0x07, 9}, {0x16,16},
  {0x0d,10}, {0x15,16}, {0x1f,12}, {0x1a,12},
  {0x19,12}, {0x17,12}, {0x16,12}, {0x1f,13},
  {0x1e,13}, {0x1d,13}, {0x1c,13}, {0x1b,13},
  {0x1f,16}, {0x1e,16}, {0x1d,16}, {0x1c,16},
  {0x1b,16},
  {0x01,6}, /* escape */
  {0x06,4}, /* EOB */
};
static const uint8_t speedhq_level[121] = {
   1,  2,  3,  4,  5,  6,  7,  8,
   9, 10, 11, 12, 13, 14, 15, 16,
  17, 18, 19, 20, 21, 22, 23, 24,
  25, 26, 27, 28, 29, 30, 31, 32,
  33, 34, 35, 36, 37, 38, 39, 40,
   1,  2,  3,  4,  5,  6,  7,  8,
   9, 10, 11, 12, 13, 14, 15, 16,
  17, 18, 19, 20,  1,  2,  3,  4,
   5,  6,  7,  8,  9, 10, 11,  1,
   2,  3,  4,  5,  1,  2,  3,  4,
   1,  2,  3,  1,  2,  3,  1,  2,
   1,  2,  1,  2,  1,  2,  1,  2,
   1,  2,  1,  2,  1,  2,  1,  2,
   1,  2,  1,  1,  1,  1,  1,  1,
   1,  1,  1,  1,  1,  1,  1,  1,
   1,
};
static const uint8_t speedhq_run[121] = {
   0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,
   1,  1,  1,  1,  1,  1,  1,  1,
   1,  1,  1,  1,  1,  1,  1,  1,
   1,  1,  1,  1,  2,  2,  2,  2,
   2,  2,  2,  2,  2,  2,  2,  3,
   3,  3,  3,  3,  4,  4,  4,  4,
   5,  5,  5,  6,  6,  6,  7,  7,
   8,  8,  9,  9, 10, 10, 11, 11,
  12, 12, 13, 13, 14, 14, 15, 15,
  16, 16, 17, 18, 19, 20, 21, 22,
  23, 24, 25, 26, 27, 28, 29, 30,
  31,
};
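
Since the codes above are listed MSB-first but the bitstream is read LSB-first, each code has to be bit-reversed within its own length before it can be matched against bits read in stream order. A minimal helper for that step might look like this (the function name is hypothetical):

```c
#include <stdint.h>

/* Reverse the lowest `len` bits of `code` (1 <= len <= 16).
 * Example: the 3-bit code 110b becomes 011b. */
static uint16_t shq_reverse_code(uint16_t code, unsigned len)
{
    uint16_t r = 0;
    for (unsigned i = 0; i < len; i++)
        r = (uint16_t)((r << 1) | ((code >> i) & 1u));
    return r;
}
```

Applied to the table, the escape code {0x01, 6} becomes 0x20 and the EOB code {0x06, 4} stays 0x06, since 0110b is a palindrome.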

Escape works similarly to MPEG-2; the next six bits contain the run (non-reversed), and the next 12 bits contain the level (non-reversed), offset by 2048. There's always an EOB at the end of each block (as in MPEG-2), and there's never a block without a DC coefficient (also as in MPEG-2). All other codes are followed by the coefficient sign (again as in MPEG-2).
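
The escape payload can be unpacked as in the sketch below. The names are hypothetical, and the two arguments are assumed to be the 6-bit and 12-bit fields already extracted from the bitstream as plain unsigned integers.

```c
/* Decoded (run, level) pair; hypothetical type for illustration. */
struct shq_run_level {
    int run;   /* number of zero coefficients to skip */
    int level; /* signed coefficient value */
};

/* Unpack the 18 bits that follow an escape code.
 * The level is stored with a bias of 2048, so no separate sign bit follows. */
static struct shq_run_level shq_decode_escape(unsigned run_bits6,
                                              unsigned level_bits12)
{
    struct shq_run_level rl;
    rl.run = (int)(run_bits6 & 0x3fu);
    rl.level = (int)(level_bits12 & 0xfffu) - 2048;
    return rl;
}
```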

Alpha is coded as run-level too: first comes a codeword telling how many elements to skip (or signalling the end of the block), then a codeword for the nonzero element; this pair repeats.

Quantiser

Allowed quality levels:

 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 18, 20, 22, 24, 28, 32, 36, 40, 44, 48, 52, 56, 64, 72, 80, 88, 96, 104, 112

The provided quality is clipped to 99. In addition, if it is less than 38 or odd, the nearest smaller value from the default quality table is used instead.

Quantisation matrix:

  *, 16, 19, 22, 26, 27, 29, 34,
 16, 16, 22, 24, 27, 29, 34, 37,
 19, 22, 26, 27, 29, 34, 34, 38,
 22, 22, 26, 27, 29, 34, 37, 40,
 22, 26, 27, 29, 32, 35, 40, 48,
 26, 27, 29, 32, 35, 40, 48, 58,
 26, 27, 29, 34, 38, 46, 56, 69,
 27, 29, 35, 38, 46, 56, 69, 83

The DC quantizer is always 16; the other values are multiplied by (100 - quality). Each coefficient is multiplied by the corresponding quantization factor and then divided by 16 (no rounding) before the block goes to the IDCT.
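
The dequantisation step can be sketched as follows. The names are hypothetical, the table is indexed in raster order (so coefficients must already be de-zigzagged), and "no rounding" is interpreted here as C integer division, which truncates toward zero; a decoder using an arithmetic shift would differ for negative values.

```c
#include <stdint.h>

/* Quantisation matrix in raster order; entry 0 (shown as '*' in the
 * matrix) is a placeholder because the DC factor is fixed at 16. */
static const uint8_t shq_quant_matrix[64] = {
     0, 16, 19, 22, 26, 27, 29, 34,
    16, 16, 22, 24, 27, 29, 34, 37,
    19, 22, 26, 27, 29, 34, 34, 38,
    22, 22, 26, 27, 29, 34, 37, 40,
    22, 26, 27, 29, 32, 35, 40, 48,
    26, 27, 29, 32, 35, 40, 48, 58,
    26, 27, 29, 34, 38, 46, 56, 69,
    27, 29, 35, 38, 46, 56, 69, 83,
};

/* Build the per-coefficient factors for a given quality byte. */
static void shq_build_factors(int quality, int32_t factors[64])
{
    int quantizer = 100 - quality;
    factors[0] = 16; /* DC: 16 no matter what */
    for (int i = 1; i < 64; i++)
        factors[i] = shq_quant_matrix[i] * quantizer;
}

/* Dequantise one coefficient.
 * ASSUMPTION: "no rounding" means truncation toward zero. */
static int32_t shq_dequant(int32_t coef, int32_t factor)
{
    return coef * factor / 16;
}
```

With a DC factor of 16 and a division by 16, the DC coefficient passes through this step unchanged.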

As an optimization, DC-only blocks are special-cased; all pixels in the block get exactly the value (dc + 4) >> 3.
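
The DC-only shortcut amounts to a constant fill. In the sketch below the function name is hypothetical, and clamping the result to the 8-bit range 0..255 is an added assumption that the text above does not mention.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill an 8x8 block with the value derived from a lone DC coefficient.
 * dst points at the block's top-left pixel; stride is the line pitch. */
static void shq_fill_dc_only(uint8_t *dst, ptrdiff_t stride, int dc)
{
    /* (dc + 4) >> 3, with negative sums clamped first so the shift
     * never operates on a negative value. */
    int v = (dc + 4 < 0) ? 0 : (dc + 4) >> 3;
    if (v > 255)
        v = 255; /* ASSUMPTION: output is clamped to 8 bits */
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = (uint8_t)v;
}
```

For example, the reset predictor value of 1024 gives (1024 + 4) >> 3 = 128, i.e. mid-grey.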

Y'CbCr conversion uses BT.601 coefficients. Chroma positioning is center, as in MPEG-2.