Apple ProRes

From MultimediaWiki

ProRes Introduction

Apple ProRes is a family of proprietary video codecs used for storing and editing high-definition video data in Apple's Final Cut Pro. Apple's official whitepaper lists the following key features:

  • intra-only codecs
  • visually lossless compression (i.e. compressed images cannot be distinguished from the original by a human observer)
  • 4:2:2 / 4:4:4:4 source material
  • 10-bit (12-bit for ProRes 4444) sample depth
  • variable bitrate

ProRes 422 Standard Definition / High Quality codec

ProRes 422 SD/HQ is the same codec operating at two different bitrates (flavours). Each flavour is indicated by its own FOURCC:

Flavour name              FOURCC  Bitrate
Standard Definition (SD)  'apcn'  145 Mbps
High Quality (HQ)         'apch'  220 Mbps
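
The table above can be expressed as a small lookup. The following is only an illustrative sketch: the struct and function names are hypothetical, and the bitrates are the nominal values from the table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one ProRes 422 flavour (values from the table). */
struct prores_flavour {
    const char *fourcc;   /* four characters, not NUL-compared beyond 4 bytes */
    const char *name;
    unsigned    mbps;     /* nominal bitrate in Mbps */
};

static const struct prores_flavour flavours[] = {
    { "apcn", "Standard Definition (SD)", 145 },
    { "apch", "High Quality (HQ)",        220 },
};

/* Returns the matching flavour, or NULL if the FOURCC is not in the table. */
static const struct prores_flavour *find_flavour(const char *fourcc)
{
    size_t i;

    for (i = 0; i < sizeof flavours / sizeof flavours[0]; i++)
        if (memcmp(flavours[i].fourcc, fourcc, 4) == 0)
            return &flavours[i];
    return NULL;
}
```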

The ProRes algorithm is based on the discrete cosine transform (DCT) and utilizes the following compression techniques:

The bitstream of the ProRes 422 has been designed to provide the following additional features:

  • frame-level multi-threaded encoding/decoding depending on available CPU cores
  • spatial scalability providing the possibility to decode a video at different partial resolutions (1/2, 1/4, 1/8 of the full size and so on). ProRes is capable of saving CPU cycles while decoding at smaller resolutions due to a special bitstream layout enabling partial bitstream access and parsing.


Binary packages and compatibility

The ProRes codec is currently available as the following binary libraries:

Lib name                      Version            Supported OS  Architecture  Encoding  Decoding
AppleProRes422.component      1.0.2 (Build 46)   Mac OS X      PowerPC       Yes       Yes
AppleProResDecoder.qtx        1.0.0.1            Windows       x86           No        Yes
AppleProResCodec.component    2.0 (Build 224)    Mac OS X      PowerPC/x86   Yes       Yes
AppleProResDecoder.component  2.0.1 (Build 227)  Mac OS X      PowerPC/x86   No        Yes
AppleProResDecoder.component  3.0.0              Mac OS X      x86           No        Yes

Frame layout

A typical ProRes 422 frame has the following layout:

       Frame container atom
------------------------------------
           Frame header
------------------------------------
            Picture 1
------------------------------------
 Picture 2 (interlaced frames only)

Frame container atom

Each frame begins with the frame container atom. It has the classical QuickTime atom structure, with the ID set to the undocumented ProRes frame type ID:

Field size  Field name  Description
4 bytes     size        frame size in bytes
4 bytes     type        'icpf' ("image codec prores frame"?)

All data is stored in big-endian format. The value of the "size" field must match the frame size given by the movie container.
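
A minimal sketch of checking the frame container atom follows. The helper names are hypothetical, and `len` is assumed to be the frame size reported by the movie container.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: returns the frame size stored in the atom, or 0 if
 * the buffer does not start with an 'icpf' atom whose size matches len.
 */
static uint32_t parse_frame_atom(const uint8_t *buf, size_t len)
{
    uint32_t size;

    if (len < 8)
        return 0;                       /* too short for size + type */
    size = read_be32(buf);
    if (memcmp(buf + 4, "icpf", 4) != 0)
        return 0;                       /* not a ProRes frame atom */
    if (size != len)
        return 0;                       /* must match the container's frame size */
    return size;
}
```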


Frame header

A frame header stores description information, such as frame dimension, frame structure (progressive/interlaced), color information and the like. All data is stored in the big-endian format.

Field size Field name Value Description
2 bytes hdrSize size of this header in bytes. Must be at least 28 bytes long.
2 bytes version Header version.
  • "0" - supported in all known decoders
  • "1" - supported in the version 2.0 only
4 bytes creatorID FOURCC of the creator of the present stream. Ignored in all known decoders.
  • 'apl0' -> Apple Inc.
  • 'arri' -> Arnold & Richter Cine Technik (A&R)
  • 'aja0' -> AJA Kona Hardware
2 bytes frameWidth Width of encoded frame.
2 bytes frameHeight Height of encoded frame.
1 byte frameFlags Frame structure flags. Layout: AAxxBBxx, where
  • bits AA = chrominance factor (picture format):
    • "2" - 422
    • "3" - 444
  • bits BB = frame type:
    • "0" - progressive
    • "1" - interlaced (top-field first)
    • "2" - interlaced (bottom-field first)
1 byte reserved1 0 Ignored.
1 byte primaries Color primaries of the coded image (see the description of the 'nclc' extension by Apple).
1 byte transf_func Transfer function of the coded image (see the description of the 'nclc' extension by Apple).
1 byte colorMatrix Color matrix ID for color conversion between YUV and RGB (see below).
  • "1" = ITU-R BT.709-2 / SMPTE 274M-1995 / SMPTE 296M-1997
  • "6" = ITU-R BT.601-4 / SMPTE 170M-1994 / SMPTE 293M-1996
4 bits src_pix_fmt Indicates source pixel format.
  • 0 - unknown
  • 1 - '2vuy' (8-bit 4:2:2)
  • 2 - 'v210' (10-bit 4:2:2)
  • 3 - 'v216' (10,12,14,16-bit 4:2:2)
  • 4 - 'r408' (8-bit 4:4:4:4 with alpha)
  • 5 - 'v408' (8-bit 4:4:4:4 with alpha and super black)
  • 6 - 'r4fl' (32-bit floating-point 4:4:4:4)
  • 7 - 0x20 (8-bit RGB)
  • 8 - 'BGRA' (8-bit RGB with alpha)
  • 9 - 'n302' seems to be undocumented
  • 10 - 'b64a' (16-bit ARGB)
  • 11 - 'R10k' (AJA 10-bit RGB)
  • 12 - 'l302' seems to be undocumented
  • 13-15 invalid
4 bits alpha_info Used in combination with alpha channel coding.
  • 0 - no alpha
  • 1 - 8-bit alpha
  • 2 - 16-bit alpha
1 byte reserved2 0 Ignored.
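1 byte QMatFlags Custom quantization matrices presence indicators. Layout: xxxxxxCD, where
  • bit C = custom luminance quantization matrix (QMatLuma) is present
  • bit D = custom chrominance quantization matrix (QMatChroma) is present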
64 bytes QMatLuma Custom quantization matrix for luminance. Only present if indicated by the bit "C" of the QMatFlags.
64 bytes QMatChroma Custom quantization matrix for chrominance. Only present if indicated by the bit "D" of the QMatFlags.
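
The field table above can be read with a parser along the following lines. This is a sketch under stated assumptions, not a reference implementation: the struct and function names are hypothetical, bit layouts are read most significant bit first (so src_pix_fmt is taken to be the upper half of its byte), and the code only verifies that the header is long enough for the bytes it reads, which is a weaker check than the 28-byte minimum given in the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed part of the frame header. */
struct prores_frame_header {
    uint16_t hdr_size, version, width, height;
    uint8_t  chroma_format;   /* bits AA of frameFlags: 2 = 422, 3 = 444 */
    uint8_t  frame_type;      /* bits BB: 0 = progressive, 1/2 = interlaced */
    uint8_t  primaries, transf_func, color_matrix;
    uint8_t  src_pix_fmt;     /* assumed: upper 4 bits of byte 17 */
    uint8_t  alpha_info;      /* assumed: lower 4 bits of byte 17 */
    uint8_t  has_luma_qmat;   /* bit C of QMatFlags */
    uint8_t  has_chroma_qmat; /* bit D of QMatFlags */
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* buf points just past the 8-byte frame container atom. Returns 0 on success. */
static int parse_frame_header(const uint8_t *buf, size_t len,
                              struct prores_frame_header *h)
{
    size_t need;

    if (len < 20)
        return -1;                       /* fixed fields occupy 20 bytes */
    h->hdr_size = rd16(buf);
    h->version  = rd16(buf + 2);
    /* bytes 4..7: creatorID, ignored here as in all known decoders */
    h->width    = rd16(buf + 8);
    h->height   = rd16(buf + 10);
    h->chroma_format   = buf[12] >> 6;
    h->frame_type      = (buf[12] >> 2) & 3;
    h->primaries       = buf[14];
    h->transf_func     = buf[15];
    h->color_matrix    = buf[16];
    h->src_pix_fmt     = buf[17] >> 4;
    h->alpha_info      = buf[17] & 0x0F;
    h->has_luma_qmat   = (buf[19] >> 1) & 1;
    h->has_chroma_qmat = buf[19] & 1;

    /* Each custom matrix that is present adds 64 bytes after the fixed fields. */
    need = 20 + 64u * h->has_luma_qmat + 64u * h->has_chroma_qmat;
    if (h->hdr_size < need || h->hdr_size > len)
        return -1;
    return 0;
}
```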

Picture layout

Each picture has the following layout:

           Picture header
------------------------------------
          Slice index table
------------------------------------
            Slices data

The picture header contains two important parameters: the width and height factors of a slice. These tell the decoder how the coded picture is subdivided into slices.

The slice index table consists of 16-bit entries, one for each slice, giving the length of the data for that slice. This allows slices to be located and processed independently, for example by multiple threads.

The slice data array contains the actual encoded macroblock data.
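
Because the table stores lengths rather than positions, the start of each slice is the running sum of the preceding entries. The following sketch illustrates this; the function name and signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: converts the slice index table (big-endian 16-bit
 * lengths) into byte offsets relative to the start of the slice data.
 * offsets must have room for total_slices entries. Returns 0 on success.
 */
static int slice_offsets(const uint8_t *table, size_t table_len,
                         unsigned total_slices, size_t *offsets, size_t *total)
{
    size_t pos = 0;
    unsigned i;

    if (table_len < 2 * (size_t)total_slices)
        return -1;                        /* table is truncated */
    for (i = 0; i < total_slices; i++) {
        offsets[i] = pos;                 /* slice i starts here */
        pos += ((size_t)table[2 * i] << 8) | table[2 * i + 1];
    }
    *total = pos;                         /* size of all slice data */
    return 0;
}
```

With the offsets known, each slice can be handed to a separate worker without parsing the slices before it.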

Picture header

This header is present for every picture (field).

Field size Field name Description
1 byte pic_hdr_size size of this header in bits. Must be at least 64 bits (8 bytes) long.
4 bytes pic_data_size size of the picture data in bytes.
2 bytes total_slices total number of slices in the picture. It also gives the number of entries in the slice index table.
4 bits slice_width_factor slice width = 2 ^ slice_width_factor macroblocks. Supported slice widths are therefore 8, 4, 2 and 1 macroblocks.
4 bits slice_height_factor Ideally slice height = 2 ^ slice_height_factor, but all known decoders only allow the value "0" for this factor, so the only supported slice height is 1 macroblock.
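
A sketch of reading the picture header is shown below. The struct and function names are hypothetical, and slice_width_factor is assumed to occupy the upper four bits of the last byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the picture header. */
struct prores_picture_header {
    unsigned hdr_size_bytes;    /* pic_hdr_size converted from bits */
    uint32_t pic_data_size;
    unsigned total_slices;
    unsigned slice_width_mbs;   /* 2 ^ slice_width_factor */
    unsigned slice_height_mbs;  /* 2 ^ slice_height_factor */
};

/* Returns 0 on success, -1 if the header is too short. */
static int parse_picture_header(const uint8_t *buf, size_t len,
                                struct prores_picture_header *p)
{
    if (len < 8 || buf[0] < 64)
        return -1;                          /* header is at least 64 bits */
    p->hdr_size_bytes = buf[0] / 8;
    p->pic_data_size  = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
                        ((uint32_t)buf[3] << 8)  |  (uint32_t)buf[4];
    p->total_slices     = ((unsigned)buf[5] << 8) | buf[6];
    p->slice_width_mbs  = 1u << (buf[7] >> 4);
    p->slice_height_mbs = 1u << (buf[7] & 0x0F);
    return 0;
}
```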

Slice coding

Slice header

Field size Field name Description
1 byte slice_hdr_size size of this header in bits. Must be at least 48 bits (6 bytes) long.
1 byte scale_factor scale factor for scaling the quantization matrices (see below).
2 bytes luma_data_size size of the luma bitstream in bytes.
2 bytes u_data_size size of the chroma U bitstream in bytes.

Although the length of the chroma V data is not indicated in the slice header, it can easily be calculated as follows:

v_data_size = slice_data_size from slice index table - luma_data_size - u_data_size - (slice_hdr_size / 8);
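
The same calculation with basic sanity checks might look as follows. The function name is hypothetical; `slice_data_size` is the entry for this slice from the slice index table.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derives the chroma V size from the slice header.
 * slice points at the first byte of the slice (the slice header).
 * Returns 0 on success, -1 if the sizes are inconsistent.
 */
static int chroma_v_size(const uint8_t *slice, unsigned slice_data_size,
                         unsigned *v_data_size)
{
    unsigned hdr_bytes, luma, u;

    if (slice_data_size < 6)
        return -1;                           /* no room for the header */
    hdr_bytes = slice[0] / 8;                /* slice_hdr_size is in bits */
    luma = ((unsigned)slice[2] << 8) | slice[3];
    u    = ((unsigned)slice[4] << 8) | slice[5];
    if (hdr_bytes < 6 || hdr_bytes + luma + u > slice_data_size)
        return -1;
    *v_data_size = slice_data_size - hdr_bytes - luma - u;
    return 0;
}
```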

Codeword encoding scheme

Every codeword is coded with a scheme that combines Rice and Elias gamma (aka exp-Golomb) codes. Three parameters control it: the maximum unary prefix length for Rice codes (MP), the Rice code parameter (R) and the Elias gamma code parameter (G).

Decoding works as follows: read the unary prefix; if its value is greater than MP, treat the codeword as an Elias gamma code, otherwise treat it as a Rice code (or as pure unary when R = 0).

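 n = get_unary();                  /* length of the unary prefix */
 if (n > MP) {
   /* Elias gamma escape: the prefix terminator is the implicit leading 1 */
   k   = G + (n - MP - 1);
   val = ((1 << k) | get_bits(k)) - (1 << G) + ((MP + 1) << R);
 } else if (R) {
   val = (n << R) | get_bits(R);   /* Rice code: quotient n, R-bit remainder */
 } else {
   val = n;                        /* pure unary */
 }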

Coding parameters are packed into one byte:

 bits 0-1 MP
 bits 2-4 G
 bits 5-7 R

In the rest of this description, a set of coding parameters is denoted by this byte value.
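
As a sketch of how the decoding steps and the parameter byte fit together, the following C fragment implements get_code() on top of a small MSB-first bit reader. The bit reader, its struct name and the representation of the unary prefix as zero bits terminated by a one bit are assumptions made for illustration. The Rice branch uses the conventional mapping (prefix shifted left by R, then the remainder), and the escape branch treats the terminating bit of the prefix as the leading one of the Elias gamma value.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (hypothetical helper); reads past the end return 0. */
struct bitreader {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
};

static unsigned get_bit(struct bitreader *br)
{
    unsigned bit = 0;

    if (br->pos < br->size_bits)
        bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

static unsigned get_bits(struct bitreader *br, unsigned n)
{
    unsigned v = 0;

    while (n--)
        v = (v << 1) | get_bit(br);
    return v;
}

/* Assumption: the prefix is the number of 0 bits before the terminating 1. */
static unsigned get_unary(struct bitreader *br)
{
    unsigned n = 0;

    while (br->pos < br->size_bits && !get_bit(br))
        n++;
    return n;
}

static unsigned get_code(struct bitreader *br, uint8_t params)
{
    unsigned mp = params & 3;          /* bits 0-1 */
    unsigned g  = (params >> 2) & 7;   /* bits 2-4 */
    unsigned r  = params >> 5;         /* bits 5-7 */
    unsigned n  = get_unary(br);

    if (n > mp) {
        unsigned k = g + (n - mp - 1);
        if (k > 24)
            return 0;                  /* implausibly long prefix: damaged data */
        return ((1u << k) | get_bits(br, k)) - (1u << g) + ((mp + 1) << r);
    }
    if (r)
        return (n << r) | get_bits(br, r);
    return n;
}
```

For example, with parameters 0x04 (MP = 0, G = 1, R = 0) the bit strings "1", "010" and "011" decode to 0, 1 and 2.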

Overall slice coding

All data in a slice is stored grouped by plane: data for luma blocks comes first, data for chroma blocks last. Within each group, DC coefficients are stored first, followed by AC coefficients.

DC coding scheme

DC values are delta-coded. The first value and the first difference are coded with fixed parameters; the parameters for every later difference depend on the previous raw code:

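 dc_code_params[] = { 0x04, 0x28, 0x28, 0x4D, 0x4D, 0x70, 0x70 };
 
 /* first DC: fixed parameters, then map the unsigned code to a signed value */
 code = get_code(0xB8);
 dc[0] = (code >> 1) ^ -(code & 1);
 
 code = 5;   /* selects the fixed parameters for the first difference */
 sign = 0;
 for (i = 1; i < num_dcs; i++) {
   code = get_code(dc_code_params[min(code, 6)]);
   if (code)
     sign ^= -(code & 1);   /* an odd code flips the running sign */
   else
     sign = 0;              /* a zero difference resets the running sign */
   dc[i] = dc[i - 1] + (((code + 1) >> 1) ^ sign) - sign;
 }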
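
The expression used for the first DC value maps an unsigned code to a signed value by alternating between non-negative and negative numbers. A small standalone illustration (the function name is hypothetical):

```c
#include <stdint.h>

/* Maps 0, 1, 2, 3, 4, ... to 0, -1, 1, -2, 2, ... */
static int32_t code_to_signed(uint32_t code)
{
    return (int32_t)(code >> 1) ^ -(int32_t)(code & 1);
}
```

Even codes give the non-negative value code / 2; odd codes give the bitwise complement of code / 2, which is -(code / 2) - 1.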

AC coding scheme

AC coefficients from all blocks are coded together as a single stream of (skip, level, sign) triplets. The stream is interleaved across blocks, i.e. the coefficients at position 1 of every block come first, then the coefficients at position 2, and so on. Again, the parameters for coding the next value are selected depending on the previously decoded value:

 skip_code_params[] = { 0x06, 0x06, 0x05, 0x05, 0x04, 0x29, 0x29, 0x29, 0x29, 0x28, 0x28, 0x28, 0x28, 0x28, 0x28, 0x4C };
 level_code_params[] = { 0x04, 0x0A, 0x05, 0x06, 0x04, 0x28, 0x28, 0x28, 0x28, 0x4C };
 
 pos   = num_blocks - 1;   /* last DC position; the first AC is at pos = num_blocks */
 skip  = 4;
 level = 2;
 while (pos < 64 * num_blocks && has_bits_left()) {
   skip  = get_code(skip_code_params[min(skip, 15)]);
   level = get_code(level_code_params[min(level, 9)]) + 1;
   sign  = get_bit();
   
   pos += skip + 1;
   if (pos >= 64 * num_blocks)
     break;                /* position outside the slice: damaged data */
   block[pos % num_blocks][scan[pos / num_blocks]] = sign ? -level : level;
 }
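
The interleaving means that a stream position identifies both a block and a coefficient index within the scan order. The split used in the last line of the loop can be sketched as follows (struct and function names are hypothetical):

```c
/* Hypothetical result type: which block, and which index into the scan table. */
struct coeff_pos {
    unsigned block;
    unsigned scan_idx;
};

/* Splits an interleaved stream position into block number and scan index. */
static struct coeff_pos split_position(unsigned pos, unsigned num_blocks)
{
    struct coeff_pos p;

    p.block    = pos % num_blocks;   /* blocks alternate fastest */
    p.scan_idx = pos / num_blocks;   /* advances once per full round of blocks */
    return p;
}
```

With four blocks, position 4 is coefficient 1 of block 0, position 7 is coefficient 1 of block 3, and position 9 is coefficient 2 of block 1.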

Unquantising

 DC = 4096 + ((dc_val * quant_matrix[0] * quant_mul) >> 2);
 AC = (ac_val * quant_matrix[i] * quant_mul) >> 2;

The base quantising matrices are given in the frame header; the quantising multiplier is given in each slice header.
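
Applied to a whole 8x8 block, the two formulas can be sketched as below. The function name is hypothetical, and the code assumes that right-shifting a negative value is an arithmetic shift, which holds on common compilers but is implementation-defined in C.

```c
#include <stdint.h>

/*
 * Hypothetical helper: unquantises one block in place.
 * block[0] is the DC coefficient, block[1..63] are the AC coefficients,
 * and quant_matrix is indexed the same way as block.
 */
static void unquantise_block(int32_t block[64], const uint8_t quant_matrix[64],
                             int quant_mul)
{
    int i;

    block[0] = 4096 + ((block[0] * quant_matrix[0] * quant_mul) >> 2);
    for (i = 1; i < 64; i++)
        block[i] = (block[i] * quant_matrix[i] * quant_mul) >> 2;
}
```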

Scan order

Progressive:

    0,  1,  8,  9,  2,  3, 10, 11,
   16, 17, 24, 25, 18, 19, 26, 27,
    4,  5, 12, 20, 13,  6,  7, 14,
   21, 28, 29, 22, 15, 23, 30, 31,
   32, 33, 40, 48, 41, 34, 35, 42,
   49, 56, 57, 50, 43, 36, 37, 44,
   51, 58, 59, 52, 45, 38, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63

Interlaced:

    0,  8,  1,  9, 16, 24, 17, 25,
    2, 10,  3, 11, 18, 26, 19, 27,
   32, 40, 33, 34, 41, 48, 56, 49,
   42, 35, 43, 50, 57, 58, 51, 59,
    4, 12,  5,  6, 13, 20, 28, 21,
   14,  7, 15, 22, 29, 36, 44, 37,
   30, 23, 31, 38, 45, 52, 60, 53,
   46, 39, 47, 54, 61, 62, 55, 63,
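
The scan tables are used as in the AC decoding loop: the n-th coded coefficient of a block is stored at index scan[n] of the 8x8 block. A minimal sketch using the progressive table from above (the array and function names are hypothetical):

```c
#include <stdint.h>

/* Progressive scan order, copied from the table above. */
static const uint8_t progressive_scan[64] = {
     0,  1,  8,  9,  2,  3, 10, 11,
    16, 17, 24, 25, 18, 19, 26, 27,
     4,  5, 12, 20, 13,  6,  7, 14,
    21, 28, 29, 22, 15, 23, 30, 31,
    32, 33, 40, 48, 41, 34, 35, 42,
    49, 56, 57, 50, 43, 36, 37, 44,
    51, 58, 59, 52, 45, 38, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Moves coefficients from coded order into raster order within the block. */
static void place_coeffs(const int16_t coded[64], int16_t block[64],
                         const uint8_t scan[64])
{
    int i;

    for (i = 0; i < 64; i++)
        block[scan[i]] = coded[i];
}
```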

Alpha plane coding

Both alpha depths are coded in the same way; the only difference is the bit size of the short delta value (4 bits for 8-bit alpha, 7 bits for 16-bit alpha).

 mask      = (1 << bit_depth) - 1;
 alpha_val = mask;                      /* prediction starts at the maximum value */
 while (!all_coeffs_decoded) {
   if (get_bit())
     val = get_bits(bit_depth);         /* full-width delta */
   else {
     val = get_bits(bit_depth == 16 ? 7 : 4);   /* short signed delta */
     sign = val & 1;
     val = (val + 2) >> 1;
     if (sign)
       val = -val;
   }
   alpha_val = (alpha_val + val) & mask;
   *dst++ = alpha_val;
   if (get_bit()) {
     run = get_bits(4);
     if (!run)
       run = get_bits(11);
     for (i = 0; i < run; i++)
       *dst++ = alpha_val;              /* repeat the current value */
   }
 }

This decodes slice alpha data line by line.
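
The short delta and the wrap-around update from the loop above can be isolated into two small functions. This is an illustrative sketch with hypothetical names, not a complete alpha decoder.

```c
/* Converts the raw short delta field: the lowest bit is the sign. */
static int alpha_small_delta(unsigned raw)
{
    int val = (int)((raw + 2) >> 1);

    return (raw & 1) ? -val : val;
}

/* Adds a delta to the previous alpha value, wrapping to bit_depth bits. */
static unsigned alpha_apply(unsigned prev, int delta, unsigned bit_depth)
{
    unsigned mask = (1u << bit_depth) - 1;

    return (prev + (unsigned)delta) & mask;
}
```

The raw values 0, 1, 2 and 3 therefore stand for the deltas +1, -1, +2 and -2; a delta of zero is never needed, since unchanged values are covered by the run.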