Apple ProRes
- FourCCs used to indicate different ProRes flavours in the QuickTime container:
- Apple ProRes 422 High Quality: 'apch' ('hcpa' in little-endian)
- Apple ProRes 422 Standard Definition: 'apcn' ('ncpa' in little-endian)
- Apple ProRes 422 LT: 'apcs' ('scpa' in little-endian)
- Apple ProRes 422 Proxy: 'apco' ('ocpa' in little-endian)
- Apple ProRes 4444: 'ap4h' ('h4pa' in little-endian)
- Company: Apple
- Whitepaper: http://images.apple.com/support/finalcutpro/docs/Apple-ProRes-White-Paper-July-2009.pdf
- New Whitepaper introducing ProRes LT/Proxy/4444: http://images.apple.com/finalcutstudio/docs/Apple_ProRes_White_Paper_July_2009.pdf
- Samples: http://samples.mplayerhq.hu/V-codecs/HCPA/
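As a minimal sketch of how a demuxer might tell the flavours listed above apart, the following C helper maps the four FourCC bytes, in the order they appear in the file, to a flavour name. The names `MKTAG_BE` and `prores_flavour_name` are hypothetical and not part of any Apple API.

```c
#include <stdint.h>
#include <stddef.h>

/* Pack four characters into a 32-bit tag, first character in the top byte
 * (the order in which the bytes appear in the QuickTime file). */
#define MKTAG_BE(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8)  |  (uint32_t)(d))

/* Hypothetical helper: returns a flavour name for a ProRes FourCC,
 * or NULL if the tag is not one of the five listed above. */
static const char *prores_flavour_name(const uint8_t tag[4])
{
    switch (MKTAG_BE(tag[0], tag[1], tag[2], tag[3])) {
    case MKTAG_BE('a', 'p', 'c', 'h'): return "ProRes 422 High Quality";
    case MKTAG_BE('a', 'p', 'c', 'n'): return "ProRes 422 Standard Definition";
    case MKTAG_BE('a', 'p', 'c', 's'): return "ProRes 422 LT";
    case MKTAG_BE('a', 'p', 'c', 'o'): return "ProRes 422 Proxy";
    case MKTAG_BE('a', 'p', '4', 'h'): return "ProRes 4444";
    default:                           return NULL;
    }
}
```

A reader that stores the tag as a little-endian integer would see the reversed spellings ('hcpa' and so on), so comparing byte by byte as above sidesteps the endianness question.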
ProRes Introduction
Apple ProRes is a family of proprietary video codecs used for storing and editing high definition video data in Apple's Final Cut Pro. Apple's official whitepaper lists the codec's key features as being:
- intra-only codecs
- visually lossless compression (i.e. compressed images cannot be distinguished from the original by a human observer)
- 4:2:2 / 4:4:4:4 source material
- 10-bit (12-bit for ProRes 4444) sample depth
- variable bitrate
ProRes 422 Standard Definition / High Quality codec
ProRes 422 SD/HQ is the same codec operating at two different bitrates (flavours). A separate FOURCC identifies each flavour:
Flavour name | FOURCC | Bitrate |
---|---|---|
Standard Definition (SD) | 'apcn' | 145 Mbps |
High Quality (HQ) | 'apch' | 220 Mbps |
The ProRes algorithm is based on the discrete cosine transform (hereafter DCT) and uses the following compression techniques:
- custom hybrid Golomb-Rice / Exponential Golomb coding for DCT coefficients
- run-length coding
- differential coding
- scalar quantization
The ProRes 422 bitstream has been designed to provide the following additional features:
- frame-level multi-threaded encoding/decoding depending on available CPU cores
- spatial scalability providing the possibility to decode a video at different partial resolutions (1/2, 1/4, 1/8 of the full size and so on). ProRes is capable of saving CPU cycles while decoding at smaller resolutions due to a special bitstream layout enabling partial bitstream access and parsing.
Binary packages and compatibility
The ProRes codec is currently available as the following binary libraries:
Lib Name | Version | Supported OS | Supported Architecture | Encoding | Decoding |
---|---|---|---|---|---|
AppleProRes422.component | 1.0.2 (Build 46) | Mac OS X | PowerPC | Yes | Yes |
AppleProResDecoder.qtx | 1.0.0.1 | Windows | x86 | No | Yes |
AppleProResCodec.component | 2.0 (Build 224) | Mac OS X | PowerPC/x86 | Yes | Yes |
AppleProResDecoder.component | 2.0.1 (Build 227) | Mac OS X | PowerPC/x86 | No | Yes |
AppleProResDecoder.component | 3.0.0 | Mac OS X | x86 | No | Yes |
Frame layout
A typical ProRes 422 frame has the following layout:
    Frame container atom
    ------------------------------------
    Frame header
    ------------------------------------
    Picture 1
    ------------------------------------
    Picture 2 (interlaced frames only)
Frame container atom
Each frame begins with the frame container atom. It has the classical QuickTime atom structure, with the ID set to the undocumented ProRes frame type ID:
Field size | Field name | Description |
---|---|---|
4 bytes | size | frame size in bytes |
4 bytes | type | 'icpf' ("image codec prores frame"?) |
All data is stored in the big-endian format. The value of the field "size" must match frame size from the movie container.
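The check described above can be sketched in C as follows. This is only an illustration: `rb32` and `prores_check_frame_atom` are hypothetical helper names, and `buf`/`len` are assumed to describe exactly one frame as delivered by the movie container.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a 32-bit big-endian value. */
static uint32_t rb32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: validates the 8-byte frame container atom.
 * Returns the offset of the frame header (8) on success, -1 on failure. */
static int prores_check_frame_atom(const uint8_t *buf, size_t len)
{
    if (len < 8)
        return -1;
    if (rb32(buf) != len)      /* "size" must match the container's frame size */
        return -1;
    if (buf[4] != 'i' || buf[5] != 'c' || buf[6] != 'p' || buf[7] != 'f')
        return -1;             /* "type" must be 'icpf' */
    return 8;
}
```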
Frame header
A frame header stores description information, such as frame dimension, frame structure (progressive/interlaced), color information and the like. All data is stored in the big-endian format.
Field size | Field name | Value | Description |
---|---|---|---|
2 bytes | hdrSize | | Size of this header in bytes. Must be at least 28 bytes long. |
2 bytes | version | | Header version. |
4 bytes | creatorID | | FOURCC of the creator of the present stream. Ignored in all known decoders. |
2 bytes | frameWidth | | Width of encoded frame. |
2 bytes | frameHeight | | Height of encoded frame. |
1 byte | frameFlags | layout: AAxxBBxx | Frame structure flags. |
1 byte | reserved1 | 0 | Ignored. |
1 byte | primaries | | Color primaries of the coded image (see the description of the 'nclc' extension by Apple). |
1 byte | transf_func | | Transfer function of the coded image (see the description of the 'nclc' extension by Apple). |
1 byte | colorMatrix | | Color matrix ID for color conversion between YUV and RGB (see below). |
4 bits | src_pix_fmt | | Indicates source pixel format. |
4 bits | alpha_info | | Used in combination with alpha channel coding. |
1 byte | reserved2 | 0 | Ignored. |
1 byte | QMatFlags | layout: xxxxxxCD | Custom quantization matrices presence indicators. |
64 bytes | QMatLuma | | Custom quantization matrix for luminance. Only present if indicated by the bit "C" of the QMatFlags. |
64 bytes | QMatChroma | | Custom quantization matrix for chrominance. Only present if indicated by the bit "D" of the QMatFlags. |
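The field order in the table translates into fixed byte offsets, which the following C sketch illustrates. The struct and function names are hypothetical. Two things are assumptions rather than statements from the table: that the first of the two 4-bit fields (src_pix_fmt) occupies the high nibble, and that "C" and "D" are bits 1 and 0 of QMatFlags, as the layout string xxxxxxCD suggests. frameFlags is stored raw.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {                 /* hypothetical container for parsed fields */
    uint16_t hdr_size, version, width, height;
    uint8_t  creator_id[4];
    uint8_t  frame_flags, primaries, transf_func, color_matrix;
    uint8_t  src_pix_fmt, alpha_info, qmat_flags;
    const uint8_t *qmat_luma;    /* NULL if not present */
    const uint8_t *qmat_chroma;  /* NULL if not present */
} ProresFrameHeader;

static uint16_t rb16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* `buf` points at hdrSize, i.e. just past the 8-byte container atom.
 * Returns 0 on success, -1 if the buffer is too short. */
static int prores_parse_frame_header(const uint8_t *buf, size_t len,
                                     ProresFrameHeader *h)
{
    size_t pos = 20;                    /* the fixed-size fields end here */

    if (len < pos)
        return -1;
    h->hdr_size     = rb16(buf);
    h->version      = rb16(buf + 2);
    for (int i = 0; i < 4; i++)
        h->creator_id[i] = buf[4 + i];
    h->width        = rb16(buf + 8);
    h->height       = rb16(buf + 10);
    h->frame_flags  = buf[12];          /* buf[13] is reserved1 */
    h->primaries    = buf[14];
    h->transf_func  = buf[15];
    h->color_matrix = buf[16];
    h->src_pix_fmt  = buf[17] >> 4;     /* assumed: high nibble */
    h->alpha_info   = buf[17] & 0x0F;   /* buf[18] is reserved2 */
    h->qmat_flags   = buf[19];

    h->qmat_luma = h->qmat_chroma = NULL;
    if (h->qmat_flags & 0x02) {         /* bit "C": luma matrix present */
        if (len < pos + 64)
            return -1;
        h->qmat_luma = buf + pos;
        pos += 64;
    }
    if (h->qmat_flags & 0x01) {         /* bit "D": chroma matrix present */
        if (len < pos + 64)
            return -1;
        h->qmat_chroma = buf + pos;
    }
    return 0;
}
```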
Picture layout
Each picture has the following layout:
    Picture header
    ------------------------------------
    Slice index table
    ------------------------------------
    Slices data
The picture header contains two important parameters, the slice width and height factors, which tell the decoder how the coded picture is subdivided into slices.
The slice index table consists of 16-bit entries, one per slice, each giving the length of that slice's data. Because every slice can be located without parsing the preceding ones, slices can be processed independently, for example on separate threads.
The slices data array contains the actual encoded macroblock data.
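To illustrate how the slice index table enables independent access, the sketch below converts the per-slice sizes into byte offsets by a running sum. `prores_slice_offsets` is a hypothetical helper; the entries are assumed to be big-endian like the rest of the stream, and the offsets are relative to the start of the slices data.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: turns the slice index table (one 16-bit big-endian
 * size per slice) into byte offsets relative to the start of the slices data.
 * `offsets` must have room for total_slices + 1 entries; the last entry is
 * the combined size of all slices. */
static void prores_slice_offsets(const uint8_t *table, size_t total_slices,
                                 uint32_t *offsets)
{
    uint32_t pos = 0;

    for (size_t i = 0; i < total_slices; i++) {
        offsets[i] = pos;
        pos += ((uint32_t)table[2 * i] << 8) | table[2 * i + 1];
    }
    offsets[total_slices] = pos;
}
```

With the offsets known up front, each worker thread can be handed the range `offsets[i]` to `offsets[i + 1]` without touching any other slice.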
Picture header
This header is present for every picture (field).
Field size | Field name | Description |
---|---|---|
1 byte | pic_hdr_size | size of this header in bits. Must be at least 64 bits (8 bytes) long. |
4 bytes | pic_data_size | size of the picture data in bytes. |
2 bytes | total_slices | total number of slices in the picture. It also gives the number of entries in the slice index table. |
4 bits | slice_width_factor | slice width = 2 ^ slice_width_factor macroblocks. Supported slice widths are 8, 4, 2 and 1 macroblocks. |
4 bits | slice_height_factor | Ideally slice height = 2 ^ slice_height_factor, but all known decoders only allow the value 0 for this factor. Thus, only a slice height of 1 macroblock is supported. |
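A possible way to read these fields is sketched below. The names are hypothetical, and as in the frame header it is assumed that the first 4-bit field (slice_width_factor) sits in the high nibble of its byte.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {                  /* hypothetical container for parsed fields */
    unsigned hdr_bytes;           /* pic_hdr_size converted from bits to bytes */
    uint32_t data_size;
    unsigned total_slices;
    unsigned slice_width_mb;      /* 2 ^ slice_width_factor */
    unsigned slice_height_mb;     /* 2 ^ slice_height_factor */
} ProresPictureHeader;

static uint16_t rb16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rb32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the header is truncated or too small. */
static int prores_parse_picture_header(const uint8_t *buf, size_t len,
                                       ProresPictureHeader *p)
{
    if (len < 8 || buf[0] < 64)   /* header size is given in bits, minimum 64 */
        return -1;
    p->hdr_bytes       = buf[0] / 8;
    p->data_size       = rb32(buf + 1);
    p->total_slices    = rb16(buf + 5);
    p->slice_width_mb  = 1u << (buf[7] >> 4);    /* assumed: high nibble */
    p->slice_height_mb = 1u << (buf[7] & 0x0F);
    return 0;
}
```

The slice index table then starts `hdr_bytes` bytes after the start of the picture and holds `total_slices` entries.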
Slice coding
Slice header
Field size | Field name | Description |
---|---|---|
1 byte | slice_hdr_size | size of this header in bits. Must be at least 48 bits (6 bytes) long. |
1 byte | scale_factor | scale factor for scaling the quantization matrices (see below). |
2 bytes | luma_data_size | size of the luma bitstream in bytes. |
2 bytes | u_data_size | size of the chroma U bitstream in bytes. |
Although the length of the chroma V data is not indicated in the slice header, it can easily be calculated as follows:
v_data_size = slice_data_size from slice index table - luma_data_size - u_data_size - (slice_hdr_size / 8);
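The calculation above can be wrapped in a small parsing routine, sketched here in C. The struct and function names are hypothetical; `slice_size` stands for this slice's entry in the slice index table.

```c
#include <stdint.h>

typedef struct {                  /* hypothetical container for parsed fields */
    unsigned hdr_bytes;           /* slice_hdr_size converted from bits to bytes */
    unsigned scale_factor;
    unsigned luma_size, u_size, v_size;
} ProresSliceHeader;

static unsigned rb16(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Returns 0 on success, -1 if the sizes are inconsistent. */
static int prores_parse_slice_header(const uint8_t *buf, unsigned slice_size,
                                     ProresSliceHeader *s)
{
    if (slice_size < 6)
        return -1;
    s->hdr_bytes    = buf[0] / 8;         /* header size is given in bits */
    s->scale_factor = buf[1];
    s->luma_size    = rb16(buf + 2);
    s->u_size       = rb16(buf + 4);
    if (s->hdr_bytes < 6 ||
        s->hdr_bytes + s->luma_size + s->u_size > slice_size)
        return -1;
    /* Whatever remains after the header, luma and U data is the V data. */
    s->v_size = slice_size - s->hdr_bytes - s->luma_size - s->u_size;
    return 0;
}
```

For example, a 200-byte slice with a 6-byte header, 100 bytes of luma and 40 bytes of U data leaves 54 bytes for V.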
Codeword encoding scheme
Every codeword is coded with a hybrid scheme controlled by three parameters: the maximum unary prefix length for Rice codes (MP), the Rice code parameter (R) and the Elias gamma (aka exp-Golomb) code parameter (G).
The decoding process is as follows: read the unary prefix; if its value is greater than MP, treat the code as Elias gamma, otherwise treat it as a Rice code (or as pure unary when R = 0).
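    n = get_unary();                 /* length of the unary prefix */
    if (n > MP) {                    /* Elias gamma (exp-Golomb) escape */
        len = G + (n - MP - 1);
        val = get_bits(len) + (1 << len) - (1 << G) + ((MP + 1) << R);
    } else if (R) {                  /* Rice code: quotient n, R-bit remainder */
        val = (n << R) | get_bits(R);
    } else {                         /* pure unary */
        val = n;
    }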
Coding parameters are packed into one byte:
    bits 0-1   MP
    bits 2-4   G
    bits 5-7   R

From here on, coding parameters are given as this packed byte value.
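As a worked example of the packing, the byte 0xB8 (binary 101 110 00) unpacks to R = 5, G = 6 and MP = 0. The C sketch below combines the unpacking with a complete codeword reader. It rests on several assumptions that the text does not state: bit 0 is the least significant bit of the parameter byte, the bitstream is read MSB first, the unary prefix is a run of 0 bits terminated by a 1 bit, and the Rice and exp-Golomb branches follow the standard constructions (quotient shifted by R; escape values continuing directly after the largest Rice value). `BitReader` and its functions are hypothetical helpers without bounds checking.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal MSB-first bit reader (hypothetical, no bounds checking). */
typedef struct {
    const uint8_t *buf;
    size_t pos;                        /* position in bits */
} BitReader;

static unsigned get_bit(BitReader *br)
{
    unsigned b = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return b;
}

static unsigned get_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;
    while (n--)
        v = (v << 1) | get_bit(br);
    return v;
}

/* Counts 0 bits and consumes the terminating 1 bit (assumed convention). */
static unsigned get_unary(BitReader *br)
{
    unsigned n = 0;
    while (!get_bit(br))
        n++;
    return n;
}

/* Decode one codeword using the packed parameter byte. */
static unsigned get_code(BitReader *br, uint8_t params)
{
    unsigned mp = params & 3;          /* bits 0-1 */
    unsigned g  = (params >> 2) & 7;   /* bits 2-4 */
    unsigned r  = params >> 5;         /* bits 5-7 */
    unsigned n  = get_unary(br);

    if (n > mp) {                      /* exp-Golomb escape */
        unsigned len = g + (n - mp - 1);
        return ((mp + 1) << r) + (1u << len) - (1u << g) + get_bits(br, len);
    }
    if (r)                             /* Rice code */
        return (n << r) | get_bits(br, r);
    return n;                          /* pure unary */
}
```

With parameters 0x04 (MP = 0, G = 1, R = 0), the bit strings `1`, `011` and `00110` decode to 0, 2 and 5 respectively, so the escape range picks up exactly where the unary range ends.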
Overall slice coding
All data in a slice is stored grouped by plane: data for luma blocks comes first, data for chroma blocks last. Within each plane, DC coefficients are stored first, followed by AC coefficients.
DC coding scheme
DC values are delta-coded. The first value and the first difference are coded with fixed parameters; the parameters for each later difference depend on the previous raw code:

    dc_code_params[] = { 0x04, 0x28, 0x28, 0x4D, 0x4D, 0x70, 0x70 };

    code  = get_code(0xB8);
    dc[0] = (code >> 1) ^ -(code & 1);
    code  = 5;
    sign  = 0;
    for (i = 1; i < num_dcs; i++) {
        code   = get_code(dc_code_params[min(code, 6)]);
        sign  ^= -(code & 1);
        dc[i]  = dc[i - 1] + (((code + 1) >> 1) ^ sign) - sign;
    }
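The expression used for the first DC value, `(code >> 1) ^ -(code & 1)`, maps unsigned codes onto signed values in the order 0, -1, 1, -2, 2 and so on, so that small magnitudes of either sign get short codes. A minimal C sketch of just this mapping follows; the function name `to_signed` is hypothetical.

```c
/* Map an unsigned code to a signed value.
 * Even codes become non-negative (code / 2); odd codes become negative,
 * because XOR with -1 (all ones) yields -(code / 2) - 1. */
static int to_signed(unsigned code)
{
    return (int)(code >> 1) ^ -(int)(code & 1);
}
```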
AC coding scheme
AC coefficients from all blocks are coded together as a single interleaved stream of (skip, level, sign) triples (i.e. all coefficients at position 1 first, then all coefficients at position 2, etc.).
As with DC values, the parameters for coding the next value are selected depending on the previously decoded value:
    skip_code_params[]  = { 0x06, 0x06, 0x05, 0x05, 0x04, 0x29, 0x29, 0x29,
                            0x29, 0x28, 0x28, 0x28, 0x28, 0x28, 0x28, 0x4C };
    level_code_params[] = { 0x04, 0x0A, 0x05, 0x06, 0x04, 0x28, 0x28, 0x28,
                            0x28, 0x4C };

    pos   = num_blocks - 1;   /* last DC position, so that skip = 0 reaches the first AC coefficient */
    skip  = 4;
    level = 2;
    while (pos < 64 * num_blocks && has_bits_left()) {
        skip  = get_code(skip_code_params[min(skip, 15)]);
        level = get_code(level_code_params[min(level, 9)]) + 1;
        sign  = get_bit();
        pos  += skip + 1;
        if (pos >= 64 * num_blocks)
            break;            /* position past the last coefficient: corrupt data */
        block[pos % num_blocks][scan[pos / num_blocks]] = sign ? -level : level;
    }
Unquantising
DC = 4096 + ((dc_val * quant_matrix[0] * quant_mul) >> 2);
AC = (ac_val * quant_matrix[i] * quant_mul) >> 2;
The base quantising matrices are given in the frame header; the quantising multiplier is given in each slice header.
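Applied to a whole 8x8 block, the two formulas can be sketched in C as below. `prores_dequant_block` is a hypothetical name, `coeffs[0]` is taken to be the DC value, and the code assumes that `>>` on a negative `int32_t` is an arithmetic shift, which holds on common compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Dequantise one 8x8 block in place, following the two formulas above.
 * coeffs[0] is the DC value; quant_mul is the per-slice multiplier. */
static void prores_dequant_block(int32_t coeffs[64],
                                 const uint8_t quant_matrix[64],
                                 int quant_mul)
{
    coeffs[0] = 4096 + ((coeffs[0] * quant_matrix[0] * quant_mul) >> 2);
    for (int i = 1; i < 64; i++)
        coeffs[i] = (coeffs[i] * quant_matrix[i] * quant_mul) >> 2;
}
```

For instance, with every matrix entry equal to 4 and a multiplier of 2, a DC value of 1 becomes 4096 + (8 >> 2) = 4098 and an AC value of 3 becomes 24 >> 2 = 6.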
Scan order
Progressive:
0, 1, 8, 9, 2, 3, 10, 11, 16, 17, 24, 25, 18, 19, 26, 27, 4, 5, 12, 20, 13, 6, 7, 14, 21, 28, 29, 22, 15, 23, 30, 31, 32, 33, 40, 48, 41, 34, 35, 42, 49, 56, 57, 50, 43, 36, 37, 44, 51, 58, 59, 52, 45, 38, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
Interlaced:
0, 8, 1, 9, 16, 24, 17, 25, 2, 10, 3, 11, 18, 26, 19, 27, 32, 40, 33, 34, 41, 48, 56, 49, 42, 35, 43, 50, 57, 58, 51, 59, 4, 12, 5, 6, 13, 20, 28, 21, 14, 7, 15, 22, 29, 36, 44, 37, 30, 23, 31, 38, 45, 52, 60, 53, 46, 39, 47, 54, 61, 62, 55, 63
Alpha plane coding
Both alpha depths are coded the same way; the only difference is the bit size of the delta value (4 bits for 8-bit alpha, 7 bits for 16-bit alpha).
    alpha_val = (1 << bit_depth) - 1;
    while (!all_coeffs_decoded) {
        if (get_bit()) {
            val = get_bits(bit_depth);
        } else {
            val  = get_bits(bit_depth == 16 ? 7 : 4);
            sign = val & 1;
            val  = (val + 2) >> 1;
            if (sign)
                val = -val;
        }
        alpha_val = (alpha_val + val) & ((1 << bit_depth) - 1);
        *dst++ = alpha_val;
        if (get_bit()) {
            run = get_bits(4);
            if (!run)
                run = get_bits(11);
            for (i = 0; i < run; i++)
                *dst++ = alpha_val;   /* repeat the current value run times */
        }
    }
This decodes slice alpha data line by line.