VC-1 Data Structures
Part of Understanding VC-1
This page is a discussion of the various data structures and constant enumerations employed in the SMPTE Reference Decoder (SRD) VC-1 reference implementation. These are mostly found in the file vc1types.h.
Macroblocks, Blocks, and Sub-blocks
A macroblock embodies a 16x16 block of pixels in a YUV 4:2:0 colorspace. A macroblock consists of 6 blocks: 4 8x8 Y blocks, 1 U block, and 1 V block. Further, these blocks may be divided into sub-blocks of sizes 8x4, 4x8, or 4x4.
Block Types
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: 8x8 inter-coded block
- 1: 8x4 inter-coded block
- 2: 4x8 inter-coded block
- 3: 4x4 inter-coded block
- 4: transform type not yet determined
- 5: intra-coded block, no AC prediction
- 6: intra-coded block, AC prediction of top row coefficients
- 7: intra-coded block, AC prediction of left column coefficients
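As a rough illustration, the block types above could be written as a C enumeration with two small helpers. The identifier names are invented for this sketch and are not taken from vc1types.h; only the numeric values come from the list.

```c
#include <assert.h>

/* Hypothetical names; only the numeric values follow the list above. */
typedef enum {
    BLK_INTER_8X8  = 0,
    BLK_INTER_8X4  = 1,
    BLK_INTER_4X8  = 2,
    BLK_INTER_4X4  = 3,
    BLK_INTER_ANY  = 4,  /* transform type not yet determined */
    BLK_INTRA      = 5,  /* no AC prediction */
    BLK_INTRA_TOP  = 6,  /* AC prediction from the top row */
    BLK_INTRA_LEFT = 7   /* AC prediction from the left column */
} BlockType;

/* Intra types occupy the upper end of the enumeration. */
static int block_is_intra(BlockType t)
{
    assert((int)t >= 0 && (int)t <= 7);
    return (int)t >= BLK_INTRA;
}

/* Number of transform sub-blocks; 0 while the type is undetermined. */
static int block_subblock_count(BlockType t)
{
    switch (t) {
    case BLK_INTER_8X4:
    case BLK_INTER_4X8: return 2;
    case BLK_INTER_4X4: return 4;
    case BLK_INTER_ANY: return 0;
    default:            return 1;  /* 8x8 inter and all intra types */
    }
}
```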
Intra Block
An intra block data structure requires the following information:
- number of non-zero AC coefficients
- quantized DC coefficient for prediction
- 7 quantized AC coefficients along the top row of the block, used for prediction
- 7 quantized AC coefficients along the left side of the block, used for prediction
- 16 samples representing the bottom 2 8-pixel rows of the block, maintained for overlap smoothing filters
Hybrid Prediction Modes
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: predict from left
- 1: predict from top
- 2: no hybrid prediction
Sub-block Patterns
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: 8x8 transform, coded
- 1: 8x4 transform, bottom subblock coded
- 2: 8x4 transform, top subblock coded
- 3: 8x4 transform, both subblocks coded
- 4: 4x8 transform, right subblock coded
- 5: 4x8 transform, left subblock coded
- 6: 4x8 transform, both subblocks coded
- 7: 4x4 transform, subblock pattern separate
- 8: 8x8 transform, coded, whole MB
- 9: 8x4 transform, bottom subblock coded, whole MB
- 10: 8x4 transform, top subblock coded, whole MB
- 11: 8x4 transform, both subblocks coded, whole MB
- 12: 4x8 transform, right subblock coded, whole MB
- 13: 4x8 transform, left subblock coded, whole MB
- 14: 4x8 transform, both subblocks coded, whole MB
- 15: 4x4 transform, subblocks pattern separate, whole MB
One more stray constant:
- 8: MB level threshold
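The list above has a regular layout: values 8..15 repeat values 0..7 with "whole MB" added, which is presumably why 8 is called the MB level threshold. A minimal sketch of unpacking a pattern value on that assumption follows; all names are hypothetical.

```c
#include <assert.h>

enum { SBP_MB_LEVEL_THRESHOLD = 8 };

typedef enum { TT_8X8, TT_8X4, TT_4X8, TT_4X4 } TransformShape;

typedef struct {
    int            mb_level;      /* 1 if the value applies to the whole MB */
    TransformShape shape;
    int            first_coded;   /* top (8x4) or left (4x8) sub-block coded */
    int            second_coded;  /* bottom (8x4) or right (4x8) sub-block coded */
} SubBlockPattern;

static SubBlockPattern decode_sbp(int value)
{
    SubBlockPattern p = { 0, TT_8X8, 0, 0 };
    int base;

    assert(value >= 0 && value <= 15);
    p.mb_level = value >= SBP_MB_LEVEL_THRESHOLD;
    base = value & 7;                 /* strip the "whole MB" offset */

    if (base == 0) {
        p.shape = TT_8X8;
        p.first_coded = 1;            /* the single 8x8 block is coded */
    } else if (base <= 3) {           /* 1 = bottom, 2 = top, 3 = both */
        p.shape = TT_8X4;
        p.second_coded = base & 1;
        p.first_coded = (base >> 1) & 1;
    } else if (base <= 6) {           /* 4 = right, 5 = left, 6 = both */
        int k = base - 3;
        p.shape = TT_4X8;
        p.second_coded = k & 1;
        p.first_coded = (k >> 1) & 1;
    } else {
        p.shape = TT_4X4;             /* pattern is signaled separately */
    }
    return p;
}
```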
Motion Vector
This is the information maintained for an individual motion vector:
- X offset
- Y offset
- a flag indicating whether the vector pertains to the top or bottom field of interlaced video
The (X, Y) vector is relative to the top-left coordinate of a block. The fractional pel resolution of the vectors depends on the motion vector coding mode.
Motion
This structure encapsulates motion vectors along with mode information.
- prediction mode, as enumerated in Hybrid Prediction Modes
- motion vector data structure
- differential motion vector data structure, represented in quarter-pel units
The SRD contains the following note accompanying this data structure:
/*
 * If Two Reference Images (NUMREF=1) then:
 *   Y=2*(YValue)+PredFlag
 *   PredFlag: 0=dominant 1=non-dominant
 */
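The packing described in that comment can be sketched as a pair of helpers. The function names are made up for illustration; the arithmetic is just the formula from the comment and its inverse.

```c
#include <assert.h>

/* Y = 2 * YValue + PredFlag, as stated in the SRD comment. */
static int pack_field_mv_y(int y_value, int pred_flag)
{
    assert(pred_flag == 0 || pred_flag == 1);
    return 2 * y_value + pred_flag;
}

/* Inverse of pack_field_mv_y(); works for negative Y as well. */
static void unpack_field_mv_y(int y, int *y_value, int *pred_flag)
{
    *pred_flag = ((y % 2) + 2) % 2;   /* 0 = dominant, 1 = non-dominant */
    *y_value = (y - *pred_flag) / 2;
}
```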
Motion Vector History
This data structure consists of an array of 4 motion vector data structures which stores the motion vector history for 4 Y blocks, used for direct mode.
Inter Block
An inter block data structure requires the following information:
- an array of 4 integers representing the number of non-zero coefficients (DC and AC together) for up to 4 sub-blocks in the block
- 2 motion structures, 1 for backward prediction and 1 for forward prediction
Block
This is the information that the SRD maintains for an individual block:
- block type, as defined in Block Types section
- a flag indicating whether there are non-zero AC coefficients for an intra block, or non-zero DC or AC coefficients for an inter block
- a C union that contains either an intra or inter block data structure
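Putting the preceding sections together, the block-level structures might be laid out roughly as below. This is a sketch only: the field names, the integer widths, and the order of the backward/forward motion entries are assumptions, not the SRD's actual declarations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int16_t x, y;          /* offset relative to the block's top-left corner */
    uint8_t bottom_field;  /* field flag for interlaced video */
} MotionVector;

typedef struct {
    int          hybrid_pred;  /* 0 = left, 1 = top, 2 = none */
    MotionVector mv;
    MotionVector dmv;          /* differential MV, quarter-pel units */
} Motion;

typedef struct {
    int     num_nonzero_ac;
    int16_t dc;            /* quantized DC, kept for prediction */
    int16_t ac_top[7];     /* quantized AC along the top row */
    int16_t ac_left[7];    /* quantized AC along the left column */
    uint8_t overlap[16];   /* bottom two 8-pixel rows, for overlap smoothing */
} IntraBlock;

typedef struct {
    int    num_coefs[4];   /* non-zero coefficients per sub-block */
    Motion motion[2];      /* assumed order: [0] backward, [1] forward */
} InterBlock;

typedef struct {
    int block_type;        /* 0..7, see Block Types */
    int coded;             /* non-zero coefficients present */
    union {
        IntraBlock intra;
        InterBlock inter;
    } u;
} Block;

/* Reset a block and mark it as intra (type 5) with the given DC value. */
static void block_init_intra(Block *b, int16_t dc)
{
    assert(b != NULL);
    memset(b, 0, sizeof *b);
    b->block_type = 5;
    b->u.intra.dc = dc;
}
```

The union keeps the per-block footprint down, since a block is never intra and inter at the same time; the block type tells the reader which member is valid.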
Quantizer
This data structure maintains information about quantization parameters.
- quantizer step, range 0..31
- quantizer half-step, either 0 or 1
- a flag indicating whether the quantization is uniform or non-uniform
Macroblock Properties
These properties are associated with various macroblock coding options:
- a block is intra-coded, or has 1, 2, or 4 motion vectors associated
- a block predicts backwards from a previous frame, from a forward frame, from both a backward and a forward frame, or uses "direct" mode
- a flag that indicates whether MVs apply to interlaced fields
- a flag that indicates that "bottom field different direction to top"
- a flag that indicates "field transform"
AC Prediction
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: AC prediction off
- 1: AC prediction on
- 2: AC prediction absent (no blocks to predict from?)
Macroblock
The SRD maintains the following information about an individual macroblock:
- macroblock type: this is a combination of properties enumerated in Macroblock Properties
- AC prediction status as enumerated in AC Prediction
- block type as enumerated in Block Types; might be "any"
- a flag indicating whether the overlap filter is active for this macroblock
- a flag indicating whether the macroblock is motion predicted only (presumably, this means that a predicted block is formed but no other transform/addition occurs for the MB)
- a 6-bit flag vector that indicates which of the 6 constituent blocks in the MB are coded
- a 4-bit flag vector that indicates motion vector block pattern-- these flags are set if the differential MV for a respective Y block is not 0
- quantizer data structure for the macroblock
- 6 block structures
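The 6-bit coded-block flag vector can be handled with a few bit operations. The sketch below assumes that bit i corresponds to block i, with the blocks ordered Y0..Y3, U, V; the SRD may order the bits differently, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed block order within a macroblock. */
enum {
    MB_BLOCK_Y0, MB_BLOCK_Y1, MB_BLOCK_Y2, MB_BLOCK_Y3,
    MB_BLOCK_U, MB_BLOCK_V,
    MB_NUM_BLOCKS
};

/* Return the flag vector with the given block marked as coded. */
static uint8_t mb_mark_coded(uint8_t flags, int block)
{
    assert(block >= 0 && block < MB_NUM_BLOCKS);
    return (uint8_t)(flags | (1u << block));
}

/* Test whether the given block is marked as coded. */
static int mb_is_coded(uint8_t flags, int block)
{
    assert(block >= 0 && block < MB_NUM_BLOCKS);
    return (flags >> block) & 1;
}

/* Count how many of the 6 blocks are coded. */
static int mb_coded_count(uint8_t flags)
{
    int i, n = 0;
    for (i = 0; i < MB_NUM_BLOCKS; i++)
        n += (flags >> i) & 1;
    return n;
}
```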
Picture Format
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: picture is a progressive frame
- 1: picture is an interlaced frame
- 2: picture consists of 2 interlaced fields
- 3: picture format has not been determined
Bitstream Profile
These are the supported profiles in the VC-1 coding scheme.
- 0: simple profile
- 1: main profile
- 2: reserved
- 3: advanced profile
Profile Level Enumeration
- simple/main profiles:
  - 0: low
  - 1: medium
  - 2: high
- advanced profile:
  - 0..4: levels 0..4
  - levels 5..7 are reserved
- 255 indicates that the level is unknown
Chroma Format
The SRD only supports one chroma format: YUV 4:2:0, which is format 1. Ostensibly, there are 2 bits in the bitstream to define chroma format. Modes 0, 2, and 3 are all reserved.
Color Primaries
- 0: color primaries are forbidden
- 1: ITU-R BT-709
- 2: unspecified
- 3: reserved
- 4: reserved
- 5: EBU Tech 3213
- 6: SMPTE C
- 7-255: reserved
Transfer Characteristics
These properties are encoded into the bitstream and describe the characteristics of the source bitstream:
- 0: forbidden
- 1: ITU-R BT-709
- 2: unspecified
- 3: reserved
- 4: reserved
- 5: reserved
- 6: reserved
- 7: SMPTE 240M
- 8-255: reserved
Matrix Coefficients
- 0: forbidden
- 1: ITU-R BT-709
- 2: unspecified
- 3: reserved
- 4: reserved
- 5: reserved
- 6: SMPTE 170M
- 7: SMPTE 240M
- 8-255: reserved
Quantizer Modes
- 0: quantizer implied by quantizer step size
- 1: quantizer explicitly signaled
- 2: non-uniform quantizer
- 3: uniform quantizer
Picture Types
- 0: I-frame-- intraframe/field
- 1: P-frame-- predicted frame/field
- 2: B-frame-- bi-directionally predicted frame/field
- 3: BI-frame-- ??? perhaps an I-frame upon which no other frames depend
- 4: skipped
Scaling Modes
This enumeration defines whether there will be any scaling in the picture before display:
- 0: 1x1 = no scaling
- 1: 2x1 = horizontal scaling
- 2: 1x2 = vertical scaling
- 3: 2x2 = horizontal and vertical scaling
Motion Vector Ranges
- Range #0:
  - x component range = -64..63
  - y component range = -32..31
- Range #1:
  - x component range = -128..127
  - y component range = -64..63
- Range #2:
  - x component range = -512..511
  - y component range = -128..127
- Range #3:
  - x component range = -1024..1023
  - y component range = -256..255
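The four ranges lend themselves to a small lookup table. A minimal sketch, with invented names, that checks whether a vector falls inside a given range:

```c
#include <assert.h>

typedef struct {
    int x_min, x_max;
    int y_min, y_max;
} MvRange;

/* Values copied from the list above, indexed by range number. */
static const MvRange mv_ranges[4] = {
    {   -64,   63,  -32,  31 },
    {  -128,  127,  -64,  63 },
    {  -512,  511, -128, 127 },
    { -1024, 1023, -256, 255 }
};

/* Return 1 if (x, y) lies within the selected range, else 0. */
static int mv_in_range(int range_index, int x, int y)
{
    const MvRange *r;
    assert(range_index >= 0 && range_index < 4);
    r = &mv_ranges[range_index];
    return x >= r->x_min && x <= r->x_max &&
           y >= r->y_min && y <= r->y_max;
}
```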
Differential Motion Vector Ranges
- 0: no extended DMV
- 1: extended DMV horizontal/X
- 2: extended DMV vertical/Y
- 3: extended DMV horizontal & vertical
Macroblock Quantizer Step Sizes
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: all macroblocks use PQUANT
- 1: edge MBs use ALTPQUANT
- 2: left/top MBs use ALTPQUANT
- 3: top/right MBs use ALTPQUANT
- 4: right/bottom MBs use ALTPQUANT
- 5: bottom/left MBs use ALTPQUANT
- 6: left MBs use ALTPQUANT
- 7: top MBs use ALTPQUANT
- 8: right MBs use ALTPQUANT
- 9: bottom MBs use ALTPQUANT
- 10: PQUANT vs. ALTPQUANT is selected per MB
- 11: quantizer select per MB
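One plausible reading of modes 0..9 is that "edge" means the outermost row or column of macroblocks in the picture. Under that assumption, the choice between PQUANT and ALTPQUANT could be sketched as follows; the names are hypothetical, and modes 10 and 11 are reported as "decided per macroblock" rather than resolved here.

```c
#include <assert.h>

typedef enum {
    DQ_ALL_PQUANT = 0, DQ_ALL_EDGES = 1,
    DQ_LEFT_TOP = 2, DQ_TOP_RIGHT = 3, DQ_RIGHT_BOTTOM = 4, DQ_BOTTOM_LEFT = 5,
    DQ_LEFT = 6, DQ_TOP = 7, DQ_RIGHT = 8, DQ_BOTTOM = 9,
    DQ_PER_MB_BINARY = 10, DQ_PER_MB_ANY = 11
} MbQuantMode;

/*
 * Returns 1 if the MB at (mb_x, mb_y) uses ALTPQUANT, 0 if it uses PQUANT,
 * and -1 if the mode leaves the decision to per-macroblock signaling.
 * mb_w and mb_h are the picture dimensions in macroblocks.
 */
static int mb_uses_altpquant(MbQuantMode mode, int mb_x, int mb_y,
                             int mb_w, int mb_h)
{
    int left, top, right, bottom;

    assert(mb_x >= 0 && mb_x < mb_w && mb_y >= 0 && mb_y < mb_h);
    left   = (mb_x == 0);
    top    = (mb_y == 0);
    right  = (mb_x == mb_w - 1);
    bottom = (mb_y == mb_h - 1);

    switch (mode) {
    case DQ_ALL_PQUANT:   return 0;
    case DQ_ALL_EDGES:    return left || top || right || bottom;
    case DQ_LEFT_TOP:     return left || top;
    case DQ_TOP_RIGHT:    return top || right;
    case DQ_RIGHT_BOTTOM: return right || bottom;
    case DQ_BOTTOM_LEFT:  return bottom || left;
    case DQ_LEFT:         return left;
    case DQ_TOP:          return top;
    case DQ_RIGHT:        return right;
    case DQ_BOTTOM:       return bottom;
    default:              return -1;  /* modes 10 and 11 */
    }
}
```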
Bitplane Coding Methods
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: normal-2 method
- 1: normal-6 method
- 2: rowskip method
- 3: colskip method
- 4: diff-2 method
- 5: diff-6 method
- 6: uncompressed
Overlap Filter Modes
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: disable overlap filter
- 1: enable overlap filter for all macroblocks
- 2: overlap filter is enabled for select macroblocks
Motion Vector Modes
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
- 0: 1 motion vector, half-pel, bilinear interpolation
- 1: 1 motion vector, half-pel, bicubic interpolation
- 2: 1 motion vector, quarter-pel, bicubic interpolation
- 3: mixed motion vectors, quarter-pel, bicubic interpolation
- 4: intensity compensation
Intensity Compensation
The SRD uses the following data structure to maintain information about intensity compensation:
- a flag to indicate whether intensity compensation is enabled
- IC scale
- IC shift
B Fraction
This data structure tracks B-fraction data:
- B-fraction numerator
- B-fraction denominator
- scalefactor: approximate numerator * 256 / denominator
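The scale factor relationship can be sketched with plain integer arithmetic. Truncating division is an assumption here: the word "approximate" above suggests the SRD may round differently or use a precomputed table. Names are hypothetical.

```c
#include <assert.h>

typedef struct {
    int numerator;
    int denominator;
    int scalefactor;  /* roughly numerator * 256 / denominator */
} BFraction;

/* Build a B-fraction entry, deriving the scale factor by truncating division. */
static BFraction bfraction_make(int numerator, int denominator)
{
    BFraction f;
    assert(denominator > 0 && numerator > 0 && numerator < denominator);
    f.numerator = numerator;
    f.denominator = denominator;
    f.scalefactor = (numerator * 256) / denominator;
    return f;
}
```

With this rounding, a fraction of 1/2 gives 128, 1/3 gives 85, and 2/3 gives 170.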
Hypothetical Reference Decoder
The SRD contains data structures pertaining to a hypothetical reference decoder involving a leaky bucket algorithm.
Pan Scan Window
This data structure pertains to pan and scan windows:
- horizontal offset in pixels
- vertical offset in pixels
- width in pixels
- height in pixels
Pan Scan Parameters
This data structure contains a flag indicating whether pan and scan is present, and an array of 3 Pan Scan Window data structures (3 is the maximum supported).
Sequence And Layer Parameters
The SRD maintains all of the information for the sequence layer:
- profile (simple, main, or advanced)
- maximum coded width
- maximum coded height
- coded width
- coded height
- display width
- display height
- aspect width
- aspect height
- profile level
- interface flag
- frame rate numerator (0 unless otherwise specified)
- frame rate denominator (the comments list 1000, 1001, and 32 as valid values, and claim that it is 0 if not specified, which sounds potentially problematic)
- color format indicator flag
- chroma format
- color primaries
- transfer characteristics
- matrix coefficients
- hypothetical reference decoder
- loop filter flag
- multi resolution coding flag
- fast chrominance motion compensation flag
- extended motion vector flag
- extended differential motion vector flag
- d-quant
- VS transform flag
- overlapped transform flag
- sync marker flag
- range reduction flag
- max B-frames
- quantizer mode
- post processing flag
- frame counter flag
- pull down flag
- PsF (???)
- Q framerate for post processing
- Q bitrate for post processing
- pan scan flag
- reserved RTM flag
- frame interpolation flag
- range scale Y (comment: scale value times 8)
- range scale UV (comment: scale value times 8)
- number of pan scan windows
- broken link flag
- closed entry flag
- refdist flag (refdist refers to distance to previous reference frame)
- frame user data present flag
- end-of-sequence marker present flag
Start Codes
These are the various start codes that the SRD defines:
- 0x0A: end of sequence
- 0x0B: slice
- 0x0C: field
- 0x0D: frame header
- 0x0E: entry point header
- 0x0F: sequence header
- 0x1B: user-defined slice
- 0x1C: user-defined field
- 0x1D: user-defined frame header
- 0x1E: user-defined entry point header
- 0x1F: user-defined sequence header
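The values above are single bytes. The sketch below assumes they are the suffix byte following a 3-byte 0x00 0x00 0x01 prefix, a convention common to MPEG-style bitstreams that this page does not itself state. The enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SC_END_OF_SEQUENCE = 0x0A,
    SC_SLICE           = 0x0B,
    SC_FIELD           = 0x0C,
    SC_FRAME_HEADER    = 0x0D,
    SC_ENTRY_POINT     = 0x0E,
    SC_SEQUENCE_HEADER = 0x0F
} StartCode;

/* The user-defined codes 0x1B..0x1F mirror 0x0B..0x0F with 0x10 added. */
static int start_code_is_user_defined(uint8_t code)
{
    return code >= 0x1B && code <= 0x1F;
}

/*
 * Search buf[from..len) for the assumed 00 00 01 prefix.
 * Returns the offset of the suffix byte, or -1 if none is found.
 */
static long find_start_code(const uint8_t *buf, size_t len, size_t from)
{
    size_t i;
    assert(buf != NULL);
    for (i = from; i + 3 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
            return (long)(i + 3);
    }
    return -1;
}
```

A caller would typically loop, feeding the returned offset back in as the next starting position and dispatching on the suffix byte.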
Component
The SRD defines a component as a single Y, U, or V plane and associates these members with the data structure:
- data: the raw bytes that comprise the Y, U, or V sample data
- bytes/line, a.k.a. stride
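A component of this shape is addressed by row and column through the stride. A minimal sketch, assuming 8-bit samples and hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *data;    /* raw sample bytes for one Y, U, or V plane */
    int      stride;  /* bytes per line; may exceed the visible width */
} Component;

/* Fetch the sample at column x, row y. */
static uint8_t component_get(const Component *c, int x, int y)
{
    assert(c != NULL && c->data != NULL && x >= 0 && y >= 0);
    return c->data[(ptrdiff_t)y * c->stride + x];
}

/* Store a sample at column x, row y. */
static void component_set(Component *c, int x, int y, uint8_t v)
{
    assert(c != NULL && c->data != NULL && x >= 0 && y >= 0);
    c->data[(ptrdiff_t)y * c->stride + x] = v;
}
```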
Field
This data structure defines all of the parameters that pertain to a particular field:
- picture type
- conditional overlap filter mode
- quantization mode
- motion vector mode
- motion vector range
- block transform type
- post processing flag
- extended X differential motion vector range
- extended Y differential motion vector range
- number of reference fields (either 1 or 2)
- reference field (either last or last-but-one)
- motion vector VLC table (0..7)
- MB mode VLC table (0..7)
- block pattern 2 motion vector table (0..3)
- block pattern 4 motion vector table (0..3)
- inter-coded block coding pattern VLC table (0..3)
- AC coding set to use for intra-coded Y blocks (0..2)
- AC coding set to use for all inter-coded blocks or U and V intra (0..2)
- DC coding set (0..1)
- rows per slice (0 = no slicing used)
Picture
The picture is the fundamental data unit in the VC-1 coding scheme. A picture can be one of the following things:
- a progressive frame
- an interlaced top field
- an interlaced bottom field
- an interlaced frame
The SRD maintains the following information about a picture:
- frame number (modulo 1<<32)
- picture format
- Y component data structure
- U component data structure
- V component data structure
- 2 field data structures
- picture resolution index
- top field first flag
- repeat first field flag
- range reduction used flag
- frame interpolation hint flag
- chrominance sample format flag
- repeat frame count
- pan scan parameters data structure
- post processing mode
Scale Motion Vectors
This data structure contains information about scaling motion vectors for interlaced frames:
- scale (comment: down scale factor * 256)
- scale 1 (comment: up scale factor * 256) if in zone 1
- scale 2 (comment: down scale factor * 256) if not in zone 1
- zone 1 X size
- zone 1 Y size
- zone 1 X offset
- zone 1 Y offset
- flag indicating scaling up or down for opposite
- flag indicating top or bottom field
- motion vector range
- motion vector mode
Interpolation
This data structure contains information to be passed to a bilinear or bicubic interpolation function:
- component data structure
- width of resulting filtered rectangle
- height of resulting filtered rectangle
- flag indicating rounding behavior
Padding Modes
- simple or main profile - pad from macroblock edge
- advanced profile progressive - pad from image edge
- advanced profile interlaced field padding
Rectangle
Nothing complicated about this data structure-- it's just 2 (X, Y) coordinate pairs specifying the upper-left and lower-right corners of a rectangle.
Image Position
This data structure contains rectangles to control padding and cropping:
- total width of buffer
- total height of buffer
- image rectangle in pels relative to buffer origin
- rectangle to pad outwards from in pels relative to buffer origin
- rectangle limits to pad outwards to in pels relative to buffer origin
Reference Picture
This data structure contains all of the information to comprise a reference picture (I-frame).
- valid flag
- broken link flag-- reference is no longer available due to a broken link
- parameter indicating whether top field, bottom field, or both are padded
- range Y scale (comment: Y scaling factor times 8)
- range UV scale (comment: UV scaling factor times 8)
- number of frames between this and the last reference frame
- frame number modulo (1<<32)
- top field first flag
- repeat first field flag
- PsF (???)
- pan and scan parameters data structure
- frame interpolation hint flag (comment indicates it is not used in decoding process)
- chrominance plane sampling mode, pertains to interlaced modes
- repeated frame count
- post processing mode
- coded width
- coded height
- max coded width
- max coded height
- picture format
- 2 motion vector ranges, 1 for each field
- 2 picture types, 1 for each field
- padding mode
- picture resolution scaling mode
- Y component data structure
- U component data structure
- V component data structure
- pointer to Y data top-left corner
- pointer to U data top-left corner
- pointer to V data top-left corner
- image position data structure indicating position of Y samples in image buffer
- image position data structure indicating position of C samples in image buffer
Level Limits
This data structure holds information about various limits at each profile and level.
- max macroblocks per second
- max macroblocks per frame
- max peak transmission rate in kilobits per second
- max buffer size in multiples of 16 kilobits
- motion vector range allowed
Position
This data structure describes the current macroblock being processed:
- picture type
- picture format
- profile
- motion vector mode
- motion vector range
- flag indicating top vs. bottom field
- flag indicating first vs. second field
- pointer to the current macroblock data structure
- pointer to the start of the macroblock circular data structure
- pointer to the current position in the motion vector history buffer
- circular buffer size in macroblocks
- X macroblock offset in current slice
- Y macroblock offset in current slice
- Y macroblock offset of slice in picture
- width in macroblocks of coded picture
- height in macroblocks of coded picture
- coded width
- coded height
- max coded width
- max coded height
- picture quantizer (PQUANT)
- B-fraction syntax element
- number of reference fields, minus 1
- reference field when previous field is 0
- bias to add to intra blocks after transform
- Y scaling factor (times 8)
- UV scaling factor (times 8)
- fast chrominance motion compensation flag
- picture resolution scale mode
- reference picture data structure: old I/P
- reference picture data structure: new/current I/P
- reference picture data structure: reconstructed B picture
- reference picture data structure: backup copy of reference before IC applied
- 2 scale motion vector data structures (1 forward, 1 backward)
- 6 64-element arrays for reconstructing samples