VC-1 Data Structures
Revision as of 08:13, 24 April 2006
Part of Understanding VC-1
This page is a discussion of the various data structures and constant enumerations employed in the SMPTE Reference Decoder (SRD) VC-1 reference implementation. These are mostly found in the file vc1types.h.
Macroblocks, Blocks, and Sub-blocks
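A macroblock covers a 16x16 block of pixels in a YUV 4:2:0 colorspace. A macroblock consists of 6 blocks: four 8x8 Y blocks, one 8x8 U block, and one 8x8 V block. Because the chroma planes are subsampled by 2 in each direction, each chroma block covers the full 16x16 area. Further, these blocks may be divided into sub-blocks of sizes 8x4, 4x8, or 4x4.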
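To illustrate the 4:2:0 layout, the following C sketch computes where each of the six 8x8 blocks of a macroblock starts within its plane. The block numbering (0-3 for the Y blocks in raster order, 4 for U, 5 for V) and the function name are assumptions for illustration, not taken from the SRD.

```c
#include <stdbool.h>

/* Hypothetical helper: pixel origin of block `blk` (0..5) of the macroblock
 * at macroblock coordinates (mb_x, mb_y). Assumed numbering: blocks 0-3 are
 * the Y blocks in raster order, block 4 is U, block 5 is V.
 * Returns true for a luma block, false for a chroma block. */
static bool block_origin(int mb_x, int mb_y, int blk, int *px, int *py)
{
    if (blk < 4) {
        /* Luma: the 16x16 area is split into four 8x8 quadrants. */
        *px = mb_x * 16 + (blk & 1) * 8;
        *py = mb_y * 16 + (blk >> 1) * 8;
        return true;
    }
    /* Chroma (4:2:0): each 8x8 U or V block covers the whole 16x16 area
     * at half resolution, so it starts at (mb_x * 8, mb_y * 8). */
    *px = mb_x * 8;
    *py = mb_y * 8;
    return false;
}
```

For example, the bottom-right Y block (block 3) of the macroblock at (2, 1) starts at luma pixel (40, 24), while its V block starts at chroma pixel (16, 8).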
Intra Block
An intra block data structure requires the following information:
- number of non-zero AC coefficients
- quantized DC coefficient for prediction
- 7 quantized AC coefficients along the top row of the block, used for prediction
- 7 quantized AC coefficients along the left side of the block, used for prediction
- 16 samples representing the bottom 2 8-pixel rows of the block, maintained for overlap smoothing filters
Motion Vector
This is the information maintained for an individual motion vector:
- X offset
- Y offset
- a flag indicating whether the vector pertains to the top or bottom field of interlaced video
The (X, Y) vector is relative to the top-left coordinate of a block. The fractional pel resolution of the vectors depends on the motion vector coding mode.
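A sketch of how such a vector might be held and used, assuming the offsets are stored in quarter-pel units (the unit the Motion structure uses for differential vectors). It shows how one component splits into a whole-pixel offset and a fractional phase that would select an interpolation filter. The struct and function names are hypothetical, not the SRD's.

```c
#include <stdbool.h>

/* Hypothetical mirror of the fields listed above. */
typedef struct {
    int  x;            /* horizontal offset, assumed quarter-pel units */
    int  y;            /* vertical offset, assumed quarter-pel units */
    bool bottom_field; /* true if the vector refers to the bottom field */
} MotionVector;

/* Split a quarter-pel component into whole pixels (rounded toward minus
 * infinity) and a fractional phase in 0..3. The division is written out
 * so the result does not depend on right-shifting a negative value. */
static void split_qpel(int v, int *whole, int *frac)
{
    int q = v / 4;
    int r = v % 4;
    if (r < 0) {   /* C division truncates toward zero; adjust to floor */
        q -= 1;
        r += 4;
    }
    *whole = q;
    *frac = r;
}
```

A vector component of -1 (a quarter pixel to the left) therefore becomes a whole-pixel offset of -1 with phase 3, so the reference position is always "integer offset plus a non-negative fraction".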
Motion
This structure encapsulates motion vectors along with mode information.
- prediction mode, as enumerated in Hybrid Prediction Modes
- motion vector data structure
- differential motion vector data structure, represented in quarter-pel units
The SRD contains the following note accompanying this data structure:
/*
 * If Two Reference Images (NUMREF=1) then:
 *   Y=2*(YValue)+PredFlag
 *   PredFlag: 0=dominant 1=non-dominant
 */
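The note describes packing the field-prediction flag into the low bit of the Y component when two reference fields are in use. A minimal sketch of that packing and its inverse; the function names are hypothetical.

```c
/* Pack as described in the SRD note: Y = 2*(YValue) + PredFlag,
 * where PredFlag is 0 for the dominant and 1 for the non-dominant field. */
static int pack_y(int y_value, int pred_flag)
{
    return 2 * y_value + pred_flag;
}

/* Recover both parts. The low bit is the flag (this also holds for negative
 * Y on two's-complement machines). Subtracting the flag leaves an even
 * number, so the division by 2 is exact and needs no rounding rule. */
static void unpack_y(int y, int *y_value, int *pred_flag)
{
    *pred_flag = y & 1;
    *y_value = (y - *pred_flag) / 2;
}
```

For example, a vertical component of -3 predicted from the non-dominant field is stored as -5.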
Motion Vector History
This data structure consists of an array of 4 motion vector data structures which stores the motion vector history for 4 Y blocks, used for direct mode.
Inter Block
An inter block data structure requires the following information:
- an array of 4 integers representing the number of non-zero coefficients (DC and AC together) for up to 4 sub-blocks in the block
- 2 motion structures, 1 for backward prediction and 1 for forward prediction
Block
This is the information that the SRD maintains for an individual block:
- block type, as defined in Block Types section
- a flag indicating whether there are non-zero AC coefficients for an intra block, or non-zero DC or AC coefficients for an inter block
- a C union that contains either an intra or inter block data structure
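The block layout can be sketched in C as a tagged union. All type and field names below are hypothetical (they are not copied from vc1types.h); the fields follow the Intra Block, Inter Block, and Block lists above, and the Motion structure is reduced to a bare vector for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BLK_INTRA, BLK_INTER } BlockKind;  /* simplified block type */

typedef struct {
    int     num_ac;          /* number of non-zero AC coefficients */
    int16_t dc;              /* quantized DC coefficient for prediction */
    int16_t ac_top[7];       /* top-row AC coefficients for prediction */
    int16_t ac_left[7];      /* left-column AC coefficients for prediction */
    int16_t bottom_rows[16]; /* bottom two 8-pixel rows, for overlap smoothing */
} IntraBlock;

typedef struct { int x, y; } MotionSketch;  /* stand-in for the Motion structure */

typedef struct {
    int          num_coeffs[4]; /* non-zero DC+AC count per sub-block */
    MotionSketch motion[2];     /* [0] backward, [1] forward */
} InterBlock;

typedef struct {
    BlockKind kind;
    bool      coded;  /* non-zero AC (intra) or DC/AC (inter) present */
    union {
        IntraBlock intra;
        InterBlock inter;
    } u;
} Block;

/* Coefficient count read from whichever union member is live:
 * intra blocks report AC only, inter blocks sum DC+AC over sub-blocks. */
static int block_coeff_count(const Block *b)
{
    if (b->kind == BLK_INTRA)
        return b->u.intra.num_ac;
    int n = 0;
    for (int i = 0; i < 4; i++)
        n += b->u.inter.num_coeffs[i];
    return n;
}
```

The `kind` tag is what makes the union safe to read: code must check it before touching `u.intra` or `u.inter`.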
Quantizer
This data structure maintains information about quantization parameters.
- quantizer step, range 0..31
- quantizer half-step, either 0 or 1
- a flag indicating whether the quantization is uniform or non-uniform
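To show how the three quantizer fields interact, here is a sketch of AC coefficient reconstruction in the style commonly used by VC-1 decoders. The step is doubled and the half-step bit added, and the non-uniform quantizer adds one step in the direction of the coefficient's sign. This follows the general shape of SMPTE 421M inverse quantization but is not taken from the SRD; the names are hypothetical, and the exact formula should be checked against the specification.

```c
#include <stdbool.h>

/* Hypothetical mirror of the Quantizer fields. */
typedef struct {
    int  step;     /* quantizer step */
    int  halfstep; /* 0 or 1 */
    bool uniform;  /* true: uniform quantizer, false: non-uniform */
} Quantizer;

/* Reconstruct one quantized AC level (assumed formula, see lead-in). */
static int dequant_ac(const Quantizer *q, int level)
{
    if (level == 0)
        return 0;
    int double_step = 2 * q->step + q->halfstep;
    int value = level * double_step;
    if (!q->uniform)  /* non-uniform: push the value away from zero */
        value += (level < 0) ? -q->step : q->step;
    return value;
}
```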
Macroblock Properties
These properties are associated with various macroblock coding options:
- the macroblock is intra-coded or has 1, 2, or 4 associated motion vectors
- the macroblock predicts backwards from a previous frame, from a forward frame, from both a backward and a forward frame, or uses "direct" mode
- a flag that indicates whether MVs apply to interlaced fields
- a flag that indicates that "bottom field different direction to top"
- a flag that indicates "field transform"
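Because the macroblock type is a combination of these properties, a natural C representation is a bit mask. The flag names and bit positions below are assumptions for illustration; the SRD's actual enumeration is documented on the separate enumerations page.

```c
#include <stdbool.h>

/* Hypothetical bit assignments for the properties listed above. */
enum {
    MB_INTRA    = 1u << 0,
    MB_1MV      = 1u << 1,
    MB_2MV      = 1u << 2,
    MB_4MV      = 1u << 3,
    MB_BACKWARD = 1u << 4,
    MB_FORWARD  = 1u << 5,
    MB_DIRECT   = 1u << 6,
    MB_FIELD_MV = 1u << 7,  /* MVs apply to interlaced fields */
    MB_BOT_DIFF = 1u << 8,  /* bottom field different direction to top */
    MB_FIELD_TX = 1u << 9   /* field transform */
};

/* Bidirectional prediction sets both direction bits. */
static bool mb_is_bidirectional(unsigned type)
{
    return (type & (MB_BACKWARD | MB_FORWARD)) == (MB_BACKWARD | MB_FORWARD);
}

/* Number of motion vectors implied by the type; intra has none. */
static int mb_num_mvs(unsigned type)
{
    if (type & MB_INTRA) return 0;
    if (type & MB_4MV)   return 4;
    if (type & MB_2MV)   return 2;
    return (type & MB_1MV) ? 1 : 0;
}
```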
Macroblock
The SRD maintains the following information about an individual macroblock:
- macroblock type: this is a combination of properties enumerated in Macroblock Properties
- AC prediction status as enumerated in AC Prediction
- block type as enumerated in Block Types; might be "any"
- a flag indicating whether the overlap filter is active for this macroblock
- a flag indicating whether the macroblock is motion-predicted only (presumably, this means that a predicted block is formed but no further transform/addition occurs for the MB)
- a 6-bit flag vector that indicates which of the 6 constituent blocks in the MB are coded
- a 4-bit flag vector that indicates motion vector block pattern-- these flags are set if the differential MV for a respective Y block is not 0
- quantizer data structure for the macroblock
- 6 block structures
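The 6-bit coded-block flags and the 4-bit motion vector block pattern are ordinary bit vectors. The sketch below assumes block 0 (the top-left Y block) maps to the most significant bit of each vector; that ordering and the helper names are assumptions, so check the SRD before reusing them.

```c
#include <stdbool.h>

/* Assumed ordering: block 0 -> bit 5, ..., block 5 (V) -> bit 0. */
static bool block_is_coded(unsigned cbp6, int blk)
{
    return (cbp6 >> (5 - blk)) & 1u;
}

/* Same idea for the 4-bit MV block pattern (Y blocks 0..3 -> bits 3..0):
 * a set bit means the differential MV for that Y block is non-zero. */
static bool y_block_has_dmv(unsigned mvbp4, int blk)
{
    return (mvbp4 >> (3 - blk)) & 1u;
}

/* Number of coded blocks in the macroblock. */
static int coded_block_count(unsigned cbp6)
{
    int n = 0;
    for (int blk = 0; blk < 6; blk++)
        n += block_is_coded(cbp6, blk);
    return n;
}
```

Under this ordering, a pattern of binary 100001 means only the top-left Y block and the V block carry coefficients.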
Intensity Compensation
The SRD uses the following data structure to maintain information about intensity compensation:
- a flag to indicate whether intensity compensation is enabled
- IC scale
- IC shift
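Intensity compensation remaps reference samples linearly before they are used for motion compensation, and a common way to apply it is a 256-entry lookup table. The sketch below assumes the scale and shift are already in 1/64 fixed point (so scale 64 with shift 0 is the identity); deriving them from the bitstream syntax elements is not shown, and the function names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Build a remapping table: out = clip((scale*in + shift + 32) / 64).
 * The +32 rounds to nearest before dropping the 6 fractional bits.
 * For non-negative v, v / 64 equals v >> 6; for negative v the quotient
 * is <= 0 either way and clips to 0. */
static void build_ic_lut(int scale, int shift, uint8_t lut[256])
{
    for (int i = 0; i < 256; i++) {
        int v = scale * i + shift + 32;
        lut[i] = clip_u8(v / 64);
    }
}
```

With a table in hand, compensating a reference plane is a single lookup per sample.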
B Fraction
This data structure tracks B-fraction data:
- B-fraction numerator
- B-fraction denominator
- scalefactor: approximate numerator * 256 / denominator
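Precomputing the scale factor replaces a later division with a multiply and shift. A sketch with a hypothetical struct mirroring the three fields; integer division truncates, which is why the stored value only approximates numerator * 256 / denominator.

```c
/* Hypothetical mirror of the B-fraction fields. */
typedef struct {
    int numerator;
    int denominator;
    int scalefactor;  /* approx. numerator * 256 / denominator (truncated) */
} BFraction;

static BFraction make_bfraction(int num, int den)
{
    BFraction f = { num, den, (num * 256) / den };
    return f;
}
```

For example, a B-fraction of 1/3 yields 85, slightly below the exact 85.33.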
Hypothetical Reference Decoder
The SRD contains data structures pertaining to a hypothetical reference decoder involving a leaky bucket algorithm.
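The page does not list the HRD fields, so the following is only a generic sketch of the leaky bucket idea, not the SRD's code. Bits arrive in the decoder buffer at a constant rate, each picture's bits are removed when it is decoded, and the stream conforms if the buffer never underflows or overflows. The function name and all parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Check a sequence of coded picture sizes (in bits) against a constant-rate
 * leaky bucket. bits_per_frame is the channel rate divided by the frame rate.
 * Returns false on underflow (a picture not fully received by its decode
 * time) or overflow (the buffer fills beyond its size). */
static bool leaky_bucket_ok(long bits_per_frame, long buffer_size,
                            long initial_fullness,
                            const long *frame_bits, size_t n)
{
    long fullness = initial_fullness;
    for (size_t i = 0; i < n; i++) {
        if (frame_bits[i] > fullness)
            return false;              /* underflow */
        fullness -= frame_bits[i];     /* decoder removes the picture */
        fullness += bits_per_frame;    /* channel delivers the next interval */
        if (fullness > buffer_size)
            return false;              /* overflow */
    }
    return true;
}
```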
Pan Scan Window
This data structure pertains to pan and scan windows:
- horizontal offset in pixels
- vertical offset in pixels
- width in pixels
- height in pixels
Pan Scan Parameters
This data structure contains a flag indicating whether pan and scan is present, and an array of 3 Pan Scan Window data structures (3 is the maximum supported).
Sequence And Layer Parameters
The SRD maintains all of the information for the sequence layer:
- profile (simple, main, or advanced)
- maximum coded width
- maximum coded height
- coded width
- coded height
- display width
- display height
- aspect width
- aspect height
- profile level
- interlace flag
- frame rate numerator (0 unless otherwise specified)
- frame rate denominator (the comments list 1000, 1001, and 32 as valid values and state that it is 0 if not specified, which could be a problem if the value is ever used as a divisor)
- color format indicator flag
- chroma format
- color primaries
- transfer characteristics
- matrix coefficients
- hypothetical reference decoder
- loop filter flag
- multi resolution coding flag
- fast chrominance motion compensation flag
- extended motion vector flag
- extended differential motion vector flag
- d-quant
- VS transform flag
- overlapped transform flag
- sync marker flag
- range reduction flag
- max B-frames
- quantizer mode
- post processing flag
- frame counter flag
- pull down flag
- PsF (progressive segmented frame) flag
- Q framerate for post processing
- Q bitrate for post processing
- pan scan flag
- reserved RTM flag
- frame interpolation flag
- range scale Y (comment: scale value times 8)
- range scale UV (comment: scale value times 8)
- number of pan scan windows
- broken link flag
- closed entry flag
- refdist flag (refdist refers to distance to previous reference frame)
- frame user data present flag
- end-of-sequence marker present flag
Component
The SRD defines a component as a single Y, U, or V plane and associates these members with the data structure:
- data: the raw bytes that comprise the Y, U, or V sample data
- bytes/line, a.k.a. stride
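The stride matters because a plane's rows can be wider than the visible image, for example when the buffer includes padding. A sketch of sample addressing with a hypothetical mirror of the Component structure:

```c
#include <stdint.h>

/* Hypothetical mirror of the Component fields. */
typedef struct {
    uint8_t *data;    /* first sample of the plane */
    int      stride;  /* bytes per line, >= plane width */
} Component;

/* Address of the sample at column x, row y: rows are `stride` bytes apart. */
static uint8_t *component_at(const Component *c, int x, int y)
{
    return c->data + (long)y * c->stride + x;
}
```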
Field
This data structure defines all of the parameters that pertain to a particular field:
- picture type
- conditional overlap filter mode
- quantization mode
- motion vector mode
- motion vector range
- block transform type
- post processing flag
- extended X differential motion vector range
- extended Y differential motion vector range
- number of reference fields (either 1 or 2)
- reference field (either last or last-but-one)
- motion vector VLC table (0..7)
- MB mode VLC table (0..7)
- block pattern 2 motion vector table (0..3)
- block pattern 4 motion vector table (0..3)
- inter-coded block coding pattern VLC table (0..3)
- AC coding set to use for intra-coded Y blocks (0..2)
- AC coding set to use for all inter-coded blocks or U and V intra (0..2)
- DC coding set (0..1)
- rows per slice (0 = no slicing used)
Picture
The picture is the fundamental data unit in the VC-1 coding scheme. A picture can be one of the following things:
- a progressive frame
- an interlaced top field
- an interlaced bottom field
- an interlaced frame
The SRD maintains the following information about a picture:
- frame number (modulo 1<<32)
- picture format
- Y component data structure
- U component data structure
- V component data structure
- 2 field data structures
- picture resolution index
- top field first flag
- repeat first field flag
- range reduction used flag
- frame interpolation hint flag
- chrominance sample format flag
- repeat frame count
- pan scan parameters data structure
- post processing mode
Scale Motion Vectors
This data structure contains information about scaling motion vectors for interlaced frames:
- scale (comment: down scale factor * 256)
- scale 1 (comment: up scale factor * 256) if in zone 1
- scale 2 (comment: down scale factor * 256) if not in zone 1
- zone 1 X size
- zone 1 Y size
- zone 1 X offset
- zone 1 Y offset
- flag indicating scaling up or down for the opposite field
- flag indicating top or bottom field
- motion vector range
- motion vector mode
Interpolation
This data structure contains information to be passed to a bilinear or bicubic interpolation function:
- component data structure
- width of resulting filtered rectangle
- height of resulting filtered rectangle
- flag indicating rounding behavior
Padding Modes
- simple or main profile - pad from macroblock edge
- advanced profile progressive - pad from image edge
- advanced profile interlaced field padding
Rectangle
Nothing complicated about this data structure-- it's just 2 (X, Y) coordinate pairs specifying the upper-left and lower-right corners of a rectangle.
Image Position
This data structure contains rectangles to control padding and cropping:
- total width of buffer
- total height of buffer
- image rectangle in pels relative to buffer origin
- rectangle to pad outwards from in pels relative to buffer origin
- rectangle limits to pad outwards to in pels relative to buffer origin
Reference Picture
This data structure contains all of the information that makes up a reference picture (an I or P picture).
- valid flag
- broken link flag-- reference is no longer available due to a broken link
- parameter indicating whether top field, bottom field, or both are padded
- range Y scale (comment: Y scaling factor times 8)
- range UV scale (comment: UV scaling factor times 8)
- number of frames between this and the last reference frame
- frame number modulo (1<<32)
- top field first flag
- repeat first field flag
- PsF (progressive segmented frame) flag
- pan and scan parameters data structure
- frame interpolation hint flag (comment indicates it is not used in decoding process)
- chrominance plane sampling mode, pertains to interlaced modes
- repeated frame count
- post processing mode
- coded width
- coded height
- max coded width
- max coded height
- picture format
- 2 motion vector ranges, 1 for each field
- 2 picture types, 1 for each field
- padding mode
- picture resolution scaling mode
- Y component data structure
- U component data structure
- V component data structure
- pointer to Y data top-left corner
- pointer to U data top-left corner
- pointer to V data top-left corner
- image position data structure indicating position of Y samples in image buffer
- image position data structure indicating position of C samples in image buffer
Level Limits
This data structure holds information about various limits at each profile and level.
- max macroblocks per second
- max macroblocks per frame
- max peak transmission rate in kilobits per second
- max buffer size in multiples of 16 kilobits
- motion vector range allowed
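The per-frame and per-second macroblock limits can be checked directly from a stream's coded size and frame rate. In this sketch the struct mirror and function name are hypothetical, and the limit values used in the example are illustrative, not taken from the VC-1 level tables.

```c
#include <stdbool.h>

/* Hypothetical mirror of the Level Limits fields. */
typedef struct {
    long max_mb_per_sec;
    long max_mb_per_frame;
    long max_peak_kbps;
    long max_buffer_16kbit;  /* buffer size in units of 16 kilobits */
    int  mv_range;
} LevelLimits;

/* Macroblock counts round partial macroblocks up. The per-second check
 * compares mbs * fps_num against max * fps_den to avoid a division. */
static bool fits_level(const LevelLimits *lim, int width, int height,
                       int fps_num, int fps_den)
{
    long mbs = (long)((width + 15) / 16) * ((height + 15) / 16);
    return mbs <= lim->max_mb_per_frame &&
           mbs * fps_num <= lim->max_mb_per_sec * fps_den;
}
```

With illustrative limits of 1620 macroblocks per frame and 40500 per second, 720x576 at 25 fps fits exactly, while 1280x720 (3600 macroblocks) does not.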
Position
This data structure describes the current macroblock being processed:
- picture type
- picture format
- profile
- motion vector mode
- motion vector range
- flag indicating top vs. bottom field
- flag indicating first vs. second field
- pointer to the current macroblock data structure
- pointer to the start of the macroblock circular data structure
- pointer to the current position in the motion vector history buffer
- circular buffer size in macroblocks
- X macroblock offset in current slice
- Y macroblock offset in current slice
- Y macroblock offset of slice in picture
- width in macroblocks of coded picture
- height in macroblocks of coded picture
- coded width
- coded height
- max coded width
- max coded height
- picture quantizer (PQUANT)
- B-fraction syntax element
- number of reference fields, minus 1
- reference field when previous field is 0
- bias to add to intra blocks after transform
- Y scaling factor (times 8)
- UV scaling factor (times 8)
- fast chrominance motion compensation flag
- picture resolution scale mode
- reference picture data structure: old I/P
- reference picture data structure: new/current I/P
- reference picture data structure: reconstructed B picture
- reference picture data structure: backup copy of reference before IC applied
- 2 scale motion vector data structures (1 forward, 1 backward)
- 6 64-element arrays for reconstructing samples
Bitstream
The SRD maintains a typical bitstream data structure. It simply treats a bytestream as a sequence of bits to be read from left to right, most significant bit of each byte first.
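A minimal MSB-first bit reader in the same spirit. The type and function names are hypothetical, and reading past the end returns zero bits rather than signalling an error, which is a simplification a real decoder would not make.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buf;
    size_t         size;    /* in bytes */
    size_t         bitpos;  /* next bit to read, counted from the start */
} BitReader;

/* Read one bit, most significant bit of each byte first (left to right). */
static unsigned read_bit(BitReader *br)
{
    if (br->bitpos >= br->size * 8)
        return 0;  /* past the end: treat as zero (simplification) */
    unsigned byte = br->buf[br->bitpos >> 3];
    unsigned bit = (byte >> (7 - (br->bitpos & 7))) & 1u;
    br->bitpos++;
    return bit;
}

/* Read n bits (n <= 32) as an unsigned value, first bit most significant. */
static uint32_t read_bits(BitReader *br, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 1) | read_bit(br);
    return v;
}

/* Look at the next n bits without consuming them. */
static uint32_t peek_bits(BitReader *br, int n)
{
    size_t saved = br->bitpos;
    uint32_t v = read_bits(br, n);
    br->bitpos = saved;
    return v;
}
```

A peek operation like this is what a VLC decoder needs: it inspects upcoming bits, then consumes only as many as the matched code is long.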
VLC
The SRD uses a simple and highly inefficient VLC lookup mechanism. A VLC table is an array of entries, each holding:
- the bit pattern of the VLC
- the number of bits in the VLC
- the number that the VLC represents
The first entry of a VLC table has the following meaning:
- bits = 0
- length = number of codes in table
- maximum VLC code length
The SRD's VLC reading function marches through each entry in a table, sequentially, until it finds a bit/length pattern that matches the bits at the current position in the bitstream.
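The linear search described above might look like the following sketch. It mirrors the table layout (a header entry followed by code entries), but the struct and field names are hypothetical, and placing the maximum code length in the header's value field is an assumption. The caller is assumed to pass the next max-length bits of the stream, right-aligned, for example from a peek operation.

```c
#include <stdint.h>

typedef struct {
    uint32_t bits;   /* code bit pattern, right-aligned */
    int      length; /* number of bits in the code */
    int      value;  /* number the code represents */
} VlcEntry;

/* table[0] is the header: length = number of codes, value = maximum code
 * length (assumed placement). `next` holds the upcoming max-length bits.
 * Returns the index of the matching entry, or -1 if none matches; the
 * caller then consumes table[i].length bits. */
static int vlc_find(const VlcEntry *table, uint32_t next)
{
    int count = table[0].length;
    int max_len = table[0].value;
    for (int i = 1; i <= count; i++) {
        /* keep only the first `length` bits of the peeked value */
        uint32_t prefix = next >> (max_len - table[i].length);
        if (prefix == table[i].bits)
            return i;
    }
    return -1;
}
```

Each lookup costs up to one comparison per code, which is why this approach is simple but slow compared with table-driven decoding.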
Bitplane
A bitplane data structure is used for representing a series of bit values which represent properties of the macroblocks in a picture. The data structure has the following properties:
Picture Layer Parameters
The SRD maintains the following information for picture layer parameters:
- frame count
- 2 picture types (per field?)
- buffer fullness
- pq index
- per-picture quantizer mode
- PQUANT
- half q step
- frame transform AC coding set index
- frame transform AC coding set index 2
- intra transform DC table flag
- temporal reference frame counter
- top field first flag
- repeat first field flag
- U&V sample mode flag
- post processing mode
- quantization step size
- ALTPQUANT
- interpolation data structure
- pointer to selected motion vector VLC table
- pointer to selected coded block pattern VLC table
- transform type flag
- repeat frame count
- frame interpolation hint
- overlapping filter mode
- pan scan parameters data structure
- dquant frame flag (comment: per MB quant mode)
- bitplane for AC prediction
- bitplane for MB skip
- bitplane for MV type
- bitplane for Direct MB
- bitplane for overlap flags
- bitplane for Forward MB
- bitplane for FieldTX MB
- extend horizontal differential MV flag
- extend vertical differential MV flag
- pointer to selected VLC table for macroblock modes
- pointer to selected VLC table for macroblock 4 motion vector block pattern table
- pointer to selected VLC table for macroblock 2 motion vector block pattern table
- 2 intensity compensation data structures, for top and bottom fields
State
The SRD maintains the following information about the overall state:
- macroblock position data structure
- picture data structure
- current frame number
- pointer to macroblock data structure
- number of fields per frame
- maximum number of macroblocks per frame
- pointer to the level limits for the combination of profile & level
- sequence layer data structure
- picture layer parameters data structure
- "1 if not first mode 3 escape in frame"
- "Level code size for mode 3 escape, per frame"
- "Run code size for mode 3 escape, per frmae"
- zig zag table index
- flag indicating if frame is first in stream
- flag indicating whether bitplane coding is in use
- number of first coded block in current macroblock
- reference picture data structure-- this is where the current frame will be decoded
- motion vector history buffer
- number of fields present in the current picture
Decoder Configuration
The SRD maintains the following information when configuring the decoder:
- max coded width
- max coded height
- highest profile supported by the decoder
- highest level supported by the decoder
- framerate numerator
- framerate denominator