Difference between revisions of "VC-1 Data Structures"

From MultimediaWiki
(and yet more data structures)
(split the enumerations into a separate page)
Line 5: Line 5:
== Macroblocks, Blocks, and Sub-blocks ==
A macroblock embodies a 16x16 block of pixels in a [[YUV 4:2:0]] colorspace. A macroblock consists of 6 blocks: 4 8x8 Y blocks, 1 U block, and 1 V block. Further, these blocks may be divided into sub-blocks of sizes 8x4, 4x8, or 4x4.
== Block Types ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
0: 8x8 inter-coded block
1: 8x4 inter-coded block
2: 4x8 inter-coded block
3: 4x4 inter-coded block
4: transform type not yet determined
5: intra-coded block, no AC prediction
6: intra-coded block, AC prediction of top row coefficients
7: intra-coded block, AC prediction of left column coefficients


== Intra Block ==
Line 24: Line 13:
* 7 quantized AC coefficients along the left side of the block, used for prediction
* 16 samples representing the bottom 2 8-pixel rows of the block, maintained for overlap smoothing filters
== Hybrid Prediction Modes ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
* 0: predict from left
* 1: predict from top
* 2: no hybrid prediction
== Sub-block Patterns ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
* 0: 8x8 transform, coded
* 1: 8x4 transform, bottom subblock coded
* 2: 8x4 transform, top subblock coded
* 3: 8x4 transform, both subblocks coded
* 4: 4x8 transform, right subblock coded
* 5: 4x8 transform, left subblock coded
* 6: 4x8 transform, both subblocks coded
* 7: 4x4 transform, subblock pattern separate
* 8: 8x8 transform, coded, whole MB
* 9: 8x4 transform, bottom subblock coded, whole MB
* 10: 8x4 transform, top subblock coded, whole MB
* 11: 8x4 transform, both subblocks coded, whole MB
* 12: 4x8 transform, right subblock coded, whole MB
* 13: 4x8 transform, left subblock coded, whole MB
* 14: 4x8 transform, both subblocks coded, whole MB
* 15: 4x4 transform, subblock pattern separate, whole MB
One more stray constant:
* 8: MB level threshold
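
The list above suggests a regular layout: values 8..15 repeat values 0..7 with "whole MB" added, and the stray constant 8 looks like the threshold between the two halves. A minimal C sketch of that reading follows. All identifiers are hypothetical, not SRD names, and the interpretation of the threshold is an assumption.

```c
#include <assert.h>

/* Hypothetical names; numeric values follow the enumeration above. */
enum { SBP_MB_LEVEL = 8 };  /* assumed meaning of "MB level threshold" */

typedef enum { TT_8X8 = 0, TT_8X4, TT_4X8, TT_4X4 } TransformSize;

/* Nonzero if the pattern applies to the whole macroblock (values 8..15). */
static int sbp_is_mb_level(int pattern)
{
    assert(pattern >= 0 && pattern <= 15);
    return pattern >= SBP_MB_LEVEL;
}

/* Transform size implied by the pattern once the whole-MB offset is removed. */
static TransformSize sbp_transform_size(int pattern)
{
    assert(pattern >= 0 && pattern <= 15);
    int base = pattern % SBP_MB_LEVEL;
    if (base == 0) return TT_8X8;
    if (base <= 3) return TT_8X4;   /* 1..3: 8x4 variants */
    if (base <= 6) return TT_4X8;   /* 4..6: 4x8 variants */
    return TT_4X4;                  /* 7: 4x4 */
}
```

For example, pattern 11 would be reported as an 8x4 transform signalled for the whole macroblock.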


== Motion Vector ==
Line 98: Line 60:
* a flag that indicates that "bottom field different direction to top"
* a flag that indicates "field transform"
== AC Prediction ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
* 0: AC prediction off
* 1: AC prediction on
* 2: AC prediction absent (no blocks to predict from?)


== Macroblock ==
Line 116: Line 72:
* quantizer data structure for the macroblock
* 6 block structures
== Picture Format ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
* 0: picture is a progressive frame
* 1: picture is an interlaced frame
* 2: picture consists of 2 interlaced fields
* 3: picture format has not been determined
== Bitstream Profile ==
These are the supported profiles in the VC-1 coding scheme.
* 0: simple profile
* 1: main profile
* 2: reserved
* 3: advanced profile
== Profile Level Enumeration ==
* simple/main profiles:
** 0: low
** 1: medium
** 2: high
* advanced profile:
** 0..4: levels 0..4
* levels 5..7 are reserved
* 255 indicates that the level is unknown
== Chroma Format ==
The SRD only supports one chroma format: [[YUV 4:2:0]], which is format 1. Ostensibly, there are 2 bits in the bitstream to define chroma format. Modes 0, 2, and 3 are all reserved.
== Color Primaries ==
* 0: color primaries are forbidden
* 1: ITU-R BT-709
* 2: unspecified
* 3: reserved
* 4: reserved
* 5: EBU Tech 3213
* 6: SMPTE C
* 7-255: reserved
== Transfer Characteristics ==
These properties are encoded into the bitstream and describe the characteristics of the source bitstream:
* 0: forbidden
* 1: ITU-R BT-709
* 2: unspecified
* 3: reserved
* 4: reserved
* 5: reserved
* 6: reserved
* 7: SMPTE 240M
* 8-255: reserved
== Matrix Coefficients ==
* 0: forbidden
* 1: ITU-R BT-709
* 2: unspecified
* 3: reserved
* 4: reserved
* 5: reserved
* 6: SMPTE 170M
* 7: SMPTE 240M
* 8-255: reserved
== Quantizer Modes ==
* 0: quantizer implied by quantizer step size
* 1: quantizer explicitly signaled
* 2: non-uniform quantizer
* 3: uniform quantizer
== Picture Types ==
* 0: I-frame-- intraframe/field
* 1: P-frame-- predicted frame/field
* 2: B-frame-- bi-directionally predicted frame/field
* 3: BI-frame-- ??? perhaps an I-frame upon which no other frames depend
* 4: skipped
== Scaling Modes ==
This enumeration defines whether there will be any scaling in the picture before display:
* 0: 1x1 = no scaling
* 1: 2x1 = horizontal scaling
* 2: 1x2 = vertical scaling
* 3: 2x2 = horizontal and vertical scaling
== Motion Vector Ranges ==
* Range #0:
** x component range = -64..63
** y component range = -32..31
* Range #1:
** x component range = -128..127
** y component range = -64..63
* Range #2:
** x component range = -512..511
** y component range = -128..127
* Range #3:
** x component range = -1024..1023
** y component range = -256..255
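
The four ranges above fit naturally into a small lookup table. The following C sketch checks a vector against a range index. The type and function names are hypothetical, and the units are whatever units the list above uses, since the page does not state them.

```c
#include <assert.h>

typedef struct { int x_min, x_max, y_min, y_max; } MvRange;  /* hypothetical */

/* Limits copied from Range #0..#3 in the list above. */
static const MvRange mv_ranges[4] = {
    {   -64,   63,  -32,  31 },
    {  -128,  127,  -64,  63 },
    {  -512,  511, -128, 127 },
    { -1024, 1023, -256, 255 },
};

/* Returns 1 if (x, y) lies inside the given range, 0 otherwise. */
static int mv_in_range(int range_index, int x, int y)
{
    assert(range_index >= 0 && range_index < 4);
    const MvRange *r = &mv_ranges[range_index];
    return x >= r->x_min && x <= r->x_max &&
           y >= r->y_min && y <= r->y_max;
}
```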
== Differential Motion Vector Ranges ==
* 0: no extended DMV
* 1: extended DMV horizontal/X
* 2: extended DMV vertical/Y
* 3: extended DMV horizontal & vertical
== Macroblock Quantizer Step Sizes ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
* 0: all macroblocks use PQUANT
* 1: edge MBs use ALTPQUANT
* 2: left/top MBs use ALTPQUANT
* 3: top/right MBs use ALTPQUANT
* 4: right/bottom MBs use ALTPQUANT
* 5: bottom/left MBs use ALTPQUANT
* 6: left MBs use ALTPQUANT
* 7: top MBs use ALTPQUANT
* 8: right MBs use ALTPQUANT
* 9: bottom MBs use ALTPQUANT
* 10: PQUANT vs. ALTPQUANT is selected per MB
* 11: quantizer select per MB
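
Modes 0 through 9 depend only on where a macroblock sits in the picture, so they can be resolved without reading further bits. The C sketch below is one reading of the list: it assumes "left/top" means "on the left edge or the top edge", and so on for the other pairs. The function name and parameters are hypothetical.

```c
#include <assert.h>

/* Returns 1 if the MB at (mb_x, mb_y) uses ALTPQUANT, 0 if it uses PQUANT,
 * and -1 for modes 10 and 11, where the choice is signalled per macroblock.
 * mb_w and mb_h are the picture dimensions in macroblocks. */
static int mb_uses_altpquant(int mode, int mb_x, int mb_y, int mb_w, int mb_h)
{
    assert(mode >= 0 && mode <= 11);
    assert(mb_x >= 0 && mb_x < mb_w && mb_y >= 0 && mb_y < mb_h);

    int left = (mb_x == 0), right  = (mb_x == mb_w - 1);
    int top  = (mb_y == 0), bottom = (mb_y == mb_h - 1);

    switch (mode) {
    case 0:  return 0;
    case 1:  return left || top || right || bottom;
    case 2:  return left || top;
    case 3:  return top || right;
    case 4:  return right || bottom;
    case 5:  return bottom || left;
    case 6:  return left;
    case 7:  return top;
    case 8:  return right;
    case 9:  return bottom;
    default: return -1;  /* modes 10 and 11: needs per-MB bitstream data */
    }
}
```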
== Bitplane Coding Methods ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
* 0: normal-2 method
* 1: normal-6 method
* 2: rowskip method
* 3: colskip method
* 4: diff-2 method
* 5: diff-6 method
* 6: uncompressed
== Overlap Filter Modes ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
* 0: disable overlap filter
* 1: enable overlap filter for all macroblocks
* 2: overlap filter is enabled for select macroblocks
== Motion Vector Modes ==
Note: enumerated numbers are defined in the SRD; specific numbers may or may not be relevant to an independent implementation.
* 0: 1 motion vector, half-pel, bilinear interpolation
* 1: 1 motion vector, half-pel, bicubic interpolation
* 2: 1 motion vector, quarter-pel, bicubic interpolation
* 3: mixed motion vectors, quarter-pel, bicubic interpolation
* 4: intensity compensation


== Intensity Compensation ==
Line 331: Line 148:
* frame user data present flag
* end-of-sequence marker present flag
== Start Codes ==
These are the various start codes that the SRD defines:
* 0x0A: end of sequence
* 0x0B: slice
* 0x0C: field
* 0x0D: frame header
* 0x0E: entry point header
* 0x0F: sequence header
* 0x1B: user-defined slice
* 0x1C: user-defined field
* 0x1D: user-defined frame header
* 0x1E: user-defined entry point header
* 0x1F: user-defined sequence header


== Component ==

Revision as of 08:13, 24 April 2006

Part of Understanding VC-1

This page is a discussion of the various data structures and constant enumerations employed in the SMPTE Reference Decoder (SRD) VC-1 reference implementation. These are mostly found in the file vc1types.h.

Macroblocks, Blocks, and Sub-blocks

A macroblock embodies a 16x16 block of pixels in a YUV 4:2:0 colorspace. A macroblock consists of 6 blocks: 4 8x8 Y blocks, 1 U block, and 1 V block. Further, these blocks may be divided into sub-blocks of sizes 8x4, 4x8, or 4x4.

Intra Block

An intra block data structure requires the following information:

  • number of non-zero AC coefficients
  • quantized DC coefficient for prediction
  • 7 quantized AC coefficients along the top row of the block, used for prediction
  • 7 quantized AC coefficients along the left side of the block, used for prediction
  • 16 samples representing the bottom 2 8-pixel rows of the block, maintained for overlap smoothing filters

Motion Vector

This is the information maintained for an individual motion vector:

  • X offset
  • Y offset
  • a flag indicating whether the vector pertains to the top or bottom field of interlaced video

The (X, Y) vector is relative to the top-left coordinate of a block. The fractional pel resolution of the vectors depends on the motion vector coding mode.

Motion

This structure encapsulates motion vectors along with mode information.

  • prediction mode, as enumerated in Hybrid Prediction Modes
  • motion vector data structure
  • differential motion vector data structure, represented in quarter-pel units

The SRD contains the following note accompanying this data structure:

/*
 * If Two Reference Images (NUMREF=1) then:
 *      Y=2*(YValue)+PredFlag
 *      PredFlag: 0=dominant 1=non-dominant
*/
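
The SRD comment above describes folding a one-bit prediction flag into the Y component when two reference images are in use. A minimal C sketch of that packing and its inverse follows. The helper names are hypothetical and do not appear in the SRD.

```c
#include <assert.h>

/* Y = 2 * YValue + PredFlag, where PredFlag is 0 (dominant) or 1 (non-dominant). */
static int mv_pack_y(int y_value, int pred_flag)
{
    assert(pred_flag == 0 || pred_flag == 1);
    return 2 * y_value + pred_flag;
}

/* Recover PredFlag; the extra "+ 2" keeps the result in 0..1 for negative Y. */
static int mv_unpack_pred_flag(int packed_y)
{
    return ((packed_y % 2) + 2) % 2;
}

/* Recover YValue; (packed_y - flag) is always even, so the division is exact. */
static int mv_unpack_y_value(int packed_y)
{
    return (packed_y - mv_unpack_pred_flag(packed_y)) / 2;
}
```

Subtracting the flag before dividing avoids relying on how C rounds negative values.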

Motion Vector History

This data structure consists of an array of 4 motion vector data structures which stores the motion vector history for 4 Y blocks, used for direct mode.

Inter Block

An inter block data structure requires the following information:

  • an array of 4 integers representing the number of non-zero coefficients (DC and AC together) for up to 4 sub-blocks in the block
  • 2 motion structures, 1 for backward prediction and 1 for forward prediction

Block

This is the information that the SRD maintains for an individual block:

  • block type, as defined in the Block Types section
  • a flag indicating whether there are non-zero AC coefficients for an intra block, or non-zero DC or AC coefficients for an inter block
  • a C union that contains either an intra or inter block data structure
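
Putting the preceding sections together, the block structure can be sketched in C as a tagged union. This is an illustration only: every identifier and every field width is an assumption rather than the actual layout in vc1types.h, and the order of the two motion structures is a guess.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int16_t x, y; uint8_t bottom_field; } MotionVector;

typedef struct {
    uint8_t      pred_mode;  /* one of the Hybrid Prediction Modes */
    MotionVector mv;
    MotionVector dmv;        /* differential MV, quarter-pel units */
} Motion;

typedef struct {
    int     num_ac_coeffs;     /* number of non-zero AC coefficients */
    int16_t dc;                /* quantized DC, kept for prediction */
    int16_t ac_top[7];         /* top-row AC coefficients */
    int16_t ac_left[7];        /* left-column AC coefficients */
    int16_t overlap_rows[16];  /* bottom 2 rows of 8 samples */
} IntraBlock;

typedef struct {
    int    num_coeffs[4];  /* per sub-block, DC and AC together */
    Motion motion[2];      /* assumed order: [0] backward, [1] forward */
} InterBlock;

typedef struct {
    int block_type;  /* Block Types: 0..3 inter, 4 undetermined, 5..7 intra */
    int coded;       /* non-zero coefficient flag */
    union { IntraBlock intra; InterBlock inter; } u;
} Block;

/* The block type acts as the union tag: types 5..7 select the intra member. */
static int block_is_intra(const Block *b)
{
    assert(b != 0);
    return b->block_type >= 5 && b->block_type <= 7;
}
```

Because the two members share storage, code must check the block type before touching either one.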

Quantizer

This data structure maintains information about quantization parameters.

  • quantizer step, range 0..31
  • quantizer half-step, either 0 or 1
  • a flag indicating whether the quantization is uniform or non-uniform

Macroblock Properties

These properties are associated with various macroblock coding options:

  • a block is intra-coded, or has 1, 2, or 4 motion vectors associated
  • a block predicts backwards from a previous frame, from a forward frame, from both a backward and a forward frame, or uses "direct" mode
  • a flag that indicates whether MVs apply to interlaced fields
  • a flag that indicates that "bottom field different direction to top"
  • a flag that indicates "field transform"

Macroblock

The SRD maintains the following information about an individual macroblock:

  • macroblock type: this is a combination of properties enumerated in Macroblock Properties
  • AC prediction status as enumerated in AC Prediction
  • block type as enumerated in Block Types; might be "any"
  • a flag indicating whether the overlap filter is active for this macroblock
  • a flag indicating whether the macroblock is motion predicted only (presumably, this means that a predicted block is formed but no other transform/addition occurs for the MB)
  • a 6-bit flag vector that indicates which of the 6 constituent blocks in the MB are coded
  • a 4-bit flag vector that indicates motion vector block pattern-- these flags are set if the differential MV for a respective Y block is not 0
  • quantizer data structure for the macroblock
  • 6 block structures
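
The 6-bit coded-block vector in the list above can be queried with simple bit tests. The C sketch below assumes bit i corresponds to block i, with blocks 0..3 being Y, 4 being U, and 5 being V. The SRD's actual bit order may differ, and the function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 if the given block (0..5) is marked as coded.
 * Assumed convention: bit i of coded_flags belongs to block i. */
static int mb_block_is_coded(unsigned coded_flags, int block)
{
    assert(block >= 0 && block < 6);
    return (int)((coded_flags >> block) & 1u);
}

/* Counts how many of the 6 blocks are marked as coded. */
static int mb_count_coded_blocks(unsigned coded_flags)
{
    int n = 0;
    for (int i = 0; i < 6; i++)
        n += mb_block_is_coded(coded_flags, i);
    return n;
}
```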

Intensity Compensation

The SRD uses the following data structure to maintain information about intensity compensation:

  • a flag to indicate whether intensity compensation is enabled
  • IC scale
  • IC shift

B Fraction

This data structure tracks B-fraction data:

  • B-fraction numerator
  • B-fraction denominator
  • scalefactor: approximate numerator * 256 / denominator
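
The scale factor is a fixed-point version of the fraction with 8 fractional bits. A minimal C sketch, assuming truncating integer division (the page only says "approximate") and using hypothetical names:

```c
#include <assert.h>

typedef struct {
    int numerator;
    int denominator;
    int scale_factor;  /* approximately numerator * 256 / denominator */
} BFraction;

/* Builds a B-fraction record; the rounding rule here is an assumption. */
static BFraction bfraction_make(int numerator, int denominator)
{
    assert(numerator >= 0 && denominator > 0);
    BFraction f = { numerator, denominator, numerator * 256 / denominator };
    return f;
}
```

With these assumptions, a fraction of 1/2 gives a scale factor of 128 and 1/3 gives 85.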

Hypothetical Reference Decoder

The SRD contains data structures pertaining to a hypothetical reference decoder involving a leaky bucket algorithm.

Pan Scan Window

This data structure pertains to pan and scan windows:

  • horizontal offset in pixels
  • vertical offset in pixels
  • width in pixels
  • height in pixels

Pan Scan Parameters

This data structure contains a flag indicating whether pan and scan is present, and an array of 3 Pan Scan Window data structures (3 is the maximum supported).

Sequence And Layer Parameters

The SRD maintains all of the information for the sequence layer:

  • profile (simple, main, or advanced)
  • maximum coded width
  • maximum coded height
  • coded width
  • coded height
  • display width
  • display height
  • aspect width
  • aspect height
  • profile level
  • interface flag
  • frame rate numerator (0 unless otherwise specified)
  • frame rate denominator (the comments list 1000, 1001, and 32 as valid values, and claim that it is 0 if not specified, which sounds potentially problematic)
  • color format indicator flag
  • chroma format
  • color primaries
  • transfer characteristics
  • matrix coefficients
  • hypothetical reference decoder
  • loop filter flag
  • multi resolution coding flag
  • fast chrominance motion compensation flag
  • extended motion vector flag
  • extended differential motion vector flag
  • d-quant
  • VS transform flag
  • overlapped transform flag
  • sync marker flag
  • range reduction flag
  • max B-frames
  • quantizer mode
  • post processing flag
  • frame counter flag
  • pull down flag
  • PsF (???)
  • Q framerate for post processing
  • Q bitrate for post processing
  • pan scan flag
  • reserved RTM flag
  • frame interpolation flag
  • range scale Y (comment: scale value times 8)
  • range scale UV (comment: scale value times 8)
  • number of pan scan windows
  • broken link flag
  • closed entry flag
  • refdist flag (refdist refers to distance to previous reference frame)
  • frame user data present flag
  • end-of-sequence marker present flag

Component

The SRD defines a component as a single Y, U, or V plane and associates these members with the data structure:

  • data: the raw bytes that comprise the Y, U, or V sample data
  • bytes/line, a.k.a. stride
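
With a data pointer and a stride, the sample at column x of row y sits at offset y * stride + x. The C sketch below assumes one byte per sample. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *data;    /* raw Y, U, or V samples */
    size_t   stride;  /* bytes per line */
} Component;

/* Returns a pointer to the sample at (x, y); rows are stride bytes apart. */
static uint8_t *component_at(const Component *c, size_t x, size_t y)
{
    assert(c != NULL && c->data != NULL && x < c->stride);
    return c->data + y * c->stride + x;
}
```

The stride can exceed the visible width, which leaves room for padding on each line.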

Field

This data structure defines all of the parameters that pertain to a particular field:

  • picture type
  • conditional overlap filter mode
  • quantization mode
  • motion vector mode
  • motion vector range
  • block transform type
  • post processing flag
  • extended X differential motion vector range
  • extended Y differential motion vector range
  • number of reference fields (either 1 or 2)
  • reference field (either last or last-but-one)
  • motion vector VLC table (0..7)
  • MB mode VLC table (0..7)
  • block pattern 2 motion vector table (0..3)
  • block pattern 4 motion vector table (0..3)
  • inter-coded block coding pattern VLC table (0..3)
  • AC coding set to use for intra-coded Y blocks (0..2)
  • AC coding set to use for all inter-coded blocks or U and V intra (0..2)
  • DC coding set (0..1)
  • rows per slice (0 = no slicing used)

Picture

The picture is the fundamental data unit in the VC-1 coding scheme. A picture can be one of the following things:

  • a progressive frame
  • an interlaced top field
  • an interlaced bottom field
  • an interlaced frame

The SRD maintains the following information about a picture:

  • frame number (modulo 1<<32)
  • picture format
  • Y component data structure
  • U component data structure
  • V component data structure
  • 2 field data structures
  • picture resolution index
  • top field first flag
  • repeat first field flag
  • range reduction used flag
  • frame interpolation hint flag
  • chrominance sample format flag
  • repeat frame count
  • pan scan parameters data structure
  • post processing mode

Scale Motion Vectors

This data structure contains information about scaling motion vectors for interlaced frames:

  • scale (comment: down scale factor * 256)
  • scale 1 (comment: up scale factor * 256) if in zone 1
  • scale 2 (comment: down scale factor * 256) if not in zone 1
  • zone 1 X size
  • zone 1 Y size
  • zone 1 X offset
  • zone 1 Y offset
  • flag indicating scaling up or down for opposite
  • flag indicating top or bottom field
  • motion vector range
  • motion vector mode

Interpolation

This data structure contains information to be passed to a bilinear or bicubic interpolation function:

  • component data structure
  • width of resulting filtered rectangle
  • height of resulting filtered rectangle
  • flag indicating rounding behavior

Padding Modes

  • simple or main profile - pad from macroblock edge
  • advanced profile progressive - pad from image edge
  • advanced profile interlaced field padding

Rectangle

Nothing complicated about this data structure-- it's just 2 (X, Y) coordinate pairs specifying the upper-left and lower-right corners of a rectangle.

Image Position

This data structure contains rectangles to control padding and cropping:

  • total width of buffer
  • total height of buffer
  • image rectangle in pels relative to buffer origin
  • rectangle to pad outwards from in pels relative to buffer origin
  • rectangle limits to pad outwards to in pels relative to buffer origin

Reference Picture

This data structure contains all of the information to comprise a reference picture (I-frame).

  • valid flag
  • broken link flag-- reference is no longer available due to a broken link
  • parameter indicating whether top field, bottom field, or both are padded
  • range Y scale (comment: Y scaling factor times 8)
  • range UV scale (comment: UV scaling factor times 8)
  • number of frames between this and the last reference frame
  • frame number modulo (1<<32)
  • top field first flag
  • repeat first field flag
  • PsF (???)
  • pan and scan parameters data structure
  • frame interpolation hint flag (comment indicates it is not used in decoding process)
  • chrominance plane sampling mode, pertains to interlaced modes
  • repeated frame count
  • post processing mode
  • coded width
  • coded height
  • max coded width
  • max coded height
  • picture format
  • 2 motion vector ranges, 1 for each field
  • 2 picture types, 1 for each field
  • padding mode
  • picture resolution scaling mode
  • Y component data structure
  • U component data structure
  • V component data structure
  • pointer to Y data top-left corner
  • pointer to U data top-left corner
  • pointer to V data top-left corner
  • image position data structure indicating position of Y samples in image buffer
  • image position data structure indicating position of C samples in image buffer

Level Limits

This data structure holds information about various limits at each profile and level.

  • max macroblocks per second
  • max macroblocks per frame
  • max peak transmission rate in kilobits per second
  • max buffer size in multiples of 16 kilobits
  • motion vector range allowed

Position

This data structure describes the current macroblock being processed:

  • picture type
  • picture format
  • profile
  • motion vector mode
  • motion vector range
  • flag indicating top vs. bottom field
  • flag indicating first vs. second field
  • pointer to the current macroblock data structure
  • pointer to the start of the macroblock circular data structure
  • pointer to the current position in the motion vector history buffer
  • circular buffer size in macroblocks
  • X macroblock offset in current slice
  • Y macroblock offset in current slice
  • Y macroblock offset of slice in picture
  • width in macroblocks of coded picture
  • height in macroblocks of coded picture
  • coded width
  • coded height
  • max coded width
  • max coded height
  • picture quantizer (PQUANT)
  • B-fraction syntax element
  • number of reference fields, minus 1
  • reference field when previous field is 0
  • bias to add to intra blocks after transform
  • Y scaling factor (times 8)
  • UV scaling factor (times 8)
  • fast chrominance motion compensation flag
  • picture resolution scale mode
  • reference picture data structure: old I/P
  • reference picture data structure: new/current I/P
  • reference picture data structure: reconstructed B picture
  • reference picture data structure: backup copy of reference before IC applied
  • 2 scale motion vector data structures (1 forward, 1 backward)
  • 6 64-element arrays for reconstructing samples

Bitstream

The SRD maintains a typical bitstream data structure. It simply treats a bytestream as a sequence of bits to be read from left to right.
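
Reading bits left to right means taking the most significant bit of each byte first. The following C sketch shows one such reader. It is not the SRD's code, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;
    size_t size;     /* buffer length in bytes */
    size_t bit_pos;  /* next bit to read, counted from the first byte's MSB */
} BitReader;

/* Reads n bits (0..32), most significant bit first, and advances the reader. */
static uint32_t br_get_bits(BitReader *br, unsigned n)
{
    assert(n <= 32 && br->bit_pos + n <= br->size * 8);
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t pos = br->bit_pos++;
        unsigned bit = (br->data[pos >> 3] >> (7 - (pos & 7))) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}
```

Reading one bit per iteration is slow but keeps the bit order easy to verify.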

VLC

The SRD uses a simple and highly inefficient VLC lookup mechanism. A table of VLCs consists of these data structures:

  • the bit pattern of the VLC
  • the number of bits in the VLC
  • the number that the VLC represents

The first entry of a VLC table has the following meaning:

  • bits = 0
  • length = number of codes in table
  • maximum VLC code length

The SRD's VLC reading function marches through each entry in a table, sequentially, until it finds a bit/length pattern that matches the bits at the current position in the bitstream.
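
The sequential search described above can be sketched in C as follows. This is not the SRD's function: the names are hypothetical, and the sketch assumes the header entry stores the code count in its length field and the maximum code length in its value field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t bits;    /* bit pattern of the code */
    int      length;  /* number of bits in the code */
    int      value;   /* number the code represents */
} VlcEntry;

/* Peeks n bits, MSB first, starting at bit position pos, without advancing. */
static uint32_t peek_bits(const uint8_t *data, size_t pos, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++, pos++)
        v = (v << 1) | ((data[pos >> 3] >> (7 - (pos & 7))) & 1u);
    return v;
}

/* Tries each table entry in order. Returns the decoded value and advances
 * *pos past the code, or returns -1 if no entry matches. */
static int vlc_decode(const VlcEntry *table, const uint8_t *data,
                      size_t size_bits, size_t *pos)
{
    assert(table != NULL && data != NULL && pos != NULL);
    int count = table[0].length;  /* header entry: number of codes */
    for (int i = 1; i <= count; i++) {
        size_t len = (size_t)table[i].length;
        if (*pos + len > size_bits)
            continue;  /* code would run past the end of the data */
        if (peek_bits(data, *pos, (int)len) == table[i].bits) {
            *pos += len;
            return table[i].value;
        }
    }
    return -1;
}
```

The cost is linear in the table size for every symbol, which is why this approach is so inefficient.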

Bitplane

A bitplane data structure is used for representing a series of bit values which represent properties of the macroblocks in a picture. The data structure has the following properties:

Picture Layer Parameters

The SRD maintains the following information for picture layer parameters:

  • frame count
  • 2 picture types (per field?)
  • buffer fullness
  • pq index
  • per-picture quantizer mode
  • PQUANT
  • half q step
  • frame transform AC coding set index
  • frame transform AC coding set index 2
  • intra transform DC table flag
  • temporal reference frame counter
  • top field first flag
  • repeat first field flag
  • U&V sample mode flag
  • post processing mode
  • quantization step size
  • ALTPQUANT
  • interpolation data structure
  • pointer to selected motion vector VLC table
  • pointer to selected coded block pattern VLC table
  • transform type flag
  • repeat frame count
  • frame interpolation hint
  • overlapping filter mode
  • pan scan parameters data structure
  • dquant frame flag (comment: per MB quant mode)
  • bitplane for AC prediction
  • bitplane for MB skip
  • bitplane for MV type
  • bitplane for Direct MB
  • bitplane for overlap flags
  • bitplane for Forward MB
  • bitplane for FieldTX MB
  • extend horizontal differential MV flag
  • extend vertical differential MV flag
  • pointer to selected VLC table for macroblock modes
  • pointer to selected VLC table for macroblock 4 motion vector block pattern table
  • pointer to selected VLC table for macroblock 2 motion vector block pattern table
  • 2 intensity compensation data structures, for top and bottom fields

State

The SRD maintains the following information about the overall state:

  • macroblock position data structure
  • picture data structure
  • current frame number
  • pointer to macroblock data structure
  • number of fields per frame
  • maximum number of macroblocks per frame
  • pointer to the level limits for the combination of profile & level
  • sequence layer data structure
  • picture layer parameters data structure
  • "1 if not first mode 3 escape in frame"
  • "Level code size for mode 3 escape, per frame"
  • "Run code size for mode 3 escape, per frmae"
  • zig zag table index
  • flag indicating if frame is first in stream
  • flag indicating whether bitplane coding is in use
  • number of first coded block in current macroblock
  • reference picture data structure-- this is where the current frame will be decoded
  • motion vector history buffer
  • number of fields present in the current picture

Decoder Configuration

The SRD maintains the following information when configuring the decoder:

  • max coded width
  • max coded height
  • highest profile supported by the decoder
  • highest level supported by the decoder
  • framerate numerator
  • framerate denominator