VC-1 Data Structures

From MultimediaWiki

Part of Understanding VC-1

This page is a discussion of the various data structures and constant enumerations employed in the SMPTE Reference Decoder (SRD) VC-1 reference implementation. These are mostly found in the file vc1types.h.

Macroblocks, Blocks, and Sub-blocks

A macroblock embodies a 16x16 block of pixels in a YUV 4:2:0 colorspace. A macroblock consists of 6 blocks: 4 8x8 Y blocks, 1 U block, and 1 V block. Further, these blocks may be divided into sub-blocks of sizes 8x4, 4x8, or 4x4.

Intra Block

An intra block data structure requires the following information:

  • number of non-zero AC coefficients
  • quantized DC coefficient for prediction
  • 7 quantized AC coefficients along the top row of the block, used for prediction
  • 7 quantized AC coefficients along the left side of the block, used for prediction
  • 16 samples representing the bottom 2 8-pixel rows of the block, maintained for overlap smoothing filters

Motion Vector

This is the information maintained for an individual motion vector:

  • X offset
  • Y offset
  • a flag indicating whether the vector pertains to the top or bottom field of interlaced video

The (X, Y) vector is relative to the top-left coordinate of a block. The fractional pel resolution of the vectors depends on the motion vector coding mode.

Motion

This structure encapsulates motion vectors along with mode information.

  • prediction mode, as enumerated in Hybrid Prediction Modes
  • motion vector data structure
  • differential motion vector data structure, represented in quarter-pel units

The SRD contains the following note accompanying this data structure:

/*
 * If Two Reference Images (NUMREF=1) then:
 *      Y=2*(YValue)+PredFlag
 *      PredFlag: 0=dominant 1=non-dominant
*/
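The note above describes packing a vertical value and a one-bit prediction flag into a single Y value. A minimal sketch of that packing and its inverse follows. The function names are hypothetical and do not come from the SRD, and the sketch assumes two's complement integers so that negative values round-trip.

```c
#include <assert.h>

/* Pack per the SRD note: Y = 2 * YValue + PredFlag.
 * pred_flag: 0 = dominant, 1 = non-dominant. */
static int pack_y(int y_value, int pred_flag)
{
    assert(pred_flag == 0 || pred_flag == 1);
    return 2 * y_value + pred_flag;
}

/* Recover PredFlag: the low bit of the packed value. */
static int unpack_pred_flag(int y)
{
    return y & 1;
}

/* Recover YValue: subtract the flag, then halve. Written this way to
 * avoid right-shifting a negative int, which is implementation-defined. */
static int unpack_y_value(int y)
{
    return (y - (y & 1)) / 2;
}
```

For example, a YValue of 3 with the non-dominant flag set packs to 7, and unpacking 7 yields 3 and a flag of 1 again.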

Motion Vector History

This data structure consists of an array of 4 motion vector data structures which stores the motion vector history for 4 Y blocks, used for direct mode.

Inter Block

An inter block data structure requires the following information:

  • an array of 4 integers representing the number of non-zero coefficients (DC and AC together) for up to 4 sub-blocks in the block
  • 2 motion structures, 1 for backward prediction and 1 for forward prediction

Block

This is the information that the SRD maintains for an individual block:

  • block type, as defined in Block Types section
  • a flag indicating whether there are non-zero AC coefficients for an intra block, or non-zero DC or AC coefficients for an inter block
  • a C union that contains either an intra or inter block data structure
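The block, intra block, inter block, motion, and motion vector descriptions above fit together roughly as in the following sketch. All type and member names are hypothetical, the integer widths are guesses, and the two-value kind enum is a simplification of the real block type enumeration; only the overall shape (a type tag, a coded flag, and a union) follows the text.

```c
#include <stdint.h>

/* Hypothetical names throughout; not copied from vc1types.h. */
typedef struct {
    int16_t x, y;          /* offsets relative to the block's top-left */
    uint8_t bottom_field;  /* field selector for interlaced video */
} MvSketch;

typedef struct {
    int      hybrid_pred_mode;  /* see Hybrid Prediction Modes */
    MvSketch mv;                /* motion vector */
    MvSketch diff_mv;           /* differential MV, quarter-pel units */
} MotionSketch;

typedef struct {
    uint8_t num_nonzero_ac;  /* number of non-zero AC coefficients */
    int16_t dc;              /* quantized DC, kept for prediction */
    int16_t ac_top[7];       /* quantized AC along the top row */
    int16_t ac_left[7];      /* quantized AC along the left side */
    int16_t overlap[16];     /* bottom 2 rows, for overlap smoothing */
} IntraSketch;

typedef struct {
    int          num_coefs[4];  /* non-zero DC+AC count per sub-block */
    MotionSketch motion[2];     /* assumed order: [0] forward, [1] backward */
} InterSketch;

/* Simplified stand-in for the full Block Types enumeration. */
typedef enum { BLK_INTRA, BLK_INTER } BlockKindSketch;

typedef struct {
    BlockKindSketch kind;
    uint8_t coded;  /* non-zero coefficients present */
    union {
        IntraSketch intra;
        InterSketch inter;
    } u;
} BlockSketch;

/* Non-zero coefficient count, read from whichever union member `kind`
 * selects: AC only for intra, DC and AC together for inter. */
static int block_nonzero_count(const BlockSketch *b)
{
    if (b->kind == BLK_INTRA)
        return b->u.intra.num_nonzero_ac;
    return b->u.inter.num_coefs[0] + b->u.inter.num_coefs[1]
         + b->u.inter.num_coefs[2] + b->u.inter.num_coefs[3];
}
```

The point of the tag is that only one union member is meaningful at a time, so every reader has to check the block type before touching the intra or inter fields.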

Quantizer

This data structure maintains information about quantization parameters.

  • quantizer step, range 0..31
  • quantizer half-step, either 0 or 1
  • a flag indicating whether the quantization is uniform or non-uniform

Macroblock Properties

These properties are associated with various macroblock coding options:

  • a block is intra-coded, or has 1, 2, or 4 motion vectors associated
  • a block predicts backwards from a previous frame, from a forward frame, from both a backward and a forward frame, or uses "direct" mode
  • a flag that indicates whether MVs apply to interlaced fields
  • a flag that indicates that "bottom field different direction to top"
  • a flag that indicates "field transform"

Macroblock

The SRD maintains the following information about an individual macroblock:

  • macroblock type: this is a combination of properties enumerated in Macroblock Properties
  • AC prediction status as enumerated in AC Prediction
  • block type as enumerated in Block Types; might be "any"
  • a flag indicating whether the overlap filter is active for this macroblock
  • a flag indicating whether the macroblock is motion predicted only (presumably, this means that a predicted block is formed but no residual transform/addition occurs for the MB)
  • a 6-bit flag vector that indicates which of the 6 constituent blocks in the MB are coded
  • a 4-bit flag vector that indicates motion vector block pattern-- these flags are set if the differential MV for a respective Y block is not 0
  • quantizer data structure for the macroblock
  • 6 block structures
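The 6-bit coded block flag vector can be queried with simple bit tests, as sketched below. The function names are hypothetical, and the mapping of bit i to block i (4 Y blocks, then U, then V) is an assumption made for illustration; the SRD may order the bits differently.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCKS_PER_MB = 6 };  /* 4 Y blocks, then U, then V */

/* Returns 1 if block `index` (0..5) is marked coded in `cbp`.
 * Assumes bit i stands for block i (hypothetical ordering). */
static int block_is_coded(uint8_t cbp, int index)
{
    assert(index >= 0 && index < BLOCKS_PER_MB);
    return (cbp >> index) & 1;
}

/* Counts how many of the 6 constituent blocks are coded. */
static int coded_block_count(uint8_t cbp)
{
    int n = 0;
    for (int i = 0; i < BLOCKS_PER_MB; i++)
        n += block_is_coded(cbp, i);
    return n;
}
```

The 4-bit motion vector block pattern could be tested the same way, with one bit per Y block.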

Alternate Description

SRD maintains the following information about each macroblock:

 macroblock type, contains the following attributes:
   (attribute 1)
   intra
   1 MV
   2 MV
   4 MV

   (attribute 2)
   direct macroblock
   forward prediction
   backward prediction
   forward and backward prediction

   (attribute 3)
   MVs apply to fields
   bottom different than top
   field transform

 AC prediction status, one of the following attributes:
   AC prediction off
   AC prediction on
   no blocks can be predicted

 Block type, one of the following types:
   8x8 inter-coded
   8x4 inter-coded
   4x8 inter-coded
   4x4 inter-coded
   intra-coded, no AC prediction
   intra-coded, AC prediction from top values
   intra-coded, AC prediction from left values

 flag indicating whether overlap filter is active for this macroblock
 flag indicating whether macroblock is motion predicted only (no residual)
 byte indicating coded block pattern which indicates which of the 6
   sub-blocks are coded

 Quantizer information, this includes:
   quantizer step in the range 1..31
   quantizer half step, either 0 or 1
   flag indicating uniform or non-uniform quantizer

 Information for each of the 6 constituent blocks in the macroblock:
   block type (same choices as the macroblock attributes)
   flag indicating non-zero AC coeffs for intra, non-zero AC/DC for inter
   union between an intra block structure and an inter block structure
   intra structure:
     number of zero/non-zero AC coeffs
     quantized DC coeff
     quantized AC top row for prediction (7 values)
     quantized AC left column for prediction (7 values)
     bottom 2 rows (16 values) kept for overlap smoothing
   inter structure:
     number of zero/non-zero AC coeffs for 4 sub-blocks (Y blocks?)
     forward and backward motion vector structures, each includes:
       hybrid prediction mode, one of the following attributes:
         predict from left
         predict from top
         no hybrid prediction
       (x,y) motion vectors for each of the 4 Y blocks
       (x,y) differential motion vectors in 1/4 pel units
 (note: I am a little confused as to why each of the 6 sub-blocks stores the motion vector data for the entire macroblock)

Intensity Compensation

The SRD uses the following data structure to maintain information about intensity compensation:

  • a flag to indicate whether intensity compensation is enabled
  • IC scale
  • IC shift

B Fraction

This data structure tracks B-fraction data:

  • B-fraction numerator
  • B-fraction denominator
  • scalefactor: approximate numerator * 256 / denominator
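The scalefactor is a fixed-point form of the fraction with 8 fractional bits. A minimal sketch of how the three members relate follows; the names are hypothetical, and the truncating division is an assumption, since the text only says "approximate" and the SRD may round or use a lookup table instead.

```c
#include <assert.h>

/* Hypothetical names; mirrors the three members listed above. */
typedef struct {
    int numerator;
    int denominator;
    int scale_factor;  /* approximately numerator * 256 / denominator */
} BFractionSketch;

static BFractionSketch make_bfraction(int numerator, int denominator)
{
    BFractionSketch bf;
    assert(denominator > 0);
    bf.numerator = numerator;
    bf.denominator = denominator;
    /* Truncating integer division (an assumption of this sketch). */
    bf.scale_factor = numerator * 256 / denominator;
    return bf;
}
```

With this sketch a fraction of 1/2 yields a scalefactor of 128, and 3/4 yields 192.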

Hypothetical Reference Decoder

The SRD contains data structures pertaining to a hypothetical reference decoder involving a leaky bucket algorithm.

Pan Scan Window

This data structure pertains to pan and scan windows:

  • horizontal offset in pixels
  • vertical offset in pixels
  • width in pixels
  • height in pixels

Pan Scan Parameters

This data structure contains a flag indicating whether pan and scan is present, and an array of 3 Pan Scan Window data structures (3 is the maximum supported).

Sequence And Layer Parameters

The SRD maintains all of the information for the sequence layer:

  • profile (simple, main, or advanced)
  • maximum coded width
  • maximum coded height
  • coded width
  • coded height
  • display width
  • display height
  • aspect width
  • aspect height
  • profile level
  • interface flag
  • frame rate numerator (0 unless otherwise specified)
  • frame rate denominator (the comments list 1000, 1001, and 32 as valid values, and claim that it is 0 if not specified, which sounds potentially problematic)
  • color format indicator flag
  • chroma format
  • color primaries
  • transfer characteristics
  • matrix coefficients
  • hypothetical reference decoder
  • loop filter flag
  • multi resolution coding flag
  • fast chrominance motion compensation flag
  • extended motion vector flag
  • extended differential motion vector flag
  • d-quant
  • variable size transform flag (VSTransform)
  • overlapped transform flag
  • sync marker flag
  • range reduction flag
  • max B-frames
  • quantizer mode
  • post processing flag
  • frame counter flag
  • pull down flag
  • PsF (???)
  • Q framerate for post processing
  • Q bitrate for post processing
  • pan scan flag
  • reserved RTM flag
  • frame interpolation flag
  • range scale Y (comment: scale value times 8)
  • range scale UV (comment: scale value times 8)
  • number of pan scan windows
  • broken link flag
  • closed entry flag
  • refdist flag (refdist refers to distance to previous reference frame)
  • frame user data present flag
  • end-of-sequence marker present flag

Component

The SRD defines a component as a single Y, U, or V plane and associates these members with the data structure:

  • data: the raw bytes that comprise the Y, U, or V sample data
  • bytes/line, a.k.a. stride
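Addressing a sample in such a plane uses the stride rather than the visible width. The following sketch shows the idea, assuming 8-bit samples; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; one Y, U, or V plane. */
typedef struct {
    uint8_t *data;    /* raw sample bytes of the plane */
    size_t   stride;  /* bytes per line; may exceed the visible width */
} ComponentSketch;

/* Fetches the sample at column x, row y. */
static uint8_t component_sample(const ComponentSketch *c, size_t x, size_t y)
{
    assert(x < c->stride);
    return c->data[y * c->stride + x];
}
```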

Field

This data structure defines all of the parameters that pertain to a particular field:

  • picture type
  • conditional overlap filter mode
  • quantization mode
  • motion vector mode
  • motion vector range
  • block transform type
  • post processing flag
  • extended X differential motion vector range
  • extended Y differential motion vector range
  • number of reference fields (either 1 or 2)
  • reference field (either last or last-but-one)
  • motion vector VLC table (0..7)
  • MB mode VLC table (0..7)
  • block pattern 2 motion vector table (0..3)
  • block pattern 4 motion vector table (0..3)
  • inter-coded block coding pattern VLC table (0..3)
  • AC coding set to use for intra-coded Y blocks (0..2)
  • AC coding set to use for all inter-coded blocks or U and V intra (0..2)
  • DC coding set (0..1)
  • rows per slice (0 = no slicing used)

Picture

The picture is the fundamental data unit in the VC-1 coding scheme. A picture can be one of the following things:

  • a progressive frame
  • an interlaced top field
  • an interlaced bottom field
  • an interlaced frame

The SRD maintains the following information about a picture:

  • frame number (modulo 1<<32)
  • picture format
  • Y component data structure
  • U component data structure
  • V component data structure
  • 2 field data structures
  • picture resolution index
  • top field first flag
  • repeat first field flag
  • range reduction used flag
  • frame interpolation hint flag
  • chrominance sample format flag
  • repeat frame count
  • pan scan parameters data structure
  • post processing mode

Scale Motion Vectors

This data structure contains information about scaling motion vectors for interlaced frames:

  • scale (comment: down scale factor * 256)
  • scale 1 (comment: up scale factor * 256) if in zone 1
  • scale 2 (comment: down scale factor * 256) if not in zone 1
  • zone 1 X size
  • zone 1 Y size
  • zone 1 X offset
  • zone 1 Y offset
  • flag indicating scaling up or down for opposite
  • flag indicating top or bottom field
  • motion vector range
  • motion vector mode

Interpolation

This data structure contains information to be passed to a bilinear or bicubic interpolation function:

  • component data structure
  • width of resulting filtered rectangle
  • height of resulting filtered rectangle
  • flag indicating rounding behavior

Padding Modes

  • simple or main profile - pad from macroblock edge
  • advanced profile progressive - pad from image edge
  • advanced profile interlaced field padding

Rectangle

Nothing complicated about this data structure-- it's just 2 (X, Y) coordinate pairs specifying the upper-left and lower-right corners of a rectangle.
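A sketch of such a rectangle with a couple of helpers follows. The names are hypothetical, and the sketch assumes the lower-right corner is exclusive; the text does not say whether the SRD treats it as inclusive or exclusive.

```c
/* Hypothetical names; (x0, y0) is upper-left, (x1, y1) is lower-right. */
typedef struct {
    int x0, y0;
    int x1, y1;  /* assumed exclusive in this sketch */
} RectSketch;

static int rect_width(const RectSketch *r)  { return r->x1 - r->x0; }
static int rect_height(const RectSketch *r) { return r->y1 - r->y0; }

/* Returns 1 if (x, y) lies inside the rectangle, 0 otherwise. */
static int rect_contains(const RectSketch *r, int x, int y)
{
    return x >= r->x0 && x < r->x1 && y >= r->y0 && y < r->y1;
}
```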

Image Position

This data structure contains rectangles to control padding and cropping:

  • total width of buffer
  • total height of buffer
  • image rectangle in pels relative to buffer origin
  • rectangle to pad outwards from in pels relative to buffer origin
  • rectangle limits to pad outwards to in pels relative to buffer origin

Reference Picture

This data structure contains all of the information that comprises a reference picture (an I or P picture).

  • valid flag
  • broken link flag-- reference is no longer available due to a broken link
  • parameter indicating whether top field, bottom field, or both are padded
  • range Y scale (comment: Y scaling factor times 8)
  • range UV scale (comment: UV scaling factor times 8)
  • number of frames between this and the last reference frame
  • frame number modulo (1<<32)
  • top field first flag
  • repeat first field flag
  • PsF (???)
  • pan and scan parameters data structure
  • frame interpolation hint flag (comment indicates it is not used in decoding process)
  • chrominance plane sampling mode, pertains to interlaced modes
  • repeated frame count
  • post processing mode
  • coded width
  • coded height
  • max coded width
  • max coded height
  • picture format
  • 2 motion vector ranges, 1 for each field
  • 2 picture types, 1 for each field
  • padding mode
  • picture resolution scaling mode
  • Y component data structure
  • U component data structure
  • V component data structure
  • pointer to Y data top-left corner
  • pointer to U data top-left corner
  • pointer to V data top-left corner
  • image position data structure indicating position of Y samples in image buffer
  • image position data structure indicating position of C samples in image buffer

Level Limits

This data structure holds information about various limits at each profile and level.

  • max macroblocks per second
  • max macroblocks per frame
  • max peak transmission rate in kilobits per second
  • max buffer size in multiples of 16 kilobits
  • motion vector range allowed

Position

This data structure describes the current macroblock being processed:

  • picture type
  • picture format
  • profile
  • motion vector mode
  • motion vector range
  • flag indicating top vs. bottom field
  • flag indicating first vs. second field
  • pointer to the current macroblock data structure
  • pointer to the start of the macroblock circular data structure
  • pointer to the current position in the motion vector history buffer
  • circular buffer size in macroblocks
  • X macroblock offset in current slice
  • Y macroblock offset in current slice
  • Y macroblock offset of slice in picture
  • width in macroblocks of coded picture
  • height in macroblocks of coded picture
  • coded width
  • coded height
  • max coded width
  • max coded height
  • picture quantizer (PQUANT)
  • B-fraction syntax element
  • number of reference fields, minus 1
  • reference field when previous field is 0
  • bias to add to intra blocks after transform
  • Y scaling factor (times 8)
  • UV scaling factor (times 8)
  • fast chrominance motion compensation flag
  • picture resolution scale mode
  • reference picture data structure: old I/P
  • reference picture data structure: new/current I/P
  • reference picture data structure: reconstructed B picture
  • reference picture data structure: backup copy of reference before IC applied
  • 2 scale motion vector data structures (1 forward, 1 backward)
  • 6 64-element arrays for reconstructing samples

Bitstream

The SRD maintains a typical bitstream data structure. It simply treats a bytestream as a sequence of bits to be read from left to right.
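A reader of this kind can be sketched as follows. This is not the SRD's code: the names are hypothetical, and "left to right" is taken to mean that the most significant bit of each byte is read first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; a minimal MSB-first bit reader. */
typedef struct {
    const uint8_t *buf;      /* start of the byte stream */
    size_t         size;     /* length in bytes */
    size_t         bit_pos;  /* next bit to read, counted from the start */
} BitReaderSketch;

/* Reads n bits (0..32), taking the most significant bit of each byte first. */
static uint32_t read_bits(BitReaderSketch *br, unsigned n)
{
    uint32_t value = 0;
    assert(n <= 32);
    assert(br->bit_pos + n <= br->size * 8);
    while (n-- > 0) {
        uint8_t  byte  = br->buf[br->bit_pos >> 3];
        unsigned shift = 7u - (unsigned)(br->bit_pos & 7);
        value = (value << 1) | ((byte >> shift) & 1u);
        br->bit_pos++;
    }
    return value;
}
```

Reading one bit at a time keeps the sketch short; a production reader would normally fetch whole bytes or words.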

VLC

The SRD uses a simple and highly inefficient VLC lookup mechanism. A table of VLCs consists of these data structures:

  • the bit pattern of the VLC
  • the number of bits in the VLC
  • the number that the VLC represents

The first entry of a VLC table has the following meaning:

  • bits = 0
  • length = number of codes in table
  • value = maximum VLC code length

The SRD's VLC reading function marches through each entry in a table, sequentially, until it finds a bit/length pattern that matches the bits at the current position in the bitstream.
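The sequential search described above can be sketched as follows. This is an illustration, not the SRD's function: the names are hypothetical, the header entry's third member is assumed to hold the maximum code length, and the caller is assumed to supply the next max-length bits of the stream right-aligned in `peek`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; one table entry as described above. */
typedef struct {
    uint32_t bits;    /* bit pattern of the code, right-aligned */
    uint8_t  length;  /* number of bits in the code */
    int      value;   /* number that the code represents */
} VlcCodeSketch;

/* table[0] is the header: bits = 0, length = number of codes,
 * value = maximum code length (assumed). Entries 1..count are codes.
 * `peek` holds the next max_len stream bits, right-aligned.
 * Returns 1 on a match and stores the symbol and the number of bits
 * consumed; returns 0 if no code matches. */
static int vlc_lookup(const VlcCodeSketch *table, uint32_t peek,
                      int *value, unsigned *used)
{
    unsigned count   = table[0].length;
    unsigned max_len = (unsigned)table[0].value;
    assert(max_len >= 1 && max_len <= 32);
    for (unsigned i = 1; i <= count; i++) {
        unsigned len = table[i].length;
        assert(len >= 1 && len <= max_len);
        /* Compare the top `len` bits of the window with this code. */
        if ((peek >> (max_len - len)) == table[i].bits) {
            *value = table[i].value;
            *used  = len;
            return 1;
        }
    }
    return 0;
}
```

The cost is linear in the table size for every symbol, which is why the mechanism is simple but slow compared with the multi-level lookup tables that optimized decoders typically use.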

Bitplane

A bitplane data structure is used for representing a series of bit values which represent properties of the macroblocks in a picture. The data structure has the following properties:

Picture Layer Parameters

The SRD maintains the following information for picture layer parameters:

  • frame count
  • 2 picture types (per field?)
  • buffer fullness
  • pq index
  • per-picture quantizer mode
  • PQUANT
  • half q step
  • frame transform AC coding set index
  • frame transform AC coding set index 2
  • intra transform DC table flag
  • temporal reference frame counter
  • top field first flag
  • repeat first field flag
  • U&V sample mode flag
  • post processing mode
  • quantization step size
  • ALTPQUANT
  • interpolation data structure
  • pointer to selected motion vector VLC table
  • pointer to selected coded block pattern VLC table
  • transform type flag
  • repeat frame count
  • frame interpolation hint
  • overlapping filter mode
  • pan scan parameters data structure
  • dquant frame flag (comment: per MB quant mode)
  • bitplane for AC prediction
  • bitplane for MB skip
  • bitplane for MV type
  • bitplane for Direct MB
  • bitplane for overlap flags
  • bitplane for Forward MB
  • bitplane for FieldTX MB
  • extend horizontal differential MV flag
  • extend vertical differential MV flag
  • pointer to selected VLC table for macroblock modes
  • pointer to selected VLC table for macroblock 4 motion vector block pattern table
  • pointer to selected VLC table for macroblock 2 motion vector block pattern table
  • 2 intensity compensation data structures, for top and bottom fields

State

The SRD maintains the following information about the overall state:

  • macroblock position data structure
  • picture data structure
  • current frame number
  • pointer to macroblock data structure
  • number of fields per frame
  • maximum number of macroblocks per frame
  • pointer to the level limits for the combination of profile & level
  • sequence layer data structure
  • picture layer parameters data structure
  • "1 if not first mode 3 escape in frame"
  • "Level code size for mode 3 escape, per frame"
  • "Run code size for mode 3 escape, per frame"
  • zig zag table index
  • flag indicating if frame is first in stream
  • flag indicating whether bitplane coding is in use
  • number of first coded block in current macroblock
  • reference picture data structure-- this is where the current frame will be decoded
  • motion vector history buffer
  • number of fields present in the current picture

Decoder Configuration

The SRD maintains the following information when configuring the decoder:

  • max coded width
  • max coded height
  • highest profile supported by the decoder
  • highest level supported by the decoder
  • framerate numerator
  • framerate denominator

Run Level Data Structure

SRD: vc1DEC3DH_sRunLevel; this data structure ties together 3 bytes forming a (run, level, last) triple, and is meant to be used as an array.

AC Coding Set

SRD: vc1DEC3DH_sACCodingSet

The SRD defines the following fields for an AC coding set:

  • VLC table (vc1DEC_sVLCCode data type)
  • run-level-last (RLL) table (vc1DEC3DH_sRunLevel data type)
  • delta level byte array
  • delta run byte array
  • delta level last byte array
  • delta run last byte array