VC-1

From MultimediaWiki

VC-1 is a video coding standard developed by Microsoft; it began as Windows Media Video 9. It is prevalent in ASF files downloaded from the internet, and it is also specified for use on HD DVD discs.

See Understanding VC-1 for more information about the technical details of the format.

Encapsulation

Most commonly, VC-1 data is found inside Microsoft ASF files, identified with the FourCC 'WMV3' for the simple and main profiles and the FourCC 'WVC1' for the advanced profile. The FourCC 'WMV9' may not actually exist in the wild, but the name gained prominence anyway because this codec was introduced as part of the Windows Media 9 tool suite. VC-1 video is also likely to be encapsulated in other containers and stream formats, such as MPEG streams for HD DVD.

Profiles And Levels

This table is cribbed wholesale from http://www.microsoft.com/windows/windowsmedia/forpros/events/NAB2005/VC-1.aspx

VC-1 has 3 profiles: simple, main, and advanced. Each has various levels. The combinations of profiles and levels represent trade-offs between encoding/decoding complexity, compression quality, and compressed image size.

 Profile   Level   Max bit rate  Representative resolutions by frame rate (format)
 Simple    Low     96 Kbps       176 x 144 @ 15 Hz (QCIF)
           Medium  384 Kbps      240 x 176 @ 30 Hz
                                 352 x 288 @ 15 Hz (CIF)
 Main      Low     2 Mbps        320 x 240 @ 24 Hz (QVGA)
           Medium  10 Mbps       720 x 480 @ 30 Hz (480p)
                                 720 x 576 @ 25 Hz (576p)
           High    20 Mbps       1920 x 1080 @ 30 Hz (1080p)
 Advanced  L0      2 Mbps        352 x 288 @ 30 Hz (CIF)
           L1      10 Mbps       720 x 480 @ 30 Hz (NTSC-SD)
                                 720 x 576 @ 25 Hz (PAL-SD)
           L2      20 Mbps       720 x 480 @ 60 Hz (480p)
                                 1280 x 720 @ 30 Hz (720p)
           L3      45 Mbps       1920 x 1080 @ 24 Hz (1080p)
                                 1920 x 1080 @ 30 Hz (1080i)
                                 1280 x 720 @ 60 Hz (720p)
           L4      135 Mbps      1920 x 1080 @ 60 Hz (1080p)
                                 2048 x 1536 @ 24 Hz

Kbps = kilobits per second; Mbps = megabits per second.

Coding Concepts

Colorspace

VC-1 codes a sequence of images in the YUV 4:2:0 colorspace.

Macroblocks, Blocks, and Sub-blocks

When VC-1 codes an image, it divides the image into macroblocks. Each 16x16 macroblock consists of six 8x8 sample blocks (four Y blocks, one U block, and one V block). The coding method may further divide an individual 8x8 block into two 8x4 blocks, two 4x8 blocks, or four 4x4 blocks.
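The sketch below shows how the six blocks of one macroblock map onto the three planes of a 4:2:0 image. It assumes blocks 0-3 are the Y blocks in raster order and blocks 4 and 5 are U and V; this numbering is an assumption for illustration, and `block_origin` is a hypothetical helper.

```c
/* Plane a block belongs to. */
typedef enum { PLANE_Y, PLANE_U, PLANE_V } plane_t;

typedef struct {
    plane_t plane;
    int x, y;   /* top-left sample position inside that plane */
} block_pos_t;

/* Hypothetical helper: position of block `blk` (0..5) of macroblock
 * (mb_x, mb_y). Assumes blocks 0-3 are the four 8x8 luma blocks in
 * raster order and blocks 4/5 are the 8x8 U and V blocks. In 4:2:0 the
 * chroma planes are half width and half height, so one 8x8 chroma block
 * covers the whole 16x16 macroblock area. */
block_pos_t block_origin(int mb_x, int mb_y, int blk)
{
    block_pos_t p;
    if (blk < 4) {
        p.plane = PLANE_Y;
        p.x = mb_x * 16 + (blk & 1) * 8;
        p.y = mb_y * 16 + (blk >> 1) * 8;
    } else {
        p.plane = (blk == 4) ? PLANE_U : PLANE_V;
        p.x = mb_x * 8;
        p.y = mb_y * 8;
    }
    return p;
}
```

For example, block 3 of macroblock (2, 1) starts at luma sample (40, 24), while block 5 (V) of the same macroblock starts at chroma sample (16, 8).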

Transform Coding

VC-1 uses a variation of the Discrete Cosine Transform to convert blocks of samples into a transform domain, which makes them easier to code efficiently. The transform may operate on the full 8x8 block or on any of the three supported sub-block sizes (8x4, 4x8, or 4x4). Unlike many codec standards preceding VC-1, the specification defines a bit-exact integer transform that all implementations are expected to follow, so that every conforming decoder produces identical output and no transform mismatch error accumulates between encoder and decoder.
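Because the transform uses only integer arithmetic, it can be written as a plain matrix multiply. The sketch below applies a 4-point basis to one row of coefficients. The constants (17, 22, 10) are the 4-point values commonly quoted for VC-1 and are an assumption here; the specification's rounding offsets and the shifts between the row and column passes are intentionally left out, so the output is a scaled-up intermediate rather than final sample values.

```c
/* 4-point inverse transform basis as commonly quoted for VC-1
 * (treat the exact constants as an assumption). Row k holds basis
 * function k. */
static const int T4[4][4] = {
    { 17,  17,  17,  17 },
    { 22,  10, -10, -22 },
    { 17, -17, -17,  17 },
    { 10, -22,  22, -10 },
};

/* 1-D inverse transform of 4 coefficients in pure integer arithmetic:
 * out[n] = sum over k of coef[k] * T4[k][n]. The rounding offsets and
 * right shifts of the real two-pass transform are omitted, so the
 * outputs are scaled up. */
void inverse_t4_1d(const int coef[4], int out[4])
{
    for (int n = 0; n < 4; n++) {
        int sum = 0;
        for (int k = 0; k < 4; k++)
            sum += coef[k] * T4[k][n];
        out[n] = sum;
    }
}
```

A lone DC coefficient of 1 spreads evenly to 17 in every output position, which is the integer analogue of a flat DCT basis function.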

Zigzag

After transforming the sample data, VC-1 reorders the transform coefficients in a zigzag pattern, which tends to group the non-zero low-frequency coefficients at the start so that the subsequent run-length coding is more effective. VC-1 has 13 different zigzag patterns, selected according to block size, interlacing, prediction mode, and whether the block is intra or inter.
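A minimal sketch of zigzag reordering. `build_zigzag` generates the classic JPEG/MPEG-style diagonal scan for any block size, and `dezigzag` undoes the reordering on the decoder side. This generic scan only stands in for VC-1's own 13 tables, which are not reproduced here.

```c
/* Build a classic diagonal zigzag scan for a w x h block: scan[i] is the
 * raster index (row * w + col) of the i-th coefficient in coding order.
 * Illustration only; VC-1 defines its own scan tables. */
void build_zigzag(int w, int h, int *scan)
{
    int i = 0;
    for (int s = 0; s <= (w - 1) + (h - 1); s++) {      /* anti-diagonal index */
        int lo = s - (w - 1) > 0 ? s - (w - 1) : 0;     /* smallest valid row */
        int hi = s < h - 1 ? s : h - 1;                 /* largest valid row */
        if (s % 2 == 0) {
            for (int r = hi; r >= lo; r--)              /* walk up and to the right */
                scan[i++] = r * w + (s - r);
        } else {
            for (int r = lo; r <= hi; r++)              /* walk down and to the left */
                scan[i++] = r * w + (s - r);
        }
    }
}

/* Undo the reordering: put the i-th decoded coefficient back at its raster position. */
void dezigzag(const int *coded, const int *scan, int n, int *block)
{
    for (int i = 0; i < n; i++)
        block[scan[i]] = coded[i];
}
```

For an 8x8 block the generated scan starts 0, 1, 8, 16, 9, 2, ..., and the same function also produces scans for the 8x4 and 4x8 sub-block shapes.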

Quantization

Quantization is the step that discards the most information in a lossy compression scheme such as VC-1. Unlike many other codecs, VC-1 does not specify quantization matrices; instead, it scales DC and AC coefficients directly using a quantization parameter.
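To show what scaling coefficients without a matrix looks like, here is a sketch of AC dequantization driven only by a macroblock quantizer `mquant`. The step size 2*mquant + halfqp and the extra +/-mquant offset for the non-uniform quantizer are modeled on open-source VC-1 decoders and should be treated as assumptions; DC scaling is not shown.

```c
/* Matrix-free AC dequantization: every coefficient in the block uses the
 * same step derived from the quantizer. Assumed rules (not quoted from
 * the specification): step = 2 * mquant + halfqp, and with the
 * non-uniform quantizer a dead-zone offset of +/-mquant is added to
 * every non-zero level. */
int dequant_ac(int level, int mquant, int halfqp, int nonuniform)
{
    if (level == 0)
        return 0;
    int value = level * (2 * mquant + halfqp);
    if (nonuniform)
        value += (level < 0) ? -mquant : mquant;
    return value;
}
```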

The quantizer may vary between macroblocks in one of the following ways:

  • every macroblock may have its own quantizer
  • only the macroblocks along all four frame edges use a different quantizer
  • only the macroblocks along two adjacent edges use a different quantizer
  • only the macroblocks along one edge use a different quantizer
  • all macroblocks use the same quantizer

In cases 2-4, a second quantizer is signalled for the selected edge macroblocks; in the first case, the difference between the frame quantizer and the actual macroblock quantizer is stored.
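A sketch of choosing a macroblock's quantizer in the edge cases. Edges are represented as a hypothetical 4-bit mask (left, top, right, bottom). The numbering of the signalled single-edge and double-edge selections follows FFmpeg's VC-1 decoder and is an assumption, not something stated on this page.

```c
/* Hypothetical edge mask bits. */
enum { EDGE_LEFT = 1, EDGE_TOP = 2, EDGE_RIGHT = 4, EDGE_BOTTOM = 8 };

/* Assumed selection numbering: single edge 0..3 = left, top, right,
 * bottom; double edge 0..3 = left+top, top+right, right+bottom,
 * bottom+left. All four edges is simply the mask 15. */
int single_edge_mask(int sel) { return 1 << sel; }
int double_edge_mask(int sel) { return (3 << sel) % 15; }

/* Quantizer for macroblock (mb_x, mb_y) in a frame of mb_w x mb_h
 * macroblocks: macroblocks on an edge selected by `edges` use the second
 * quantizer `alt_q`, all others use the frame quantizer `frame_q`. */
int mb_quant(int mb_x, int mb_y, int mb_w, int mb_h,
             int edges, int frame_q, int alt_q)
{
    if (((edges & EDGE_LEFT)   && mb_x == 0) ||
        ((edges & EDGE_TOP)    && mb_y == 0) ||
        ((edges & EDGE_RIGHT)  && mb_x == mb_w - 1) ||
        ((edges & EDGE_BOTTOM) && mb_y == mb_h - 1))
        return alt_q;
    return frame_q;
}
```

The modulo in `double_edge_mask` wraps the pair for selection 3 around to bottom+left (mask 9).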

Bitplane Coding

VC-1 uses a number of bitplanes, which are simply maps of ones and zeros that specify properties of the macroblocks in an image. For example, one bitplane records which macroblocks in a frame are not coded. Bitplanes are coded into the final bitstream using several methods:

  • raw (the bitplane data is actually stored bit by bit in the macroblock headers)
  • rowskip/colskip (each row or column is either all zeros, signalled by a single '0' bit, or coded, signalled by a '1' bit followed by the raw bits of that row or column)
  • tiling (the bitplane is split into 2x3 or 3x2 tiles, each coded with its own codeword; any remainder is coded with the rowskip and colskip methods)

A bitplane may be coded in inverted mode, which is signalled by an additional bit before the bitplane data.
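As an example of the simplest non-raw method, the sketch below decodes a rowskip-coded bitplane and applies the inversion bit. It assumes one byte per macroblock in row-major order and a plain MSB-first bit reader; colskip works the same way with rows and columns swapped. How the coding mode and the invert bit themselves are read is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (no bounds checking). */
typedef struct { const uint8_t *buf; size_t pos; } bits_t;

static int get_bit(bits_t *b)
{
    int bit = (b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1;
    b->pos++;
    return bit;
}

/* Rowskip decoding of a width x height bitplane: per row, a '0' bit means
 * the whole row is zero; a '1' bit is followed by `width` raw bits.
 * If `invert` is set, every decoded bit is flipped afterwards. */
void decode_rowskip(bits_t *b, int width, int height, int invert, uint8_t *plane)
{
    for (int y = 0; y < height; y++) {
        int coded = get_bit(b);
        for (int x = 0; x < width; x++)
            plane[y * width + x] = coded ? (uint8_t)get_bit(b) : 0;
    }
    if (invert)
        for (int i = 0; i < width * height; i++)
            plane[i] ^= 1;
}
```

With a 3x2 plane, the bits 0 / 1 101 (one skipped row, then one coded row) take five bits and decode to 000 / 101.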

Motion Compensation

VC-1 uses half-pel and quarter-pel motion compensation between frames, with either bilinear interpolation (as used for chroma in H.264) or bicubic interpolation (an extended version of the motion compensation used in WMV2).
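A one-dimensional sketch of the bicubic interpolation step: four neighbouring full-pel samples produce one sample at a quarter-, half-, or three-quarter-pel position. The taps (-4, 53, 18, -3)/64 and (-1, 9, 9, -1)/16 are the ones used by FFmpeg's VC-1 decoder and are an assumption here; the rounding-control bit and the separable two-pass 2-D filtering are omitted.

```c
#include <stdint.h>

static uint8_t clip_u8(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* Right shift that rounds toward minus infinity, well defined for negative values. */
static int asr(int v, int n) { return v >= 0 ? v >> n : -((-v + (1 << n) - 1) >> n); }

/* Interpolate between s[1] and s[2]; frac is the sub-pel offset in
 * quarter-pels (1, 2 or 3). Taps and rounding are assumptions (see text). */
uint8_t bicubic_tap(const uint8_t s[4], int frac)
{
    switch (frac) {
    case 1:  return clip_u8(asr(-4 * s[0] + 53 * s[1] + 18 * s[2] - 3 * s[3] + 32, 6));
    case 2:  return clip_u8(asr(-s[0] + 9 * s[1] + 9 * s[2] - s[3] + 8, 4));
    case 3:  return clip_u8(asr(-3 * s[0] + 18 * s[1] + 53 * s[2] - 4 * s[3] + 32, 6));
    default: return s[1];   /* full-pel position: no filtering */
    }
}
```

With the samples 0, 0, 255, 255 the half-pel output is 128, and the negative outer taps are why the result must be clipped.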

Huffman Coding

All essential frame data (such as motion vectors and block coefficients) is stored using static Huffman codes. There are usually several code sets for each data type, and one set is used throughout a whole frame. The set index is usually signalled in the frame header or derived from other parameters (such as the quantizer, or whether the frame is intra or inter). Many of these code sets are inherited from Microsoft's MPEG-4 variants.
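A sketch of decoding one symbol from a static prefix code. `toy_table` is a hypothetical four-entry code, not one of VC-1's tables; a decoder would pick one table set per frame (from the frame header or from derived parameters) and pass it in. Real decoders use multi-bit lookup tables rather than this bit-by-bit linear search.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *buf; size_t pos; } bits_t;

static int get_bit(bits_t *b)
{
    int bit = (b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1;
    b->pos++;
    return bit;
}

/* One table entry: a `len`-bit code with value `code` maps to `symbol`. */
typedef struct { uint32_t code; int len; int symbol; } vlc_entry_t;

/* Hypothetical code: "1" -> 0, "01" -> 1, "001" -> 2, "000" -> 3. */
static const vlc_entry_t toy_table[] = {
    { 0x1, 1, 0 }, { 0x1, 2, 1 }, { 0x1, 3, 2 }, { 0x0, 3, 3 },
};

/* Grow the code one bit at a time until it matches an entry of the
 * selected table. Returns -1 if nothing matches within max_len bits. */
int decode_vlc(bits_t *b, const vlc_entry_t *table, int n, int max_len)
{
    uint32_t code = 0;
    for (int len = 1; len <= max_len; len++) {
        code = (code << 1) | (uint32_t)get_bit(b);
        for (int i = 0; i < n; i++)
            if (table[i].len == len && table[i].code == code)
                return table[i].symbol;
    }
    return -1;
}
```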

Intensity Compensation

Intensity compensation is a special mode in which the reference frame's luma and chroma samples are scaled before being used for motion compensation.
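Scaling every reference sample is typically done with a 256-entry lookup table built once per frame. In the sketch below, `scale` (in 1/64 units) and `shift` are hypothetical parameters applied as v' = clip((scale*v + 64*shift + 32) / 64); the bitstream signals its own intensity-compensation parameters, whose exact derivation is not reproduced here, so this only illustrates the table-driven approach.

```c
#include <stdint.h>

/* Build a lookup table applying a linear remap to reference samples:
 * v' = clip((scale * v + shift * 64 + 32) >> 6). `scale` is in 1/64
 * units and `shift` in sample units; both are hypothetical parameters,
 * not the bitstream's own fields. */
void build_ic_lut(int scale, int shift, uint8_t lut[256])
{
    for (int v = 0; v < 256; v++) {
        int t = scale * v + shift * 64 + 32;
        t = t < 0 ? 0 : t >> 6;              /* negative results clip to 0 */
        lut[v] = (uint8_t)(t > 255 ? 255 : t);
    }
}
```

Motion compensation then reads `lut[ref[i]]` instead of `ref[i]`, so the per-sample cost is a single table lookup.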

Range Reduction

Range reduction is a special mode in which the luma and chroma sample range (0..255) is halved to 64..192 around the center value 128. The decoder must therefore expand the samples back before display (and, in the simple and main profiles, before using them for prediction).
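Expanding a range-reduced sample amounts to doubling its distance from 128 and clipping to 0..255. A minimal sketch of the decoder side, assuming 8-bit samples stored one per byte:

```c
#include <stddef.h>
#include <stdint.h>

/* Undo range reduction for one sample: Y' = clip((Y - 128) * 2 + 128). */
uint8_t range_expand(uint8_t v)
{
    int t = (v - 128) * 2 + 128;
    return (uint8_t)(t < 0 ? 0 : t > 255 ? 255 : t);
}

/* Expand a whole plane in place. */
void range_expand_plane(uint8_t *plane, size_t n)
{
    for (size_t i = 0; i < n; i++)
        plane[i] = range_expand(plane[i]);
}
```

64 maps to 0, 128 stays at 128, and 192 maps to 256, which clips to 255.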

Overlap Transform

For blocks coded with a large quantization value, an overlap transform may be applied. It smooths the samples along the borders between adjacent blocks.
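A sketch of smoothing across one block edge: two samples on each side of the border are pulled toward each other. The coefficients and rounding are taken from FFmpeg's VC-1 overlap filter and are an assumption here; which edges are filtered, and the exact quantizer condition, are not shown.

```c
#include <stdint.h>

static uint8_t clip_u8(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* Floor division by 8, well defined for negative values. */
static int floor_div8(int v) { return v >= 0 ? v >> 3 : -((-v + 7) >> 3); }

/* Smooth four samples straddling a block edge: p[0], p[1] | p[2], p[3].
 * `rnd` (0 or 1) alternates the rounding direction. */
void overlap_smooth4(uint8_t p[4], int rnd)
{
    int a = p[0], b = p[1], c = p[2], d = p[3];
    int d1 = floor_div8(a - d + 3 + rnd);          /* adjustment for the outer pair */
    int d2 = floor_div8(a - d + b - c + 4 - rnd);  /* adjustment for the inner pair */
    p[0] = clip_u8(a - d1);
    p[1] = clip_u8(b - d2);
    p[2] = clip_u8(c + d2);
    p[3] = clip_u8(d + d1);
}
```

A hard step 0, 0 | 80, 80 becomes 10, 20 | 60, 70 with rnd = 0, while a flat area is left unchanged.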

Bitstream Packing

VC-1 bitstreams are packed as bits into bytes in left -> right order:

 byte 0    byte 1
abcdefgh  ijklmnop

Given the preceding bytestream/bitstream, a get_bit() operation to retrieve the next bit in the stream would return bit a. A get_bits(5) operation to request the next 5 bits would return 'bcdef'. The next get_bits(4) operation would return 'ghij'.
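A minimal MSB-first bit reader matching this packing. `bitreader_t`, `get_bit` and `get_bits` are illustrative names; reading past the end simply returns zero bits, which keeps the sketch short but would hide truncated input in a real decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits are consumed from the most significant bit of byte 0 downwards,
 * then byte 1, and so on. */
typedef struct {
    const uint8_t *buf;
    size_t size;     /* buffer size in bytes */
    size_t pos;      /* current bit position */
} bitreader_t;

/* Returns the next bit, or 0 once the buffer is exhausted. */
unsigned get_bit(bitreader_t *br)
{
    if (br->pos >= br->size * 8)
        return 0;
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
    br->pos++;
    return bit;
}

/* Returns the next n bits (n <= 32), first bit read as the most significant. */
uint32_t get_bits(bitreader_t *br, int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = (v << 1) | get_bit(br);
    return v;
}
```

With the bytes 0xA5 0x3C, the reads described above return 1, then 9 (binary 01001), then 4 (binary 0100).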

Setup Data / Sequence Layer

When VC-1 data is encapsulated inside an ASF file, it is accompanied by setup data attached as the extradata of a BITMAPINFOHEADER structure. In VC-1 parlance, this data is called the sequence layer. Its format is as follows:

  • 2 bits: profile (0 = simple, 1 = main, 2 = complex, 3 = advanced). The complex profile is not covered by the VC-1 standard and may occur in old WMV3 files, where it was called "advanced profile".
  • if profile is simple or main (0 or 1, respectively)
    • 2 bits: reserved, should be 0
  • if profile is advanced (3)
    • 3 bits: level of advanced profile (values 5-7 are invalid)
    • 2 bits: chroma format (note that only format 1, YUV 4:2:0 is defined; other values are invalid)
  • 3 bits: quantized frame rate for post-processing (unused)
  • 5 bits: quantized bit rate for post-processing (unused)
  • if profile is simple or main
    • 1 bit: loop filter flag
    • 1 bit: reserved, should be 0 (appears to correspond to the special coding mode known as J-frames in WMV2)
    • 1 bit: multiresolution coding flag
    • 1 bit: reserved, should be 1
    • 1 bit: fast U/V motion compensation (note: must be 1 in simple profile); indicates whether the decoder should round chroma motion vectors to half-sample positions
    • 1 bit: extended motion vectors (note: must be 0 in simple profile)
    • 2 bits: macroblock dequantization mode
    • 1 bit: variable sized transform (i.e. allow 8x4, 4x8 and 4x4 blocks)
    • 1 bit: reserved, should be 0 (possibly indicates whether the code set for decoding AC coefficients is specified explicitly)
    • 1 bit: overlapped transform flag
    • 1 bit: sync marker flag
    • 1 bit: range reduction flag
    • 3 bits: maximum number of consecutive B frames
    • 2 bits: quantizer mode
    • 1 bit: 'finterp' flag (indicates whether the frame interpolation field is present in the frame header)
    • 1 bit: 'release-to-manufacturer' flag, should be 1; a value of 0 indicates an old WMV3 encoding with a different bitstream format for P/B frames (not yet figured out)
  • if profile is advanced
    • 1 bit: post processing flag
    • 12 bits: max coded width (actual width = (width + 1) * 2)
    • 12 bits: max coded height (actual height = (height + 1) * 2)
    • 1 bit: pulldown flag
    • 1 bit: interlaced
    • 1 bit: frame counter flag
    • 1 bit: frame interpolation flag
    • (UNFINISHED: lots more stuff to be filled in when advanced profile is needed)

When simple or main profile data is encapsulated in a general container format, the maximum coded width and height come from the container rather than from the sequence layer. Note that the sequence layer for simple or main profile data totals 32 bits, which, incidentally, should also be the size (4 bytes) of the extradata passed from the ASF container to a VC-1 decoder.
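A sketch of parsing the 4-byte simple/main profile sequence layer with an MSB-first reader. It reads the simple/main fields listed above, from the profile through the 'release-to-manufacturer' flag, which add up to exactly 32 bits. The struct and its field names are chosen for this example, and reserved bits are skipped without validation.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *buf; size_t pos; } bits_t;

/* Read n bits MSB-first. */
static uint32_t get_bits(bits_t *b, int n)
{
    uint32_t v = 0;
    while (n-- > 0) {
        v = (v << 1) | ((b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1u);
        b->pos++;
    }
    return v;
}

/* Field names are descriptive labels chosen for this sketch. */
typedef struct {
    int profile, frmrtq_postproc, bitrtq_postproc;
    int loop_filter, multires, fast_uvmc, extended_mv, dquant;
    int vstransform, overlap, syncmarker, rangered;
    int max_b_frames, quantizer, finterp, rtm;
} seq_header_t;

/* Parse the 32-bit simple/main profile sequence layer (ASF extradata). */
void parse_seq_simple_main(const uint8_t extradata[4], seq_header_t *h)
{
    bits_t b = { extradata, 0 };
    h->profile         = get_bits(&b, 2);
    get_bits(&b, 2);                         /* reserved, should be 0 */
    h->frmrtq_postproc = get_bits(&b, 3);
    h->bitrtq_postproc = get_bits(&b, 5);
    h->loop_filter     = get_bits(&b, 1);
    get_bits(&b, 1);                         /* reserved, should be 0 */
    h->multires        = get_bits(&b, 1);
    get_bits(&b, 1);                         /* reserved, should be 1 */
    h->fast_uvmc       = get_bits(&b, 1);
    h->extended_mv     = get_bits(&b, 1);
    h->dquant          = get_bits(&b, 2);
    h->vstransform     = get_bits(&b, 1);
    get_bits(&b, 1);                         /* reserved, should be 0 */
    h->overlap         = get_bits(&b, 1);
    h->syncmarker      = get_bits(&b, 1);
    h->rangered        = get_bits(&b, 1);
    h->max_b_frames    = get_bits(&b, 3);
    h->quantizer       = get_bits(&b, 2);
    h->finterp         = get_bits(&b, 1);
    h->rtm             = get_bits(&b, 1);
}
```

For example, the bytes 40 09 08 11 describe a main profile stream with the loop filter and variable-sized transform enabled, at most one consecutive B frame, and the 'release-to-manufacturer' flag set.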

High Level Decoding Algorithm

For each encoded frame:

  • unpack the frame-level information, such as quantization parameters, bitplanes, and table selections
  • for each field
    • for each macroblock
      • unpack the macroblock
      • perform motion compensation if needed
      • determine each block's coding mode (whether it is intra, and whether it is coded)
      • for each block
        • if the block is intra or coded, decode it; otherwise proceed to the next block
        • apply the inverse transform and, in some cases, add 128 to every sample value
        • apply overlap smoothing if needed
        • apply post-processing if requested

Decoding Motion Vector

Each motion vector is stored as a single value of the form F*36 + LY*6 + LX, where F is a flag signalling that the macroblock is coded, and LX and LY encode the sign and length of the dX and dY values.
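Splitting the combined value back into its three parts is plain integer division. A sketch with `split_mv_index` as a hypothetical helper; how LX and LY are then turned into actual dX and dY values is not covered here.

```c
/* Parts of a combined motion-vector value F*36 + LY*6 + LX. */
typedef struct { int coded_flag, ly, lx; } mv_index_t;

/* Hypothetical helper: LX and LY range over 0..5, F over 0..1. */
mv_index_t split_mv_index(int index)
{
    mv_index_t m;
    m.coded_flag = index / 36;
    m.ly = (index % 36) / 6;
    m.lx = index % 6;
    return m;
}
```

For example, 53 = 1*36 + 2*6 + 5 gives F = 1, LY = 2, LX = 5.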

Decoding Intra Block

Overall decoding process:

  • predict the DC value
  • read the delta value and apply it
  • if the block has AC coefficients
    • select the inverse zigzag (scan) table
    • while it is not the last coefficient
      • decode the AC coefficient information
      • put the AC coefficient into its designated place in the block
  • if signalled, perform AC prediction in the direction chosen for DC prediction
  • dequantize all coefficients
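The first step, DC prediction, picks between the left and upper neighbours by comparing gradients. The sketch below uses the MPEG-4 style rule with the tie-break used by FFmpeg's VC-1 decoder; both, and the parameter names, are assumptions. Handling of missing neighbours at frame edges and rescaling between different quantizers are omitted. The returned direction is also the one the later AC prediction step would use.

```c
#include <stdlib.h>

/* Choose the DC predictor from the DC values of the neighbouring blocks:
 * left, top-left and top. Sets *from_left to 1 when the left block is
 * used and 0 when the block above is used. */
int predict_dc(int left, int top_left, int top, int *from_left)
{
    if (abs(top - top_left) <= abs(top_left - left)) {
        /* horizontal change (top vs top-left) is no larger than the
         * vertical change (top-left vs left): predict from the left */
        *from_left = 1;
        return left;
    }
    *from_left = 0;    /* otherwise predict from the block above */
    return top;
}
```

The decoded DC delta is then added to the returned prediction.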

Decoding Inter Block

An inter block is composed entirely of coefficients coded like AC coefficients, starting from the top-left (DC) position. The only catch is that it may be divided into sub-blocks, which need to be decoded separately.

Decoding AC Coefficient

Each AC coefficient is coded as a single symbol that decomposes into three parts: the number of zero coefficients preceding it (the run), the coefficient value (the level), and whether it is the last non-zero coefficient in the block. All the information needed to decode these symbols is contained in dedicated tables.
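A sketch of how decoded (run, level, last) triples are placed into a block through a scan table. It assumes the triples have already been looked up in the code tables, and that an intra block starts at scan position 1 because its DC value is coded separately, while an inter block starts at position 0. `place_coeffs` is a hypothetical helper.

```c
/* One decoded AC symbol: `run` zeros, then `level`; `last` marks the
 * final non-zero coefficient of the block. */
typedef struct { int run, level, last; } rll_t;

/* Place triples into `block` using `scan` (scan[i] = raster position of
 * the i-th coefficient in scan order). `start` is 1 for intra blocks and
 * 0 for inter blocks. Returns the number of symbols consumed, or -1 if a
 * run overflows the block. */
int place_coeffs(const rll_t *sym, int nsym, const int *scan, int ncoef,
                 int start, int *block)
{
    int i = start;
    for (int k = 0; k < nsym; k++) {
        i += sym[k].run;              /* skip the zero coefficients */
        if (i >= ncoef)
            return -1;
        block[scan[i]] = sym[k].level;
        i++;
        if (sym[k].last)
            return k + 1;
    }
    return nsym;
}
```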

AC Prediction

AC prediction simply adds the seven AC coefficients from the first row of the block above, or from the first column of the block to the left, to the corresponding coefficients of the current block. The source block is the same one whose DC value was used for prediction. AC prediction is performed only when the corresponding flag bit is set.
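A sketch of the addition for an 8x8 block stored in row-major order. `ac_predict` is a hypothetical helper, and the sketch assumes both blocks were coded with the same quantizer; the rescaling needed when they differ is omitted.

```c
/* Add the first column of the left block (rows 1..7) or the first row
 * of the block above (columns 1..7) to the same positions of `cur`.
 * Position 0 (the DC coefficient) is left untouched. */
void ac_predict(int cur[64], const int left[64], const int top[64], int from_left)
{
    for (int k = 1; k < 8; k++) {
        if (from_left)
            cur[k * 8] += left[k * 8];   /* first column */
        else
            cur[k] += top[k];            /* first row */
    }
}
```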

Official Information

This Wiki aims to provide a complete, independent, and understandable description of the VC-1 format. Until such time, here are some external references on the format.

WMVP differences from WMV3

WMVP is essentially a slide show containing source material and transform information.

Source material is stored as a sprite, which is transformed and cropped afterwards (sprite dimensions are usually larger than the output). Sprite properties are stored in the sequence header in place of the RES_RTM flag:

 sprite width      - 11 bits
 sprite height     - 11 bits
 frame rate        -  5 bits
 X8 presence       -  1 bit
 skip DC/AC tables -  1 bit
 slice code        -  3 bits

Frames are preceded by two bits of unknown meaning. I-frames contain sprites and transform coefficients in 15.15 fixed-point format; P-frames contain only transform coefficients.
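Converting a 15.15 fixed-point value (15 fractional bits) to floating point is a division by 2^15. A sketch, assuming the value has already been read into a signed integer; how WMVP actually stores the sign and field width is not documented here.

```c
#include <stdint.h>

/* Convert a 15.15 fixed-point transform coefficient to double. */
double fixed_15_15_to_double(int32_t v)
{
    return (double)v / 32768.0;   /* 2^15 */
}
```

For example, 32768 is 1.0, 49152 is 1.5, and -16384 is -0.5.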