VC-1
- FourCCs: WMV3, WMV9, WMVA, WVC1, WMVP
- Company: Microsoft
- Samples:
- General Overview: http://www.microsoft.com/windows/windowsmedia/forpros/events/NAB2005/VC-1.aspx
VC-1 is a video coding standard developed by Microsoft. It began as Windows Media Video 9 and is prevalent in ASF files downloaded from the internet. It is also expected to be used on HD-DVDs.
Official Information
This Wiki aims to provide a complete, independent, and understandable description of the VC-1 format. Until such time, here are some external references on the format.
- Old specs can be found here: http://jovian.com/files/C24.008-VC9-Spec-CD1.pdf
- VC-1 Compressed Video Bitstream Format and Decoding Process http://www.smpte.org/smpte_store/standards/pdf/s421m.pdf
- VC-1 Bitstream Transport Encodings (specs for placing VC-1 in MPEG-2 Program and Transport streams) http://www.smpte.org/smpte_store/standards/pdf/rp227.pdf
- VC-1 Decoder and Bitstream Conformance http://www.smpte.org/smpte_store/standards/pdf/rp228.pdf
- Googling for VC1_reference_decoder_release6.zip might turn up sources for the reference decoder.
See Understanding VC-1 for more information about the technical details of the format.
Encapsulation
Most commonly, VC-1 data is found inside Microsoft ASF files and identified with the FourCC 'WMV3'. Note that the FourCC 'WMV9' may not actually exist in the wild, but the acronym gained prominence anyway because this video codec was introduced as part of the Windows Media 9 tool suite. VC-1 video will probably also be encapsulated in other types of containers and stream formats, such as MPEG for HD-DVD transport.
Profiles And Levels
This table is cribbed wholesale from http://www.microsoft.com/windows/windowsmedia/forpros/events/NAB2005/VC-1.aspx
VC-1 has 3 profiles: simple, main, and advanced. Each has various levels. The combinations of profiles and levels represent trade-offs between encoding/decoding complexity, compression quality, and compressed image size.
Profile | Level | Maximum Bit Rate | Representative Resolutions by Frame Rate (Format) |
---|---|---|---|
Simple | Low | 96 kilobits per second (Kbps) | 176 x 144 @ 15 Hz (QCIF) |
Simple | Medium | 384 Kbps | 240 x 176 @ 30 Hz; 352 x 288 @ 15 Hz (CIF) |
Main | Low | 2 megabits per second (Mbps) | 320 x 240 @ 24 Hz (QVGA) |
Main | Medium | 10 Mbps | 720 x 480 @ 30 Hz (480p); 720 x 576 @ 25 Hz (576p) |
Main | High | 20 Mbps | 1920 x 1080 @ 30 Hz (1080p) |
Advanced | L0 | 2 Mbps | 352 x 288 @ 30 Hz (CIF) |
Advanced | L1 | 10 Mbps | 720 x 480 @ 30 Hz (NTSC-SD); 720 x 576 @ 25 Hz (PAL-SD) |
Advanced | L2 | 20 Mbps | 720 x 480 @ 60 Hz (480p); 1280 x 720 @ 30 Hz (720p) |
Advanced | L3 | 45 Mbps | 1920 x 1080 @ 24 Hz (1080p); 1920 x 1080 @ 30 Hz (1080i); 1280 x 720 @ 60 Hz (720p) |
Advanced | L4 | 135 Mbps | 1920 x 1080 @ 60 Hz (1080p); 2048 x 1536 @ 24 Hz |
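For code that needs these limits, the table can be captured as a small lookup. The following is only a sketch: the struct, array, and function names are made up for illustration, and it assumes 1 Mbps = 1000 Kbps when converting the figures above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the profile/level table; bit rates are in Kbps.
 * All names here are hypothetical, not taken from any VC-1 source. */
struct vc1_level_rate {
    const char *profile;
    const char *level;
    int max_kbps;
};

static const struct vc1_level_rate vc1_level_rates[] = {
    { "Simple",   "Low",    96 },
    { "Simple",   "Medium", 384 },
    { "Main",     "Low",    2000 },
    { "Main",     "Medium", 10000 },
    { "Main",     "High",   20000 },
    { "Advanced", "L0",     2000 },
    { "Advanced", "L1",     10000 },
    { "Advanced", "L2",     20000 },
    { "Advanced", "L3",     45000 },
    { "Advanced", "L4",     135000 },
};

/* Returns the maximum bit rate in Kbps, or -1 if the
 * profile/level combination does not appear in the table. */
int vc1_max_kbps(const char *profile, const char *level)
{
    assert(profile != NULL && level != NULL);
    size_t n = sizeof(vc1_level_rates) / sizeof(vc1_level_rates[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(vc1_level_rates[i].profile, profile) == 0 &&
            strcmp(vc1_level_rates[i].level, level) == 0)
            return vc1_level_rates[i].max_kbps;
    }
    return -1;
}
```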
Coding Concepts
Colorspace
VC-1 codes a sequence of images in the YUV 4:2:0 colorspace.
Macroblocks, Blocks, and Sub-blocks
When VC-1 codes an image, it divides the image into macroblocks. Each 16x16 macroblock comprises six 8x8 sample blocks (4 Y blocks, 1 U block, and 1 V block). Further, the coding method may divide an individual 8x8 block into two 8x4 blocks, two 4x8 blocks, or four 4x4 blocks.
Transform Coding
VC-1 uses a variation of the Discrete Cosine Transform to convert blocks of samples into a transform domain to facilitate more efficient coding. The transform may operate on the full 8x8 block or any of the 3 supported sub-block sizes (8x4, 4x8, or 4x4). Unlike many codec standards preceding VC-1, the specification defines a bit-accurate transform method that all implementations are expected to conform to so as to minimize transform error.
Zigzag
After transforming sample data into the transform domain, VC-1 reorders the transformed data in a zigzag pattern, which makes certain successive coding techniques more effective. VC-1 has 31 different zigzag patterns depending on various parameters.
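To illustrate the general idea of a zigzag scan, here is a sketch that builds the classic diagonal zigzag order for an 8x8 block. This is a stand-in for illustration only: it is not one of VC-1's own scan tables, which are defined by the specification, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Fills scan[k] with the raster index (row * 8 + column) of the k-th
 * coefficient in the classic diagonal zigzag order. Illustrative only;
 * VC-1's actual zigzag tables differ and come from the specification. */
void build_zigzag_8x8(int scan[64])
{
    assert(scan != NULL);
    int k = 0;
    for (int s = 0; s <= 14; s++) {          /* s = row + column */
        int lo = s < 8 ? 0 : s - 7;          /* smallest row on this diagonal */
        int hi = s < 8 ? s : 7;              /* largest row on this diagonal */
        if (s % 2) {
            /* odd diagonals run from top-right to bottom-left */
            for (int r = lo; r <= hi; r++)
                scan[k++] = r * 8 + (s - r);
        } else {
            /* even diagonals run from bottom-left to top-right */
            for (int r = hi; r >= lo; r--)
                scan[k++] = r * 8 + (s - r);
        }
    }
}
```

Reading coefficients in this order tends to group the low-frequency values at the front and leave long runs of zeros at the end, which is what the later coding stages exploit.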
Quantization
Quantization is the compression step that potentially loses the most information in a lossy compression scheme such as VC-1. This codec features an impressive number of quantization modes.
Bitplane Coding
VC-1 uses a number of bitplanes which are simply maps of ones and zeros that specify properties for the macroblocks in an image. For example, a particular bitplane codes information about which macroblocks are not coded in a frame. These bitplanes are coded into the final bitstream using a number of methods.
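A decoded bitplane can be held in memory as a packed array with one bit per macroblock. The sketch below shows one possible in-memory layout; the type and function names are hypothetical, and this says nothing about how bitplanes are coded in the bitstream.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One flag per macroblock, packed 8 flags to a byte (hypothetical layout). */
struct bitplane {
    int mb_width;
    int mb_height;
    uint8_t *bits;
};

/* Allocates a zeroed bitplane; returns 0 on success, -1 on failure. */
int bitplane_init(struct bitplane *bp, int mb_width, int mb_height)
{
    assert(bp != NULL && mb_width > 0 && mb_height > 0);
    size_t nbits = (size_t)mb_width * (size_t)mb_height;
    bp->mb_width = mb_width;
    bp->mb_height = mb_height;
    bp->bits = calloc((nbits + 7) / 8, 1);
    return bp->bits ? 0 : -1;
}

/* Sets or clears the flag for the macroblock at (mb_x, mb_y). */
void bitplane_set(struct bitplane *bp, int mb_x, int mb_y, int value)
{
    size_t i = (size_t)mb_y * (size_t)bp->mb_width + (size_t)mb_x;
    if (value)
        bp->bits[i >> 3] |= (uint8_t)(1u << (i & 7));
    else
        bp->bits[i >> 3] &= (uint8_t)~(1u << (i & 7));
}

/* Returns the flag (0 or 1) for the macroblock at (mb_x, mb_y). */
int bitplane_get(const struct bitplane *bp, int mb_x, int mb_y)
{
    size_t i = (size_t)mb_y * (size_t)bp->mb_width + (size_t)mb_x;
    return (bp->bits[i >> 3] >> (i & 7)) & 1;
}

void bitplane_free(struct bitplane *bp)
{
    free(bp->bits);
    bp->bits = NULL;
}
```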
Differential Coding
In addition to the usual type of differential coding, where differences between successive values are stored rather than the absolute values, VC-1 also uses XOR bit operations.
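The XOR idea can be shown with a one-dimensional example: each stored bit is the XOR of the current bit and a predictor, and decoding applies the same XOR again. This sketch uses the previous bit as the predictor (starting from 0), which is an assumption made for illustration; VC-1's actual predictor rules are defined in the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode: coded[i] = raw[i] XOR raw[i-1], with an implicit raw[-1] = 0.
 * Each array element holds a single bit (0 or 1). Illustrative only. */
void xor_diff_encode(const uint8_t *raw, uint8_t *coded, size_t n)
{
    assert(raw != NULL && coded != NULL);
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        coded[i] = (uint8_t)(raw[i] ^ prev);
        prev = raw[i];
    }
}

/* Decode: raw[i] = coded[i] XOR raw[i-1], the exact inverse of the above. */
void xor_diff_decode(const uint8_t *coded, uint8_t *raw, size_t n)
{
    assert(raw != NULL && coded != NULL);
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        raw[i] = (uint8_t)(coded[i] ^ prev);
        prev = raw[i];
    }
}
```

A run of identical flags turns into a run of zeros after encoding, which is typically cheaper to code.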
Motion Compensation
VC-1 uses half-pel and quarter-pel interframe motion compensation.
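As a small illustration of sub-sample motion vectors, the sketch below splits a vector component into a whole-sample offset and a fractional part. It assumes the component is stored as a signed integer in quarter-sample units, which is an assumption for this example rather than something stated above; the names are hypothetical, and the interpolation filters themselves are not shown.

```c
#include <assert.h>

/* Whole-sample offset plus fraction in quarter samples (0..3). */
struct mv_split {
    int whole;
    int frac;
};

/* Splits a quarter-sample motion vector component so that
 * mv == whole * 4 + frac, with frac always in 0..3 (floor division). */
struct mv_split split_quarter_pel(int mv)
{
    struct mv_split s;
    s.whole = mv / 4;
    s.frac = mv % 4;
    if (s.frac < 0) {        /* C division truncates toward zero */
        s.frac += 4;
        s.whole -= 1;
    }
    assert(s.whole * 4 + s.frac == mv);
    return s;
}
```

A fraction of 2 corresponds to a half-pel position; 1 and 3 are quarter-pel positions.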
Huffman Coding
Intensity Compensation
Bitstream Packing
Data Format
This description assumes that the data to be decoded is WMV3 data coming from a Microsoft ASF file. The video data should be packaged with "extradata", which is attached to the end of a BITMAPINFOHEADER structure and transported in the ASF file. The format of the extradata is as follows:
2 bits VC-1 Profile
if (profile == 3)
    3 bits Profile level
    2 bits Chroma format (SRD does not care)
    3 bits VC1_BITS_FRMRTQ_POSTPROC (? SRD does not care)
    5 bits VC1_BITS_BITRTQ_POSTPROC (? SRD does not care)
    1 bit VC1_BITS_POSTPROCFLAG (? SRD does not care)
    12 bits Encoded width (actual width = (w + 1) * 2)
    12 bits Encoded height (actual height = (h + 1) * 2)
There are 4 VC-1 profiles:
- 0 simple profile
- 1 main profile
- 2 reserved
- 3 advanced profile
If profile is advanced, the extradata carries a lot of setup information. For simple and main profiles, the relevant setup data is established outside of the decoder, e.g., the BITMAPINFOHEADER of a Microsoft ASF file. This information provides the width and height that the decoder uses to set up its state.
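The extradata layout described above could be read with a small bit reader along the following lines. This is a sketch under stated assumptions: bits are taken most-significant-bit first, every field after the profile is present only for the advanced profile, and all struct and function names are invented for this example rather than taken from the SRD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum vc1_profile {
    VC1_PROFILE_SIMPLE = 0,
    VC1_PROFILE_MAIN = 1,
    VC1_PROFILE_RESERVED = 2,
    VC1_PROFILE_ADVANCED = 3
};

/* Fields from the extradata description; names are hypothetical. */
struct vc1_extradata {
    int profile;
    int level, chroma_format;
    int frmrtq_postproc, bitrtq_postproc, postprocflag;
    int width, height;          /* actual dimensions, 0 if not present */
};

struct bit_reader {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
};

/* Reads n bits MSB-first (assumed bit order); bits past the end read as 0. */
static unsigned read_bits(struct bit_reader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0) {
        unsigned bit = 0;
        if (br->pos < br->size_bits)
            bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Returns 0 on success, -1 if the buffer was too short for the fields read. */
int vc1_parse_extradata(const uint8_t *data, size_t size,
                        struct vc1_extradata *out)
{
    assert(data != NULL && out != NULL);
    struct bit_reader br = { data, size * 8, 0 };
    memset(out, 0, sizeof *out);
    out->profile = (int)read_bits(&br, 2);
    if (out->profile == VC1_PROFILE_ADVANCED) {
        out->level = (int)read_bits(&br, 3);
        out->chroma_format = (int)read_bits(&br, 2);
        out->frmrtq_postproc = (int)read_bits(&br, 3);
        out->bitrtq_postproc = (int)read_bits(&br, 5);
        out->postprocflag = (int)read_bits(&br, 1);
        out->width = ((int)read_bits(&br, 12) + 1) * 2;
        out->height = ((int)read_bits(&br, 12) + 1) * 2;
    }
    return br.pos <= br.size_bits ? 0 : -1;
}
```

For simple and main profiles the sketch leaves width and height at 0, since those come from the BITMAPINFOHEADER instead.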
The decoder computes the macroblock width and height as the ceiling of each dimension divided by 16:
macroblock_width = (frame_width + 15) / 16
macroblock_height = (frame_height + 15) / 16
The total number of macroblocks in a frame is defined as:
total_macroblocks = macroblock_width * macroblock_height
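In C, the integer division already truncates, so the formulas above translate directly. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>

/* Number of 16-sample macroblocks needed to cover a dimension (ceiling). */
int vc1_mb_count(int pixels)
{
    assert(pixels > 0);
    return (pixels + 15) / 16;
}

/* Total macroblocks in a frame of the given size. */
int vc1_total_macroblocks(int frame_width, int frame_height)
{
    return vc1_mb_count(frame_width) * vc1_mb_count(frame_height);
}
```

For example, a 1920 x 1080 frame needs 120 x 68 macroblocks, because 1080 is not a multiple of 16 and the last row is only partly covered by picture data.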
If the level is marked as unknown during the initialization process, the decoder works out which level the video belongs to. This is determined by the number of macroblocks in combination with the profile. The relevant table is vc1gentab.c:vc1GENTAB_LevelLimits[][]. The profile/level combination defines the following limits:
- max macroblocks/second
- max macroblocks/frame
- max peak transmission rate in kbps
- max buffer size in multiples of 16 kbits
- motion vector range
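One way to picture the level lookup is a per-profile array of limit records, searched for the first level that can hold the frame. The sketch below is hypothetical throughout: the struct layout and function are invented for illustration, the selection uses only the macroblocks-per-frame limit, and the actual contents and search logic of vc1GENTAB_LevelLimits are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* The five limits listed above; field names are made up for this sketch. */
struct vc1_level_limits {
    long max_mb_per_second;
    long max_mb_per_frame;
    long max_peak_rate_kbps;
    long max_buffer_16kbit_units;
    int mv_range;
};

/* Given the limits for one profile, ordered from lowest to highest level,
 * returns the index of the first level whose macroblocks-per-frame limit
 * accommodates the frame, or -1 if none does. */
int vc1_pick_level(const struct vc1_level_limits *levels, int n_levels,
                   long mb_per_frame)
{
    assert(levels != NULL && n_levels >= 0);
    for (int i = 0; i < n_levels; i++) {
        if (mb_per_frame <= levels[i].max_mb_per_frame)
            return i;
    }
    return -1;
}
```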
The initializer then needs to compute how much space to allocate for each reference frame. The size of a frame is determined by frame width and height, encoding profile, and interlacing. This size is used to allocate space for 4 different frames:
- reference new (new/current I/P frame)
- reference old (old I/P reference frame)
- reference B (reconstructed B frame)
- reference NoIC (B reference before intensity compensation was applied)
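A rough size estimate for these four buffers can be sketched as follows. It assumes 8-bit YUV 4:2:0 samples with both dimensions padded up to whole macroblocks, and it deliberately ignores the profile and interlacing adjustments mentioned above, so the real SRD figure may well be larger. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The four frame buffers listed above (hypothetical identifiers). */
enum vc1_ref_frame {
    VC1_REF_NEW,    /* new/current I/P frame */
    VC1_REF_OLD,    /* old I/P reference frame */
    VC1_REF_B,      /* reconstructed B frame */
    VC1_REF_NOIC,   /* reference before intensity compensation */
    VC1_REF_COUNT
};

/* Bytes for one 8-bit 4:2:0 frame, padded to whole macroblocks.
 * Assumption: no extra borders, no profile or interlace adjustments. */
size_t vc1_frame_bytes(int width, int height)
{
    assert(width > 0 && height > 0);
    size_t padded_w = (size_t)((width + 15) / 16) * 16;
    size_t padded_h = (size_t)((height + 15) / 16) * 16;
    size_t luma = padded_w * padded_h;
    return luma + 2 * (luma / 4);   /* Y plane plus quarter-size U and V */
}

/* Total bytes for all four frame buffers. */
size_t vc1_reference_bytes(int width, int height)
{
    return (size_t)VC1_REF_COUNT * vc1_frame_bytes(width, height);
}
```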
Further, the initializer allocates space for 7 different bitplanes. Each bitplane has 1 flag per macroblock, sized by the max macroblocks per frame for the profile/level. The bitplanes are:
- ACPRED
- SKIPMB
- MVTYPEMB
- DIRECTMB
- OVERFLAGS
- FORWARDMB
- FIELDTX
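The seven planes can be allocated together, each sized by the maximum macroblocks per frame. The sketch below stores one byte per flag for simplicity, which is an assumption for this example; the identifiers are hypothetical and do not come from the SRD.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The seven bitplanes listed above (hypothetical identifiers). */
enum vc1_bitplane_id {
    VC1_BP_ACPRED,
    VC1_BP_SKIPMB,
    VC1_BP_MVTYPEMB,
    VC1_BP_DIRECTMB,
    VC1_BP_OVERFLAGS,
    VC1_BP_FORWARDMB,
    VC1_BP_FIELDTX,
    VC1_BP_COUNT
};

struct vc1_bitplanes {
    size_t flags_per_plane;
    uint8_t *plane[VC1_BP_COUNT];   /* one byte per macroblock flag */
};

void vc1_bitplanes_free(struct vc1_bitplanes *bp)
{
    for (int i = 0; i < VC1_BP_COUNT; i++) {
        free(bp->plane[i]);
        bp->plane[i] = NULL;
    }
}

/* Allocates all planes zeroed; returns 0 on success, -1 on failure. */
int vc1_bitplanes_alloc(struct vc1_bitplanes *bp, size_t max_mb_per_frame)
{
    assert(bp != NULL && max_mb_per_frame > 0);
    bp->flags_per_plane = max_mb_per_frame;
    for (int i = 0; i < VC1_BP_COUNT; i++)
        bp->plane[i] = NULL;
    for (int i = 0; i < VC1_BP_COUNT; i++) {
        bp->plane[i] = calloc(max_mb_per_frame, 1);
        if (bp->plane[i] == NULL) {
            vc1_bitplanes_free(bp);   /* release any planes already made */
            return -1;
        }
    }
    return 0;
}
```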
The initializer also allocates space for the motion vector history. The number of entries in this array is macroblock_width * (macroblock_height + 1) (the extra height is for interlaced field). Each entry is a motion vector history structure which contains the 4 Y block motion vectors for a particular macroblock. The individual motion vector structures are the same as in the intra structure, which provides hybrid prediction, motion vectors, and diff MVs (again, 4 for each block?).
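The entry count above can be sketched as an allocation helper. The structures here are simplified guesses for illustration: they hold only the four luma block vectors and omit the hybrid prediction and diff MV fields, whose exact layout is not described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A single motion vector (hypothetical, simplified layout). */
struct vc1_mv {
    int16_t x;
    int16_t y;
};

/* History entry for one macroblock: one vector per Y block. */
struct vc1_mv_history {
    struct vc1_mv luma[4];
};

/* Allocates macroblock_width * (macroblock_height + 1) zeroed entries.
 * Stores the entry count in *n_entries; returns NULL on failure. */
struct vc1_mv_history *vc1_mv_history_alloc(int mb_width, int mb_height,
                                            size_t *n_entries)
{
    assert(mb_width > 0 && mb_height > 0 && n_entries != NULL);
    *n_entries = (size_t)mb_width * ((size_t)mb_height + 1);
    return calloc(*n_entries, sizeof(struct vc1_mv_history));
}
```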
And that's it for the SRD "requirements gathering" process (vc1dec.c:vc1DEC_DecoderRequirements()). The function returns the number of bytes needed for the decoder's internal state. The client app is expected to allocate enough space for this state.
Finally, it is time to decode an actual frame (referred to as "unpacking the picture layer"). The decode process iterates through however many fields comprise the frame (1 or 2).
(unfinished) ... there is a lot more logic dealing with frame accounting; let's skip to the real meat: macroblock decoding! ...
Decode a macroblock:
- set the macroblock overlap filter flag, coding type, quantizer and halfstep parameters to the same as the picture
- clear the skipped flag
- set the CBP to 0 (no coded blocks)
- choose the quantizer (long list of logic, see vc1iquant.c:vc1IQUANT_ChooseQuantizer())
- for each of the 6 sub-blocks, set coded field to 0, clear down all MV data
- decide on non-uniform quantizer
- unpack an I or BI macroblock:
(unfinished)
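The per-macroblock reset steps listed above might look roughly like this in code. Every structure and field name here is hypothetical, and the sketch covers only the reset itself: the quantizer selection logic in vc1IQUANT_ChooseQuantizer(), the non-uniform quantizer decision, and the actual unpacking are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Picture-level parameters inherited by each macroblock (hypothetical). */
struct picture_params {
    int overlap_filter;
    int coding_type;
    int quantizer;
    int halfstep;
};

/* Per-block state: coded flag and motion vector data (hypothetical). */
struct block_state {
    int coded;
    int mv_x;
    int mv_y;
};

struct macroblock_state {
    int overlap_filter;
    int coding_type;
    int quantizer;
    int halfstep;
    int skipped;
    unsigned cbp;                 /* coded block pattern */
    struct block_state block[6];  /* 4 Y blocks, 1 U block, 1 V block */
};

/* Resets a macroblock to its starting state before unpacking. */
void macroblock_reset(struct macroblock_state *mb,
                      const struct picture_params *pic)
{
    assert(mb != NULL && pic != NULL);
    /* start from the picture-level settings */
    mb->overlap_filter = pic->overlap_filter;
    mb->coding_type = pic->coding_type;
    mb->quantizer = pic->quantizer;
    mb->halfstep = pic->halfstep;
    mb->skipped = 0;
    mb->cbp = 0;                  /* no coded blocks */
    for (int i = 0; i < 6; i++) {
        mb->block[i].coded = 0;
        mb->block[i].mv_x = 0;
        mb->block[i].mv_y = 0;
    }
}
```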