- FOURCCs: WMV3, WMV9
- Company: Microsoft
Old specs can be found here: 
VC-1 is the codec Microsoft is pushing as an SMPTE standard. VC-1 is what WMV9 became, and specs for it can be found here:
- VC-1 Compressed Video Bitstream Format and Decoding Process
- VC-1 Bitstream Transport Encodings (specs for placing VC-1 in MPEG-2 Program and Transport streams)
- VC-1 Decoder and Bitstream Conformance
Googling for VC1_reference_decoder_release6.zip might turn up sources for the reference decoder.
This description assumes that the data to be decoded is WMV3 data coming from a Microsoft ASF file. The video data should be packaged with "extradata", which is attached to the end of a BITMAPINFOHEADER structure and transported in the ASF file. The format of the extradata is as follows:
  2 bits   VC-1 Profile
  if (profile == 3)
    3 bits   Profile level
    2 bits   Chroma format (SRD does not care)
    3 bits   VC1_BITS_FRMRTQ_POSTPROC (? SRD does not care)
    5 bits   VC1_BITS_BITRTQ_POSTPROC (? SRD does not care)
    1 bit    VC1_BITS_POSTPROCFLAG (? SRD does not care)
    12 bits  Encoded width (actual width = (w + 1) * 2)
    12 bits  Encoded height (actual height = (h + 1) * 2)
The 2-bit profile field takes one of 4 values:
- 0 simple profile
- 1 main profile
- 2 reserved
- 3 advanced profile
If the profile is advanced, the extradata carries a lot of setup information. For simple and main profiles, the relevant setup data is established outside of the decoder, e.g., in the BITMAPINFOHEADER of a Microsoft ASF file. This information provides the width and height that the decoder uses to set up its state.
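As a rough illustration, the extradata fields listed earlier could be read with a small MSB-first bit reader. This is only a sketch that follows that field list literally: the `BitReader` class, the `parse_extradata_prefix` name, the dictionary keys, and the MSB-first bit order are all assumptions made for the example, not SRD code.

```python
class BitReader:
    """Reads MSB-first bit fields from a bytes object (illustrative helper)."""

    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8  # bits not yet consumed

    def read(self, nbits: int) -> int:
        if nbits > self.remaining:
            raise EOFError("not enough bits left")
        self.remaining -= nbits
        return (self.value >> self.remaining) & ((1 << nbits) - 1)


def parse_extradata_prefix(extradata: bytes) -> dict:
    """Hypothetical parser for the field list above; key names are made up."""
    br = BitReader(extradata)
    info = {"profile": br.read(2)}
    if info["profile"] == 3:  # advanced profile carries the extra fields
        info["level"] = br.read(3)
        info["chroma_format"] = br.read(2)
        info["frmrtq_postproc"] = br.read(3)
        info["bitrtq_postproc"] = br.read(5)
        info["postprocflag"] = br.read(1)
        # Both dimensions are stored as (actual / 2) - 1.
        info["width"] = (br.read(12) + 1) * 2
        info["height"] = (br.read(12) + 1) * 2
    return info
```

For simple and main profiles this sketch returns only the profile, since the dimensions then come from outside the extradata.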
The decoder computes the macroblock width and height as the ceiling of each dimension divided by 16:
  macroblock_width = (frame_width + 15) / 16
  macroblock_height = (frame_height + 15) / 16
The total number of macroblocks in a frame is defined as:
total_macroblocks = macroblock_width * macroblock_height
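The macroblock arithmetic above relies on integer (truncating) division, which is easy to get wrong when porting. A minimal sketch, with the function name chosen only for this example:

```python
def macroblock_dimensions(frame_width: int, frame_height: int):
    """Return (macroblock_width, macroblock_height, total_macroblocks)."""
    # Adding 15 before the integer division rounds up to whole 16x16 macroblocks.
    mb_width = (frame_width + 15) // 16
    mb_height = (frame_height + 15) // 16
    return mb_width, mb_height, mb_width * mb_height
```

A 1080-line frame is not a multiple of 16, so `macroblock_dimensions(1920, 1080)` rounds the height up to 68 macroblock rows.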
If the level is marked as unknown during the initialization process, the decoder works out which level the video belongs to. This is determined by the number of macroblocks in combination with the profile. The relevant table is vc1gentab.c:vc1GENTAB_LevelLimits. The profile/level combination defines the following limits:
- max macroblocks/second
- max macroblocks/frame
- max peak transmission rate in kbps
- max buffer size in multiples of 16 kbits
- motion vector range
SRD maintains the following information about each macroblock:
- macroblock type, contains the following attributes:
  - (attribute 1)
    - intra
    - 1 MV
    - 2 MV
    - 4 MV
  - (attribute 2)
    - direct macroblock
    - forward prediction
    - backward prediction
    - forward and backward prediction
  - (attribute 3)
    - MVs apply to fields
    - bottom different than top
    - field transform
- AC prediction status, one of the following attributes:
  - AC prediction off
  - AC prediction on
  - no blocks can be predicted
- Block type, one of the following types:
  - 8x8 inter-coded
  - 8x4 inter-coded
  - 4x8 inter-coded
  - 4x4 inter-coded
  - intra-coded, no AC prediction
  - intra-coded, AC prediction from top values
  - intra-coded, AC prediction from left values
- flag indicating whether overlap filter is active for this macroblock
- flag indicating whether macroblock is motion predicted only (no residual)
- byte indicating coded block pattern which indicates which of the 6 sub-blocks are coded
- Quantizer information, this includes:
  - quantizer step in the range 1..31
  - quantizer half step, either 0 or 1
  - flag indicating uniform or non-uniform quantizer
- Information for each of the 6 constituent blocks in the macroblock:
  - block type (same choices as the macroblock attributes)
  - flag indicating non-zero AC coeffs for intra, non-zero AC/DC for inter
  - union between an intra block structure and an inter block structure
    - intra structure:
      - number of zero/non-zero AC coeffs
      - quantized DC coeff
      - quantized AC top row for prediction (7 values)
      - quantized AC left column for prediction (7 values)
      - bottom 2 rows (16 values) kept for overlap smoothing
    - inter structure:
      - number of zero/non-zero AC coeffs for 4 sub-blocks (Y blocks?)
      - forward and backward motion vector structures, each includes:
        - hybrid prediction mode, one of the following attributes:
          - predict from left
          - predict from top
          - no hybrid prediction
        - (x,y) motion vectors for each of the 4 Y blocks
        - (x,y) differential motion vectors in 1/4 pel units

(note: I am a little confused as to why each of the 6 sub-blocks stores the motion vector data for the entire macroblock)
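To make the shape of this per-macroblock state easier to picture, here is a condensed sketch. All class names, field names, and enum values are invented for illustration and do not mirror the SRD's actual declarations. Treating attributes 1 and 2 as exclusive choices and attribute 3 as combinable flags is a guess, and the intra/inter union inside each block is left out to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum, Flag
from typing import List


class MotionMode(Enum):        # "attribute 1"
    INTRA = 0
    MV1 = 1
    MV2 = 2
    MV4 = 3


class Direction(Enum):         # "attribute 2"
    DIRECT = 0
    FORWARD = 1
    BACKWARD = 2
    BIDIRECTIONAL = 3


class FieldFlags(Flag):        # "attribute 3", assumed combinable
    NONE = 0
    FIELD_MV = 1
    BOTTOM_DIFFERS = 2
    FIELD_TRANSFORM = 4


class ACPrediction(Enum):
    OFF = 0
    ON = 1
    NOT_POSSIBLE = 2


@dataclass
class Quantizer:
    step: int = 1              # valid range 1..31
    half_step: int = 0         # 0 or 1
    non_uniform: bool = False


@dataclass
class Block:
    coded: bool = False        # non-zero coefficient flag


@dataclass
class Macroblock:
    motion_mode: MotionMode = MotionMode.INTRA
    direction: Direction = Direction.FORWARD
    field_flags: FieldFlags = FieldFlags.NONE
    ac_prediction: ACPrediction = ACPrediction.OFF
    overlap_filter: bool = False
    skipped: bool = False      # motion predicted only, no residual
    cbp: int = 0               # one bit per coded sub-block
    quantizer: Quantizer = field(default_factory=Quantizer)
    # 4 luma (Y) blocks followed by the 2 chroma blocks
    blocks: List[Block] = field(default_factory=lambda: [Block() for _ in range(6)])
```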
The initializer then needs to compute how much space to allocate for each reference frame. The size of a frame is determined by the frame width and height, the encoding profile, and interlacing. This size is used to allocate space for 4 different frames:
- reference new (new/current I/P frame)
- reference old (old I/P reference frame)
- reference B (reconstructed B frame)
- reference NoIC (B reference before intensity compensation was applied)
Further, the initializer allocates space for 7 different bitplanes. Each bitplane has 1 flag per macroblock, with the count given by the max macroblocks per frame for the profile/level. The bitplanes are:
- ACPRED
- SKIPMB
- MVTYPEMB
- DIRECTMB
- OVERFLAGS
- FORWARDMB
- FIELDTX
Allocate space for the motion vector history. The number of entries in this array is macroblock_width * (macroblock_height + 1) (the extra height is for interlaced fields). Each entry is a motion vector history structure which contains the 4 Y block motion vectors for a particular macroblock. The individual motion vector structures are the same as in the inter block structure, which provides the hybrid prediction mode, motion vectors, and differential MVs (again, 4 for each block?).
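The bitplane and motion vector history sizing can be sketched as follows. The function names are hypothetical, and storing one byte per flag is an assumption made for simplicity. The SRD's real storage layout is not described here.

```python
BITPLANE_NAMES = (
    "ACPRED", "SKIPMB", "MVTYPEMB", "DIRECTMB",
    "OVERFLAGS", "FORWARDMB", "FIELDTX",
)


def allocate_bitplanes(max_macroblocks_per_frame: int) -> dict:
    """One flag per macroblock for each of the 7 bitplanes (one byte per flag here)."""
    return {name: bytearray(max_macroblocks_per_frame) for name in BITPLANE_NAMES}


def mv_history_entries(macroblock_width: int, macroblock_height: int) -> int:
    """Number of motion vector history entries, including the extra row."""
    return macroblock_width * (macroblock_height + 1)
```

For a 320x240 frame (20 by 15 macroblocks), `mv_history_entries(20, 15)` gives 320 entries, one macroblock row more than the 300 macroblocks in the frame.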
And that's it for the SRD "requirements gathering" process (vc1dec.c:vc1DEC_DecoderRequirements()). The function returns the number of bytes needed for the decoder's internal state. The client app is expected to allocate enough space for this state.
Next is the vc1dec.c:vc1DEC_DecoderInitialise() function. This sets up the positions and structures contained within the memory pool that the client allocated for the decoder state.
Next is the vc1dec.c:vc1DEC_DecodeSequence() function which unpacks the sequence layer:
  2 bits   profile
  if (profile is simple or main)
    2 bits   VC1_BITS_RES_SM (? SRD does not care)
  if (profile is advanced)
    3 bits   level of advanced profile
    2 bits   chroma format (note that only format 1, YUV 4:2:0 is defined)
  3 bits   QFrameRateForPostProc ("see standard", SRD does not use)
  5 bits   QBitRateForPostProc ("see standard", SRD does not use)
  if (profile is simple or main)
    1 bit    loop filter flag
    1 bit    reserved, should be 0
    1 bit    multiresolution coding flag
    1 bit    reserved, should be 1, SRD calls it "RES_FASTTX"
    1 bit    fast U/V motion compensation
             note: must be 1 in simple profile
    1 bit    extended motion vectors
             note: must be 0 in simple profile
    2 bits   macroblock dequantization
    1 bit    variable sized transform
    1 bit    reserved, should be 0, SRD calls it "RES_TRANSTAB"
    1 bit    overlapped transform flag
    1 bit    sync marker flag
    1 bit    range reduction flag
    3 bits   maximum number of consecutive B frames
    2 bits   quantizer
  if (profile is advanced)
    1 bit    post processing flag
    12 bits  max coded width (actual width = (w + 1) * 2)
    12 bits  max coded height (actual height = (h + 1) * 2)
    1 bit    pulldown flag
    1 bit    interlaced
    1 bit    frame counter flag
  1 bit    frame interpolation flag
  if (profile is advanced)
    [lots more stuff to be filled in when advanced profile is needed]
  if (profile is simple or main)
    1 bit    reserved, should be 1, SRD calls it "RES_RTM_FLAG"
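For simple and main profiles the fields above add up to exactly 32 bits, so the whole sequence layer fits in 4 bytes. The sketch below walks that field list in order. The `BitReader` helper, the function name, and the dictionary keys are labels invented for this example, and it assumes MSB-first bit order and that the frame interpolation flag is read for every profile.

```python
class BitReader:
    """Reads MSB-first bit fields from a bytes object (illustrative helper)."""

    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8  # bits not yet consumed

    def read(self, nbits: int) -> int:
        if nbits > self.remaining:
            raise EOFError("not enough bits left")
        self.remaining -= nbits
        return (self.value >> self.remaining) & ((1 << nbits) - 1)


# (name, width in bits) for everything after the 2-bit profile field.
SIMPLE_MAIN_FIELDS = [
    ("res_sm", 2), ("frmrtq_postproc", 3), ("bitrtq_postproc", 5),
    ("loop_filter", 1), ("reserved_0", 1), ("multires", 1),
    ("res_fasttx", 1), ("fast_uvmc", 1), ("extended_mv", 1),
    ("dquant", 2), ("vstransform", 1), ("res_transtab", 1),
    ("overlap", 1), ("sync_marker", 1), ("range_reduction", 1),
    ("max_b_frames", 3), ("quantizer", 2), ("frame_interpolation", 1),
    ("res_rtm_flag", 1),
]


def parse_simple_main_sequence_header(data: bytes) -> dict:
    """Hypothetical parser for the simple/main sequence layer described above."""
    br = BitReader(data)
    header = {"profile": br.read(2)}
    if header["profile"] not in (0, 1):
        # Advanced profile (and the reserved value) are not covered by this sketch.
        raise NotImplementedError("only simple and main profiles are handled")
    for name, width in SIMPLE_MAIN_FIELDS:
        header[name] = br.read(width)
    return header
```

A caller could then check the constraints noted in the list, for example that `fast_uvmc` is 1 and `extended_mv` is 0 when the profile is simple.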
Finally, it is time to decode an actual frame (referred to as "unpacking the picture layer"). The decode process iterates through however many fields comprise the frame (1 or 2).
Choose from among 5 different zigzag table sets depending on profile and interlacing:
  if (picture format is interlaced frame)
    choose set 4
  if (picture is intra)
    choose set 0
  else if (profile is simple or main)
    choose set 1
  else if (picture format is progressive)
    choose set 2
  else
    choose set 3
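The zigzag selection can be written as a small function. As listed, the interlaced-frame test is followed by a separate `if`, so it is unclear whether set 4 is meant to override the other choices. This sketch assumes that it does. The function name and the picture format strings are hypothetical.

```python
def choose_zigzag_set(picture_format: str, is_intra: bool, profile: int) -> int:
    """Pick a zigzag table set index (0..4) from the rules above.

    picture_format is one of "progressive", "interlaced_frame" or
    "interlaced_field" (names invented for this example); profile uses
    the numbering 0 = simple, 1 = main, 3 = advanced.
    """
    if picture_format == "interlaced_frame":
        return 4  # assumed to take precedence over the remaining tests
    if is_intra:
        return 0
    if profile in (0, 1):
        return 1
    if picture_format == "progressive":
        return 2
    return 3
```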
(unfinished) ... there is a lot more logic dealing with frame accounting; let's skip to the real meat: macroblock decoding! ...
Decode a macroblock:
  set the macroblock overlap filter flag, coding type, quantizer and halfstep parameters to the same as the picture
  clear the skipped flag
  set the CBP to 0 (no coded blocks)
  choose the quantizer (long list of logic, see vc1iquant.c:vc1IQUANT_ChooseQuantizer())
  for each of the 6 sub-blocks, set coded field to 0, clear down all MV data
  decide on non-uniform quantizer
  unpack an I or BI macroblock: