# Windows Media Audio


Windows Media Audio (WMA) is a perceptual audio codec that is usually packaged in ASF files.

There are two versions, v1 (format tag 0x160) and v2 (format tag 0x161), which differ only slightly.

Occasionally, WMA is referred to as DivX audio as it is often used in conjunction with Microsoft's family of MPEG-4 codecs, version 3 of which is sometimes known as 'DivX ;-)' video.

## Data Format And Decoding Process

This section collects miscellaneous notes on what it takes to decode the WMA format.

- multi-byte numbers are little endian
- data tables include:
  - critical frequencies
  - exponent bands for 22050, 32000, and 44100 Hz
  - gain Huffman table (37 entries)
  - codebook of LSP coefficients
  - scale Huffman table (121 entries)
  - coefficient 0 Huffman table (666 entries)
  - coefficient 1 Huffman table (555 entries)
  - coefficient 2 Huffman table (1336 entries)
  - coefficient 3 Huffman table (1072 entries)
  - coefficient 4 Huffman table (476 entries)
  - coefficient 5 Huffman table (435 entries)
  - levels 0 (60 entries)
  - levels 1 (40 entries)
  - levels 2 (340 entries)
  - levels 3 (180 entries)
  - levels 4 (70 entries)
  - levels 5 (40 entries)

- the coding format appears to be organized into blocks, frames (one or more blocks), and superframes (one or more frames)

- initialization:
  - naturally, the container format (AVI, ASF, maybe WAV?) carries the sample rate, channel count, bit rate, and block alignment
  - the WAVEFORMATEX header carries extra setup data (extradata):
    - v1: 4 extradata bytes:
      - bytes 0-1: flags1
      - bytes 2-3: flags2
    - v2: 6 extradata bytes:
      - bytes 0-3: flags1
      - bytes 4-5: flags2

  - flags2 field:
    - bit 0: exponents are coded with VLCs (Huffman codes) rather than LSP coefficients
    - bit 1: a bit reservoir is used
    - bit 2: variable block length (the block size can change within a frame; this is not VBR)
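The extradata layout and the flags2 bits listed above can be sketched in Python as follows. This is a minimal sketch, assuming `extradata` holds exactly the bytes that follow the WAVEFORMATEX header; `parse_wma_extradata` and `WmaFlags` are hypothetical names, and nothing is decoded from flags1 or from the flags2 bits above bit 2.

```python
import struct
from dataclasses import dataclass


@dataclass
class WmaFlags:
    flags1: int
    flags2: int
    use_exp_vlc: bool             # flags2 bit 0: exponents coded with VLCs
    use_bit_reservoir: bool       # flags2 bit 1: bit reservoir in use
    use_variable_block_len: bool  # flags2 bit 2: variable block length


def parse_wma_extradata(version: int, extradata: bytes) -> WmaFlags:
    """Split WAVEFORMATEX extradata into flags1/flags2 (both little endian)."""
    if version == 1:
        if len(extradata) < 4:
            raise ValueError("WMA v1 expects 4 extradata bytes")
        # bytes 0-1: flags1, bytes 2-3: flags2
        flags1, flags2 = struct.unpack_from("<HH", extradata, 0)
    elif version == 2:
        if len(extradata) < 6:
            raise ValueError("WMA v2 expects 6 extradata bytes")
        # bytes 0-3: flags1, bytes 4-5: flags2
        flags1, flags2 = struct.unpack_from("<IH", extradata, 0)
    else:
        raise ValueError("only WMA v1 and v2 are covered here")
    return WmaFlags(
        flags1=flags1,
        flags2=flags2,
        use_exp_vlc=bool(flags2 & 0x1),
        use_bit_reservoir=bool(flags2 & 0x2),
        use_variable_block_len=bool(flags2 & 0x4),
    )
```

For example, v2 extradata `00 00 00 00 07 00` yields flags2 = 7, with all three features enabled.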

  - frame length constraints:
    - if sample rate <= 16000: frame length bits = 9
    - else if sample rate <= 22050, or (v1 and sample rate <= 32000): frame length bits = 10
    - else: frame length bits = 11
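The frame length rule, together with the frame length of 2 ^ (frame length bits) that it implies, translates directly into code. A sketch, assuming `version` is 1 or 2 and `sample_rate` is in Hz; the function names are hypothetical:

```python
def frame_len_bits(version: int, sample_rate: int) -> int:
    """log2 of the frame length for WMA v1/v2."""
    if sample_rate <= 16000:
        return 9
    if sample_rate <= 22050 or (version == 1 and sample_rate <= 32000):
        return 10
    return 11


def frame_len(version: int, sample_rate: int) -> int:
    # frame length = 2 ^ (frame length bits)
    return 1 << frame_len_bits(version, sample_rate)
```

Under these rules a 32000 Hz stream gets 1024-sample frames in v1 but 2048-sample frames in v2.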

  - frame length = 2 ^ (frame length bits)
  - if variable block length ... add logic for determining block sizes ... based on upper 13 bits of flags2 ...
  - init rate dependent parameters:
    - use noise coding = 1 as a default
    - high frequency = sample rate / 2

    - v2 forces normalized sample rates: if sr >= 44100, it is forced to 44100, and the same rule applies with the other cutoffs, 22050, 16000, 11025, and 8000
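The v2 rate normalization can be read as snapping the rate down to the highest listed cutoff it meets or exceeds. The sketch below follows that reading; the behaviour below 8000 Hz is not covered by these notes, so passing the rate through unchanged there is an assumption, as is the name `effective_sample_rate`.

```python
V2_RATE_CUTOFFS = (44100, 22050, 16000, 11025, 8000)


def effective_sample_rate(version: int, sample_rate: int) -> int:
    """Rate used for the rate dependent parameters (v2 snaps to a cutoff)."""
    if version != 2:
        return sample_rate  # v1: no normalization is described
    for cutoff in V2_RATE_CUTOFFS:
        if sample_rate >= cutoff:
            return cutoff
    # Below 8000 Hz: left unchanged (assumption, not stated in the notes).
    return sample_rate
```

For instance, a 32000 Hz v2 stream would use 22050 for its rate dependent parameters, while v1 keeps 32000.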

    - bits per sample (bps) = bit rate / (channels * sr)
    - byte offset bits = log2(bps * frame length / 8) + 2
    - compute the high frequency value, and decide whether noise coding should be activated, based on the channel count and sr
    - compute the scale factor band sizes for each MDCT block size
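The bits-per-sample and byte-offset-bits formulas can be sketched as follows. The notes do not say how the non-integer `bps * frame length / 8` is turned into an integer before the logarithm, so rounding to the nearest integer and then taking the floor of log2 are assumptions here; the function names are hypothetical.

```python
def bits_per_sample(bit_rate: int, channels: int, sample_rate: int) -> float:
    # Average coded bits per sample per channel (bit/s divided by samples/s).
    return bit_rate / (channels * sample_rate)


def byte_offset_bits(bit_rate: int, channels: int, sample_rate: int,
                     frame_len: int) -> int:
    bps = bits_per_sample(bit_rate, channels, sample_rate)
    # Average bytes per frame per channel, rounded to nearest (assumption).
    avg_bytes = int(bps * frame_len / 8.0 + 0.5)
    if avg_bytes < 1:
        raise ValueError("bit rate too low for this frame length")
    # floor(log2(n)) for a positive integer n is n.bit_length() - 1
    return (avg_bytes.bit_length() - 1) + 2
```

With a 128 kbit/s stereo 44100 Hz stream and 2048-sample frames, bps is about 1.45, the rounded byte count is 372, and `byte_offset_bits` returns 10.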