Windows Media Audio

Codec ID: 0x160, 0x161
Company: Microsoft
US Patent links: [1][2]
Samples:
- http://samples.mplayerhq.hu/A-codecs/WMA1/
- http://samples.mplayerhq.hu/A-codecs/WMA2/

Windows Media Audio (WMA) is a perceptual audio codec that is usually packaged in ASF files.

There are two versions, v1 (codec ID 0x160) and v2 (codec ID 0x161), which differ only slightly.

Occasionally, WMA is referred to as DivX audio as it is often used in conjunction with Microsoft's family of MPEG-4 codecs, version 3 of which is sometimes known as 'DivX ;-)' video.

Data Format And Decoding Process

This section contains some random notes about what it takes to decode the WMA format.

multi-byte numbers are little endian
data tables include:
- critical frequencies
- exponent bands for 22050, 32000, and 44100 Hz
- gain Huffman table (37 entries)
- codebook of LSP coefficients
- scale Huffman table (121 entries)
- coefficient 0 Huffman table (666 entries)
- coefficient 1 Huffman table (555 entries)
- coefficient 2 Huffman table (1336 entries)
- coefficient 3 Huffman table (1072 entries)
- coefficient 4 Huffman table (476 entries)
- coefficient 5 Huffman table (435 entries)
- levels 0 (60 entries)
- levels 1 (40 entries)
- levels 2 (340 entries)
- levels 3 (180 entries)
- levels 4 (70 entries)
- levels 5 (40 entries)
coding format seems to embody concepts of blocks, frames (one or more blocks), and superframes (one or more frames)

initialization:
- naturally, container format (AVI, ASF, maybe WAV?) carries sample rate, channel, bit rate, and block alignment information
- WAVEFORMATEX header contains extra setup data
- v1: 4 extradata bytes:

 bytes 0-1: flags1
 bytes 2-3: flags2

- v2: 6 extradata bytes:

 bytes 0-3: flags1
 bytes 4-5: flags2

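- flags2 field:

 bit 0 indicates exp VLCs (most likely exponents coded with VLCs instead of the LSP codebook)
 bit 1 indicates that a bit reservoir is to be used
 bit 2 indicates a variable block length (the block size can change from block to block; this is not necessarily VBR audio)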
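
The extradata layout and the three known flags2 bits can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names and the dictionary keys are made up for this example, and the meaning of flags1 is not covered because these notes do not describe it.

```python
import struct


def parse_wma_extradata(version, extradata):
    """Return (flags1, flags2) from the bytes that follow WAVEFORMATEX.

    All multi-byte numbers are little endian, hence the '<' prefix.
    """
    if version == 1:
        if len(extradata) < 4:
            raise ValueError("WMA v1 needs at least 4 extradata bytes")
        # bytes 0-1: flags1, bytes 2-3: flags2
        flags1, flags2 = struct.unpack_from("<HH", extradata)
    elif version == 2:
        if len(extradata) < 6:
            raise ValueError("WMA v2 needs at least 6 extradata bytes")
        # bytes 0-3: flags1, bytes 4-5: flags2
        flags1, flags2 = struct.unpack_from("<IH", extradata)
    else:
        raise ValueError("only WMA v1 and v2 are described here")
    return flags1, flags2


def decode_flags2(flags2):
    """Split flags2 into the three documented bits (key names are hypothetical)."""
    return {
        "use_exp_vlc": bool(flags2 & 0x0001),             # bit 0
        "use_bit_reservoir": bool(flags2 & 0x0002),       # bit 1
        "use_variable_block_len": bool(flags2 & 0x0004),  # bit 2
    }
```

For example, `parse_wma_extradata(2, bytes([0, 0, 0, 0, 5, 0]))` yields flags2 = 5, which sets bits 0 and 2.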

- frame length constraints:

 if sample rate <= 16000,
   frame length bits = 9
 else if (sr <= 22050) || (v1 && sr <= 32000)
   frame length bits = 10
 else
   frame length bits = 11
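
The rule above translates directly into a small function. The sketch below only restates the three cases as written; the function names are chosen for this example, and the frame length is taken as 2 raised to the number of frame length bits.

```python
def frame_length_bits(sample_rate, version):
    """Number of bits in the frame length, per the sample rate rule above."""
    if sample_rate <= 16000:
        return 9
    if sample_rate <= 22050 or (version == 1 and sample_rate <= 32000):
        return 10
    return 11


def frame_length(sample_rate, version):
    """Frame length in samples: 2 ^ (frame length bits)."""
    return 1 << frame_length_bits(sample_rate, version)
```

Note that 32000 Hz is the one rate where the version matters: v1 gets 10 bits (1024 samples) while v2 gets 11 bits (2048 samples).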

- frame length = 2 ^ (frame length bits)
- if var block length ... add logic for determining block sizes ... based on upper 13 bits of flags2 ...
- init rate dependent parameters
  - use noise coding = 1 as a default
  - high frequency = sample rate / 2
- v2 forces normalized frequencies:

 if sr >= 44100, force to 44100...
 other cutoffs are 22050, 16000, 11025, 8000
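
A possible reading of the v2 normalization is sketched below. It assumes every cutoff works like the 44100 case (a rate at or above a cutoff snaps down to that cutoff) and that rates below 8000 are left unchanged; neither assumption is confirmed by these notes. The names are made up for this example.

```python
# Cutoffs listed in the notes, highest first.
V2_RATE_CUTOFFS = (44100, 22050, 16000, 11025, 8000)


def normalized_sample_rate(sample_rate, version):
    """Snap a v2 sample rate down to the nearest listed cutoff."""
    if version != 2:
        return sample_rate  # only v2 forces normalized frequencies
    for cutoff in V2_RATE_CUTOFFS:
        if sample_rate >= cutoff:
            return cutoff
    return sample_rate  # assumption: rates below 8000 pass through unchanged
```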

- bits per sample (bps) = bitrate / (channels * sr)
- byte offset bits = log2(bps * frame length / 8) + 2
- compute high frequency value and choose if noise coding should be activated based on channels and sr
- compute the scale factor band sizes for each MDCT block size
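
The bps and byte offset bits formulas from the list above can be worked through as a short sketch. It assumes that log2 means an integer (floor) logarithm and that the bitrate is given in bits per second; the function names are hypothetical.

```python
def bits_per_sample(bit_rate, channels, sample_rate):
    """bps = bitrate / (channels * sr)."""
    return bit_rate / (channels * sample_rate)


def byte_offset_bits(bit_rate, channels, sample_rate, frame_len):
    """byte offset bits = log2(bps * frame length / 8) + 2."""
    bps = bits_per_sample(bit_rate, channels, sample_rate)
    # Truncating first does not change the floor of log2 for values >= 1.
    bytes_per_frame = int(bps * frame_len / 8)
    if bytes_per_frame < 1:
        raise ValueError("bitrate too low for this frame length")
    # bit_length() - 1 is floor(log2(n)) for a positive integer n.
    return (bytes_per_frame.bit_length() - 1) + 2
```

For a 128000 bit/s stereo stream at 44100 Hz with a frame length of 2048, bps is about 1.45, bps * 2048 / 8 is about 371.5, the floor of log2 is 8, and the result is 10 byte offset bits.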
