Windows Media Audio

From MultimediaWiki

Windows Media Audio (WMA) is a perceptual audio codec that is usually packaged in ASF files.

There are two versions, v1 (format tag 0x160) and v2 (format tag 0x161), which differ only slightly.

WMA is occasionally referred to as DivX audio because it is often paired with Microsoft's family of MPEG-4 codecs, version 3 of which is sometimes known as 'DivX ;-)' video.

Data Format And Decoding Process

This section contains some random notes about what it takes to decode the WMA format.

  • multi-byte numbers are little endian
  • data tables include:
    • critical frequencies
    • exponent bands for 22050, 32000, and 44100 Hz
    • gain Huffman table (37 entries)
    • codebook of LSP coefficients
    • scale Huffman table (121 entries)
    • coefficient 0 Huffman table (666 entries)
    • coefficient 1 Huffman table (555 entries)
    • coefficient 2 Huffman table (1336 entries)
    • coefficient 3 Huffman table (1072 entries)
    • coefficient 4 Huffman table (476 entries)
    • coefficient 5 Huffman table (435 entries)
    • levels 0 (60 entries)
    • levels 1 (40 entries)
    • levels 2 (340 entries)
    • levels 3 (180 entries)
    • levels 4 (70 entries)
    • levels 5 (40 entries)
  • coding format seems to embody concepts of blocks, frames (one or more blocks), and superframes (one or more frames)
  • initialization:
    • the container format (AVI, ASF, maybe WAV?) carries the sample rate, channel count, bit rate, and block alignment
    • WAVEFORMATEX header contains extra setup data
    • v1: 4 extradata bytes:
 bytes 0-1: flags1
 bytes 2-3: flags2
    • v2: 6 extradata bytes:
 bytes 0-3: flags1
 bytes 4-5: flags2
    • flags2 field:
 bit 0 indicates exp VLCs (exponents coded as VLCs, presumably via the scale Huffman table)
 bit 1 indicates that a bit reservoir is to be used
 bit 2 indicates a variable block length (VBR audio?)
    • frame length constraints:
 if sample rate (sr) <= 16000,
   frame length bits = 9
 else if (sr <= 22050) || (v1 && sr <= 32000)
   frame length bits = 10
 else
   frame length bits = 11
    • frame length = 2 ^ (frame length bits)
    • if var block length ... add logic for determining block sizes ... based on upper 13 bits of flags2 ...
    • init rate dependent parameters
      • use noise coding = 1 as a default
      • high frequency = sample rate / 2
    • v2 forces normalized frequencies:
 if sr >= 44100, force to 44100...
 other cutoffs are 22050, 16000, 11025, 8000
    • bits per sample (bps) = bit rate / (channels * sr)
    • byte offset bits = log2(bps * frame length / 8) + 2
    • compute high frequency value and choose if noise coding should be activated based on channels and sr
    • compute the scale factor band sizes for each MDCT block size
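
The initialization notes above can be sketched in Python as follows. This is only an illustration of the listed steps, not a reference decoder: all function and field names are hypothetical, the rounding and integer log2 in byte_offset_bits are assumptions about how "log2" is meant, and the variable block length logic is left out because the notes do not spell it out.

```python
import struct
from dataclasses import dataclass


@dataclass
class WmaSetup:
    """Hypothetical container for the extradata fields described above."""
    flags1: int
    flags2: int
    use_exp_vlc: bool             # flags2 bit 0
    use_bit_reservoir: bool       # flags2 bit 1
    use_variable_block_len: bool  # flags2 bit 2


def parse_extradata(version: int, extradata: bytes) -> WmaSetup:
    """Split the little-endian extradata into flags1 and flags2."""
    if version == 1:
        flags1, flags2 = struct.unpack_from("<HH", extradata)  # 2 + 2 bytes
    elif version == 2:
        flags1, flags2 = struct.unpack_from("<IH", extradata)  # 4 + 2 bytes
    else:
        raise ValueError("these notes only cover WMA v1 and v2")
    return WmaSetup(flags1, flags2,
                    bool(flags2 & 1), bool(flags2 & 2), bool(flags2 & 4))


def frame_len_bits(version: int, sample_rate: int) -> int:
    """Frame length constraints; the frame length is 2 ** frame_len_bits."""
    if sample_rate <= 16000:
        return 9
    if sample_rate <= 22050 or (version == 1 and sample_rate <= 32000):
        return 10
    return 11


def normalized_sample_rate(version: int, sample_rate: int) -> int:
    """v2 snaps the rate down to the nearest listed cutoff."""
    if version == 2:
        for cutoff in (44100, 22050, 16000, 11025, 8000):
            if sample_rate >= cutoff:
                return cutoff
    # Assumption: v1 rates and rates below 8000 Hz are left unchanged.
    return sample_rate


def byte_offset_bits(bit_rate: int, channels: int,
                     sample_rate: int, frame_len: int) -> int:
    """log2(bps * frame length / 8) + 2, with bps in bits per sample."""
    bps = bit_rate / (channels * sample_rate)
    # Assumption: round to the nearest integer, then take floor(log2(n)).
    n = max(int(bps * frame_len / 8 + 0.5), 1)
    return (n.bit_length() - 1) + 2
```

As a worked case under these assumptions, a v2 stream at 44100 Hz, 2 channels, and 128000 bit/s gets 11 frame length bits (frame length 2048), about 1.45 bits per sample, and 10 byte offset bits.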