Windows Media Audio

From MultimediaWiki

Windows Media Audio (WMA) is a perceptual audio codec that is usually packaged in ASF files.

There are two versions, v1 (format tag 0x160) and v2 (format tag 0x161), which differ only slightly.

Occasionally, WMA is referred to as DivX audio as it is often used in conjunction with Microsoft's family of MPEG-4 codecs, version 3 of which is sometimes known as 'DivX ;-)' video.

Data Format And Decoding Process

This section collects loose notes on what it takes to decode the WMA format.

  • multi-byte numbers are little endian
  • data tables include:
    • critical frequencies
    • exponent bands for 22050, 32000, and 44100 Hz
    • gain Huffman table (37 entries)
    • codebook of LSP coefficients
    • scale Huffman table (121 entries)
    • coefficient 0 Huffman table (666 entries)
    • coefficient 1 Huffman table (555 entries)
    • coefficient 2 Huffman table (1336 entries)
    • coefficient 3 Huffman table (1072 entries)
    • coefficient 4 Huffman table (476 entries)
    • coefficient 5 Huffman table (435 entries)
    • levels 0 (60 entries)
    • levels 1 (40 entries)
    • levels 2 (340 entries)
    • levels 3 (180 entries)
    • levels 4 (70 entries)
    • levels 5 (40 entries)
  • coding format seems to embody concepts of blocks, frames (one or more blocks), and superframes (one or more frames)
  • initialization:
    • the container format (AVI, ASF, maybe WAV?) carries the sample rate, channel count, bit rate, and block alignment
    • the WAVEFORMATEX header carries extra setup data (extradata)
    • v1: 4 extradata bytes:
 bytes 0-1: flags1
 bytes 2-3: flags2
    • v2: 6 extradata bytes:
 bytes 0-3: flags1
 bytes 4-5: flags2
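    • flags2 field:
 bit 0 indicates exp VLCs (exponents coded with VLCs rather than LSP coefficients)
 bit 1 indicates that a bit reservoir is to be used
 bit 2 indicates a variable block length (block size can change within a frame; this is not the same thing as VBR audio)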
    • frame length constraints:
 if sample rate (sr) <= 16000,
   frame length bits = 9
 else if (sr <= 22050) || (v1 && sr <= 32000)
   frame length bits = 10
 else
   frame length bits = 11
    • frame length = 2 ^ (frame length bits)
    • if variable block length is set ... add logic for determining block sizes ... based on upper 13 bits of flags2 ...
    • init rate dependent parameters
      • use noise coding = 1 as a default
      • high frequency = sample rate / 2
    • v2 forces normalized frequencies:
 if sr >= 44100, force to 44100...
 other cutoffs are 22050, 16000, 11025, 8000
    • bits per sample (bps) = bit rate / (channels * sr)
    • byte offset bits = log2(bps * frame length / 8) + 2
    • compute high frequency value and choose if noise coding should be activated based on channels and sr
    • compute the scale factor band sizes for each MDCT block size
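
The first few initialization steps above can be sketched in Python. This is only an illustration of the notes, not a reference decoder: the names parse_extradata, frame_length_bits, and init_params are hypothetical, and taking the integer floor of the log2 in the byte offset bits formula is an assumption, since the notes do not say how the value is rounded. Block size selection, noise coding, and the scale factor bands are left out.

```python
import struct


def parse_extradata(version, extradata):
    """Return (flags1, flags2) from the little-endian extradata bytes."""
    if version == 1:
        # v1: bytes 0-1 are flags1, bytes 2-3 are flags2
        if len(extradata) < 4:
            raise ValueError("v1 needs at least 4 extradata bytes")
        return struct.unpack_from("<HH", extradata)
    if version == 2:
        # v2: bytes 0-3 are flags1, bytes 4-5 are flags2
        if len(extradata) < 6:
            raise ValueError("v2 needs at least 6 extradata bytes")
        return struct.unpack_from("<IH", extradata)
    raise ValueError("unknown WMA version: %r" % (version,))


def frame_length_bits(version, sample_rate):
    """Apply the frame length constraints from the notes."""
    if sample_rate <= 16000:
        return 9
    if sample_rate <= 22050 or (version == 1 and sample_rate <= 32000):
        return 10
    return 11


def init_params(version, sample_rate, channels, bit_rate, extradata):
    """Collect the values derived during initialization into a dict."""
    flags1, flags2 = parse_extradata(version, extradata)
    bits = frame_length_bits(version, sample_rate)
    frame_len = 1 << bits  # frame length = 2 ^ (frame length bits)

    # bits per sample (bps) = bit rate / (channels * sample rate)
    bps = bit_rate / (channels * sample_rate)

    # byte offset bits = log2(bps * frame length / 8) + 2
    # Assumption: truncate to an integer and take floor(log2), with a
    # lower bound of 1 so the logarithm is always defined.
    frame_bytes = max(int(bps * frame_len / 8), 1)
    byte_offset_bits = (frame_bytes.bit_length() - 1) + 2

    return {
        "flags1": flags1,
        "flags2": flags2,
        "use_exp_vlc": bool(flags2 & 0x1),             # bit 0
        "use_bit_reservoir": bool(flags2 & 0x2),       # bit 1
        "use_variable_block_len": bool(flags2 & 0x4),  # bit 2
        "frame_len_bits": bits,
        "frame_len": frame_len,
        "bps": bps,
        "byte_offset_bits": byte_offset_bits,
    }
```

For example, init_params(2, 44100, 2, 128000, bytes([0, 0, 0, 0, 7, 0])) describes a v2 stereo stream at 44100 Hz and 128000 bits per second with all three flags2 bits set; the frame length comes out as 2048 samples.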