Windows Media Audio

From MultimediaWiki
* Company: [[Microsoft]]
* US Patent links: [http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220050015259%22.PGNR.&OS=DN/20050015259&RS=DN/20050015259][http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220050015246%22.PGNR.&OS=DN/20050015246&RS=DN/20050015246]

Revision as of 01:23, 15 October 2007

Windows Media Audio (WMA) is a perceptual audio codec that is usually packaged in ASF files.

* Samples: http://samples.mplayerhq.hu/A-codecs/WMA1/ http://samples.mplayerhq.hu/A-codecs/WMA2/

There are two versions, v1 (ID 0x160) and v2 (ID 0x161), which differ only slightly.

WMA is occasionally referred to as DivX audio because it is often paired with Microsoft's family of MPEG-4 codecs, version 3 of which is sometimes known as 'DivX ;-)' video.

Data Format And Decoding Process

This section collects informal notes on what it takes to decode the WMA format.

  • multi-byte numbers are little endian
  • data tables include:
    • critical frequencies
    • exponent bands for 22050, 32000, and 44100 Hz
    • gain Huffman table (37 entries)
    • codebook of LSP coefficients
    • scale Huffman table (121 entries)
    • coefficient 0 Huffman table (666 entries)
    • coefficient 1 Huffman table (555 entries)
    • coefficient 2 Huffman table (1336 entries)
    • coefficient 3 Huffman table (1072 entries)
    • coefficient 4 Huffman table (476 entries)
    • coefficient 5 Huffman table (435 entries)
    • levels 0 (60 entries)
    • levels 1 (40 entries)
    • levels 2 (340 entries)
    • levels 3 (180 entries)
    • levels 4 (70 entries)
    • levels 5 (40 entries)
  • coding format seems to embody concepts of blocks, frames (one or more blocks), and superframes (one or more frames)
  • initialization:
    • naturally, container format (AVI, ASF, maybe WAV?) carries sample rate, channel, bit rate, and block alignment information
    • WAVEFORMATEX header contains extra setup data
    • v1: 4 extradata bytes:
 bytes 0-1: flags1
 bytes 2-3: flags2
    • v2: 6 extradata bytes:
 bytes 0-3: flags1
 bytes 4-5: flags2
    • flags2 field:
 bit 0 indicates exp VLCs (probably exponents coded with VLCs rather than "exponential VLCs")
 bit 1 indicates that a bit reservoir is to be used
 bit 2 indicates a variable block length (VBR audio?)
    • frame length constraints (sr = sample rate):
 if sr <= 16000
   frame length bits = 9
 else if (sr <= 22050) || (v1 && sr <= 32000)
   frame length bits = 10
 else
   frame length bits = 11
    • frame length = 2 ^ (frame length bits)
    • if var block length ... add logic for determining block sizes ... based on upper 13 bits of flags2 ...
    • init rate dependent parameters
      • use noise coding = 1 as a default
      • high frequency = sample rate / 2
    • v2 forces normalized frequencies:
 if sr >= 44100, force to 44100...
 other cutoffs are 22050, 16000, 11025, 8000
    • bits per sample (bps) = bitrate / (channels * sr)
    • byte offset bits = log2(bps * frame length / 8) + 2
    • compute high frequency value and choose if noise coding should be activated based on channels and sr
    • compute the scale factor band sizes for each MDCT block size
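
The initialization notes above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: all function names are hypothetical, and the integer (floor) interpretation of log2 in the byte offset computation is an assumption, since the notes do not say how it is rounded. Variable block lengths, noise coding, and the scale factor bands are left out because the notes do not give enough detail.

```python
import struct


def parse_extradata(version: int, extradata: bytes) -> tuple[int, int]:
    """Return (flags1, flags2) from the extra setup bytes (little endian)."""
    if version == 1:
        # v1: bytes 0-1 are flags1, bytes 2-3 are flags2
        return struct.unpack_from("<HH", extradata)
    if version == 2:
        # v2: bytes 0-3 are flags1, bytes 4-5 are flags2
        return struct.unpack_from("<IH", extradata)
    raise ValueError("only v1 and v2 are covered by these notes")


def decode_flags2(flags2: int) -> dict[str, bool]:
    """Split out the three low bits of flags2 described in the notes."""
    return {
        "exp_vlc": bool(flags2 & 0x1),
        "bit_reservoir": bool(flags2 & 0x2),
        "variable_block_len": bool(flags2 & 0x4),
    }


def frame_len_bits(version: int, sample_rate: int) -> int:
    """Frame length is 2 ** frame_len_bits samples."""
    if sample_rate <= 16000:
        return 9
    if sample_rate <= 22050 or (version == 1 and sample_rate <= 32000):
        return 10
    return 11


def normalized_rate(version: int, sample_rate: int) -> int:
    """v2 clamps the sample rate down to the nearest listed cutoff."""
    if version == 2:
        for cutoff in (44100, 22050, 16000, 11025, 8000):
            if sample_rate >= cutoff:
                return cutoff
    # v1, or a v2 rate below every cutoff, is left unchanged here
    return sample_rate


def byte_offset_bits(bit_rate: int, channels: int, sample_rate: int,
                     frame_len: int) -> int:
    """log2(bps * frame_len / 8) + 2, with log2 taken as an integer floor."""
    bps = bit_rate / (channels * sample_rate)  # bits per sample per channel
    bytes_per_frame = int(bps * frame_len / 8)
    # Assumption: floor(log2(x)) for x >= 1; values below 1 are treated as 1
    return (max(bytes_per_frame, 1).bit_length() - 1) + 2
```

For a 128 kbit/s stereo v2 stream at 44100 Hz, `frame_len_bits(2, 44100)` gives 11, so the frame length is 2048 samples, and `byte_offset_bits(128000, 2, 44100, 2048)` evaluates to 10 under the floor assumption.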