QDesign Music Codec

From MultimediaWiki
Revision as of 03:01, 8 July 2020 by Kostya (talk | contribs) (document QDM2)

QDesign Music Codec is a perceptual audio codec that was commonly used in MOV files between 1998 and 2005, after which it was more or less supplanted by AAC in the same role. Variants of QDM2 with 32, 16 or 8 subbands exist; the 16- and 8-subband variants decode with noticeable artifacts (a metallic sound). The codecs were the predecessors of DTS LBR (also known as DTS Express).

QDMC was the first version of the codec; it was apparently short-lived and quickly superseded by the second version, QDM2. Both QDM2 and QDMC are supported in open source software (via FFmpeg).

Unlike many other audio codecs, these split audio into pure tones, which may span several subframes, and a residue, which may be coded either as pseudo-random noise (QDMC) or as a mixture of pseudo-random noise and spectral coefficients in 32 sub-bands (QDM2, using the same QMF as MPEG Audio Layer I/II).

QDesign Music Extradata

Both versions of the codec use the same extradata format, consisting of QDCA and QDCP atoms. The former contains seven 32-bit big-endian words used by the decoder to initialise various parameters, while the latter contains several floating-point numbers and can be ignored.

QDCA contents (all entries are 32-bit big-endian words):

  • version, always 1 (even for QDM2)
  • number of channels
  • sampling rate
  • bitrate
  • audio frame length (it is always 32 times larger than the subframe length for QDMC and 16 times larger for QDM2)
  • subframe length (should be a power of two)
  • bytes per frame.
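
As a rough illustration of the layout above, the following C sketch reads the seven words from a buffer that is assumed to hold only the QDCA payload (atom size and tag already stripped). The struct, field and function names are invented for this example, and the consistency checks are simply derived from the list above rather than taken from any known decoder.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical container for the seven QDCA words (names are illustrative). */
typedef struct {
    uint32_t version, channels, sample_rate, bitrate;
    uint32_t frame_len, subframe_len, bytes_per_frame;
} QDCAInfo;

/* Reads one 32-bit big-endian word. */
static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* ratio is the expected frame/subframe ratio: 32 for QDMC, 16 for QDM2.
 * Returns 0 on success, -1 if the payload is too short or inconsistent. */
static int parse_qdca(const uint8_t *buf, size_t size, uint32_t ratio,
                      QDCAInfo *out)
{
    if (size < 7 * 4)
        return -1;
    out->version         = rd_be32(buf +  0);
    out->channels        = rd_be32(buf +  4);
    out->sample_rate     = rd_be32(buf +  8);
    out->bitrate         = rd_be32(buf + 12);
    out->frame_len       = rd_be32(buf + 16);
    out->subframe_len    = rd_be32(buf + 20);
    out->bytes_per_frame = rd_be32(buf + 24);

    /* subframe length should be a non-zero power of two */
    if (out->subframe_len == 0 ||
        (out->subframe_len & (out->subframe_len - 1)) != 0)
        return -1;
    /* frame length should be ratio times the subframe length */
    if (out->frame_len != ratio * out->subframe_len)
        return -1;
    return 0;
}
```

The version word is stored but not checked here; a stricter parser might reject anything other than 1.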

QDesign Music Codec (v1)

This codec codes audio as a set of tones and noise parameters that are later reconstructed with FFT.

For sampling rates below 16 kHz the subframe should be 64 samples and the full frame 2048 samples. From 16 kHz up to (but not including) 32 kHz they are 128 and 4096 samples respectively. For 32 kHz and above they are 256 and 8192 samples.
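
The size selection can be sketched in C as below. The function name is made up for this example, and the thresholds assume that 16 kHz and 32 kHz mean exactly 16000 Hz and 32000 Hz.

```c
/* Picks QDMC subframe and frame sizes (in samples) from the sampling rate
 * in Hz. Hypothetical helper; thresholds assume 1 kHz = 1000 Hz. */
static void qdmc_frame_sizes(unsigned sample_rate,
                             unsigned *subframe, unsigned *frame)
{
    if (sample_rate < 16000)
        *subframe = 64;
    else if (sample_rate < 32000)
        *subframe = 128;
    else
        *subframe = 256;
    /* a QDMC frame is always 32 subframes long */
    *frame = *subframe * 32;
}
```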

Bits are read from LSB using 16-bit little-endian words.
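
A minimal sketch of such a bit reader in C is given below. The type and function names are hypothetical. It also assumes that the first bit read becomes the least significant bit of the returned value and that reads past the end of the buffer yield zero bits; neither detail is stated above.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical LSB-first bit reader over 16-bit little-endian words. */
typedef struct {
    const uint8_t *buf;
    size_t size; /* buffer size in bytes */
    size_t pos;  /* number of bits consumed so far */
} BitReader;

/* Reads n bits (n <= 24), one at a time for clarity rather than speed. */
static unsigned br_get_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t   word = br->pos >> 4;  /* index of the 16-bit word */
        unsigned bit  = br->pos & 15;  /* bit position inside that word */
        unsigned w    = 0;
        /* assemble the little-endian word, treating missing bytes as zero */
        if (2 * word < br->size)
            w = br->buf[2 * word];
        if (2 * word + 1 < br->size)
            w |= (unsigned)br->buf[2 * word + 1] << 8;
        v |= ((w >> bit) & 1u) << i;   /* assumption: first bit read is the LSB */
        br->pos++;
    }
    return v;
}
```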

Subblock format

Noise data
 for (band = 0; band < noise_bands[mode]; band++) {
   v = get_huff(noise_val_tree);
   if (v & 1)
     v = (v + 1) >> 1;
   else
     v = -v / 2;
   noise[band][0] = v - 1;
   lastval = v;
   for (idx = 0; idx < 16;) {
     len = get_huff_long(noise_seg_len_tree);
     v = get_huff(noise_val_tree);
     if (v & 1)
       v = (v + 1) >> 1;
     else
       v = -v / 2;
     newval = lastval + v;
     for (j = 0; j < len; j++)
       noise[band][idx + 1 + j] = lastval + v * j / len;
     lastval = newval;
     idx += len + 1;
   }
 }
Wave data
 for (group = 0; group < 5; group++) {
   group_size = 1 << (frame_bits - group - 1);
   group_bits = 4 - group;
   freq = 1;
   off  = 0;
   pos2 = 0;
   do {
     freq += get_huff_long(freq_diff_tree);
     while (freq >= group_size - 1) {
       freq -= group_size - 2;
       off  += 1 << group_bits;
       pos2 += group_size;
     }
     if (pos2 >= frame_size)
       break;
     if (channels > 1)
       stereo_mode = get_bits(2);
     else
       stereo_mode = 0;
     amp   = get_huff(amp_tree);
     phase = get_bits(3);
     if (stereo_mode & 2) {
       amp2   =  amp   - get_huff(amp_diff_tree);
       phase2 = (phase - get_huff(phase_diff_tree)) & 7;
     } else {
       amp2   = 0;
       phase2 = 0;
     }
     add tone <off, freq, stereo_mode & 1, amp, phase>
     if (stereo_mode & 2)
       add tone <off, freq, ~stereo_mode & 1, amp2, phase2>
   } while (freq < group_size);
 }
Huffman code reading

The trees are static, but a code may be followed by additional data:

 int get_huff(tree) {
   v = read_code(tree);
   if (v)
     return v - 1;
   else {
     v = get_bits(3) + 1;
     return get_bits(v);
   }
 }
 int get_huff_long(tree) {
   v = read_code(tree);
   if (v)
     v--;
   else
     v = get_bits(get_bits(3) + 1);
   return code_prefix[v] + get_bits(v >> 2);
 }
 unsigned code_prefix[] = {
   0x0, 0x1, 0x2, 0x3, 0x4, 0x6, 0x8, 0xA,
   0xC, 0x10, 0x14, 0x18, 0x1C, 0x24, 0x2C, 0x34,
   0x3C, 0x4C, 0x5C, 0x6C, 0x7C, 0x9C, 0xBC, 0xDC,
   0xFC, 0x13C, 0x17C, 0x1BC, 0x1FC, 0x27C, 0x2FC, 0x37C,
   0x3FC, 0x4FC, 0x5FC, 0x6FC, 0x7FC, 0x9FC, 0xBFC, 0xDFC,
   0xFFC, 0x13FC, 0x17FC, 0x1BFC, 0x1FFC, 0x27FC, 0x2FFC, 0x37FC,
   0x3FFC, 0x4FFC, 0x5FFC, 0x6FFC, 0x7FFC, 0x9FFC, 0xBFFC, 0xDFFC,
   0xFFFC, 0x13FFC, 0x17FFC, 0x1BFFC, 0x1FFFC, 0x27FFC, 0x2FFFC, 0x37FFC,
   0x3FFFC
 };
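
The code_prefix table appears to follow a regular pattern: entry v is followed by v >> 2 extra bits in get_huff_long, so consecutive prefixes differ by 1 << (v >> 2) and the decoded values form a contiguous range. The C sketch below regenerates the table from that observation; the function name is made up for this example.

```c
#include <stdint.h>

/* Rebuilds the 65-entry prefix table from the recurrence
 * out[v + 1] = out[v] + (1 << (v >> 2)), starting at out[0] = 0. */
static void build_code_prefix(uint32_t out[65])
{
    out[0] = 0;
    for (int v = 0; v < 64; v++)
        out[v + 1] = out[v] + ((uint32_t)1 << (v >> 2));
}
```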

Tables

Noise bands selector (depending on bitrate):

 4, 3, 2, 1, 0, 0, 0, 0

Number of noise bands:

 19, 14, 11, 9, 4

Noise subbands:

 0, 1, 2, 4, 6, 8, 12, 16, 24, 32, 48, 56, 64, 80, 96, 120, 144, 176, 208, 240, 256
 0, 2, 4, 8, 16, 24, 32, 48, 56, 64, 80, 104, 128, 160, 208, 256
 0, 2, 4, 8, 16, 32, 48, 64, 80, 112, 160, 208, 256
 0, 4, 8, 16, 32, 48, 64, 96, 144, 208, 256
 0, 4, 16, 32, 64, 256

Levels table:

 1.1875, 1.6835938, 2.375, 3.3671875, 4.75, 6.734375, 9.5, 13.46875,
 19.0, 26.9375, 38.0, 53.875, 76.0, 107.75, 152.0, 215.5,
 304.0, 431.0, 608.0, 862.0, 1216.0, 1724.0, 2432.0, 3448.0,
 4864.0, 6896.0, 9728.0, 13792.0, 19456.0, 27584.0, 38912.0, 55168.0,
 77824.0, 110336.0, 155648.0, 220672.0, 311296.0, 441344.0, 622592.0, 882688.0,
 1245184.0, 1765376.0, 2490368.0, 3530752.0, 4980736.0, 7061504.0

Frequencies for the trees

Table 0 — noise value

 3233, 1195, 1897, 877, 1240, 368, 364, 222, 103, 125, 18, 68, 10, 25, 7, 13,
 0, 18, 0, 20, 0, 31, 0, 28, 0, 31, 0, 19, 0, 23, 0, 10, 0, 9, 0, 1,

Table 1 — noise segment length

 7647, 1011, 380, 215, 180, 65, 33, 12, 4, 0, 0, 0, 16, 0, 0, 0, 84

Table 2 — amplitude

 2436, 1411, 692, 389, 316, 310, 368, 457, 651, 1359, 2563, 4732, 8946, 17150, 29621, 44245,
 50156, 45928, 33262, 20474, 9855, 3813, 1378, 514, 154, 82, 3

Table 3 — frequency difference

 57884, 27424, 14988, 11027, 17889, 14609, 11790, 9479,
 15948, 11581, 7815, 6917, 10486, 6603, 4897, 3983,
 5120, 3479, 2949, 2626, 3443, 2984, 3725, 3593,
 3307, 3283, 2954, 2384, 1777, 2042, 1641, 798,
 769, 863, 776, 239, 162, 104, 63, 43,
 49, 30, 6, 1, 0, 4, 1

Table 4 — amplitude difference tree

 8392, 14998, 5103, 1797, 648, 237, 42, 7

Table 5 — phase difference tree

 8860, 9620, 2138, 897, 618, 834, 1920, 6337

Converting frequencies into codes is left as an exercise to the reader.

QDesign Music Codec v2

This codec organises data into several sub-packets. All packets and sub-packets have the same header: an ID byte followed by a size. If the top bit of the ID byte is set, the size is stored as a 16-bit little-endian word; otherwise it is a single byte.

There are several possible packet types:

  • 2 - intra packet with checksum being present
  • 3 - intra packet without checksum
  • 4 and 5 - inter packet with checksum
  • 6 and 7 - inter packet without checksum

The checksum is two bytes stored right after the packet header (let's call them A and B). It is chosen in such a way that A*257 + B*2 - sum of all frame bytes = 0.
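
Since the sum of all frame bytes includes A and B themselves, the condition is equivalent to A*256 + B being equal to the sum of all the other bytes. The following C sketch checks it; the function name and the csum_off parameter are invented for this example, and the 16-bit wraparound is an assumption not stated above.

```c
#include <stdint.h>
#include <stddef.h>

/* frame/size: the whole packet, including header and checksum bytes.
 * csum_off:   offset of checksum byte A (B is the next byte).
 * Returns 1 if the checksum matches, 0 otherwise. */
static int qdm2_checksum_ok(const uint8_t *frame, size_t size, size_t csum_off)
{
    uint32_t v, sum = 0;

    if (csum_off + 2 > size)
        return 0;
    v = 257u * frame[csum_off] + 2u * frame[csum_off + 1];
    for (size_t i = 0; i < size; i++)
        sum += frame[i];
    /* assumption: the comparison is done modulo 2^16 */
    return ((v - sum) & 0xFFFF) == 0;
}
```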

Sub-packets follow the packet header and the optional checksum; a sub-packet with ID = 0 and size = 0 signals the end of data.

Subpacket type 9

This subpacket contains the coarse quantisers used in the QMF part for bands 1-N (where N is 3-10 depending on frame samples and bitrate). Each band has 8 quantisers coded in the same way as noise scales in QDMC.

Subpacket type 10

This subpacket contains the coarse quantisers used in the QMF part for band 0 and the refining quantisers for the other bands.

Refining quantisers are stored in the following order:

 for each coded QMF subband + 1 {
   for each channel {
     for i in 0..8 {
       if get_bit() {
         read eight Huffman-coded grid 1 quants
       } else {
         set eight grid 1 quants to zero
       }
     }
   }
 }
 for each coded QMF subband starting from 4 {
   for each channel {
     read Huffman-coded grid 3 average quant
   }
 }
 for coded QMF subband starting from 4 {
   for each channel {
     for i in 0..8 {
       read Huffman-coded grid 3 quant
     }
   }
 }

Subpacket type 11

This subpacket contains a 13-bit value related to bit allocation, followed by coefficient data for the lowest eight QMF sub-bands. Sub-band data is coded in the following way depending on bit allocation:

 no_noise = get_bit();
 idx = 0;
 while idx < 128 {
   switch quant_weight[ch][sub_band][idx / 2] {
     case 8: // 0.8 bits per sample
       if !no_noise {
         val = get_bits(8);
         unpack val as five values packed modulo 3 (i.e. samp[0] * 81 + samp[1] * 27 +...)
         use them as indices in QUANT_1BIT[is_jstereo][]
       } else {
         for each sample {
           if get_bit() {
             sample = QUANT_1BIT[is_jstereo][get_bit() * 2];
           } else {
             sample = 0;
           }
         } 
       }
       interleave five decoded samples with pseudo-random noise
       idx += 10;
       break;
     case 10: // 1 bit per sample
       get scale sign, modify scale depending on band, output sample
       idx++;
       break;
     case 16: // 1.6 bits per sample
       the same as case 8 but no interleaving with noise
       idx += 5;
       break;
     case 24: // 2.4 bits per sample
       read 7-bit value, unpack it into three modulo five indices
       output three samples
       idx += 3;
       break;
     case 30: // 3 bits per sample
       read Huffman-coded sample
       idx++;
       break;
     case 34: // 3.4 bits per sample
       if first sample of such kind in band {
         scale = 1.0 / (1 << get_bits(2));
         read 5 bits for first code value, use sample value as the predictor
       } else {
         read Huffman-coded difference, scale it and add to the predictor
         output new sample
       }
       idx++;
       break;
   }
 }
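
The packed values in cases 8, 16 and 24 are plain base-3 and base-5 numbers: five base-3 digits fit in 8 bits (3^5 = 243) and three base-5 digits fit in 7 bits (5^3 = 125). A small C sketch of the unpacking follows. The helper name is hypothetical, and while the most-significant-digit-first order is given above for the base-3 case, applying the same order to the base-5 case is an assumption.

```c
/* Splits val into count digits of the given base, most significant digit
 * first, so that for base 3 and count 5:
 *   val = out[0] * 81 + out[1] * 27 + out[2] * 9 + out[3] * 3 + out[4].
 * Out-of-range input (e.g. val > 242 for base 3) is not validated. */
static void unpack_digits(unsigned val, unsigned base, unsigned count,
                          unsigned *out)
{
    for (unsigned i = count; i-- > 0; ) {
        out[i] = val % base;
        val /= base;
    }
}
```

Each resulting digit would then serve as an index into the corresponding quantiser table, such as QUANT_1BIT[is_jstereo][] for the base-3 case.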

Subpacket type 12

This subpacket contains coefficient data for all but the first eight QMF sub-bands.

Subpacket types 13 and 14

These subpackets contain six scales used to modify tonal amplitudes depending on frequency. A type 13 subpacket stores them as 6-bit values; a type 14 subpacket has them Huffman-coded.

Subpacket type 15

This subpacket seems to contain information for four super-tones spanning four subframes each. The coding method for them is different from that of other tones.

Subpacket types 17-23

Each of these subpackets contains tones for a single group. Tones are coded almost the same way as wave data in QDMC, but with special cases for frequency differences of 0 or 1 (the position should then be advanced by one or eight group sizes instead).

Subpacket type 31

This subpacket contains data for all five tonal groups.

Subpacket types 33-39

The same as types 17-23 but using a different codebook for tone amplitudes.

Subpacket type 46

This subpacket contains six tone envelope values coded with six bits each (like type 13 subpacket) plus tones for all five subgroups using an alternative level codebook.

Reconstruction

Reconstruction is performed as follows:

 for each subframe {
   if subpacket 9 is present {
     if subpackets 10-12 are not present {
       fill subbands with noise using scales from subpacket 9
     }
     perform QMF on 8 sets of 32 sub-band samples for current subframe
     if subframe length is less than 128 samples take and add every second or fourth sample
     otherwise add all samples to the output
   }
   if subframe number is less than two then use tones from previous frame (coded for subframes 14 and 15 correspondingly)
   otherwise use tones for current subframe N-2
   add tones still active from the previous subframes
   add delay from the previous subframe to the audio output
   perform inverse RDFT and add half of its output to the audio output (src[0].re, src[0].im, src[1].re, ...)
   save second part of iRDFT output as the delay for the next subframe
 }