QDesign Music Codec
- FOURCCs: QDMC, QDM2
- Company: QDesign
- Samples: http://samples.mplayerhq.hu/A-codecs/QDMC/ (QDMC)
- Samples: http://samples.mplayerhq.hu/A-codecs/QDM2/ (QDM2)
- Technical Analysis: http://multimedia.cx/mirror/qdmc2.pdf
- Description and encoders: http://www.rarewares.org/rrw/qdmc.php
QDesign Music Codec is a perceptual audio codec that was commonly used in MOV files between 1998 and 2005, after which it was largely supplanted by AAC in the same role. Variants of QDM2 with 32, 16 or 8 subbands exist; the 16- and 8-subband variants decode with noticeable artifacts (metallic sound). The codecs were the predecessors of DTS LBR (also known as DTS Express).
QDMC was the first version of the codec; it was apparently short-lived and quickly superseded by the second version, QDM2. Both QDM2 and QDMC are supported in open source software (via FFmpeg).
Unlike many other audio codecs, these split audio into pure tones, which may span several subframes, and a residue, which may be coded either as pseudo-random noise (QDMC) or as a mixture of pseudo-random noise and spectral coefficients in 32 sub-bands (QDM2, using the same QMF as MPEG Audio Layer I/II).
QDesign Music Extradata
Both versions of the codec have the same extradata format, consisting of QDCA and QDCP atoms.
The former contains seven 32-bit big-endian words used by the decoder to initialise various parameters, while the latter contains several floating-point numbers and can be ignored.
QDCA contents (all entries are 32-bit big-endian words):
- version, always 1 (even for QDM2)
- number of channels
- sampling rate
- bitrate
- audio frame length (always 32 times larger than the subframe length for QDMC and 16 times larger for QDM2)
- subframe length (should be a power of two)
- bytes per frame.
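The field list above can be read with a single unpack call. The following is a minimal sketch, assuming the caller has already located the QDCA atom and passes only its 28-byte payload (the atom header layout is not described here); the names `QDCA`, `parse_qdca` and `subframes_per_frame` are hypothetical, and the example values are made up.

```python
import struct
from typing import NamedTuple


class QDCA(NamedTuple):
    """The seven QDCA fields, in the order listed above."""
    version: int
    channels: int
    sample_rate: int
    bitrate: int
    frame_len: int
    subframe_len: int
    bytes_per_frame: int


def parse_qdca(payload: bytes) -> QDCA:
    """Decode seven 32-bit big-endian words from a QDCA payload."""
    if len(payload) < 28:
        raise ValueError("QDCA payload needs at least 28 bytes")
    return QDCA(*struct.unpack(">7I", payload[:28]))


def subframes_per_frame(q: QDCA) -> int:
    """Frame length over subframe length: 32 for QDMC, 16 for QDM2."""
    return q.frame_len // q.subframe_len


# Made-up example values, for illustration only.
example = struct.pack(">7I", 1, 2, 44100, 128000, 8192, 256, 372)
info = parse_qdca(example)
```

Since the version field is always 1, the frame-to-subframe ratio is one way a reader might tell the two variants apart when the FOURCC is not at hand.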
QDesign Music Codec (v1)
This codec codes audio as a set of tones and noise parameters that are later reconstructed with FFT.
For sampling rates below 16kHz the subframe should be 64 samples and the full frame 2048 samples. For 16-32kHz they are 128 and 4096 samples respectively. For sampling rates of 32kHz and above they are 256 and 8192 samples.
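The size rule above amounts to a three-way lookup. A small sketch, with a hypothetical function name, that treats exactly 32kHz as belonging to the highest range:

```python
from typing import Tuple


def qdmc_frame_sizes(sample_rate: int) -> Tuple[int, int]:
    """Return (subframe length, frame length) in samples for QDMC."""
    if sample_rate < 16000:
        return 64, 2048
    if sample_rate < 32000:
        return 128, 4096
    return 256, 8192
```

In every case the frame is 32 subframes long, which agrees with the QDCA description.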
Bits are read from LSB using 16-bit little-endian words.
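A possible bit reader for this layout is sketched below. Reading LSB-first from 16-bit little-endian words visits bits in the same order as reading LSB-first from consecutive bytes, so the sketch works on bytes directly. The class name is hypothetical, and placing the first bit read into the lowest bit of the result is an assumption.

```python
class LsbBitReader:
    """LSB-first bit reader over a byte string (illustrative sketch)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def get_bits(self, n: int) -> int:
        """Read n bits; the first bit read lands in bit 0 of the result."""
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value
```

For example, with the bytes `0xB4 0x01`, reading 3, 5 and 1 bits in turn yields 4, 22 and 1.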
Subblock format
Noise data
    for (band = 0; band < noise_bands[mode]; band++) {
        v = get_huff(noise_val_tree);
        if (v & 1)
            v = (v + 1) >> 1;
        else
            v = -v / 2;
        noise[band][0] = v - 1;
        lastval = v;
        for (idx = 0; idx < 16;) {
            len = get_huff_long(noise_seg_len_tree);
            v = get_huff(noise_val_tree);
            if (v & 1)
                v = (v + 1) >> 1;
            else
                v = -v / 2;
            newval = lastval + v;
            for (j = 0; j < len; j++)
                noise[band][idx + 1 + j] = lastval + v * j / len;
            lastval = newval;
            idx += len + 1;
        }
    }
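The `if (v & 1) ... else ...` step that appears twice in the noise data loop maps an unsigned code to a signed value: odd codes become positive, even codes become zero or negative. A tiny sketch of just that mapping, with a hypothetical function name:

```python
def noise_delta(v: int) -> int:
    """Map an unsigned code to a signed value: 0, 1, -1, 2, -2, ..."""
    if v & 1:
        return (v + 1) >> 1
    return -(v // 2)
```

Small codes therefore correspond to small magnitudes of either sign, which suits a value that is mostly coded as a difference from the previous one.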
Wave data
    for (group = 0; group < 5; group++) {
        group_size = 1 << (frame_bits - group - 1);
        group_bits = 4 - group;
        freq = 1;
        off = 0;
        pos2 = 0;
        do {
            freq += get_huff_long(freq_diff_tree);
            while (freq >= group_size - 1) {
                freq -= group_size - 2;
                off += 1 << group_bits;
                pos2 += group_size;
            }
            if (pos2 >= frame_size)
                break;
            if (channels > 1)
                stereo_mode = get_bits(2);
            else
                stereo_mode = 0;
            amp = get_huff(amp_tree);
            phase = get_bits(3);
            if (stereo_mode & 2) {
                amp2 = amp - get_huff(amp_diff_tree);
                phase2 = (phase - get_huff(phase_diff_tree)) & 7;
            } else {
                amp2 = 0;
                phase2 = 0;
            }
            add tone <off, freq, stereo_mode & 1, amp, phase>
            if (stereo_mode & 2)
                add tone <off, freq, ~stereo_mode & 1, amp2, phase2>
        } while (freq < group_size);
    }
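To make the per-group arithmetic in the wave data loop concrete, the sketch below tabulates `group_size` and `group_bits` and isolates the wrap-around step applied to `freq`. It assumes that `frame_bits` is the base-2 logarithm of the frame length (13 for 8192 samples), which the pseudocode does not state; the function names are hypothetical.

```python
def group_params(frame_bits: int, group: int):
    """Return (group_size, group_bits) for one of the five tone groups."""
    return 1 << (frame_bits - group - 1), 4 - group


def wrap_freq(freq: int, off: int, pos2: int, group_size: int, group_bits: int):
    """Apply the inner while loop: fold freq back and advance off and pos2."""
    while freq >= group_size - 1:
        freq -= group_size - 2
        off += 1 << group_bits
        pos2 += group_size
    return freq, off, pos2


# Assumed frame_bits = 13, i.e. an 8192-sample frame.
table = [group_params(13, g) for g in range(5)]
```

Under that assumption the group sizes are 4096, 2048, 1024, 512 and 256, so each successive group covers half the span of the previous one.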
Huffman code reading
The trees are static, but a code may be followed by additional data:
    int get_huff(tree) {
        v = read_code(tree);
        if (v)
            return v - 1;
        else {
            v = get_bits(3) + 1;
            return get_bits(v);
        }
    }

    int get_huff_long(tree) {
        v = read_code(tree);
        if (v)
            v--;
        else
            v = get_bits(get_bits(3) + 1);
        return code_prefix[v] + get_bits(v >> 2);
    }

    unsigned code_prefix[] = {
        0x0, 0x1, 0x2, 0x3, 0x4, 0x6, 0x8, 0xA,
        0xC, 0x10, 0x14, 0x18, 0x1C, 0x24, 0x2C, 0x34,
        0x3C, 0x4C, 0x5C, 0x6C, 0x7C, 0x9C, 0xBC, 0xDC,
        0xFC, 0x13C, 0x17C, 0x1BC, 0x1FC, 0x27C, 0x2FC, 0x37C,
        0x3FC, 0x4FC, 0x5FC, 0x6FC, 0x7FC, 0x9FC, 0xBFC, 0xDFC,
        0xFFC, 0x13FC, 0x17FC, 0x1BFC, 0x1FFC, 0x27FC, 0x2FFC, 0x37FC,
        0x3FFC, 0x4FFC, 0x5FFC, 0x6FFC, 0x7FFC, 0x9FFC, 0xBFFC, 0xDFFC,
        0xFFFC, 0x13FFC, 0x17FFC, 0x1BFFC, 0x1FFFC, 0x27FFC, 0x2FFFC, 0x37FFC,
        0x3FFFC
    };
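The `code_prefix` table follows a regular pattern: each entry exceeds the previous one by `1 << (v >> 2)`, which is exactly the range covered by the `v >> 2` extra bits that `get_huff_long` reads. The sketch below regenerates the 65 entries from that observation; the function name is hypothetical and the recurrence is inferred from the listed values rather than taken from any reference.

```python
def build_code_prefix(count: int = 65):
    """Rebuild code_prefix: entry v+1 is entry v plus 1 << (v >> 2)."""
    table = [0]
    for v in range(count - 1):
        table.append(table[-1] + (1 << (v >> 2)))
    return table


code_prefix = build_code_prefix()
```

Because the step between consecutive prefixes matches the number of values the extra bits can express, the decoded values form a contiguous range with no gaps or overlaps.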
Tables
Noise bands selector (depending on bitrate):
4, 3, 2, 1, 0, 0, 0, 0
Number of noise bands:
19, 14, 11, 9, 4
Noise subbands:
0, 1, 2, 4, 6, 8, 12, 16, 24, 32, 48, 56, 64, 80, 96, 120, 144, 176, 208, 240, 256
0, 2, 4, 8, 16, 24, 32, 48, 56, 64, 80, 104, 128, 160, 208, 256
0, 2, 4, 8, 16, 32, 48, 64, 80, 112, 160, 208, 256
0, 4, 8, 16, 32, 48, 64, 96, 144, 208, 256
0, 4, 16, 32, 64, 256
Levels table:
1.1875, 1.6835938, 2.375, 3.3671875, 4.75, 6.734375, 9.5, 13.46875, 19.0, 26.9375, 38.0, 53.875, 76.0, 107.75, 152.0, 215.5, 304.0, 431.0, 608.0, 862.0, 1216.0, 1724.0, 2432.0, 3448.0, 4864.0, 6896.0, 9728.0, 13792.0, 19456.0, 27584.0, 38912.0, 55168.0, 77824.0, 110336.0, 155648.0, 220672.0, 311296.0, 441344.0, 622592.0, 882688.0, 1245184.0, 1765376.0, 2490368.0, 3530752.0, 4980736.0, 7061504.0
Frequencies for the trees
Table 0 — noise value
3233, 1195, 1897, 877, 1240, 368, 364, 222, 103, 125, 18, 68, 10, 25, 7, 13, 0, 18, 0, 20, 0, 31, 0, 28, 0, 31, 0, 19, 0, 23, 0, 10, 0, 9, 0, 1,
Table 1 — noise segment length
7647, 1011, 380, 215, 180, 65, 33, 12, 4, 0, 0, 0, 16, 0, 0, 0, 84
Table 2 — amplitude
2436, 1411, 692, 389, 316, 310, 368, 457, 651, 1359, 2563, 4732, 8946, 17150, 29621, 44245, 50156, 45928, 33262, 20474, 9855, 3813, 1378, 514, 154, 82, 3
Table 3 — frequency difference
57884, 27424, 14988, 11027, 17889, 14609, 11790, 9479, 15948, 11581, 7815, 6917, 10486, 6603, 4897, 3983, 5120, 3479, 2949, 2626, 3443, 2984, 3725, 3593, 3307, 3283, 2954, 2384, 1777, 2042, 1641, 798, 769, 863, 776, 239, 162, 104, 63, 43, 49, 30, 6, 1, 0, 4, 1
Table 4 — amplitude difference tree
8392, 14998, 5103, 1797, 648, 237, 42, 7
Table 5 — phase difference tree
8860, 9620, 2138, 897, 618, 834, 1920, 6337
Converting frequencies into codes is left as an exercise to the reader.
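As a starting point for that exercise, the sketch below derives Huffman code lengths from a frequency table using the textbook merge-two-smallest construction. It is an illustration only: skipping zero-frequency symbols, the tie-breaking rule and the final bit assignment are all assumptions, so the result may not match the codes the codec actually uses. The function name is hypothetical.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol index: code length} for symbols with non-zero frequency."""
    # Heap items: (frequency, smallest symbol in subtree, symbols in subtree).
    heap = [(f, i, [i]) for i, f in enumerate(freqs) if f > 0]
    heapq.heapify(heap)
    lengths = {i: 0 for _, i, _ in heap}
    if len(heap) == 1:
        lengths[heap[0][1]] = 1
    while len(heap) > 1:
        f1, t1, s1 = heapq.heappop(heap)
        f2, t2, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # every symbol below the new node gets one bit deeper
        heapq.heappush(heap, (f1 + f2, min(t1, t2), s1 + s2))
    return lengths


# Table 4 (amplitude difference) from above.
amp_diff_lengths = huffman_code_lengths([8392, 14998, 5103, 1797, 648, 237, 42, 7])
```

For Table 4 there are no ties during construction, so the lengths are unambiguous: the most frequent entry gets a 1-bit code and the two rarest get 7-bit codes.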
QDesign Music Codec v2
This codec organises data into several sub-packets. All packets and sub-packets have the same header: an ID byte followed by a size. If the top bit of the ID byte is set, the size is stored as a 16-bit little-endian word; otherwise it is a single byte.
There are several possible packet types:
- 2 - intra packet with checksum
- 3 - intra packet without checksum
- 4 and 5 - inter packet with checksum
- 6 and 7 - inter packet without checksum
The checksum is two bytes stored right after the packet header (let's call them A and B). It is chosen so that A*257 + B*2 - sum of all frame bytes = 0.
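The checksum condition can be checked directly. In the sketch below the function names and the `csum_pos` parameter (the offset of byte A within the frame) are hypothetical, and comparing the result modulo 65536 is an assumption, since the text does not say how overflow is handled. Because A and B are themselves part of the frame, the condition reduces to A*256 + B being equal to the sum of all the other bytes.

```python
def checksum_ok(frame: bytes, csum_pos: int) -> bool:
    """Check A*257 + B*2 - sum(frame) == 0, assuming 16-bit wrap-around."""
    a, b = frame[csum_pos], frame[csum_pos + 1]
    return ((a * 257 + b * 2 - sum(frame)) & 0xFFFF) == 0


def fill_checksum(frame: bytearray, csum_pos: int) -> None:
    """Store A and B so that checksum_ok() holds for this frame."""
    frame[csum_pos] = frame[csum_pos + 1] = 0
    total = sum(frame) & 0xFFFF
    frame[csum_pos] = total >> 8
    frame[csum_pos + 1] = total & 0xFF


# Arbitrary example bytes: a 3-byte header, two checksum bytes, then payload.
frame = bytearray([0x82, 0x05, 0x00, 0, 0, 10, 20, 30, 200, 250])
fill_checksum(frame, 3)
```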
Sub-packets follow the packet header and the optional checksum; a sub-packet with ID = 0 and size = 0 signals the end of data.
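Walking the sub-packet list might look like the sketch below. The function names are hypothetical. Masking off the top bit of the ID after using it as the size flag is an assumption, and the 16-bit size is read little-endian as described above; the `byteorder` parameter is only there so that the other order is easy to try.

```python
def read_header(data: bytes, pos: int, byteorder: str = "little"):
    """Read one header; return (id without top bit, size, position after header)."""
    ident = data[pos]
    size_len = 2 if ident & 0x80 else 1
    if pos + 1 + size_len > len(data):
        raise ValueError("truncated header")
    size = int.from_bytes(data[pos + 1:pos + 1 + size_len], byteorder)
    return ident & 0x7F, size, pos + 1 + size_len


def split_subpackets(data: bytes, pos: int = 0):
    """Collect (id, payload) pairs until the ID = 0, size = 0 terminator."""
    out = []
    while pos + 2 <= len(data):
        ident, size, pos = read_header(data, pos)
        if ident == 0 and size == 0:
            break
        out.append((ident, data[pos:pos + size]))
        pos += size
    return out


# Made-up example: type 9 with 2 bytes, type 10 with a 16-bit size of 3, terminator.
demo = bytes([9, 2, 0xAA, 0xBB, 0x8A, 3, 0, 1, 2, 3, 0, 0])
subpackets = split_subpackets(demo)
```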
Subpacket type 9
This subpacket contains coarse quantisers used in the QMF part for bands 1-N (where N is 3-10 depending on frame samples and bitrate). Each band has 8 quantisers coded in the same way as noise scales in QDMC.
Subpacket type 10
This subpacket contains coarse quantisers used in the QMF part for band 0 and refining quantisers for the other bands.
Refining quantisers are stored in the following order:
    for each coded QMF subband + 1 {
        for each channel {
            for i in 0..8 {
                if get_bit() {
                    read eight Huffman-coded grid 1 quants
                } else {
                    set eight grid 1 quants to zero
                }
            }
        }
    }
    for each coded QMF subband starting from 4 {
        for each channel {
            read Huffman-coded grid 3 average quant
        }
    }
    for coded QMF subband starting from 4 {
        for each channel {
            for i in 0..8 {
                read Huffman-coded grid 3 quant
            }
        }
    }
Subpacket type 11
This subpacket contains a 13-bit value related to bit allocation, and coefficient data for the low eight QMF sub-bands. Sub-band data is coded in the following way depending on bit allocation:
    no_noise = get_bit();
    idx = 0;
    while idx < 128 {
        switch quant_weight[ch][sub_band][idx / 2] {
        case 8: // 0.8 bits per sample
            if !no_noise {
                val = get_bits(8);
                unpack val as five values packed modulo 3
                    (i.e. samp[0] * 81 + samp[1] * 27 +...)
                use them as indices in QUANT_1BIT[is_jstereo][]
            } else {
                for each sample {
                    if get_bit() {
                        sample = QUANT_1BIT[is_jstereo][get_bit() * 2];
                    } else {
                        sample = 0;
                    }
                }
            }
            interleave five decoded samples with pseudo-random noise
            idx += 10;
            break;
        case 10: // 1 bit per sample
            get scale sign, modify scale depending on band, output sample
            idx++;
            break;
        case 16: // 1.6 bits per sample
            the same as case 8 but no interleaving with noise
            idx += 5;
            break;
        case 24: // 2.4 bits per sample
            read 7-bit value, unpack it into three modulo five indices
            output three samples
            idx += 3;
            break;
        case 30: // 3 bits per sample
            read Huffman-coded sample
            idx++;
            break;
        case 34: // 3.4 bits per sample
            if first sample of such kind in band {
                scale = 1.0 / (1 << get_bits(2));
                read 5 bits for first code value, use sample value as the predictor
            } else {
                read Huffman-coded difference, scale it and add to the predictor
                output new sample
            }
            idx++;
            break;
        }
    }
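The "packed modulo 3" and "modulo five" steps are plain base conversions: five base-3 digits fit in 8 bits (3^5 = 243) and three base-5 digits fit in 7 bits (5^3 = 125). A sketch of the unpacking, with a hypothetical function name, is given below. The most-significant-first order follows the `samp[0] * 81 + samp[1] * 27 + ...` hint for case 8; using the same order for case 24, and rejecting values outside the packed range, are assumptions.

```python
def unpack_digits(val: int, base: int, count: int):
    """Split val into `count` digits of the given base, most significant first."""
    if not 0 <= val < base ** count:
        raise ValueError("value outside the packed range")
    digits = []
    for _ in range(count):
        digits.append(val % base)
        val //= base
    return digits[::-1]


# Case 8 / case 16: five indices from an 8-bit value.
five = unpack_digits(1 * 81 + 2 * 27 + 0 * 9 + 1 * 3 + 2, 3, 5)
# Case 24: three indices from a 7-bit value.
three = unpack_digits(4 * 25 + 0 * 5 + 3, 5, 3)
```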
Subpacket type 12
This subpacket contains coefficient data for all but the first eight QMF sub-bands.
Subpacket types 13 and 14
These subpackets contain six scales used to modify tonal amplitudes depending on frequency. A type 13 subpacket stores them as 6-bit values; a type 14 subpacket has them Huffman-coded.
Subpacket type 15
This subpacket seems to contain information for four super-tones spanning four subframes each. The coding method for them is different from that of other tones.
Subpacket types 17-23
These subpackets contain tones for single groups. Tones are coded almost the same way as wave data in QDMC, but with special cases for frequency differences of 0 or 1 (in which case the position should be advanced by one or eight group sizes instead).
Subpacket type 31
This subpacket contains data for all five tonal groups.
Subpacket types 33-39
The same as types 17-23 but using a different codebook for tone amplitudes.
Subpacket type 46
This subpacket contains six tone envelope values coded with six bits each (like type 13 subpacket) plus tones for all five subgroups using an alternative level codebook.
Reconstruction
Reconstruction is performed as follows:
    for each subframe {
        if subpacket 9 is present {
            if subpackets 10-12 are not present {
                fill subbands with noise using scales from subpacket 9
            }
            perform QMF on 8 sets of 32 sub-band samples for current subframe
            if subframe length is less than 128 samples
                take and add every second or fourth sample
            otherwise
                add all samples to the output
        }
        if subframe number is less than two
            then use tones from previous frame (coded for subframes 14 and 15 respectively)
        otherwise
            use tones for current subframe N-2
        add tones still active from the previous subframes
        add delay from the previous subframe to the audio output
        perform inverse RDFT and add half of its output to the audio output
            (src[0].re, src[0].im, src[1].re, ...)
        save second part of iRDFT output as the delay for the next subframe
    }
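The last three steps of the loop describe an overlap-add: half of each inverse transform output is emitted together with the half saved from the previous subframe. The sketch below illustrates only that part, under the assumption that each transform output is already available as a list of real samples twice the subframe length; the QMF contribution and the transform itself are left out, and the function name is hypothetical.

```python
def overlap_add(blocks):
    """Emit first halves plus the saved second half of the previous block."""
    n = len(blocks[0]) // 2
    delay = [0.0] * n  # nothing saved before the first subframe
    output = []
    for block in blocks:
        output.extend(block[i] + delay[i] for i in range(n))
        delay = list(block[n:])  # second half becomes the next subframe's delay
    return output


# Two toy "transform outputs" of length 4 (subframe length 2).
samples = overlap_add([[1, 2, 3, 4], [10, 20, 30, 40]])
```

In the toy example the second subframe's output is the first half of the second block added to the second half of the first, giving 13 and 24.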