WavPack
- Extension: wv
- Website: http://www.wavpack.com/
- Samples: http://samples.mplayerhq.hu/A-codecs/lossless/ (luckynight.wv)
- Theory/Whitepaper: http://www.wavpack.com/WavPack.pdf
- FOURCC (unofficial): WVPK
WavPack is an open source lossless audio coding algorithm with floating point data support and optional lossy audio compression.
WavPack v.4
File Format
General details of the WavPack format can be found in the file 'format.txt' in the WavPack source archive. A WavPack file consists of blocks, each beginning with 'wvpk'. Every block carries all the information about the sound data (sampling rate, channels, bits per sample, etc.) along with so-called metadata. Metadata may contain the coefficients used for restoring samples, the correction bitstream and the actual compressed samples.
Block structure
Each block contains compressed data
Block header (all data is stored in little-endian words)
4 bytes - 'wvpk'
32 bits - total block size (not counting this field or 'wvpk')
16 bits - version (current valid versions are 0x402 - 0x410)
8 bits - track number (not currently implemented)
8 bits - track sub index (not currently implemented)
32 bits - total samples in file (may be 0xFFFFFFFF if unknown)
32 bits - offset in samples for current block (i.e. how many samples should be decoded by now)
32 bits - samples in this block (may be 0 if no audio present)
32 bits - flags
32 bits - CRC
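The field list above adds up to a 32-byte header. A minimal sketch of reading it from a byte buffer might look like the following; the struct, the field names and the function `wv_parse_header` are hypothetical names chosen for illustration and are not taken from the WavPack sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WV_HEADER_SIZE 32 /* 4 + 4 + 2 + 1 + 1 + 5 * 4 bytes */

/* Hypothetical container for the header fields listed above. */
typedef struct {
    uint32_t block_size;    /* size of the block after this field */
    uint16_t version;       /* 0x402 - 0x410 */
    uint8_t  track_no;      /* not currently implemented */
    uint8_t  index_no;      /* not currently implemented */
    uint32_t total_samples; /* 0xFFFFFFFF if unknown */
    uint32_t block_index;   /* sample offset of this block */
    uint32_t block_samples; /* 0 if no audio present */
    uint32_t flags;
    uint32_t crc;
} WvBlockHeader;

/* Little-endian readers, independent of host byte order. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short, the 'wvpk'
 * marker is missing, or the version is outside 0x402 - 0x410. */
static int wv_parse_header(const uint8_t *buf, size_t len, WvBlockHeader *h)
{
    assert(buf != NULL && h != NULL);
    if (len < WV_HEADER_SIZE || memcmp(buf, "wvpk", 4) != 0)
        return -1;
    h->block_size    = rd_le32(buf + 4);
    h->version       = rd_le16(buf + 8);
    h->track_no      = buf[10];
    h->index_no      = buf[11];
    h->total_samples = rd_le32(buf + 12);
    h->block_index   = rd_le32(buf + 16);
    h->block_samples = rd_le32(buf + 20);
    h->flags         = rd_le32(buf + 24);
    h->crc           = rd_le32(buf + 28);
    if (h->version < 0x402 || h->version > 0x410)
        return -1;
    return 0;
}
```

Because the block size does not count 'wvpk' or the size field itself, the next block would start 8 + block_size bytes after the start of the current one.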
Flags meaning:
bits 0-1 - bytes per sample minus one
bit 2 - sound is monaural
bit 3 - hybrid profile (lossy compression)
bit 4 - joint stereo coding scheme
bit 5 - cross-decorrelation scheme is used
bit 6 - shaping for hybrid profile is present
bit 7 - floating point data present
bit 8 - int32 mode
bits 9-10 - hybrid profile flags
bits 11-12 - multi-channel start and end blocks
bits 13-17 - left-shift places when bitdepth is not a multiple of 8 (e.g. 12-bit, 20-bit)
bits 18-22 - maximum magnitude of decoded data (can be used to optimize decoding arithmetic)
bits 23-26 - sampling rate index (15 = unknown/custom)
bit 27 - reserved (okay to ignore if encountered)
bit 28 - robust block (experimental, okay to ignore if encountered)
bit 29 - IIR filter for negative noise shaping in hybrid mode
bit 30 - false stereo (stream is stereo but this block's data is mono, version >= 0x410)
bit 31 - low-latency block (experimental, do not decode if encountered)
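As an illustration of the bit layout above, the sketch below unpacks a few of the fields from the 32-bit flags word. The `WvFlags` struct and `wv_decode_flags` are hypothetical names used only for this example; the sampling rate index is returned as-is, since the index-to-rate table is not given in this section.

```c
#include <stdint.h>

/* Hypothetical unpacked view of some of the flag fields listed above. */
typedef struct {
    int bytes_per_sample; /* bits 0-1, stored as value minus one */
    int mono;             /* bit 2 */
    int hybrid;           /* bit 3 */
    int joint_stereo;     /* bit 4 */
    int cross_decorr;     /* bit 5 */
    int float_data;       /* bit 7 */
    int initial_block;    /* bit 11 */
    int final_block;      /* bit 12 */
    int shift;            /* bits 13-17 */
    int max_magnitude;    /* bits 18-22 */
    int rate_index;       /* bits 23-26, 15 = unknown/custom */
    int false_stereo;     /* bit 30 */
} WvFlags;

static WvFlags wv_decode_flags(uint32_t flags)
{
    WvFlags f;
    f.bytes_per_sample = (int)(flags & 3) + 1;
    f.mono             = (int)((flags >> 2) & 1);
    f.hybrid           = (int)((flags >> 3) & 1);
    f.joint_stereo     = (int)((flags >> 4) & 1);
    f.cross_decorr     = (int)((flags >> 5) & 1);
    f.float_data       = (int)((flags >> 7) & 1);
    f.initial_block    = (int)((flags >> 11) & 1);
    f.final_block      = (int)((flags >> 12) & 1);
    f.shift            = (int)((flags >> 13) & 0x1F);
    f.max_magnitude    = (int)((flags >> 18) & 0x1F);
    f.rate_index       = (int)((flags >> 23) & 0xF);
    f.false_stereo     = (int)((flags >> 30) & 1);
    return f;
}
```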
Metadata
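Each metadata sub-block consists of three parts: ID, length and data. The length is stored in 16-bit words, in one byte by default or in three bytes when the "large" ID flag is set. The stored data is always padded to an even number of bytes; when the actual data size is odd, the "odd" ID flag is set and the last stored byte is padding.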
Flags for ID:
0x20 - decoder may ignore data contained here
0x40 - data size is odd
0x80 - data size is large
IDs:
* 0x00 - dummy (used for padding)
* 0x02 - decorrelation terms
* 0x03 - decorrelation weights
* 0x04 - decorrelation samples
* 0x05 - entropy info
* 0x06 - hybrid profile
* 0x07 - noise shaping profile (wvc file)
* 0x08 - floating-point data profile
* 0x09 - large or shifted integer profile
* 0x0A - packed samples
* 0x0B - packed correction data (wvc file)
* 0x0C - packed overflow bits from floating-point or large integers
* 0x0D - multichannel information (including Microsoft channel mask)
* 0x20 - RIFF header for .wav files (before audio)
* 0x21 - RIFF trailer for .wav files (after audio)
* 0x25 - some encoding details for info purposes
* 0x26 - 16-byte MD5 sum of raw audio data
* 0x27 - non-standard sampling rate
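The ID byte and length field described in this section can be parsed roughly as follows. This is a sketch under stated assumptions: the three-byte length is taken to be little-endian like the rest of the format, the ID is obtained by masking off the 0x40 and 0x80 flags (so the 0x20 bit stays part of IDs such as 0x27), and `WvMetaHeader` and `wv_parse_meta` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one metadata sub-block. */
typedef struct {
    uint8_t id;         /* ID byte with the 0x40 and 0x80 flags removed */
    int     optional;   /* 0x20: decoder may ignore this data */
    size_t  header_len; /* 2 bytes, or 4 when the size is "large" */
    size_t  data_len;   /* actual payload size in bytes */
    size_t  total_len;  /* header plus padded (even) payload */
} WvMetaHeader;

/* Returns 0 on success, -1 if the sub-block does not fit in buf. */
static int wv_parse_meta(const uint8_t *buf, size_t len, WvMetaHeader *m)
{
    size_t words;

    assert(buf != NULL && m != NULL);
    if (len < 2)
        return -1;
    words = buf[1];
    m->header_len = 2;
    if (buf[0] & 0x80) {            /* large: two more size bytes follow */
        if (len < 4)
            return -1;
        words |= ((size_t)buf[2] << 8) | ((size_t)buf[3] << 16);
        m->header_len = 4;
    }
    m->id       = buf[0] & 0x3F;
    m->optional = (buf[0] & 0x20) != 0;
    m->data_len = words * 2;        /* size is stored in 16-bit words */
    if (buf[0] & 0x40) {            /* odd: last stored byte is padding */
        if (m->data_len == 0)
            return -1;
        m->data_len--;
    }
    m->total_len = m->header_len + words * 2;
    return m->total_len <= len ? 0 : -1;
}
```

A decoder would typically walk the block by repeatedly parsing one sub-block and advancing by total_len bytes.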
Decorrelation terms
Each decorrelation term is stored in one byte: the lower 5 bits indicate the predictor type and the upper 3 bits contain the delta value.
Possible predictor values:
0-5 - predictors for stereo, only predictors 2-4 are implemented
6-12 - predictor uses 1-7 samples for prediction
13-16 - reserved
17-18 - predictor does prediction by two samples
Decorrelation weights
Each decorrelation term has one or two weights, depending on the number of channels. Each weight is packed into one signed byte and can be restored as follows:
n = (signed char)getchar() * 8;   /* the stored byte is signed */
if(n > 0)
    n += (n + 64) >> 7;
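The restoring step above can be wrapped in a small helper that takes the raw byte instead of reading it from a stream. This is only a sketch: `wv_restore_weight` is a hypothetical name, and the byte is interpreted as a signed 8-bit value before scaling.

```c
#include <stdint.h>

/* Expand one packed weight byte to its working value. */
static int wv_restore_weight(uint8_t byte)
{
    /* Interpret the byte as signed without relying on
     * implementation-defined narrowing conversions. */
    int w = (byte >= 128 ? (int)byte - 256 : (int)byte) * 8;

    /* Positive weights get a small correction so that 127 maps to 1024. */
    if (w > 0)
        w += (w + 64) >> 7;
    return w;
}
```

With this scaling the stored range -128..127 maps to -1024..1024.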
Decorrelation samples
Each decorrelation term may have up to 16 samples, depending on its value. Each sample is 32-bit but is stored in 16 bits: the lower 8 bits are the mantissa and the upper 8 bits are the exponent biased by 9, i.e. if the exponent is less than 9 the mantissa is shifted right, otherwise it is shifted left.
Entropy info
This section contains one or two sets of medians used for decoding samples. Each median is log-packed into 16 bits as described above.
Samples coding
Samples are stored in the metadata block with ID=0x0A and are packed with modified Golomb codes. The decoding process is specified below, where get_unary() is a function which returns the length of a run of '1' bits (i.e. 111110b = 5, 10b = 1). The code set is adaptively divided into four sets, and every code has a unary prefix (possibly escaped) that selects the interval of the code, followed by a mantissa part as in a Golomb code.
if(last_zero){
    n = 0;
    last_zero = 0;
}else{
    n = get_unary();
    if(n == 16){
        n2 = get_unary();
        if(n2 < 2) n += n2;
        else n += (1 << (n2-1)) | getbits(n2-1);
    }
    prev_one = last_one;        /* halving depends on the flag left by the previous code */
    last_one = n & 1;
    if(prev_one) n = (n>>1) + 1;
    else n = n >> 1;
    last_zero = !last_one;
}
if(n == 0){
    base = 0;
    add = median[0] - 1;
    decrease median[0];
} else if(n == 1){
    base = median[0];
    add = median[1] - 1;
    increase median[0];
    decrease median[1];
} else {
    base = median[0] + median[1] + median[2] * (n - 2);
    add = median[2] - 1;
    increase median[0];
    increase median[1];
    if(n == 2) decrease median[2];
    else increase median[2];
}
k = log2(add) + 1;              /* log2() rounded down, i.e. k is the bit length of add */
ex = (1 << k) - add - 1;
t2 = getbits(k - 1);
if(t2 >= ex) t2 = t2 * 2 - ex + getbit();
sign = getbit();
if(sign==0) result = base + t2;
else result = ~(base + t2);
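The bit-level helpers used by the pseudocode above (getbit, getbits, get_unary and the final mantissa read) can be sketched in C as follows. The `BitReader` type and all function names are hypothetical, and the reader is assumed to consume bits starting from the least significant bit of each byte; reads past the end of the buffer return 0 so that every loop terminates. The median update and zero-run logic are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t size; /* buffer size in bytes */
    size_t pos;  /* current position in bits */
} BitReader;

static unsigned get_bit(BitReader *br)
{
    unsigned bit = 0;
    if (br->pos < br->size * 8)
        bit = (br->data[br->pos >> 3] >> (br->pos & 7)) & 1u;
    br->pos++;
    return bit;
}

/* Read n bits (0 <= n <= 32); the first bit read is the least significant. */
static uint32_t get_bits(BitReader *br, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint32_t)get_bit(br) << i;
    return v;
}

/* Count '1' bits up to and including the terminating '0' bit. */
static int get_unary(BitReader *br)
{
    int n = 0;
    while (get_bit(br))
        n++;
    return n;
}

/* Read a value in the range 0..add, using one bit fewer for small values. */
static uint32_t get_tail(BitReader *br, uint32_t add)
{
    int k = 0;
    uint32_t ex, t2;

    assert(add < 0x80000000u); /* keeps 1u << k well defined */
    if (add == 0)
        return 0;
    while ((add >> k) != 0)    /* k = bit length of add */
        k++;
    ex = (1u << k) - add - 1;
    t2 = get_bits(br, k - 1);
    if (t2 >= ex)
        t2 = t2 * 2 - ex + get_bit(br);
    return t2;
}
```

For example, with add = 4 the values 0 to 2 take two bits each, while 3 and 4 take three bits.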