WavPack

From MultimediaWiki

WavPack is an open source lossless audio coding algorithm with floating point data support and optional lossy audio compression.

WavPack v.4

File Format

General details of the WavPack format can be found in the file 'format.txt' in the WavPack source archive. A WavPack file consists of blocks, each beginning with 'wvpk'. Every block carries all the information about the sound data (sampling rate, channels, bits per sample, etc.) together with so-called metadata. Metadata may contain the coefficients used for restoring samples, the correction bitstream and the actual compressed samples.

Block structure

Each block contains compressed data

Block header (all data is stored in little-endian words)

 4 bytes - 'wvpk'
 32 bits - total block size (not counting this field or 'wvpk')
 16 bits - version (current valid versions are 0x402 - 0x410)
 8  bits - track number (not currently implemented)
 8  bits - track sub index (not currently implemented)
 32 bits - total samples in file (may be 0xFFFFFFFF if unknown)
 32 bits - offset in samples for current block (i.e. how many samples should be decoded by now)
 32 bits - samples in this block (may be 0 if no audio present)
 32 bits - flags
 32 bits - CRC
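The header layout above adds up to 32 bytes, so it can be read with fixed offsets. The following is a minimal sketch, not the reference implementation: the names `WvHeader`, `rd_le32` and `wv_parse_header` are hypothetical, and rejecting versions outside 0x402 - 0x410 is one possible policy based on the range quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WV_HEADER_SIZE 32  /* 4 + 4 + 2 + 1 + 1 + 5 * 4 bytes */

/* Hypothetical container for the header fields listed above. */
typedef struct {
    uint32_t block_size;     /* bytes following the size field */
    uint16_t version;
    uint8_t  track_no;
    uint8_t  track_sub;
    uint32_t total_samples;  /* 0xFFFFFFFF if unknown */
    uint32_t block_index;    /* sample offset of this block */
    uint32_t block_samples;  /* 0 if the block carries no audio */
    uint32_t flags;
    uint32_t crc;
} WvHeader;

/* Read a 32-bit little-endian value. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short, the 'wvpk' tag
 * is missing, or the version is outside 0x402..0x410. */
static int wv_parse_header(const uint8_t *buf, size_t len, WvHeader *h)
{
    assert(buf != NULL && h != NULL);
    if (len < WV_HEADER_SIZE || memcmp(buf, "wvpk", 4) != 0)
        return -1;
    h->block_size    = rd_le32(buf + 4);
    h->version       = (uint16_t)(buf[8] | (buf[9] << 8));
    h->track_no      = buf[10];
    h->track_sub     = buf[11];
    h->total_samples = rd_le32(buf + 12);
    h->block_index   = rd_le32(buf + 16);
    h->block_samples = rd_le32(buf + 20);
    h->flags         = rd_le32(buf + 24);
    h->crc           = rd_le32(buf + 28);
    if (h->version < 0x402 || h->version > 0x410)
        return -1;
    return 0;
}
```

Since the size field counts neither itself nor the 'wvpk' tag, the next block would start `block_size + 8` bytes after the start of the current one.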

Flags meaning:

 bits  0- 1 - bytes per sample minus one
 bit      2 - sound is monaural
 bit      3 - hybrid profile (lossy compression)
 bit      4 - joint stereo coding scheme
 bit      5 - cross-decorrelation scheme is used
 bit      6 - shaping for hybrid profile is present
 bit      7 - floating point data present
 bit      8 - int32 mode
 bits  9-10 - hybrid profile flags
 bits 11-12 - multi-channel start and end blocks
 bits 13-17 - left-shift places when bitdepth is not a multiple of 8 (e.g. 12-bit, 20-bit)
 bits 18-22 - maximum magnitude of decoded data (can be used to optimize decoding arithmetic)
 bits 23-26 - sampling rate index (15 = unknown/custom)
 bit     27 - reserved (okay to ignore if encountered)
 bit     28 - robust block (experimental, okay to ignore if encountered)
 bit     29 - IIR filter for negative noise shaping in hybrid mode
 bit     30 - false stereo (stream is stereo but this block's data is mono, version >= 0x410)
 bit     31 - low-latency block (experimental, do not decode if encountered)
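The flag word can be unpacked with shifts and masks taken directly from the bit positions above. This is a sketch only: `WvFlags`, `wv_bits` and `wv_decode_flags` are hypothetical names, only a subset of the flags is extracted, and treating bit 11 as "start" and bit 12 as "end" is an assumption based on the order in which the list names them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unpacked view of some of the header flags. */
typedef struct {
    int bytes_per_sample; /* bits 0-1, stored minus one */
    int mono;             /* bit 2 */
    int hybrid;           /* bit 3 */
    int joint_stereo;     /* bit 4 */
    int cross_decorr;     /* bit 5 */
    int float_data;       /* bit 7 */
    int int32_mode;       /* bit 8 */
    int first_block;      /* bit 11 (assumed to be the start flag) */
    int last_block;       /* bit 12 (assumed to be the end flag) */
    int shift;            /* bits 13-17 */
    int max_magnitude;    /* bits 18-22 */
    int rate_index;       /* bits 23-26, 15 = unknown/custom */
    int false_stereo;     /* bit 30 */
} WvFlags;

/* Extract `count` bits of `v` starting at bit `pos` (count < 32). */
static int wv_bits(uint32_t v, int pos, int count)
{
    return (int)((v >> pos) & ((1u << count) - 1u));
}

static void wv_decode_flags(uint32_t f, WvFlags *out)
{
    assert(out != NULL);
    out->bytes_per_sample = wv_bits(f, 0, 2) + 1;
    out->mono             = wv_bits(f, 2, 1);
    out->hybrid           = wv_bits(f, 3, 1);
    out->joint_stereo     = wv_bits(f, 4, 1);
    out->cross_decorr     = wv_bits(f, 5, 1);
    out->float_data       = wv_bits(f, 7, 1);
    out->int32_mode       = wv_bits(f, 8, 1);
    out->first_block      = wv_bits(f, 11, 1);
    out->last_block       = wv_bits(f, 12, 1);
    out->shift            = wv_bits(f, 13, 5);
    out->max_magnitude    = wv_bits(f, 18, 5);
    out->rate_index       = wv_bits(f, 23, 4);
    out->false_stereo     = wv_bits(f, 30, 1);
}
```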

Metadata

Each metadata block consists of three parts: ID, length and data. Every metadata block has an even length; the data size is stored in 16-bit words, in either one or three bytes depending on the ID flags.

Flags for ID:

 0x20 - decoder may ignore data contained here
 0x40 - data size is odd
 0x80 - data size is large

IDs:

 * 0x00 - dummy (used for padding)
 * 0x02 - decorrelation terms
 * 0x03 - decorrelation weights
 * 0x04 - decorrelation samples
 * 0x05 - entropy info
 * 0x06 - hybrid profile
 * 0x07 - noise shaping profile (wvc file)
 * 0x08 - floating-point data profile
 * 0x09 - large or shifted integer profile
 * 0x0A - packed samples
 * 0x0B - packed correction data (wvc file)
 * 0x0C - packed overflow bits from floating-point or large integers
 * 0x0D - multichannel information (including Microsoft channel mask)
 * 0x20 - RIFF header for .wav files (before audio)
 * 0x21 - RIFF trailer for .wav files (after audio)
 * 0x25 - some encoding details for info purposes
 * 0x26 - 16-byte MD5 sum of raw audio data
 * 0x27 - non-standard sampling rate
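The ID and length fields described in this section could be parsed roughly as follows. This is a sketch under stated assumptions: `WvSubBlock` and `wv_parse_subblock` are hypothetical names, the three-byte size is taken to be little-endian like the block header, and masking the ID with 0x3F is an assumption chosen because the listed IDs 0x20 - 0x27 already include the 0x20 "may ignore" bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one metadata block header. */
typedef struct {
    uint8_t  id;          /* ID with the size flags removed */
    int      ignorable;   /* flag 0x20: decoder may skip the data */
    uint32_t data_size;   /* meaningful bytes of data */
    uint32_t padded_size; /* bytes actually stored (always even) */
    size_t   header_size; /* 2, or 4 when the large flag is set */
} WvSubBlock;

/* Returns 0 on success, -1 if the buffer is too short or the size
 * fields are inconsistent. */
static int wv_parse_subblock(const uint8_t *p, size_t len, WvSubBlock *sb)
{
    assert(p != NULL && sb != NULL);
    if (len < 2)
        return -1;

    uint8_t  raw_id = p[0];
    uint32_t words  = p[1];
    size_t   hdr    = 2;

    if (raw_id & 0x80) {          /* large: two more size bytes follow */
        if (len < 4)
            return -1;
        words |= ((uint32_t)p[2] << 8) | ((uint32_t)p[3] << 16);
        hdr = 4;
    }

    uint32_t bytes = words * 2;   /* size is stored in 16-bit words */
    sb->padded_size = bytes;
    if (raw_id & 0x40) {          /* odd: last stored byte is padding */
        if (bytes == 0)
            return -1;
        bytes--;
    }

    sb->id          = raw_id & 0x3F;
    sb->ignorable   = (raw_id & 0x20) != 0;
    sb->data_size   = bytes;
    sb->header_size = hdr;
    return 0;
}
```

A caller walking a block would advance by `header_size + padded_size` bytes to reach the next metadata block.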

Decorrelation terms

Each decorrelation term is stored in one byte: the lower 5 bits indicate the predictor type and the upper 3 bits contain the delta value.

Possible predictor values:

 0-5 - predictors for stereo, only predictors 2-4 are implemented
 6-12 - predictor uses 1-7 samples for prediction
 13-16 - reserved
 17-18 - predictor does prediction by two samples
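Splitting a term byte into its two fields is a pair of mask and shift operations. The sketch below does only that split; `WvDecorrTerm` and `wv_split_term` are hypothetical names, and any further mapping of the predictor value onto a decoder's internal term numbering is deliberately left out because it is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two fields of a decorrelation term byte. */
typedef struct {
    unsigned predictor; /* lower 5 bits */
    unsigned delta;     /* upper 3 bits */
} WvDecorrTerm;

static void wv_split_term(uint8_t b, WvDecorrTerm *out)
{
    assert(out != NULL);
    out->predictor = b & 0x1F;
    out->delta     = (unsigned)b >> 5;
}
```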

Decorrelation weights

Each decorrelation term should have one or two weights, depending on the number of channels. Each weight is packed into one byte and can be restored in this way:

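 n = (signed char)getchar() * 8;  /* stored byte is signed; scale it by 8 */
 if(n > 0) n += (n + 64) >> 7;    /* round positive weights up, so that 127 restores to 1024 */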

Decorrelation samples

Each decorrelation term may have up to 16 samples, depending on its value. Each sample is a 32-bit value stored in 16 bits: the lower 8 bits are the mantissa and the upper 8 bits are the exponent, with the shift amount given by exponent - 9, i.e. if the exponent is less than 9 the mantissa is shifted right, otherwise left.

Entropy info

This section contains one or two sets of medians used for sample decoding. Each median is log-packed into 16 bits as described above.

Samples coding

Samples are stored in the metadata block with ID=0x0A and are packed with modified Golomb codes. The decoding process is specified below, where get_unary() is a function that returns the length of a run of '1' bits (i.e. 111110b = 5, 10b = 1). The code set is adaptively divided into four sets; every code has a unary prefix (possibly escaped) that selects the interval of the code, followed by a mantissa part as in a Golomb code.

 if(last_zero){
   n = 0;
   last_zero = 0;
 }else{
   n = get_unary();
   if(n == 16){
     n2 = get_unary();
     if(n2 < 2) n += n2;
     else n += (1 << (n2-1)) | getbits(n2-1);
   }
   last_one = n & 1;
   if(last_one)
     n = (n>>1) + 1;
   else
     n = n >> 1;
   last_zero = !last_one;
 }
 if(n == 0){
   base = 0;
   add = median[0] - 1;
   decrease median[0];
 } else if(n == 1){
   base = median[0];
   add = median[1] - 1;
   increase median[0];
   decrease median[1];
 } else {
   base = median[0] + median[1] + median[2] * (n - 2);
   add = median[2] - 1;
   increase median[0];
   increase median[1];
   if(n == 2) decrease median[2];
   else increase median[2];
 }
 k = log2(add);
 ex = (1 << k) - add - 1;
 t2 = getbits(k - 1);
 if(t2 >= ex)
   t2 = t2 * 2 - ex + getbit();
 sign = getbit();
 if(sign==0) result = base + t2;
 else result = ~(base + t2);
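The bit-level helpers used by the pseudocode above can be sketched as follows. Everything here is illustrative rather than taken from a real decoder: `BitReader`, `get_bit`, `get_bits`, `bit_length` and `read_tail` are hypothetical names, consuming bits starting from the least significant bit of each byte is an assumption, and `log2(add)` is read as the number of bits needed to represent `add`, an interpretation chosen because it makes the final step cover every value from 0 to `add` exactly once. The median update rules are not described on this page and are not sketched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader; bits are taken LSB-first (an assumption). */
typedef struct {
    const uint8_t *data;
    size_t size; /* in bytes */
    size_t pos;  /* in bits */
} BitReader;

/* Read one bit; reads past the end of the buffer return 0. */
static unsigned get_bit(BitReader *br)
{
    unsigned bit = 0;
    if (br->pos < br->size * 8)
        bit = (br->data[br->pos >> 3] >> (br->pos & 7)) & 1u;
    br->pos++;
    return bit;
}

/* Read n bits (0..32), first bit read is the least significant. */
static uint32_t get_bits(BitReader *br, int n)
{
    assert(n >= 0 && n <= 32);
    uint32_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint32_t)get_bit(br) << i;
    return v;
}

/* Length of the run of '1' bits before the terminating '0'. */
static int get_unary(BitReader *br)
{
    int n = 0;
    while (get_bit(br))
        n++;
    return n;
}

/* Number of bits needed to represent v (0 for v == 0). */
static int bit_length(uint32_t v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Final step of the pseudocode: read a value in 0..add using
 * k-1 or k bits. Assumes add < 2^31. */
static uint32_t read_tail(BitReader *br, uint32_t add)
{
    if (add == 0)
        return 0;                       /* only one possible value */
    int k = bit_length(add);
    uint32_t ex = (1u << k) - add - 1;  /* number of short codes */
    uint32_t t2 = get_bits(br, k - 1);
    if (t2 >= ex)
        t2 = t2 * 2 - ex + get_bit(br);
    return t2;
}
```

With this bit order, the example run 111110b corresponds to the byte 0x1F, for which `get_unary()` returns 5.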