WavPack: Difference between revisions

From MultimediaWiki
* Samples: http://samples.mplayerhq.hu/A-codecs/lossless/ (luckynight.wv)

Revision as of 22:20, 25 September 2006

WavPack is an open source lossless audio coding algorithm with floating point data support and optional lossy audio compression.

WavPack v.4

File Format

General details of the WavPack format can be found in the file 'format.txt' in the WavPack source archive. A WavPack file consists of blocks, each beginning with 'wvpk'. Every block contains all information about the sound data (sampling rate, channels, bits per sample, etc.) and so-called metadata. Metadata may contain various coefficients used for restoring samples, a correction bitstream and the actual compressed samples.

Block structure

Each block contains compressed data

Block header (all data is stored in little-endian words)

 4 bytes - 'wvpk'
 32 bits - block size
 16 bits  - version 
 8  bits - track number
 8  bits - track sub index
 32 bits - total samples in file (may be 0xFFFFFFFF)
 32 bits - offset in samples for current block (i.e. how many samples have been decoded before this block)
 32 bits - samples in this block
 32 bits - flags
 32 bits - CRC
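
The header layout above can be read with a few little-endian loads. The following is a minimal sketch, not code from any WavPack library: the struct `WvHeader`, its field names and the function `wv_parse_header` are hypothetical names chosen for illustration, and the only validation performed is the length and the 'wvpk' tag.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the 32-byte block header described above. */
typedef struct {
    uint32_t block_size;
    uint16_t version;
    uint8_t  track_no;
    uint8_t  track_sub_index;
    uint32_t total_samples;   /* may be 0xFFFFFFFF */
    uint32_t block_offset;    /* offset in samples for this block */
    uint32_t block_samples;
    uint32_t flags;
    uint32_t crc;
} WvHeader;

/* Little-endian loads that do not depend on host byte order. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or the tag is missing. */
int wv_parse_header(const uint8_t *buf, size_t len, WvHeader *h)
{
    if (len < 32 || memcmp(buf, "wvpk", 4) != 0)
        return -1;
    h->block_size      = rd_le32(buf + 4);
    h->version         = rd_le16(buf + 8);
    h->track_no        = buf[10];
    h->track_sub_index = buf[11];
    h->total_samples   = rd_le32(buf + 12);
    h->block_offset    = rd_le32(buf + 16);
    h->block_samples   = rd_le32(buf + 20);
    h->flags           = rd_le32(buf + 24);
    h->crc             = rd_le32(buf + 28);
    return 0;
}
```

Reading byte by byte avoids alignment and endianness problems that a direct struct cast would have.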

Flags meaning:

 bits  0- 1 - bytes per sample minus one
 bit      2 - sound is monaural
 bit      3 - hybrid profile (lossy compression)
 bit      4 - joint stereo coding scheme
 bit      5 - cross-decorrelation scheme is used
 bit      6 - shaping for hybrid profile is present
 bit      7 - floating point data present
 bit      8 - int32 mode
 bits  9-10 - hybrid profile flags
 bits 11-12 - multi-channel start and end blocks
 bits 13-17 - shift parameter?
 bits 18-22 - scaling parameter?
 bits 23-26 - sampling rate index
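
The bit fields listed above can be pulled out of the 32-bit flags word with shifts and masks. This is only a sketch: `WvFlags`, `wv_bits` and `wv_decode_flags` are hypothetical names, and the fields marked with a question mark in the list are extracted as raw numbers without interpreting them.

```c
#include <stdint.h>

/* Hypothetical unpacked view of the flags word; names are illustrative. */
typedef struct {
    unsigned bytes_per_sample;  /* bits 0-1, stored minus one */
    unsigned mono;              /* bit 2 */
    unsigned hybrid;            /* bit 3 */
    unsigned joint_stereo;      /* bit 4 */
    unsigned cross_decorr;      /* bit 5 */
    unsigned hybrid_shaping;    /* bit 6 */
    unsigned float_data;        /* bit 7 */
    unsigned int32_mode;        /* bit 8 */
    unsigned hybrid_flags;      /* bits 9-10 */
    unsigned multichannel;      /* bits 11-12 */
    unsigned shift;             /* bits 13-17, raw value */
    unsigned scaling;           /* bits 18-22, raw value */
    unsigned rate_index;        /* bits 23-26 */
} WvFlags;

/* Extract n bits starting at bit position lo. */
static unsigned wv_bits(uint32_t f, int lo, int n)
{
    return (unsigned)((f >> lo) & ((1u << n) - 1u));
}

WvFlags wv_decode_flags(uint32_t f)
{
    WvFlags r;
    r.bytes_per_sample = wv_bits(f, 0, 2) + 1;
    r.mono             = wv_bits(f, 2, 1);
    r.hybrid           = wv_bits(f, 3, 1);
    r.joint_stereo     = wv_bits(f, 4, 1);
    r.cross_decorr     = wv_bits(f, 5, 1);
    r.hybrid_shaping   = wv_bits(f, 6, 1);
    r.float_data       = wv_bits(f, 7, 1);
    r.int32_mode       = wv_bits(f, 8, 1);
    r.hybrid_flags     = wv_bits(f, 9, 2);
    r.multichannel     = wv_bits(f, 11, 2);
    r.shift            = wv_bits(f, 13, 5);
    r.scaling          = wv_bits(f, 18, 5);
    r.rate_index       = wv_bits(f, 23, 4);
    return r;
}
```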

Metadata

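Each metadata block consists of three parts: ID, length and data. Every metadata block has an even length. The data size is stored in 16-bit words, in one byte normally or in three bytes when the 0x80 flag is set in the ID; when the 0x40 flag is set, the actual data is one byte shorter than the stored size.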

Flags for ID:

 0x20 - decoder may ignore data contained here
 0x40 - data size is odd
 0x80 - data size is large

IDs:

 * 0x01 - encoder info
 * 0x02 - decorrelation terms
 * 0x03 - decorrelation weights
 * 0x04 - decorrelation samples
 * 0x05 - entropy info
 * 0x0A - packed samples
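
Parsing the ID and length of one metadata block could look roughly like the sketch below. It rests on several assumptions that the text above does not state explicitly: sizes count 16-bit words, the three-byte size is little-endian, and the low five bits of the ID byte carry the ID proper. `WvMeta` and `wv_parse_meta` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical description of one metadata block; names are illustrative. */
typedef struct {
    uint8_t id;          /* ID with the three flag bits removed */
    int     optional;    /* 0x20: decoder may ignore the data */
    size_t  data_len;    /* actual data length in bytes */
    size_t  padded_len;  /* stored (even) length in bytes */
    size_t  header_len;  /* 2 or 4 bytes */
} WvMeta;

/* Returns 0 on success, -1 if the block does not fit into the buffer. */
int wv_parse_meta(const uint8_t *buf, size_t len, WvMeta *m)
{
    if (len < 2)
        return -1;
    uint8_t raw   = buf[0];
    size_t  words = buf[1];
    size_t  hdr   = 2;

    if (raw & 0x80) {            /* large block: two more size bytes */
        if (len < 4)
            return -1;
        words |= ((size_t)buf[2] << 8) | ((size_t)buf[3] << 16);
        hdr = 4;
    }

    size_t padded = words * 2;   /* assumption: size counts 16-bit words */
    if ((raw & 0x40) && padded == 0)
        return -1;               /* an odd size needs at least one word */
    if (len - hdr < padded)
        return -1;

    m->id         = raw & 0x1F;  /* assumption: low five bits are the ID */
    m->optional   = (raw & 0x20) != 0;
    m->padded_len = padded;
    m->data_len   = (raw & 0x40) ? padded - 1 : padded;
    m->header_len = hdr;
    return 0;
}
```

A caller would advance by `header_len + padded_len` bytes to reach the next metadata block.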

Decorrelation terms

TODO

Decorrelation weights

Each decorrelation term has one or two weights, depending on the number of channels. Each weight is packed into one byte and can be restored in this way:

 n = (signed char)getchar() << 3;   /* the stored byte is signed */
 if(n > 0) n += (n + 64) >> 7;
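
As a self-contained function, the weight restoration can be sketched as follows. The function name `wv_restore_weight` is hypothetical, and the sketch assumes the packed byte is to be read as a signed value; multiplication by 8 replaces the left shift so that negative inputs are well defined in C.

```c
#include <stdint.h>

/* Expand one packed weight byte (assumed signed) to its working value. */
int wv_restore_weight(int8_t packed)
{
    int n = packed * 8;          /* same effect as << 3 */
    if (n > 0)
        n += (n + 64) >> 7;      /* small positive correction */
    return n;
}
```

With this correction the largest positive byte, 127, restores to 1024, while negative values are only scaled.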

Decorrelation samples

Each decorrelation term may have up to 16 samples, depending on its value. Each sample is a 32-bit value stored in 16 bits: the lower 8 bits are the mantissa and the high 8 bits are the exponent, with the shift amount being exponent-9, i.e. if the exponent is less than 9 the mantissa is shifted right, otherwise left.

Entropy info

This section contains one or two sets of medians used for decoding samples. Each median is log-packed into 16 bits as described above.

Samples coding

Samples are stored in the metadata block with ID=0x0A and are packed with modified Golomb codes. The decoding process is specified below, where get_unary() is a function that returns the length of a string of '1' bits (i.e. 111110b = 5, 10b = 1). The codeset is adaptively divided into four sets, and every code has a unary prefix (possibly escaped) defining the interval of this code and a mantissa part as in a Golomb code.

 if(last_zero){
   n = 0;
   last_zero = 0;
 }else{
   n = get_unary();
   if(n == 16){
     n2 = get_unary();
     if(n2 < 2) n += n2;
     else n += (1 << (n2-1)) | getbits(n2-1);
   }
   if(last_one){               /* test the flag left by the previous code */
     last_one = n & 1;
     n = (n>>1) + 1;
   }else{
     last_one = n & 1;
     n = n >> 1;
   }
   last_zero = !last_one;
 }
 if(n == 0){
   base = 0;
   add = median[0] - 1;
   decrease median[0];
 } else if(n == 1){
   base = median[0];
   add = median[1] - 1;
   increase median[0];
   decrease median[1];
 } else {
   base = median[0] + median[1] + median[2] * (n - 2);
   add = median[2] - 1;
   increase median[0];
   increase median[1];
   if(n == 2) decrease median[2];
   else increase median[2];
 }
 if(add == 0){
   t2 = 0;
 }else{
   k = floor(log2(add)) + 1;   /* number of bits needed to represent add */
   ex = (1 << k) - add - 1;
   t2 = getbits(k - 1);
   if(t2 >= ex)
     t2 = t2 * 2 - ex + getbit();
 }
 sign = getbit();
 if(sign==0) result = base + t2;
 else result = ~(base + t2);
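
The two bit-level helpers used by the procedure above, get_unary() and the final mantissa read, can be sketched as follows. Everything here is an illustration under assumptions: the bit reader `BitReader` is hypothetical, bits are assumed to be consumed starting from the least significant bit of each byte, multi-bit fields are assumed to be assembled low bit first, and k is taken to be the number of bits needed to represent add (with add below 2^31), which is the reading under which ex is non-negative.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical bit reader; pos counts bits from the start of data. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t pos;
} BitReader;

/* Read one bit, LSB of each byte first (assumed bit order). */
static unsigned br_bit(BitReader *br)
{
    if (br->pos >= br->size * 8)
        return 0;                /* past the end: return zeros */
    unsigned b = (br->data[br->pos >> 3] >> (br->pos & 7)) & 1u;
    br->pos++;
    return b;
}

/* Read n bits, first bit read becomes the lowest bit of the result. */
static unsigned br_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= br_bit(br) << i;
    return v;
}

/* Count '1' bits up to the terminating '0' bit. */
unsigned get_unary(BitReader *br)
{
    unsigned n = 0;
    while (br_bit(br))
        n++;
    return n;
}

/* Read a value in the range 0..add using k-1 or k bits. */
unsigned read_tail(BitReader *br, unsigned add)
{
    if (add == 0)
        return 0;                /* only one possible value, no bits read */
    int k = 0;
    while ((add >> k) != 0)
        k++;                     /* k = bit length of add */
    unsigned ex = (1u << k) - add - 1;
    unsigned t2 = br_bits(br, k - 1);
    if (t2 >= ex)
        t2 = t2 * 2 - ex + br_bit(br);
    return t2;
}
```

For add = 5 this gives k = 3 and ex = 2, so the values 0 and 1 take two bits while 2 through 5 take three.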