OptimFROG

From MultimediaWiki
Revision as of 10:14, 16 September 2020 by Kostya (talk | contribs) (fill some details about the format)

OptimFROG is a lossless audio coding algorithm employing multi-layer adaptive filtering and range coding. Two formats are known: 4.2alpha and the current one.

The adaptive filter differs from conventional LMS filters in that the filter is recalculated after a certain number of samples has been decoded.

Coefficient coding may use adaptive models. The models are usually selected using the exponent of the weighted energy of decoded coefficients. For the old format, for example, energy_new = energy_old * 0.91700404 + coef * coef * 0.08299596; the new format can set custom weights.
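The energy recurrence can be sketched as follows. Only the update formula and the two old-format weights come from the description above; the names update_energy and model_index are hypothetical, and using the binary exponent from math.frexp, clamped to the number of models, is an assumption about what "exponent" means here rather than the codec's confirmed behaviour.

```python
import math

# Weights quoted for the old format; the new format may transmit its own.
OLD_DECAY = 0.91700404
OLD_GAIN = 0.08299596


def update_energy(energy, coef, decay=OLD_DECAY, gain=OLD_GAIN):
    """One step of the weighted-energy recurrence."""
    return energy * decay + coef * coef * gain


def model_index(energy, num_models):
    """Hypothetical selector: binary exponent of the energy, clamped to the model count."""
    if energy <= 0.0:
        return 0
    _, exponent = math.frexp(energy)  # energy == mantissa * 2**exponent
    return max(0, min(num_models - 1, exponent))


energy = 0.0
for coef in [3, -5, 12, 0, 7]:
    energy = update_energy(energy, coef)
    print(coef, round(energy, 4), model_index(energy, 32))
```

Because the two weights sum to 1, a constant input of magnitude c drives the energy towards c * c, so the selected model tracks the recent signal level.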

Old format

The old format starts with a 44-byte RIFF WAV header whose first four bytes are replaced with *RIF, followed by a 32-bit number of coded samples and then the actual coded data.
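A minimal sketch of splitting such a stream into its parts is shown below. The function name parse_old_header is hypothetical, and the little-endian byte order of the sample count is an assumption (the description does not state it). The code also assumes the replaced bytes were originally RIFF, as in a standard WAV file.

```python
import struct

OLD_MAGIC = b"*RIF"
WAV_HEADER_SIZE = 44


def parse_old_header(data):
    """Split an old-format stream into (wav_header, sample_count, payload)."""
    if len(data) < WAV_HEADER_SIZE + 4:
        raise ValueError("stream too short for old-format header")
    if data[:4] != OLD_MAGIC:
        raise ValueError("missing *RIF signature")
    # Restore the RIFF tag so the header can be written back out as WAV.
    wav_header = b"RIFF" + data[4:WAV_HEADER_SIZE]
    # Assumption: the sample count is little-endian, like WAV header fields.
    (sample_count,) = struct.unpack_from("<I", data, WAV_HEADER_SIZE)
    payload = data[WAV_HEADER_SIZE + 4:]
    return wav_header, sample_count, payload
```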

This codec employs a single adaptive filter of order up to 64. The residue is coded using a set of 32 adaptive models initialised with pre-defined frequencies.

New format

The new format is chunked and supports a correction stream. Supported chunks:

  • OFR or OFRX - header
  • HEAD - WAV file header
  • COMP - compressed audio data
  • CORR - correction data for hybrid streams
  • TAIL - WAV file trailer

Header

All values are in little-endian format.

  • 4 bytes - header size (12 bytes for 4.5alpha, 15 bytes for older versions, 17 bytes for newer versions)
  • 6 bytes - number of samples (for all channels)
  • 1 byte - format ID (u8/s8, u16/s16, u24/s24, u32/s32, f32 in -1.0..1.0 range, f32 for 16-bit integers, f32 for 24-bit integers)
  • 1 byte - channel configuration (0 - mono, 1 - stereo)
  • 4 bytes - sample rate
  • 2 bytes - some packed information ((version - 4200) * 16 + something)
  • 1 byte - packed information (8 * method + speed). Known methods are fast, normal, high, extra, best, ultra, insane, highnew, extranew, bestnew, ultranew, extrafast, turbonew, fastnew, normalnew. Known speeds are 1x/2x/4x.
  • 2 bytes - (version - 4500)
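The field list above can be read with a sketch like the following. The function name parse_ofr_header and the dictionary keys are hypothetical. The code assumes the buffer starts at the header size field and that the size counts the bytes following it, which matches the listed field widths (6 + 1 + 1 + 4 = 12, plus 2 + 1 = 15, plus 2 = 17). The version_hint value additionally assumes the unknown "something" occupies only the low four bits.

```python
import struct


def parse_ofr_header(buf):
    """Parse an OFR/OFRX header body, starting at the 4-byte size field."""
    (size,) = struct.unpack_from("<I", buf, 0)
    body = buf[4:4 + size]
    if size < 12 or len(body) != size:
        raise ValueError("truncated or unsupported header")
    info = {
        "size": size,
        "num_samples": int.from_bytes(body[0:6], "little"),  # all channels
        "format_id": body[6],
        "channel_config": body[7],  # 0 - mono, 1 - stereo
        "sample_rate": struct.unpack_from("<I", body, 8)[0],
    }
    if size >= 15:
        packed = struct.unpack_from("<H", body, 12)[0]
        info["packed_info"] = packed
        # Assumption: the unknown part fits in the low four bits.
        info["version_hint"] = 4200 + (packed >> 4)
        # packed byte is 8 * method + speed
        info["method"], info["speed"] = divmod(body[14], 8)
    if size >= 17:
        info["version"] = 4500 + struct.unpack_from("<H", body, 15)[0]
    return info
```

The method and speed values are left as raw numbers, since the mapping from numbers to the method and speed names is not given above.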

Compressed audio chunk

Compressed data begins with a 4-byte unknown value (most probably a CRC), a 4-byte number of samples in the block, a 1-byte format ID (same meaning as in the header), a 1-byte channel ID (likewise) and a 2-byte block packing method field stored as reader_id << 11 | filter_id << 6 | output_mode_id.
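Unpacking this block header might look like the sketch below. The function name parse_block_header is hypothetical, and little-endian byte order is assumed to carry over from the file header. The bit masks follow directly from the shifts given above: output_mode_id takes the low 6 bits, filter_id the next 5, and reader_id the remaining top 5.

```python
import struct


def parse_block_header(buf):
    """Decode the 12-byte header at the start of a compressed block."""
    # Assumption: little-endian, as stated for the file header.
    check, num_samples, format_id, channel_id, packing = struct.unpack_from(
        "<IIBBH", buf, 0
    )
    return {
        "check": check,  # unknown value, most probably a CRC
        "num_samples": num_samples,
        "format_id": format_id,
        "channel_id": channel_id,
        "reader_id": packing >> 11,
        "filter_id": (packing >> 6) & 0x1F,
        "output_mode_id": packing & 0x3F,
    }
```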

Known reader_id values:

  • 1 - use a set of adaptive models to determine how many bits of the value to read
  • 2 - use a single adaptive model to determine how many bits of the value to read
  • 3 - use adaptive models to decode which model to use next for decoding the number of bits of the value

Known filter_id values are 1-4; they tell the decoder which filters should be used for reconstruction. There appear to be several layers (at least two) of adaptive filters, with the output of each layer fed to the next. Notably, the filters use floating-point calculations instead of integer arithmetic.

Known output_mode_id values:

  • 1 - multiply output by a constant, add another constant, clip to constant range (all those parameters are transmitted)
  • 2 - remap input values to the output floating-point values
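Output mode 1 can be illustrated with the sketch below, which applies the three steps in the order listed. The function name output_mode_1 and its parameter names are hypothetical, and the sketch does not cover how the transmitted parameters are decoded or whether the result is rounded.

```python
def output_mode_1(values, scale, offset, lo, hi):
    """Scale, offset, then clip each value to the range [lo, hi]."""
    out = []
    for v in values:
        y = v * scale + offset
        out.append(max(lo, min(hi, y)))
    return out


print(output_mode_1([-100, 0, 100], 2, 1, -128, 127))
```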

Each of these stages has range-coded parameters that are transmitted before the actual data starts: first the reader data, then the filter data, and finally the output mode data.