CRI ADX ADPCM


CRI ADX ADPCM is the ADPCM format used by CRI's popular suite of middleware, which has appeared on most optical disc game consoles since the Sega Saturn. It is most often seen in the eponymous ADX file container. It is sometimes packaged in Sofdec files along with MPEG video data, and in Sega FILM files on some Sega Saturn games. The container format specifies the following parameters for the decoder:

  • the playback frequency of the audio data
  • whether the audio is monaural or stereo
  • cutoff frequency for the predictor (always 500 Hz in practice)
  • frame size (always 18 bytes in practice)
  • whether encryption is present


Coding specifics

Audio Data

Audio data consists of 18 byte frames, each decoding to 32 samples. Each frame starts with a 2 byte header (a 16 bit big-endian integer) that specifies the scale for all the nibbles in the frame. Each of the remaining 16 bytes holds two samples, one per nibble, high nibble first. Each nibble is a signed 4 bit value which, multiplied by the scale for the current frame, gives the correction to add to the output of a fixed 2nd order linear predictor that works from the last two decoded samples.

Each channel is independently decoded (with the exception of decryption, discussed below).
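
The frame arithmetic above can be sketched as follows. The function and constant names are hypothetical, chosen only for illustration, and the rounding up to a whole frame assumes that a partially filled final frame still occupies a full frame in the file.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of any CRI API. */
enum { ADX_SCALE_BYTES = 2 };   /* 16 bit scale at the start of each frame */

/* Samples carried by one frame: two per byte after the 2 byte scale. */
static size_t adx_samples_per_frame(size_t frame_size)
{
    return (frame_size - ADX_SCALE_BYTES) * 2;
}

/* Bytes of frame data needed for sample_count samples of ONE channel.
   Assumption: a partial final frame still takes up a whole frame. */
static size_t adx_bytes_for_samples(size_t sample_count, size_t frame_size)
{
    size_t per_frame = adx_samples_per_frame(frame_size);
    size_t frames = (sample_count + per_frame - 1) / per_frame;
    return frames * frame_size;
}
```

With the usual 18 byte frame size, adx_samples_per_frame returns 32, matching the figure given above.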

Coefficients

The linear predictor coefficients are not stored with the data, but they can be calculated from the sample rate and the cutoff frequency as follows:

/* output, the coefficients as they will be used by the decoder */
int16_t coef1, coef2;

/* input variables */
double cutoff, sample_rate;

/* temps to keep the calculation simple */
double z,a,b,c;

z = cos(2.0*M_PI*cutoff/sample_rate);

a = M_SQRT2-z;
b = M_SQRT2-1.0;
c = (a-sqrt((a+b)*(a-b)))/b;

/* compute the coefficients as fixed point values, with 12 fractional bits: coef1 represents 2*c and coef2 represents -(c*c) */
coef1 = floor(c*8192);
coef2 = floor(c*c*-4096);

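A game will often have the coefficients hard-coded to fixed values for faster multiplication. Reverse-engineered decoders have often used the fixed coefficients 0x7298 and 0x3350, presumably based on the disassembly of a game using a 44100 Hz sample rate. These are four times the magnitudes that the calculation above gives for 44100 Hz with a 500 Hz cutoff (7334 and -3284), i.e. the same coefficients with 14 fractional bits and the sign of the second one applied separately.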

Example Decoder

/* input: the data for this frame */
uint8_t data[18];

/* input: the filter coefficients, precomputed for the whole stream, see above */
int16_t coef1, coef2;
/* input/output: last two samples, kept between frames within a channel */
int16_t hist1, hist2;

/* first two bytes are scale, a 16 bit big endian value < 0x2000 */
/* the +1 becomes important on quiet ADXs */
int scale = get_16bitBE(data) + 1;

int i;
/* 32 samples per frame */
for (i=0; i<32; i++) {

   /* this byte contains nibbles for two samples */
   int sample_byte = data[2+i/2];

   /* Even samples use the high nibble. */
   /* Each nibble is a 4 bit two's complement signed value. */
   int sample_nibble = (i&1?
       get_low_nibble_signed(sample_byte):
       get_high_nibble_signed(sample_byte)
   );

   /* Scale the nibble to determine how much this sample differs from the prediction. */
   int sample_delta = sample_nibble * scale;

   /* Compute the predicted sample (coefficients are 12 bit fixed point) */
   int predicted_sample_12 = coef1 * hist1 + coef2 * hist2;

   /* Convert back to an integer by dropping the 12 fractional bits (assumes >> is an arithmetic shift for negative values) */
   int predicted_sample = predicted_sample_12 >> 12;

   /* Correct the prediction */
   int sample_raw = predicted_sample + sample_delta;

   /* Force into the 16 bit signed integer range */
   int16_t sample = clamp16(sample_raw);

   /* Update the histories */
   hist2 = hist1;
   hist1 = sample;
}
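
The decoder above calls four helper functions that are not defined on this page. A minimal sketch of what they could look like, based only on how they are used and described in the comments above (big-endian 16 bit read, 4 bit two's complement nibbles, clamping to the signed 16 bit range), is given here. The helper sign_extend_nibble is an extra name introduced for this sketch.

```c
#include <stdint.h>

/* Read a 16 bit big-endian value from the start of a buffer. */
static int get_16bitBE(const uint8_t *p)
{
    return (p[0] << 8) | p[1];
}

/* Sign-extend a 4 bit two's complement value (0..15) to -8..7. */
static int sign_extend_nibble(int n)
{
    return (n & 8) ? n - 16 : n;
}

/* Upper 4 bits of a byte as a signed value. */
static int get_high_nibble_signed(int b)
{
    return sign_extend_nibble((b >> 4) & 0x0f);
}

/* Lower 4 bits of a byte as a signed value. */
static int get_low_nibble_signed(int b)
{
    return sign_extend_nibble(b & 0x0f);
}

/* Force a value into the 16 bit signed integer range. */
static int16_t clamp16(int v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}
```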

Encryption

When the encryption flag byte in the container is 08, rather than 00, the file is encrypted. ADX supports a very basic encryption, wherein the output of a linear congruential random number generator is XORed with each frame's scale. This encryption is removed in the same way as it is applied, for instance:

/* the key components */
uint16_t start, mult, add;

/* let's say that we have all the scales in an array
   (in the order they appear in the file, i.e. with left and right channels interleaved) */
uint16_t scales[SCALE_COUNT];

/* initial XOR value */
uint16_t xor = start;

int i;
for (i=0; i<SCALE_COUNT; i++) {
    scales[i] ^= xor;
    xor=(xor*mult+add)&0x7fff;
}

In practice the mult (multiplier) and add (increment) values are prime. This is likely an attempt to ensure that the LCG generates all 32768 possible outputs before repeating; one of the conditions for that is that the increment and the modulus (32768) be relatively prime. That condition alone is not sufficient, however, and cycles as short as 32 have been seen. Additionally, as the valid range for scale values is only 0 to 0x1fff, the upper two bits of the pseudorandom stream are directly exposed, making it very inexpensive to check possible keys. Combined with the artificially limited keyspace, this means that determining the key for a given file (usually the same key is used for all files in a game) takes only a few minutes of computation.
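
The inexpensive key check described above can be sketched as follows. This is only an illustration under the stated assumption that every valid scale lies in the range 0 to 0x1fff. The function name adx_key_fits is hypothetical, and a real key search would still need a way to enumerate candidate keys, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if the candidate key (start, mult, add)
   decrypts every scale to a value in the valid range 0..0x1fff, else 0.
   enc_scales holds the encrypted scales in file order. */
static int adx_key_fits(const uint16_t *enc_scales, size_t count,
                        uint16_t start, uint16_t mult, uint16_t add)
{
    uint32_t x = start;   /* wider type so the multiply cannot overflow */
    size_t i;

    for (i = 0; i < count; i++) {
        /* a wrong key usually leaves one of the upper bits set */
        if ((enc_scales[i] ^ x) > 0x1fff)
            return 0;
        x = (x * mult + add) & 0x7fff;
    }
    return 1;
}
```

Because a wrong candidate tends to fail within the first few scales, most keys can be rejected after examining only a handful of frames.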