Maxis XA: Difference between revisions

From MultimediaWiki
Jump to navigation Jump to search
(Add to formats missing in FFmpeg category.)
(link to samples)
Line 1: Line 1:
*Extension: XA
* Extension: XA
*Company: [[Maxis]]
* Company: Maxis
* Samples: http://samples.mplayerhq.hu/game-formats/maxis-xa/


== Credit ==
== Credit ==

Revision as of 15:24, 2 April 2008

Credit

The information on this page was originally based on one of the many format documents written up by Valery V. Anisimovsky, available on http://wotsit.org/ and many other sites across the internet.


XA File Header

The XA file has the following header:

struct XAHeader
{
 char	szID[4];
 DWORD dwOutSize;
 WORD	wTag;
 WORD	wChannels;
 DWORD dwSampleRate;
 DWORD dwAvgByteRate;
 WORD	wAlign;
 WORD	wBits;
};

szID -- string ID, which is equal to "XAI\0" (sound/speech) or "XAJ\0" (music).

dwOutSize -- the output size of the audio stream stored in the file (in bytes).

wTag -- seems to be PCM waveformat tag (0x0001). This corresponds to the (decompressed) output audio stream, of course.

wChannels -- number of channels for the file.

dwSampleRate -- sample rate for the file.

dwAvgByteRate -- average byte rate for the file (equal to (dwSampleRate)*(wAlign)). Note that this also corresponds to the decompressed output audio stream.

wAlign -- the sample align value for the file (equal to (wBits/8)*(wChannels)). Again, this corresponds to the decompressed output audio stream.

wBits -- resolution of the file (8 (8-bit), 16 (16-bit), etc.).

Note that the part of the header from (wTag) until (wBits) is really WAVEFORMATEX structure (the contents of PCM .WAV fmt chunk).

XA File Data

Right after the XA header comes the compressed audio stream. The compression algorithm used is EA ADPCM (see below). Music files in SimCity3000 are stereo 22050 Hz 16-bit, and speech/sfx are mono 22050 Hz 16-bit.

EA ADPCM Decompression Algorithm

During the decompression four LONG variables must be maintained for stereo stream: lCurSampleLeft, lCurSampleRight, lPrevSampleLeft, lPrevSampleRight and two -- for mono stream: lCurSample, lPrevSample. At the beginning of the audio stream you must initialize these variables to zeros. Note that LONG here is signed.

The stream is divided into small blocks of 0x1E (stereo) or 0xF (mono) bytes. You should process all blocks in their turn. Here's the code which decompresses one stereo stream block.

BYTE  InputBuffer[InputBufferSize]; // buffer containing data for one block
BYTE  bInput;
DWORD i;
LONG  c1left,c2left,c1right,c2right,left,right;
BYTE  dleft,dright;

bInput=InputBuffer[0];
c1left=EATable[HINIBBLE(bInput)];   // predictor coeffs for left channel
c2left=EATable[HINIBBLE(bInput)+4];
dleft=LONIBBLE(bInput)+8;   // shift value for left channel

bInput=InputBuffer[1];
c1right=EATable[HINIBBLE(bInput)];  // predictor coeffs for right channel
c2right=EATable[HINIBBLE(bInput)+4];
dright=LONIBBLE(bInput)+8;  // shift value for right channel

for (i=2;i<0x1E;i+=2)
{
 left=HINIBBLE(InputBuffer[i]);  // HIGHER nibble for left channel
 left=(left<<0x1c)>>dleft;
 left=(left+lCurSampleLeft*c1left+lPrevSampleLeft*c2left+0x80)>>8;
 left=Clip16BitSample(left);
 lPrevSampleLeft=lCurSampleLeft;
 lCurSampleLeft=left;

 right=HINIBBLE(InputBuffer[i+1]); // HIGHER nibble for right channel
 right=(right<<0x1c)>>dright;
 right=(right+lCurSampleRight*c1right+lPrevSampleRight*c2right+0x80)>>8;
 right=Clip16BitSample(right);
 lPrevSampleRight=lCurSampleRight;
 lCurSampleRight=right;

 // Now we've got lCurSampleLeft and lCurSampleRight which form one stereo
 // sample and all is set for the next step...
 Output((SHORT)lCurSampleLeft,(SHORT)lCurSampleRight); // send the sample to output

 // now do just the same for LOWER nibbles...
 // note that nubbles for each channel are packed pairwise into one byte

 left=LONIBBLE(InputBuffer[i]);  // LOWER nibble for left channel
 left=(left<<0x1c)>>dleft;
 left=(left+lCurSampleLeft*c1left+lPrevSampleLeft*c2left+0x80)>>8;
 left=Clip16BitSample(left);
 lPrevSampleLeft=lCurSampleLeft;
 lCurSampleLeft=left;

 right=LONIBBLE(InputBuffer[i+1]); // LOWER nibble for right channel
 right=(right<<0x1c)>>dright;
 right=(right+lCurSampleRight*c1right+lPrevSampleRight*c2right+0x80)>>8;
 right=Clip16BitSample(right);
 lPrevSampleRight=lCurSampleRight;
 lCurSampleRight=right;

 // Now we've got lCurSampleLeft and lCurSampleRight which form one stereo
 // sample and all is set for the next step...
 Output((SHORT)lCurSampleLeft,(SHORT)lCurSampleRight); // send the sample to output
}

HINIBBLE and LONIBBLE are higher and lower 4-bit nibbles:

#define HINIBBLE(byte) ((byte) >> 4)
#define LONIBBLE(byte) ((byte) & 0x0F)

Note that depending on your compiler you may need to use additional nibble separation in these defines, e.g. (((byte) >> 4) & 0x0F).

EATable is the table given in the next section of this document.

Output() is just a placeholder for any action you would like to perform for decompressed sample value.

Clip16BitSample is quite evident:

LONG Clip16BitSample(LONG sample)
{
 if (sample>32767)
    return 32767;
 else if (sample<-32768)
    return (-32768);
 else
    return sample;
}

As to mono sound, it's just analoguous -- you should process the blocks each being 0xF bytes long:

bInput=InputBuffer[0];
c1=EATable[HINIBBLE(bInput)];	// predictor coeffs
c2=EATable[HINIBBLE(bInput)+4];
d=LONIBBLE(bInput)+8;  // shift value

for (i=1;i<0xF;i++)
{
 left=HINIBBLE(InputBuffer[i]);  // HIGHER nibble for left channel
 left=(left<<0x1c)>>dleft;
 left=(left+lCurSampleLeft*c1left+lPrevSampleLeft*c2left+0x80)>>8;
 left=Clip16BitSample(left);
 lPrevSampleLeft=lCurSampleLeft;
 lCurSampleLeft=left;

 // Now we've got lCurSampleLeft which is one mono sample and all is set
 // for the next input nibble...
 Output((SHORT)lCurSampleLeft); // send the sample to output

 left=LONIBBLE(InputBuffer[i]);  // LOWER nibble for left channel
 left=(left<<0x1c)>>dleft;
 left=(left+lCurSampleLeft*c1left+lPrevSampleLeft*c2left+0x80)>>8;
 left=Clip16BitSample(left);
 lPrevSampleLeft=lCurSampleLeft;
 lCurSampleLeft=left;

 // Now we've got lCurSampleLeft which is one mono sample and all is set
 // for the next input byte...
 Output((SHORT)lCurSampleLeft); // send the sample to output
}

So, you should process HIGHER nibble of the input byte first and then LOWER nibble for mono sound.

Of course, this decompression routine may be greatly optimized.

EA ADPCM Table

LONG EATable[]=
{
 0x00000000,
 0x000000F0,
 0x000001CC,
 0x00000188,
 0x00000000,
 0x00000000,
 0xFFFFFF30,
 0xFFFFFF24,
 0x00000000,
 0x00000001,
 0x00000003,
 0x00000004,
 0x00000007,
 0x00000008,
 0x0000000A,
 0x0000000B,
 0x00000000,
 0xFFFFFFFF,
 0xFFFFFFFD,
 0xFFFFFFFC
};

PC Games Using Maxis XA

Simcity 3000