Nelly Moser

From MultimediaWiki

Speech codec found in older Flash Video files.

The Nellymoser Asao codec is a proprietary single-channel (mono) format optimized for low-bitrate transmission of audio. The format was developed by Nellymoser Inc.

Sound data is grouped into frames of 256 samples. Each frame is converted into the frequency domain and the most significant (highest-amplitude) frequencies are identified. A number of frequency bands are selected for encoding; the rest are discarded. The bitstream for each frame then encodes which frequency bands are in use and what their amplitudes are. The codec does not take the actual sample rate into account and has a fixed ratio between the number of input samples and the output packet size: 2 bits per input sample (256 samples per 64-byte packet).
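Because the ratio is fixed, the bitrate depends only on the capture sample rate. The C sketch below works this out from the 256-sample, 64-byte packet; the function names are hypothetical, and the 16-bit PCM input used for the compression factor is an assumption, not something the article states.

```c
#define NELLY_FRAME_SAMPLES 256 /* input samples per packet */
#define NELLY_PACKET_BYTES  64  /* compressed packet size   */

/* Bits spent per input sample: 64 * 8 / 256 = 2. */
static int nelly_bits_per_sample(void)
{
    return NELLY_PACKET_BYTES * 8 / NELLY_FRAME_SAMPLES;
}

/* Output bitrate in bit/s for a given capture sample rate in Hz. */
static long nelly_bitrate(long sample_rate)
{
    return sample_rate * nelly_bits_per_sample();
}

/* Compression factor against uncompressed PCM with pcm_bits per sample
   (16-bit PCM input is an assumption, not stated by the article). */
static int nelly_compression_factor(int pcm_bits)
{
    return pcm_bits / nelly_bits_per_sample();
}
```

For example, an 8000 Hz capture yields 16 kbit/s and a 44100 Hz capture yields 88.2 kbit/s; against 16-bit PCM the reduction is a factor of 8.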

Use in Flash technology

The Nellymoser Asao codec is owned by Nellymoser, which licensed it to Macromedia/Adobe for use in Flash. The codec has been an integral part of the Flash plugin since Flash version 6 and is optimized for real-time, low-latency encoding of audio. When recording audio from a user's microphone, Adobe Flash Player clients use the Nellymoser Asao codec and did not allow Flash programmers to select any other codec until Flash Player 10 added Speex. The Flash programmer can set the sampling rate of the audio capture, which raises or lowers the encoding bitrate and quality. Encoding is done on the client host, and the compressed data is then sent using Adobe's RTMP protocol to an RTMP server (Flash Media Server, Red5, Wowza).

Encoding description

The Nellymoser encoder can be described in four steps:

   1. Transformation:
      the original 256 audio samples are transformed into the frequency
      domain with an MDCT.
   2. Masking:
      masking is applied in the frequency domain to reduce the number
      of significant coefficients.
   3. Quantisation:
      a number of the most significant frequency coefficients are quantized.
   4. Compression:
      the coefficients are (A)DPCM-encoded to reduce redundancy and to
      take advantage of low entropy. The binary data, represented as a
      stream or as vectors (refer to MPEG-1 encoding), usually contains
      many consecutive zero bits as a result of frequency-domain masking
      and coefficient quantization.

Optimal encoding quality is reached when the masking parameters (probably constant) and the quantization parameters (likely dynamic) are tuned so that the resulting compressed binary stream is smaller than the sampled input data by a factor of 8.

Nellymoser ASAO stream data format: the final compressed ASAO packet is always 64 bytes long. FLV audio tags may contain 1, 2 or 4 ASAO packets. Typically there are 20 to 40 audio tags per second. The FLV audio tag header is 13 bytes long.
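The tag sizes follow directly from the 64-byte packet and the 256 samples it carries. This C sketch (hypothetical function names) computes the ASAO payload of a tag and the audio duration it covers; the tag header is deliberately left out of the byte count.

```c
#define NELLY_FRAME_SAMPLES 256
#define NELLY_PACKET_BYTES  64

/* Bytes of ASAO data in one FLV audio tag holding `packets` packets
   (1, 2 or 4); the tag header is not counted. */
static int nelly_tag_payload_bytes(int packets)
{
    return packets * NELLY_PACKET_BYTES;
}

/* Audio duration carried by one tag, in microseconds (rounded down). */
static long nelly_tag_duration_us(int packets, long sample_rate)
{
    return (long)packets * NELLY_FRAME_SAMPLES * 1000000L / sample_rate;
}
```

At 8 kHz with one packet per tag, each tag carries 32 ms of audio, i.e. 31.25 tags per second, which falls inside the 20 to 40 range quoted above.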


Bitstream format

A Nellymoser packet is split into three parts: a header followed by two payload blocks. Both payload blocks share the parameters from the header.

Header (116 bits) | Payload (198 bits) | Payload (198 bits)
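The header is a 6-bit start index plus 22 5-bit delta indexes (6 + 22 * 5 = 116 bits), and each payload block is 198 bits, so the three parts fill exactly one 64-byte packet. This sketch records that budget as a compile-time check and gives the bit offset of each payload block; the macro and function names are hypothetical.

```c
#define NELLY_BANDS        23
#define NELLY_HEADER_BITS  (6 + (NELLY_BANDS - 1) * 5)  /* 116 */
#define NELLY_DETAIL_BITS  198                          /* one payload block */
#define NELLY_BLOCKS       2
#define NELLY_PACKET_BITS  (64 * 8)                     /* 512 */

_Static_assert(NELLY_HEADER_BITS + NELLY_BLOCKS * NELLY_DETAIL_BITS == NELLY_PACKET_BITS,
               "header + two payload blocks fill exactly one 64-byte packet");

/* Bit offset at which payload block `blk` (0 or 1) starts in the packet. */
static int nelly_payload_offset(int blk)
{
    return NELLY_HEADER_BITS + blk * NELLY_DETAIL_BITS;
}
```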


Each payload contains one MDCT-coded frame. A frame holds 128 MDCT coefficients grouped into 23 bands; only the first 124 are coded, and the last 4 are always set to 0. The following table gives the number of coefficients in each band.


const uint8_t ff_nelly_band_sizes_table[NELLY_BANDS] = {
   2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12, 14, 15
};
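The band sizes add up to 124, the number of coded coefficients. A small C sketch (hypothetical helper names) derives each band's first coefficient index and copies a per-band value onto every coefficient of its band, which is how a per-band header value can be indexed per coefficient in the decoding loop.

```c
#include <stdint.h>

#define NELLY_BANDS     23
#define NELLY_FILL_LEN  124   /* coded coefficients per frame */

static const uint8_t band_sizes[NELLY_BANDS] = {
    2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12, 14, 15
};

/* Store the index of each band's first coefficient in start[];
   return the total number of coefficients covered by all bands. */
static int nelly_band_starts(int start[NELLY_BANDS])
{
    int pos = 0;
    for (int b = 0; b < NELLY_BANDS; b++) {
        start[b] = pos;
        pos += band_sizes[b];
    }
    return pos;
}

/* Repeat band_val[b] for every coefficient that belongs to band b. */
static void nelly_expand_bands(const int band_val[NELLY_BANDS], int coeff_val[NELLY_FILL_LEN])
{
    int pos = 0;
    for (int b = 0; b < NELLY_BANDS; b++)
        for (int j = 0; j < band_sizes[b]; j++)
            coeff_val[pos++] = band_val[b];
}
```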

Header

The header contains the quantizer values for the 23 bands. They are DPCM-coded: the first 6 bits are an index into a table of start values.

const uint16_t ff_nelly_init_table[64] = {
  3134, 5342, 6870, 7792, 8569, 9185, 9744, 10191, 10631, 11061, 11434, 11770,
  12116, 12513, 12925, 13300, 13674, 14027, 14352, 14716, 15117, 15477, 15824,
  16157, 16513, 16804, 17090, 17401, 17679, 17948, 18238, 18520, 18764, 19078,
  19381, 19640, 19921, 20205, 20500, 20813, 21162, 21465, 21794, 22137, 22453,
  22756, 23067, 23350, 23636, 23926, 24227, 24521, 24819, 25107, 25414, 25730,
  26120, 26497, 26895, 27344, 27877, 28463, 29426, 31355
};

This is followed by 22 5-bit indexes into a delta table; each delta is added to the previous band's value to give the value of the next band.

const int16_t ff_nelly_delta_table[32] = {
   -11725, -9420, -7910, -6801, -5948, -5233, -4599, -4039, -3507, -3030, -2596,
   -2170, -1774, -1383, -1016, -660, -329, -1, 337, 696, 1085, 1512, 1962, 2433,
   2968, 3569, 4314, 5279, 6622, 8154, 10076, 12975
};
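Putting the two tables together, the header can be decoded with a running sum. The C sketch below is a hypothetical illustration, not a reference decoder; in particular, the LSB-first bit order inside each byte is an assumption, since the article does not state the bit order.

```c
#include <stdint.h>

#define NELLY_BANDS 23

static const uint16_t init_table[64] = {
    3134, 5342, 6870, 7792, 8569, 9185, 9744, 10191, 10631, 11061, 11434, 11770,
    12116, 12513, 12925, 13300, 13674, 14027, 14352, 14716, 15117, 15477, 15824,
    16157, 16513, 16804, 17090, 17401, 17679, 17948, 18238, 18520, 18764, 19078,
    19381, 19640, 19921, 20205, 20500, 20813, 21162, 21465, 21794, 22137, 22453,
    22756, 23067, 23350, 23636, 23926, 24227, 24521, 24819, 25107, 25414, 25730,
    26120, 26497, 26895, 27344, 27877, 28463, 29426, 31355
};

static const int16_t delta_table[32] = {
    -11725, -9420, -7910, -6801, -5948, -5233, -4599, -4039, -3507, -3030, -2596,
    -2170, -1774, -1383, -1016, -660, -329, -1, 337, 696, 1085, 1512, 1962, 2433,
    2968, 3569, 4314, 5279, 6622, 8154, 10076, 12975
};

/* Minimal bit reader. ASSUMPTION: bits are consumed LSB-first within each
   byte, and the first bit read becomes the least significant bit. */
typedef struct { const uint8_t *buf; int pos; } BitReader;

static unsigned read_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++, br->pos++)
        v |= (unsigned)((br->buf[br->pos >> 3] >> (br->pos & 7)) & 1) << i;
    return v;
}

/* DPCM header decode: band 0 = init_table[6-bit index],
   band b = band b-1 + delta_table[5-bit index]. */
static void nelly_decode_header(const uint8_t *packet, int band_scale[NELLY_BANDS])
{
    BitReader br = { packet, 0 };
    int val = init_table[read_bits(&br, 6)];
    band_scale[0] = val;
    for (int b = 1; b < NELLY_BANDS; b++) {
        val += delta_table[read_bits(&br, 5)];
        band_scale[b] = val;
    }
}
```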

Payload block

Each payload block is 198 bits long.

Decoding loop:

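 #include <math.h>    /* powf, M_SQRT1_2 (POSIX) */
 #include <stdlib.h>  /* rand */

 for (int i = 0; i < 124; i++) {
   float v;
   if (bits[i] <= 0) {
     /* no bits allocated: substitute 1/sqrt(2) with a random sign */
     v = (rand() & 1) ? M_SQRT1_2 : -M_SQRT1_2;
   } else {
     /* read a bits[i]-bit index and look it up in the table for that width */
     v = dequan_table[bits[i]][get_bits(bits[i])];
   }
   /* band_scale[i]: header value of the band containing coefficient i;
      dividing by 2048.0f avoids integer division */
   coeffs[i] = v * -powf(2.0f, band_scale[i] / 2048.0f);
 }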

Bit allocation

The bit allocation algorithm determines how many bits (0 to 6) each coefficient receives.

The bit length of each coefficient is calculated by the formula:

 bits[i] = (((sbuf[i] - offset) >> shift) + 1) >> 1;
 bits[i] = clip(bits[i], 0, 6);

sbuf is derived from band_scale. offset and shift are obtained by adjusting initial values until the total allocation is close to, but does not exceed, the size of one payload block, i.e. 198 bits.
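Since raising offset can only lower each coefficient's bit count, the total is monotonic in offset, so one way to pick offset for a fixed shift is a binary search for the smallest offset whose total fits in 198 bits. The C sketch below shows that idea; the binary search is our substitution, not necessarily how the codec adjusts its values, and all names are hypothetical. The caller must pass an upper bound `hi` that already fits the budget (for example the largest sbuf value, which gives 0 bits everywhere).

```c
#define NELLY_FILL_LEN     124
#define NELLY_DETAIL_BITS  198

/* Bits for one coefficient per the formula above, clamped to 0..6.
   Negative differences map to 0 directly, avoiding a right shift of a
   negative value (implementation-defined in C); the result is the same. */
static int nelly_coeff_bits(int s, int offset, int shift)
{
    int d = s - offset;
    if (d < 0)
        return 0;
    int b = ((d >> shift) + 1) >> 1;
    return b > 6 ? 6 : b;
}

static int nelly_total_bits(const int *sbuf, int offset, int shift)
{
    int total = 0;
    for (int i = 0; i < NELLY_FILL_LEN; i++)
        total += nelly_coeff_bits(sbuf[i], offset, shift);
    return total;
}

/* Smallest offset in [lo, hi] whose allocation fits in one payload block. */
static int nelly_find_offset(const int *sbuf, int shift, int lo, int hi)
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (nelly_total_bits(sbuf, mid, shift) <= NELLY_DETAIL_BITS)
            hi = mid;          /* fits: try a smaller offset (more bits) */
        else
            lo = mid + 1;      /* too many bits: raise the offset */
    }
    return lo;
}
```

With a flat sbuf of 1000 and shift 4, the search settles on offset 953, where every coefficient gets 1 bit (124 bits in total); one step lower, each gets 2 bits and the 248-bit total no longer fits.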

Dequantization tables

 for bits = 1:
   -0.8472560048, 0.7224709988,
 for bits = 2:
   -1.5247479677,-0.4531480074, 0.3753609955, 1.4717899561,
 for bits = 3:
   -1.9822579622,-1.1929379702,-0.5829370022,-0.0693780035, 0.3909569979, 0.9069200158, 1.4862740040, 2.2215409279,
 for bits = 4:
 -2.3887870312,-1.8067539930,-1.4105420113,-1.0773609877,-0.7995010018,-0.5558109879,-0.3334020078,-0.1324490011,
  0.0568020009, 0.2548770010, 0.4773550034, 0.7386850119, 1.0443060398, 1.3954459429, 1.8098750114, 2.3918759823,
 for bits = 5:
 -2.3893830776,-1.9884680510,-1.7514040470,-1.5643119812,-1.3922129869,-1.2164649963,-1.0469499826,-0.8905100226,
 -0.7645580173,-0.6454579830,-0.5259280205,-0.4059549868,-0.3029719889,-0.2096900046,-0.1239869967,-0.0479229987,
  0.0257730000, 0.1001340002, 0.1737180054, 0.2585540116, 0.3522900045, 0.4569880068, 0.5767750144, 0.7003160119,
  0.8425520062, 1.0093879700, 1.1821349859, 1.3534560204, 1.5320819616, 1.7332619429, 1.9722349644, 2.3978140354,
 for bits = 6:
 -2.5756309032,-2.0573320389,-1.8984919786,-1.7727810144,-1.6662600040,-1.5742180347,-1.4993319511,-1.4316639900,
 -1.3652280569,-1.3000990152,-1.2280930281,-1.1588579416,-1.0921250582,-1.0135740042,-0.9202849865,-0.8287050128,
 -0.7374889851,-0.6447759867,-0.5590940118,-0.4857139885,-0.4110319912,-0.3459700048,-0.2851159871,-0.2341620028,
 -0.1870580018,-0.1442500055,-0.1107169986,-0.0739680007,-0.0365610011,-0.0073290002, 0.0203610007, 0.0479039997,
  0.0751969963, 0.0980999991, 0.1220389977, 0.1458999962, 0.1694349945, 0.1970459968, 0.2252430022, 0.2556869984,
  0.2870100141, 0.3197099864, 0.3525829911, 0.3889069855, 0.4334920049, 0.4769459963, 0.5204820037, 0.5644530058,
  0.6122040153, 0.6685929894, 0.7341650128, 0.8032159805, 0.8784040213, 0.9566209912, 1.0397069454, 1.1293770075,
  1.2211159468, 1.3080279827, 1.4024800062, 1.5056819916, 1.6227730513, 1.7724959850, 1.9430880547, 2.2903931141
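The six tables hold 2 + 4 + 8 + 16 + 32 + 64 = 126 values, so they can be stored back to back in one array and indexed by bit width plus quantized index. The layout in this C sketch is a hypothetical choice made for illustration, not something the article specifies; only the first six values are reproduced.

```c
/* ASSUMPTION: a hypothetical flat layout storing the six tables back to
   back (bits = 1 first, then 2, ..., 6), 126 entries in total. */
static int nelly_dequant_offset(int bits)
{
    return (1 << bits) - 2;   /* 2 + 4 + ... + 2^(bits-1) entries precede it */
}

/* Dequantized value for a bits-wide quantized index. */
static float nelly_dequant(const float *flat, int bits, unsigned index)
{
    return flat[nelly_dequant_offset(bits) + index];
}

/* First 6 entries of such a flat table (bits = 1 and bits = 2). */
static const float nelly_dq_head[6] = {
    -0.8472560048f,  0.7224709988f,
    -1.5247479677f, -0.4531480074f, 0.3753609955f, 1.4717899561f,
};
```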