Nelly Moser
- Company: http://nellymoser.com/
- Samples: http://samples.mplayerhq.hu/A-codecs/Nelly_Moser/
- Decoder: http://code.google.com/p/nelly2pcm/
- Commercial lib with encoder: http://nellymoser.narod.ru/
Speech codec found in older Flash Video files.
The Nellymoser Asao codec is a proprietary single-channel (mono) format optimized for low-bitrate transmission of audio. The format was developed by Nellymoser Inc.
Sound data is grouped into frames of 256 samples. Each frame is converted into the frequency domain and the most significant (highest-amplitude) frequencies are identified. A number of frequency bands are selected for encoding; the rest are discarded. The bitstream for each frame then encodes which frequency bands are in use and what their amplitudes are. The codec does not take the actual sample rate into account: the ratio between the number of input samples and the output packet size is fixed at 2 bits per input sample (256 samples per 64-byte packet).
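Because the ratio is fixed, the bitrate is a simple function of the capture sample rate. A minimal C sketch that uses only the 256-samples-per-64-byte-packet figures from this page (the function and macro names are hypothetical):

```c
/* Nellymoser packs 256 input samples into one 64-byte packet:
   512 bits / 256 samples = 2 bits per sample, whatever the sample rate. */
#define NELLY_SAMPLES_PER_PACKET 256
#define NELLY_PACKET_BYTES 64

/* Encoded bitrate in bits per second for a mono stream at sample_rate Hz. */
static long nelly_bitrate(long sample_rate)
{
    return sample_rate * NELLY_PACKET_BYTES * 8 / NELLY_SAMPLES_PER_PACKET;
}
```

For example, `nelly_bitrate(8000)` gives 16000 (16 kbit/s) and `nelly_bitrate(44100)` gives 88200 (88.2 kbit/s).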
Use in Flash technology
The Nellymoser Asao codec is owned by Nellymoser, which licensed it to Macromedia/Adobe for use in Flash technology. The codec has been an integral part of the Flash plugin since Flash version 6 and is optimized for real-time, low-latency encoding of audio. When recording audio from a user's microphone, Adobe Flash Player clients use the Nellymoser Asao codec and do not allow Flash programmers to select any other codec. The Flash programmer can, however, control the sampling rate of the capture, which raises or lowers the encoding bitrate and quality. Encoding is done on the client host, and the compressed data is then sent over Adobe's RTMP protocol to an RTMP server (Flash Media Server, Red5, Wowza).
References
Forum entry with some information about the codec: http://www.actionscript.org/forums/showthread.php3?t=20430
Blog that lists some information: http://www.progettosinergia.com/flashvideo/flashvideoblog.htm
Encoding description
The voice encoding performed by the Nellymoser encoder can be described in four steps:
1. Transformation: the 256 input audio samples are transformed into the frequency domain with an MDCT.
2. Masking: masking is applied in the frequency domain to reduce the number of significant coefficients.
3. Quantization: a number of the most significant frequency coefficients are quantized.
4. Compression: the coefficients are (A)DPCM encoded to reduce redundancy and take advantage of low entropy.
The resulting binary data, represented as a stream or as vectors (compare MPEG-1 encoding), usually contains many consecutive zero bits as a result of masking in the frequency domain and of coefficient quantization.
The best encoding quality is reached when the masking parameters (probably constant) and the quantization parameters (likely dynamic) are tuned so that the compressed bitstream is one eighth the size of the sampled input, i.e. 2 bits per sample for 16-bit PCM input.
Nellymoser ASAO stream data format: the final compressed ASAO packet is always 64 bytes long. An FLV audio tag may contain 1, 2 or 4 ASAO packets. Typically there are 20-40 audio tags per second. The FLV audio tag header is 13 bytes long.
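The packet figures above determine how much audio a single FLV audio tag carries. A small C sketch; the sample rate is a free parameter and the helper names are hypothetical:

```c
/* Each ASAO packet is 64 bytes and encodes 256 samples. */
#define ASAO_PACKET_BYTES 64
#define ASAO_PACKET_SAMPLES 256

/* Nellymoser payload bytes in a tag holding `packets` packets (1, 2 or 4). */
static int tag_payload_bytes(int packets)
{
    return packets * ASAO_PACKET_BYTES;
}

/* Audio duration carried by such a tag, in milliseconds. */
static double tag_duration_ms(int packets, int sample_rate)
{
    return 1000.0 * packets * ASAO_PACKET_SAMPLES / sample_rate;
}

/* Tags per second needed to carry a continuous stream. */
static double tags_per_second(int packets, int sample_rate)
{
    return (double)sample_rate / (packets * ASAO_PACKET_SAMPLES);
}
```

For example, one packet per tag at 8 kHz gives 32 ms of audio per tag, or 31.25 tags per second, which falls within the 20-40 range mentioned above.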
Bitstream format
A Nellymoser packet can be split into three parts: a header followed by two payload blocks. Both payload blocks share the parameters from the header. Together they fill the 64-byte packet exactly (116 + 2 × 198 = 512 bits).
| Header (116 bits) | Payload (198 bits) | Payload (198 bits) |
|---|---|---|
Each payload contains one MDCT-coded frame of 128 coefficients. The frame is divided into 23 bands in the MDCT domain that together cover the first 124 coefficients; the last 4 coefficients are always set to 0. The following table gives the size of each band:
const uint8_t ff_nelly_band_sizes_table[NELLY_BANDS] = { 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12, 14, 15 };
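The band sizes sum to 124, so the last 4 of the 128 coefficients are never coded, and a 116-bit header (6 + 22 × 5 bits) plus two 198-bit payload blocks exactly fill a 64-byte packet. A C sketch that checks both facts and computes the first coefficient of each band; the `NELLY_FILL_LEN`/`NELLY_BUF_LEN` constants and `nelly_band_starts` are illustrative names:

```c
#include <stdint.h>

#define NELLY_BANDS 23
#define NELLY_FILL_LEN 124 /* coded coefficients: the sum of the band sizes */
#define NELLY_BUF_LEN 128  /* coefficients per MDCT frame */

/* Header (6-bit start index + 22 5-bit deltas) and two payload blocks. */
_Static_assert(6 + 22 * 5 + 2 * 198 == 64 * 8, "header + 2 payloads fill 64 bytes");

static const uint8_t ff_nelly_band_sizes_table[NELLY_BANDS] = {
    2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12, 14, 15
};

/* Store the index of the first coefficient of each band in start[] and
   return the total number of coded coefficients. */
static int nelly_band_starts(int start[NELLY_BANDS])
{
    int pos = 0;
    for (int b = 0; b < NELLY_BANDS; b++) {
        start[b] = pos;
        pos += ff_nelly_band_sizes_table[b];
    }
    return pos;
}
```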
Header
The header contains the quantizer values for the 23 bands. They are DPCM coded. The first 6 bits are an index into a table of start values, which gives the value of the first band:
const uint16_t ff_nelly_init_table[64] = { 3134, 5342, 6870, 7792, 8569, 9185, 9744, 10191, 10631, 11061, 11434, 11770, 12116, 12513, 12925, 13300, 13674, 14027, 14352, 14716, 15117, 15477, 15824, 16157, 16513, 16804, 17090, 17401, 17679, 17948, 18238, 18520, 18764, 19078, 19381, 19640, 19921, 20205, 20500, 20813, 21162, 21465, 21794, 22137, 22453, 22756, 23067, 23350, 23636, 23926, 24227, 24521, 24819, 25107, 25414, 25730, 26120, 26497, 26895, 27344, 27877, 28463, 29426, 31355 };
After that come 22 5-bit indexes into a delta table; each delta is added to the previous band's value to give the value of the next band.
const int16_t ff_nelly_delta_table[32] = { -11725, -9420, -7910, -6801, -5948, -5233, -4599, -4039, -3507, -3030, -2596, -2170, -1774, -1383, -1016, -660, -329, -1, 337, 696, 1085, 1512, 1962, 2433, 2968, 3569, 4314, 5279, 6622, 8154, 10076, 12975 };
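With both tables, the header can be decoded as below. This is a sketch: the bit order is an ASSUMPTION (least-significant bit first within each byte, which is understood to be the order FFmpeg's decoder uses), and `BitReader`, `read_bits` and `nelly_decode_header` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define NELLY_BANDS 23

static const uint16_t ff_nelly_init_table[64] = {
    3134, 5342, 6870, 7792, 8569, 9185, 9744, 10191, 10631, 11061, 11434, 11770,
    12116, 12513, 12925, 13300, 13674, 14027, 14352, 14716, 15117, 15477, 15824,
    16157, 16513, 16804, 17090, 17401, 17679, 17948, 18238, 18520, 18764, 19078,
    19381, 19640, 19921, 20205, 20500, 20813, 21162, 21465, 21794, 22137, 22453,
    22756, 23067, 23350, 23636, 23926, 24227, 24521, 24819, 25107, 25414, 25730,
    26120, 26497, 26895, 27344, 27877, 28463, 29426, 31355
};

static const int16_t ff_nelly_delta_table[32] = {
    -11725, -9420, -7910, -6801, -5948, -5233, -4599, -4039, -3507, -3030, -2596,
    -2170, -1774, -1383, -1016, -660, -329, -1, 337, 696, 1085, 1512, 1962, 2433,
    2968, 3569, 4314, 5279, 6622, 8154, 10076, 12975
};

/* Hypothetical bit reader. ASSUMPTION: LSB-first within each byte. */
typedef struct { const uint8_t *buf; size_t pos; } BitReader;

static unsigned read_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++, br->pos++)
        v |= (unsigned)((br->buf[br->pos >> 3] >> (br->pos & 7)) & 1) << i;
    return v;
}

/* Decode the 116-bit header into one quantizer value per band. */
static void nelly_decode_header(const uint8_t packet[64], int band_val[NELLY_BANDS])
{
    BitReader br = { packet, 0 };
    int val = ff_nelly_init_table[read_bits(&br, 6)];   /* band 0: absolute */
    band_val[0] = val;
    for (int b = 1; b < NELLY_BANDS; b++) {
        val += ff_nelly_delta_table[read_bits(&br, 5)]; /* bands 1..22: delta */
        band_val[b] = val;
    }
}
```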
Payload block
Each payload block is 198 bits long.
Decoding loop:
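for (i = 0; i < 124; i++) {
    if (bits[i] <= 0)   /* no bits allocated: 1/sqrt(2) with a random sign */
        v = (rand() & 1) ? -M_SQRT1_2 : M_SQRT1_2;
    else                /* read a bits[i]-wide index into that width's table */
        v = dequan_table[bits[i]][get_bits(bits[i])];
    coeffs[i] = v * -pow(2, band_scale[i] / 2048.0);
}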
Bit allocation
The bit allocation algorithm determines how many bits (0 to 6) are used for each coefficient.
The bit length of each coefficient is calculated with the formula:
bits[i] = (((sbuf[i] - offset) >> shift) + 1) >> 1; bits[i] = clip(bits[i], 0, 6);
sbuf is derived from band_scale; offset and shift are calculated by adjusting initial values until the total bit allocation is close to, but does not exceed, the size of one payload block, i.e. 198 bits.
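Because raising offset never increases the total number of bits, one way to find a fitting offset is a binary search with shift held fixed. This is a swapped-in technique, not necessarily the adjustment procedure the codec itself uses, and the function names are hypothetical.

```c
#define NELLY_FILL_LEN 124    /* coded coefficients per frame */
#define NELLY_DETAIL_BITS 198 /* payload block size in bits */
#define NELLY_BIT_CAP 6       /* maximum bits per coefficient */

/* Apply the formula above to every coefficient; return the total bit count. */
static int nelly_alloc(const int sbuf[NELLY_FILL_LEN], int offset, int shift,
                       int bits[NELLY_FILL_LEN])
{
    int total = 0;
    for (int i = 0; i < NELLY_FILL_LEN; i++) {
        int b = sbuf[i] - offset;
        /* A negative difference always clips to 0; handling it explicitly
           avoids right-shifting a negative int. */
        b = (b < 0) ? 0 : ((b >> shift) + 1) >> 1;
        if (b > NELLY_BIT_CAP)
            b = NELLY_BIT_CAP;
        bits[i] = b;
        total += b;
    }
    return total;
}

/* Smallest offset in [lo, hi] whose allocation fits in one payload block.
   hi = max(sbuf) always fits, since every coefficient then gets 0 bits. */
static int nelly_fit_offset(const int sbuf[NELLY_FILL_LEN], int shift, int lo, int hi,
                            int bits[NELLY_FILL_LEN])
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (nelly_alloc(sbuf, mid, shift, bits) <= NELLY_DETAIL_BITS)
            hi = mid;     /* fits: try spending more bits */
        else
            lo = mid + 1; /* too many bits: raise the offset */
    }
    nelly_alloc(sbuf, lo, shift, bits); /* leave bits[] matching the result */
    return lo;
}
```

With every sbuf value equal to 100 and shift = 0, the search returns offset 98, giving each of the 124 coefficients 1 bit (124 bits in total); offset 97 would give 2 bits each, 248 bits, which exceeds 198.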
Dequantization tables
for bits = 1: -0.8472560048, 0.7224709988,
for bits = 2: -1.5247479677,-0.4531480074, 0.3753609955, 1.4717899561,
for bits = 3: -1.9822579622,-1.1929379702,-0.5829370022,-0.0693780035, 0.3909569979, 0.9069200158, 1.4862740040, 2.2215409279,
for bits = 4: -2.3887870312,-1.8067539930,-1.4105420113,-1.0773609877,-0.7995010018,-0.5558109879,-0.3334020078,-0.1324490011, 0.0568020009, 0.2548770010, 0.4773550034, 0.7386850119, 1.0443060398, 1.3954459429, 1.8098750114, 2.3918759823,
for bits = 5: -2.3893830776,-1.9884680510,-1.7514040470,-1.5643119812,-1.3922129869,-1.2164649963,-1.0469499826,-0.8905100226, -0.7645580173,-0.6454579830,-0.5259280205,-0.4059549868,-0.3029719889,-0.2096900046,-0.1239869967,-0.0479229987, 0.0257730000, 0.1001340002, 0.1737180054, 0.2585540116, 0.3522900045, 0.4569880068, 0.5767750144, 0.7003160119, 0.8425520062, 1.0093879700, 1.1821349859, 1.3534560204, 1.5320819616, 1.7332619429, 1.9722349644, 2.3978140354,
for bits = 6: -2.5756309032,-2.0573320389,-1.8984919786,-1.7727810144,-1.6662600040,-1.5742180347,-1.4993319511,-1.4316639900, -1.3652280569,-1.3000990152,-1.2280930281,-1.1588579416,-1.0921250582,-1.0135740042,-0.9202849865,-0.8287050128, -0.7374889851,-0.6447759867,-0.5590940118,-0.4857139885,-0.4110319912,-0.3459700048,-0.2851159871,-0.2341620028, -0.1870580018,-0.1442500055,-0.1107169986,-0.0739680007,-0.0365610011,-0.0073290002, 0.0203610007, 0.0479039997, 0.0751969963, 0.0980999991, 0.1220389977, 0.1458999962, 0.1694349945, 0.1970459968, 0.2252430022, 0.2556869984, 0.2870100141, 0.3197099864, 0.3525829911, 0.3889069855, 0.4334920049, 0.4769459963, 0.5204820037, 0.5644530058, 0.6122040153, 0.6685929894, 0.7341650128, 0.8032159805, 0.8784040213, 0.9566209912, 1.0397069454, 1.1293770075, 1.2211159468, 1.3080279827, 1.4024800062, 1.5056819916, 1.6227730513, 1.7724959850, 1.9430880547, 2.2903931141
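The per-width tables can be stored back to back in one flat array, with the table for b bits starting at index 2^b - 1. A sketch showing only the 1- to 3-bit tables from the listing above; appending the 4-, 5- and 6-bit tables the same way gives 127 entries. The array and function names are hypothetical.

```c
/* Index 0 is an unused placeholder so that the b-bit table starts at 2^b - 1. */
static const float nelly_dequant_flat[15] = {
    0.0f,                                              /* unused (bits = 0)   */
    -0.8472560048f, 0.7224709988f,                     /* bits = 1: [1, 2]    */
    -1.5247479677f, -0.4531480074f, 0.3753609955f, 1.4717899561f, /* bits = 2: [3, 6] */
    -1.9822579622f, -1.1929379702f, -0.5829370022f, -0.0693780035f,
     0.3909569979f,  0.9069200158f,  1.4862740040f,  2.2215409279f, /* bits = 3: [7, 14] */
};

/* Dequantized value of a `bits`-wide index v, 0 <= v < 2^bits (bits = 1..3 here). */
static float nelly_dequant(int bits, unsigned v)
{
    return nelly_dequant_flat[(1u << bits) - 1 + v];
}
```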