Difference between revisions of "AMR-NB-WIP"

Revision as of 07:32, 8 September 2007

AMR narrow band decoder

This text aims to be a simpler and more explicit document of the AMR narrow band decoding processes to aid in development of a decoder. Reference to sections of the specification will be made in the following format: (c.f. §5.2.5). Happy reading.

Nomenclature weirdness

The specification refers to the same (or very similar) items under several different names, which is fairly confusing. The variants are listed below to aid understanding of the following text. This text tries to use one name consistently, or to give both with the lesser-used name in parentheses.

  • Pitch / Adaptive codebook
  • Fixed / Innovative (also algebraic when referring to the codebook)
  • Quantified / Quantised (the specification appears to use 'quantified' to mean 'quantised')


Summary

  • Mode dependent bitstream parsing
  • Indices parsed from bitstream
  • Indices decoded to give LSF vectors, fractional pitch lags, innovative code vectors and the pitch and innovative gains
  • LSF vectors converted to LP filter coefficients at each subframe
  • Subframe decoding
    • Excitation vector = adaptive code vector * adaptive (pitch) gain + innovative code vector * innovative gain
    • Excitation vector filtered through an LP synthesis filter to reconstruct speech
    • Speech signal filtered with adaptive postfilter


Bitstream parsing

Documented on http://wiki.multimedia.cx/index.php?title=AMR-NB and in 26.101. For an implementation, see http://svn.mplayerhq.hu/soc/amr/amrnbdec.c?view=markup


Decoding of LP filter parameters

The received indices of LSP quantization are used to reconstruct the quantified LSP vectors. (c.f. §5.2.5)

12.2kbps mode summary

  • indices into code books are parsed from the bit stream
  • indices give elements of split matrix quantised (SMQ) residual LSF vectors from the relevant code books
  • prediction from the previous frame is added to obtain the mean-removed LSF vectors
  • the mean is added
  • the LSF vectors are converted to cosine domain LSP vectors


Indices give elements of split matrix quantised (SMQ) residual LSF vectors from the relevant code books

The elements of the SMQ vectors are stored at an index into a code book that varies according to the mode. There are 5 code books for the 12.2kbps mode, corresponding to the 5 indices. These tables will be referred to as:

lsf_m_n

m 
the number of indices parsed according to the mode
n 
the index 'position' i.e. 1 for the first index, etc

The 5 indices are stored using 7, 8, 8 + sign bit, 8 and 6 bits respectively. The four elements of the 'split quantized sub-matrix' stored at each index position in the appropriate code book are:

1st index in 1st code book 
r1_1, r1_2, r2_1, r2_2
2nd index in 2nd code book 
r1_3, r1_4, r2_3, r2_4
3rd index in 3rd code book 
r1_5, r1_6, r2_5, r2_6
4th index in 4th code book 
r1_7, r1_8, r2_7, r2_8
5th index in 5th code book 
r1_9, r1_10, r2_9, r2_10

With rj_i :

j 
the first or second residual LSF vector ( j = 1, 2 )
i 
the coefficient of a residual LSF vector ( i = 1, ..., 10 )
rj_i 
residual line spectral frequencies (LSFs) in Hz
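
The layout above can be sketched in C as follows. This is only an illustration: the type SmqEntry, the function assemble_residuals and the way the code books are passed in are all hypothetical names, and the handling of the sign bit on the 3rd index is left out.

```c
/* One code book entry: the four elements of a split quantized sub-matrix,
 * in the order r1_a, r1_b, r2_a, r2_b for a coefficient pair (a, b). */
typedef struct {
    float v[4];
} SmqEntry;

/* books[n] is the (hypothetical) code book for index position n + 1,
 * idx[n] is the index parsed from the bitstream for that position.
 * The sign bit that accompanies the 3rd index is NOT handled here. */
static void assemble_residuals(const SmqEntry *const books[5],
                               const int idx[5],
                               float r1[10], float r2[10])
{
    for (int n = 0; n < 5; n++) {
        const float *e = books[n][idx[n]].v;
        r1[2 * n]     = e[0];   /* r1_(2n+1) */
        r1[2 * n + 1] = e[1];   /* r1_(2n+2) */
        r2[2 * n]     = e[2];   /* r2_(2n+1) */
        r2[2 * n + 1] = e[3];   /* r2_(2n+2) */
    }
}
```

Each code book lookup fills two adjacent coefficients of both residual vectors, so five lookups give all ten coefficients of r1 and r2.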


Prediction from the previous frame is added to obtain the mean-removed LSF vectors

zj(n) = rj(n) + 0.65*^r2(n-1)

zj(n) 
a mean-removed LSF vector from the current frame (denoted n)
^r2(n-1) 
the quantified 2nd residual vector of the last frame (denoted n-1)


The mean is added

fj = zj + lsf_mean_m

lsf_mean_m 
a table of the means of the LSF coefficients
m 
the number of indices parsed according to the mode
fj 
the LSF vectors


The LSF vectors are converted to cosine domain LSP vectors

qk_i = cos( fj_i * 2 * π / f_s )

qk_i 
line spectral pairs (LSPs) in the cosine domain
k 
the subframe; the two LSF vectors give the LSP vectors q2, q4 at the 2nd and 4th subframes, so k = 2*j
fj_i 
ith coefficient of the jth LSF vector; [0,4000] Hz
f_s 
sampling frequency in Hz (8kHz)


Other active modes summary

The process for the other modes is similar to that for the 12.2kbps mode.

  • indices into code books are parsed from the bit stream
  • indices give elements of a split vector quantised residual LSF vector from the relevant code books
  • prediction from the previous frame is added to obtain the mean-removed LSF vector
  • the mean is added
  • the LSF vector is converted to a cosine domain LSP vector


Indices give elements of a split vector quantised residual LSF vector from the relevant code books

The 3 indices are stored with the following numbers of bits:

Mode (kbps)   1st index (bits)   2nd index (bits)   3rd index (bits)
10.2          8                  9                  9
7.95          9                  9                  9
7.40          8                  9                  9
6.70          8                  9                  9
5.90          8                  9                  9
5.15          8                  8                  7
4.75          8                  8                  7

The elements of the split quantized sub-vectors (3, 3 and 4 elements) stored at the index position in the appropriate code book are:

1st index in 1st code book 
r_1, r_2, r_3
2nd index in 2nd code book 
r_4, r_5, r_6
3rd index in 3rd code book 
r_7, r_8, r_9, r_10
r_i 
residual LSF vector (Hz)
i 
the coefficient of the vector ( i = 1, ..., 10 )


Prediction from the previous frame is added to obtain the mean-removed LSF vector

z_i(n) = r_i(n) + pred_fac_i * ^r_i(n-1)

z_i(n) 
the mean-removed LSF vector from the current frame (denoted n)
pred_fac_i 
the prediction factor for the ith LSF coefficient
^r_i(n-1) 
the quantified residual vector of the last frame (denoted n-1)

As in the 12.2kbps mode, the mean is then added and the result is converted to the cosine domain. These processes give the LSP vector at the 4th subframe (q4).
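
A small sketch of the prediction step for these modes, where each coefficient has its own prediction factor. The name predict_lsf is invented here, and the pred_fac table is assumed to be supplied by the caller.

```c
#define LP_ORDER 10

/* r        : residual LSF vector of the current frame
 * r_prev   : quantified residual vector of the previous frame
 * pred_fac : per-coefficient prediction factors (caller-supplied table)
 * z        : output mean-removed LSF vector */
static void predict_lsf(const double r[LP_ORDER],
                        const double r_prev[LP_ORDER],
                        const double pred_fac[LP_ORDER],
                        double z[LP_ORDER])
{
    for (int i = 0; i < LP_ORDER; i++)
        z[i] = r[i] + pred_fac[i] * r_prev[i];
}
```

The only difference from the 12.2kbps case is that the single factor 0.65 becomes a table indexed by coefficient.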


The available LSP vector(s) are used to linearly interpolate vectors for the other subframes (c.f. §5.2.6)

12.2 kbps mode

q1(n) = 0.5*q4(n-1) + 0.5*q2(n)
q3(n) = 0.5*q2(n)   + 0.5*q4(n)

Other modes

q1(n) = 0.75*q4(n-1) + 0.25*q4(n)
q2(n) = 0.5 *q4(n-1) + 0.5 *q4(n)
q3(n) = 0.25*q4(n-1) + 0.75*q4(n)
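
For the modes other than 12.2 kbps, the three interpolation formulas follow one pattern: subframe k uses weight k/4 on the current q4 and 1 - k/4 on the previous one. A sketch, with a made-up function name interpolate_lsp:

```c
#define LP_ORDER 10

/* q4_prev : LSP vector of the 4th subframe of the previous frame, q4(n-1)
 * q4      : LSP vector of the 4th subframe of the current frame, q4(n)
 * q       : output, q[k-1] is the LSP vector for subframe k (k = 1..4) */
static void interpolate_lsp(const double q4_prev[LP_ORDER],
                            const double q4[LP_ORDER],
                            double q[4][LP_ORDER])
{
    for (int k = 0; k < 4; k++) {
        double w = 0.25 * (k + 1);   /* 0.25, 0.5, 0.75, 1.0 */
        for (int i = 0; i < LP_ORDER; i++)
            q[k][i] = (1.0 - w) * q4_prev[i] + w * q4[i];
    }
}
```

With w = 1.0 the last row is simply q4(n) itself, so the function returns all four subframe vectors in one array.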


The LSP vector is converted to LP filter coefficients (c.f. §5.2.4)

f1_-1 = 0; f1_0 = 1;

for i=1..5

 f1_i  = 2*f1(i-2) - 2 * q_2i-1 * f1(i-1)
 for j=i-1..1
   f1_j +=   f1(j-2) - 2 * q_2i-1 * f1(j-1)
 end

end

The same is done for f2_i, with q_2i instead of q_2i-1.

for i=1..5

 f'1_i = f1_i + f1_i-1
 f'2_i = f2_i - f2_i-1

end

for i=1..5

 a_i = 0.5*f'1_i    + 0.5*f'2_i

end

for i=6..10

 a_i = 0.5*f'1_11-i - 0.5*f'2_11-i

end

a_i 
the LP filter coefficients
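
The conversion can be written out in C roughly as below. This is a floating point sketch, not the reference fixed point code; the function names are invented, the arrays are 0-based (so q_1 is q[0] and a_1 is a[0]), and it assumes the initial values f(0) = 1 and f(-1) = 0.

```c
#define LP_ORDER   10
#define HALF_ORDER 5

/* Expand one of the two polynomials; f[0..5] holds f(0)..f(5).
 * first = 0 uses q_1, q_3, ..., q_9 (for f1),
 * first = 1 uses q_2, q_4, ..., q_10 (for f2). */
static void lsp_to_poly(const double q[LP_ORDER], int first,
                        double f[HALF_ORDER + 1])
{
    f[0] = 1.0;   /* f(0) = 1; f(-1) = 0 is handled by the checks below */
    for (int i = 1; i <= HALF_ORDER; i++) {
        double qi = q[2 * (i - 1) + first];
        f[i] = 2.0 * (i >= 2 ? f[i - 2] : 0.0) - 2.0 * qi * f[i - 1];
        /* run j downwards so that f[j-1] and f[j-2] are still the old values */
        for (int j = i - 1; j >= 1; j--)
            f[j] += (j >= 2 ? f[j - 2] : 0.0) - 2.0 * qi * f[j - 1];
    }
}

/* q : LSP vector in the cosine domain, a : output LP coefficients a_1..a_10 */
static void lsp_to_lpc(const double q[LP_ORDER], double a[LP_ORDER])
{
    double f1[HALF_ORDER + 1], f2[HALF_ORDER + 1];
    double f1p[HALF_ORDER + 1], f2p[HALF_ORDER + 1];

    lsp_to_poly(q, 0, f1);
    lsp_to_poly(q, 1, f2);

    for (int i = 1; i <= HALF_ORDER; i++) {
        f1p[i] = f1[i] + f1[i - 1];
        f2p[i] = f2[i] - f2[i - 1];
    }
    for (int i = 1; i <= HALF_ORDER; i++)
        a[i - 1] = 0.5 * f1p[i] + 0.5 * f2p[i];
    for (int i = HALF_ORDER + 1; i <= LP_ORDER; i++)
        a[i - 1] = 0.5 * f1p[11 - i] - 0.5 * f2p[11 - i];
}
```

A quick sanity check: with all ten LSPs set to 0, both polynomials become (1 + z^-2)^5, and the coefficients come out as 0, 5, 0, 10, 0, 10, 0, 5, 0, 1, which are the coefficients of that expansion after the leading 1.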


Decoding of the adaptive (pitch) codebook vector

  • indices parsed from bitstream
  • indices give integer and fractional parts of the pitch lag
  • adaptive codebook vector v(n) is found by interpolating the past excitation u(n) at the pitch lag using an FIR filter. (c.f. §5.6)

12.2kbps mode - 1/6 resolution pitch lag

Indices give integer and fractional parts of the pitch lag

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/6 in the range [17 3/6,94 3/6] and integers only in the range [95, 143].

The lower bound of the pitch lag is 17 3/6 and the fractional part is in 1/6 resolution, so the pitch index is given by:

pitch index = integer part*6 + fractional part (in sixths) - (17 3/6)*6

so theoretically...

if(pitch_index < (94 4/6 - 17 3/6)*6) {

 // fractional part is encoded
 // theoretically: pitch_lag_int = pitch_index/6 + 17 3/6;
 // but the reference source adds an extra 2/6, I assume for rounding:
 pitch_lag_int  = (pitch_index + 5)/6 + 17;
 pitch_lag_frac = pitch_index - pitch_lag_int*6 + (17 3/6)*6;

} else {

 // only integer part encoded, no fractional part
 pitch_lag_int  = pitch_index - 368;
 pitch_lag_frac = 0;

}

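Q: What is the meaning of 368? (368/6 = 61 2/6, which does not obviously mean anything.) One reading that is consistent with the numbers above: the threshold (94 4/6 - 17 3/6)*6 is 463, so indices 0..462 cover the fractional range and index 463 should map to the first integer-only lag, 95. 463 - 95 = 368.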
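
To check the arithmetic, here is the first/third subframe decoding as a small C function. The struct PitchLag and the name decode_lag_absolute are made up for this sketch; the constants 463 and 105 are just (94 4/6 - 17 3/6)*6 and (17 3/6)*6 written as integers, and the fractional part is expressed in sixths.

```c
typedef struct {
    int lag_int;    /* integer part of the pitch lag */
    int lag_frac;   /* fractional part in sixths, may be negative */
} PitchLag;

/* Decode the 9-bit pitch index of the 1st or 3rd subframe (12.2kbps mode). */
static PitchLag decode_lag_absolute(int pitch_index)
{
    PitchLag lag;
    if (pitch_index < 463) {
        /* 1/6 resolution, 17 3/6 .. 94 3/6 */
        lag.lag_int  = (pitch_index + 5) / 6 + 17;
        lag.lag_frac = pitch_index - lag.lag_int * 6 + 105;
    } else {
        /* integer resolution, 95 .. 143 */
        lag.lag_int  = pitch_index - 368;
        lag.lag_frac = 0;
    }
    return lag;
}
```

Index 0 decodes to 17 3/6, index 462 to 94 3/6, index 463 to 95 and index 511 (the largest 9-bit value) to 143, which matches the ranges given above. With the extra 2/6 of rounding the fractional part stays within -2..3 sixths.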

For the second and fourth subframes, a pitch resolution of 1/6 is always used in the range [T1−5 3/6,T1+4 3/6], where T1 is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe, bounded by 18...143.

Q: What search is being conducted with the following ranges?

// find the search range
search_range_min = max(pitch_lag_int - 5, 18);

Q: Why only subtract 5 rather than 5 3/6? Because of the above mentioned rounding?

search_range_max = search_range_min + 9;
if(search_range_max > 143) {

 search_range_max = 143;
 search_range_min = search_range_max - 9;

}

Q: Why only add/subtract 9 instead of 10? Because of the above mentioned rounding?

// calculate the pitch lag
pitch_lag_int  = (pitch_index + 5) + search_range_min - 1;
pitch_lag_frac = -2;

Q: Why?

The pitch delay is encoded with 9 bits in the first and third subframes and the relative delay of the other subframes is encoded with 6 bits.
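
A possible reading of the second/fourth subframe decoding, as a C sketch. Note that this is an assumption and not a transcription of the notes above: it takes the integer offset to be (pitch_index + 5)/6 - 1 and the fractional part to be pitch_index - 6*offset - 3, chosen because that reproduces the stated range [T1−5 3/6,T1+4 3/6]. The names PitchLag and decode_lag_relative are invented.

```c
typedef struct {
    int lag_int;    /* integer part of the pitch lag */
    int lag_frac;   /* fractional part in sixths, may be negative */
} PitchLag;

/* Decode the 6-bit relative pitch index of the 2nd or 4th subframe.
 * prev_lag_int is T1, the integer lag of the previous subframe. */
static PitchLag decode_lag_relative(int pitch_index, int prev_lag_int)
{
    PitchLag lag;

    /* find the search range, as in the text above */
    int search_range_min = prev_lag_int - 5;
    if (search_range_min < 18)
        search_range_min = 18;
    int search_range_max = search_range_min + 9;
    if (search_range_max > 143) {
        search_range_max = 143;
        search_range_min = search_range_max - 9;
    }

    /* ASSUMPTION: offset and fraction chosen to match the stated range */
    int offset   = (pitch_index + 5) / 6 - 1;
    lag.lag_frac = pitch_index - offset * 6 - 3;
    lag.lag_int  = search_range_min + offset;
    return lag;
}
```

Under this reading, with T1 = 50, index 0 gives 44 3/6 (T1 − 5 3/6) and index 60 gives 54 3/6 (T1 + 4 3/6). That would explain why only 5 and 9 appear in the integer search range: the remaining 3/6 at each end comes from the fractional part.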


Decoding of the algebraic (innovative or fixed) codebook vector

The parsed algebraic codebook index is used to find the positions and amplitudes (signs) of the excitation pulses and to find the algebraic codebook vector c(n).

If the integer part of the pitch lag, T, is less than the subframe size 40, the pitch sharpening procedure is applied which translates into c(n) += βc(n−T) , where β is the decoded pitch gain, ^g_p, bounded by [0.0,1.0] or [0.0,0.8], depending on mode.
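
Pitch sharpening as a C sketch. The function name and the beta_max parameter (1.0 or 0.8 depending on mode) are made up here, and the update is assumed to run in place from n = T upwards, as the += suggests, so already sharpened samples feed later ones.

```c
#define SUBFRAME_SIZE 40

/* c        : algebraic codebook vector, modified in place
 * T        : integer part of the pitch lag (assumed >= 1)
 * beta     : decoded pitch gain ^g_p
 * beta_max : upper bound for beta, 1.0 or 0.8 depending on mode */
static void pitch_sharpen(float c[SUBFRAME_SIZE], int T,
                          float beta, float beta_max)
{
    if (T >= SUBFRAME_SIZE)
        return;                 /* only applied when T < 40 */
    if (beta < 0.0f)
        beta = 0.0f;
    if (beta > beta_max)
        beta = beta_max;
    for (int n = T; n < SUBFRAME_SIZE; n++)
        c[n] += beta * c[n - T];
}
```

For a single pulse at n = 0 with T = 10 and beta = 0.5, this adds echoes of 0.5, 0.25 and 0.125 at n = 10, 20 and 30.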


Decoding of the adaptive and fixed codebook gains

12.2kbps and 7.95kbps - scalar quantised gains

The received indices are used to find the adaptive codebook gain, ^g_p, and the algebraic codebook gc factor, ^γ_gc (gc for gain correction), from the corresponding quantisation tables.

Other modes - vector quantised gains

The received index gives both the adaptive codebook gain, ^g_p, and the algebraic codebook gc factor, ^γ_gc. The estimated algebraic codebook gain gc′ is found as described in clause 5.7.


Smoothing of the fixed codebook gain

10.2, 6.70, 5.90, 5.15, 4.75 kbit/s modes

Adaptive smoothing of fixed codebook gain. (c.f. §6.1 part 4)


Anti-sparseness processing

7.95, 6.70, 5.90, 5.15, 4.75 kbit/s modes

An adaptive anti-sparseness postprocessing procedure is applied to the fixed codebook vector c(n) in order to reduce perceptual artifacts arising from the sparseness of the algebraic fixed codebook vectors with only a few non-zero samples per subframe. The anti-sparseness processing consists of circular convolution of the fixed codebook vector with an impulse response. Three pre-stored impulse responses are used and a number impNr = 0,1,2 is set to select one of them. A value of 2 corresponds to no modification, a value of 1 corresponds to medium modification, while a value of 0 corresponds to strong modification. The selection of the impulse response is performed adaptively from the adaptive and fixed codebook gains. (c.f. §6.1 part 5)
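
The circular convolution itself is simple. A sketch, assuming the impulse response has the same length as the subframe (40 samples); the function name is invented and the impulse response is whatever the caller passes in, not one of the pre-stored ones.

```c
#define SUBFRAME_SIZE 40

/* c   : fixed codebook vector
 * h   : impulse response (assumed to be SUBFRAME_SIZE samples long)
 * out : circular convolution of c with h; must not alias c or h */
static void circular_convolve(const float c[SUBFRAME_SIZE],
                              const float h[SUBFRAME_SIZE],
                              float out[SUBFRAME_SIZE])
{
    for (int n = 0; n < SUBFRAME_SIZE; n++) {
        float acc = 0.0f;
        for (int k = 0; k < SUBFRAME_SIZE; k++)
            acc += c[k] * h[(n - k + SUBFRAME_SIZE) % SUBFRAME_SIZE];
        out[n] = acc;
    }
}
```

Because the indexing wraps, a pulse at the end of the subframe spreads back to the start instead of spilling into the next subframe. An impulse response of 1 followed by zeros leaves c(n) unchanged.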


Computing the reconstructed speech

Construct excitation:

u(n) = ^g_p.v(n) + ^g_c.c(n)

(c.f. §6.1 part 6)
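
The excitation formula as a C sketch, with an invented function name:

```c
#define SUBFRAME_SIZE 40

/* v   : adaptive codebook vector v(n)
 * c   : fixed codebook vector c(n)
 * g_p : adaptive (pitch) gain ^g_p
 * g_c : fixed codebook gain ^g_c
 * u   : output excitation u(n) */
static void build_excitation(const float v[SUBFRAME_SIZE],
                             const float c[SUBFRAME_SIZE],
                             float g_p, float g_c,
                             float u[SUBFRAME_SIZE])
{
    for (int n = 0; n < SUBFRAME_SIZE; n++)
        u[n] = g_p * v[n] + g_c * c[n];
}
```

The result would then be passed through the LP synthesis filter to reconstruct speech, which is not shown here.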


Additional instability protection

(c.f. §6.1 part 7)


Adaptive post-filtering

(c.f. §6.2.1)


High-pass filtering and upscaling

(c.f. §6.2.2)