Difference between revisions of "RealAudio sipr"


Latest revision as of 22:29, 15 January 2010

Summary

Audio codec found in RealMedia files, not as common as cook.

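It is the same codec as ACELP.net. It may be partly based on G.729; however, RealAudio predates the finalization of the G.729 specifications for the 6.5 kbit/s and 11.8 kbit/s variants. See the ITU-T G.729 page: http://www.itu.int/rec/recommendation.asp?type=folders&lang=e&parent=T-REC-G.729

Codec library with debugging symbols: v50b3_linux20elf.tar.gz (http://wwwa2.kph.uni-mainz.de/ftp/pub/machines/linux/multimedia/v50b3_linux20elf.tar.gz)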

Sipr flavors

0  6.5 Kbps Voice
1  8.5 Kbps Voice
2  5 Kbps Voice
3  16 Kbps Voice

16 Kbps Voice Format description

  • Sampling rate is 16000 Hz.
  • Each frame is 160 samples long and represents 10 ms of speech data.
  • Bitstream rate is 16 Kbps.
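
The figures above fix the size of one coded frame. The following is a small arithmetic sketch, assuming that "16 Kbps" means exactly 16000 bit/s; the constant and function names are made up for this illustration.

```python
SAMPLE_RATE_HZ = 16000   # from the format description
FRAME_SAMPLES = 160      # from the format description
BIT_RATE_BPS = 16000     # assumption: "16 Kbps" taken as 16000 bit/s


def frame_budget():
    """Return (frame duration in ms, bits per frame, bytes per frame)."""
    frame_ms = 1000.0 * FRAME_SAMPLES / SAMPLE_RATE_HZ
    bits_per_frame = BIT_RATE_BPS * FRAME_SAMPLES // SAMPLE_RATE_HZ
    return frame_ms, bits_per_frame, bits_per_frame // 8


print(frame_budget())
```

Under that assumption each frame occupies 160 bits (20 bytes).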

Bit stream frame format

Bits  Meaning
1     Switched MA predictor
7     LSP quantization, index 1
8     LSP quantization, index 2
7     LSP quantization, index 3
7     LSP quantization, index 4
7     LSP quantization, index 5
      First subframe
9     Pitch delay
4     Gain codebook index
9     Fixed codebook index (pulses 1 and 6)
9     Fixed codebook index (pulses 2 and 7)
9     Fixed codebook index (pulses 3 and 8)
9     Fixed codebook index (pulses 4 and 9)
9     Fixed codebook index (pulses 5 and 10)
      Second subframe
9     Pitch delay
4     Gain codebook index
9     Fixed codebook index (pulses 1 and 6)
9     Fixed codebook index (pulses 2 and 7)
9     Fixed codebook index (pulses 3 and 8)
9     Fixed codebook index (pulses 4 and 9)
9     Fixed codebook index (pulses 5 and 10)
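
The table can be turned into a field-by-field reader. The sketch below assumes that fields are packed in the listed order, most significant bit first, starting at the first byte of the frame; the table does not state the bit order, and all field names are invented for this example. The listed fields add up to 153 bits, so the reader consumes only those and ignores any remaining bits.

```python
# (name, width in bits), in the order given by the table above
HEADER_FIELDS = [("ma_predictor", 1), ("lsp_1", 7), ("lsp_2", 8),
                 ("lsp_3", 7), ("lsp_4", 7), ("lsp_5", 7)]
SUBFRAME_FIELDS = [("pitch_delay", 9), ("gain_index", 4)]
for k in range(1, 6):
    SUBFRAME_FIELDS.append(("fc_index_%d" % k, 9))

FRAME_FIELDS = list(HEADER_FIELDS)
for sf in (1, 2):
    for name, bits in SUBFRAME_FIELDS:
        FRAME_FIELDS.append(("sf%d_%s" % (sf, name), bits))

TOTAL_BITS = sum(bits for _, bits in FRAME_FIELDS)


def parse_frame(data):
    """Split one frame into its fields (assumed MSB-first packing)."""
    total = len(data) * 8
    if total < TOTAL_BITS:
        raise ValueError("frame too short: need %d bits" % TOTAL_BITS)
    value = int.from_bytes(data, "big")
    fields = {}
    pos = 0
    for name, bits in FRAME_FIELDS:
        shift = total - pos - bits          # distance from the LSB end
        fields[name] = (value >> shift) & ((1 << bits) - 1)
        pos += bits
    return fields
```

For example, parse_frame(b"\xff" * 20)["sf1_pitch_delay"] yields the 9-bit pitch delay of the first subframe.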

Decoding of LP filter parameters

LSF vector decoding

LSF to LSP vector conversion

Same as for AMR-NB, except that the coefficients belong to the [0, 8000] Hz range and the sampling rate is 16 kHz.
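
As a minimal sketch of this conversion, assuming the LSFs are given in Hz: each LSP is the cosine of the LSF mapped to an angle in [0, π], as in AMR-NB, with only the sampling rate changed. The function name is made up for this example.

```python
import math


def lsf_to_lsp(lsf_hz, sample_rate_hz=16000.0):
    """Map LSFs in Hz (0..8000) to LSPs in the cosine domain (1..-1)."""
    return [math.cos(2.0 * math.pi * f / sample_rate_hz) for f in lsf_hz]
```

With a 16 kHz sampling rate, 0 Hz maps to 1, 4000 Hz to 0 and 8000 Hz to -1.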

LSP vector interpolation

Same as for AMR-NB, 12.2k mode, except that only two subframes are used.
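
One plausible reading of this, shown as a sketch only: AMR-NB's 12.2k mode gives the subframe that lies between two decoded LSP sets their average. With a single LSP set per frame and two subframes, the first subframe would then use the average of the previous and current sets, and the second subframe the current set. This adaptation is an assumption; the text above does not spell it out, and the function name is invented.

```python
def interpolate_lsp(prev_lsp, new_lsp):
    """Return the LSP vectors for (first subframe, second subframe)."""
    # Assumption: midpoint for subframe 1, decoded set for subframe 2.
    first = [0.5 * (p + n) for p, n in zip(prev_lsp, new_lsp)]
    second = list(new_lsp)
    return first, second
```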

LSP to LP vector conversion

Same as for AMR-NB.
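
For reference, the AMR-NB style conversion can be sketched with plain polynomial products instead of the recursive form used in the specification; both compute the same coefficients. F1(z) is built from the odd-numbered LSPs (q1, q3, ...) and F2(z) from the even-numbered ones, and A(z) = 1 + a1 z^-1 + ... is half the sum of F1(z)(1 + z^-1) and F2(z)(1 - z^-1). The function names are made up, and the filter order is taken from the length of the input rather than assumed.

```python
def _poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lp(lsp):
    """Convert LSPs (cosine domain) to LP coefficients a1..ap."""
    if len(lsp) % 2:
        raise ValueError("even filter order expected")
    f1, f2 = [1.0], [1.0]
    for i in range(0, len(lsp), 2):
        f1 = _poly_mul(f1, [1.0, -2.0 * lsp[i], 1.0])      # q1, q3, ...
        f2 = _poly_mul(f2, [1.0, -2.0 * lsp[i + 1], 1.0])  # q2, q4, ...
    f1 = _poly_mul(f1, [1.0, 1.0])    # F1(z) * (1 + z^-1)
    f2 = _poly_mul(f2, [1.0, -1.0])   # F2(z) * (1 - z^-1)
    a = [0.5 * (x + y) for x, y in zip(f1, f2)]
    return a[1:-1]  # drop a0 = 1 and the cancelled highest-order term
```

A quick sanity check: evenly spaced LSPs such as [0.5, -0.5] give all-zero coefficients, that is, a flat spectrum.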

Decoding of the pitch (adaptive codebook) vector

Decoding the pitch lag

In the first subframe, the pitch lag is encoded using 9 bits, with a fractional resolution of:

  • 1/3 in the range [29 1/3, 159]
  • 1 in the range [160, 281]
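
The two ranges hold 390 fractional lags and 122 integer lags, which together fill the 512 values of a 9-bit index. A minimal sketch of the mapping follows, assuming indexes are assigned in ascending lag order with the fractional range first; that ordering is an assumption, and the function name is made up.

```python
from fractions import Fraction


def decode_lag_first(index):
    """Map a 9-bit index to the first-subframe pitch lag (in samples)."""
    if not 0 <= index <= 511:
        raise ValueError("index must fit in 9 bits")
    if index < 390:
        # 1/3 resolution: 29 1/3 (= 88/3) up to 159
        return Fraction(88 + index, 3)
    # integer resolution: 160 up to 281
    return Fraction(160 + (index - 390))
```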


In the second subframe, a pitch lag resolution of 1/3 is always used in the range [T1 - 10 2/3, T1 + 9 2/3], where T1 is the nearest integer to the fractional pitch lag of the previous (first) subframe. The search range is bounded by [30, 281].

The above procedure applies only if the encoded pitch delay is in the range [0, 61]. Otherwise, the pitch delay is set to T1 + 1.
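
The relative coding can be sketched as follows, assuming index 0 corresponds to the lowest lag T1 - 10 2/3 and each step adds 1/3. The [30, 281] bound is not applied here, since the text does not say how the window is shifted or clipped at the edges; the function name is made up.

```python
from fractions import Fraction


def decode_lag_second(index, t1):
    """Second-subframe pitch lag from the encoded delay and integer T1."""
    if index > 61:
        # outside the relative range: fall back to T1 + 1
        return Fraction(t1 + 1)
    # T1 - 10 2/3 expressed in thirds is 3 * (T1 - 10) - 2
    return Fraction(3 * (t1 - 10) - 2 + index, 3)
```

With T1 = 50, index 0 gives 39 1/3, index 32 gives exactly 50, and index 61 gives 59 2/3.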

Decoding of the innovative (algebraic or fixed codebook) vector

Decoding the pulse positions

The fixed codebook vector is reconstructed using 10 pulses in 5 interleaved tracks. The two pulses in each track are encoded using 9 bits:

  • 1 bit - sign of the first pulse (1 - negative, 0 - positive). The second pulse has the same sign if its index is greater than or equal to that of the first pulse, and the opposite sign otherwise.
  • 4 bits - encoded index of the first pulse in the pair
  • 4 bits - encoded index of the second pulse in the pair

Track  Pulses  Positions
1      i0, i5  0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75
2      i1, i6  1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76
3      i2, i7  2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77
4      i3, i8  3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58, 63, 68, 73, 78
5      i4, i9  4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79

The codebook structure is similar to that of AMR-NB's 12.2k mode, except that no Gray coding is used and each index is extended to 4 bits.
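
Putting the track table and the sign rule together, a subframe's fixed codebook vector can be rebuilt roughly as below. This is a sketch under stated assumptions: pulses have unit amplitude and add up when two land on the same position (as in AMR-NB), tracks are numbered from 0 here rather than from 1, and the three fields of each 9-bit code are taken as already separated, so no bit layout inside the code is implied. All names are invented for the example.

```python
SUBFRAME_LEN = 80   # positions 0..79 in the track table
TRACKS = 5


def decode_fixed_vector(track_codes):
    """track_codes: five (sign_bit, idx1, idx2) tuples, one per track."""
    vector = [0.0] * SUBFRAME_LEN
    for track, (sign_bit, idx1, idx2) in enumerate(track_codes):
        first_sign = -1.0 if sign_bit else 1.0
        # same sign if the second index is >= the first, else opposite
        second_sign = first_sign if idx2 >= idx1 else -first_sign
        # no Gray coding: the 4-bit index selects the position directly
        vector[track + TRACKS * idx1] += first_sign
        vector[track + TRACKS * idx2] += second_sign
    return vector
```

For instance, the code (0, 3, 5) on the first track places positive pulses at positions 15 and 25.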