Dolby E

Latest revision as of 07:40, 21 November 2010

Dolby E is a codec from Dolby Laboratories that is used to transport up to 8 channels of audio across AES-3 cabling (AES-3 is the professional version of SPDIF). It is carried in a SMPTE-337M data burst. Dolby E also carries metadata such as downmixing information which is intended to be passed through to the final distribution encoder.

Dolby E is very similar to AC-3, but it uses a longer transform length and different windows, applies a postfilter to the LFE channel, and runs at a higher overall bitrate. The official decoder is very slow and uses no SIMD at all; a decoder in ffmpeg is 20-30x faster.

Decoding

Frame Structure

[[Image:Frame structure.png]]

Dolby E is designed to match up with video frames to allow for easy cutting. Guard Bands are also present at the beginning and the end of the frame to reduce the risk of bad splicing causing problems.

There are 3 input bit depths of Dolby E: 16-bit, 20-bit and 24-bit. 16-bit mode has a maximum of 6 channels and 20-bit and 24-bit modes have a maximum of 8 channels.

Startcodes

Dolby E uses the following startcodes:

16-bit: 0x78e
20-bit: 0x788e
24-bit: 0x7888e

The LSB of the startcode signals the presence of a Bitstream Key. The Bitstream Key is mandatory in 16-bit mode.
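
The startcode check can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name `parse_sync_word` is hypothetical, and it assumes that an LSB of 1 means the Bitstream Key is present and that the word has already been extracted from the AES-3 stream at the stated bit depth.

```python
# Startcodes from the list above, keyed by word size in bits.
STARTCODES = {16: 0x78E, 20: 0x788E, 24: 0x7888E}


def parse_sync_word(word: int, bit_depth: int):
    """Check one word against the Dolby E startcode for its bit depth.

    Returns True/False for "Bitstream Key present" when the word is a
    sync word, or None when it is not a sync word at all.
    Assumption: LSB == 1 means the key is present.
    """
    if bit_depth not in STARTCODES:
        raise ValueError("bit depth must be 16, 20 or 24")
    word &= (1 << bit_depth) - 1          # keep only the valid bits
    if (word & ~1) != STARTCODES[bit_depth]:
        return None                       # not a Dolby E startcode
    return bool(word & 1)                 # LSB carries the key flag


if __name__ == "__main__":
    print(parse_sync_word(0x78F, 16))     # 16-bit sync word, key flag set
    print(parse_sync_word(0x788E, 20))    # 20-bit sync word, key flag clear
```

Masking off the LSB before the comparison lets one table entry cover both variants of each startcode.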

Bitstream Key

Certain parts of the bitstream seem to be XOR ciphered. The key is always the first word of the section that is XORed.
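
A minimal sketch of the descrambling step, assuming the section has already been split into integer words and that the key is applied to every following word of the section unchanged. The function name `unscramble_section` is hypothetical; which sections are actually ciphered is not covered here.

```python
def unscramble_section(words):
    """Undo the XOR cipher on one section.

    `words` is a list of integers whose first element is the key,
    as described above. Returns the remaining words XORed with it.
    """
    if not words:
        return []
    key = words[0]
    return [w ^ key for w in words[1:]]


if __name__ == "__main__":
    scrambled = [0x00FF, 0x0F0F, 0x00FF]
    print([hex(w) for w in unscramble_section(scrambled)])
```

Because XOR is its own inverse, the same operation with the same key would also scramble the section again.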

CRC

Each audio subsegment, metadata section and the metering section is CRCed using the AV_CRC_16_ANSI in libavutil. (TODO: describe 20-bit mode because it's slightly different)
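
For readers without libavutil at hand, AV_CRC_16_ANSI corresponds to a 16-bit CRC with polynomial 0x8005, processed MSB-first with an initial value of 0. The sketch below is a plain bitwise version of that CRC; the function name `crc16_ansi` is hypothetical, and how Dolby E words are packed into bytes before checking (especially in 20-bit mode) is not shown and remains an open point.

```python
def crc16_ansi(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-16, polynomial 0x8005, MSB-first, no final XOR."""
    for byte in data:
        crc ^= byte << 8                  # feed the next byte into the top bits
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    payload = b"example section"
    checksum = crc16_ansi(payload)
    # With this CRC variant, a block followed by its own big-endian CRC
    # yields a remainder of zero, which is a convenient validity check.
    print(crc16_ansi(payload + checksum.to_bytes(2, "big")) == 0)
```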

Metadata

Dolby E contains "Professional metadata", which includes SMPTE timecodes along with how the downstream encoder should be configured, and "Consumer metadata", which is passed on to the viewer in [[AC-3]] and [[Dolby Pulse]] bitstreams.

Size (bits) | Explanation | Value
16/20/24 | Sync word | See above
0 | if(has_bitstream_key){ |
16/20/24 | Bitstream Key |
0 | } |
4 | Metadata revision id | Only seen 0 in the wild
10 | Size of first metadata section | In AES3 words
6 | Program configuration | Lookup table gives you number of channels and programs.
4 | Framerate byte pt1 (does something else too) | Seems to be the same as above ("unknown3" in the code)
4 | Framerate byte pt2 (does something else too) | Seems to be the same as above ("unknown2" in the code)
16 | Frame counter | Designed to detect splices
10 | Unknown |
2 | SMPTE Timecode Hours tens |
4 | SMPTE Timecode Hours units | NOTE: An hours value of "45" signifies "No Timecode"; the other fields are zeroed out. Max value for hours is 23.
9 | Unknown |
3 | SMPTE Timecode Minutes tens |
4 | SMPTE Timecode Minutes units | Max value for minutes is 59.
9 | Unknown |
3 | SMPTE Timecode Seconds tens |
4 | SMPTE Timecode Seconds units | Max value for seconds is 59.
9 | Unknown |
1 | Drop Frame Flag |
2 | SMPTE Timecode Frames tens |
4 | SMPTE Timecode Frames units |
8 | Unknown |
0 | for(int i=0; i < num_channels; i++){ |
10 | Size in words of channel i |
0 | } |
0 | if(Framerate_byte_pt1 <= 5){ |
8 | Size of metadata section 2 | In AES3 words
0 | } |
8 | Size of meter section | In AES3 words
0 | for(int i=0; i < num_programs; i++){ |
8 | Description character | Part of the program info word (why is this needed for decoding with a LUT?)
2 | Zero (seemingly always) | Part of the program info word (why is this needed for decoding with a LUT?)
0 | } |
0 | for(int i=0; i < num_channels; i++){ |
10 | Gain word (first audio subsegment) |
10 | Gain word (second audio subsegment) |
0 | } |
4 | "unknown4" in code | Seems to do a lot of things
0 | if(unknown4 & 0x3){ | Implies existence of more metadata
0 | if(metadata_segment == 0 && (unknown4 == 1 || unknown4 == 2)){ |
12 | Unknown |
0 | for(int i=0; i < num_programs; i++){ |
5 | Data Rate | Need to find a sample with this set
3 | Bitstream Mode |
3 | Coding Mode | What about 7.1 mode? (not enough bits to signal)
2 | Centre Mix |
2 | Surround Mix |
2 | Surround Mode |
1 | LFE Enable |
5 | Dialogue Normalisation |
1 | Unknown | Seems to always be zero
8 | Unknown | TODO
1 | Production information exists |
5 | Mix Level |
2 | Room Type |
1 | Copyright |
1 | Original |
0 | } |
0 | } |
0 | else if(metadata_segment == 1){ |
0 | } |
0 | } |
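
The timecode fields in the table are stored as separate tens and units digits. A minimal sketch of turning them into a timecode string follows; the function name `decode_timecode` and its argument names are hypothetical, and it assumes the fields have already been read from the bitstream. Note that an hours value of 45 falls out naturally when all six hours bits are set (tens = 3, units = 15), which is an inference from the field widths rather than something the table states.

```python
def decode_timecode(h_tens, h_units, m_tens, m_units,
                    s_tens, s_units, f_tens, f_units, drop_frame):
    """Combine the tens/units fields into an SMPTE-style string.

    Returns None when the hours value is 45, which the table above
    describes as "No Timecode".
    """
    hours = h_tens * 10 + h_units
    if hours == 45:
        return None
    minutes = m_tens * 10 + m_units
    seconds = s_tens * 10 + s_units
    frames = f_tens * 10 + f_units
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError("timecode field out of range")
    # Drop-frame timecode is conventionally written with ';' before frames.
    sep = ";" if drop_frame else ":"
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{frames:02d}"


if __name__ == "__main__":
    print(decode_timecode(1, 2, 3, 4, 5, 6, 0, 7, False))
    print(decode_timecode(3, 15, 0, 0, 0, 0, 0, 0, False))
```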


Audio Segments

[[Image:Dolby e Audio segments.png]]

There's always an even number of channels so the split is trivial.

Exponents and bit allocation

There seem to be only 2 exponent strategies. Bit allocation is similar to AC-3.

Mantissa Quantisation

Dolby E uses gain adaptive quantisation for its mantissas. (TODO: describe further)

Transforms

Dolby E uses a slightly edited MDCT:

[[Image:Dolby E Mdct.png]]

Sample Rate Conversion

The internal sample rate of Dolby E varies depending on the associated video frame-rate. This internal sample rate varies between 42.965kHz and 53.760kHz. This is sample rate converted to 48kHz after decoding.
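
The two limits quoted above are consistent with a fixed number of samples per video frame: 53.760 kHz / 30 fps = 1792, and 1792 × 23.976 fps ≈ 42.965 kHz. The sketch below works from that inference. The constant 1792 and the mapping of the limits to 23.976 and 30 fps are assumptions derived from the numbers, not statements from the format documentation, and the function names are hypothetical.

```python
from fractions import Fraction

SAMPLES_PER_FRAME = 1792   # assumption inferred from 53760 / 30
OUTPUT_RATE = 48000        # target rate after conversion, in Hz


def internal_sample_rate(frame_rate) -> Fraction:
    """Internal sample rate in Hz for a given video frame rate."""
    return SAMPLES_PER_FRAME * Fraction(frame_rate)


def resample_ratio(frame_rate) -> Fraction:
    """Output samples produced per internal sample."""
    return Fraction(OUTPUT_RATE) / internal_sample_rate(frame_rate)


if __name__ == "__main__":
    for name, fps in [("23.976", Fraction(24000, 1001)),
                      ("25", Fraction(25)),
                      ("29.97", Fraction(30000, 1001)),
                      ("30", Fraction(30))]:
        rate = internal_sample_rate(fps)
        print(name, float(rate), resample_ratio(fps))
```

Exact fractions are used so that the NTSC-style rates (multiples of 1000/1001) do not accumulate rounding error.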

Metering Information

There is also metering information available at the end of the frame. (TODO: describe further)

Decoder/Encoder

A free trial of a software Dolby E encoder and decoder is available from http://www.neyrinck.com. It supports encoding in 16-bit and 20-bit modes and decoding of 16-bit, 20-bit and possibly 24-bit modes. However, it requires PACE iLok to run, which features kernel-level anti-debugging.

Dolby SIP API

The application library uses the Dolby SIP interface to decode. More information about the Dolby SIP interface can be found here.

Dolby Subroutines

Each function name is followed by a function ID number. The names take the form DD_XXXXD_YYYY for decoding and DD_XXXXE_YYYY for encoding.

DD_SYS_INIT – 0x00
System Initialise

meter DD_CRCD_VER – 0x1E
Verify CRC of meter section.

Metadata %d DD_CRCD_VER – 0x1E
Verify CRC of metadata section.

Channel decode %d:%d DD_CRCD_VER – 0x1E
Verify CRC of channel.

DD_DDED_DEC – 0x20
Dolby E decode. Seemingly needs to be called 8 times (similar to AC-3's 6 times per frame).

DD_METD_UNPACK – 0x22
Unpack Metadata

DD_METD_AC3INFO – 0x23
Return only the AC-3 compatible metadata?

DD_MTRD_UNPACK – 0x25
Unpack Metering data

meter DD_KEYD_EXTR – 0x28
Extract Bitstream Key

DD_SRCD_CONV – 0x2a
Sample rate convert

External links

Discussion on the VideoLAN forum about an E-distribution decoder: http://forum.videolan.org/viewtopic.php?f=18&t=81323