Audible Audio

From MultimediaWiki
Latest revision as of 11:30, 30 January 2011

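Proprietary container format from audible.com. There is no published specification. It may contain one of five different encodings, numbered 1 through 5. Encodings 1-3 are rumored to be ACELP.net at varying bitrates, encoding 4 is MP3, and encoding 5 is an unknown Sony format.

  • Description of the format from the US Library of Congress: http://www.digitalpreservation.gov/formats/fdd/fdd000103.shtml
  • Wikipedia entry: http://en.wikipedia.org/wiki/Audible.com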

To find sample files, search Google for "filetype:aa audible".

The PyAudibleTags project (http://code.google.com/p/pyaudibletags/source/browse/trunk/pyaudibletags.py) apparently implements everything described below.

File format

The file is composed of multiple chunks (11 chunks in all files seen so far in the wild):

  • general header
  • multiple binary metadata chunks
  • a single textual metadata chunk
  • offsets table
  • audio data
  • cover image

Whenever multi-byte values are used (e.g. 32-bit integers), they are encoded in big-endian format.

General header

The first 32-bit word is the file size, including these four bytes.

The third 32-bit word is the number of chunks in the file.

A chunk index begins at file offset 16 (the fifth 32-bit word); for each chunk, this index includes three 32-bit values:

  • The chunk type
  • Chunk starting position in file
  • Chunk size
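
The header layout above can be sketched in Ruby roughly as follows. This is an illustration under stated assumptions, not a reference parser: the names ChunkEntry and parse_chunk_index are hypothetical, and the index is assumed to start at file offset 16 with 12 bytes per entry.

```ruby
# Hypothetical record for one chunk index entry (three big-endian 32-bit values).
ChunkEntry = Struct.new(:type, :offset, :length)

# data: the whole file as a binary String, e.g. File.binread(path).
# Returns [file_size, array of ChunkEntry].
def parse_chunk_index(data)
  raise ArgumentError, 'header too short' if data.bytesize < 16

  file_size = data.byteslice(0, 4).unpack1('N') # first word: file size
  n_chunks  = data.byteslice(8, 4).unpack1('N') # third word: number of chunks
  index_start = 16                              # assumption: index starts at offset 16

  chunks = Array.new(n_chunks) do |i|
    type, offset, length = data.byteslice(index_start + 12 * i, 12).unpack('N3')
    ChunkEntry.new(type, offset, length)
  end
  [file_size, chunks]
end
```

A caller could then pick out a chunk's bytes with data.byteslice(entry.offset, entry.length), assuming the stored starting position is an absolute file offset as described.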

The internal structure and meaning of most chunks are unknown. The following table summarizes the known chunk types:

chunk type code   contents
0x00              entire file
0x02              textual metadata
0x06              offsets table
0x0a              (encrypted) audio contents
0x0b              cover image
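
For illustration, the known type codes can be kept in a small lookup table. The constant CHUNK_TYPES and the helper chunk_type_name are hypothetical names; any code outside the table is simply reported as unknown.

```ruby
# Known chunk type codes, taken from the table above.
CHUNK_TYPES = {
  0x00 => 'entire file',
  0x02 => 'textual metadata',
  0x06 => 'offsets table',
  0x0a => '(encrypted) audio contents',
  0x0b => 'cover image'
}.freeze

# Returns a readable name for a chunk type code, or a placeholder
# for codes whose meaning is not documented.
def chunk_type_name(code)
  CHUNK_TYPES.fetch(code) { format('unknown (0x%02x)', code) }
end
```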


Textual metadata

The textual metadata (chunk type 2) consists of a sequence of named attributes. The attribute names are textual (ASCII); attribute values appear to be encoded in UTF-8. This chunk is formatted as follows:

  • 32-bit number of attributes (n_attrs)
  • n_attrs attribute entries, each consisting of
    • 1 byte: 0x00
    • 32 bits: name_length
    • 32 bits: value_length
    • name_length bytes: attribute name
    • value_length bytes: value

Interesting and useful attributes include author, narrator, title, pubdate, and description.
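
A minimal Ruby sketch of reading this chunk, assuming the layout listed above holds for every attribute. The function name parse_text_metadata is hypothetical, and names are treated as UTF-8 here because ASCII is a subset of it.

```ruby
# chunk: the raw bytes of the textual metadata chunk (type 2) as a binary String.
# Returns a Hash mapping attribute names to values.
def parse_text_metadata(chunk)
  n_attrs = chunk.byteslice(0, 4).unpack1('N')
  pos = 4
  attrs = {}
  n_attrs.times do
    pos += 1 # skip the leading 0x00 byte of each entry
    name_length, value_length = chunk.byteslice(pos, 8).unpack('N2')
    pos += 8
    name = chunk.byteslice(pos, name_length).force_encoding('UTF-8')
    pos += name_length
    value = chunk.byteslice(pos, value_length).force_encoding('UTF-8')
    pos += value_length
    attrs[name] = value
  end
  attrs
end
```

With the result in hand, attributes such as attrs['title'] or attrs['narrator'] can be looked up directly, provided the file actually carries them.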

Offset table

The offsets table (chunk type 6) lists all audio frames in the file, split into chapters.

Its format is:

  • 32-bit number of chapters (n_chaps)
  • n_chaps chapters:
    • 32-bit flag word (usage unknown, observed values 0..3)
    • two unused 32-bit words (always 0xffffffff)
    • 32-bit size of chapter in bytes
    • 32-bit number of offsets in chapter (n_offsets); each offset refers to one second of playback
    • 16-bit unknown half-word (always 0xc00d)
    • 32-bit number of offsets in chapter (n_offsets, again)
    • n_offsets offset entries:
      • 32-bit flag word (usage unknown, observed values 0..2)
      • 32-bit offset inside chapter

The offsets table is useful for determining the number of chapters in the file and each chapter's length.
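
The layout can be walked as in the following Ruby sketch. Chapter and parse_offsets_table are hypothetical names; the fields whose usage is unknown are read and discarded, and the 26-byte chapter header size is simply the sum of the fields listed above.

```ruby
# Hypothetical record for one chapter of the offsets table.
Chapter = Struct.new(:flags, :byte_size, :offsets)

# chunk: the raw bytes of the offsets table chunk (type 6) as a binary String.
def parse_offsets_table(chunk)
  n_chaps = chunk.byteslice(0, 4).unpack1('N')
  pos = 4
  Array.new(n_chaps) do
    # flag word, two unused words, chapter size, n_offsets,
    # 16-bit half-word, n_offsets again: 5*4 + 2 + 4 = 26 bytes
    flags, _unused1, _unused2, byte_size, n_offsets, _half, _n_again =
      chunk.byteslice(pos, 26).unpack('N5nN')
    pos += 26
    offsets = Array.new(n_offsets) do
      _entry_flags, offset = chunk.byteslice(pos, 8).unpack('N2')
      pos += 8
      offset
    end
    Chapter.new(flags, byte_size, offsets)
  end
end
```

If each offset really corresponds to one second of playback, chapter.offsets.length gives an approximate chapter duration in seconds.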

Cover image

The cover image (chunk type 11) is stored as follows:

  • 32-bit length of image (imglen)
  • 32-bit file position of image (always points to next byte in practice)
  • imglen bytes: JPEG (EXIF) cover image
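
Extracting the cover could look like the sketch below. It assumes the image data follows the two header words directly, as observed in practice, and therefore ignores the stored file position; extract_cover is a hypothetical name.

```ruby
# chunk: the raw bytes of the cover image chunk (type 11) as a binary String.
# Returns the JPEG bytes as a binary String.
def extract_cover(chunk)
  img_len, _img_pos = chunk.byteslice(0, 8).unpack('N2')
  # Assumption: the image starts right after the 8-byte header.
  chunk.byteslice(8, img_len)
end
```

The returned bytes can be written out with File.binwrite('cover.jpg', extract_cover(chunk)); the output file name is only an example.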