Microsoft Audio/Video Interleaved

From MultimediaWiki

Microsoft Audio/Video Interleaved (AVI) is a multimedia format based on the RIFF container format. For a long time, AVI was the de facto standard for multimedia files on Windows (recently, ASF has supplanted AVI on the Windows platform). While there is some contention regarding the originator of the format, the fact remains that there were, and still are, a wide variety of computer applications that create AVI files. This leads to a lot of fragmentation and application-specific nuances in a standard that was never particularly well-defined in the first place.

Structure

An AVI has this RIFF structure:

RIFF "AVI " (space at end)
    LIST "hdrl"
        DATA "avih", len: 56
        LIST "strl"
            DATA "strh", len: 56
            DATA "strf"
        LIST "strl"
            ...
    LIST "INFO" (optional)
        ...
    LIST "movi"
        DATA "00dc"
        DATA "01wb"
        ...
    DATA "idx1"

Idiosyncrasies

This section comes from the document 'AVI Files: Tips & Quirks' by Arpad "A'rpi" Gereoffy found at http://multimedia.cx/avistuff.txt

Introduction

A'rpi is the originator of the MPlayer media application for Linux. It's an open source movie player that can decode AVI files, as well as a number of other file formats. He has encountered a lot of AVIs created by a lot of different programs and is qualified to write about some of the quirks and nuances a programmer might encounter when writing a general purpose AVI file decoder.

Random Tips For Processing AVI Files

In short, these are some things I discovered while writing/fixing my AVI demuxer:

  • AVI files are built from variable length chunks.
  • Each chunk has a 4-byte fourcc and a 4-byte length (dword).
  • If the chunk size is bad/broken, it will kill the whole demuxer process.
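  • Chunks are padded to an even (2*n) offset: a chunk with an odd length is followed by one pad byte that is not counted in its length field.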
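
A minimal Python sketch of the chunk layout described above, assuming the length field is little-endian (as in RIFF) and does not count the pad byte. The function names read_chunk_header and padded_size are hypothetical, chosen for illustration only.

```python
import io
import struct

def read_chunk_header(stream):
    """Return (fourcc, size) for the chunk at the current position, or None at EOF."""
    header = stream.read(8)
    if len(header) < 8:
        return None
    # 4-byte fourcc followed by a 4-byte little-endian length (dword)
    return struct.unpack("<4sI", header)

def padded_size(size):
    """Bytes to skip after the header: the payload plus one pad byte if the size is odd."""
    return size + (size & 1)

# Usage: two chunks, the first with an odd length and therefore one pad byte
data = (b"00dc" + struct.pack("<I", 3) + b"xyz\x00"
        + b"01wb" + struct.pack("<I", 2) + b"hi")
stream = io.BytesIO(data)
first = read_chunk_header(stream)
stream.seek(padded_size(first[1]), io.SEEK_CUR)
second = read_chunk_header(stream)
```

A real demuxer would also want to sanity-check the size against the file length before seeking, since a broken size derails everything that follows.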

AVI files usually have:

  • a RIFF AVI header, containing general parameters (used for file type detection)
  • stream headers, each containing a common-format stream descriptor and a type-specific audio/video/other header
  • a single 'movi' chunk, containing the audio and video packets
  • an index chunk, containing the index table (16 bytes for each chunk in 'movi')

The AVI header has a dwFlags field. It contains useful information, like type of interleaving, "have index" chunk and so on. Ignore it. Really. It's broken in too many files. Windows players ignore it too.

AVI docs say that 'XXdb' are uncompressed and 'XXdc' are compressed video chunk fourccs. (XX = stream id in HEX. Some specs say it's in DEC. Funny.) Ignore it. Just use the first 2 chars as a hex number, and get the stream type from the stream header for that stream id. I've even seen XXim FourCCs...
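
The rule above can be sketched in Python as follows. The helper name stream_id_from_fourcc is hypothetical; it reads only the first two characters as a hex number and deliberately ignores the two-character type suffix.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"

def stream_id_from_fourcc(fourcc):
    """Return the stream id encoded in the first 2 chars, or None if they are not hex."""
    prefix = fourcc[:2]
    if len(prefix) != 2 or any(c not in HEX_DIGITS for c in prefix):
        return None
    # The suffix ('dc', 'db', 'wb', 'im', ...) is not trusted; the stream
    # type should come from the stream header for this id instead.
    return int(prefix, 16)

video_id = stream_id_from_fourcc(b"00dc")
audio_id = stream_id_from_fourcc(b"01wb")
```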

Stream header has some interesting fields:

  • dwRate, dwScale: These specify the playback sample rate of the stream: dwRate / dwScale is the number of samples (for video, frames) per second.
  • dwStart: Specifies delay of the stream; rarely used, but must be supported.
  • dwSampleSize: This is the sample size (bytes / sample). It may be 0, which means variable sample sizes -> 1 chunk == 1 sample. For non-zero samplesize, chunks may contain more than one sample.
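
As an illustration, here is a Python sketch that pulls these fields out of a 56-byte 'strh' payload. It assumes the standard AVISTREAMHEADER field order with little-endian values; parse_strh and the dictionary keys are names chosen for this example.

```python
import struct

# fccType, fccHandler, dwFlags, wPriority, wLanguage, dwInitialFrames,
# dwScale, dwRate, dwStart, dwLength, dwSuggestedBufferSize, dwQuality,
# dwSampleSize, rcFrame (four 16-bit values)
STRH_FORMAT = "<4s4sIHH8I4h"

def parse_strh(payload):
    """Extract the timing-related fields from an 'strh' payload."""
    fields = struct.unpack_from(STRH_FORMAT, payload)
    return {
        "fccType": fields[0],
        "dwScale": fields[6],
        "dwRate": fields[7],
        "dwStart": fields[8],
        "dwSampleSize": fields[12],  # 0 means 1 chunk == 1 sample
    }

# Usage: a made-up video stream header at 30000/1001 frames per second
example = struct.pack(STRH_FORMAT, b"vids", b"XVID", 0, 0, 0, 0,
                      1001, 30000, 0, 100, 0, 0, 0, 0, 0, 640, 480)
info = parse_strh(example)
```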

Regarding VBR audio in AVI, see VirtualDub site mentioned in the references. The 3 AVI parsers in Windows behave differently with such streams. 1 normal (0=vbr), 1 tricky (rounds up zero to blockalign), 1 crashes.

In AVI, the audio-specific header contains a WAVEFORMATEX and the video-specific header contains a BITMAPINFOHEADER. Both can have optional codec-dependent extra data appended after the struct. Don't crop it, or decoding will break!
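
A small Python sketch of keeping that extra data, assuming the usual struct sizes (40 bytes for BITMAPINFOHEADER, 18 bytes for WAVEFORMATEX). The helper split_strf is hypothetical. Some audio headers may be shorter than 18 bytes, in which case this sketch simply returns no extra data.

```python
BITMAPINFOHEADER_SIZE = 40
WAVEFORMATEX_SIZE = 18

def split_strf(payload, is_video):
    """Split an 'strf' payload into (fixed struct bytes, codec extra data)."""
    base = BITMAPINFOHEADER_SIZE if is_video else WAVEFORMATEX_SIZE
    # Everything after the fixed struct is kept so it can be passed to the decoder
    return payload[:base], payload[base:]

video_struct, video_extra = split_strf(b"\x00" * 40 + b"codec-data", True)
audio_struct, audio_extra = split_strf(b"\x00" * 18 + b"\x01\x02", False)
```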

About the movi chunk: Recently I got some AVI files with bad movi chunk sizes. So, I have to say: Ignore it. Read chunks from the file while not EOF, and not while filepos < movi_end.
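
A Python sketch of that reading loop, under some assumptions: walk_movi is a hypothetical name, movi_start is the position of the first chunk inside 'movi', and the loop stops on a chunk whose size runs past the end of the file (one possible policy for broken sizes, not the only one). It does not descend into nested LIST chunks, and it will also yield whatever follows 'movi' (such as 'idx1'), so the caller has to filter by fourcc.

```python
import io
import struct

def walk_movi(stream, movi_start, file_size):
    """Yield (fourcc, data_offset, size) from movi_start until EOF."""
    stream.seek(movi_start)
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # EOF: stop here, regardless of the declared 'movi' size
        fourcc, size = struct.unpack("<4sI", header)
        data_offset = stream.tell()
        if data_offset + size > file_size:
            break  # broken chunk size: give up rather than read garbage
        yield fourcc, data_offset, size
        # Skip the payload plus the pad byte of an odd-sized chunk
        stream.seek(data_offset + size + (size & 1))

# Usage: two good chunks followed by one with an impossible size
data = (b"00dc" + struct.pack("<I", 4) + b"abcd"
        + b"01wb" + struct.pack("<I", 3) + b"xyz\x00"
        + b"00dc" + struct.pack("<I", 1000) + b"zz")
chunks = list(walk_movi(io.BytesIO(data), 0, len(data)))
```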

About index:

  • It contains chunk pos, chunk size, chunk fourcc and flags. Bit 4 of the flags field (flags & 0x10) means that chunk represents a keyframe.
  • Offset is relative to `cat /dev/urandom`. Really. Or dunno.
  • I calculate an offset_of_offset value from the movi_start and first chunk offset. It works in 99% of cases. I saw different methods in other players, handling some common cases (such as relative to avi chunk, relative to movie chunk, etc.) and fallback to absolute value.
  • Chunk info in the chunk header (first 4+4 bytes of each chunk) and in the index table should be equal. They aren't. Sometimes the size values differ by +/-1. Strange. Sometimes the 'type' part of the fourcc (last 2 chars) differs. Even more strange. Sometimes the chunk header is left out entirely. I think Windows parsers don't use chunk headers at all, and use only the index. This may be why they are unable to play files without an index.
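
The index handling above can be sketched in Python like this. It assumes each 16-byte entry is laid out as fourcc, flags, offset, size (little-endian), and that the first index entry refers to the first chunk in 'movi'. The names parse_idx1 and index_offset_base are hypothetical, and the offset guess is only the heuristic described above, not a guaranteed rule.

```python
import struct

AVIIF_KEYFRAME = 0x10  # bit 4 of the flags field

def parse_idx1(payload):
    """Return a list of (fourcc, flags, offset, size) tuples from an 'idx1' payload."""
    entries = []
    for pos in range(0, len(payload) - 15, 16):
        entries.append(struct.unpack_from("<4sIII", payload, pos))
    return entries

def index_offset_base(entries, first_chunk_pos):
    """Guess the constant that turns index offsets into absolute file positions.

    first_chunk_pos is the absolute position of the first chunk header in 'movi'.
    """
    if not entries:
        return 0
    return first_chunk_pos - entries[0][2]

# Usage: two entries whose offsets are relative to some unknown base
payload = (struct.pack("<4sIII", b"00dc", AVIIF_KEYFRAME, 4, 1200)
           + struct.pack("<4sIII", b"01wb", 0, 1212, 384))
entries = parse_idx1(payload)
base = index_offset_base(entries, first_chunk_pos=2060)
absolute_positions = [base + offset for _, _, offset, _ in entries]
keyframes = [bool(flags & AVIIF_KEYFRAME) for _, flags, _, _ in entries]
```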

On the subject of interleaving, there are 3 categories of interleaving for AVI files (taken from DOCS/tech/formats.txt in the MPlayer distribution):

  1. Interleaved: Audio and video content is interleaved. It's faster and requires only 1 reading thread, so it's recommended (and most commonly used).
  2. Non-interleaved: Audio and video aren't interleaved. The file stores all of the video data followed by all the audio data. Such a file requires 2 reading processes or 1 reading with lots of seeking. This is very bad when playing the data from a network or CD-ROM.
  3. Badly-interleaved streams: Some AVI files claim to be interleaved but with bad sync. These files should be treated as non-interleaved.

About A/V sync, you should rely on samplerate (dwRate/dwScale), samplesize and stream positions. Use integers, not floating-point numbers, for byte and chunk positions. Since dwRate/dwScale is the number of samples per second, the time of each frame is:

 time = (dwSampleSize ? (bytepos / dwSampleSize) : chunkpos) * dwScale / dwRate

A running floating-point total will gradually drift into error.
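
A Python sketch of that calculation using only integer arithmetic. It assumes dwRate / dwScale is the number of samples per second, so a sample count is converted to seconds by multiplying by dwScale / dwRate. The function name sample_time_us and the choice of microseconds as the unit are assumptions made for this example.

```python
def sample_time_us(dw_scale, dw_rate, dw_sample_size, bytepos, chunkpos):
    """Return the presentation time in microseconds, computed with integers only."""
    # Fixed-size samples: position is counted in bytes; otherwise 1 chunk == 1 sample
    samples = bytepos // dw_sample_size if dw_sample_size else chunkpos
    # Recompute from the absolute position each time instead of accumulating
    return samples * dw_scale * 1_000_000 // dw_rate

# Usage: frame 30000 of a 30000/1001 fps video stream (dwSampleSize == 0)
video_us = sample_time_us(1001, 30000, 0, 0, 30000)
# Usage: 176400 bytes into 44100 Hz audio with 4-byte samples
audio_us = sample_time_us(1, 44100, 4, 176400, 0)
```

Deriving each timestamp from the absolute sample position, rather than adding a per-frame duration to a running total, keeps rounding error from building up over a long file.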

Quirks

  • Some AVI files are seen in the wild with the signature 'AVI\x19'. There is a 0x19 number in the fourth byte rather than a 0x20.
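
A tolerant signature check for this quirk might look like the following Python sketch. The function name is_avi_signature is hypothetical, and it assumes the form type sits at bytes 8 to 11 of the file, right after the RIFF fourcc and its length field.

```python
import struct

def is_avi_signature(header):
    """Accept both the normal 'AVI ' form type and the 'AVI\\x19' variant."""
    if len(header) < 12 or header[:4] != b"RIFF":
        return False
    return header[8:12] in (b"AVI ", b"AVI\x19")

normal = is_avi_signature(b"RIFF" + struct.pack("<I", 1024) + b"AVI ")
quirky = is_avi_signature(b"RIFF" + struct.pack("<I", 1024) + b"AVI\x19")
```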