Indeo IVF

From MultimediaWiki
Revision as of 09:35, 11 October 2022 by Kostya
  • Company: Intel
  • Extension: ivf

This is a streaming format created by Intel to encapsulate its Indeo codecs (probably just Indeo 5 with Indeo Audio or Intel Music Coder).

File format

All data is little-endian.

  16 bytes - GUID (1981ef50-bdb3-11d0-a3e5-00a0c9244436 or 1981ef50-bdb3-11d0-a3e5-00a0c9244437)
   4 bytes - container flags (1 - audio stream present)
 156 bytes - container header
 140 bytes - video stream header (almost the same as in AVI)
 140 bytes - (optional) audio stream header (almost the same as in AVI)
   4 bytes - size of video stream information
   N bytes - video stream information (BITMAPINFOHEADER)
   4 bytes - (optional) size of audio stream information
   M bytes - (optional) audio stream information (WAVEFORMATEX)
 4*X bytes - 32-bit full sizes for each video frame
 128 bytes - (only for version 1 with GUID ending with '7') unknown
   4 bytes - container description length
   D bytes - container description (ASCIIZ)
 chunks
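
As a rough illustration of the layout above, the following Python sketch checks the GUID and computes where the fixed-size headers begin. It assumes the GUID is stored in the usual Windows mixed-endian byte order (which is what `uuid.UUID(bytes_le=...)` decodes) and that the audio flag is bit 0 of the flags dword. The function name `parse_prefix` and the dictionary keys are illustrative only.

```python
import struct
import uuid

# The two GUIDs listed in the file layout.
IVF_GUIDS = {
    uuid.UUID("1981ef50-bdb3-11d0-a3e5-00a0c9244436"),
    uuid.UUID("1981ef50-bdb3-11d0-a3e5-00a0c9244437"),
}
FLAG_AUDIO = 1               # "1 - audio stream present"
CONTAINER_HEADER_SIZE = 156
STREAM_HEADER_SIZE = 140


def parse_prefix(data: bytes) -> dict:
    """Locate the fixed-size leading structures of an Indeo IVF file."""
    if len(data) < 20:
        raise ValueError("file too short for GUID and flags")
    # Assumption: Windows GUID byte order (first three fields little-endian).
    guid = uuid.UUID(bytes_le=data[:16])
    if guid not in IVF_GUIDS:
        raise ValueError(f"not an Indeo IVF file: {guid}")
    (flags,) = struct.unpack_from("<I", data, 16)
    # Assumption: the audio indicator is a bit flag rather than an exact value.
    has_audio = bool(flags & FLAG_AUDIO)

    pos = 20
    container_header_offset = pos
    pos += CONTAINER_HEADER_SIZE
    video_header_offset = pos
    pos += STREAM_HEADER_SIZE
    audio_header_offset = None
    if has_audio:
        audio_header_offset = pos
        pos += STREAM_HEADER_SIZE
    return {
        "guid": guid,
        "has_audio": has_audio,
        "container_header_offset": container_header_offset,
        "video_header_offset": video_header_offset,
        "audio_header_offset": audio_header_offset,
        # The size-prefixed stream information blocks start here.
        "stream_info_offset": pos,
    }
```

The variable-length parts (stream information, frame size table, description) follow at `stream_info_offset` and have to be read sequentially because each one is size-prefixed or depends on earlier fields.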

Container header format

 4 bytes - number of audio frames
 4 bytes - unknown
 4 bytes - unknown
 4 bytes - audio frame size
 4 bytes - total file size
 the rest is unknown
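
A minimal Python sketch of reading the known fields listed above from the 156-byte container header. The names `ContainerHeader` and `parse_container_header` are hypothetical; the unknown fields are kept as raw values so nothing is silently discarded.

```python
import struct
from typing import NamedTuple

CONTAINER_HEADER_SIZE = 156


class ContainerHeader(NamedTuple):
    audio_frames: int       # number of audio frames
    unknown1: int
    unknown2: int
    audio_frame_size: int
    file_size: int          # total file size
    rest: bytes             # remaining bytes, meaning unknown


def parse_container_header(buf: bytes) -> ContainerHeader:
    """Decode the five documented little-endian dwords of the header."""
    if len(buf) != CONTAINER_HEADER_SIZE:
        raise ValueError("container header must be 156 bytes")
    fields = struct.unpack_from("<5I", buf, 0)
    return ContainerHeader(*fields, rest=buf[20:])
```

Comparing `file_size` against the actual size of the file on disk is a cheap sanity check for truncated downloads.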

Stream header format

  4 bytes - stream type ("vids" or "auds")
  4 bytes - handler FOURCC
  4 bytes - flags
  2 bytes - stream priority
  2 bytes - language
  4 bytes - initial frames
  4 bytes - timebase numerator
  4 bytes - timebase denominator
  4 bytes - start offset
  4 bytes - stream duration
  4 bytes - unknown
  4 bytes - suggested buffer size
  4 bytes - unknown
  4 bytes - unknown
 16 bytes - bounding rectangle (in RECT format)
  4 bytes - unknown
  4 bytes - unknown
 24 bytes - stream description (ASCIIZ)
 the rest probably does not matter

Chunk format

Each chunk starts with an 8-byte header:

 4 bytes - frame and stream number
 4 bytes - chunk size

The stream number is the low bit of the first dword; it is set for the video stream.
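
A small Python sketch of decoding the chunk header. The page only states that the low bit selects the stream; treating the remaining 31 bits as the frame number is an assumption, as are the names `ChunkHeader` and `parse_chunk_header`.

```python
import struct
from typing import NamedTuple


class ChunkHeader(NamedTuple):
    frame: int       # assumed: upper 31 bits of the first dword
    is_video: bool   # low bit set means video stream
    size: int        # chunk size


def parse_chunk_header(buf: bytes, offset: int = 0) -> ChunkHeader:
    """Decode the 8-byte little-endian chunk header at the given offset."""
    ident, size = struct.unpack_from("<II", buf, offset)
    return ChunkHeader(frame=ident >> 1, is_video=bool(ident & 1), size=size)
```

For instance, a first dword of 15 (binary 1111) would decode as video frame 7 under this assumption.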

Audio chunks contain complete audio frames, while video data can be spread across several layers for scalability.

For example, in the known stream the first pass transmits only some bands of the intra frames (interleaved with audio frames), while for the remaining frames only a 2-byte drop-frame code is transmitted. A second pass of video data then transmits inter frame data for every third frame. A third pass transmits all droppable inter frames plus additional data for the intra and inter frames that are already present. The final pass carries the remaining data for all video frames.
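
Because the data for one video frame can arrive in several passes, a reader has to accumulate chunks per frame. The Python sketch below only counts bytes and compares them with the table of full frame sizes from the file header. It rests on several unverified assumptions: the chunk size counts only the payload (not the 8-byte header), the frame number is the first dword shifted right by one bit, frame numbers start at 0 and index the size table, and the full size equals the sum of all payloads for that frame. All function names are hypothetical.

```python
import struct
from collections import defaultdict


def video_bytes_per_frame(chunks: bytes) -> dict:
    """Sum the video payload bytes seen for each frame number."""
    received = defaultdict(int)
    pos = 0
    while pos + 8 <= len(chunks):
        ident, size = struct.unpack_from("<II", chunks, pos)
        pos += 8
        if pos + size > len(chunks):
            break  # truncated final chunk: stop without counting it
        if ident & 1:  # low bit set: video stream
            received[ident >> 1] += size
        pos += size  # assumed: size covers the payload only
    return dict(received)


def incomplete_frames(received: dict, full_sizes: list) -> list:
    """Return indices of frames with less data than their full size."""
    return [i for i, full in enumerate(full_sizes)
            if received.get(i, 0) < full]
```

Run on a partially downloaded file, this would report which frames still lack their later layers, which is the situation the layered layout appears to be designed for.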