- Company: Intel
- Extension: ivf
All data is little-endian.
- 16 bytes - GUID (1981ef50-bdb3-11d0-a3e5-00a0c9244436 or 1981ef50-bdb3-11d0-a3e5-00a0c9244437)
- 4 bytes - container flags (1 - audio stream present)
- 156 bytes - container header
- 140 bytes - video stream header (almost the same as in AVI)
- 140 bytes - (optional) audio stream header (almost the same as in AVI)
- 4 bytes - size of video stream information
- N bytes - video stream information (BITMAPINFOHEADER)
- 4 bytes - (optional) size of audio stream information
- M bytes - (optional) audio stream information (WAVEFORMATEX)
- 4*X bytes - 32-bit full sizes for each video frame
- 128 bytes - (only for version 1 with GUID ending with '7') unknown
- 4 bytes - container description length
- D bytes - container description (ASCIIZ)
- chunks
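As a minimal sketch of reading the first two fields, the Python snippet below checks the GUID and extracts the audio flag. It assumes the GUID is stored in the usual Microsoft mixed-endian layout (which `uuid.UUID(bytes_le=...)` decodes) and that "audio stream present" is bit 0 of the flags; `parse_ivf_prefix` and `FLAG_HAS_AUDIO` are hypothetical names chosen for illustration.

```python
import struct
import uuid

# The two GUIDs listed above.
IVF_GUIDS = {
    uuid.UUID("1981ef50-bdb3-11d0-a3e5-00a0c9244436"),
    uuid.UUID("1981ef50-bdb3-11d0-a3e5-00a0c9244437"),
}
FLAG_HAS_AUDIO = 1  # assumption: treated as a bit mask


def parse_ivf_prefix(data):
    """Return (guid, flags, has_audio) from the first 20 bytes of a file."""
    if len(data) < 20:
        raise ValueError("need at least 20 bytes")
    # Assumption: on-disk GUID uses the Microsoft little-endian field layout.
    guid = uuid.UUID(bytes_le=bytes(data[:16]))
    if guid not in IVF_GUIDS:
        raise ValueError("unexpected GUID: %s" % guid)
    (flags,) = struct.unpack_from("<I", data, 16)
    return guid, flags, bool(flags & FLAG_HAS_AUDIO)
```

The container header would then start at offset 20, immediately after these two fields.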
Container header format
- 4 bytes - number of audio frames
- 4 bytes - unknown
- 4 bytes - unknown
- 4 bytes - audio frame size
- 4 bytes - total file size
- the rest is unknown
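The known part of the container header can be unpacked as five little-endian dwords. The sketch below assumes the fields are unsigned and that the header starts at offset 20 (16-byte GUID plus 4-byte flags); `ContainerHeader` and `parse_container_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

CONTAINER_HEADER_SIZE = 156


class ContainerHeader(NamedTuple):
    audio_frames: int
    unknown1: int
    unknown2: int
    audio_frame_size: int
    file_size: int


def parse_container_header(buf, offset=20):
    """Decode the five known dwords; the remaining bytes are ignored."""
    if len(buf) < offset + CONTAINER_HEADER_SIZE:
        raise ValueError("truncated container header")
    # Assumption: all five fields are unsigned 32-bit integers.
    return ContainerHeader(*struct.unpack_from("<5I", buf, offset))
```

Comparing `file_size` with the actual size of the file is a cheap sanity check for a truncated download.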
Stream header format
- 4 bytes - stream type ("vids" or "auds")
- 4 bytes - handler FOURCC
- 4 bytes - flags
- 2 bytes - stream priority
- 2 bytes - language
- 4 bytes - initial frames
- 4 bytes - timebase numerator
- 4 bytes - timebase denominator
- 4 bytes - start offset
- 4 bytes - stream duration
- 4 bytes - unknown
- 4 bytes - suggested buffer size
- 4 bytes - unknown
- 4 bytes - unknown
- 16 bytes - bounding rectangle (in RECT format)
- 4 bytes - unknown
- 4 bytes - unknown
- 24 bytes - stream description (ASCIIZ)
- the rest probably does not matter
Each chunk starts with an 8-byte header:
- 4 bytes - frame and stream number
- 4 bytes - chunk size
The stream number is the low bit of the first dword; it is set for the video stream.
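A chunk header can be decoded as sketched below. Only the low bit (the stream number) is documented; treating the remaining 31 bits as the frame number is an assumption made for illustration, and `parse_chunk_header` is a hypothetical name.

```python
import struct

CHUNK_HEADER_SIZE = 8


def parse_chunk_header(buf, offset=0):
    """Split the 8-byte chunk header into stream bit, frame number and size."""
    if len(buf) < offset + CHUNK_HEADER_SIZE:
        raise ValueError("truncated chunk header")
    ident, size = struct.unpack_from("<II", buf, offset)
    return {
        "is_video": bool(ident & 1),  # low bit set means video stream
        "frame": ident >> 1,          # assumption: upper bits hold the frame number
        "size": size,
    }
```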
Audio chunks contain complete audio frames, while video data can be spread across several layers for scalability.
For example, in the known stream, only some bands of intra frames are transmitted at first (interleaved with audio frames), while every other frame is represented by just a 2-byte drop-frame code. A second pass of video data then transmits inter-frame data for every third frame. A third pass transmits all droppable inter frames plus additional data for the intra and inter frames already present. Finally, the remaining data for all video frames is transmitted.
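Because data for one video frame can arrive in several passes, a demuxer has to gather the pieces per frame before handing them to a decoder. The sketch below groups video payloads by frame number in arrival order. It assumes the frame number is the first dword shifted right by one bit, and it says nothing about how a decoder expects the pieces to be combined; `collect_video_layers` is a hypothetical helper.

```python
from collections import defaultdict


def collect_video_layers(chunks):
    """Group video payloads by frame number, preserving file order.

    chunks: iterable of (first_dword, payload) pairs as read from the file.
    """
    frames = defaultdict(list)
    for first_dword, payload in chunks:
        if first_dword & 1:  # low bit set: video stream
            # Assumption: the remaining bits are the frame number.
            frames[first_dword >> 1].append(payload)
    return dict(frames)
```

With this grouping, a frame that received a drop-frame code in the first pass and real data in a later pass ends up with both pieces listed under the same frame number.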