Ultimately, encoded video data needs to be decoded into 2-dimensional arrays of pixel values and presented to the user (or perhaps transcoded to a different format). All of these frames look complete to the user, but a frame often cannot stand by itself; it usually needs information from other frames in order to make its presentation complete.
First, some basic video frame terminology: there is the intraframe, also known as a keyframe. An intraframe is one that can stand on its own. It requires no other frames; it carries all the information needed to decode it.
This definition of intraframe implies that there is a frame type that does depend on another type of frame. What kind of frame depends on other frames? The interframe, which is the opposite of intraframe. An interframe is a frame that has a data dependency on another frame. Usually the interframe depends on the previous frame. In some more complex codecs, an interframe may depend on frames from several frames ago, or the previous intraframe. In some cases, the interframe may depend on frames from the future.
What does it mean for an interframe to depend on another frame? Imagine frames #0 and #1, which happen to be exactly identical. Frame #0 will be the intraframe and frame #1 will be the subsequent interframe. A certain amount of information is required to encode frame #0. If frame #1 were also coded as an intraframe, it would require the same amount of information as frame #0. But since the encoder can see that frame #1 is the same as frame #0, it can instead write a series of codes indicating that parts of frame #1 (or perhaps the entire frame) are identical to the preceding frame (frame #0). This saves information. Or perhaps every pixel of frame #1 changed from frame #0, but only marginally, as when the entire frame grows a little brighter. The encoder can transmit difference information that instructs the decoder to add small values to the pixels from frame #0 to form frame #1. Remember that, in general, lots of small numbers compress better than lots of big numbers. Thus, transmitting the delta information in this case tends to be more efficient than transmitting the quasi-random distribution of values that represents an entire video frame.
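The space savings from delta coding can be demonstrated with a toy sketch (the frame contents and sizes here are invented for illustration): compressing the small per-pixel differences between two similar frames takes far fewer bytes than compressing the second frame outright.

```python
import random
import zlib

random.seed(1)

# frame0: a quasi-random 64x64 8-bit "image"
frame0 = bytes(random.randrange(256) for _ in range(64 * 64))

# frame1: the same scene, every pixel slightly brighter (clamped to 255)
frame1 = bytes(min(p + 3, 255) for p in frame0)

# delta: mostly the value 3, with a few smaller values where clamping occurred
delta = bytes((b - a) % 256 for a, b in zip(frame0, frame1))

intra_size = len(zlib.compress(frame1))   # coding frame1 entirely by itself
inter_size = len(zlib.compress(delta))    # coding only the differences

print(intra_size, inter_size)
assert inter_size < intra_size            # the delta compresses far better
```

The quasi-random frame is essentially incompressible, while the delta, consisting almost entirely of the value 3, collapses to a handful of bytes.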
Here are some things to understand about the general trade-offs involved in using different frame types:
- intraframes use relatively more space; interframes use relatively less space
- intraframes are relatively quicker to decode; interframes are relatively slower to decode
- an application can randomly seek to intraframes, but not to interframes, at least not with clean video, since interframes do not carry all the information needed to reconstruct a complete frame
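The seeking trade-off above can be sketched in a few lines: to show an arbitrary frame, a player jumps to the nearest intraframe at or before the target, then decodes forward. The frame-type string below is an invented example.

```python
def seek_start(frame_types, target):
    """Return the index of the intraframe to start decoding from."""
    for i in range(target, -1, -1):   # scan backward from the target
        if frame_types[i] == "I":
            return i
    raise ValueError("no keyframe at or before target")

types = list("IPPPPIPPPP")
print(seek_start(types, 7))  # → 5: decode frames 5 and 6 before showing 7
print(seek_start(types, 3))  # → 0
```

The denser the keyframes, the cheaper the seek, which is exactly the space-versus-seekability trade-off listed above.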
Further, here is some terminology and concepts for understanding basic MPEG video technology. There are 3 frame types:
- I-frame: This is an intraframe, coded completely by itself
- P-frame: This is a predicted frame which requires information from the previous I-frame or P-frame, which may or may not be the frame directly preceding it
- B-frame: This is a bidirectionally predicted frame and requires information from the surrounding I- and P-frames
MPEG videos contain frame sequences of the following sort:
0 1 2 3 4 5 6 7 8 9
I B B P B B P B B I ...
Note that those first 2 B-frames (#1 and 2) actually require information from the future (frame #3) as well as from the past (frame #0). How is this achieved? By transmitting the P-frame earlier than the B-frames:
0 3 1 2 6 4 5 9 7 8
I P B B P B B I B B ...
Thus, the decoder decodes frame #0 (I-frame) first and has it ready to display. It decodes frame #3 (P-frame) next and keeps it handy for decoding frames #1 and 2 (the B-frames). Data from frames #0 and 3 are used for decoding frames #1 and 2. Frames #1 and 2 are now ready for display, followed by frame #3. Frame #6 (another P-frame) is decoded next since it and frame #3 (the preceding P-frame) are both required for decoding frames #4 and 5. And on it goes.
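The reordering the decoder performs can be sketched as follows (a minimal model, not real MPEG machinery): an arriving I- or P-frame flushes the previously held anchor frame for display, while B-frames are displayed as soon as they are decoded. Frame numbers are display-order indices.

```python
def display_order(coded):
    """coded: list of (display_index, frame_type) in transmission order."""
    out, held_anchor = [], None
    for idx, ftype in coded:
        if ftype in ("I", "P"):
            if held_anchor is not None:
                out.append(held_anchor)   # previous anchor is now displayable
            held_anchor = idx             # hold the new anchor for later
        else:                             # B-frame: display immediately
            out.append(idx)
    if held_anchor is not None:
        out.append(held_anchor)           # flush the final anchor
    return out

coded = [(0, "I"), (3, "P"), (1, "B"), (2, "B"),
         (6, "P"), (4, "B"), (5, "B"),
         (9, "I"), (7, "B"), (8, "B")]
print(display_order(coded))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```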
Another important point is that no frames ever depend on the B-frames in MPEG; no other frame ever references them. What is the consequence of this? B-frames can be skipped. Consider the transmission sequence of frames again:
0 3 1 2 6 4 5 9 7 8
I P B B P B B I B B ...
Imagine that the computer performing the decoding is bogged down with other tasks, or is just not up to the decoding task in the first place. It decodes frames #0, 3, 1, 2, and 6 but is getting too far behind schedule. The decoder has the option of skipping some B-frames, which is especially useful considering that B-frames generally take the most time to decode. This is a feature missing from a lot of video codecs which only provide the conceptual equivalent of I- and P-frames:
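Since nothing references a B-frame, a lagging decoder can simply discard B-frames from the coded stream and still decode every remaining frame correctly. A minimal sketch, reusing the (display_index, frame_type) representation from above:

```python
def drop_b_frames(coded):
    """Keep only the frames that other frames may depend on (I and P)."""
    return [(idx, t) for idx, t in coded if t != "B"]

coded = [(0, "I"), (3, "P"), (1, "B"), (2, "B"),
         (6, "P"), (4, "B"), (5, "B")]
print(drop_b_frames(coded))  # → [(0, 'I'), (3, 'P'), (6, 'P')]
```

The surviving I- and P-frames form a complete, decodable (if choppier) stream, because every P-frame's reference is still present.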
0 1 2 3 4 5 6 7 8 9
I P P P P P P P P I ...
If, by frame #4, decoding is behind schedule, the best the decoder can hope to do is drop frames #5-8 and re-sync the video at frame #9, and that assumes the video/container format supports the notion of keyframes. It is generally not acceptable in this situation to drop frame #5 and go on to decode frame #6; frame #6 assumes that frame #5 has been decoded, when in fact frame #4 was the most recently decoded frame.
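In such an I/P-only stream, dropping any frame invalidates every subsequent P-frame until the next keyframe, so a lagging decoder has to re-synchronize there. A sketch, with an invented ten-frame type sequence:

```python
def next_resync(frame_types, dropped_at):
    """First index >= dropped_at where decoding can resume after a drop."""
    for i in range(dropped_at, len(frame_types)):
        if frame_types[i] == "I":
            return i          # a keyframe needs nothing that was dropped
    return None               # no later keyframe: nothing more is decodable

types = list("IPPPPPPPPI")
print(next_resync(types, 5))  # → 9: frames 5-8 are all lost with frame 5
```

Compare this with the B-frame case above, where only the dropped frames themselves are lost.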
What A Multimedia Hacker Needs To Know
Many MPEG-type codecs use I-, P-, and B-frames. Most non-MPEG codecs only use the conceptual equivalents of I- and P-frames. Sometimes, a codec will use unidirectional B-frames: a frame that, like a P-frame, does not use data from any future frame, but that no other frames depend on. The fundamental property of B-frames is that they can be dropped without affecting the correct decoding of other frames.
Many lossless video codecs, which are often designed for archival or editing, are intraframe-only. The consequence of this is that any frame can be accessed directly, without decoding any other frames first.
When developing a new video decoder, it helps to focus on I-frames first, as they are the least complex.
- Part 1 of the MPEG FAQ has a good discussion of the MPEG frame types ("Q. So is each frame predicted from the last frame?")