Penguin

Video compression algorithms as used in multimedia formats like MPEG are invariably lossy—that is, they throw away information that the human eye is less likely to notice.

In the beginning were compression algorithms for single image frames, like JPEG. JPEG could achieve roughly a 10:1 reduction in the size of photorealistic images with no noticeable loss of visible quality, which at the time (1990) seemed pretty miraculous.

The original MPEG-1 spec used a JPEG-derived algorithm for compressing video frames, and built on this to add compression in the temporal dimension as well, by storing differences between successive video frames instead of full frames. This gave another 10:1 reduction factor in data size.
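To get a feel for what two stacked 10:1 reductions mean, here is a rough back-of-the-envelope sketch. The frame size, frame rate and 12 bits per pixel are my own illustrative assumptions (they are not from the figures above), and the two 10:1 factors are simply taken at face value and multiplied.

```python
def raw_bitrate(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bitrate in bits per second.

    bits_per_pixel=12 assumes 4:2:0 chroma subsampling (an assumption
    made for this illustration only).
    """
    return width * height * bits_per_pixel * fps


def compressed_bitrate(raw, spatial_ratio=10, temporal_ratio=10):
    """Apply the two round-number reduction factors one after the other."""
    return raw / (spatial_ratio * temporal_ratio)


if __name__ == "__main__":
    raw = raw_bitrate(352, 240, 30)  # hypothetical small-format video
    print(f"raw:        {raw / 1e6:.1f} Mbit/s")
    print(f"compressed: {compressed_bitrate(raw) / 1e6:.2f} Mbit/s")
```

Real encoders do not hit these round numbers exactly; the point is only that the spatial and temporal factors multiply.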

But not all frames can be expressed as differences: you have to have a full frame as a starting point. This is called an I-frame (because it only uses intraframe compression). It can be followed by one or more P-frames (predicted frames, each expressed as a difference from the preceding I- or P-frame). The sequence starting with an I-frame, and continuing with all the frames that depend on it, is called a Group Of Pictures or “GOP”.
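The I-frame/P-frame idea can be sketched with a toy example. This is a deliberately simplified, lossless illustration: frames are flat lists of pixel values, and the function names are made up for this sketch. A real MPEG encoder uses motion-compensated block prediction and then lossy compression of the residual, none of which is modelled here.

```python
def encode_gop(frames):
    """Store the first frame whole (I) and the rest as differences (P)."""
    encoded = [("I", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        # Each P-frame holds only the per-pixel change from the previous frame.
        encoded.append(("P", [c - p for p, c in zip(prev, cur)]))
    return encoded


def decode_gop(encoded):
    """Rebuild the frames; every P-frame needs the frame decoded before it."""
    frames = []
    for kind, data in encoded:
        if kind == "I":
            frames.append(list(data))
        else:
            frames.append([p + d for p, d in zip(frames[-1], data)])
    return frames


if __name__ == "__main__":
    frames = [[10, 10, 10, 10], [10, 11, 10, 10], [10, 11, 12, 10]]
    encoded = encode_gop(frames)
    print(encoded)
    print(decode_gop(encoded) == frames)
```

When little changes between frames, the P-frame data is mostly zeros, which is what makes it cheap to store. It also shows why a P-frame is useless on its own: `decode_gop` cannot rebuild it without the frame before it.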

To compress things even further, the GOP can contain B-frames (bidirectional frames), each of which is expressed as a difference from both a preceding and a following I- or P-frame.

That is, preceding and following in time. However, to ease the job of the decoder, the B-frame occurs in the video stream after both of the frames that it depends on. Thus, the temporal order of display of a B-frame between an I-frame and a P-frame might be IBP, but the order in which they appear in the stream is IPB. The decoder knows that any B-frame is always to be displayed before the I- or P-frame that most recently preceded it in the stream.
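The reordering described above can be sketched like this. Frames are represented just by labels such as "I0" or "B1" (my own notation, where the number is the display position), and the function names are hypothetical; this is an illustration of the ordering rule, not of any real encoder's API.

```python
def display_to_stream(display_order):
    """Reorder frames from display order to stream order.

    Each B-frame is held back until the following I- or P-frame,
    which it depends on, has been written to the stream.
    """
    stream, pending_b = [], []
    for frame in display_order:
        if frame[0] == "B":
            pending_b.append(frame)
        else:
            stream.append(frame)
            stream.extend(pending_b)
            pending_b.clear()
    # Trailing B-frames with no following I/P frame are appended unchanged
    # in this toy version.
    stream.extend(pending_b)
    return stream


def stream_to_display(stream_order):
    """Decoder side: show each B-frame before the last preceding I/P frame."""
    display, held = [], None
    for frame in stream_order:
        if frame[0] == "B":
            display.append(frame)
        else:
            if held is not None:
                display.append(held)
            held = frame  # hold it until we know no B-frames precede it
    if held is not None:
        display.append(held)
    return display


if __name__ == "__main__":
    shown = ["I0", "B1", "B2", "P3", "B4", "P5"]
    sent = display_to_stream(shown)
    print(sent)
    print(stream_to_display(sent) == shown)
```

The decoder side only ever needs to hold back one I- or P-frame at a time, which is the sense in which the stream order makes its job easier.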

The optimal length of a GOP is a tradeoff. The more frames it has, the better the compression (at least until you hit something like a scene change, where the differencing is no longer effective). But if you want to do random access or trick play, where you skip forward or backward through the video, you can really only skip to the I-frames, since the other frames cannot be decoded without starting from them. Errors are the other cost. You may have seen a digital TV broadcast where part of the screen is suddenly filled with green squares that take an appreciable fraction of a second to disappear. This happens when the received signal has lost part or all of an I-frame, so the decoder substitutes the green squares for the missing data until the next I-frame comes along.
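The random-access restriction can be sketched as a lookup: given the frame types in stream order, a seek has to land on the nearest I-frame at or before the frame you wanted. The function name and the list-of-letters representation are assumptions made for this illustration.

```python
from bisect import bisect_right


def seek_target(frame_types, wanted):
    """Return the index of the last I-frame at or before position `wanted`.

    frame_types is a sequence such as ["I", "P", "P", "I", ...] in stream
    order. Decoding has to start from the returned index.
    """
    i_positions = [i for i, kind in enumerate(frame_types) if kind == "I"]
    k = bisect_right(i_positions, wanted)
    if k == 0:
        raise ValueError("no I-frame at or before the requested position")
    return i_positions[k - 1]


if __name__ == "__main__":
    types = list("IPPPIPPP")
    print(seek_target(types, 6))  # lands on the I-frame at index 4
```

The longer the GOP, the further back this lookup may have to go, and the more frames have to be decoded and thrown away before the wanted one can be shown.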

For all these reasons, the usual length of a GOP is enough frames to make up 0.5 to 1 second of video.
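Converting that duration into a frame count is simple arithmetic. In this small sketch the function name is made up, and the frame rates used are just common examples.

```python
import math


def gop_length_range(fps, min_seconds=0.5, max_seconds=1.0):
    """Return (shortest, longest) GOP length in whole frames.

    The shortest length is rounded up and the longest rounded down, so
    both stay inside the 0.5 to 1 second window.
    """
    return math.ceil(fps * min_seconds), math.floor(fps * max_seconds)


if __name__ == "__main__":
    for fps in (25, 30):
        print(fps, gop_length_range(fps))
```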