This page explains some of the more esoteric terms and concepts you may come across in MPEG.
“Streams” in MPEG are actually called packetized elementary streams (PES), because the data is divided up into packets, each of which begins with a PES header. Each PES header can contain a presentation time stamp (PTS), which specifies the time at which that data is to be presented to the viewer, keeping it synchronized with the corresponding data from the other streams. A PES header may also carry a decoding time stamp (DTS) when data has to be decoded earlier than it is presented, as with reordered video frames.
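As a rough illustration of where the PTS lives, here is a minimal Python sketch that reads the start of a PES packet. It assumes MPEG-2 PES syntax (MPEG-1 lays out the optional fields differently) and a buffer that begins exactly at the 0x000001 start-code prefix; the names PesHeader, parse_pes_header and decode_timestamp are invented for this example. The 33-bit PTS is stored in 5 bytes, split into 3 + 15 + 15 bits with marker bits in between.

```python
from typing import NamedTuple, Optional

# Stream IDs whose PES packets have no optional header (and therefore no PTS):
# program_stream_map, padding_stream, private_stream_2, ECM, EMM, DSM-CC,
# H.222.1 type E and program_stream_directory.
NO_OPTIONAL_HEADER = {0xBC, 0xBE, 0xBF, 0xF0, 0xF1, 0xF2, 0xF8, 0xFF}


class PesHeader(NamedTuple):
    stream_id: int
    packet_length: int   # number of bytes that follow the length field
    pts: Optional[int]   # 33-bit PTS in 90 kHz ticks, or None if absent


def decode_timestamp(b: bytes) -> int:
    """Unpack a 33-bit timestamp stored in 5 bytes interleaved with marker bits."""
    return (((b[0] >> 1) & 0x07) << 30   # bits 32..30
            | b[1] << 22                  # bits 29..22
            | (b[2] >> 1) << 15           # bits 21..15
            | b[3] << 7                   # bits 14..7
            | b[4] >> 1)                  # bits 6..0


def parse_pes_header(data: bytes) -> PesHeader:
    """Parse the fixed part of an MPEG-2 PES header and, if present, its PTS."""
    if len(data) < 6 or data[:3] != b"\x00\x00\x01":
        raise ValueError("not the start of a PES packet")
    stream_id = data[3]
    packet_length = int.from_bytes(data[4:6], "big")
    pts = None
    if stream_id not in NO_OPTIONAL_HEADER:
        if len(data) < 9 or data[6] >> 6 != 0b10:
            raise ValueError("expected an MPEG-2 optional PES header")
        pts_dts_flags = data[7] >> 6   # '10' = PTS only, '11' = PTS and DTS
        if pts_dts_flags & 0b10:
            if len(data) < 14:
                raise ValueError("truncated PTS field")
            pts = decode_timestamp(data[9:14])   # PTS is the first optional field
    return PesHeader(stream_id, packet_length, pts)
```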
Not every PES packet has to have a PTS, but each stream must carry one at least every 0.7 seconds, so that the decoder can keep presentation in sync. The PTS is a 33-bit unsigned integer, in units of a 90kHz clock.
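Because the PTS is a 33-bit count of 90kHz ticks, it wraps around after 2^33 / 90000 ≈ 95444 seconds, about 26.5 hours, so differences between timestamps are best taken modulo 2^33. The sketch below (the helper names are hypothetical) does that and checks the 0.7-second rule, which works out to 63000 ticks. It assumes the values belong to one stream and are listed in presentation order; video with reordered frames would need sorting first.

```python
PTS_CLOCK_HZ = 90_000
PTS_MODULUS = 1 << 33                  # PTS is a 33-bit counter, so it wraps here
MAX_PTS_GAP = 7 * PTS_CLOCK_HZ // 10   # 0.7 s expressed in ticks (63000), integer math


def pts_delta(earlier: int, later: int) -> int:
    """Ticks elapsed from `earlier` to `later`, tolerating one wraparound."""
    return (later - earlier) % PTS_MODULUS


def pts_to_seconds(ticks: int) -> float:
    return ticks / PTS_CLOCK_HZ


def gaps_within_limit(pts_values: list[int]) -> bool:
    """True if successive PTS values of one stream are never more than 0.7 s apart."""
    return all(pts_delta(a, b) <= MAX_PTS_GAP
               for a, b in zip(pts_values, pts_values[1:]))


if __name__ == "__main__":
    print(pts_to_seconds(PTS_MODULUS) / 3600)   # roughly 26.5 hours before wraparound
    print(pts_delta(PTS_MODULUS - 3000, 3000))   # 6000 ticks, measured across the wrap
```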
In a typical movie file, packets from the different streams (say, video and audio) are interleaved, so that a video packet is usually followed closely by audio packets with around the same PTS. This interleaving or multiplexing of data from different streams minimizes the amount of buffering the decoder has to implement in order to provide smoothly synchronized playback.
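A simplified sketch of the interleaving idea: merge each stream's packets, already sorted by PTS, into a single sequence ordered by PTS. The Packet type and the interleave function are hypothetical, and real multiplexers also have to schedule packets around the decoder's buffer model and decode times, so treat this as an illustration only.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    pts: int        # 90 kHz ticks, assumed already unwrapped (no 33-bit rollover)
    stream: str     # illustrative label such as "video" or "audio"
    payload: bytes


def interleave(*streams: Iterable[Packet]) -> Iterator[Packet]:
    """Merge per-stream packet sequences, each already sorted by PTS, into one
    sequence sorted by PTS, so packets with similar times end up adjacent."""
    return heapq.merge(*streams, key=lambda p: p.pts)


video = [Packet(0, "video", b""), Packet(3003, "video", b""), Packet(6006, "video", b"")]
audio = [Packet(1000, "audio", b""), Packet(4000, "audio", b""), Packet(7000, "audio", b"")]
print([p.stream for p in interleave(video, audio)])
# ['video', 'audio', 'video', 'audio', 'video', 'audio']
```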
MPEG defines standard stream types for audio and video. Besides these, it also allows for padding streams (the contents of which are ignored by the decoder), and private streams, not further specified by MPEG itself. For instance, DVDVideo defines meanings for the contents of particular private streams.
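Which kind of stream a PES packet belongs to is given by the stream_id byte that follows its 0x000001 start-code prefix. The sketch below classifies the values for the standard audio and video streams and the padding and private streams, following the MPEG-2 Systems assignments; everything else (program stream map, conditional-access streams and so on) is lumped together as "other". The function name is made up for illustration.

```python
def classify_stream_id(stream_id: int) -> str:
    """Name the kind of PES stream identified by a stream_id byte."""
    if 0xC0 <= stream_id <= 0xDF:            # up to 32 MPEG audio streams
        return f"MPEG audio stream {stream_id & 0x1F}"
    if 0xE0 <= stream_id <= 0xEF:            # up to 16 MPEG video streams
        return f"MPEG video stream {stream_id & 0x0F}"
    return {
        0xBD: "private_stream_1",            # has the normal PES header, so may carry a PTS
        0xBE: "padding_stream",              # contents ignored by the decoder
        0xBF: "private_stream_2",            # no optional PES header, so no PTS
    }.get(stream_id, "other")
```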
An MPEG file/bytestream could just consist of an unadorned sequence of PES packets, but this is not usually done. Instead, MPEG is normally represented in either transport stream (MPEG-TS) or program stream (MPEG-PS) format.
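The two formats are easy to tell apart from their first bytes: a program stream begins with a pack start code (0x000001BA), while a transport stream has a 0x47 sync byte at the start of every 188-byte packet. This heuristic sketch (sniff_container is a hypothetical name) assumes the data starts on a packet boundary and uses plain 188-byte TS packets; variants with larger packets, such as 192-byte Blu-ray M2TS, would not be recognized.

```python
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47
PACK_START_CODE = b"\x00\x00\x01\xba"


def sniff_container(data: bytes, ts_packets_to_check: int = 3) -> str:
    """Guess whether a buffer holds a program stream, a transport stream
    or bare PES packets, by looking only at its first few bytes."""
    if data.startswith(PACK_START_CODE):
        return "MPEG-PS"        # program streams begin with a pack header
    span = TS_PACKET_SIZE * ts_packets_to_check
    if len(data) >= span and all(
            data[i] == TS_SYNC_BYTE for i in range(0, span, TS_PACKET_SIZE)):
        return "MPEG-TS"        # a 0x47 sync byte every 188 bytes
    if len(data) >= 4 and data[:3] == b"\x00\x00\x01" and data[3] >= 0xBC:
        return "bare PES"       # start-code prefix followed by a PES stream_id
    return "unknown"
```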
In transport stream format, the PES packets are sliced into small pieces, each of which is carried in a fixed-size, 188-byte transport-stream packet. This is intended for use in transmission over less-than-fully-reliable channels (e.g. broadcast over the air), where a momentary burst of interference or loss of signal will only lose some transport-stream packets.
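Each transport-stream packet starts with a 4-byte header. Its 13-bit PID says which stream the piece belongs to, and a 4-bit continuity counter lets a receiver notice when packets of that PID have gone missing. A minimal Python sketch of decoding that header, with invented function names, assuming a single aligned 188-byte packet:

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188


class TsHeader(NamedTuple):
    transport_error: bool
    payload_unit_start: bool        # set when a new PES packet begins in this payload
    pid: int                        # 13-bit ID of the stream this piece belongs to
    adaptation_field_control: int   # 1 = payload only, 2 = adaptation field only, 3 = both
    continuity_counter: int         # 4-bit, bumped for each payload-carrying packet of a PID


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4-byte header at the start of one transport-stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != 0x47:
        raise ValueError("not a 188-byte transport-stream packet")
    return TsHeader(
        transport_error=bool(packet[1] & 0x80),
        payload_unit_start=bool(packet[1] & 0x40),
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        adaptation_field_control=(packet[3] >> 4) & 0x03,
        continuity_counter=packet[3] & 0x0F,
    )


def packets_lost(prev_cc: int, cc: int) -> int:
    """Packets missing between two successive payload-carrying packets of one PID.
    Counts modulo 16 and does not handle the permitted duplicate-packet case."""
    return (cc - prev_cc - 1) % 16
```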
In program stream format, PES packets are grouped into packs, each beginning with a pack header. Calling this a “header” is a bit of a misnomer, though, since it is effectively a packet in its own right: in particular, it has no length field that covers any following PES packets (the pack effectively extends until the next pack header, or the end of the file/bytestream). The purpose of the pack header is to 1) carry a more precise clock reference, the system clock reference (SCR), in units of 27MHz instead of 90kHz (in MPEG-2; MPEG-1 pack headers carry only a 90kHz SCR), and 2) give the decoder some indication of the rate at which it will have to transfer data in order to play the movie.
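In an MPEG-2 pack header the SCR is split into a 33-bit base, counted in 90kHz ticks, and a 9-bit extension running from 0 to 299, so the full 27MHz value is base × 300 + extension; program_mux_rate gives the transfer rate in units of 50 bytes per second. The sketch below parses those fields with a small bit reader; the class and function names are hypothetical, and MPEG-1 pack headers, which use a different layout, are rejected.

```python
from typing import NamedTuple


class BitReader:
    """Reads big-endian bit fields, most significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


class PackHeader(NamedTuple):
    scr: int                    # system clock reference, in 27 MHz ticks
    mux_rate_bytes_per_s: int   # rate at which the decoder must accept data
    stuffing_length: int        # stuffing bytes that follow the fixed header


def parse_mpeg2_pack_header(data: bytes) -> PackHeader:
    """Parse the 14-byte fixed part of an MPEG-2 pack header."""
    if len(data) < 14 or data[:4] != b"\x00\x00\x01\xba":
        raise ValueError("not a pack header")
    r = BitReader(data[4:14])
    if r.read(2) != 0b01:
        raise ValueError("not MPEG-2 (an MPEG-1 pack header starts with '0010')")
    base = r.read(3) << 30      # SCR base, bits 32..30
    r.read(1)                   # marker bit
    base |= r.read(15) << 15    # bits 29..15
    r.read(1)                   # marker bit
    base |= r.read(15)          # bits 14..0
    r.read(1)                   # marker bit
    ext = r.read(9)             # SCR extension, 0..299
    r.read(1)                   # marker bit
    mux_rate = r.read(22)       # program_mux_rate, in units of 50 bytes/s
    r.read(2)                   # two marker bits
    r.read(5)                   # reserved
    stuffing = r.read(3)        # pack_stuffing_length
    # Each 90 kHz tick of the base equals 300 ticks of the 27 MHz clock.
    return PackHeader(base * 300 + ext, mux_rate * 50, stuffing)
```

For example, an SCR of 27000150 ticks corresponds to 27000150 / 27000000 ≈ 1.0000056 seconds.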
Optionally following the pack header, there can be a system header. This specifies such things as how many audio and video streams there are, how much bandwidth they might need, and whether audio and video are in fact synchronized to the system clock. The first pack in the file must have a system header. This can be repeated at intervals throughout the file, but it must always have the same contents.
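The system-header fields mentioned above map onto rate_bound (in units of 50 bytes per second), audio_bound and video_bound (the maximum numbers of audio and video streams), and the system_audio_lock_flag and system_video_lock_flag bits, followed by a per-stream decoder buffer size bound. A minimal parsing sketch, with invented names, assuming the header starts at the first byte of the buffer:

```python
from typing import NamedTuple


class SystemHeader(NamedTuple):
    rate_bound_bytes_per_s: int   # upper bound on the program mux rate
    audio_bound: int              # maximum number of active audio streams
    video_bound: int              # maximum number of active video streams
    audio_locked: bool            # system_audio_lock_flag
    video_locked: bool            # system_video_lock_flag
    buffer_bounds: dict           # stream_id -> decoder buffer size bound, in bytes


def parse_system_header(data: bytes) -> SystemHeader:
    """Parse a system header that starts at data[0] (start code 0x000001BB)."""
    if len(data) < 6 or data[:4] != b"\x00\x00\x01\xbb":
        raise ValueError("not a system header")
    length = int.from_bytes(data[4:6], "big")
    body = data[6:6 + length]
    if length < 6 or len(body) != length:
        raise ValueError("truncated system header")
    # rate_bound: 22 bits after a marker bit, in units of 50 bytes/s
    rate_bound = ((body[0] & 0x7F) << 15) | (body[1] << 7) | (body[2] >> 1)
    bounds = {}
    for off in range(6, length - 2, 3):     # one 3-byte entry per stream
        stream_id, hi, lo = body[off:off + 3]
        if not stream_id & 0x80:            # entries always start with a '1' bit
            break
        unit = 1024 if hi & 0x20 else 128   # P-STD_buffer_bound_scale
        bounds[stream_id] = (((hi & 0x1F) << 8) | lo) * unit
    return SystemHeader(
        rate_bound_bytes_per_s=rate_bound * 50,
        audio_bound=body[3] >> 2,
        video_bound=body[4] & 0x1F,
        audio_locked=bool(body[4] & 0x80),
        video_locked=bool(body[4] & 0x40),
        buffer_bounds=bounds,
    )
```

Because every repetition of the system header must have the same contents, checking that rule only needs a byte-for-byte comparison of each occurrence with the first.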
Note that, apart from the requirement for the presence of a system header, there is no special header at the start of an MPEG file/bytestream. This means it is in principle possible to concatenate two or more MPEG files together to achieve a playable result, provided that 1) there are no leftover bytes after the end of the last packet in each file, and 2) the player can cope with any resultant discontinuities in the progression of the PTS.
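Condition 1) can be checked by walking a program stream from unit to unit using the length information in each header and confirming the walk ends exactly at the last byte. The sketch below is hypothetical and assumes the stream contains only pack headers, system headers, PES packets and the program end code (0x000001B9); it says nothing about condition 2), PTS continuity.

```python
START = b"\x00\x00\x01"
PROGRAM_END_CODE = 0xB9
PACK_START_CODE = 0xBA
SYSTEM_HEADER_CODE = 0xBB


def walk_program_stream(data: bytes) -> list[tuple[int, int]]:
    """Step through a program stream unit by unit, returning (offset, code)
    pairs; raise ValueError if anything is left over or truncated."""
    units = []
    pos = 0

    def need(n: int) -> None:
        if pos + n > len(data):
            raise ValueError(f"truncated unit at offset {pos}")

    while pos < len(data):
        need(4)
        if data[pos:pos + 3] != START:
            raise ValueError(f"unexpected bytes at offset {pos}")
        code = data[pos + 3]
        units.append((pos, code))
        if code == PROGRAM_END_CODE:
            pos += 4
        elif code == PACK_START_CODE:
            need(5)
            if data[pos + 4] >> 6 == 0b01:            # MPEG-2 pack header
                need(14)
                pos += 14 + (data[pos + 13] & 0x07)   # plus pack_stuffing_length
            else:                                     # MPEG-1 pack header
                pos += 12
        elif code >= SYSTEM_HEADER_CODE:
            # The system header and every PES packet carry a 16-bit length
            # field counting the bytes that follow it.
            need(6)
            pos += 6 + int.from_bytes(data[pos + 4:pos + 6], "big")
        else:
            raise ValueError(f"unexpected start code 0x{code:02X} at offset {pos}")
    if pos != len(data):
        raise ValueError("last unit runs past the end of the data")
    return units
```

If walk_program_stream succeeds on each file separately, it will also succeed on their concatenation (a + b), since each walk finishes exactly on a unit boundary.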