H.264, also known as AVC (Advanced Video Coding), was introduced in 2003 as Part 10 of the MPEG-4 standard.
H.264, like MPEG-2 and MPEG-4, uses a differential compression technique, i.e. the current image is created from one or several previous images and the differences between those images over time. However, H.264 improves significantly on its predecessors. On one hand, it demands considerably more processing power during encoding, but on the other hand, it reduces the bit rate without affecting the image quality.
A key element of H.264 compression is image prediction (inter-frame prediction). The next image to be coded is predicted from previously coded and decoded images. The prediction works in the same way in the coder and the decoder, so the decoder can restore the encoded image from the prediction error, which the coder determines as the difference between the original image and its prediction. The coder therefore sends only the prediction errors rather than the subsequent images themselves. With suitable algorithms, these errors contain little data and can be coded with fewer bits.
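The prediction-error scheme described above can be sketched in a few lines. This is a minimal illustration, not the H.264 algorithm: the predictor simply assumes the next image equals the previous decoded one, images are short lists of sample values, and the function names (predict, encode, decode) are hypothetical. A real coder also transforms and quantizes the error, which is omitted here.

```python
def predict(reference):
    # Toy predictor (an assumption for illustration): the next image
    # is expected to look exactly like the previous decoded image.
    return list(reference)


def encode(current, reference):
    # The coder sends only the prediction error (residual).
    prediction = predict(reference)
    return [c - p for c, p in zip(current, prediction)]


def decode(residual, reference):
    # The decoder forms the same prediction and adds the received error.
    prediction = predict(reference)
    return [p + r for p, r in zip(prediction, residual)]


previous = [100, 102, 104, 106]   # previously decoded image (sample values)
current = [101, 102, 105, 106]    # image to be coded

residual = encode(current, previous)
restored = decode(residual, previous)
```

Because both sides compute the same prediction, the small residual values are enough for the decoder to restore the image.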
H.264 compression includes 3 types of frames: I – Intra-coded, P – Predictive, B – Bi-predictive.
Fig. 1. Example notation of a frame sequence including three types of frames
t - Time
I frames contain complete image data. P frames contain information about changes relative to a preceding I or P frame, and the resulting image is created from that information. B frames are images coded using two reference images, one before and one after the coded image in the video sequence. In B frames, the most similar blocks (macroblocks) are selected in the reference images, or the block is predicted as the average of the blocks from both reference images. The size of a frame depends on many factors. As a rule of thumb, a P frame is approx. 60% and a B frame approx. 10% of the size of an I frame. The more B frames in the video sequence, the higher the compression ratio. This does not necessarily translate into a deterioration of the image quality.
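The effect of the frame mix on the data volume can be estimated with simple arithmetic, using the approximate ratios quoted above (P = 60% and B = 10% of an I frame). The two 12-frame sequences below are hypothetical examples chosen for illustration, and the ratios are rough assumptions rather than guaranteed values.

```python
# Approximate frame sizes relative to an I frame (rule-of-thumb values).
RELATIVE_SIZE = {"I": 1.0, "P": 0.6, "B": 0.1}


def average_relative_size(sequence):
    # Mean frame size of the sequence, expressed as a fraction of an I frame.
    return sum(RELATIVE_SIZE[frame] for frame in sequence) / len(sequence)


ip_only = "IPPPPPPPPPPP"   # 1 I frame followed by 11 P frames
with_b = "IBBPBBPBBPBB"    # 1 I frame, 3 P frames, 8 B frames

print(round(average_relative_size(ip_only), 2))
print(round(average_relative_size(with_b), 2))
```

Under these assumptions the sequence without B frames averages about 0.63 of an I frame per frame, while the sequence with B frames averages about 0.30, i.e. less than half the data for the same number of frames.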
The quality of the three compression formats is shown below as the peak signal-to-noise ratio (PSNR) as a function of the bit rate.
Fig. 2. Comparison of H.264, MPEG-4 and JPEG formats
X - Bit rate
Y - Peak signal-to-noise ratio (PSNR)
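For reference, PSNR is commonly computed from the mean squared error between the original and the decoded image. The sketch below assumes 8-bit samples (peak value 255) stored as flat lists; the function name psnr and its parameters are illustrative only.

```python
import math


def psnr(original, decoded, peak=255):
    # PSNR in dB between two equally sized lists of sample values.
    if len(original) != len(decoded) or not original:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(peak ** 2 / mse)


original = [100, 100, 100, 100]
decoded = [101, 99, 101, 99]      # every sample is off by one
value = psnr(original, decoded)   # about 48.13 dB
```

A higher PSNR at the same bit rate means the decoded image is closer to the original.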
The improvements and modifications introduced in H.264, compared with the legacy standards that use hybrid coding with inter-frame prediction and motion compensation, are listed below.
1. Variable block size for motion compensation. Motion can be compensated not only for whole macroblocks but also for their parts, each of which is assigned its own motion vector. The smallest block contains (4x4) luminance samples. This produces smaller prediction errors that can be expressed with fewer bits.
2. Motion prediction with accuracy up to 1/4 of the image sampling interval. High motion vector accuracy allows high precision prediction with motion compensation.
3. Multiple reference images: a long-term memory is used for the prediction of revealed (uncovered) areas.
4. Intra-frame directional prediction for the intra-frame coded macroblocks.
5. A de-blocking filter removes the block effects that occur in the predicted images at high compression.
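6. The transform, an integer approximation of the cosine transform, operates on small (4x4) blocks of luminance and chrominance samples, with an additional (2x2) transform applied to the chrominance DC coefficients, for improved adaptation to local image properties.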
7. Adaptive entropy coding: CAVLC (Context-Adaptive Variable Length Coding), which uses variable word length, and CABAC (Context-Adaptive Binary Arithmetic Coding), a more complex adaptive arithmetic coding that achieves a better compression ratio.
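The small-block transform from item 6 can be illustrated as follows. The sketch uses the 4x4 integer matrix commonly given for the H.264 forward core transform, which approximates a 4x4 cosine transform using only integer arithmetic. The scaling and quantization that follow the transform in a real coder are omitted, and the helper names are hypothetical.

```python
# 4x4 integer matrix of the forward core transform (scaling omitted).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    # Product of two 4x4 matrices stored as lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    # Y = CF * X * CF^T for a 4x4 block of residual samples.
    return matmul(matmul(CF, block), transpose(CF))


flat = [[3] * 4 for _ in range(4)]     # perfectly uniform 4x4 block
coeffs = forward_core_transform(flat)  # only coeffs[0][0] is non-zero
```

For a uniform block, all of the energy ends up in the single DC coefficient at position [0][0], which is why smooth image areas can be coded with very few bits.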
To fully use the capabilities of the H.264 standard, the coder should switch optimally between the available coding modes. The H.264 standard is important for CCTV systems, since reducing the image data rate without any loss in quality allows more channels to be transmitted at higher quality.
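The standard does not prescribe how a coder chooses between coding modes. One widely used approach, assumed here purely for illustration, is to pick the mode with the lowest combined cost of distortion plus a weighted number of bits. The candidate modes, their distortion and bit figures, and the function name choose_mode are all hypothetical.

```python
def choose_mode(candidates, lam):
    # Pick the candidate with the lowest cost = distortion + lam * bits.
    # lam is a tuning weight: a larger value favours cheaper (fewer-bit) modes.
    return min(candidates, key=lambda c: c["distortion"] + lam * c["bits"])


# Hypothetical measurements for one macroblock coded with three block sizes.
candidates = [
    {"mode": "16x16", "distortion": 400, "bits": 20},
    {"mode": "8x8", "distortion": 250, "bits": 60},
    {"mode": "4x4", "distortion": 180, "bits": 150},
]

low_rate_choice = choose_mode(candidates, lam=5)["mode"]
balanced_choice = choose_mode(candidates, lam=1)["mode"]
high_quality_choice = choose_mode(candidates, lam=0.1)["mode"]
```

With these made-up numbers, a heavy weight on bits selects the large 16x16 block, a moderate weight selects 8x8, and a light weight selects 4x4, which shows how the same coder can trade bit rate against quality.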