Learning about the technical side of video surveillance can often feel confusing and unapproachable. In this article, we aim to provide an easy-to-understand explanation of what video compression does and why we need it.
As surveillance cameras grow more and more advanced, they generate increasingly large amounts of data, and the need for codecs has grown with them. After just half an hour of filming with a 1080p (1920x1080) camera, over 300GB of raw data would be created (assuming 24-bit colour at 30 frames per second). All that data needs to be transferred from the camera and stored somewhere, but today's storage and data transfer technology simply can't keep up. This is where video compression comes into play.
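To make the scale concrete, here is a quick back-of-the-envelope sketch. The bit depth and frame rate are stated assumptions; real cameras may use different colour formats:

```python
# Rough estimate of raw (uncompressed) video size.
# Assumptions: 1920x1080, 24-bit RGB (3 bytes per pixel),
# 30 frames per second, 30 minutes of footage.
width, height = 1920, 1080
bytes_per_pixel = 3            # 24-bit colour, no chroma subsampling
fps = 30
seconds = 30 * 60

bytes_per_frame = width * height * bytes_per_pixel   # ~6.2 MB per frame
total_bytes = bytes_per_frame * fps * seconds
print(f"{total_bytes / 1e9:.0f} GB of raw video")     # ~336 GB
```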
First, we’ll start with the basics of compression. In simple terms, video codecs like H.264 or H.265 define how software (or dedicated hardware) compresses, or “encodes”, digital video. Compressing something like an image essentially condenses it, reducing the amount of information needed to represent it. The picture below demonstrates compression in its most basic form: squishing an image down and then stretching it out again.
The image on the right is now pixelated after having been stretched back out. We can still see that it’s a picture of the same thing, but the quality is noticeably worse. In this process, we lose some information about the image (the reduced quality), but the space needed to store it is considerably smaller. This is what’s known as “lossy compression”, and it’s one of the fundamental building blocks in the advanced compression technology we use today.
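The squish-and-stretch idea can be sketched in a few lines of Python. This is a toy illustration, not how real codecs work: each 2x2 block of a grayscale image is averaged down to a single pixel, then stretched back out by repeating pixels, which is exactly where the blockiness comes from:

```python
# A toy "squish and stretch" lossy compressor for a grayscale image,
# stored as a list of rows of pixel values (0-255).

def squish(img, factor=2):
    """Downscale by averaging each factor x factor block into one pixel."""
    h, w = len(img), len(img[0])
    small = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        small.append(row)
    return small

def stretch(small, factor=2):
    """Upscale by repeating each pixel -- the source of the blockiness."""
    return [[px for px in row for _ in range(factor)]
            for row in small for _ in range(factor)]

img = [[10, 20, 200, 210],
       [30, 40, 220, 230],
       [50, 60,  80,  90],
       [70, 80, 100, 110]]
restored = stretch(squish(img))
# 'restored' has the same dimensions as 'img', but the fine detail is
# gone: each 2x2 block has been flattened to its average value.
```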
As we know, a video is essentially a series of still pictures played back-to-back (typically at around 30 frames per second). We could use the same process as above to reduce the space needed to store a video by compressing each individual frame, but - just like the picture above - the final result would look pretty rubbish.
There are arguably three core concepts that can help to give you a general understanding of how more advanced compression techniques work; the first of which is bitrates. A bitrate is more or less what it sounds like – it describes the rate at which bits of information are transferred.
The typical bitrate for a 1080p video comes in at around 3000-6000 kilobits per second (kbps). Security cameras can capture an exceptional amount of detail, but in reality, only so much data can be pushed down a wire, and computers have their limits on how fast they can display information. Lowering the bitrate is one of the primary goals of video compression, and there are a number of ways to do it. The next two concepts cover important techniques for achieving this.
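To see why lowering the bitrate matters, compare the raw bitrate of 1080p at 30 frames per second with the compressed range quoted above. This is a rough sketch assuming 24 bits per pixel:

```python
# Comparing the raw bitrate of 1080p/30fps video with a typical
# compressed bitrate (figures from the ranges discussed above).
raw_bps = 1920 * 1080 * 24 * 30          # 24 bits per pixel, 30 fps
raw_kbps = raw_bps / 1000                # ~1,492,992 kbps
compressed_kbps = 4500                   # midpoint of the 3000-6000 kbps range
ratio = raw_kbps / compressed_kbps
print(f"raw: {raw_kbps:,.0f} kbps, compressed: {compressed_kbps} kbps")
print(f"compression ratio: roughly {ratio:.0f}:1")
```

In other words, the codec has to shrink the video stream by a factor of a few hundred to fit within a typical 1080p bitrate.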
The second topic is macroblocks. The image below (top) is an exaggerated example of how an image might be divided into macroblocks. The image below (bottom) shows another perspective.
Each square in this picture (top) would represent a macroblock. The amount of detail within an area determines how big the corresponding macroblock will be. We can see that the spaces on the pillars have very little detail, so they can be represented with a larger macroblock. Areas like those containing the people or fence have a much higher level of detail, so they need more blocks to capture the finer details. Each of these blocks would then be individually compressed.
This is a simplified representation of how macroblocks would actually be assigned. In reality, the size of these blocks is variable: H.264 uses block sizes from 4x4 up to 16x16 pixels, while H.265 supports block sizes up to 64x64 pixels.
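The idea of sizing blocks by detail can be sketched as a simple quadtree split. This is a loose illustration, not the actual H.264/H.265 partitioning logic: a block is split into four smaller blocks whenever its pixels vary too much to compress well as one unit (here measured by variance against a hypothetical threshold):

```python
# Detail-driven block splitting, quadtree-style: flat regions stay as
# big blocks, detailed regions get split into smaller ones.

def variance(img, x, y, size):
    pixels = [img[y + dy][x + dx] for dy in range(size) for dx in range(size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def partition(img, x, y, size, threshold=100, min_size=2):
    """Return a list of (x, y, size) blocks covering the region."""
    if size <= min_size or variance(img, x, y, size) <= threshold:
        return [(x, y, size)]        # flat enough: keep as one block
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += partition(img, x + dx, y + dy, half, threshold, min_size)
    return blocks

# An 8x8 image: flat left half, busy (high-detail) right half.
flat, busy = [0] * 4, [0, 255, 0, 255]
img = [flat + busy for _ in range(8)]
blocks = partition(img, 0, 0, 8)
# The flat quadrants stay as single 4x4 blocks; the busy quadrants
# are split down to 2x2 blocks.
```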
As an example of why this is useful, say we want to compress the iCam logo. Our makeshift encoder can use macroblocks with dimensions of 5x5, 25x25, and 50x50 pixels. We want to save detailed elements of the image like the shield and text, but we don’t care about all of the space taken up by the white background. For the sake of simplicity, when we compress a macroblock, it becomes a single pixel (the average of all the pixels within it).
The image contains a total of 112,500 pixels. If we use uniform 5x5 blocks everywhere, without assigning larger blocks to low-detail areas, the compressed image would contain 4,500 pixels.
If we do use macroblocks, 50x50 blocks can be assigned to the white spaces, and 5x5 or 25x25 blocks can be assigned to the text. Doing this instead would significantly reduce the file size when compressed.
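The arithmetic behind this example can be sketched as follows. The 80/20 split between background and detail is a hypothetical figure for illustration, not measured from the actual logo:

```python
# The pixel-count arithmetic behind the logo example: 112,500 pixels
# total (e.g. a hypothetical 450x250 image). Each compressed macroblock
# becomes a single pixel, so bigger blocks mean fewer stored pixels.
total_pixels = 112_500

# Uniform 5x5 blocks everywhere:
uniform = total_pixels // (5 * 5)                   # 4,500 pixels

# Hypothetical mixed assignment: say 80% of the area is flat white
# background (50x50 blocks) and 20% is detailed text (5x5 blocks).
background = int(total_pixels * 0.8) // (50 * 50)   # 36 pixels
detail = int(total_pixels * 0.2) // (5 * 5)         # 900 pixels
mixed = background + detail
print(uniform, mixed)                               # 4500 vs 936
```

Under these assumptions the mixed assignment stores roughly a fifth as many pixels as uniform 5x5 blocks, without sacrificing any detail in the text.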
The last major concept is frame prediction and analysis, otherwise referred to as inter/intra frame prediction. In the illustration below, red chunks represent areas that might see a significant change in pixels, such as a moving object; blue represents areas that see small or reduced change; and clear chunks see little to no change. This is a very typical scenario for a CCTV camera – the background is unchanging, but there may be some moving humans or vehicles in the foreground.
The codec knows that movements like those shown in red can be predicted from the frames that come before them, and that clear areas will remain virtually the same. In other words, details can be extrapolated from other frames. Doing this saves space, as information can now be derived instead of stored in full for every frame.
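The "derive instead of store" idea can be sketched as simple block differencing. Real codecs go much further (motion vectors, sub-pixel prediction); this toy version just stores a reference frame plus whichever blocks changed:

```python
# Store one reference frame plus only the blocks that changed, then
# rebuild the current frame from those two pieces.

BLOCK = 2  # toy block size

def changed_blocks(ref, cur, block=BLOCK):
    """Return {(x, y): block_pixels} for blocks that differ from the reference."""
    changes = {}
    for y in range(0, len(ref), block):
        for x in range(0, len(ref[0]), block):
            ref_blk = [row[x:x + block] for row in ref[y:y + block]]
            cur_blk = [row[x:x + block] for row in cur[y:y + block]]
            if ref_blk != cur_blk:
                changes[(x, y)] = cur_blk
    return changes

def reconstruct(ref, changes, block=BLOCK):
    """Rebuild the current frame from the reference plus the stored changes."""
    frame = [row[:] for row in ref]
    for (x, y), blk in changes.items():
        for dy, row in enumerate(blk):
            frame[y + dy][x:x + block] = row
    return frame

ref = [[0, 0, 0, 0] for _ in range(4)]   # static background
cur = [row[:] for row in ref]
cur[0][0] = 255                          # one "moving object" pixel changes
changes = changed_blocks(ref, cur)       # only one 2x2 block needs storing
assert reconstruct(ref, changes) == cur  # the full frame is recoverable
```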