In this article:
- What is Data Compression
- Types of Data Compression
- Popular Data Compression Applications (Formats)
What is Data Compression?
Data compression, also called source coding, is the process of making information smaller by encoding it in fewer bits than it occupies in its uncompressed or unencoded form. A well-known example is the ZIP format used by software such as WinZip. The format is known for storing many files inside a single file (the ZIP file), while the software is used to zip (encode) and unzip (decode) the information contained within it.
It is important to note that, to make sense of the data in a compressed file, the receiver of the file has to understand the encoding scheme or algorithm used by the sender. The same is true of written text: if a book is written in English, the reader must understand English in order to read it. In the same way, the receiver of a compressed file needs to know the decoding algorithm or method to decompress the data and see what it contains.
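The sender and receiver roles described above can be sketched in a few lines of Python. This is only an illustration: it assumes both sides have agreed on the DEFLATE scheme exposed by Python's standard `zlib` module, and the function names `send` and `receive` are hypothetical helpers invented for this example.

```python
import zlib

def send(message: bytes) -> bytes:
    """Sender side: encode the message with zlib (DEFLATE)."""
    return zlib.compress(message)

def receive(payload: bytes) -> bytes:
    """Receiver side: decoding only works because the receiver
    knows the sender used zlib."""
    return zlib.decompress(payload)

# A deliberately repetitive message, so there is something to squeeze out.
original = b"the quick brown fox jumps over the lazy dog. " * 50

payload = send(original)
restored = receive(payload)

print("original size  :", len(original), "bytes")
print("compressed size:", len(payload), "bytes")
print("round trip exact:", restored == original)
```

A receiver that did not know which scheme was used would see only the bytes in `payload`, which are not readable text on their own.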
Some questions arise: How is compression possible? Aren’t the files or data already in the shortest form?
The answer lies in a characteristic of real-world data called statistical redundancy: data tends to be repetitive. A good example is the English language, where the letter ‘e’ is far more common than ‘z’, and a ‘q’ followed by a ‘z’ almost never occurs. Some compression techniques exploit this ‘loophole’ to express data in a shorter, more concise form.
Another concept that makes compression possible is the way we perceive things. If a person can still understand the data even after some of its content has been removed, then we can compress the data by removing the parts that do not affect the receiver's ability to understand it.
Compression is necessary because computing and storage resources are limited and expensive. It is therefore important to save these resources wherever possible so that they remain available for as many uses as possible. Another direct benefit of compression is that it saves bandwidth over a network connection: because compressed data is smaller, it takes up less bandwidth (capacity on the network), so more data can be sent over the same connection.
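One rough way to put a number on this redundancy is the Shannon entropy of the letter frequencies, which estimates the average number of bits per letter an ideal code would need. The sketch below is an illustration under simplifying assumptions: it looks only at how often each single letter occurs (not at letter pairs such as ‘qz’), and the two sample strings are made up for the example.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text: str) -> float:
    """Shannon entropy of the single-character frequencies in `text`."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# 'e' is much more common than 'z' here, as in English.
skewed = "e" * 90 + "z" * 10
# Both letters are equally common here.
balanced = "ez" * 50

print("skewed  :", round(entropy_bits_per_symbol(skewed), 3), "bits per letter")
print("balanced:", round(entropy_bits_per_symbol(balanced), 3), "bits per letter")
```

The more uneven the frequencies, the lower the entropy, and the more room a compressor has to give common letters short codes and rare letters long ones.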
Types of Data Compression
There are two types of data compression. They are explained as follows:
- Lossless Data Compression
This is usually the most desirable form of data compression. Its defining property is that the exact original data can be recreated from the compressed output (hence the name, lossless). In this respect, lossless data compression has an edge over the other type of data compression (explained next).
However, it should be understood that lossless data compression cannot shrink every input. This is an inherent limitation of all lossless compression algorithms: for each algorithm, there exists at least one input that it cannot make smaller. There are more possible inputs of a given length than there are shorter outputs, so some inputs must stay the same size or grow. Hence, when choosing any particular compression format, there is always a possibility that a given input cannot be reduced in size.
This type of data compression is also called reversible data compression.
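The limitation mentioned above, that some input always fails to shrink, follows from a simple counting argument, which the sketch below works through. The second half tries it in practice with Python's standard `zlib` module on pseudo-random bytes; the seed, the 1000-byte length, and the helper names are arbitrary choices made for this illustration.

```python
import random
import zlib

def count_inputs(bits: int) -> int:
    """Number of distinct bit strings that are exactly `bits` long."""
    return 2 ** bits

def count_shorter_outputs(bits: int) -> int:
    """Number of distinct bit strings shorter than `bits` (lengths 0 to bits-1)."""
    return sum(2 ** k for k in range(bits))

bits = 8
# There is always one fewer shorter string than there are inputs,
# so no reversible scheme can map every input to a shorter output.
print("inputs of 8 bits      :", count_inputs(bits))
print("strings under 8 bits  :", count_shorter_outputs(bits))

# Pseudo-random bytes have almost no redundancy to exploit.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(1000))
packed = zlib.compress(noise)

print("random input size     :", len(noise), "bytes")
print("size after compression:", len(packed), "bytes")
print("still reversible      :", zlib.decompress(packed) == noise)
```

The compressed form of the random data is still fully reversible; it simply is not smaller than the input.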
- Lossy Data Compression
As the name suggests, lossy data compression compresses information by eliminating very small details that do not affect the readability of the data. This type of compression is widely popular and is used, for example, by the MP3 file format. The limitation here is that although the data is compressed, decompressing it does not recreate the exact input. The decompressed data is, however, close enough to the original to be used. With some such algorithms, compressing the same data over and over again degrades it a little more each time.
Lossy data compression is also known as irreversible data compression.
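A toy way to see the irreversible trade-off is quantization: each value is rounded to the nearest multiple of a step size, and only the small integer multiple is stored. This is a minimal sketch, not how MP3 or any real format works; the sample values, the step size, and the function names are assumptions chosen for illustration.

```python
def compress_lossy(samples, step):
    """Store each sample as the index of the nearest multiple of `step`."""
    return [round(s / step) for s in samples]

def decompress_lossy(codes, step):
    """Rebuild approximate samples; the rounding error cannot be recovered."""
    return [c * step for c in codes]

samples = [0.12, 0.49, 0.51, 0.87, -0.33]
step = 0.25

codes = compress_lossy(samples, step)
restored = decompress_lossy(codes, step)
worst_error = max(abs(a - b) for a, b in zip(samples, restored))

print("original :", samples)
print("stored   :", codes)
print("restored :", restored)
print("largest error:", round(worst_error, 3))
```

The stored codes are small integers that need fewer bits than the original values, and the restored values stay within half a step of the originals, but the exact input is gone.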
Popular Data Compression Applications (Formats)
Some popular compressed formats are given below:
- Lossless Formats
  - Graphics Interchange Format (GIF)
  - Portable Network Graphics (PNG)
  - MPEG-4 Audio Lossless Coding (MPEG-4 ALS)
  - RealAudio Lossless
  - Monkey’s Audio (APE)
  - Shorten (SHN)
- Lossy Formats
  - JPEG
  - DjVu
  - Advanced Audio Coding (AAC)
  - MPEG-1 Audio Layer III (MP3)
  - Windows Media Audio (WMA)
  - Cartesian Perceptual Compression (CPC)