Experimental comparison of data compression algorithms
Date
2019
Authors
Глибовець, Микола
Яблонський, Володимир
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
The volumes of data that are stored and transferred grow constantly. When large volumes of data must be transferred, compression comes to the rescue. A well-chosen compression algorithm can reduce the size of the data by 60% on average. The problem of creating new compression algorithms and modifying or optimizing existing ones is very acute. The article surveys several well-known compression algorithms and gives a brief description of their main properties and implementation variants. As part of the work on the article, these algorithms were implemented and an experimental analysis of their quality and speed was carried out. The work may be of interest and use to researchers in the field of data compression.
The amount of data that is stored and transferred grows regularly and rapidly. When it comes to transferring large data volumes, a data compression algorithm can be useful. A well-chosen data compression algorithm can reduce the size of the data by 60% on average. The problem of creating new and modifying or optimizing old algorithms remains highly relevant. This article discusses some of the most widespread data compression algorithms; more exactly, four of them. Shannon–Fano coding, named after Claude Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities. It is suboptimal in the sense that it does not achieve the lowest possible expected code word length, unlike Huffman coding. A Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. Such a code is produced by Huffman coding, an algorithm developed by David A. Huffman. Lempel–Ziv–Storer–Szymanski (LZSS) is a lossless data compression algorithm, a derivative of LZ77, created in 1982 by James Storer and Thomas Szymanski. The main difference between LZ77 and LZSS is that in LZ77 a dictionary reference could actually be longer than the string it replaces. In LZSS, such references are omitted if the length is less than the "break even" point. Furthermore, LZSS uses one-bit flags to indicate whether the next chunk of data is a literal (byte) or a reference to an offset/length pair. Lempel–Ziv–Welch (LZW) is a lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published as an improved implementation of the LZ78 algorithm. As part of the work on the article, these algorithms were implemented and an experimental analysis of their quality and speed was carried out. The experiments showed that the best compression speed was achieved by LZSS, while the best compression ratio was reached by LZW.
The work can be useful for researchers in the field of data compression.
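As a concrete illustration of the prefix-coding approach described above, here is a minimal Huffman-coding sketch in Python. It is an illustrative toy, not the implementation evaluated in the article: it builds the code table by repeatedly merging the two lowest-frequency subtrees, which is the defining step of Huffman's algorithm.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict[int, str]:
    """Return an optimal prefix code (byte value -> bit string) for `data`."""
    freq = Counter(data)
    # Each heap entry: (total frequency, unique tiebreak, {symbol: code-so-far}).
    # The tiebreak keeps tuple comparison from ever reaching the dict.
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    if len(heap) == 1:  # degenerate input with a single distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prepend a bit to each side.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]
```

For example, for the input `b"aaaabbc"` the most frequent symbol `a` receives the shortest code, and no code is a prefix of another, so a bit stream encoded with the table can be decoded unambiguously.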
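The dictionary-based LZW scheme mentioned above can likewise be sketched. The following is a minimal illustrative version (again, not the article's implementation) that emits integer codes rather than a packed bit stream; the decompressor includes the standard handling of the case where a code refers to the entry currently being built.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode `data` as a list of LZW codes (0-255 are single bytes)."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code  # learn the new phrase
            next_code += 1
            w = bytes([b])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from a list of LZW codes."""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    it = iter(codes)
    w = dictionary[next(it)]
    out = [w]
    for code in it:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = w + w[:1]  # code refers to the phrase being defined
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)
```

On the classic example `b"TOBEORNOTTOBEORTOBEORNOT"` the compressor reduces 24 input bytes to 16 codes, because repeated phrases such as `TO`, `BE`, and `OR` are replaced by dictionary references after their first occurrence.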
Description
Keywords
compression, algorithm, coding, code, data, article
Citation
Hlybovets A. M. Experimental comparison of data compression algorithms / Hlybovets A. M., Yablonskyi V. R. // NaUKMA Research Papers. Computer Science. - 2019. - Vol. 2. - P. 43-49.