Lempel–Ziv–Storer–Szymanski

Lempel–Ziv–Storer–Szymanski (LZSS) is a lossless data compression algorithm, a derivative of LZ77, that was created in 1982 by James A. Storer and Thomas Szymanski. It was described in the article "Data compression via textual substitution", published in the Journal of the ACM (1982, pp. 928–951).^[1]

LZSS is a dictionary coding technique. It attempts to replace a string of symbols with a reference to a dictionary location of the same string.

The main difference between LZ77 and LZSS is that in LZ77 the dictionary reference could actually be longer than the string it was replacing. In LZSS, such references are omitted if the match length is less than the "break even" point, and the symbols are written as literals instead. Furthermore, LZSS uses one-bit flags to indicate whether the next chunk of data is a literal (byte) or an offset/length pair.
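The literal-or-reference decision described above can be illustrated with a minimal Python sketch. It is not the reference implementation: the function name `lzss_encode` and the parameters `min_match` and `max_match` are hypothetical, offsets are absolute positions in the input rather than distances into a bounded sliding window, the search is brute force, and the one-bit flag is represented simply by the type of each token instead of being packed into bits.

```python
def lzss_encode(data, min_match=3, max_match=18):
    """Greedy tokenizer: returns single-character literals and (offset, length) pairs.

    min_match stands in for the break-even point (an assumption of this sketch):
    shorter matches are emitted as literals because a reference would not save space.
    """
    tokens = []
    pos = 0
    while pos < len(data):
        best_off, best_len = 0, 0
        # Brute-force search over everything already encoded (no window limit here).
        for start in range(pos):
            length = 0
            while (length < max_match
                   and pos + length < len(data)
                   and start + length < pos  # match must lie entirely in seen text
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:  # keep the earliest longest match
                best_off, best_len = start, length
        if best_len >= min_match:
            tokens.append((best_off, best_len))  # would be flagged as a reference
            pos += best_len
        else:
            tokens.append(data[pos])  # would be flagged as a literal
            pos += 1
    return tokens


tokens = lzss_encode("I am Sam\n\nSam I am")
```

In this sketch a tuple plays the role of a token whose flag bit says "reference", and a one-character string plays the role of a token whose flag bit says "literal". A real encoder would additionally pack those flags into bytes and bound the search to a fixed-size window.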

Example

Here is the beginning of Dr. Seuss's Green Eggs and Ham, with character numbers at the beginning of lines for convenience. Green Eggs and Ham is a good example to illustrate LZSS compression because the book itself contains only 50 unique words, despite having a word count of 170.^[2] Thus, words are repeated, although not in succession.

  0: I am Sam
  9:
 10: Sam I am
 19:
 20: That Sam-I-am!
 35: That Sam-I-am!
 50: I do not like
 64: that Sam-I-am!
 79: 
 80: Do you like green eggs and ham?
112:
113: I do not like them, Sam-I-am.
143: I do not like green eggs and ham.

This text takes 177 bytes in uncompressed form. Assuming a break-even point of 2 bytes (and thus 2-byte offset/length pairs) and one-byte newlines, this text compressed with LZSS becomes 95 bytes long:

 0: I am Sam
 9:
10: (5,3) (0,4)
16:
17: That(4,4)-I-am!(19,15)
32: I do not like
46: t(21,14)
50: Do you(58,5) green eggs and ham?
79: (49,14) them,(24,9).(112,15)(92,18).

Note: this does not include the 12 bytes of flags indicating whether the next chunk of text is a pointer or a literal. Adding them, the text becomes 107 bytes long, which is still shorter than the original 177 bytes.
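To see how the references in the example expand, here is a minimal decoding sketch in Python. It assumes, as the example appears to, that each pair is (absolute character position in the output so far, length); practical implementations typically store a distance back into a sliding window instead. The function name `lzss_decode` and the token representation (strings for literal runs, tuples for references) are hypothetical and chosen only for illustration; flag bits are not modeled.

```python
def lzss_decode(tokens):
    """Expand a token list into text.

    Strings are literal runs; (offset, length) tuples copy `length` characters
    starting at absolute position `offset` of the output produced so far.
    """
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            for i in range(length):
                out.append(out[offset + i])  # copy one character at a time
        else:
            out.extend(tok)  # literal characters are appended unchanged
    return "".join(out)


# Tokens transcribed from the first lines of the compressed example above.
example_tokens = [
    "I am Sam\n", "\n",
    (5, 3), " ", (0, 4), "\n", "\n",
    "That", (4, 4), "-I-am!", (19, 15),
]
text = lzss_decode(example_tokens)
```

Here `(5, 3)` copies "Sam" from position 5, `(0, 4)` copies "I am" from position 0, and `(19, 15)` copies the preceding newline plus the whole line "That Sam-I-am!", which reproduces the repeated line of the original text.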

Implementations

Many popular archivers, such as ARJ, RAR, ZOO, and LHarc, use LZSS rather than LZ77 as the primary compression algorithm; the encoding of literal characters and of length-distance pairs varies, with the most common option being Huffman coding. Most implementations stem from public-domain code written in 1989 by Haruhiko Okumura.^[3]^[4] Version 4 of the Allegro library can encode and decode an LZSS format,^[5] but the feature was cut from version 5. The Game Boy Advance BIOS can decode a slightly modified LZSS format.^[6] Apple's Mac OS X uses LZSS as one of the compression methods for kernel code.^[7]

References

^ Storer, James A.; Szymanski, Thomas G. (October 1982). "Data Compression via Textual Substitution". Journal of the ACM. 29 (4): 928–951. doi:10.1145/322344.322346.
^ "10 stories behind Dr. Seuss stories". CNN. January 23, 2009. Retrieved 2009-01-26.
^ Simtel.net mirror. Haruhiko Okumura implementation of 1989. Archived on February 3, 1999.
^ Haruhiko Okumura. History of Data Compression in Japan. Archived on January 10, 2016.
^ Hargreaves, Shawn, et al. Allegro source code: lzss.c. Accessed on July 13, 2016.
^ Korth, Martin. "GBATEK LZ Decompression Functions". problemkaputt.de. Retrieved 7 June 2022.
^ "kext_tools/compression.c". GitHub. Apple Open Source. Retrieved 28 December 2019.

This page was last edited on 5 March 2024, at 19:15

From Wikipedia, the free encyclopedia
