PPP Predictor Compression Protocol

 
    The LZ77 compression algorithm of Ziv and Lempel is a dictionary-based algorithm which uses a history array or sliding-window of consecutive already-seen symbols. However, the input symbols need not be contiguous in the history array: they can be entered in the array using a context-driven hash function. The same hash function is thus able to "predict" the symbols using the context hash or position in the array. If the input symbol matches the symbol in the hashed window position, a boolean TRUE is transmitted. Otherwise, a boolean FALSE is output, and the mispredicted symbol is also emitted. The window position is then updated with the mispredicted symbol. 

   The above algorithm, due to Kasman E. Thomas (1993), is employed in the PPP Predictor Compression Protocol. According to the Data Compression FAQ of the internet newsgroup comp.compression, it was first described by T. Raita and J. Teuhola (1987) in the paper "Predictive text compression by hashing", ACM Conference on Information Retrieval. 

   The algorithm, now widely-known as LZP (LZ Predictive), is based on the idea that "the current context is an excellent predictor of the next symbol" [Fenwick 1997] [Bloom 1996]. It has since then been developed with varying context coding functions along with different symbol ranking schemes. Bloom, for example, has many advanced versions of LZP. (It is interesting to note that the same context-based prediction method was already anticipated by C. E. Shannon in his 1951 paper "Prediction and Entropy of Printed English," Bell System Technical Journal, Vol. 30, January 1951, pp. 50-64.)
 

References:

1.] P. Fenwick, "Symbol Ranking Text Compression with Shannon Recodings," Journal of Universal Computer Science, Vol. 3, No. 2, Feb. 1997, pp.70-85. 

2.] C. Bloom, "LZP: a new data compression algorithm".


Source Code: 

PRAQ.ZIP - includes a PPP predictor implementation which improves compression by counting the number of prediction bits (i.e., correctly-predicted symbols) and encoding the resulting sum as a variable-length integer.  The input bytes are symbol-ranked according to recency of occurrence (MTF), and encoded using the same variable-length coding function.

 

ċ
LZPN.zip
(36k)
Gerald R Tamayo,
Aug 15, 2021, 2:53 PM
ċ
praq.c
(5k)
Gerald R Tamayo,
Jan 20, 2020, 1:42 AM
ċ
praq.zip
(143k)
Gerald R Tamayo,
Sep 1, 2017, 7:36 PM
ċ
praq1.c
(6k)
Gerald R Tamayo,
Jan 20, 2020, 1:43 AM
ċ
praq2.c
(5k)
Gerald R Tamayo,
Jan 20, 2020, 1:43 AM
ċ
praq3.c
(5k)
Gerald R Tamayo,
Jan 20, 2020, 1:44 AM
Comments