Title
Preprocessing text to improve compression ratios
Abstract
The use of a text preprocessing algorithm that can improve the compression ratio of standard data compression algorithms, in particular 'bzip2', is discussed. Applied before bzip2, the preprocessing algorithm improves the compression ratio by up to 20% over bzip2 alone. The encoding used by the method does not use long sequences of a single character ('*'), because these would interfere with bzip2's front-end run-length encoder. Instead, the preprocessor generates character sequences such as '*rstuvwzaA', which give bzip2 a strong local context within each word and at the boundaries between words and thus make bzip2 more effective. The paper also describes how the algorithm can be used on wide-area networks such as the Internet.
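The abstract's point about runs of '*' can be illustrated with a simplified model of bzip2's initial run-length stage, in which any run of 4 to 255 identical bytes is written as four copies of the byte followed by one byte holding the number of extra repeats (0 to 251). The sketch below is written from the general description of bzip2's format, not taken from the paper, and the function name `bzip2_rle1` is hypothetical. It shows that a long run of '*' is rewritten into four stars plus a count byte, while a sequence such as '*rstuvwzaA' passes through unchanged.

```python
def bzip2_rle1(data: bytes) -> bytes:
    """Simplified model of bzip2's front-end run-length stage (assumed behavior).

    A run of 4..255 identical bytes becomes four copies of the byte plus one
    byte giving the number of additional repeats (0..251). Runs longer than
    255 are split into chunks of at most 255. Shorter runs are copied as-is.
    """
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        run = 1
        # Measure the run of byte b, capped at 255 as in bzip2.
        while i + run < n and data[i + run] == b and run < 255:
            run += 1
        if run >= 4:
            out += bytes([b]) * 4   # first four copies are kept
            out.append(run - 4)     # then a count of the extra repeats
        else:
            out += bytes([b]) * run
        i += run
    return bytes(out)


if __name__ == "__main__":
    # A long run of '*' is collapsed, so its length survives only as a count byte.
    print(bzip2_rle1(b"*" * 10))      # b'****\x06'
    # A sequence of distinct characters is left untouched.
    print(bzip2_rle1(b"*rstuvwzaA"))  # b'*rstuvwzaA'
```

Under this model, codewords built from long '*' runs would be reduced to nearly identical prefixes that differ only in an arbitrary count byte, which is the kind of interference the abstract says the '*rstuvwzaA'-style sequences avoid.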
Publication Date
1-1-1998
Publication Title
Data Compression Conference Proceedings
Number of Pages
556-
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
Copyright Status
Unknown
Scopus ID
0031673372 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/0031673372
STARS Citation
Kruse, Holger and Mukherjee, Amar, "Preprocessing text to improve compression ratios" (1998). Scopus Export 1990s. 3424.
https://stars.library.ucf.edu/scopus1990/3424