Title

Preprocessing text to improve compression ratios

Abstract

The use of a text processing algorithm that can improve the compression ratio of standard data compression algorithms, in particular 'bzip2', is discussed. This new text processing algorithm, bzip, improves the compression ratio by up to 20% over bzip2. The encoding does not use long sequences of a single character ('*'), because such runs would interfere with bzip2's front-end run-length encoder. Instead, the preprocessor generates character sequences such as '*rstuvwzaA', which give bzip2 a strong local context within each word and at the boundaries between words, making bzip2 more effective. The paper also describes how the algorithm can be used on wide area networks such as the Internet.
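The general idea of a reversible, dictionary-based text transform applied before bzip2 can be illustrated with a small sketch. The codeword scheme below is an assumption made for illustration only: each dictionary word is replaced by a same-length codeword made of '*' followed by the word's index within its length group, written in base 52 over the letters a-z and A-Z. It does not reproduce the paper's codewords (such as '*rstuvwzaA'), and the names `build_codebook`, `encode_text`, and `decode_text` are hypothetical. The sketch also assumes the input contains no literal '*'. Only Python's standard `re` and `bz2` modules are used.

```python
import bz2
import re
from string import ascii_letters  # 'a'..'z' followed by 'A'..'Z' (52 symbols)

WORD_RE = re.compile(r"[A-Za-z]+")    # maximal runs of letters are "words"
CODE_RE = re.compile(r"\*[A-Za-z]*")  # '*' plus letters is a codeword


def _index_to_code(index: int, length: int) -> str:
    """Hypothetical codeword: '*' followed by `index` in base 52,
    left-padded with 'a' so the codeword has exactly `length` characters."""
    digits = []
    for _ in range(length - 1):
        index, r = divmod(index, 52)
        digits.append(ascii_letters[r])
    return "*" + "".join(reversed(digits))


def build_codebook(dictionary):
    """Assign each dictionary word a same-length codeword. Words that do not
    fit (a length-n group holds at most 52**(n-1) codewords) pass through."""
    encode, counts = {}, {}
    for word in dictionary:
        n = len(word)
        i = counts.get(n, 0)
        if word in encode or i >= 52 ** (n - 1):
            continue
        encode[word] = _index_to_code(i, n)
        counts[n] = i + 1
    decode = {code: word for word, code in encode.items()}
    return encode, decode


def encode_text(text, encode):
    """Replace every dictionary word with its codeword; other text is kept."""
    if "*" in text:
        raise ValueError("this sketch assumes the input contains no '*'")
    return WORD_RE.sub(lambda m: encode.get(m.group(), m.group()), text)


def decode_text(text, decode):
    """Invert encode_text: each '*'-led letter run is exactly one codeword,
    because a codeword always ends where the original word ended."""
    return CODE_RE.sub(lambda m: decode[m.group()], text)


if __name__ == "__main__":
    enc, dec = build_codebook(["the", "and", "compression", "text", "ratio"])
    sample = "the text and the ratio " * 200
    packed = encode_text(sample, enc)
    assert decode_text(packed, dec) == sample  # the transform is lossless
    # Compare bzip2 output sizes with and without the preprocessing step.
    print("bzip2 on raw text:    ", len(bz2.compress(sample.encode("ascii"))))
    print("bzip2 on encoded text:", len(bz2.compress(packed.encode("ascii"))))
```

Because the dictionary must be available to both the encoder and the decoder, it is kept outside the compressed file; the decoder simply reverses the substitution after bzip2 decompression. Whether a given codeword scheme actually helps bzip2 depends on the contexts it creates, which is the design question the abstract addresses.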

Publication Date

1-1-1998

Publication Title

Data Compression Conference Proceedings

Number of Pages

556-

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

Scopus ID

0031673372 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/0031673372

