Title

Multiple-Pattern Matching In Lzw Compressed Files Using Aho-Corasick Algorithm

Abstract

A novel multiple-pattern matching algorithm for LZW compressed files using Aho-Corasick algorithm [1] is reported. Since the LZW trie can be reconstructed without explicit decompression [2], a state transition table is then built from the LZW trie and the AC automaton and is used in our algorithm for fast multiple-pattern searching. The algorithm takes O(mt+n+r) time with O(mt) extra space, where n is the size of the compressed file, m is the total size of all patterns, t is the size of the LZW trie and r is the number of occurrences of the patterns. The algorithm is compared with a similar algorithm developed by Kida [3]. Extensive experiments have been conducted to test the performance of the algorithm and to compare with other well-known compressed pattern matching algorithms, particularly Kida's algorithm. The results show that the proposed algorithm is practically the fastest among all approaches when the number of patterns is not very large. Therefore, our algorithm is preferable for general string matching applications. The proposed algorithm is efficient for large files and it is particularly efficient when being applied on archival search if the archives are compressed with a common LZW trie. © 2005 IEEE.

Publication Date

10-26-2005

Publication Title

Data Compression Conference Proceedings

Number of Pages

482-

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

Socpus ID

26944465371 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/26944465371

This document is currently not available here.

Share

COinS