Title
Multiple-Pattern Matching In Lzw Compressed Files Using Aho-Corasick Algorithm
Abstract
A novel multiple-pattern matching algorithm for LZW compressed files using Aho-Corasick algorithm [1] is reported. Since the LZW trie can be reconstructed without explicit decompression [2], a state transition table is then built from the LZW trie and the AC automaton and is used in our algorithm for fast multiple-pattern searching. The algorithm takes O(mt+n+r) time with O(mt) extra space, where n is the size of the compressed file, m is the total size of all patterns, t is the size of the LZW trie and r is the number of occurrences of the patterns. The algorithm is compared with a similar algorithm developed by Kida [3]. Extensive experiments have been conducted to test the performance of the algorithm and to compare with other well-known compressed pattern matching algorithms, particularly Kida's algorithm. The results show that the proposed algorithm is practically the fastest among all approaches when the number of patterns is not very large. Therefore, our algorithm is preferable for general string matching applications. The proposed algorithm is efficient for large files and it is particularly efficient when being applied on archival search if the archives are compressed with a common LZW trie. © 2005 IEEE.
Publication Date
10-26-2005
Publication Title
Data Compression Conference Proceedings
Number of Pages
482-
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
Copyright Status
Unknown
Socpus ID
26944465371 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/26944465371
STARS Citation
Tao, Tao and Mukherjee, Amar, "Multiple-Pattern Matching In Lzw Compressed Files Using Aho-Corasick Algorithm" (2005). Scopus Export 2000s. 3626.
https://stars.library.ucf.edu/scopus2000/3626