Title
A Comparison Of Bwt Approaches To String Pattern Matching
Keywords
Binary search; BWT; Compressed pattern matching; FM-index; Q-grams; Suffix arrays
Abstract
Recently a number of algorithms have been developed to search files compressed with the Burrows-Wheeler Transform (BWT) without the need for full decompression first. This allows the storage requirement of data to be reduced through the exceptionally good compression offered by BWT, while allowing fast access to the information for searching by taking advantage of the sorted nature of BWT files. We provide a detailed description of five of these algorithms: BWT-based Boyer-Moore, Binary Search, Suffix Arrays, q-grams and the FM-index, and also present results from a set of extensive experiments that were performed to evaluate and compare the algorithms. Furthermore, we introduce a technique to improve the search times of Binary Search, Suffix Arrays and q-grams by 22% on average, as well as reduce the memory requirement of the latter two by 40% and 31%, respectively. Our results indicate that, while the compressed files of the FM-index are larger than those of the other approaches, it is able to perform searches with considerably less memory. Additionally, when only counting the occurrences of a pattern, or when locating the positions of a small number of matches, it is the fastest algorithm. For larger searches, Binary Search provides the fastest results. Comparative results with non-BWT based search methods are also included. Copyright © 2005 John Wiley & Sons, Ltd.
Publication Date
11-10-2005
Publication Title
Software - Practice and Experience
Volume
35
Issue
13
Number of Pages
1217-1258
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1002/spe.669
Copyright Status
Unknown
Socpus ID
27544477320 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/27544477320
STARS Citation
Firth, Andrew; Bell, Tim; and Mukherjee, Amar, "A Comparison Of Bwt Approaches To String Pattern Matching" (2005). Scopus Export 2000s. 3556.
https://stars.library.ucf.edu/scopus2000/3556