Keywords

Text indexing; Data structures; Query optimization; Bioinformatics; Algorithm analysis

Abstract

This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index to quickly answer various queries on this data. Basic queries such as counting/reporting a given pattern's occurrences as substrings of the original text are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as the suffix trees, suffix arrays, FM-index, etc. In this work, we revisit the following problems: 1. The Heaviest Induced Ancestors problem 2. Range Longest Common Prefix problem 3. Range Shortest Unique Substrings problem 4. Non-Overlapping Indexing problem For the first problem, we present two new space-time trade-offs that improve the space, query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both cache-aware and cache-oblivious models.

Notes

If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu

Graduation Date

2021

Semester

Fall

Advisor

Valliyil Thankachan, Sharma

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Format

application/pdf

Identifier

CFE0008792; DP0026071

URL

https://purls.library.ucf.edu/go/DP0026071

Language

English

Release Date

December 2021

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

Subjects

Text processing (Computer science); Data structures (Computer science)--Design; Algorithms--Research; Indexing--Computer programs; Indexing--Data processing

STARS Citation

Abedin, Paniz, "Efficient Data Structures for Text Processing Applications" (2021). Electronic Theses and Dissertations, 2020-2023. 821.
https://stars.library.ucf.edu/etd2020/821

Download

Included in

Databases and Information Systems Commons

COinS

Accessibility Statement

This item was created or digitized prior to April 24, 2027, or is a reproduction of legacy media created before that date. It is preserved in its original, unmodified state specifically for research, reference, or historical recordkeeping. In accordance with the ADA Title II Final Rule, the University Libraries provides accessible versions of archival materials upon request. To request an accommodation for this item, please submit an accessibility request form.

Electronic Theses and Dissertations, 2020-2023

Efficient Data Structures for Text Processing Applications

Keywords

Abstract

Notes

Graduation Date

Semester

Advisor

Degree

College

Department

Degree Program

Format

Identifier

URL

Language

Release Date

Length of Campus-only Access

Access Status

Subjects

STARS Citation

Included in

Accessibility Statement

Browse Advisors

Explore

Connect

Electronic Theses and Dissertations, 2020-2023

Efficient Data Structures for Text Processing Applications

Author

Keywords

Abstract

Notes

Graduation Date

Semester

Advisor

Degree

College

Department

Degree Program

Format

Identifier

URL

Language

Release Date

Length of Campus-only Access

Access Status

Subjects

STARS Citation

Included in

Share

Accessibility Statement

Browse Advisors

Explore

Connect