Abstract
The work presented in this dissertation deals with establishing efficient methods for solving some algorithmic problems, which have applications to Bioinformatics. After a short introduction in Chapter 1, an algorithm for genome mappability problem is presented in Chapter 2. Genome mappability is a measure for the approximate repeat structure of the genome with respect to substrings of specific length and a tolerance to define the number of mismatches. The similarity between reads is measured by using the Hamming distance function. Genome mappability is computed for each position in the string and has several applications in designing high-throughput short-read sequencing experiments. Chapter 3, presents an algorithm to compute the Average Common Substring of two input sequences in their run-length encoded format. The distance between them based on the Average Common Substring measure can be computed in linearithmic time and linear space proportional to the total length of sequences after run-length encoding. Chapter 4, presents a method that produces a better approximation for Average Common Substring calculations where we are allowed to have mismatches. This method is applicable to the alignment-free comparison of biological sequences at highly competitive speed. Finally, in Chapter 5, we present two algorithms to efficiently decode the Suffix Array/Inverse Suffix Array of the reveres text, by using the FM-index of the forward text. Additionally, our experimental results are competitive when compared to the standard approach of maintaining the FM-Index for both the forward and the reverse text in approximate string-matching applications.
Notes
If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu.
Graduation Date
2020
Semester
Fall
Advisor
Valliyil Thankachan, Sharma
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
CFE0008773;DP0025504
URL
https://purls.library.ucf.edu/go/DP0025504
Language
English
Release Date
6-15-2022
Length of Campus-only Access
1 year
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Hooshmand, Sahar, "Efficient String Algorithms with Applications in Bioinformatics" (2020). Electronic Theses and Dissertations, 2020-2023. 802.
https://stars.library.ucf.edu/etd2020/802