The work presented in this dissertation deals with establishing efficient methods for solving some algorithmic problems, which have applications to Bioinformatics. After a short introduction in Chapter 1, an algorithm for genome mappability problem is presented in Chapter 2. Genome mappability is a measure for the approximate repeat structure of the genome with respect to substrings of specific length and a tolerance to define the number of mismatches. The similarity between reads is measured by using the Hamming distance function. Genome mappability is computed for each position in the string and has several applications in designing high-throughput short-read sequencing experiments. Chapter 3, presents an algorithm to compute the Average Common Substring of two input sequences in their run-length encoded format. The distance between them based on the Average Common Substring measure can be computed in linearithmic time and linear space proportional to the total length of sequences after run-length encoding. Chapter 4, presents a method that produces a better approximation for Average Common Substring calculations where we are allowed to have mismatches. This method is applicable to the alignment-free comparison of biological sequences at highly competitive speed. Finally, in Chapter 5, we present two algorithms to efficiently decode the Suffix Array/Inverse Suffix Array of the reveres text, by using the FM-index of the forward text. Additionally, our experimental results are competitive when compared to the standard approach of maintaining the FM-Index for both the forward and the reverse text in approximate string-matching applications.
If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu.
Valliyil Thankachan, Sharma
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Length of Campus-only Access
Doctoral Dissertation (Open Access)
Hooshmand, Sahar, "Efficient String Algorithms with Applications in Bioinformatics" (2020). Electronic Theses and Dissertations, 2020-. 802.