Abstract

The work presented in this dissertation deals with establishing efficient methods for solving some algorithmic problems, which have applications to Bioinformatics. After a short introduction in Chapter 1, an algorithm for genome mappability problem is presented in Chapter 2. Genome mappability is a measure for the approximate repeat structure of the genome with respect to substrings of specific length and a tolerance to define the number of mismatches. The similarity between reads is measured by using the Hamming distance function. Genome mappability is computed for each position in the string and has several applications in designing high-throughput short-read sequencing experiments. Chapter 3, presents an algorithm to compute the Average Common Substring of two input sequences in their run-length encoded format. The distance between them based on the Average Common Substring measure can be computed in linearithmic time and linear space proportional to the total length of sequences after run-length encoding. Chapter 4, presents a method that produces a better approximation for Average Common Substring calculations where we are allowed to have mismatches. This method is applicable to the alignment-free comparison of biological sequences at highly competitive speed. Finally, in Chapter 5, we present two algorithms to efficiently decode the Suffix Array/Inverse Suffix Array of the reveres text, by using the FM-index of the forward text. Additionally, our experimental results are competitive when compared to the standard approach of maintaining the FM-Index for both the forward and the reverse text in approximate string-matching applications.

Notes

If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu.

Graduation Date

2020

Semester

Fall

Advisor

Valliyil Thankachan, Sharma

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Format

application/pdf

Identifier

CFE0008773;DP0025504

URL

https://purls.library.ucf.edu/go/DP0025504

Language

English

Release Date

6-15-2022

Length of Campus-only Access

1 year

Access Status

Doctoral Dissertation (Open Access)

Share

COinS