Abstract
In this dissertation, we develop algorithms and variations on the Positional Burrows-Wheeler Transform (PBWT). The PBWT is a data structure that stores M binary strings of length N while allowing efficient search. We develop the dynamic-PBWT (d-PBWT). The d-PBWT is a variation of the PBWT that allows its relevant algorithms to run with unchanged time complexity, but also allows efficient insertion and deletion of haplotypes. We provide insertion and deletion algorithms on the PBWT with average case O(N) time complexity. We also improve upon the query algorithms for the PBWT. Durbin described a set maximal match query algorithm on the PBWT and claimed O(N) time complexity. Naseri et al. described a long match query algorithm on the PBWT using additional data structures (LEAP arrays) in claimed O(N+c) time complexity, where c is the number of matches outputted. We showed these bounds to be incorrect in the worst case and provided set maximal match and long match query algorithms that do have these time complexities. Furthermore, we develop a new formulation of haplotype threading, the Minimal Positional Substring Cover (MPSC). We solve the MPSC in O(N) time. Then, we solve variants of the MPSC problem: leftmost MPSC, rightmost MPSC, and set maximal match only MPSC. Using these variants to bound the solution space, we are able to represent all possible MPSCs in efficiently. Then, we solve variants that may be more biologically useful: length maximal MPSC, h-MPSC, and L-MPSC. All the MPSC problems are solved in O(N) time given a PBWT of the reference panel. Finally, we show the biological usefulness of the MPSC formulation using an imputation benchmark.
Notes
If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu
Graduation Date
2023
Semester
Spring
Advisor
Zhang, Shaojie
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Identifier
CFE0009595; DP0027618
URL
https://purls.library.ucf.edu/go/DP0027618
Language
English
Release Date
May 2024
Length of Campus-only Access
1 year
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Sanaullah, Ahsan, "Algorithms and Variations on the Positional Burrows-Wheeler Transform and Their Applications" (2023). Electronic Theses and Dissertations, 2020-2023. 1651.
https://stars.library.ucf.edu/etd2020/1651