In this dissertation, we develop algorithms and variations on the Positional Burrows-Wheeler Transform (PBWT). The PBWT is a data structure that stores M binary strings of length N while allowing efficient search. We develop the dynamic-PBWT (d-PBWT). The d-PBWT is a variation of the PBWT that allows its relevant algorithms to run with unchanged time complexity, but also allows efficient insertion and deletion of haplotypes. We provide insertion and deletion algorithms on the PBWT with average case O(N) time complexity. We also improve upon the query algorithms for the PBWT. Durbin described a set maximal match query algorithm on the PBWT and claimed O(N) time complexity. Naseri et al. described a long match query algorithm on the PBWT using additional data structures (LEAP arrays) in claimed O(N+c) time complexity, where c is the number of matches outputted. We showed these bounds to be incorrect in the worst case and provided set maximal match and long match query algorithms that do have these time complexities. Furthermore, we develop a new formulation of haplotype threading, the Minimal Positional Substring Cover (MPSC). We solve the MPSC in O(N) time. Then, we solve variants of the MPSC problem: leftmost MPSC, rightmost MPSC, and set maximal match only MPSC. Using these variants to bound the solution space, we are able to represent all possible MPSCs in efficiently. Then, we solve variants that may be more biologically useful: length maximal MPSC, h-MPSC, and L-MPSC. All the MPSC problems are solved in O(N) time given a PBWT of the reference panel. Finally, we show the biological usefulness of the MPSC formulation using an imputation benchmark.
If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Length of Campus-only Access
Doctoral Dissertation (Campus-only Access)
Sanaullah, Ahsan, "Algorithms and Variations on the Positional Burrows-Wheeler Transform and Their Applications" (2023). Electronic Theses and Dissertations, 2020-. 1651.
Restricted to the UCF community until May 2024; it will then be open access.