Keywords

Identity by descent, Local ancestry

Abstract

Identity by descent (IBD) and local ancestry are essential for population genetic inference, such as understanding genealogical relationships and demographic history from genomic data. The availability of large, high-resolution datasets brings both new opportunities and computational challenges for methods that infer IBD segments and local ancestry efficiently and accurately. This dissertation presents three computational methods designed to assist in population genetic inference. First, RaPID-Query is introduced to efficiently query IBD segments for individual haplotypes in a large genotype dataset. The method is based on the positional Burrows-Wheeler transform (PBWT) algorithm and uses a random projection approach. RaPID-Query extracts IBD segments with high accuracy, making it useful for genealogical search analysis. Second, Recomb-Mix is introduced as an efficient local ancestry inference (LAI) method for admixed individual haplotypes using reference population panels. It is based on the Li and Stephens model and a graph optimization formulation. Recomb-Mix infers local ancestry labels across a diverse set of scenarios while remaining competitive in resource efficiency. The high-quality results it produces are beneficial for analyzing population demographic history. Finally, this dissertation examines the definitions of IBD segments in ancestral recombination graphs (ARGs) and promotes a recombination-based definition called identity by direct descent (IBDD). An ARG-based PBWT algorithm, referred to as TS-PBWT, is presented to efficiently extract IBDD segments from a tree sequence. IBDD segments are robust against IBD coverage inflation in the centromere region of Chromosome 1 and may be more useful for analyzing distant population demographic history than IBD segments defined by the most recent common ancestor.
Overall, these computational methods enhance the efficiency of acquiring IBD segments and local ancestries with high resolution, thereby advancing population genetic studies.

Completion Date

2025

Semester

Summer

Committee Chair

Zhang, Shaojie

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Format

PDF

Identifier

DP0029626

Language

English

Document Type

Thesis

Campus Location

Orlando (Main) Campus
