Flow cytometry is a popular analytical cell-biology technique that uses specific wavelengths of light to profile heterogeneous populations of cells at the individual level. Current cytometers can analyze up to 20 parameters on more than a million cells, but despite the complexity of these datasets, a typical workflow relies on subjective, labor-intensive manual sequential analysis. The research presented in this dissertation provides two machine learning methods to improve objectivity, efficiency, and discovery in flow cytometry data analysis. The first, a supervised learning method, uses previously analyzed data to evaluate new flow cytometry files containing similar parameters. The probability distribution of each dimension in a file is matched to the corresponding dimension of a reference file through color indexing and histogram intersection methods. Once a similar reference file is selected, its previously classified cell populations are used to train a tailored support vector machine capable of classifying cell populations as an expert would. This method has produced results highly correlated with manual sequential analysis, providing an efficient alternative for analyzing a large number of samples. The second, a novel unsupervised method, is used to explore and visualize single-cell data in an objective manner. To accomplish this, a hypergraph sampling method was created to preserve rare events within the flow data before divisively clustering the sampled data using singular value decomposition. The unsampled data are then assigned to the discovered clusters using a support vector machine classifier, and the final analysis is displayed as a minimum spanning tree. This tree is capable of distinguishing rare subsets of cells comprising less than 1% of the original data.
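The reference-file matching step described above can be illustrated with a minimal sketch of histogram intersection, under stated assumptions. It is not the dissertation's implementation. The function names, the bin count, the assumption that each parameter is already scaled to [0, 1], and the use of a simple mean across dimensions are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def normalized_histogram(values: Sequence[float], bins: int = 32,
                         lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Bin values into a histogram that sums to 1 (assumes data scaled to [lo, hi])."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of bin-wise minima; 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(p, q) for p, q in zip(h1, h2))


def file_similarity(new_file: Sequence[Sequence[float]],
                    ref_file: Sequence[Sequence[float]],
                    bins: int = 32) -> float:
    """Mean per-dimension intersection; each file is a list of columns, one per parameter.

    Averaging across dimensions is an assumption of this sketch.
    """
    scores = [
        histogram_intersection(normalized_histogram(a, bins),
                               normalized_histogram(b, bins))
        for a, b in zip(new_file, ref_file)
    ]
    return sum(scores) / len(scores)


def pick_reference(new_file: Sequence[Sequence[float]],
                   references: Sequence[Sequence[Sequence[float]]],
                   bins: int = 32) -> int:
    """Return the index of the reference file most similar to new_file."""
    return max(range(len(references)),
               key=lambda i: file_similarity(new_file, references[i], bins))
```

In this sketch, the populations gated in the selected reference file would then serve as training labels for the classifier. A real pipeline would also need to handle compensation and data transformation before binning.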
Jha, Sumit Kumar
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Doctoral Dissertation (Open Access)
Sassano, Emily, "Machine Learning Methods for Flow Cytometry Analysis and Visualization" (2018). Electronic Theses and Dissertations, 2004-2019. 5964.