Keywords

Clustering method, Classification method, Mixture of pollen grains, Flow cytometry, Fire debris, Total ion spectrum

Abstract

Finite mixture models have been widely used to cluster data consisting of homogeneous subpopulations. In forensic palynology, pollen is used as a proxy to link individuals or items to a crime scene. Mixtures of pollen data, including willow, mustard, and blank samples, were analyzed using flow cytometry. The willow and mustard clusters tend to follow multivariate normal distributions, while the background cluster follows a multivariate non-normal distribution. We propose a finite mixture model capable of handling pollen mixtures in both the univariate and multivariate settings. The proposed methods are applied to simulated datasets and to pollen-mixture datasets.

Finite mixture models typically use an expectation–maximization (EM) algorithm to estimate parameters by maximum likelihood estimation (MLE). Since MLE is used in the M-step, we also apply alternative optimization methods, such as gradient descent and Newton’s method, to estimate the parameters. To compare the performance of these optimization methods, we evaluate processing time, mislabeling rate (in percent), bias, and mean squared error (MSE).
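As a rough illustration of the EM procedure described above, the following minimal sketch fits a two-component univariate Gaussian mixture using closed-form M-step updates. It is not the thesis's implementation: the function names (`normal_pdf`, `em_two_gaussians`), the initialization at the data extremes, and the fixed iteration count are all assumptions made for this example, and the gradient descent and Newton variants are not shown.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=200):
    """Fit a two-component univariate Gaussian mixture by EM (illustrative)."""
    # Assumed initialization: means at the data extremes, unit variances.
    mu = [min(data), max(data)]
    sigma = [1.0, 1.0]
    weight = [0.5, 0.5]
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for every point.
        resp = []
        for x in data:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow to 0
            resp.append([p[0] / total, p[1] / total])
        # M-step: closed-form maximum likelihood updates given the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = math.sqrt(max(var, 1e-6))  # floor avoids degenerate fits
    return weight, mu, sigma
```

Calling `em_two_gaussians(data)` on a list of floats returns the estimated mixing weights, means, and standard deviations. Bias and MSE, as mentioned above, could then be computed by comparing these estimates with the known parameters across repeated simulated datasets.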

While the first topic focuses on clustering methods, the second explores classification techniques applied to fire debris datasets. The datasets contain total ion chromatogram (TIC) and total ion spectrum (TIS) data representing the chemical profiles of materials burned in a fire. In fire investigations, identifying ignitable liquid residues, substances like gasoline or alcohol that can easily catch fire, is crucial for detecting possible arson. Substrate components, on the other hand, are the original materials present at the scene, such as carpet, wood, or fabric, that burned during the fire.

We classified ignitable liquid residues and substrate components using machine learning methods on the TIS and TIC datasets. The predictive accuracy and area under the ROC curve (AUC) of the models were evaluated and compared on both an in silico test dataset and an experimental fire debris dataset.
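The two evaluation metrics named above can be sketched as follows. This is an illustrative computation only, not the thesis's evaluation code: the label convention (1 for a sample containing ignitable liquid residue, 0 for substrate only), the 0.5 decision threshold, and the function names are assumptions for this example. The AUC is computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

Because the AUC depends only on the ranking of the scores, it does not require choosing a threshold, whereas accuracy does. That is one reason the two metrics are often reported together.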

Completion Date

2025

Semester

Fall

Committee Chair

Tang, Larry

Degree

Doctor of Philosophy (Ph.D.)

College

College of Sciences

Department

Statistics and Data Science

Format

PDF

Identifier

DP0029745

Document Type

Thesis

Campus Location

Orlando (Main) Campus
