Keywords

Fire Debris, Likelihood Ratio, Machine Learning, Simulation Study, Repeatability, Reproducibility

Abstract

The use of statistical evidence to inform the interpretation of forensic evidence has grown in importance over time. This is especially evident in forensic fire debris analysis, where the reliability and reproducibility of statistical results are crucial. This thesis investigates how the output of five common machine learning methods, applied to an in-silico fire debris dataset, can be reduced to a univariate forensic score to create score-based likelihood ratios (SLRs), and how these ratios should be interpreted. Because SLRs are used to support or reject the prosecution’s or defense’s hypothesis, their reliability and reproducibility are essential. Different likelihood ratio (LR) estimation methods are tested on the scores generated by the machine learning models: parametric estimation (PE), kernel density estimation (KDE), and logistic regression estimation (LRE). The properties of each are analyzed on an in-silico fire debris dataset meant to mimic real-world data. The second part of this thesis investigates the bias and variance inherent in these estimation methods. Simulations demonstrate that the LRE method is very sensitive to class imbalance. Additionally, we show how the variance of the PE method, derived via its relation to the receiver operating characteristic (ROC) curve, compares with the empirical variance found from the data. The effect of violations of the underlying assumptions on the stability of the variance is then explored. This work confirms previous theoretical results in an empirical study on in-silico fire debris data, demonstrating that known methodology extends beyond theory and into a practical setting.
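To make the three estimation methods named in the abstract concrete, the following is a minimal sketch of how a score-based likelihood ratio might be computed from univariate scores. It is not the thesis's implementation. The Gaussian score model in `simulate_scores`, the normal-density fit used for PE, the rule-of-thumb KDE bandwidth, the Newton-step logistic fit, and all function names are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def simulate_scores(n_pos, n_neg, seed=0):
    """Draw hypothetical univariate scores for two classes (assumed Gaussian)."""
    rng = random.Random(seed)
    pos = [rng.gauss(1.5, 1.0) for _ in range(n_pos)]
    neg = [rng.gauss(-1.5, 1.0) for _ in range(n_neg)]
    return pos, neg


def slr_parametric(pos, neg, s):
    """PE (assumed normal family): ratio of fitted class densities at score s."""
    f_pos = NormalDist(mean(pos), stdev(pos))
    f_neg = NormalDist(mean(neg), stdev(neg))
    return f_pos.pdf(s) / f_neg.pdf(s)


def _kde(sample, s):
    """Gaussian kernel density estimate at s, normal-reference bandwidth."""
    n = len(sample)
    h = 1.06 * stdev(sample) * n ** (-0.2)
    kernel = NormalDist()
    return sum(kernel.pdf((s - x) / h) for x in sample) / (n * h)


def slr_kde(pos, neg, s):
    """KDE: ratio of kernel density estimates of the two score distributions."""
    return _kde(pos, s) / _kde(neg, s)


def slr_logistic(pos, neg, s, iters=25):
    """LRE: fit P(class | score) by Newton's method, then divide the
    posterior odds by the training prior odds to obtain a likelihood ratio."""
    xs = pos + neg
    ys = [1.0] * len(pos) + [0.0] * len(neg)
    a = b = 0.0  # intercept and slope
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, a + b * x))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    log_prior_odds = math.log(len(pos) / len(neg))
    return math.exp(a + b * s - log_prior_odds)


if __name__ == "__main__":
    pos, neg = simulate_scores(500, 500)
    for name, est in [("PE", slr_parametric), ("KDE", slr_kde), ("LRE", slr_logistic)]:
        print(name, est(pos, neg, 0.5))
```

In this sketch the logistic estimator corrects for the class proportions in the training scores by subtracting the log prior odds, so changing `n_pos` and `n_neg` in `simulate_scores` is one simple way to explore how class imbalance affects each estimator.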

Completion Date

2026

Semester

Spring

Committee Chair

Liansheng, Tang

Degree

Master of Science (M.S.)

College

College of Sciences

Department

Statistics and Data Science

Format

PDF

Document Type

Thesis

Identifier

DP0053186

Accessibility Statement

This item was created or digitized prior to April 24, 2027, or is a reproduction of legacy media created before that date. It is preserved in its original, unmodified state specifically for research, reference, or historical recordkeeping. In accordance with the ADA Title II Final Rule, the University Libraries provides accessible versions of archival materials upon request. To request an accommodation for this item, please submit an accessibility request form.