Incident reporting systems are an integral part of any organization seeking to increase the safety of its operations by gathering data on past events, which can then be used to identify ways of mitigating similar events in the future. To analyze trends and common issues with regard to the human element in the system, reports are often classified according to a human factors taxonomy. Machine learning algorithms have recently become popular tools for automated text classification; however, the performance of such algorithms varies and depends on several factors. In supervised machine learning tasks such as text classification, the algorithm is trained with features and labels; here, the features are a function of the incident reports themselves and the labels are supplied by a human annotator, whether that is the reporter or a third party. Aside from the intricacies of building and tuning machine learning models, subjective classification according to a human factors taxonomy can generate considerable noise and bias. I examined the interdependencies between the features of incident reports, the subjective labeling process, the constraints that the taxonomy itself imposes, and basic characteristics of human factors taxonomies that can influence human, as well as automated, classification. To evaluate these challenges, I trained a machine learning classifier on 17,253 incident reports from the NASA Aviation Safety Reporting System (ASRS) using multi-label classification, and collected labels from six human annotators for a subset of 400 incident reports each, resulting in a total of 2,400 individual annotations. Results show that, in general, the reliability of annotation for the set of incident reports selected in this study was comparatively low. It was also evident that some human factors labels were agreed upon more often than others, which was sometimes related to the presence of keywords in the reports that map directly to the label. Performance of machine learning annotation followed patterns of human agreement on labels. The high variability in the content and quality of narratives was identified as a major factor in the difficulties in annotation. Suggestions on how to improve the data collection and labeling process are provided.



Graduation Date





Jentsch, Florian


Doctor of Philosophy (Ph.D.)


College of Sciences



Degree Program

Psychology; Human Factors Cognitive Psychology




CFE0008302; DP0023739





Release Date

December 2021

Length of Campus-only Access

1 year

Access Status

Doctoral Dissertation (Open Access)