Keywords
AI-assisted item development, educational assessment, psychometrics, classical test theory, item response theory, distractor analysis
Abstract
Generative artificial intelligence (AI) has the potential to transform assessment item development by increasing efficiency and scalability. However, empirical evidence regarding the psychometric quality of AI-assisted assessment items remains limited, particularly in high-stakes testing contexts. The purpose of this study was to evaluate the psychometric properties of assessment items developed with AI assistance compared with traditionally authored items within the context of the Certified Internal Auditor (CIA) Part One examination.
Using a field-test dataset from a globally administered professional certification examination, this study examined item difficulty, discrimination, and distractor functioning. It also explored regional effects on performance. Analyses were conducted using Classical Test Theory (CTT), the two-parameter logistic Item Response Theory (2PL IRT) model, and Hierarchical Linear Modeling (HLM) to account for the nested structure of responses across geographic regions.
Results indicated that AI-assisted items were systematically easier than traditionally authored items across both CTT and IRT frameworks. Despite these differences in item difficulty, AI-assisted items demonstrated discrimination comparable to traditionally authored items. Analyses of distractor functioning revealed that AI-assisted items exhibited a lower proportion of functioning distractors, identifying distractor quality as a key challenge in AI-assisted item development. Multilevel modeling results further indicated that the difference in item difficulty remained consistent after accounting for regional variation. Overall, the findings highlight the importance of empirical evaluation and support the use of AI-assisted item development as a complementary approach in high-stakes professional certification programs.
Completion Date
2026
Semester
Spring
Committee Chair
Sivo, Stephen
Degree
Doctor of Philosophy (Ph.D.)
College
College of Community Innovation and Education
Department
Methodology, Measurement, and Analysis
Document Type
Dissertation
Identifier
DP0053210
Release Date
5-15-2028
STARS Citation
Ouazzani, Mami M., "Psychometric comparison of AI-assisted and traditional assessment item development" (2026). Graduate Studies Theses and Dissertations 2026. 145.
https://stars.library.ucf.edu/gradstudies_etd_2026/145