Harnessing Large Language Models for Automated Essay Scoring in Public Health
Contributor
University of Central Florida. Faculty Center for Teaching and Learning; University of Central Florida. Division of Digital Learning; Teaching and Learning with AI Conference (2025 : Orlando, Fla.)
Location
Space Coast
Start Date
28-5-2025 2:45 PM
End Date
28-5-2025 3:10 PM
Publisher
University of Central Florida Libraries
Keywords
Automated essay scoring; Large language models; Public health education; Grading accuracy; Prompt engineering
Subjects
Academic writing--Evaluation; English language--Writing--Evaluation; Grading and marking (Students)--Computer-assisted instruction; Academic writing--Computer-assisted instruction; Language and education--Evaluation
Description
Automated Essay Scoring (AES) using Large Language Models (LLMs) has emerged as a promising approach to assessing student writing, offering faster grading, unbiased evaluation, and detailed feedback. This study investigates the performance of two commonly used LLM-based tools, ChatGPT and Copilot, in scoring essays from an introductory Public Health course. It evaluates how closely these tools align with human rater judgments and examines the impact of prompt engineering on scoring accuracy. Quadratic Weighted Kappa (QWK) scores were used to measure agreement between each model and the manual grader, and score deviations were analyzed to identify discrepancies for each rubric criterion. Results indicate that ChatGPT aligns more closely with manual grading (QWK = 0.5342) than Copilot (QWK = 0.2186), with Copilot exhibiting greater score variability and larger deviations across criteria. Despite its stronger performance, ChatGPT underestimates scores in specific areas such as recommendations, highlighting room for improvement. This study underscores the potential of LLMs for AES while identifying critical areas for optimization, paving the way for their effective integration into educational assessment frameworks.
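For readers unfamiliar with the agreement measure named in the abstract, the sketch below shows one common way to compute Quadratic Weighted Kappa between a human rater's scores and an LLM's scores using scikit-learn. The score lists are illustrative placeholders, not data from the study, and the rubric scale is assumed.

```python
# Minimal sketch: Quadratic Weighted Kappa (QWK) between a human rater and
# an LLM grader. The scores below are hypothetical examples on an assumed
# 0-5 rubric scale, not the study's data.
from sklearn.metrics import cohen_kappa_score

human_scores = [4, 3, 5, 2, 4, 3, 5, 1, 4, 2]
llm_scores   = [4, 2, 5, 3, 4, 4, 4, 1, 3, 2]

# With weights="quadratic", cohen_kappa_score penalizes disagreements by the
# squared distance between the two ratings, which is the QWK statistic.
qwk = cohen_kappa_score(human_scores, llm_scores, weights="quadratic")
print(f"QWK = {qwk:.4f}")
```

In this framing, a QWK near 1 indicates strong agreement with the human grader, a value near 0 indicates agreement no better than chance, and larger gaps between adjacent scores are penalized more heavily than small ones.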
Language
eng
Type
Presentation
Format
application/vnd.openxmlformats-officedocument.presentationml.presentation
Rights Statement
All Rights Reserved
Audience
Faculty, Students, Instructional designers
Recommended Citation
Mehra, Shabnam, "Harnessing Large Language Models for Automated Essay Scoring in Public Health" (2025). Teaching and Learning with AI Conference Presentations. 38.
https://stars.library.ucf.edu/teachwithai/2025/wednesday/38