ORCID
https://orcid.org/0009-0005-3108-5304
Keywords
Vulnerable Road Users, Pedestrian Safety, Computer Vision, Deep Learning, Intelligent Transportation Systems, Behavior Prediction
Abstract
Vulnerable Road Users (VRUs), such as pedestrians and cyclists, are among the most at-risk participants in traffic, making their safety a key priority for intelligent transportation systems (ITS). Accurate perception and understanding of VRU behavior are essential for proactive accident prevention and the design of human-centric mobility systems. Computer vision has become a core component of ITS and of efforts to enhance VRU safety. This thesis introduces a human-behavior-aware, transformer-based framework for VRU intention prediction and presents VRU-Accident, a large-scale benchmark that enables systematic assessment of multimodal large language models (MLLMs) in understanding accident scenarios involving VRUs.
First, the proposed framework leverages multimodal cues, including 3D pose estimation and spatio-temporal trajectories, to predict VRU crossing intentions at intersections. By combining a geometric-invariant representation with temporal attention and pose- and context-aware embeddings, the model captures subtle indicators that influence crossing decisions, such as body orientation, motion patterns, and environmental cues. The framework performs robustly across diverse intersection scenarios, with consistent predictive capability under varying conditions.
To enable realistic and comprehensive evaluation, this thesis also introduces VRU-Accident, the first vision-language benchmark for real-world accident scenarios involving VRUs. VRU-Accident contains 1,000 dashcam accident videos with over 6,000 safety-critical question-answer pairs and 1,000 dense scene descriptions. This dataset supports systematic evaluation of MLLMs in accident-related video question answering and dense scene captioning, providing a critical resource for assessing safety-critical AI systems.
Together, these contributions advance VRU safety research by offering both a behavior-aware prediction framework and a comprehensive benchmark for accident understanding. The proposed methods can enhance proactive safety interventions in ITS, support the development of autonomous driving systems, and inform policymaking for safer and more inclusive transportation networks.
Completion Date
2025
Semester
Fall
Committee Chair
Abdel-Aty, Mohamed
Degree
Master of Science in Civil Engineering (M.S.C.E.)
College
College of Engineering and Computer Science
Department
Civil, Environmental and Construction Engineering
Identifier
DP0029799
Document Type
Thesis
Campus Location
Orlando (Main) Campus
STARS Citation
Kim, Younggun, "Improving Vulnerable Road Users' Safety Through Computer Vision-Based Crossing Direction Prediction And Multimodal Large Language Model-Based Accident Scene Description" (2025). Graduate Thesis and Dissertation post-2024. 465.
https://stars.library.ucf.edu/etd2024/465