Keywords
Machine Learning, Human Activities, Pose Estimation
Abstract
This dissertation introduces several technical innovations that improve the ability of machine learning models to recognize a wide range of complex human activities. As human sensor data becomes more abundant, the need to develop algorithms for understanding and interpreting complex human actions has become increasingly important. Our research focuses on three key areas: multi-agent activity recognition, multi-person pose estimation, and multimodal fusion.
To tackle the problem of monitoring coordinated team activities from spatio-temporal traces, we introduce a new framework that incorporates field of view data to predict team performance. Our framework uses Spatial Temporal Graph Convolutional Networks (ST-GCN) and recurrent neural network layers to capture and model the dynamic spatial relationships between agents. The second part of the dissertation addresses the problem of multi-person pose estimation (MPPE) from video data. Our proposed technique (Language Assisted Multi-person Pose estimation) leverages text representations from multimodal foundation models to learn a visual representation that is more robust to occlusion. By infusing semantic information into pose estimation, our approach enables precise estimations, even in cluttered scenes. The final part of the dissertation examines the problem of fusing multimodal physiological input from cardiovascular and gaze tracking sensors to exploit the complementary nature of these modalities. When dealing with multimodal features, uncovering the correlations between different modalities is as crucial as identifying effective unimodal features. This dissertation introduces a hybrid multimodal tensor fusion network that is effective at learning both unimodal and bimodal dynamics.
The outcomes of this dissertation contribute to advancing the field of complex human activity recognition by addressing the challenges associated with multi-agent activity recognition, multi-person pose estimation, and multimodal fusion. The proposed innovations have potential applications in various domains, including video surveillance, human-robot interaction, sports analysis, and healthcare monitoring. By developing intelligent systems capable of accurately recognizing complex human activities, this research paves the way for improved safety, efficiency, and decision-making in a wide range of real-world applications.
Completion Date
2023
Semester
Fall
Committee Chair
Sukthankar, Gita
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
DP0028042
URL
https://purls.library.ucf.edu/go/DP0028042
Language
English
Release Date
December 2023
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
Campus Location
Orlando (Main) Campus
STARS Citation
Hu, Shengnan, "Exploring the Feasibility of Machine Learning Techniques in Recognizing Complex Human Activities" (2023). Graduate Thesis and Dissertation 2023-2024. 19.
https://stars.library.ucf.edu/etd2023/19