Keywords

human pose estimation, human mesh recovery, deep learning, human reconstruction

Abstract

Understanding humans in visual content is fundamental for numerous computer vision applications. Extensive research has been conducted in the field of human pose estimation (HPE) to accurately locate joints and construct body representations from images and videos. Expanding on HPE, human mesh recovery (HMR) addresses the more complex task of estimating the 3D pose and shape of the entire human body. HPE and HMR have gained significant attention due to their applications in areas such as digital human avatar modeling, AI coaching, and virtual reality [135]. However, HPE and HMR come with notable challenges, including intricate body articulation, occlusion, depth ambiguity, and the limited availability of annotated 3D data. Despite the progress made so far, the research community continues to strive for robust, accurate, and efficient solutions in HPE and HMR, advancing us closer to the ultimate goals in the field.

This dissertation tackles various challenges in the domains of HPE and HMR. The initial focus is on video-based HPE, where we propose a transformer architecture named PoseFormer [136] to capture the spatial relationships between body joints and the temporal correlations across frames. This approach effectively harnesses the comprehensive connectivity and expressive power of transformers, leading to improved pose estimation accuracy in video sequences. Building upon this, the dissertation addresses the heavy computational and memory burden associated with image-based HMR. Our proposed Feature Map-based Transformer method (FeatER [133]) and Pooling Attention Transformer method (POTTER [130]) demonstrate superior performance while significantly reducing computational and memory requirements compared to existing state-of-the-art techniques. Furthermore, a diffusion-based framework (DiffMesh [134]) is proposed for reconstructing high-quality human mesh outputs given input video sequences. These achievements provide practical and efficient solutions that cater to the demands of real-world applications in HPE and HMR.

In this dissertation, our contributions advance the fields of HPE and HMR, bringing us closer to accurate and efficient solutions for understanding humans in visual content.

Completion Date

2023

Semester

Fall

Committee Chair

Chen, Chen

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Format

application/pdf

Identifier

DP0028104

URL

https://purls.library.ucf.edu/go/DP0028104

Language

English

Release Date

December 2023

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

Campus Location

Orlando (Main) Campus
