Abstract
Deep learning has achieved tremendous success on various computer vision tasks. However, deep learning models are usually computationally expensive, making them hard to train and deploy, especially on resource-constrained devices. In this dissertation, we explore how to improve the efficiency and effectiveness of deep learning methods from several perspectives. We first propose a new learning method for computationally adaptive representations. Whereas traditional neural networks are static, our method trains adaptive neural networks that can adjust their computational cost at runtime, avoiding the need to train and deploy multiple networks for dynamic resource budgets. Next, we extend this method to learn adaptive spatiotemporal representations for video understanding tasks such as video recognition and action detection. Then, inspired by the proposed adaptive learning method, we propose a new regularization method that learns better representations for the full network: the full network is regularized so that its predictions align with those of its sub-networks when the two are fed differently transformed views of the input, which encourages more generalized and robust representations. Beyond learning methods, designing a good network architecture is also critical to learning good representations. Neural architecture search (NAS) has shown great potential in discovering novel network structures, but its high computational cost is a significant limitation. To address this issue, we present a new short-training-based NAS method that outperforms previous methods while requiring significantly less search cost. Finally, building on recent advances in large-scale image foundation models, we present an efficient fine-tuning method to adapt pre-trained image foundation models for video understanding.
Our method significantly reduces training costs compared to traditional full fine-tuning, while delivering competitive performance across multiple video benchmarks. It is both simple and versatile, making it easy to leverage stronger image foundation models in the future.
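To make two of the ideas in the abstract concrete, the following is a minimal, hypothetical NumPy sketch (not the dissertation's actual implementation): a linear layer that can execute at a chosen fraction of its full width, so the computational cost is adjustable at runtime, together with a KL-style consistency loss of the kind used to align sub-network predictions with those of the full network. The names `SlimmableLinear`, `width`, and `consistency_loss` are illustrative inventions, not identifiers from the work itself.

```python
import numpy as np

class SlimmableLinear:
    """A linear layer that can run at a fraction of its full width.

    Illustrative sketch of the adaptive-width idea: the sub-network
    shares the first `width` fraction of the full layer's weights, so
    one set of parameters serves many compute budgets.
    """

    def __init__(self, in_features, out_features, rng):
        self.W = rng.standard_normal((out_features, in_features))
        self.b = np.zeros(out_features)

    def __call__(self, x, width=1.0):
        # Use only the first `width` fraction of output channels,
        # so the amount of compute scales with the chosen width.
        k = max(1, int(self.W.shape[0] * width))
        return x @ self.W[:k].T + self.b[:k]

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(p_full, p_sub, eps=1e-8):
    # KL(full || sub): penalize sub-network predictions that drift
    # from the full network's, as in the regularization idea above.
    return np.mean(
        np.sum(p_full * (np.log(p_full + eps) - np.log(p_sub + eps)), axis=-1)
    )

rng = np.random.default_rng(0)
layer = SlimmableLinear(8, 16, rng)
x = rng.standard_normal((4, 8))

full = layer(x, width=1.0)   # shape (4, 16): full network
half = layer(x, width=0.5)   # shape (4, 8): cheaper sub-network
# The sub-network's outputs are a prefix of the full network's:
assert np.allclose(half, full[:, :8])

# Identical predictions incur no penalty; differing ones do.
p_a = softmax(rng.standard_normal((4, 8)))
p_b = softmax(rng.standard_normal((4, 8)))
assert consistency_loss(p_a, p_a) < 1e-6
assert consistency_loss(p_a, p_b) > 0
```

In an actual training loop, the consistency term would be computed between the full network's predictions and those of one or more randomly sampled sub-networks, each fed a differently transformed view of the same input.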
Graduation Date
2023
Semester
Summer
Advisor
Chen, Chen
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Identifier
CFE0009816; DP0027924
URL
https://purls.library.ucf.edu/go/DP0027924
Language
English
Release Date
August 2023
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Yang, Taojiannan, "Towards Efficient and Effective Representation Learning for Image and Video Understanding" (2023). Electronic Theses and Dissertations, 2020-2023. 1738.
https://stars.library.ucf.edu/etd2020/1738