Abstract
Convolutional networks have driven major advances in computer vision in recent years. The design of deep architectures and loss functions, together with the curation of large, diverse datasets, has furthered progress in many applied computer vision tasks. How data is represented to a network guides feature discovery and must be carefully considered to maximize performance on any applied task. We introduce novel input representations along with architectural techniques, such as complementary loss terms and network structures, to better exploit them. We demonstrate the impact of these approaches on classification and matching tasks involving shape and varied illumination, and show that the resulting, more robust features increase accuracy on these tasks.

We first consider the representation of objects for 3D object recognition. Convolutional networks designed for this task typically represent 3D objects either as a set of 2D images or as a volume. In producing either representation, critical shape information is lost. We augment the volumetric representation by computing and encoding shape information in the form of mean curvature, allowing a convolutional network to discover shape features for 3D object recognition.

We further consider the process of learning features for image classification, a problem on which many existing deep convolutional networks have been highly successful. We provide a method that looks again at the misclassified training data by composing an ensemble with the high-performing base network: we specialize a second network on the misclassified training examples and composite the two networks to provide greater accuracy without the additional training data or hyperparameter tuning typical of ensemble approaches.

We next transform the representation of indoor scene images by varying illumination for image matching. We use a relighting convolutional network to generate a set of varied-illumination images per view and perform matching across this set of images under many lighting conditions rather than between single images. Aggregating these feature matches yields a set of correct matches that is both larger and more spatially dense than the set obtained from a single illumination condition alone.

Lastly, we examine features under varied illumination and appearance in outdoor settings for scene classification. Many scene classification networks and datasets impose additional constraints on scene appearance, such as restricting the time of day or assigning semantic sub-categories to weather conditions, in order to limit appearance changes. Instead, we address varied appearance in outdoor scenes by transforming the input representation and architecture to support the discovery of features robust to varied appearance. We introduce a new multi-input convolutional network that takes in a set of varied-appearance images of a single scene to learn robust features during training. We additionally introduce a novel loss term, the dissimilarity loss, which minimizes the L2 difference across combinations of features, encouraging similar activations over the set of appearance changes for each scene (see the first sketch below). We also provide a test procedure, distinct from training, for single-image scene classification with our network: the test image is duplicated across the input set (see the second sketch below).
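A minimal sketch of such a pairwise dissimilarity term is given below, assuming each scene contributes one feature vector per appearance variant. The pairing scheme (all combinations) follows the abstract; whether the pairwise L2 differences are summed or averaged, and all function and variable names, are assumptions for illustration rather than the author's implementation.

```python
# Illustrative numpy sketch of a pairwise dissimilarity loss over the
# per-variant features of one scene. The all-pairs scheme follows the
# abstract; the averaging and all names are assumptions.
import itertools
import numpy as np

def dissimilarity_loss(features):
    """Average L2 difference over all pairs of appearance-variant features.

    features: array of shape (k, d) -- k appearance variants of one scene,
              each mapped to a d-dimensional feature vector by the network.
    """
    pairs = itertools.combinations(range(len(features)), 2)
    distances = [np.linalg.norm(features[i] - features[j]) for i, j in pairs]
    return float(np.mean(distances))

# Example: 4 appearance variants of one scene, 128-dimensional features each.
feats = np.random.randn(4, 128)
print(dissimilarity_loss(feats))
```

Minimizing this term drives the per-variant features of the same scene toward one another, which is the stated goal of encouraging similar activations across appearance changes.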
This new procedure of training features with an input representation distinct from the one used at test time allows robust feature discovery over a variety of outdoor appearance changes while supporting traditional classification testing. Finally, we collect and sanitize a first-of-its-kind dataset of Varying Outdoor Scenes labeled for scene classification, with over 28k images spanning 38 categories. We compare the accuracy of our network to competitive scene classification baselines and demonstrate that it outperforms them by a significant margin.
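The single-image test procedure described above amounts to replicating the test image across the network's input set. A minimal sketch follows, with a hypothetical network interface and illustrative names; the actual input layout of the author's network is not specified here.

```python
# Illustrative sketch of the single-image test procedure: the multi-input
# network expects a set of k appearance variants per scene, so at test time
# the lone image is simply replicated k times before classification. The
# network interface and all names here are assumptions for illustration.
import numpy as np

def classify_single_image(network, image, k):
    """Duplicate one test image across the k input slots and classify.

    network: callable mapping an array of shape (k, H, W, C) to class scores.
    image:   single test image of shape (H, W, C).
    """
    image_set = np.stack([image] * k, axis=0)  # shape (k, H, W, C)
    scores = network(image_set)
    return int(np.argmax(scores))
```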
Notes
If this is your thesis or dissertation and you want to learn how to access it, or for more information about readership statistics, contact us at STARS@ucf.edu
Graduation Date
2021
Semester
Spring
Advisor
Foroosh, Hassan
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
CFE0008448; DP0024123
URL
https://purls.library.ucf.edu/go/DP0024123
Language
English
Release Date
May 2022
Length of Campus-only Access
1 year
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Braeger, Sarah, "Improving Matching and Classification Through Deep Learning of Structure and Varying Illumination" (2021). Electronic Theses and Dissertations, 2020-2023. 477.
https://stars.library.ucf.edu/etd2020/477