Abstract
Convolutional networks have driven major advances in computer vision in recent years. The design of deep architectures and loss functions, together with the curation of large, diverse datasets, has furthered progress in many applied computer vision tasks. How data is represented to a network guides feature discovery and must be carefully considered to maximize performance on any applied task. We introduce novel input representations along with architectural techniques, such as complementary loss terms and network structures, to better exploit them. We demonstrate the impact of these approaches on classification and matching tasks involving shape and varied illumination, and show that the resulting, more robust features increase accuracy on these tasks.

We first consider the representation of objects for 3D object recognition. Convolutional networks designed for this task typically represent 3D objects either as a set of 2D images or as a volume. In producing either representation, critical shape information is lost. We augment the volumetric representation by computing and encoding shape information in the form of mean curvature, allowing a convolutional network to discover shape features for 3D object recognition.

We further consider the process of learning features for image classification, a problem on which many existing deep convolutional networks have been highly successful. We provide a method that looks again at the misclassified training data by composing an ensemble with the high-performing base network: we specialize a second network on the misclassified training examples and composite the two networks to provide greater accuracy without the additional training data or hyperparameter tuning typical of ensemble approaches.

We next transform the representation of indoor scene images by varying illumination for image matching. We use a relighting convolutional network to generate a set of varied-illumination images per view and perform matching across this set of images under many lighting conditions rather than between single images. Aggregating these feature matches yields a set of correct matches that is both larger and more spatially dense than the set obtained from a single illumination condition alone.

Lastly, we examine features under varied illumination and appearance in outdoor settings for scene classification. Many scene classification networks and datasets impose additional constraints on scene appearance, such as restricting the time of day or assigning semantic sub-categories to weather conditions, in order to limit appearance changes. Instead, we address varied appearance in outdoor scenes by transforming the input representation and architecture to support the discovery of features robust to varied appearance. We introduce a new multi-input convolutional network that takes in a set of varied-appearance images of a single scene to learn robust features during training. We additionally introduce a novel loss term, the dissimilarity loss, which minimizes the L2 difference across combinations of features, encouraging similar activations over the set of appearance changes for each scene (see the first sketch below). We also provide a test procedure, distinct from training, for single-image scene classification with our network: the test image is duplicated across the input set (see the second sketch below).
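A minimal sketch of such a pairwise dissimilarity term is given below, assuming each scene contributes one feature vector per appearance variant. The pairing scheme (all combinations) follows the abstract; whether the pairwise L2 differences are summed or averaged, and all function and variable names, are assumptions for illustration rather than the author's implementation.

```python
# Illustrative numpy sketch of a pairwise dissimilarity loss over the
# per-variant features of one scene. The all-pairs scheme follows the
# abstract; the averaging and all names are assumptions.
import itertools
import numpy as np

def dissimilarity_loss(features):
    """Average L2 difference over all pairs of appearance-variant features.

    features: array of shape (k, d) -- k appearance variants of one scene,
              each mapped to a d-dimensional feature vector by the network.
    """
    pairs = itertools.combinations(range(len(features)), 2)
    distances = [np.linalg.norm(features[i] - features[j]) for i, j in pairs]
    return float(np.mean(distances))

# Example: 4 appearance variants of one scene, 128-dimensional features each.
feats = np.random.randn(4, 128)
print(dissimilarity_loss(feats))
```

Minimizing this term drives the per-variant features of the same scene toward one another, which is the stated goal of encouraging similar activations across appearance changes.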
This new procedure of training features with an input representation distinct from the one used at test time allows robust feature discovery over a variety of outdoor appearance changes while supporting traditional classification testing. Finally, we collect and sanitize a first-of-its-kind dataset of Varying Outdoor Scenes labeled for scene classification, with over 28k images spanning 38 categories. We compare the accuracy of our network to competitive scene classification baselines and demonstrate that it outperforms them by a significant margin.
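The single-image test procedure described above amounts to replicating the test image across the network's input set. A minimal sketch follows, with a hypothetical network interface and illustrative names; the actual input layout of the author's network is not specified here.

```python
# Illustrative sketch of the single-image test procedure: the multi-input
# network expects a set of k appearance variants per scene, so at test time
# the lone image is simply replicated k times before classification. The
# network interface and all names here are assumptions for illustration.
import numpy as np

def classify_single_image(network, image, k):
    """Duplicate one test image across the k input slots and classify.

    network: callable mapping an array of shape (k, H, W, C) to class scores.
    image:   single test image of shape (H, W, C).
    """
    image_set = np.stack([image] * k, axis=0)  # shape (k, H, W, C)
    scores = network(image_set)
    return int(np.argmax(scores))
```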
Notes
If this is your thesis or dissertation and you want to learn how to access it, or for more information about readership statistics, contact us at STARS@ucf.edu
Graduation Date
2021
Semester
Spring
Advisor
Foroosh, Hassan
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
CFE0008448; DP0024123
URL
https://purls.library.ucf.edu/go/DP0024123
Language
English
Release Date
May 2022
Length of Campus-only Access
1 year
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Braeger, Sarah, "Improving Matching and Classification Through Deep Learning of Structure and Varying Illumination" (2021). Electronic Theses and Dissertations, 2020-2023. 477.
https://stars.library.ucf.edu/etd2020/477