This dissertation addresses the problem of describing images using visual attributes and textual tags, a fundamental task that narrows down the semantic gap between the visual reasoning of humans and machines. Automatic image annotation assigns relevant textual tags to the images. In this dissertation, we propose a query-specific formulation based on Weighted Multi-view Non-negative Matrix Factorization to perform automatic image annotation. Our proposed technique seamlessly adapt to the changes in training data, naturally solves the problem of feature fusion and handles the challenge of the rare tags. Unlike tags, attributes are category-agnostic, hence their combination models an exponential number of semantic labels. Motivated by the fact that most attributes describe local properties, we propose exploiting localization cues, through semantic parsing of human face and body to improve person-related attribute prediction. We also demonstrate that image-level attribute labels can be effectively used as weak supervision for the task of semantic segmentation. Next, we analyze the Selfie images by utilizing tags and attributes. We collect the first large-scale Selfie dataset and annotate it with different attributes covering characteristics such as gender, age, race, facial gestures, and hairstyle. We then study the popularity and sentiments of the selfies given an estimated appearance of various semantic concepts. In brief, we automatically infer what makes a good selfie. Despite its extensive usage, the deep learning literature falls short in understanding the characteristics and behavior of the Batch Normalization. We conclude this dissertation by providing a fresh view, in light of information geometry and Fisher kernels to why the batch normalization works. We propose Mixture Normalization that disentangles modes of variation in the underlying distribution of the layer outputs and confirm that it effectively accelerates training of different batch-normalized architectures including Inception-V3, Densely Connected Networks, and Deep Convolutional Generative Adversarial Networks while achieving better generalization error.
If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Length of Campus-only Access
Doctoral Dissertation (Open Access)
Mahmoudkalayeh, Mahdi, "Describing Images by Semantic Modeling using Attributes and Tags" (2019). Electronic Theses and Dissertations. 6296.