Abstract

In computer vision, context refers to any information that may influence how visual media are understood. Traditionally, researchers have studied several sources of context in relation to the problem of object detection in images. In this dissertation, we present a multifaceted review of the problem of context. Context is analyzed as a source of improvement for object detection, not only in images but also in videos. In the case of images, we also investigate the influence of semantic context, determined by objects, relationships, locations, and global composition, to achieve a general understanding of the image content as a whole. We also address the related problem of finding the context associated with visual media: given a set of visual elements (images), we want to extract the context that can be commonly associated with these images in order to remove ambiguity.

The first part of this dissertation concentrates on achieving image understanding using semantic context. In spite of the recent success in tasks such as image classification, object detection, and image segmentation, and the progress on scene understanding, researchers still lack clarity about computer comprehension of the content of the image as a whole. Hence, we propose a Top-Down Visual Tree (TDVT) image representation that encodes the content of the image as a hierarchy of objects, capturing their importance, co-occurrences, and types of relations. A novel Top-Down Tree LSTM network is presented to learn about image composition from the training images and their TDVT representations. Given a test image, our algorithm detects objects and determines the hierarchical structure that they form, encoded as a TDVT representation of the image.

A single image could have multiple interpretations, which may lead to ambiguity about the intent of the image. What if, instead of having only a single image to interpret, we have multiple images that represent the same topic? The second part of this dissertation covers how to extract the context information shared by multiple images. We present a method to determine the topic that these images represent. We accomplish this task by transferring tags from an image retrieval database and by performing operations in the textual space of these tags. As an application, we also present a new image retrieval method that uses multiple images as input. Unlike earlier works, which focus either on a single query image or on multiple query images showing views of the same instance, the new image search paradigm retrieves images based on the underlying concepts that the input images represent.

Finally, in the third part of this dissertation, we analyze the influence of context in videos. In this case, temporal context is used to improve scene identification and object detection. We focus on egocentric videos, where agents require some time to move from one location to another. Therefore, we propose a Conditional Random Field (CRF) formulation that penalizes short-term changes of the scene identity in order to improve scene identification accuracy. We also show how to improve object detection by re-scoring the detection results based on the scene identity of the tested frame. We present a Support Vector Regression (SVR) formulation for the case in which explicit knowledge of the scene identity is available at training time. For the case in which explicit scene labeling is not available, we propose an LSTM formulation that considers the general appearance of the frame to re-score the object detectors.
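The idea of penalizing short-term changes of the scene identity can be illustrated with a small sketch. The abstract does not give the actual CRF formulation, so the following is only an assumed simplification: a chain model with per-frame costs and a constant (Potts) penalty whenever consecutive frames receive different scene labels, decoded exactly with the Viterbi algorithm. The function name `smooth_scene_labels`, the cost values, and the penalty value are all hypothetical.

```python
def smooth_scene_labels(unary_costs, switch_penalty):
    """Minimum-cost labeling of a frame sequence (illustrative sketch only).

    unary_costs[t][s] is an assumed cost of giving scene s to frame t,
    for example a negative log-probability from a per-frame classifier.
    switch_penalty is added each time two consecutive frames differ.
    """
    if not unary_costs:
        return []
    num_scenes = len(unary_costs[0])
    best = list(unary_costs[0])  # best[s]: cheapest cost of a path ending in s
    backpointers = []            # backpointers[t][s]: best previous label
    for frame_costs in unary_costs[1:]:
        new_best, pointers = [], []
        for s in range(num_scenes):
            # Cheapest predecessor, charging a penalty for changing scene.
            prev = min(
                range(num_scenes),
                key=lambda p: best[p] + (0.0 if p == s else switch_penalty),
            )
            transition = 0.0 if prev == s else switch_penalty
            new_best.append(best[prev] + transition + frame_costs[s])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the cheapest path back from the last frame.
    label = min(range(num_scenes), key=lambda s: best[s])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Hypothetical costs: frame 2 weakly prefers scene 1, the others prefer scene 0.
costs = [[0.1, 2.0], [0.1, 2.0], [1.0, 0.4], [0.1, 2.0], [0.1, 2.0]]
print(smooth_scene_labels(costs, switch_penalty=0.0))  # [0, 0, 1, 0, 0]
print(smooth_scene_labels(costs, switch_penalty=1.0))  # [0, 0, 0, 0, 0]
```

With no penalty, the decoding follows the per-frame preference and keeps the one-frame switch. With a penalty of 1.0, switching away and back costs more than the isolated frame gains, so the labeling stays on a single scene.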

Notes

If this is your thesis or dissertation and you want to learn how to access it, or would like more information about readership statistics, contact us at STARS@ucf.edu

Graduation Date

2017

Semester

Fall

Advisor

daVitoria Lobo, Niels

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Electrical Engineering and Computer Engineering

Degree Program

Electrical Engineering

Format

application/pdf

Identifier

CFE0006922

URL

http://purl.fcla.edu/fcla/etd/CFE0006922

Language

English

Release Date

December 2017

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

STARS Citation

Vaca Castano, Gonzalo, "Understanding images and videos using context" (2017). Electronic Theses and Dissertations. 5685.
https://stars.library.ucf.edu/etd/5685

Included in

Electrical and Computer Engineering Commons

Electronic Theses and Dissertations

Understanding images and videos using context
