Abstract

Representing spatio-temporal information in videos has proven to be a difficult task compared to action recognition in videos involving multiple actions. A single activity consists of many smaller actions that together provide a better understanding of the activity. This thesis represents the varying information in a scene in a scene-graph format in order to answer temporal questions and obtain improved insights about the video, resulting in a directed temporal information graph. The work uses the Action Genome dataset, a variation of the Charades dataset, to capture pairwise relationships in a graph. The model performs significantly better than the dataset's benchmark, providing state-of-the-art results in predicate classification. The thesis presents a novel spatio-temporal scene graph for videos, represented as a directed acyclic graph that maximizes the information captured in the scene. The results obtained on the counting task suggest some interesting findings that are described in the thesis. The graph can be used for reasoning at a much lower computational cost, and is also explored for downstream tasks such as video captioning and action recognition, helping bridge the gap between video and textual analysis.
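As a minimal illustration of the idea described above (a hypothetical sketch, not the thesis's actual model or the Action Genome format), per-frame pairwise subject–predicate–object relationships can be accumulated into a directed temporal graph keyed by frame index:

```python
from collections import defaultdict


def build_temporal_scene_graph(frame_triples):
    """Accumulate per-frame (subject, predicate, object) triples into a
    directed temporal graph: each edge is keyed by the (subject, object)
    pair and carries its (frame, predicate) observations over time."""
    graph = defaultdict(list)
    for frame, triples in sorted(frame_triples.items()):
        for subject, predicate, obj in triples:
            graph[(subject, obj)].append((frame, predicate))
    return dict(graph)


# Toy example: a person interacting with objects across two frames.
frames = {
    0: [("person", "holding", "cup")],
    1: [("person", "drinking_from", "cup"),
        ("person", "sitting_on", "chair")],
}
g = build_temporal_scene_graph(frames)
# The "person" -> "cup" edge now records how the predicate changes
# over time, which is the kind of information a temporal question
# ("what did the person do with the cup?") would query.
```

This sketch only shows how pairwise relationships can be ordered along the time axis; the thesis's directed acyclic graph and classification model are more involved.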

Notes

If this is your thesis or dissertation and you want to learn how to access it, or for more information about readership statistics, contact us at STARS@ucf.edu

Graduation Date

2021

Semester

Summer

Advisor

Da Vitoria Lobo, Niels

Degree

Master of Science (M.S.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Format

application/pdf

Identifier

CFE0008749;DP0025480

URL

https://purls.library.ucf.edu/go/DP0025480

Language

English

Release Date

August 2021

Length of Campus-only Access

None

Access Status

Masters Thesis (Open Access)