It is hard to overstate the importance of gesture-based interfaces in many applications nowadays. The adoption of such interfaces stems from the opportunities they create for incorporating natural and fluid user interactions. This highlights the importance of having gesture recognizers that are not only accurate but also easy to adopt. The ever-growing popularity of machine learning has prompted many application developers to integrate automatic methods of recognition into their products. On the one hand, deep learning often tops the list of the most powerful and robust recognizers. These methods have been consistently shown to outperform all other machine learning methods in a variety of tasks. On the other hand, deep networks can be overwhelming to use for a majority of developers, requiring a lot of tuning and tweaking to work as expected. Additionally, these networks are infamous for their requirement for large amounts of training data, further hampering their adoption in scenarios where labeled data is limited. In this dissertation, we aim to bridge the gap between the power of deep learning methods and their adoption into gesture recognition workflows. To this end, we introduce two deep network models for recognition. These models are similar in spirit, but target different application domains: one is designed for segmented gesture recognition, while the other is suitable for continuous data, tackling segmentation and recognition problems simultaneously. The distinguishing characteristic of these networks is their simplicity, small number of free parameters, and their use of common building blocks that come standard with any modern deep learning framework, making them easy to implement, train and adopt. Through evaluations, we show that our proposed models achieve state-of-the-art results in various recognition tasks and application domains spanning different input devices and interaction modalities. We demonstrate that the infamy of deep networks due to their demand for powerful hardware as well as large amounts of data is an unfair assessment. On the contrary, we show that in the absence of such data, our proposed models can be quickly trained while achieving competitive recognition accuracy. Next, we explore the problem of synthetic gesture generation: a measure often taken to address the shortage of labeled data. We extend our proposed recognition models and demonstrate that the same models can be used in a Generative Adversarial Network (GAN) architecture for synthetic gesture generation. Specifically, we show that our original recognizer can be used as the discriminator in such frameworks, while its slightly modified version can act as the gesture generator. We then formulate a novel loss function for our gesture generator, which entirely replaces the need for a discriminator network in our generative model, thereby significantly reducing the complexity of our framework. Through evaluations, we show that our model is able to improve the recognition accuracy of multiple recognizers across a variety of datasets. Through user studies, we additionally show that human evaluators mistake our synthetic samples with the real ones frequently indicating that our synthetic samples are visually realistic. Additional resources for this dissertation (such as demo videos and public source codes) are available at https://www.maghoumi.com/dissertation
If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu
Laviola II, Joseph
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Length of Campus-only Access
Doctoral Dissertation (Open Access)
Maghoumi, Mehran, "Deep Recurrent Networks for Gesture Recognition and Synthesis" (2020). Electronic Theses and Dissertations, 2020-. 379.
Restricted to the UCF community until December 2020; it will then be open access.