Still-image emotion recognition (ER) has received increasing attention in recent years owing to the tremendous amount of social media content on the Web. Many works offer both categorical and dimensional methods to detect image sentiments, while others focus on extracting the true social signals, such as happiness and anger. Deep learning architectures have delivered great success; however, their dependency on large-scale datasets labeled with (1) emotion, and (2) valence, arousal and dominance, in the categorical and dimensional domains respectively, introduces challenges that the community is trying to tackle. Emotions carry different semantics when aroused in different contexts; however, "context-sensitive" ER has been largely disregarded in the literature so far. Moreover, while dimensional methods deliver higher accuracy, they have received less attention due to (1) the lack of reliable large-scale labeled datasets, and (2) the challenges involved in architecting unsupervised solutions to the problem. Owing to the success of multi-modal ER, still-image ER in the single-modal domain, i.e. using only still images, remains less explored.
In this work, (1) we first architect a novel, fully automated dataset collection pipeline equipped with a built-in semantic sanitizer; (2) we then build UCF-ER, with 50K images, and LUCFER, the largest labeled ER dataset in the literature with more than 3.6M images, both labeled with emotion and context; (3) next, we build a single-modal, context-sensitive ER CNN model, fine-tuned on UCF-ER and LUCFER; (4) we then claim and show empirically that infusing context into the unified training process helps achieve a more balanced precision and recall while boosting performance, yielding an overall classification accuracy of 73.12% compared to the state-of-the-art 58.3%; (5) next, we propose an unsupervised approach for ranking continuous emotions in images using canonical polyadic (CP) decomposition, providing a theoretical proof that rank-1 CP decomposition can be used as a ranking machine; (6) finally, we provide empirical evidence that our method achieves a Pearson correlation coefficient that outperforms the state of the art by a large margin, i.e. 65.13% (difference) in one experiment and 104.08% (difference) in another, when applied to valence rank estimation.
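To make the idea in item (5) concrete, the following is a minimal sketch of how a rank-1 CP decomposition can act as a ranking device. It is not the dissertation's implementation: the abstract does not say how the tensor is built, so the toy tensor here (images × two hypothetical feature modes), the function names `rank1_cp`, `pearson` and `make_toy_example`, and the alternating power-style update are all assumptions made for illustration. The image-mode factor is read as a per-image score, and its agreement with known scores is measured with the Pearson correlation coefficient.

```python
import numpy as np


def rank1_cp(tensor, n_iter=200, tol=1e-12, seed=0):
    """Rank-1 CP approximation of a 3-way tensor by alternating updates.

    Returns (weight, a, b, c) so that tensor ~= weight * a (outer) b (outer) c,
    with a, b, c of unit Euclidean norm.
    """
    rng = np.random.default_rng(seed)
    n_i, n_j, n_k = tensor.shape
    # Positive random start; for a non-negative tensor the factors stay non-negative.
    a = rng.random(n_i) + 0.1
    b = rng.random(n_j) + 0.1
    c = rng.random(n_k) + 0.1
    b /= np.linalg.norm(b)
    c /= np.linalg.norm(c)
    weight = 0.0
    for _ in range(n_iter):
        # Update each factor with the other two held fixed, then renormalize.
        a = np.einsum("ijk,j,k->i", tensor, b, c)
        a /= np.linalg.norm(a)
        b = np.einsum("ijk,i,k->j", tensor, a, c)
        b /= np.linalg.norm(b)
        c = np.einsum("ijk,i,j->k", tensor, a, b)
        new_weight = np.linalg.norm(c)
        c /= new_weight
        converged = abs(new_weight - weight) < tol
        weight = new_weight
        if converged:
            break
    return weight, a, b, c


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def make_toy_example(seed=42):
    """Hypothetical data: 5 'images' whose latent scores drive a noisy rank-1 tensor."""
    rng = np.random.default_rng(seed)
    true_scores = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
    mode_b = rng.random(4) + 0.5  # assumed second mode (e.g. feature channels)
    mode_c = rng.random(3) + 0.5  # assumed third mode (e.g. feature sources)
    tensor = np.einsum("i,j,k->ijk", true_scores, mode_b, mode_c)
    tensor += 0.01 * rng.random(tensor.shape)  # small non-negative noise
    return tensor, true_scores


if __name__ == "__main__":
    tensor, true_scores = make_toy_example()
    _, image_factor, _, _ = rank1_cp(tensor)
    print("ranking (highest first):", np.argsort(-image_factor))
    print("Pearson vs. true scores:", pearson(image_factor, true_scores))
```

In this toy setting the image-mode factor is, up to scale and noise, proportional to the latent scores, which is why sorting it yields a ranking. Whether the same holds for real valence data depends on how the tensor is constructed, which the abstract leaves to the full text.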
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Doctoral Dissertation (Open Access)
Balouchian, Pooyan, "Learning Context-sensitive Human Emotions in Categorical and Dimensional Domains" (2020). Electronic Theses and Dissertations, 2020-. 325.