ORCID

0009-0007-3035-4845

Keywords

Fine-tuned model, Whisper V3, Emotion recognition, Language, Arabic, AI

Abstract

This thesis investigates the application of Whisper V3, a state-of-the-art multilingual automatic speech recognition model, to Arabic Speech Emotion Recognition (SER). Building on the foundational work of Muhammad Firdho, who adapted Whisper architectures for emotion classification tasks, this study extends his approach to Arabic, a language characterized by rich dialectal diversity, complex morphology, and unique prosodic features. Although Whisper V3 was primarily designed for Automatic Speech Recognition (ASR), its latent representations appear to capture paralinguistic cues that can be leveraged for emotion classification. Firdho laid the basis for this study by showing that Whisper-based models can interpret expressive aspects of speech beyond transcription, opening the door to their evaluation on emotion identification problems.

In this study, I used the Khalil Emotion Detection Arabic Speech (KEDAS) dataset, a balanced dataset comprising 5,000 audio samples evenly distributed across five emotional categories (Angry, Sad, Fearful, Happy, and Neutral), to evaluate the model’s performance in a zero-/few-shot setting. Due to resource limitations, I used only the first 500 audio samples for my analysis. Although the model achieved an overall accuracy of approximately 37.2%, the analysis reveals a pronounced bias toward over-predicting the Fearful category (E3), as evidenced by the confusion matrix and chi-square testing.

These findings suggest that while large pretrained models can detect some emotional variance in Arabic speech, additional plain-language adaptation and fine-tuning may be necessary to improve performance. The study further discusses the implications of such biases and the challenges inherent in transferring models trained on multilingual corpora to low-resource, linguistically diverse settings. By highlighting both the potential and limitations of current approaches, this work contributes to a more inclusive and culturally sensitive development of emotion-aware AI systems.

Completion Date

2025

Semester

Spring

Committee Chair

Murray, John

Degree

Master of Arts (M.A.)

College

College of Sciences

Department

Comm & Media, Nicholson School

Identifier

DP0029331

Document Type

Dissertation/Thesis

Campus Location

UCF Downtown
