Keywords

machine learning; artificial intelligence; transformers; music recommendation

Abstract

The popularity of AI audio applications is growing; they are used in chatbots, automated voice translation, virtual assistants, and text-to-speech conversion. Audio classification is crucial today, given the growing need to sort and classify millions of existing audio files and the increasing volume of new data uploaded over time. Within the area of classification lies the difficult and lucrative problem of music recommendation. Research in music recommendation has trended over time toward collaborative approaches that rely on large amounts of user data. These approaches tend to suffer from the cold-start problem of insufficient data and are costly to train. We look to recent advances in music generation to develop a content-based method that uses a joint embedding space to link text with music audio. This approach has not previously been applied to music recommendation. In this thesis, we examine the joint embedding methods used by recent AI music generation models and introduce a music recommendation system based on joint embeddings. This system can avoid the cold-start problem, reduce training costs for music recommendation, and serve as the foundation for a cost-efficient, content-based multimedia recommendation system. The current model, trained on MusicCaps, ranks the correct song for a given tag input within the top 50%-80% of all songs about 65%-70% of the time, and we expect better results after further training.

Thesis Completion Year

2024

Thesis Completion Semester

Spring

Thesis Chair

Ewetz, Rickard

College

College of Engineering and Computer Science

Department

Computer Science

Thesis Discipline

Computer Science

Language

English

Access Status

Open Access

Length of Campus Access

None

Campus Location

Orlando (Main) Campus

Rights Statement

In Copyright