Keywords
Cluster Analysis, K-means Clustering, Gaussian Mixture Models, AI-Driven Data Analysis, Adjusted Rand Index, Normalized Mutual Information, Dimensionality Reduction in Clustering, Pattern Recognition
Abstract
This study presents a detailed comparison of the K-means and Gaussian Mixture Model (GMM) clustering algorithms, illustrating their respective capabilities and limitations across a range of synthetic datasets. Using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) as evaluation metrics, the research examines how the two algorithms handle datasets of varying structure and complexity. Both K-means and GMM perform robustly on well-separated clusters, but GMM shows a distinct advantage when clusters overlap or the data distribution is unbalanced. Conversely, K-means excels at identifying clear, distinct groupings, which highlights its utility in simpler clustering contexts. The study contributes to a deeper understanding of the operational characteristics of these popular clustering algorithms and may help guide the selection of appropriate methods for complex data analysis tasks in practice.
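The kind of comparison the abstract describes can be sketched with scikit-learn. This is a minimal illustration and not the study's own code. The dataset parameters (three well-separated blobs, 600 samples, a standard deviation of 0.6) and the helper name `compare_clusterers` are assumptions chosen for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.mixture import GaussianMixture


def compare_clusterers(X, y_true, n_clusters, seed=0):
    """Fit K-means and a GMM on X and score both against y_true.

    Hypothetical helper for illustration; returns ARI and NMI per method.
    """
    # K-means assigns each point to its nearest centroid (hard assignment).
    km_labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(X)

    # The GMM fits one Gaussian per component; fit_predict returns the
    # most probable component for each point.
    gmm_labels = GaussianMixture(
        n_components=n_clusters, random_state=seed
    ).fit_predict(X)

    return {
        "kmeans": {
            "ari": adjusted_rand_score(y_true, km_labels),
            "nmi": normalized_mutual_info_score(y_true, km_labels),
        },
        "gmm": {
            "ari": adjusted_rand_score(y_true, gmm_labels),
            "nmi": normalized_mutual_info_score(y_true, gmm_labels),
        },
    }


# Assumed synthetic data: three well-separated, equally sized clusters.
X, y_true = make_blobs(
    n_samples=600,
    centers=[(-5, -5), (0, 0), (5, 5)],
    cluster_std=0.6,
    random_state=0,
)

scores = compare_clusterers(X, y_true, n_clusters=3)
for name, s in scores.items():
    print(f"{name}: ARI={s['ari']:.3f}  NMI={s['nmi']:.3f}")
```

Both metrics compare predicted labels with the known ground truth and do not depend on how the cluster labels are numbered. To probe the overlapping or unbalanced cases mentioned in the abstract, one could increase `cluster_std` or pass unequal per-cluster sample counts to `make_blobs`.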
Semester
Spring 2024
Course Name
STA 6367 Data Science 2
Instructor Name
Rui Xie
STARS Citation
Alipour Yengejeh, Amir, "Optimizing AI with Advanced Data Structuring: A Comparative Analysis of K-means and GMM Clustering Techniques" (2024). Data Science and Data Mining. 21.
https://stars.library.ucf.edu/data-science-mining/21
Accessibility Status
PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker