Keywords

Cluster Analysis, K-means Clustering, Gaussian Mixture Models, AI-Driven Data Analysis, Adjusted Rand Index, Normalized Mutual Information, Dimensionality Reduction in Clustering, Pattern Recognition

Abstract

This study presents a detailed comparison of the K-means and Gaussian Mixture Model (GMM) clustering algorithms, illustrating their capabilities and limitations across a range of synthetic datasets. Using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) as evaluation metrics, the research examines how each algorithm handles datasets of varying structure and complexity. Both K-means and GMM perform robustly on well-separated clusters, but GMM shows a distinct advantage when clusters overlap or the data distribution is unbalanced. K-means, in turn, excels at identifying clear, distinct groupings, which highlights its utility in simpler clustering contexts. The study contributes to a deeper understanding of the operational characteristics of these popular clustering algorithms and may guide the selection of appropriate methods for complex data analysis tasks in practice.
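For readers unfamiliar with the first metric named in the abstract, the following is a minimal sketch of how the Adjusted Rand Index can be computed from two label assignments using the standard pair-counting formula. It is an illustration only, not the study's own code; the function name `adjusted_rand_index` and the toy labels are assumptions introduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI between two labelings (illustrative helper, not from the study)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    # Contingency table cells and its row/column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of point pairs that fall together in each cell, row, or column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total_pairs = comb(n, 2)

    # Expected index under random labeling with fixed marginals.
    expected = sum_rows * sum_cols / total_pairs if total_pairs else 0.0
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Degenerate case (e.g. both labelings are a single cluster):
        # treated as perfect agreement by convention.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical): cluster IDs differ, but the grouping is identical.
truth = [0, 0, 1, 1]
same_grouping = [1, 1, 0, 0]
crossed_grouping = [0, 1, 0, 1]

ari_same = adjusted_rand_index(truth, same_grouping)
ari_crossed = adjusted_rand_index(truth, crossed_grouping)
```

Because ARI depends only on which points are grouped together, relabeling the clusters does not change the score, and a score near zero or below indicates agreement no better than chance.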

Semester

Spring 2024

Course Name

STA 6367 Data Science 2

Instructor Name

Rui Xie

Accessibility Status

PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker

Included in

Data Science Commons
