Keywords

K-Means Clustering

Description

This study evaluates the performance of the K-Means clustering algorithm across a variety of benchmark datasets, including low-dimensional, high-dimensional, overlapping, and imbalanced data. Using four key metrics—Mean Squared Error (MSE), Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Silhouette Score—the paper demonstrates that K-Means performs exceptionally well on well-separated and high-dimensional datasets, but faces challenges with overlapping clusters and varying densities. Through visualization and quantitative analysis, the paper highlights both the strengths and limitations of K-Means in unsupervised learning.

Abstract

Clustering is a fundamental technique in unsupervised machine learning, widely applied in domains such as pattern recognition, data segmentation, and anomaly detection. This study evaluates the performance of the K-Means clustering algorithm on multiple benchmark datasets, including low-dimensional, high-dimensional, overlapping, and imbalanced datasets. The clustering results are assessed using four evaluation metrics: Mean Squared Error (MSE), Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Silhouette Score. Experimental results show that K-Means performs effectively on datasets with well-separated clusters, particularly in high-dimensional spaces, where it achieves near-perfect clustering accuracy. However, its performance deteriorates on datasets with overlapping clusters and varying cluster densities, highlighting its sensitivity to initialization and to cluster structure.
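The evaluation pipeline summarized in the abstract can be sketched in plain Python. This page does not include the study's code, datasets, or exact metric settings, so the following is only a minimal sketch under stated assumptions: initialization uses k-means++ seeding with 10 restarts (keeping the lowest-MSE run), because the initialization scheme is not specified here; MSE is taken to mean the average squared distance from each point to its assigned centroid; NMI is normalized by the arithmetic mean of the two label entropies (the scikit-learn default); and a hypothetical `gaussian_blobs` helper generates three well-separated 2-D clusters in place of the benchmark datasets. All function names are illustrative.

```python
import math
import random
from collections import Counter

def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))  # squared Euclidean distance

def kmeanspp_init(points, k, rng):
    """Assumption: k-means++ seeding (each new centroid drawn with probability ~ D^2)."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2)[0])
    return centroids

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]

def lloyd(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm (alternate assignment and update steps)."""
    centroids = kmeanspp_init(points, k, rng)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Update step: move the centroid to its members' mean (keep it if the cluster is empty).
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j])
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return assign(points, centroids), centroids

def mse(points, labels, centroids):
    """Assumed MSE definition: mean squared distance from each point to its centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)

def kmeans(points, k, n_init=10, seed=0):
    """Run n_init seeded restarts and keep the lowest-MSE result."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: mse(points, run[0], run[1]))

def adjusted_rand_index(a, b):
    """Pair-counting ARI: 1.0 for identical partitions, about 0.0 for random labelings."""
    def pairs(counts):  # number of unordered pairs summed over contingency cells
        return sum(v * (v - 1) / 2 for v in counts)
    sum_ij = pairs(Counter(zip(a, b)).values())
    sum_a, sum_b = pairs(Counter(a).values()), pairs(Counter(b).values())
    expected = sum_a * sum_b / (len(a) * (len(a) - 1) / 2)
    max_index = (sum_a + sum_b) / 2
    return 1.0 if max_index == expected else (sum_ij - expected) / (max_index - expected)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_info(a, b):
    """Mutual information divided by the arithmetic mean of the two label entropies."""
    n, ca, cb = len(a), Counter(a), Counter(b)
    mi = sum(nxy / n * math.log(nxy * n / (ca[x] * cb[y])) for (x, y), nxy in Counter(zip(a, b)).items())
    denom = (entropy(a) + entropy(b)) / 2
    return mi / denom if denom > 0 else 1.0

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in dists.values())  # mean distance to nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def gaussian_blobs(centers, n_per, spread, seed=1):
    """Hypothetical stand-in data: 2-D Gaussian blobs plus ground-truth labels."""
    rng = random.Random(seed)
    points, truth = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            truth.append(label)
    return points, truth

if __name__ == "__main__":
    X, y_true = gaussian_blobs([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)], n_per=50, spread=1.0)
    labels, centroids = kmeans(X, k=3)
    print(f"MSE={mse(X, labels, centroids):.3f} ARI={adjusted_rand_index(y_true, labels):.3f} "
          f"NMI={normalized_mutual_info(y_true, labels):.3f} Silhouette={silhouette(X, labels):.3f}")
```

ARI and NMI compare the predicted clusters against ground-truth labels, whereas MSE and the Silhouette Score use only the data and the predicted clusters, so the latter two remain usable when no labels exist. The restarts in `kmeans` are one common way to soften the initialization sensitivity the abstract mentions. For real experiments, scikit-learn's `KMeans`, `adjusted_rand_score`, `normalized_mutual_info_score`, and `silhouette_score` cover the same ground.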

Course Name

STA 6367 Data Science 2

Instructor Name

Dr. Rui Xie

Rights

This work is licensed under a Creative Commons Attribution 4.0 International License.

College

College of Sciences

STARS Citation

Deb, Dipok, "Clustering Dataset Using K-Mean Clustering" (2025). Data Science and Data Mining. 39.
https://stars.library.ucf.edu/data-science-mining/39

Included in

Data Science Commons
