Experiments with K-Means, Fuzzy c-Means and Approaches to Choose K and C
Abstract
The number of clusters, a required parameter of many unsupervised clustering algorithms, is often unknown. Cluster validity indices proposed in the past have attempted to address this issue, and their performance is directly related to the accuracy of the clustering algorithm. The gap statistic proposed by Tibshirani (2001) was applied to k-means and hierarchical clustering algorithms to estimate the number of clusters, and was shown to outperform other cluster validity measures, especially in the null model case. In our experiments, the gap statistic is applied to the Fuzzy c-Means (FCM) algorithm and compared to the existing FCM cluster validity indices examined by Pal (1995). A comparison is also made between two initialization methods, in which centers are either randomly assigned to data points or initialized using the furthest-first algorithm (Hochbaum, 1985). The gap statistic can be applied with the FCM algorithm as long as the fuzzy partition matrix can be employed in computing the gap statistic metric, W_k. Three new methodologies are examined for computing this metric in order to apply the gap statistic to the FCM algorithm. The fuzzy partition matrix generated by FCM can also be thresholded on the maximum membership, which allows W_k to be computed as it is for the k-means algorithm. This is assumed to be the current method for employing the gap statistic with the FCM algorithm, and it is compared to the three proposed methods. In our results, the gap statistic outperformed the cluster validity indices for FCM, and one of the new methodologies for computing the metric, based upon the FCM objective function, outperformed the threshold method for m=2.
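The two ways of computing W_k named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. The abstract does not define the three proposed methods, so the function names, the array shapes (U as a c-by-n fuzzy partition matrix, V as c-by-d centers), and the choice to use the FCM objective value directly as W_k are all assumptions made for illustration. The thresholded version uses the means of the hardened clusters, which is also an assumption.

```python
import numpy as np

def w_thresholded(X, U):
    """Within-cluster dispersion after hardening U by maximum membership.

    X: (n, d) data, U: (c, n) fuzzy partition matrix (assumed layout).
    Each point goes to its highest-membership cluster; squared distances
    to the hardened cluster means are summed, as in the k-means case.
    """
    labels = np.argmax(U, axis=0)
    total = 0.0
    for i in np.unique(labels):
        members = X[labels == i]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return float(total)

def w_objective(X, U, V, m=2.0):
    """FCM objective J_m = sum_i sum_j u_ij^m * ||x_j - v_i||^2.

    V: (c, d) cluster centers. Using J_m as W_k is an assumed reading
    of "based upon the FCM objective function".
    """
    d2 = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)  # (c, n)
    return float(np.sum((U ** m) * d2))

def gap(w, reference_ws):
    """Gap value: mean of log W over reference data sets minus log W."""
    return float(np.mean(np.log(reference_ws)) - np.log(w))

# Toy data: two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
V = np.array([[0.0, 1.0], [10.0, 1.0]])
U_fuzzy = np.array([[0.9, 0.8, 0.1, 0.2],
                    [0.1, 0.2, 0.9, 0.8]])

print(w_thresholded(X, U_fuzzy))   # dispersion of the hardened partition
print(w_objective(X, U_fuzzy, V))  # fuzzy objective with m = 2
```

With a crisp partition matrix and centers equal to the cluster means, the two quantities coincide; they differ only when memberships are genuinely fuzzy. In a full gap-statistic run, `reference_ws` would hold the W_k values obtained by clustering data sets drawn from a null reference distribution.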
Notes
This item is only available in print in the UCF Libraries. If this is your thesis or dissertation, you can help us make it available online for use by researchers around the world. Contact STARS for more information.
Thesis Completion
2006
Semester
Summer
Advisor
Georgiopoulos, Michael
Degree
Bachelor of Science (B.S.)
College
College of Engineering and Computer Science
Degree Program
Computer Engineering
Subjects
Dissertations, Academic -- Engineering; Engineering -- Dissertations, Academic
Format
Identifier
DP0021999
Language
English
Access Status
Open Access
Length of Campus-only Access
None
Document Type
Honors in the Major Thesis
Recommended Citation
Hong, Sui, "Experiments with K-Means, Fuzzy c-Means and Approaches to Choose K and C" (2006). HIM 1990-2015. 571.
https://stars.library.ucf.edu/honorstheses1990-2015/571