Experiments with K-Means, Fuzzy c-Means and Approaches to Choose K and C

Abstract

A parameter specifying the number of clusters in an unsupervised clustering algorithm is often unknown. Different cluster validity indices proposed in the past have attempted to address this issue, and their performance is directly related to the accuracy of a clustering algorithm. Toe gap statistic proposed by Tibshirani (2001) was applied to k-means and hierarchical clustering algorithms for estimating the number of clusters and is shown to outperform other cluster validity measures, especially in the null model case. In our experiments, the gap statistic is applied to the Fuzzy c-Means (FCM) algorithm and compared to existing FCM cluster validity indices examined by Pal (1995). A comparison is also made between two initialization methods where centers are randomly assigned to data points or initialized using the furthest first algorithm (Hochbaum, 1985). Toe gap statistic can be applied using the FCM algorithm as long as the fuzzy partition matrix can be employed in computing the gap statistic metric, Wk . Three new methodologies are examined for computing this metric in order to apply the gap statistic to the FCM algorithm. Toe fuzzy partition matrix generated by FCM can also be thresholded based upon the maximum membership to allow computation similar to the kmeans algorithm. This is assumed to be the current method for employing the gap statistic with the FCM algorithm and is compared to the three proposed methods. In our results, the gap statistic outperformed the cluster validity indices for FCM, and one of the new methodologies introduced for computing the metric, based upon the FCM objective function, out performed the threshold method for m=2.

Notes

This item is only available in print in the UCF Libraries. If this is your thesis or dissertation, you can help us make it available online for use by researchers around the world by downloading and filling out the Internet Distribution Consent Agreement. You may also contact the project coordinator Kerri Bottorff for more information.

Thesis Completion

2006

Semester

Summer

Advisor

Georgiopoulos, Michael

Degree

Bachelor of Science (B.S.)

College

College of Engineering and Computer Science

Degree Program

Computer Engineering

Subjects

Dissertations, Academic -- Engineering; Engineering -- Dissertations, Academic

Format

Print

Identifier

DP0021999

Language

English

Access Status

Open Access

Length of Campus-only Access

None

Document Type

Honors in the Major Thesis

This document is currently not available here.

Share

COinS