Keywords
Beowulf, Data mining, Fuzzy ARTMAP, Neural network, Parallel processing
Abstract
The Fuzzy ARTMAP (FAM) algorithm has proven to be one of the premier neural network architectures for classification problems. FAM can learn online and is usually faster than other neural network approaches. Nevertheless, the learning time of FAM can slow down considerably when the size of the training set grows into the hundreds of thousands of patterns. In this dissertation, we apply data partitioning and network partitioning to the FAM algorithm in sequential and parallel settings to achieve better convergence time and to train efficiently on large databases (hundreds of thousands of patterns). We implement our parallelization on a Beowulf cluster of workstations. This choice of platform requires that the parallelization be coarse grained. All the approaches are tested extensively on three large datasets (half a million data points). One of them is the Forest Covertype database from Blackard, and the other two are artificially generated Gaussian data with different percentages of overlap between classes. Speedups with the data partitioning approach reached the order of the hundreds without having to invest in parallel computation. Speedups with the network partitioning approach are close to linear on a cluster of workstations. Both methods allowed us to reduce the computation time of training the neural network on large databases from days to minutes. We prove formally that the workload balance of our network partitioning approaches will never be worse than an acceptable bound, and we also demonstrate the correctness of these parallelization variants of FAM.
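As a rough illustration of the data partitioning idea mentioned in the abstract, the Python sketch below trains one small, simplified Fuzzy ARTMAP network per partition of the training set and combines the networks by majority vote. This record does not describe the dissertation's actual partitioning or combination scheme, so the random split, the voting rule, the one-pass fast-learning simplification, and all names (`SimpleFAM`, `train_partitioned`, `predict_vote`, `rho`, `alpha`, `eps`) are assumptions made only for this example. Inputs are assumed to be scaled to [0, 1].

```python
import random
from collections import Counter


def complement_code(x):
    """Map a point in [0, 1]^d to its complement-coded form (x, 1 - x)."""
    return list(x) + [1.0 - v for v in x]


class SimpleFAM:
    """Simplified Fuzzy ARTMAP (assumed variant): fast learning, match tracking."""

    def __init__(self, rho=0.0, alpha=0.001, eps=1e-6):
        self.rho, self.alpha, self.eps = rho, alpha, eps
        self.weights, self.labels = [], []  # one weight vector and label per category

    @staticmethod
    def _overlap(i, w):
        # Size of the fuzzy AND (component-wise minimum) of input and weights.
        return sum(min(a, b) for a, b in zip(i, w))

    def _choice(self, i, j):
        # Choice function used to rank category j for input i.
        return self._overlap(i, self.weights[j]) / (self.alpha + sum(self.weights[j]))

    def train_one(self, x, label):
        i = complement_code(x)
        norm_i = sum(i)
        rho = self.rho  # vigilance, possibly raised below by match tracking
        order = sorted(range(len(self.weights)), key=lambda j: -self._choice(i, j))
        for j in order:
            match = self._overlap(i, self.weights[j]) / norm_i
            if match < rho:
                continue  # category fails the vigilance test
            if self.labels[j] == label:
                # Fast learning: weights become the fuzzy AND with the input.
                self.weights[j] = [min(a, b) for a, b in zip(i, self.weights[j])]
                return
            rho = match + self.eps  # match tracking after a wrong-label winner
        # No existing category was acceptable: commit a new one.
        self.weights.append(i)
        self.labels.append(label)

    def predict(self, x):
        if not self.weights:
            return None
        i = complement_code(x)
        best = max(range(len(self.weights)), key=lambda j: self._choice(i, j))
        return self.labels[best]


def train_partitioned(data, n_parts, seed=0):
    """Train one network per partition (random split is an assumption)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    nets = []
    for k in range(n_parts):
        net = SimpleFAM()
        for x, y in shuffled[k::n_parts]:
            net.train_one(x, y)
        nets.append(net)
    return nets


def predict_vote(nets, x):
    """Combine the per-partition networks by majority vote (an assumption)."""
    votes = Counter(net.predict(x) for net in nets)
    return votes.most_common(1)[0][0]
```

Each network in this sketch sees only its own partition, so the per-partition training runs are independent and could run one after another on a single machine or concurrently on separate nodes.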
Notes
If this is your thesis or dissertation and you want to learn how to access it, or would like more information about readership statistics, contact us at STARS@ucf.edu.
Graduation Date
2004
Semester
Spring
Advisor
Georgiopoulos, Michael
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Electrical and Computer Engineering
Degree Program
Electrical and Computer Engineering
Format
application/pdf
Identifier
CFE0000065
URL
http://purl.fcla.edu/fcla/etd/CFE0000065
Language
English
Release Date
May 2004
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
Subjects
Dissertations, Academic -- Engineering and Computer Science; Engineering and Computer Science -- Dissertations, Academic
STARS Citation
Castro, Jose R., "Modifications To The Fuzzy-ARTMAP Algorithm For Distributed Learning In Large Data Sets" (2004). Electronic Theses and Dissertations. 5.
https://stars.library.ucf.edu/etd/5