Title
Feature Reduction And Database Maintenance In Netnews Classification
Abstract
We propose a statistical feature-reduction technique that filters out the most ambiguous articles in the training data for categorizing NETNEWS articles. We also incorporate a batch updating scheme that periodically maintains the term structures of the news database after training. The baseline method combines the terms of all the articles of each newsgroup in the training set, so that each newsgroup is represented as a single vector. After training, incoming news articles are classified according to their similarity to the existing newsgroup categories. For efficient routing and updating, our implementation stores the trained term structures of each newsgroup in an inverted file and buffers newly arriving articles in a list similar to the inverted file. Our experimental results using real NETNEWS articles and newsgroups demonstrate that (1) applying feature reduction to the training set improves routing accuracy, efficiency, and database storage; (2) updating improves routing accuracy; and (3) the batch technique improves the efficiency of the updating operation.
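The following is a minimal sketch of the baseline pipeline the abstract describes: each newsgroup becomes one summed term vector held in an inverted file, an incoming article is routed to the most similar newsgroup, and new articles are buffered in a term-keyed list and merged in a batch. It is not the authors' implementation. The abstract does not state the term weighting or similarity measure, so this sketch assumes raw term counts and cosine similarity. The class `NewsRouter` and its methods are hypothetical names. It also assumes each buffered article arrives with a newsgroup label. The statistical feature-reduction step is omitted because the abstract does not specify its statistic.

```python
from collections import Counter, defaultdict
import math


class NewsRouter:
    """Hypothetical sketch: newsgroups as summed term vectors in an inverted file."""

    def __init__(self):
        # Inverted file: term -> {newsgroup: accumulated term count} (raw counts assumed)
        self.inverted = defaultdict(dict)
        # Squared vector norm per newsgroup, kept in sync for cosine similarity
        self.norm_sq = defaultdict(float)
        # Buffer shaped like the inverted file: term -> list of (newsgroup, count)
        self.buffer = defaultdict(list)

    def _add(self, group, terms):
        # Merge term counts into one newsgroup vector and update its norm incrementally
        for term, count in terms.items():
            old = self.inverted[term].get(group, 0)
            new = old + count
            self.inverted[term][group] = new
            self.norm_sq[group] += new * new - old * old

    def train(self, articles):
        """articles: iterable of (newsgroup, list_of_terms); all terms of a group are combined."""
        for group, tokens in articles:
            self._add(group, Counter(tokens))

    def route(self, tokens):
        """Return the newsgroup with the highest cosine similarity (assumed measure), or None."""
        query = Counter(tokens)
        dots = defaultdict(float)
        # Only the posting lists of the article's own terms are read
        for term, q in query.items():
            for group, w in self.inverted.get(term, {}).items():
                dots[group] += q * w
        if not dots:
            return None
        q_norm = math.sqrt(sum(c * c for c in query.values()))
        return max(dots, key=lambda g: dots[g] / (q_norm * math.sqrt(self.norm_sq[g])))

    def buffer_article(self, group, tokens):
        """Queue a newly arrived, labelled article instead of updating the inverted file now."""
        for term, count in Counter(tokens).items():
            self.buffer[term].append((group, count))

    def flush(self):
        """Batch update: merge every buffered posting into the inverted file, then clear."""
        for term, postings in self.buffer.items():
            for group, count in postings:
                self._add(group, {term: count})
        self.buffer.clear()


router = NewsRouter()
router.train([
    ("comp.lang.c", ["pointer", "malloc", "segfault"]),
    ("rec.autos", ["engine", "tires", "brake"]),
])
print(router.route(["malloc", "pointer", "free"]))  # comp.lang.c
```

The buffer uses the same term keys as the inverted file, so `flush` can visit each term's postings once per batch instead of rewriting the inverted file for every arriving article. This is one plausible reading of why batching speeds up updating, not a description of the paper's exact data layout.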
Publication Date
1-1-1999
Publication Title
Proceedings of the International Database Engineering and Applications Symposium, IDEAS
Pages
137-144
Document Type
Article
Personal Identifier
scopus
Copyright Status
Unknown
Scopus ID
0032590472 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/0032590472
STARS Citation
Hsu, Wen Lin and Lang, Sheau Dong, "Feature Reduction And Database Maintenance In Netnews Classification" (1999). Scopus Export 1990s. 4048.
https://stars.library.ucf.edu/scopus1990/4048