Title

Feature Reduction And Database Maintenance In Netnews Classification

Abstract

We propose a statistical feature-reduction technique that filters the most ambiguous articles out of the training data used to categorize NETNEWS articles. We also incorporate a batch updating scheme that periodically maintains the term structures of the news database after training. The baseline method combines the terms of all training articles in each newsgroup so that each newsgroup is represented as a single vector. After training, incoming news articles are classified according to their similarity to the existing newsgroup categories. Our implementation uses an inverted file to store the trained term structures of each newsgroup and a similar list to buffer newly arrived articles, for efficient routing and updating. Our experimental results on real NETNEWS articles and newsgroups demonstrate that (1) applying feature reduction to the training set improves routing accuracy, efficiency, and database storage; (2) updating improves routing accuracy; and (3) the batch technique improves the efficiency of the updating operation.
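
The baseline described in the abstract (one merged term vector per newsgroup, stored in an inverted file, with incoming articles routed by similarity) can be illustrated with a minimal sketch. The abstract does not specify the term weighting, the tokenizer, or the similarity measure, so the raw term counts, whitespace tokenization, and cosine similarity used here are assumptions, and the names `train` and `classify` are hypothetical. Feature reduction and batch updating are not shown.

```python
import math
from collections import Counter, defaultdict


def train(articles_by_group):
    """Merge each newsgroup's articles into one term-count vector.

    Returns an inverted file (term -> {group: count}) and each group's
    vector norm. Raw counts and whitespace tokens are assumptions.
    """
    inverted = defaultdict(dict)
    norms = {}
    for group, articles in articles_by_group.items():
        counts = Counter()
        for text in articles:
            counts.update(text.lower().split())
        norms[group] = math.sqrt(sum(c * c for c in counts.values()))
        for term, c in counts.items():
            inverted[term][group] = c
    return inverted, norms


def classify(text, inverted, norms):
    """Route an article to the newsgroup with the highest cosine similarity.

    Only postings for terms that occur in the article are visited, which
    is the efficiency benefit of the inverted file. Returns None when no
    term of the article appears in any trained newsgroup.
    """
    query = Counter(text.lower().split())
    dots = defaultdict(float)
    for term, q in query.items():
        for group, weight in inverted.get(term, {}).items():
            dots[group] += q * weight
    if not dots:
        return None
    # The article's own norm is the same for every group, so it is omitted.
    return max(dots, key=lambda g: dots[g] / norms[g])
```

With two toy newsgroups, say `rec.autos` trained on "car engine oil" and "engine car brakes" and `sci.space` trained on "orbit rocket launch" and "rocket moon orbit", calling `classify("rocket engine orbit", inverted, norms)` routes the article to `sci.space`, because two of its three terms carry weight in that group's vector.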

Publication Date

1-1-1999

Publication Title

Proceedings of the International Database Engineering and Applications Symposium, IDEAS

Number of Pages

137-144

Document Type

Article

Personal Identifier

scopus

Scopus ID

0032590472 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/0032590472
