Title

Classification Algorithms For Netnews Articles

Abstract

We propose several algorithms that use the vector space model to classify news articles posted on NETNEWS according to newsgroup categories. The baseline method combines the terms of all articles in each newsgroup of the training set to represent each newsgroup as a single vector. After training, incoming news articles are classified based on their similarity to the existing newsgroup categories. We propose three techniques to improve the classification performance of the baseline method: (1) using routing (classification) accuracy and similarity values to refine the training set; (2) updating the underlying term structures periodically during testing; and (3) applying k-means clustering to partition the articles of each newsgroup and representing each newsgroup by k vectors. Our test collection consists of real news articles and the 519 subnewsgroups under the REC newsgroup of NETNEWS, collected over a period of 3 months. Our experimental results demonstrate that refining the training set reduces storage by one-third to two-thirds. Periodic updates improve routing accuracy by 20% to 100% but incur runtime overhead. Finally, representing each newsgroup by k vectors (with k = 2 or 3) using clustering yields the most significant improvement in routing accuracy, ranging from 60% to 100%, while requiring only slightly more storage.
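
The baseline method described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the term weighting or the similarity measure, so raw term frequencies and cosine similarity are assumed here, and the names tokenize, build_group_vectors, cosine, and classify are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing (stemming, stop words) is not given in the abstract.
    return text.lower().split()


def build_group_vectors(training):
    """Combine the terms of all articles in each newsgroup into one vector.

    training maps a newsgroup name to a list of article strings.
    Returns a dict mapping each newsgroup name to a Counter of term counts.
    """
    vectors = {}
    for group, articles in training.items():
        counts = Counter()
        for article in articles:
            counts.update(tokenize(article))
        vectors[group] = counts
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors
    # (an assumed similarity measure, not confirmed by the abstract).
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(article, group_vectors):
    # Route an incoming article to the most similar newsgroup vector.
    query = Counter(tokenize(article))
    return max(group_vectors, key=lambda g: cosine(query, group_vectors[g]))
```

Under the same assumptions, the k-vector variant would store several vectors per newsgroup (one per cluster) and route an article to the newsgroup that owns the most similar vector.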

Publication Date

1-1-1999

Publication Title

International Conference on Information and Knowledge Management, Proceedings

Number of Pages

114-121

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1145/319950.319965

Scopus ID

0033279306 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/0033279306
