Title
Classification Algorithms For Netnews Articles
Abstract
We propose several algorithms that use the vector space model to classify news articles posted on NETNEWS according to their newsgroup categories. The baseline method combines the terms of all the articles of each newsgroup in the training set, so that each newsgroup is represented by a single vector. After training, incoming news articles are classified based on their similarity to the existing newsgroup categories. We propose the following techniques to improve the classification performance of the baseline method: (1) use routing (classification) accuracy and the similarity values to refine the training set; (2) update the underlying term structures periodically during testing; and (3) apply k-means clustering to partition the newsgroup articles and represent each newsgroup by k vectors. Our test collection consists of real news articles from the 519 subnewsgroups under the REC newsgroup of NETNEWS over a period of 3 months. Our experimental results demonstrate that refining the training set reduces storage by one-third to two-thirds. Periodic updates improve routing accuracy by 20% to 100% but incur runtime overhead. Finally, representing each newsgroup by k vectors (with k = 2 or 3) using clustering yields the most significant improvement in routing accuracy, ranging from 60% to 100%, at the cost of only slightly higher storage requirements.
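The baseline method described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the term weighting or the similarity measure, so raw term frequencies and cosine similarity are assumptions here. The whitespace tokenizer, the function names, and the toy newsgroup data are also hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed: lowercase, whitespace split)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (assumed measure)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def train(articles_by_group):
    """Combine the terms of all training articles of each newsgroup into one vector."""
    group_vectors = {}
    for group, articles in articles_by_group.items():
        combined = Counter()
        for article in articles:
            combined.update(term_vector(article))
        group_vectors[group] = combined
    return group_vectors


def classify(article, group_vectors):
    """Route an incoming article to the newsgroup with the most similar vector."""
    vec = term_vector(article)
    return max(group_vectors, key=lambda g: cosine(vec, group_vectors[g]))


# Toy training set (illustrative data only, not from the paper's collection).
training = {
    "rec.autos": ["engine oil change", "brake pads and engine noise"],
    "rec.music": ["guitar chords tutorial", "piano and guitar duet"],
}
model = train(training)
print(classify("strange engine noise", model))
```

Under the same assumptions, the k-vector variant would replace the single combined vector per newsgroup with k cluster centroids and route each article to the newsgroup that owns the most similar centroid.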
Publication Date
1-1-1999
Publication Title
International Conference on Information and Knowledge Management, Proceedings
Number of Pages
114-121
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1145/319950.319965
Copyright Status
Unknown
Scopus ID
0033279306 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/0033279306
STARS Citation
Hsu, Wen Lin and Lang, Sheau Dong, "Classification Algorithms For Netnews Articles" (1999). Scopus Export 1990s. 3898.
https://stars.library.ucf.edu/scopus1990/3898