Title | Scalability of text classification |
Publication Type | Conference Paper |
Year of Publication | 2006 |
Authors | Bouras, C, Poulopoulos, V, Antonellis, I, Zouzias, A |
Conference Name | 2nd International Conference on Web Information
Systems and Technologies (WEBIST 2006), Setubal, Portugal |
Date Published | 19 - 22 April |
Abstract | We explore scalability issues of the text classification problem, in which (multi)labeled training
documents are used to build classifiers that assign documents to classes, permitting classification into multiple
classes. A new class of classification problems, called "scalable", is introduced that models many problems
from the area of Web mining. The property of scalability is defined as the ability of a classifier to adjust
classification results on a "per-user" basis. Furthermore, we investigate different ways to interpret
personalization of classification results by analyzing well-known text datasets and exploring existing
classifiers. We present solutions for the scalable classification problem based on standard classification
techniques and present an algorithm that relies on semantic analysis, decomposing each document into
its sentences. Experimental results concerning the scalability property and the performance of these
algorithms are provided using the 20newsgroup dataset and a dataset consisting of web news.
|