%0 Conference Paper %B 2nd International Conference on Data Management Technologies and Applications, Reykjavík, Iceland %D 2013 %T Enhancing news articles clustering using word n-grams %A Christos Bouras %A Vassilis Tsogkas %X

In this work we explore whether document clustering results, and in particular the clustering of news articles from the web, can be enhanced by using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach for clustering news articles derived from the web that combines keywords with n-grams extracted from the articles at an offline stage. We compared this technique with the single-minded bag-of-words representation that our clustering algorithm, W-kmeans, previously used. Our experiments revealed that tuning the weighting parameters between keywords and n-grams, as well as n itself, can significantly improve the clustering result metrics. This reflects more coherent clusters and better overall clustering performance.

%P 53-60 %8 July 29 - 31 %G eng