Efficient extraction of news articles based on RSS crawling

Title: Efficient extraction of news articles based on RSS crawling
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Bouras, C, Poulopoulos, V, Adam, G
Conference Name: International Conference on Machine and Web Intelligence, Algiers, Algeria (Invited Paper)
Date Published: 3-5 October

The expansion of the World Wide Web has left a vast number of Internet users facing the major problem of discovering the information they want. Hundreds of web pages and weblogs are inevitably generated or changed every day. The main problem that arises from this continuous generation and alteration of web pages is the discovery of useful information, a task that is difficult even for experienced Internet users. Many mechanisms have been constructed and presented to address information discovery on the Internet; they are mostly based on crawlers that browse the WWW, download pages, and collect the information that might be of interest to the user. In this manuscript we describe a mechanism that fetches web pages that include news articles from major news portals and blogs. This mechanism is constructed to support tools that acquire news articles from all over the world, process them, and present them back to end users in a personalized manner.
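
The abstract does not describe the mechanism in detail, so the following is only an illustrative sketch of the general idea behind RSS-based crawling, not the authors' implementation. It assumes an RSS 2.0 feed whose `item` elements carry `title` and `link` children; the sample feed, the `extract_article_links` function, and the `seen` set are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, embedded so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>http://example.com/news/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/news/2</link>
    </item>
    <item>
      <title>First story (duplicate entry)</title>
      <link>http://example.com/news/1</link>
    </item>
  </channel>
</rss>"""


def extract_article_links(rss_text, seen):
    """Return (title, link) pairs for feed items whose link is not yet in `seen`.

    `seen` is updated in place, so repeated polls of the same feed
    only yield articles that have not been encountered before.
    """
    root = ET.fromstring(rss_text)
    new_items = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        if link and link not in seen:
            seen.add(link)
            new_items.append((title, link))
    return new_items


if __name__ == "__main__":
    seen_links = set()
    for title, link in extract_article_links(SAMPLE_RSS, seen_links):
        print(title, link)
```

Under these assumptions, a crawler would poll each feed periodically and download only the pages behind links it has not seen before, rather than re-crawling an entire news portal to detect new articles.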