Utilizing RSS feeds for crawling the Web

Publication Type: Conference Paper
Year of Publication: 2009
Authors: Bouras, C., Poulopoulos, V., Adam, G.
Conference Name: The Fourth International Conference on Internet and Web Applications and Services (ICIW 2009)
Date Published: 24-28 May
Abstract

We present “advaRSS”, a crawling mechanism created to support
peRSSonal, a mechanism that creates personalized RSS feeds. In
contrast to common crawling mechanisms, our system focuses on
fetching the latest news from major and minor portals worldwide
by utilizing their communication channels. The main difference
between “advaRSS” and a conventional crawler is that news is
produced in random order at any time of the day, so the freshness
of the offline collection must be measured on a scale of minutes.
This means that the system has to be updated every time a news
item is published. To achieve this, we utilize the communication
channels that exist in the modern architecture of the WWW, and
more specifically in almost every modern news portal. Since RSS
feeds are used by every major and minor portal, it is possible to
keep our crawler up to date and retain high freshness of the
“offline content” maintained in our system's database by applying
algorithms that observe the temporal behaviour of each RSS feed.
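The abstract does not say which algorithms are used to observe the temporal behaviour of each feed. One common way to act on such observations is adaptive polling: each feed is revisited at an interval that follows the observed gaps between its new items. The Python sketch below illustrates that general idea only. `FeedState`, `AdaptiveFeedScheduler`, the smoothing factor `alpha`, the 1.5x back-off and the interval bounds are all hypothetical choices, not the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class FeedState:
    """Per-feed bookkeeping (hypothetical structure, not from the paper)."""
    url: str
    interval: float                          # current polling interval, seconds
    last_item_time: Optional[float] = None   # newest publication time seen


class AdaptiveFeedScheduler:
    """Adapt each feed's polling interval to its observed publication rate.

    Assumed technique: an exponential moving average (EMA) of the gaps
    between consecutive new items, clamped to [min_interval, max_interval].
    """

    def __init__(self, alpha: float = 0.3, min_interval: float = 60.0,
                 max_interval: float = 3600.0) -> None:
        self.alpha = alpha              # weight given to the newest gap
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.feeds: Dict[str, FeedState] = {}

    def _clamp(self, seconds: float) -> float:
        return max(self.min_interval, min(self.max_interval, seconds))

    def add_feed(self, url: str, initial_interval: float = 600.0) -> None:
        self.feeds[url] = FeedState(url, self._clamp(initial_interval))

    def record_poll(self, url: str, new_item_times: Iterable[float]) -> float:
        """Update a feed after one poll and return its next polling interval.

        new_item_times: publication timestamps (seconds) of items that were
        not seen in earlier polls.
        """
        state = self.feeds[url]
        times = sorted(new_item_times)
        if not times:
            # Nothing new: back off so quiet feeds are polled less often.
            state.interval = self._clamp(state.interval * 1.5)
            return state.interval
        for t in times:
            if state.last_item_time is not None and t > state.last_item_time:
                gap = t - state.last_item_time
                # Blend the newest inter-arrival gap into the interval (EMA).
                state.interval = self._clamp(
                    (1 - self.alpha) * state.interval + self.alpha * gap)
            if state.last_item_time is None or t > state.last_item_time:
                state.last_item_time = t
        return state.interval
```

With this design, a busy portal converges to short intervals and is polled often, which keeps its items fresh. A feed that rarely publishes drifts towards `max_interval`, so it does not waste requests.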