Extracted from Pike v7.8 release 866 at 2016-11-06.
pike.ida.liu.se

Module Web.Crawler

Description

This module implements a generic web crawler.

Features:

Fully asynchronous operation (several hundred simultaneous requests)

Supports the /robots.txt exclusion standard

Extensible:

    URI queues

    Allow/deny rules

Configurable:

    Number of concurrent fetchers

    Bits per second (bandwidth throttling)

    Number of concurrent fetchers per host

    Delay between fetches from the same host

Supports HTTP and HTTPS
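
To illustrate what two of the listed features mean in practice, the following is a minimal, language-neutral sketch written in Python. It is not the Web.Crawler API and does not show how the module is implemented. The class HostPolicy, its parameters max_per_host and min_delay, the user agent "demo-bot" and the example.com URLs are all hypothetical names chosen for the illustration. The sketch checks a URL against /robots.txt rules and then applies a per-host limit on concurrent fetches and a minimum delay between fetches from the same host.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class HostPolicy:
    """Hypothetical per-host limits: concurrent fetches and delay between fetches."""

    def __init__(self, max_per_host=1, min_delay=2.0):
        self.max_per_host = max_per_host  # concurrent fetchers allowed per host
        self.min_delay = min_delay        # seconds between fetch starts on one host
        self.active = {}                  # host -> number of fetches in flight
        self.last_start = {}              # host -> time the most recent fetch started

    def may_fetch(self, url, now):
        """Return True if a fetch of url may start at time now."""
        host = urlsplit(url).netloc
        if self.active.get(host, 0) >= self.max_per_host:
            return False
        last = self.last_start.get(host)
        return last is None or now - last >= self.min_delay

    def started(self, url, now):
        """Record that a fetch of url started at time now."""
        host = urlsplit(url).netloc
        self.active[host] = self.active.get(host, 0) + 1
        self.last_start[host] = now

    def finished(self, url):
        """Record that a fetch of url completed."""
        host = urlsplit(url).netloc
        self.active[host] -= 1


# robots.txt rules parsed from literal text, so the sketch needs no network access.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

policy = HostPolicy(max_per_host=1, min_delay=2.0)
```

A crawler loop built on this sketch would call robots.can_fetch("demo-bot", url) first, then policy.may_fetch(url, now), and would put the URL back on its queue whenever either check fails. Times are passed in explicitly rather than read from a clock, which keeps the policy easy to test.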