0.1.6 • Published 10 years ago
spidee v0.1.6
Tiny web crawler
Install
```sh
npm install spidee
```
Usage
```javascript
var spidee = require('spidee')(options);
spidee.crawl();
```
or
```javascript
var spidee = require('spidee')();
spidee.configure(options).crawl();
```
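For reference, here is a sketch of what the `options` object passed to either form might look like, built from the Options table. The numeric values are the documented defaults, and `http://example.com/` is only a placeholder. The README does not say which arguments spidee passes to `onSuccess` and `onFailure`, so these callbacks are assumptions and simply log a message.

```javascript
// Sketch of an options object for spidee, based on the Options table.
// Numeric values are the documented defaults; the URL is a placeholder.
// ASSUMPTION: the arguments passed to onSuccess/onFailure are undocumented,
// so these callbacks ignore them and only log a message.
var options = {
  url: 'http://example.com/', // mandatory: starting point for crawling
  sleep: 150,                 // wait between requests (default)
  timeout: 1500,              // wait for a response (default)
  repeat: 1,                  // requests per link, starting URL included
  ignoreRelative: false,      // do not ignore relative links (default)
  onSuccess: function () {
    console.log('request succeeded');
  },
  onFailure: function () {
    console.log('request failed');
  }
};

module.exports = options;

// With spidee installed, either documented form would then start the crawl:
//   require('spidee')(options).crawl();
//   require('spidee')().configure(options).crawl();
```

Only `url` is mandatory, so every other key can be omitted to fall back to its default.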
Options
option | mandatory | default value | description |
---|---|---|---|
url | yes | | Starting point for crawling |
sleep | no | 150 | How long the spider waits between requests |
timeout | no | 1500 | How long the spider waits for a response |
repeat | no | 1 | How many requests the spider makes to each link, including the starting URL; useful for cache testing |
ignoreRelative | no | false | Determines whether the spider ignores all relative links |
onSuccess | no | | Callback called when a request succeeds |
onFailure | no | | Callback called when a request fails |
shouldCrawl | no | function | Function deciding whether the spider should follow a given URL; useful to keep the spider from running outside your web scope, which it does by default. The function MUST return a boolean |
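A `shouldCrawl` function that keeps the spider on the starting host could look like the sketch below. This is an illustration under an assumption: the README only says the function must return a boolean, so it is assumed here that spidee calls it with the candidate URL as a string. `sameHostOnly` is a hypothetical helper, not part of spidee.

```javascript
// Hypothetical helper (not part of spidee): builds a shouldCrawl function
// that accepts only URLs on the same host as the starting URL.
// ASSUMPTION: spidee passes the candidate URL to shouldCrawl as a string.
var URL = require('url').URL;

function sameHostOnly(startUrl) {
  var startHost = new URL(startUrl).host;

  return function shouldCrawl(candidate) {
    try {
      // Resolve relative links against the starting URL, then compare hosts.
      return new URL(candidate, startUrl).host === startHost;
    } catch (e) {
      return false; // unparsable URL: do not follow it
    }
  };
}

module.exports = sameHostOnly;
```

It would be wired in as `shouldCrawl: sameHostOnly(options.url)`. Links without a host, such as `mailto:` addresses, are rejected because their host never matches.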