0.0.9 • Published 12 years ago
claw v0.0.9
A very simple web scraper chassis.
Takes:
- a page URL
- a selector to scrape
- fields to pull out from within that selection
- an output folder
- a number of seconds to delay
and creates CSV and JSON files with the results. Claw creates a separate file for each page it scrapes.
// libraries
var claw = require('claw');
// get settings
var page = 'http://www.bing.com/search?q=hello';
var selector = 'h3 a';
var fields = {
    "text" : "text()",
    "href" : "attr('href')"
};
claw(page, selector, fields, 'output', 3);
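The field values above look like method calls that are applied to each matched element. As a rough sketch of that idea only (this is not claw's actual implementation, and fakeElement, evalField, and extractRow are hypothetical names invented for illustration), a field map could be turned into one result row like this:

```javascript
// Stand-in for a matched element. NOT a real DOM or jQuery object;
// it only mimics the text() and attr() methods used in the fields map.
function fakeElement(text, attrs) {
  return {
    text: function () { return text; },
    attr: function (name) { return attrs[name]; }
  };
}

// Hypothetical evaluator: accepts "method()" or "method('arg')" and
// calls that method on the element. Anything else is rejected.
function evalField(el, expr) {
  var match = /^(\w+)\((?:'([^']*)')?\)$/.exec(expr);
  if (!match) {
    throw new Error('Unsupported field expression: ' + expr);
  }
  var method = match[1];
  var arg = match[2];
  return arg === undefined ? el[method]() : el[method](arg);
}

// Builds one result row by evaluating every field against the element.
function extractRow(el, fields) {
  var row = {};
  Object.keys(fields).forEach(function (name) {
    row[name] = evalField(el, fields[name]);
  });
  return row;
}

var fields = {
  "text" : "text()",
  "href" : "attr('href')"
};
var el = fakeElement('Hello', { href: 'http://example.com/' });
console.log(extractRow(el, fields));
```

Under these assumptions, each key in fields becomes a column name and each value decides what is read from the element.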
Give it an array of pages, and it will save the results of each page to a separate file.
claw(['http://www.bing.com/search?q=hello', 'http://www.bing.com/search?q=goodbye'], selector, fields, 'output', 3);
Claw can also grab its page list from a JSON file that contains a list of URLs (or objects with .href properties).
claw("pages.json", selector, fields, 'output', 3);
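To make the two accepted file shapes concrete, here is a small sketch. It assumes each entry in the list is either a URL string or an object with an href property. The helper toUrls is hypothetical and not part of claw; it only shows how such a list could be reduced to plain URL strings.

```javascript
// Hypothetical helper, not part of claw's API: normalizes a parsed
// pages list into an array of URL strings.
function toUrls(pages) {
  return pages.map(function (entry) {
    // Assumption: an entry is either a URL string or an object with .href
    return typeof entry === 'string' ? entry : entry.href;
  });
}

// Shape 1: pages.json as a plain list of URLs
var asStrings = JSON.parse(
  '["http://www.bing.com/search?q=hello", "http://www.bing.com/search?q=goodbye"]'
);

// Shape 2: pages.json as a list of objects with .href properties
var asObjects = JSON.parse(
  '[{"href": "http://www.bing.com/search?q=hello"}, {"href": "http://www.bing.com/search?q=goodbye"}]'
);

console.log(toUrls(asStrings));
console.log(toUrls(asObjects));
```

Both calls yield the same two URLs, so either shape describes the same page list.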
Questions? Ideas? Hit me up on twitter - @dylanized