claw

A very simple web scraper chassis.

It takes:

a page URL
a selector for the elements to scrape
fields to pull out from within that selection
an output folder
a number of seconds to delay

and it creates CSV and JSON files with the results. Claw creates a separate file for each page it scrapes.

// libraries
var claw = require('claw');

// get settings
var page = 'http://www.bing.com/search?q=hello';

var selector = 'h3 a';

var fields = {
	"text" : "text()",
	"href" : "attr('href')"
};

// scrape the page into the 'output' folder, with a 3 second delay
claw(page, selector, fields, 'output', 3);
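
The field values read like cheerio-style method calls (`text()`, `attr('href')`) applied to each matched element. Claw's internals are not shown here, so the following is only a sketch of how such strings could be interpreted. `parseField` and `extractRow` are hypothetical helpers, and the stub element stands in for a real cheerio selection.

```javascript
// Hypothetical helper: split "attr('href')" into { method: 'attr', args: ['href'] }.
function parseField(expression) {
  var match = /^(\w+)\((.*)\)$/.exec(expression.trim());
  if (!match) {
    throw new Error('Unsupported field expression: ' + expression);
  }
  var rawArgs = match[2].trim();
  var args = rawArgs === '' ? [] : rawArgs.split(',').map(function (arg) {
    // strip the surrounding quotes from each argument
    return arg.trim().replace(/^['"]|['"]$/g, '');
  });
  return { method: match[1], args: args };
}

// Hypothetical helper: build one result row by calling each field's method on the element.
function extractRow(element, fields) {
  var row = {};
  Object.keys(fields).forEach(function (name) {
    var call = parseField(fields[name]);
    row[name] = element[call.method].apply(element, call.args);
  });
  return row;
}

// Stub standing in for a matched element (an assumption for this demo only).
var stubLink = {
  text: function () { return 'Hello'; },
  attr: function (name) { return name === 'href' ? 'http://example.com/' : undefined; }
};

var row = extractRow(stubLink, { text: 'text()', href: "attr('href')" });
console.log(row);
```

Each key in `fields` becomes a column in the row, which matches the idea of one CSV column or JSON property per field.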

Give it an array of pages, and it will save the results of each page to a separate file.

claw(['http://www.bing.com/search?q=hello', 'http://www.bing.com/search?q=goodbye'], selector, fields, 'output', 3);

Claw can also read its page list from a JSON file containing a list of URLs (or of objects with .href properties).

claw("pages.json", selector, fields, 'output', 3);
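
The page list can apparently hold either plain URL strings or objects carrying an `.href` property. As a rough illustration (not Claw's actual loader), the sketch below writes a sample `pages.json` to a temporary folder and flattens either shape to a list of URLs. `normalizePages` is a hypothetical helper, and mixing both shapes in one file is an assumption made for the demo.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: accept URL strings or objects with an .href property.
function normalizePages(entries) {
  return entries.map(function (entry) {
    return typeof entry === 'string' ? entry : entry.href;
  });
}

// Write a sample page list to a temporary folder.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'claw-demo-'));
var file = path.join(dir, 'pages.json');
fs.writeFileSync(file, JSON.stringify([
  'http://www.bing.com/search?q=hello',
  { href: 'http://www.bing.com/search?q=goodbye' }
], null, 2));

// Read it back and flatten it to plain URLs.
var pages = normalizePages(JSON.parse(fs.readFileSync(file, 'utf8')));
console.log(pages);
```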

Questions? Ideas? Hit me up on Twitter: @dylanized

Dependencies: path, request, cheerio, underscore, jsonfile, json2csv
