0.0.9 • Published 12 years ago

claw v0.0.9

claw

A very simple web scraper chassis.

Takes:

  • a page URL
  • a selector for the elements to scrape
  • fields to pull out from within each selected element
  • an output folder
  • a number of seconds to delay

and creates CSV and JSON files with the results. Claw creates a separate file for each page it scrapes.
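To make the CSV and JSON outputs more concrete, here is a minimal sketch of how a list of scraped rows could be serialized to both formats. It is not claw's actual code or file layout: the helpers `csvEscape` and `toCsv` are hypothetical, the sample rows are made up, and the real column order and quoting may differ.

```javascript
// Sketch only: not claw's implementation. Helper names are hypothetical.

// Quote a CSV value only when it contains a comma, quote, or line break.
function csvEscape(value) {
	var text = String(value);
	if (/[",\r\n]/.test(text)) {
		return '"' + text.replace(/"/g, '""') + '"';
	}
	return text;
}

// Build a header line from the first row's keys, then one line per row.
function toCsv(rows) {
	if (rows.length === 0) { return ''; }
	var columns = Object.keys(rows[0]);
	var lines = [columns.map(csvEscape).join(',')];
	rows.forEach(function (row) {
		lines.push(columns.map(function (column) {
			return csvEscape(row[column]);
		}).join(','));
	});
	return lines.join('\n');
}

// Made-up sample rows, one per scraped element.
var rows = [
	{ text: 'Hello, world', href: 'http://example.com/a' },
	{ text: 'Say "hi"', href: 'http://example.com/b' }
];

var csvOutput = toCsv(rows);
var jsonOutput = JSON.stringify(rows, null, '\t');
```

The same rows feed both serializers, so the CSV and JSON results stay in step for a given page.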

// libraries
var claw = require('claw');
	
// get settings
var page = 'http://www.bing.com/search?q=hello';

var selector = 'h3 a';

var fields = {
	"text" : "text()",
	"href" : "attr('href')"
};

claw(page, selector, fields, 'output', 3);
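The field values above read like method calls on a matched element. As a rough illustration of that idea, and not a description of how claw evaluates them, the sketch below parses an expression such as "attr('href')" into a method name plus arguments and applies it to a stand-in element. `parseFieldExpression`, `extractFields`, and `fakeLink` are hypothetical names introduced for this example.

```javascript
// Sketch only: claw's real evaluation of field expressions may differ.

// Hypothetical helper: split "attr('href')" into { method: 'attr', args: ['href'] }.
function parseFieldExpression(expression) {
	var match = /^(\w+)\((.*)\)$/.exec(expression.trim());
	if (!match) {
		throw new Error('Unsupported field expression: ' + expression);
	}
	var rawArgs = match[2].trim();
	var args = rawArgs === '' ? [] : rawArgs.split(',').map(function (arg) {
		// Strip surrounding whitespace and quotes from each argument.
		return arg.trim().replace(/^['"]|['"]$/g, '');
	});
	return { method: match[1], args: args };
}

// Hypothetical helper: build one result row by calling each field's method on the element.
function extractFields(element, fields) {
	var row = {};
	Object.keys(fields).forEach(function (name) {
		var parsed = parseFieldExpression(fields[name]);
		row[name] = element[parsed.method].apply(element, parsed.args);
	});
	return row;
}

// Stand-in for a matched element; a real scraper would get this from its HTML parser.
var fakeLink = {
	text: function () { return 'Hello'; },
	attr: function (name) { return name === 'href' ? 'http://example.com/' : undefined; }
};

var row = extractFields(fakeLink, { "text": "text()", "href": "attr('href')" });
console.log(row); // { text: 'Hello', href: 'http://example.com/' }
```

Each key in the fields object becomes a column name, and each value decides what is read from the element.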
	

Give it an array of pages, and it will save the results of each page to a separate file.

claw(['http://www.bing.com/search?q=hello', 'http://www.bing.com/search?q=goodbye'], selector, fields, 'output', 3);

Claw can also grab its page list from a JSON file that contains a list of URLs (or objects with .href properties).

claw("pages.json", selector, fields, 'output', 3);
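For reference, a page-list file could be generated like this. The sketch uses only Node's built-in modules and writes to a temporary directory. The plain-list form follows the description above; the object form (written here to pages-objects.json, a hypothetical file name) is an assumption based on the ".href" wording, so check it against claw before relying on it.

```javascript
// Sketch only: builds example page-list files with Node's built-in modules.
var fs = require('fs');
var os = require('os');
var path = require('path');

// Write into a temp directory so the sketch doesn't touch a real project.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'claw-pages-'));

var urls = [
	'http://www.bing.com/search?q=hello',
	'http://www.bing.com/search?q=goodbye'
];

// Form 1: a plain list of URLs.
var listFile = path.join(dir, 'pages.json');
fs.writeFileSync(listFile, JSON.stringify(urls, null, '\t'));

// Form 2 (assumed shape): a list of objects that each carry an href property.
var objectsFile = path.join(dir, 'pages-objects.json');
var objects = urls.map(function (url) { return { href: url }; });
fs.writeFileSync(objectsFile, JSON.stringify(objects, null, '\t'));
```

Either file path could then be passed as the first argument in place of "pages.json".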

Questions? Ideas? Hit me up on Twitter: @dylanized

Versions (all published 12 years ago):

  • 0.0.9
  • 0.0.81
  • 0.0.8
  • 0.0.7
  • 0.0.6
  • 0.0.5
  • 0.0.4
  • 0.0.3
  • 0.0.2
  • 0.0.1