1.0.1 • Published 8 years ago

phantomjs-scraper v1.0.1

Weekly downloads: 1
License: ISC
Repository: github
Last release: 8 years ago

phantomjs-scraper

PhantomJS module for web scraping. Documentation to be completed. Eventually.

Usage

When you require the module it returns an object with three keys:

  • Scraper: the Scraper class, ready to be instantiated
  • Spider: the Spider class, to be extended by the user (this is where you implement your own code)
  • util: misc utility functions

The process is as follows:

  1. Require the module and pull out its classes

var phantomScraper = require('phantomjs-scraper');
var Scraper = phantomScraper.Scraper;
var Spider = phantomScraper.Spider;
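  2. Create a configuration object
// PhantomJS's built-in fs module exposes the current working directory
var fs = require('fs');
// An object literal cannot read its own keys while it is being built,
// so compute the root path first and reuse it
var root = fs.workingDirectory;
var config =
{
	dir_root: root,
	dir_data: root + "/data",
	dir_rsc: root + "/bower_components",
	dir_logs: root + "/logs",
	dir_spiders: root + "/spiders"
};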
  3. Instantiate a Scraper object, passing it the configuration object
var sc = new Scraper(config);
  4. Create a subclass of Spider in the spiders directory set in the configuration (dir_spiders) and implement your code there
  5. Tell the scraper to instantiate a new spider object of your custom type:
sc.createSpider('myCustomSpider');
  6. Start the scraper:
sc.start();
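
The directory layout used in the configuration step can be derived from a single root path. The following is a minimal sketch, not part of phantomjs-scraper: buildConfig is a hypothetical helper name, and it assumes the Scraper needs only the five dir_* keys shown in the configuration object. Because an object literal cannot read its own dir_root key while it is being built, the root is passed in as an argument instead.

```javascript
// Hypothetical helper (not part of phantomjs-scraper): builds the
// configuration object from one root path.
// Written in ES5 syntax so it can also run inside a PhantomJS script.
function buildConfig(root) {
	return {
		dir_root: root,
		dir_data: root + '/data',
		dir_rsc: root + '/bower_components',
		dir_logs: root + '/logs',
		dir_spiders: root + '/spiders'
	};
}
```

Inside a PhantomJS script this could be called as buildConfig(require('fs').workingDirectory), with the result passed to the Scraper constructor.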