1.0.1 • Published 8 years ago
phantomjs-scraper
PhantomJS module for web scraping. Documentation to be completed. Eventually.
Usage
When you require the module it returns an object with three keys:
- Scraper: the Scraper class, ready to be instantiated
- Spider: the Spider class, to be extended by the user (this is where you implement your code)
- util: misc utility functions
The process is as follows:
- Require the module and reference its exports
var phantomScraper = require('phantomjs-scraper');
var Scraper = phantomScraper.Scraper;
var Spider = phantomScraper.Spider;
- Create a configuration object
var fs = require('fs'); // PhantomJS file system module
var dir_root = fs.workingDirectory; // resolve the root once so the other paths can reuse it
var config =
{
  dir_root: dir_root,
  dir_data: dir_root + "/data",
  dir_rsc: dir_root + "/bower_components",
  dir_logs: dir_root + "/logs",
  dir_spiders: dir_root + "/spiders"
};
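Since every directory in the configuration hangs off a single root, the object can also be built by a small helper. The sketch below is only an illustration: `buildConfig` and the example path are hypothetical names, not part of the module.

```javascript
// Hypothetical helper (not part of phantomjs-scraper): derives every
// directory setting from a single root path.
function buildConfig(root) {
  return {
    dir_root: root,
    dir_data: root + "/data",
    dir_rsc: root + "/bower_components",
    dir_logs: root + "/logs",
    dir_spiders: root + "/spiders"
  };
}

// Under PhantomJS the root would typically be require('fs').workingDirectory;
// a literal path is used here so the sketch runs anywhere.
var exampleConfig = buildConfig("/home/user/my-scraper");
```

Keeping the root in one place means moving the project only requires changing a single value.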
- Instantiate a Scraper object
var sc = new Scraper(config);
- Create a subclass of Spider in the spiders directory specified above (dir_spiders) and implement your scraping code there
- Tell the scraper to instantiate a new spider object of your custom type:
sc.createSpider('myCustomSpider');
- Start the scraper:
sc.start();
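The Spider subclass from the steps above is not documented here, so the following is only a rough sketch of what one might look like using classic prototypal inheritance. Everything in it is an assumption: the stand-in `Spider` constructor, its `name` argument, and the `parse` method are hypothetical and may not match the module's real interface, nor is it known how the spider file should export the class.

```javascript
// Stand-in for phantomScraper.Spider so the sketch runs without the module.
// ASSUMPTION: the real base class may take different constructor arguments.
function Spider(name) {
  this.name = name;
}

// Hypothetical custom spider: the constructor chains to the base class.
function MyCustomSpider() {
  Spider.call(this, 'myCustomSpider');
}

// Classic ES5 prototypal inheritance (no class syntax required).
MyCustomSpider.prototype = Object.create(Spider.prototype);
MyCustomSpider.prototype.constructor = MyCustomSpider;

// ASSUMPTION: "parse" is an illustrative hook name, not a documented method.
// It turns a raw page title into a small result object.
MyCustomSpider.prototype.parse = function (pageTitle) {
  return { spider: this.name, title: pageTitle.trim() };
};

var spider = new MyCustomSpider();
var item = spider.parse('  Example Domain  ');
```

In a real project the stand-in would be replaced by `require('phantomjs-scraper').Spider`, and the file would live in the directory configured as `dir_spiders`.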