0.0.16 • Published 8 years ago

punch-scraper

Config

  • proxyManagerConfig - configuration for the punch proxy manager
  • maxTry - how many times the scraper tries to fetch a link before reporting an error
  • strategy - scraper strategy
      • name - strategy name; valid values: CASPERJS, HTTP, PHANTOMJS
      • proxy - proxy IP
      • lambda - AWS Lambda settings (CASPERJS or PHANTOMJS only)
          • aws_key - AWS key
          • aws_secret - AWS secret key
          • region - AWS region
          • lambda_name - AWS Lambda function name
  • eval - code to evaluate (CASPERJS or PHANTOMJS only)
  • services
      • include - array of proxy services to use
      • exclude - array of proxy services not to use
      • valid values: GIMMI_PROXY, HIDE_MY_ASS, IN_CLOCK, PROXY_SERVER_LIST, UK_PROXY, US_PROXY
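
As an illustration of the options above, the sketch below builds a config for the HTTP strategy and checks it against the listed valid values. The `validateConfig` helper is hypothetical and not part of punch-scraper, and placing `maxTry` and `services` at the top level of the config object is an assumption; only the option names and valid values come from the list.

```javascript
'use strict';

// Valid values as listed in the Config section.
const STRATEGIES = ['CASPERJS', 'HTTP', 'PHANTOMJS'];
const SERVICES = ['GIMMI_PROXY', 'HIDE_MY_ASS', 'IN_CLOCK',
                  'PROXY_SERVER_LIST', 'UK_PROXY', 'US_PROXY'];

// Hypothetical helper (not part of punch-scraper):
// returns a list of problems found in a config object.
function validateConfig(config) {
    const errors = [];
    const strategy = config.strategy || {};
    const name = String(strategy.name || '').toUpperCase();

    if (!STRATEGIES.includes(name)) {
        errors.push(`unknown strategy name: ${strategy.name}`);
    }
    // lambda and eval are documented for CASPERJS or PHANTOMJS only.
    if (name === 'HTTP' && (strategy.lambda || config.eval)) {
        errors.push('lambda and eval are only used with CASPERJS or PHANTOMJS');
    }
    const services = config.services || {};
    for (const key of ['include', 'exclude']) {
        for (const service of services[key] || []) {
            if (!SERVICES.includes(service)) {
                errors.push(`unknown proxy service in ${key}: ${service}`);
            }
        }
    }
    return errors;
}

// Assumed shape: HTTP strategy, three attempts per link,
// and only two of the proxy services enabled.
const httpConfig = {
    maxTry: 3,
    strategy: { name: 'HTTP' },
    services: { include: ['US_PROXY', 'UK_PROXY'] }
};

console.log(validateConfig(httpConfig));
```

The helper compares strategy names case-insensitively because the usage example passes 'phantomjs' in lowercase while the list gives the values in uppercase.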

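Methods

  • scrape(links, config) - scrape an array of URLs with the given config; returns a promise that resolves with the results
  • start() - start the scraper manager; returns a promise
  • stop() - stop the scraper manager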
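
The three methods are used in order: start, then scrape, then stop. A minimal sketch of a wrapper that makes sure stop runs even when scraping fails is shown here. `withScraper` and `FakeManager` are hypothetical names used only for illustration; the fake stands in for the real manager so the snippet runs on its own, and the result shape it returns is invented. The sketch also assumes that `stop` can safely be awaited.

```javascript
'use strict';

// Hypothetical wrapper: start the manager, scrape, and always stop.
async function withScraper(manager, links, config) {
    await manager.start();
    try {
        return await manager.scrape(links, config);
    } finally {
        // Runs whether scrape resolved or rejected.
        await manager.stop();
    }
}

// Stand-in for the real manager, only so this sketch is runnable by itself.
class FakeManager {
    constructor() { this.calls = []; }
    async start() { this.calls.push('start'); }
    async scrape(links) {
        this.calls.push('scrape');
        // Invented result shape, for illustration only.
        return links.map((link) => ({ link, body: '<html></html>' }));
    }
    async stop() { this.calls.push('stop'); }
}

const manager = new FakeManager();
const done = withScraper(manager, ['http://www.google.com/'], { strategy: { name: 'HTTP' } })
    .then((results) => {
        console.log(results.length, manager.calls.join(' -> '));
        return results;
    });
```

With the real manager, the same wrapper would be called as `withScraper(scrapeManager, links, config)`.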

Usage

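'use strict';

// Load the scraper manager module and create an instance.
const ScrapeManager = require('./scraper-manager/');
const scrapeManager = new ScrapeManager();

// Scrape with PhantomJS running on AWS Lambda.
const config = {
    // Code evaled by the strategy: write the page content to the response, then close it.
    eval: "response.write(page.content);response.close();",
    strategy: {
        name: 'phantomjs',
        lambda: {
            aws_key: 'XXX-XXX-XXX',
            aws_secret: 'XXX-XXX-XXX',
            lambda_name: 'node-phantomjs-aws-lambda-server-development',
            region: 'us-west-2'
        }
    }
};

const links = [
  'http://www.google.com/',
  'http://www.google.com/'
];

// Start the manager, scrape all links, and stop the manager
// whether scraping succeeded or failed.
scrapeManager.start()
.then(() => scrapeManager.scrape(links, config))
.then((results) => {
    console.log(results);
    console.log('done');
})
.catch((err) => {
    console.error('scrape failed:', err);
})
.then(() => scrapeManager.stop());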