0.0.16 • Published 10 years ago

punch-scraper v0.0.16


punch-scraper

Config

  • proxyManagerConfig - punch proxy manager config
  • maxTry - how many times the scraper tries to fetch a link before reporting an error
  • strategy - scraper strategy
      • name - strategy name; valid values: CASPERJS, HTTP, PHANTOMJS
      • proxy - proxy IP
      • lambda - AWS Lambda settings, for CASPERJS or PHANTOMJS only
          • aws_key - AWS key
          • aws_secret - AWS secret key
          • region - AWS region
          • lambda_name - AWS Lambda function name
  • eval - code to be evaluated, for CASPERJS or PHANTOMJS only
  • services
      • include - array of proxy services to use
      • exclude - array of proxy services not to use
      • valid values: GIMMI_PROXY, HIDE_MY_ASS, IN_CLOCK, PROXY_SERVER_LIST, UK_PROXY, US_PROXY
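
As a rough illustration of the options above, the sketch below builds a config for the HTTP strategy and checks it against the documented valid values. The checkConfig helper is hypothetical and not part of punch-scraper. Two things are assumptions here: that maxTry, strategy, and services are top-level keys, and that strategy names are matched case-insensitively (the list shows uppercase names while the usage example passes 'phantomjs').

```javascript
'use strict';

// Values copied from the Config list.
const STRATEGY_NAMES = ['CASPERJS', 'HTTP', 'PHANTOMJS'];
const PROXY_SERVICES = [
    'GIMMI_PROXY', 'HIDE_MY_ASS', 'IN_CLOCK',
    'PROXY_SERVER_LIST', 'UK_PROXY', 'US_PROXY'
];

// Hypothetical helper (not part of punch-scraper): returns a list of
// problems found in a config object, or an empty array if none are found.
function checkConfig(config) {
    const problems = [];
    const strategy = config.strategy || {};
    // Assumption: names are compared case-insensitively.
    const name = String(strategy.name || '').toUpperCase();

    if (!STRATEGY_NAMES.includes(name)) {
        problems.push(`unknown strategy name: ${strategy.name}`);
    }
    // lambda and eval are documented for CASPERJS or PHANTOMJS only.
    if (name === 'HTTP' && (strategy.lambda || config.eval)) {
        problems.push('lambda and eval apply to CASPERJS or PHANTOMJS only');
    }
    const services = config.services || {};
    for (const key of ['include', 'exclude']) {
        for (const service of services[key] || []) {
            if (!PROXY_SERVICES.includes(service)) {
                problems.push(`unknown proxy service in ${key}: ${service}`);
            }
        }
    }
    return problems;
}

// Assumed layout: maxTry, strategy, and services as top-level keys.
const httpConfig = {
    maxTry: 3,
    strategy: { name: 'HTTP' },
    services: { include: ['US_PROXY', 'UK_PROXY'] }
};

console.log(checkConfig(httpConfig)); // []
```

Running a check like this before calling scrape can catch a misspelled strategy or service name early; whether the library validates these values itself is not stated here.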

Methods

  • scrape - scrape an array of URLs using the given config
  • start - start the scraper manager
  • stop - stop the scraper manager
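
The three methods are meant to be called in order: start, then scrape, then stop. The sketch below shows one way to arrange that with async/await so that stop still runs when scrape fails. StubScrapeManager and scrapeOnce are hypothetical stand-ins written for this example so it runs without the package; the stub's return values are invented and say nothing about what the real scrape resolves to.

```javascript
'use strict';

// Stand-in with the same three method names as the list above.
// Its behavior is invented for illustration only.
class StubScrapeManager {
    constructor() { this.running = false; }
    async start() { this.running = true; }
    async scrape(links, config) {
        if (!this.running) throw new Error('manager not started');
        return links.map((link) => ({ link, strategy: config.strategy.name }));
    }
    async stop() { this.running = false; }
}

// start -> scrape -> stop, with stop() in finally so it runs
// even if scrape() rejects.
async function scrapeOnce(manager, links, config) {
    await manager.start();
    try {
        return await manager.scrape(links, config);
    } finally {
        await manager.stop();
    }
}

const manager = new StubScrapeManager();
scrapeOnce(manager, ['http://www.google.com/'], { strategy: { name: 'HTTP' } })
    .then((results) => console.log(results.length, manager.running)); // 1 false
```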

Usage

'use strict';

const ScrapeManager = require('./scraper-manager/');
const scrapeManager = new ScrapeManager();

// PHANTOMJS strategy with AWS Lambda settings (replace the XXX placeholders).
const config = {
    // Code evaluated by the CASPERJS or PHANTOMJS strategy.
    eval: "response.write(page.content);response.close();",
    strategy: {
        name: 'phantomjs',
        lambda: {
            aws_key: 'XXX-XXX-XXX',
            aws_secret: 'XXX-XXX-XXX',
            lambda_name: 'node-phantomjs-aws-lambda-server-development',
            region: 'us-west-2'
        }
    }
};

const links = [
  'http://www.google.com/',
  'http://www.google.com/'
];

// Start the manager, scrape the links, then stop whether or not scraping succeeded.
scrapeManager.start()
.then(() => scrapeManager.scrape(links, config))
.then((results) => {
    console.log(results);
    console.log('done');
})
.catch((err) => {
    console.error(err);
})
.then(() => scrapeManager.stop());
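
The maxTry option listed under Config can be pictured as a bounded retry loop: try a link up to maxTry times and report an error only after the last attempt fails. The sketch below is an illustration of that idea, not the library's implementation; fetchWithRetries and flakyFetch are hypothetical names introduced for this example.

```javascript
'use strict';

// Hypothetical helper: calls fetchFn(link) up to maxTry times and
// rejects with the last error if every attempt fails.
async function fetchWithRetries(fetchFn, link, maxTry) {
    let lastError = new Error('maxTry must be at least 1');
    for (let attempt = 1; attempt <= maxTry; attempt++) {
        try {
            return await fetchFn(link, attempt);
        } catch (err) {
            lastError = err;
        }
    }
    throw lastError;
}

// Fake fetch that fails twice, then succeeds on the third attempt.
let calls = 0;
const flakyFetch = async (link) => {
    calls += 1;
    if (calls < 3) throw new Error(`attempt ${calls} failed`);
    return `body of ${link}`;
};

fetchWithRetries(flakyFetch, 'http://www.google.com/', 3)
    .then((body) => console.log(body, calls)); // body of http://www.google.com/ 3
```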