@coya/web-scraper v0.2.2 • Published 6 years ago

License: ISC
Repository: GitHub

Web Scraper

A web scraper built on top of PhantomJS or Chromium.
With PhantomJS, the module works as a client/server pair: a PhantomJS process runs the web scraper server, and a NodeJS client acts as a driver that sends scraping requests to it over HTTP.
Chromium works differently: it is driven directly from NodeJS, with no intermediate server.
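For the PhantomJS mode, the README does not document what the client actually sends to the server. As an illustration only: a JavaScript function cannot cross an HTTP connection as an object (JSON.stringify silently drops function values), so a driver of this kind has to send the function's source text. The sketch below builds such a request body; buildRequestBody and the url/fct/args field layout are assumptions, not the package's protocol.

```javascript
// Hypothetical helper, NOT the package's real wire format: turns scraping
// parameters into a JSON string that a client could POST to a scraping server.
function buildRequestBody(params) {
  if (typeof params.url !== 'string' || params.url.length === 0) {
    throw new TypeError('url must be a non-empty string');
  }
  // JSON.stringify would drop a function value, so send its source code instead.
  // A string (the external-script form of fct) is sent unchanged.
  const fct = typeof params.fct === 'function' ? params.fct.toString() : params.fct;
  return JSON.stringify({
    url: params.url,
    fct: fct,
    args: params.args || {},
  });
}

// Example: the server would receive the function as plain source text.
const body = buildRequestBody({
  url: 'https://cooya.fr',
  fct: function () { return document.title; },
});
console.log(JSON.parse(body).fct); // prints the function's source text
```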

Installation

npm install @coya/web-scraper

Build (for dev)

git clone https://github.com/Cooya/WebScraper
cd WebScraper
npm install # also installs the development dependencies
npm install phantomjs -g # only needed if you use PhantomJS; install it globally
npm run build
npm run example # runs the example script in the "examples" folder

Usage examples

The package lets you inject a JavaScript function into the requested page:

const { ChromiumScraper } = require('@coya/web-scraper');

// if you want to use PhantomJS instead of Chromium
// const { PhantomScraper } = require('@coya/web-scraper');

const scraper = ChromiumScraper.getInstance();

const getLinks = function() { // return all links from the requested page
    // uses jQuery's $, so the page must provide jQuery (or the scraper must inject it)
    return $('a').map(function(i, elt) {
        return $(elt).attr('href');
    }).get();
};

scraper.request({
    url: 'https://cooya.fr',
    fct: getLinks // function injected in the page environment
})
.then(function(result) {
    console.log(result); // returned value of the injected function
    scraper.close(); // shut down the web scraper subprocess so the NodeJS script can exit
}, function(error) {
    console.error(error);
    scraper.close();
});
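The injected function runs in the page, not in the NodeJS script, so it should not rely on variables from the surrounding Node code; the request parameters spec lists args as an object passed to the injected function, which is the way to hand data over. The sketch below imitates what happens when a function is rebuilt from its source text, as PhantomJS's page.evaluate and Puppeteer's page.evaluate do. That this package transfers functions the same way, and that args arrives as the first argument, are assumptions; reviveFromSource and runDemo are hypothetical helpers.

```javascript
// Hypothetical helper: rebuild a function from its source text alone, as a
// page-side evaluator does. The code survives; the variables it closed over do not.
function reviveFromSource(fn) {
  return new Function('return (' + fn.toString() + ');')();
}

// Hypothetical demo comparing a closure-based function with an args-based one.
function runDemo() {
  const site = 'https://cooya.fr'; // exists only in this NodeJS scope

  // Reads `site` from the enclosing scope: fine locally, broken once revived.
  const usesClosure = function () {
    return site + '/about';
  };

  // Gets what it needs from its argument instead (the role of the args parameter).
  const usesArgs = function (args) {
    return args.site + '/about';
  };

  let closureError = null;
  try {
    reviveFromSource(usesClosure)();
  } catch (err) {
    closureError = err.name; // 'ReferenceError': `site` is not defined there
  }

  return {
    local: usesClosure(),
    closureError: closureError,
    viaArgs: reviveFromSource(usesArgs)({ site: site }),
  };
}

console.log(runDemo());
```

Only the args-based version still works after being rebuilt from source, which is why page-side functions are best written to take their inputs as arguments.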

Or inject a function exported by an external script:

const { ChromiumScraper } = require('@coya/web-scraper');

// if you want to use PhantomJS instead of Chromium
// const { PhantomScraper } = require('@coya/web-scraper');

const scraper = ChromiumScraper.getInstance();

scraper.request({
    url: 'https://cooya.fr',
    fct: __dirname + '/externalScript.js', // external script exporting the function to be injected
})
.then(function(result) {
    console.log(result); // returned value of the injected function
    scraper.close(); // shut down the web scraper subprocess so the NodeJS script can exit
}, function(error) {
    console.error(error);
    scraper.close();
});

externalScript.js:

module.exports = function() { // return all links from the requested page
    // uses jQuery's $, so the page must provide jQuery (or the scraper must inject it)
    return $('a').map(function(i, elt) {
        return $(elt).attr('href');
    }).get();
};
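Both examples call $, so they depend on jQuery being available in the target page, and the README does not say whether the scraper injects it. If it is not there, a variant built only on standard DOM APIs (document.querySelectorAll and Element.getAttribute) could look like the sketch below; the function name is hypothetical.

```javascript
// Return the href of every link in the page using standard DOM APIs only,
// so it does not depend on jQuery. Runs inside the page, where `document` exists.
function getLinksWithoutJQuery() {
  // 'a[href]' skips anchors that have no href, which jQuery's .map() would also
  // omit, because .attr('href') returns undefined for them.
  return Array.from(document.querySelectorAll('a[href]'), function (elt) {
    return elt.getAttribute('href');
  });
}
```

In an external script it would be exported the same way as in externalScript.js: module.exports = getLinksWithoutJQuery;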

Methods

ScraperClient.getInstance()

The scraper client is a singleton: only one instance can exist, so this method is the way to obtain it (the examples above call it as ChromiumScraper.getInstance()).
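The README does not show how getInstance() is implemented. A common way to write such a singleton accessor in JavaScript is a static method that creates the instance on first use and returns that same object afterwards; HypotheticalScraper below is a stand-in for illustration, not the package's class.

```javascript
// Hypothetical class illustrating a lazily created singleton; not the package's code.
class HypotheticalScraper {
  static getInstance() {
    // Create the instance on the first call, then keep returning that same object.
    if (!HypotheticalScraper.instance) {
      HypotheticalScraper.instance = new HypotheticalScraper();
    }
    return HypotheticalScraper.instance;
  }
}

const first = HypotheticalScraper.getInstance();
const second = HypotheticalScraper.getInstance();
console.log(first === second); // true
```

Nothing in this sketch stops code from calling new HypotheticalScraper() directly; the convention only makes getInstance() the single documented entry point.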

request(params)

Loads the given URL and injects a JavaScript function into the resulting page. Returns a promise that resolves with the value returned by the injected function, or rejects with an error.

Parameter | Type | Description | Default value
--- | --- | --- | ---
params | object | request parameters (see "Request parameters spec" below) | none

close()

Terminates the web scraper subprocess (the PhantomJS server, or Chromium when it is used), which lets the current NodeJS script exit cleanly.
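The usage examples call close() in both the success and the error handler of request(). The same guarantee can be written with async/await and finally, as sketched below. stubScraper is a hypothetical stand-in for ChromiumScraper.getInstance() so the control flow runs without a browser; with the real package you would pass the real instance, and if you send several requests you would call close() once after the last one.

```javascript
// Hypothetical stand-in for ChromiumScraper.getInstance(); not the package's code.
const stubScraper = {
  closed: false,
  request: function (params) {
    // The real request() loads params.url and runs params.fct in the page.
    if (!params.url) {
      return Promise.reject(new Error('url is required'));
    }
    return Promise.resolve(['/', '/contact']);
  },
  close: function () {
    this.closed = true;
  },
};

// Run one request and always shut the scraper down afterwards.
async function scrapeOnce(scraper, params) {
  try {
    return await scraper.request(params);
  } finally {
    scraper.close(); // runs whether request() resolved or rejected
  }
}

scrapeOnce(stubScraper, { url: 'https://cooya.fr', fct: function () { return []; } })
  .then(function (links) {
    console.log(links, stubScraper.closed); // the fake links, then true
  })
  .catch(function (error) {
    console.error(error);
  });
```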

Request parameters spec

Parameter | Type | Description | Required
--- | --- | --- | ---
url | string | target URL | yes
fct | function | JavaScript function to inject into the page | yes (this form or the string form)
fct | string | path to the script and name of the function to inject, separated by a hash (e.g. "path/to/script/script.js#functionToCall") | yes (this form or the function form)
referer | string | value of the Referer header sent with the request | no
args | object | object passed to the injected function | no
debug | boolean | enables debug (verbose) mode | no
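Since fct accepts two forms, a client has to tell them apart and, for the string form, split the script path from the function name. The helper below is a hypothetical sketch of that parsing (parseFct and its return shape are assumptions). Treating a path without "#" as "inject the module's export itself" is inferred from the externalScript.js example, which passes a bare path to a script whose module.exports is the function.

```javascript
// Hypothetical helper: classify the fct request parameter.
//   function                      -> inject that function directly
//   'path/script.js#functionName' -> load the script, inject its export functionName
//   'path/script.js'              -> load the script, inject module.exports itself
function parseFct(fct) {
  if (typeof fct === 'function') {
    return { kind: 'function', fn: fct };
  }
  if (typeof fct !== 'string' || fct.length === 0) {
    throw new TypeError('fct must be a function or a non-empty string');
  }
  // Split on the last '#': a function name cannot contain '#', the path might.
  const hash = fct.lastIndexOf('#');
  if (hash === -1) {
    return { kind: 'script', path: fct, functionName: null };
  }
  return { kind: 'script', path: fct.slice(0, hash), functionName: fct.slice(hash + 1) };
}

console.log(parseFct('path/to/script/script.js#functionToCall'));
// -> kind 'script', path 'path/to/script/script.js', functionName 'functionToCall'
```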