1.0.3 • Published 6 years ago

scrappi v1.0.3

License: ISC

scrappi

A versatile web scraping utility designed with time series data collection in mind.

Install

npm i scrappi

Usage

The main use case for scrappi is to scrape a webpage for one or more values and post them to an API endpoint at a regular interval:

import scrappi from 'scrappi'


let result = await scrappi({
    target: 'https://google.com', //Get html
    endpoint: 'https://coolapi/post', //Post results to here
    tick: 500, // Every 500ms
    onDocumentReceived: (html) => {
        //do whatever with html in here
    },
    post: (json) => {
        //send json to endpoint
    }
})
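To show how the options in the example above fit together, here is a hypothetical sketch of one scrape, extract, post cycle repeated every tick milliseconds. It is not scrappi's source. It assumes, as the Cheerio example further down suggests, that whatever onDocumentReceived returns is the json handed to post. The runCycles name, the fetchHtml function and the maxCycles cap are invented for illustration; scrappi fetches the target URL itself.

```javascript
// Hypothetical sketch of the scrape -> extract -> post cycle.
// NOT scrappi's source code. runCycles, fetchHtml and maxCycles are
// illustrative assumptions; scrappi fetches `target` for you.
async function runCycles({
  fetchHtml,                      // assumed: async () => html string
  onDocumentReceived = () => {},  // extracts values from the html
  post = () => {},                // receives the extracted payload
  tick = 500,                     // milliseconds between cycles
  once = false,                   // when true, run a single cycle
  maxCycles = Infinity,           // assumed cap so the sketch can stop
}) {
  const cycles = once ? 1 : maxCycles
  for (let i = 0; i < cycles; i++) {
    const html = await fetchHtml()               // 1. fetch the page
    const json = await onDocumentReceived(html)  // 2. extract values
    await post(json)                             // 3. hand the payload to post()
    if (i + 1 < cycles) {
      // 4. wait `tick` ms before starting the next cycle
      await new Promise((resolve) => setTimeout(resolve, tick))
    }
  }
}
```

The sketch waits tick milliseconds after each cycle finishes, so a slow request never overlaps the next one; a setInterval-based loop would keep a steadier rate but could overlap. Which behavior scrappi actually uses is not documented here.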

Building

To build the bundle files, run:

npm run build

Files are output to dist/.

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| target | string | https://google.com | URL of the webpage to scrape |
| endpoint | string | | URL of the API endpoint to post results to |
| verbose | boolean | true | Displays additional information during operation |
| once | boolean | false | Limits scrappi to a single iteration; useful for debugging |
| tick | number | 500 | Interval in milliseconds between scrape-and-post cycles |
| onDocumentReceived | function | () => {} | Called when scrappi receives HTML from the target webpage |
| post | function | () => {} | Called when scrappi is ready to post the JSON payload |
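The Default column describes what you get when an option is omitted. As a hypothetical illustration (withDefaults and DEFAULT_OPTIONS are invented names, not part of scrappi's API), this is how a partial options object would resolve, assuming omitted options simply fall back to the listed values:

```javascript
// Hypothetical helper, NOT part of scrappi's API: resolves a partial
// options object against the defaults listed in the Options table.
const DEFAULT_OPTIONS = {
  target: 'https://google.com',
  endpoint: undefined,           // the table lists no default
  verbose: true,
  once: false,
  tick: 500,                     // milliseconds
  onDocumentReceived: () => {},
  post: () => {},
}

function withDefaults(options = {}) {
  const resolved = { ...DEFAULT_OPTIONS }
  for (const [key, value] of Object.entries(options)) {
    // Skip explicit undefined so the default still applies
    if (value !== undefined) resolved[key] = value
  }
  return resolved
}
```

For example, withDefaults({ once: true }) keeps target at https://google.com and tick at 500 while limiting the run to one iteration.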

Examples

scrappi is not tied to any particular library: use whichever HTML parsing and XHR libraries you prefer.

Scrappi + Cheerio + XHR

import scrappi from 'scrappi'
import * as cheerio from 'cheerio'
import xhr from 'xhr'


let result = await scrappi({
    target: 'https://google.com', // Get html
    endpoint: 'https://coolapi/post', // Post results to here
    tick: 500, // Every 500ms
    onDocumentReceived: (html) => {
        // Parse the html and return the values to post
        const $ = cheerio.load(html)
        return {
            // .text() extracts a plain string; a raw Cheerio selection wraps DOM nodes, not data
            title: $(".title").text()
        }
    },
    post: (json) => {
        // Send json to the endpoint
        xhr({
            method: "post",
            body: JSON.stringify(json),
            uri: "https://coolapi/post",
            headers: {
                "Content-Type": "application/json"
            }
        }, function (err, resp, body) {
            // Handle err and check resp.statusCode here
        })
    }
})
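Because scrappi leaves parsing and posting to you, the handlers do not strictly need third-party libraries. The following is a hypothetical dependency-free variant of the example above: a regular expression pulls the page title (fragile on real-world HTML, so prefer a parser such as Cheerio for anything serious), a timestamp field is added with time series collection in mind, and the built-in fetch (Node 18+ or a browser) replaces xhr. The makePost name and its injectable fetchImpl parameter are illustrative assumptions.

```javascript
// Hypothetical handlers for scrappi that use no third-party libraries.
// Assumes a runtime with a global fetch (Node 18+ or a browser).

// Extract the text of the first <title> element. A regex is fine for a
// quick sketch but breaks easily on real-world HTML.
function onDocumentReceived(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)
  return {
    timestamp: Date.now(),                  // useful when collecting time series
    title: match ? match[1].trim() : null,  // null when no <title> is found
  }
}

// Build a post handler that sends the payload as JSON to `url`.
// fetchImpl is injectable so the handler can be tested without a network.
function makePost(url, fetchImpl = globalThis.fetch) {
  return async (json) => {
    const resp = await fetchImpl(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(json),
    })
    if (!resp.ok) throw new Error(`POST failed with status ${resp.status}`)
    return resp
  }
}
```

These would be passed as onDocumentReceived: onDocumentReceived and post: makePost('https://coolapi/post') in the scrappi options.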

Versions

1.0.3 • 6 years ago
1.0.2 • 6 years ago
1.0.1 • 6 years ago
1.0.0 • 6 years ago