1.0.5 • Published 3 years ago

@keso/scraping v1.0.5

License
ISC

scraping utilities

Installation

npm i @keso/scraping

Usage

scrape

scrape runs a single scrape operation. To repeat it, schedule the script with a cron job or similar.

Example, scrape.js:

import { scrape } from "@keso/scraping";

scrape("https://petter.envall.se/", parser, analyzer);  

// Document parser, returns scraped data
function parser() {
    return { pageTitle: document.title, };
}

// Analyze and act on the data that was parsed
async function analyzer(data) {
    if (...) {
        ...
    }
}
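
A parser can return more than the page title. The sketch below is a hypothetical parser (headingParser is not part of the package) that also collects the text of every h1 element. It assumes, as the examples above suggest, that the parser runs with the page's document available, and that the returned value should be plain data.

```javascript
// Hypothetical parser: returns the page title plus the text of all <h1> elements.
// Assumes `document` is the scraped page's document, as in the parser above.
function headingParser() {
    const headings = Array.from(
        document.querySelectorAll("h1"),
        (element) => element.textContent.trim()
    );
    // Return plain data only (strings and arrays), not DOM nodes.
    return { pageTitle: document.title, headings };
}
```

It would be passed in place of parser, for example scrape("https://petter.envall.se/", headingParser, analyzer).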

poll

poll is a simple way to scrape a page repeatedly.

Example, poll.js:

import { poll } from "@keso/scraping";

poll("https://example.com/", parser, analyzer, 5000);  

// Document parser, returns scraped data
function parser() {
    return { pageTitle: document.title, };
}

// Analyze and act on the data that was parsed
async function analyzer(data) {
    if (...) {
        ...
    }
}
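
Because the analyzer is called with freshly parsed data on each run, it can keep state between runs. The following is a minimal sketch of that idea, not the package's own behavior: createTitleChangeAnalyzer and onChange are hypothetical names, and the data shape is the { pageTitle } object returned by the parser above.

```javascript
// Hypothetical analyzer factory: remembers the last title seen and
// calls `onChange(previous, current)` when the title differs from it.
function createTitleChangeAnalyzer(onChange) {
    let previousTitle; // undefined until the first run
    return async function analyzer(data) {
        if (previousTitle !== undefined && data.pageTitle !== previousTitle) {
            await onChange(previousTitle, data.pageTitle);
        }
        previousTitle = data.pageTitle;
    };
}

// Possible usage with the call shown above:
// poll("https://example.com/", parser, createTitleChangeAnalyzer(console.log), 5000);
```

The first run only records the title, so onChange is not called until a later run sees a different value.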

getSession

Obtain a session object for navigating to pages, interacting with them, and parsing data from them.

Example, session.js:

import { getSession } from "@keso/scraping";

async function run() {
    const session = await getSession();
    await session.nav("https://example.com/");
    const data = await session.parse(parser);

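    const submitButton = await session.page.$(`input[type="submit"]`);
    if (submitButton) {
        // Start waiting before the click so the navigation is not missed
        await Promise.all([
            session.page.waitForNavigation(),
            submitButton.click(),
        ]);
        const data2 = await session.parse(parser);
    }

    // Close the browser session when done
    await session.close();
}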

function parser() {
    return { pageTitle: document.title, };
}

run();

session API

The session object has the following API:

nav(str) navigates to a URL:

await session.nav(url: string);

parse(fn) parses the current page using a parser function. Returns a promise that resolves to the data returned by the parser function:

const data = await session.parse(parser: DocumentParser<T>);

page is a getter for the current page object. It is the "page" from the Puppeteer API:

const button = await session.page.$(`input[type="submit"]`);
if (button) {
    // Start waiting before the click so the navigation is not missed
    await Promise.all([
        session.page.waitForNavigation(),
        button.click(),
    ]);
}

setTextFieldValue(value: string, selector: string) sets the given string value in the text input or text field matching the selector:

await session.setTextFieldValue("foo", `input[name="bar"]`);
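
Filling several fields and then submitting can be combined into one helper. This is a sketch under assumptions: fillAndSubmit is a hypothetical name, setTextFieldValue is assumed to be a method of the session object that returns a promise, and the selectors are supplied by the caller.

```javascript
// Hypothetical helper: fills text fields, clicks a submit element,
// and waits for the resulting navigation.
// `fields` maps a CSS selector to the value to set in that field.
async function fillAndSubmit(session, fields, submitSelector) {
    for (const [selector, value] of Object.entries(fields)) {
        // Argument order (value, selector) follows the signature described above.
        await session.setTextFieldValue(value, selector);
    }
    const submitButton = await session.page.$(submitSelector);
    if (!submitButton) {
        return false; // nothing to click
    }
    // Start waiting before clicking so a fast navigation is not missed.
    await Promise.all([
        session.page.waitForNavigation(),
        submitButton.click(),
    ]);
    return true;
}
```

A call might look like await fillAndSubmit(session, { 'input[name="bar"]': "foo" }, 'input[type="submit"]'). The return value tells the caller whether a submit element was found.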

close() closes the browser session:

await session.close();
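
If an error is thrown between getSession() and close(), the browser session may stay open. One way to guard against that is a try/finally wrapper. The helper below is a sketch and not part of the package: withSession is a hypothetical name, and getSession is passed in as an argument so the helper does not depend on how it is imported.

```javascript
// Hypothetical helper: runs `task` with a session and always closes it,
// whether the task returns normally or throws.
async function withSession(getSession, task) {
    const session = await getSession();
    try {
        return await task(session);
    } finally {
        await session.close();
    }
}

// Possible usage:
// const data = await withSession(getSession, async (session) => {
//     await session.nav("https://example.com/");
//     return session.parse(parser);
// });
```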