1.0.5 • Published 4 years ago
@keso/scraping v1.0.5
scraping utilities
Installation
npm i @keso/scraping
Usage
scrape
scrape runs once per call. For repeated jobs, schedule it with a cron job
or something similar.
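Because scrape runs once, nothing stays in memory between cron runs. If the analyzer should act only when the scraped data has changed, it has to store the previous result somewhere. The following is a minimal sketch under that assumption. createChangeAnalyzer and its load, save and onChange options are hypothetical names, not part of @keso/scraping. The storage is injected so it could be a JSON file, a database or anything else, and the returned function has the same analyzer(data) shape as in the example below.

```javascript
// Sketch: an analyzer that acts only when the scraped data differs from the
// previous run. createChangeAnalyzer, load, save and onChange are hypothetical
// names; load/save are injected because a one-shot scrape keeps no state.
function createChangeAnalyzer({ load, save, onChange }) {
  return async function analyzer(data) {
    // load() should resolve to the last saved string, or undefined/null if none
    const previous = await load();
    const current = JSON.stringify(data);
    if (previous === current) {
      return false; // unchanged since the last run: do nothing
    }
    const previousData = previous == null ? undefined : JSON.parse(previous);
    // Act first, then save, so a failed action is retried on the next run
    await onChange(data, previousData);
    await save(current);
    return true;
  };
}
```

Comparing JSON.stringify output is a simple equality check that assumes the parser returns its keys in a stable order. In a cron job, load and save could wrap readFile and writeFile from node:fs/promises, with load resolving to undefined when the file does not exist yet.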
Example, scrape.js:
import { scrape } from "@keso/scraping";
scrape("https://petter.envall.se/", parser, analyzer);
// Document parser, returns scraped data
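function parser() {
  return { pageTitle: document.title };
}
// Analyze and act on the data that was parsed
async function analyzer(data) {
  // Illustrative condition and action; replace with your own logic
  if (data.pageTitle) {
    console.log(`Page title: ${data.pageTitle}`);
  }
}
poll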
poll is a simple way to scrape a page repeatedly.
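How poll schedules its runs is not specified here; the 5000 in the example below is presumably the interval in milliseconds. A minimal sketch of one way such a loop can work, assuming an async task and a millisecond interval: each run is awaited before the next delay starts, so slow runs never overlap, which a plain setInterval would not guarantee. pollSketch and its maxRuns option are hypothetical and not part of @keso/scraping.

```javascript
// Sketch of a non-overlapping polling loop (hypothetical, not the library's code).
// intervalMs is assumed to be milliseconds; maxRuns defaults to running forever.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function pollSketch(task, intervalMs, { maxRuns = Infinity } = {}) {
  for (let run = 1; run <= maxRuns; run++) {
    try {
      // Wait for this run to finish before scheduling the next one
      await task(run);
    } catch (err) {
      // Keep polling even if one run fails (e.g. a network error)
      console.error(`run ${run} failed:`, err);
    }
    if (run < maxRuns) {
      await sleep(intervalMs);
    }
  }
}
```

For instance, pollSketch(() => scrape(url, parser, analyzer), 5000) would approximate the poll call below, assuming scrape returns a promise.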
Example, poll.js:
import { poll } from "@keso/scraping";
poll("https://example.com/", parser, analyzer, 5000);
// Document parser, returns scraped data
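function parser() {
  return { pageTitle: document.title };
}
// Analyze and act on the data that was parsed
async function analyzer(data) {
  // Illustrative condition and action; replace with your own logic
  if (data.pageTitle) {
    console.log(`Page title: ${data.pageTitle}`);
  }
}
getSession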
Obtain a session object for navigating to pages, interacting with them and parsing data from them.
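A session keeps a browser open until its close() method is called (see the session API below). A minimal sketch of a wrapper that guarantees close() runs even when the work throws; withSession is a hypothetical helper, and it only assumes that the object returned by the factory has an async close() method.

```javascript
// Sketch: run a callback with a session and always close it afterwards.
// withSession is hypothetical; getSession is any async factory returning
// an object with an async close() method.
async function withSession(getSession, fn) {
  const session = await getSession();
  try {
    return await fn(session);
  } finally {
    // Runs on success and on error, so the browser is never left open
    await session.close();
  }
}

// Usage with this package (assumes getSession is exported as in the example):
// const data = await withSession(getSession, async (session) => {
//   await session.nav("https://example.com/");
//   return session.parse(parser);
// });
```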
Example, session.js:
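import { getSession } from "@keso/scraping";
async function run() {
  const session = await getSession();
  try {
    await session.nav("https://example.com/");
    const data = await session.parse(parser);
    console.log(data);
    const submitButton = await session.page.$(`input[type="submit"]`);
    if (submitButton) {
      // Start waiting for the navigation before clicking, so it is not missed
      await Promise.all([
        session.page.waitForNavigation(),
        submitButton.click(),
      ]);
      const data2 = await session.parse(parser);
      console.log(data2);
    }
  } finally {
    // Close the browser even if something above throws
    await session.close();
  }
}
// Document parser, returns scraped data
function parser() {
  return { pageTitle: document.title };
}
run();
session API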
The session object has the following API:
nav(url: string) navigates to a URL:
await session.nav(url);
parse(fn: DocumentParser<T>) parses the current page using a parser function.
It returns a promise that resolves to the data returned from the parser function:
const data = await session.parse(parser);
page is a getter for the current page object, which is the "page" from
the Puppeteer API:
const button = await session.page.$(`input[type="submit"]`);
if (button) {
  // Start waiting for the navigation before clicking, so it is not missed
  await Promise.all([
    session.page.waitForNavigation(),
    button.click(),
  ]);
}
setTextFieldValue(value: string, selector: string) sets the given string value
in the text input or text field matching the selector:
await session.setTextFieldValue("foo", `input[name="bar"]`);
close() closes the browser session:
await session.close();
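How setTextFieldValue works internally is not documented here. A minimal sketch of one common Puppeteer approach, assuming a page object with Puppeteer's page.$eval(selector, pageFunction, ...args): set the element's value inside the page, then dispatch input and change events so scripts listening on the field notice the change. setTextFieldValueSketch is a hypothetical name, not this package's implementation. Puppeteer's page.type(selector, text) is an alternative that simulates real keystrokes but does not clear existing text first.

```javascript
// Sketch (hypothetical): set a text field's value via a Puppeteer-style page.
// page.$eval runs the callback in the page with the matched element and throws
// if no element matches the selector.
async function setTextFieldValueSketch(page, value, selector) {
  await page.$eval(
    selector,
    (el, v) => {
      el.value = v;
      // Notify listeners (e.g. form validation or frameworks) of the change
      el.dispatchEvent(new Event("input", { bubbles: true }));
      el.dispatchEvent(new Event("change", { bubbles: true }));
    },
    value,
  );
}
```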