0.6.0 • Published 3 years ago

Scrappy

Extract rich metadata from URLs.

Installation

npm install scrappy --save

Usage

Scrappy attempts to parse and extract rich structured metadata from URLs.

import { scraper, urlScraper } from "scrappy";
import * as plugins from "scrappy/dist/plugins";

Scraper

Accepts a `request` function and a list of plugins to use. `request` is expected to return a "page" object with the same shape as the input to `scrape(page)`.

const scrape = scraper({
  request,
  plugins: [plugins.htmlmetaparser, plugins.exifdata],
});
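The `request` contract can be written down as types, which also makes it easier to test plugins without network access. The sketch below is an assumption rather than scrappy's published typings: `Page` mirrors the `url`, `status`, `headers` and `body` fields passed to `scrape(page)` in the next example, the body is modelled as any async iterable of byte chunks (a Node.js `Readable` qualifies), and `fakeRequest`, `chunked` and `readText` are hypothetical helpers.

```typescript
// Hypothetical "page" shape, mirroring the fields passed to scrape(page).
// Scrappy's own type names and exact field types may differ.
interface Page {
  url: string;
  status: number;
  headers: Record<string, string>;
  body: AsyncIterable<Uint8Array>; // assumption: any async byte stream
}

// A request function takes a URL and resolves to a page.
type RequestFn = (url: string) => Promise<Page>;

// Yields the bytes in small chunks to imitate a streamed response body.
async function* chunked(data: Uint8Array, size = 8): AsyncGenerator<Uint8Array> {
  for (let i = 0; i < data.length; i += size) {
    yield data.subarray(i, i + size);
  }
}

// Hypothetical stub: answers every URL with the same small HTML document.
const fakeRequest: RequestFn = async (url) => {
  const html = new TextEncoder().encode(
    "<html><head><title>Example</title></head></html>",
  );
  return {
    url,
    status: 200,
    headers: { "content-type": "text/html; charset=utf-8" },
    body: chunked(html),
  };
};

// Collects a streamed body back into a string (fine for small test pages).
async function readText(body: AsyncIterable<Uint8Array>): Promise<string> {
  const decoder = new TextDecoder();
  let text = "";
  for await (const chunk of body) {
    // stream: true keeps multi-byte characters split across chunks intact.
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush any buffered bytes
}
```

Whether scrappy accepts this exact body type is not stated in this README, so treat `fakeRequest` as a starting point for a test double rather than a drop-in `request` implementation.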

const res = await fetch("http://example.com"); // E.g. `fetch` from `popsicle`.

await scrape({
  url: res.url,
  status: res.status,
  headers: res.headers.asObject(),
  body: res.stream(), // Must stream the response instead of buffering it, to support large responses.
});
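Streaming matters most when only part of a page is needed: metadata usually sits near the top of an HTML document, so a consumer can stop after a fixed number of bytes instead of buffering the whole response. A minimal sketch of that technique, assuming the body is an async iterable of byte chunks; `readPrefix` and its `maxBytes` parameter are hypothetical, and nothing here claims scrappy's plugins read bodies this way.

```typescript
// Reads at most `maxBytes` from a streamed body, then stops consuming it.
// Breaking out of for-await calls the iterator's return(), which lets the
// underlying source clean up (Node.js destroys a Readable in that case).
async function readPrefix(
  body: AsyncIterable<Uint8Array>,
  maxBytes: number,
): Promise<Uint8Array> {
  const out = new Uint8Array(maxBytes);
  let length = 0;
  for await (const chunk of body) {
    // Copy only as much of this chunk as still fits under the cap.
    const take = Math.min(chunk.length, maxBytes - length);
    out.set(chunk.subarray(0, take), length);
    length += take;
    if (length >= maxBytes) break; // cap reached: stop pulling more chunks
  }
  return out.subarray(0, length);
}
```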

URL Scraper

A simpler wrapper around `scraper` that automatically calls `request(url)` to fetch the page.

const scrape = urlScraper({ request });

await scrape("http://example.com");
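Conceptually, a URL scraper is function composition: fetch the page with `request(url)`, then hand the result to a page scraper. The sketch below illustrates that idea with hypothetical names (`makeUrlScraper`, `PageLike`, `RequestFn`, `PageScraper`); it is not scrappy's source, and the real `urlScraper` may choose plugins or handle errors differently.

```typescript
// Minimal page shape; hypothetical, mirroring the fields shown for scrape(page).
interface PageLike {
  url: string;
  status: number;
  headers: Record<string, string>;
  body: unknown;
}

type RequestFn = (url: string) => Promise<PageLike>;
type PageScraper<T> = (page: PageLike) => Promise<T>;

// Hypothetical wrapper: turns a page scraper into a URL scraper by
// requesting the page first and then scraping the response.
function makeUrlScraper<T>(request: RequestFn, scrape: PageScraper<T>) {
  return async (url: string): Promise<T> => scrape(await request(url));
}
```

Injecting `request` instead of hard-coding an HTTP client is what lets the same scraper run against `popsicle`, a test stub, or any other client that produces the page shape.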

License

Apache 2.0

Versions

0.6.0 (3 years ago)
0.5.2 (4 years ago)
0.5.1 (4 years ago)
0.5.0 (4 years ago)
0.4.0 (4 years ago)
0.3.0 (7 years ago)
0.2.3 (8 years ago)
0.2.2 (8 years ago)
0.2.1 (8 years ago)
0.2.0 (8 years ago)
0.1.0 (8 years ago)
0.0.2 (8 years ago)
0.0.1 (8 years ago)