2.19.9 • Published 2 months ago

@tuplo/fletcher v2.19.9

License: MIT
Repository: GitHub

@tuplo/fletcher

HTTP request library, focused on web scraping.

Install

$ npm install @tuplo/fletcher

# or with yarn
$ yarn add @tuplo/fletcher

Usage

Fetch an HTML page and parse it with cheerio.

import fetch from '@tuplo/fletcher';

const $page = await fetch.html('https://foo.com/page.html');
const heading = $page.find('body > h1');

Fetch a JSON file and parse it.

const { foo } = await fetch.json('https://foo.com/page.html');

Find a script on a page and evaluate it.

const { foo } = await fetch.script('https://foo.com/page.html', {
  scriptPath: 'script:nth-of-type(3)',
});

Find JSON-LD metadata on a page.

const [jsonld] = await fetch.jsonld('https://foo.com/page.html');

Work with the raw Response.

const res = await fetch.response('https://foo.com');
console.log(res.headers);
console.log(await res.text());

Work with Puppeteer for headless browser automation.

const client = fetch.create({
  browserWSEndpoint: 'ws://localhost:3000',
});
const $page = await client.browser.html('https://foo.com');
const { foo } = await client.browser.json('https://foo.com', /ajax-list/);

Options

Option             Description                                                                    Default
-----------------  -----------------------------------------------------------------------------  -----------
browserWSEndpoint  Puppeteer web socket address
cache              Caches requests in memory                                                      false
delay              Introduce a delay before the request (ms)                                      1_000
formData           Object with key/value pairs to send as form data
encoding           The encoding used by the source page, will be converted to UTF8
headers            A simple multi-map of names to values
jsonData           Object with key/value pairs to send as json data
log                Should log all request URLS to stderr                                          false
onAfterRequest     Callback to be called right after request is resolved
proxy              Proxy configuration (host, port, username, password)
retry              Retries failed responses                                                       async-retry
scriptFindFn       A function to find a script element on the page, execute and return it
scriptPath         A CSS selector to pick a script element on the page, execute and return it
scriptSandbox      An object to use as base on an execution of a piece of code found on the page
urlSearchParams    A key-value object listing what parameters to add to the query string of url
userAgent          Set a custom user agent
validateStatus     A function to decide if the response status is an error and should be thrown
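
As a rough illustration of how several of these options might be combined, here is a minimal sketch. The `SketchOptions` interface is a hypothetical local type covering a subset of the options; the field shapes (for example `headers` as a plain string map, and `validateStatus` returning `true` for acceptable statuses) are assumptions and are not copied from the library's typings.

```typescript
// Hypothetical local type: a subset of the options listed above.
// The field shapes are assumptions, not the library's own typings.
interface SketchOptions {
  cache?: boolean;
  delay?: number;
  headers?: Record<string, string>;
  urlSearchParams?: Record<string, string | number>;
  userAgent?: string;
  validateStatus?: (status: number) => boolean;
}

const sketchOptions: SketchOptions = {
  cache: true, // keep responses in memory
  delay: 2_000, // wait 2 seconds before each request
  headers: { 'accept-language': 'en-GB' },
  urlSearchParams: { page: 2, sort: 'asc' },
  userAgent: 'my-scraper/1.0',
  // Assumed semantics: return true when the status should be accepted.
  validateStatus: (status) => status >= 200 && status < 400,
};

// With the library installed, the object could then be passed along, e.g.:
//   const $page = await fetch.html('https://foo.com/page.html', sketchOptions);
```

Keeping the options in one object makes it straightforward to reuse them across calls or to hand them to `fetch.create`.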

API

fletcher(url: string, options?: FletcherOptions) => http.Response

Generic utility that returns an HTTP response.

fletcher.html(url: string, options?: FletcherOptions) => Cheerio<AnyNode>

Requests an HTTP resource, parses it using Cheerio, and returns the resulting Cheerio object.

const $page = await fletcher.html('https://foo.com/page.html');
const heading = $page.find('body > h1');

fletcher.script<T>(url: string, options?: FletcherOptions) => T

Requests an HTTP resource, finds a script on it, executes it, and returns its global context.

const { foo } = await fletcher.script('https://foo.com/page.html', {
  scriptPath: 'script:nth-of-type(3)',
});

fletcher.text(url: string, options?: FletcherOptions) => string

Requests an HTTP resource and returns its body as a string.

fletcher.cookies(url: string, options?: FletcherOptions) => CookieJar

Requests an HTTP resource and returns the cookies sent with the response.

fletcher.json<T>(url: string, options?: FletcherOptions) => T

Requests an HTTP resource and returns its body parsed as a JSON object.

fletcher.jsonld(url: string, options?: FletcherOptions) => unknown[]

Requests an HTTP resource and returns all the JSON-LD blocks found in the document.
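
To make "JSON-LD blocks" concrete, the sketch below collects the contents of every `<script type="application/ld+json">` element in an HTML string and parses each as JSON. This is a simplified, dependency-free illustration and not the library's implementation: `extractJsonLd` and `sampleHtml` are hypothetical names, and the regular expression is a stand-in for a real HTML parser.

```typescript
// Simplified illustration: find every <script type="application/ld+json">
// element and parse its contents as JSON. The regular expression keeps the
// sketch dependency-free; it is not a general-purpose HTML parser.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks whose contents are not valid JSON.
    }
  }
  return blocks;
}

// Hypothetical page with two JSON-LD blocks and one ordinary script.
const sampleHtml = `
<html><head>
<script type="application/ld+json">{"@type":"Article","headline":"Hello"}</script>
<script>window.foo = 1;</script>
<script type="application/ld+json">{"@type":"Person","name":"Ada"}</script>
</head></html>`;

const [firstBlock, secondBlock] = extractJsonLd(sampleHtml) as Array<Record<string, string>>;
console.log(firstBlock['@type'], secondBlock.name); // prints "Article Ada"
```

The return type is `unknown[]`, matching the signature above, so callers narrow or cast each block before reading its fields.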

fletcher.response(url: string, options?: FletcherOptions) => Response

Requests an HTTP resource and returns the full HTTP Response object.

fletcher.browser.html(url: string) => Cheerio<AnyNode>

Requests an HTTP resource using Puppeteer/Chrome, parses it using Cheerio, and returns the resulting Cheerio object.

fletcher.browser.json<T>(url: string, requestUrl: string | RegExp) => T

Requests an HTTP resource using Puppeteer/Chrome, intercepts a request made by that page, and returns it as a JSON object.
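
The `requestUrl` argument accepts either a string or a RegExp. The sketch below shows one plausible way such a pattern could be compared against the URLs a page requests. The matching rule is an assumption (substring match for strings, `test` for regular expressions), and `matchesRequest` and `seenRequests` are hypothetical names that are not part of the library.

```typescript
// Assumption (not confirmed by the source): a string pattern matches when the
// request URL contains it; a RegExp matches when it tests true against the URL.
function matchesRequest(requestUrl: string, pattern: string | RegExp): boolean {
  return typeof pattern === 'string'
    ? requestUrl.indexOf(pattern) !== -1
    : pattern.test(requestUrl);
}

// Hypothetical URLs a page might request while loading.
const seenRequests = [
  'https://foo.com/styles.css',
  'https://foo.com/api/ajax-list?page=1',
  'https://foo.com/logo.png',
];

// Pick the first request that matches the pattern.
const targetRequest = seenRequests.filter((url) => matchesRequest(url, /ajax-list/))[0];
console.log(targetRequest); // prints "https://foo.com/api/ajax-list?page=1"
```

A RegExp is handy when the intercepted URL carries changing query parameters; avoid the `g` flag in that case, since `test` on a global RegExp keeps state between calls.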

fletcher.create(options: FletcherOptions) => Object

Creates a new instance of fletcher with a custom config.

const instance = fletcher.create({ headers: { foo: 'bar' } });
await instance.json('http://foo.com');