1.0.0 • Published 2 years ago

node-proxy-fetch v1.0.0

node-proxy-fetch

Fetch web content behind a firewall.

Inspiration

Fetching content from other websites on the client side usually results in either a CORS error or a 403 Forbidden error. A typical workaround is to fetch it through a proxy server, but that is also often blocked by "Are you a human?" checks.

node-proxy-fetch uses Puppeteer to load the actual page, grabs the generated HTML, transforms it, and serves it.

Usage

In your proxy server code, assuming you're using Express:

// Packages:
import express from 'express'
import fetch from 'node-proxy-fetch'


// Constants:
const app = express()


// Functions:
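app.get('/', async (req, res) => {
  // Render the page with Puppeteer and return the generated HTML.
  const webpage = await fetch({
    targetURL: 'https://www.npmjs.com/package/solid-custom-scrollbars',
    type: 'DOCUMENT',
    puppeteerOptions: {
      // Origin in the form protocol://domain.tld, used to rewrite relative paths.
      baseURL: 'https://www.npmjs.com'
    }
  })
  res.send(webpage)
})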

app.get('/image', async (req, res) => {
  const image = (await fetch({
    targetURL: 'https://picsum.photos/1000',
    type: 'BLOB'
  })).data
  res.send(image)
})

app.listen(3000)

Usage with Heroku

If you're using this package with Heroku, be sure to add puppeteer-heroku-buildpack as your app's buildpack.

Usage with AWS

If you want to use this package with AWS, try out the sister package aws-proxy-fetch, or check out this guide.

API

targetURL

string

The target URL that you want to fetch.

type

FetchType = 'DOCUMENT' | 'BLOB'

The type of content you are fetching.

axiosOptions

AxiosOptions - OPTIONAL

Options for Axios, only used when type is BLOB.

config

AxiosRequestConfig<any> - OPTIONAL

headers

AxiosRequestHeaders - OPTIONAL
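
As a rough illustration of how these fields fit together, the sketch below builds an options object for a BLOB fetch. The responseType, timeout, and Accept values are standard Axios and HTTP settings chosen for the example. Whether the package forwards every field unchanged is an assumption, and imageRequest is a hypothetical name.

```javascript
// Options for fetching an image as a BLOB.
// `imageRequest` is a hypothetical name used only for this sketch.
const imageRequest = {
  targetURL: 'https://picsum.photos/1000',
  type: 'BLOB',
  axiosOptions: {
    // Standard Axios request config. Assumption: the package passes it through as-is.
    config: { responseType: 'arraybuffer', timeout: 10000 },
    // Request headers sent with the download.
    headers: { Accept: 'image/*' }
  }
}

// Inside an async Express handler this would be used as:
//   const image = (await fetch(imageRequest)).data
//   res.send(image)
```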

puppeteerOptions

PuppeteerOptions - OPTIONAL

baseURL

string

The base URL, in the form protocol://domain.tld. All relative paths in the fetched HTML are rewritten to use this base URL.
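
To make the rewriting concrete, here is a minimal sketch of how relative paths could be resolved against a base URL. It is not the package's actual implementation: absolutizePaths is a hypothetical helper, and it only handles href and src attributes with a simple regular expression rather than a full HTML parser.

```javascript
// Hypothetical helper, not part of node-proxy-fetch's API.
// Rewrites href/src attribute values so that relative paths point at baseURL.
function absolutizePaths(html, baseURL) {
  return html.replace(/\b(href|src)=(["'])(.*?)\2/g, (match, attr, quote, value) => {
    // Leave in-page anchors such as "#top" untouched.
    if (value.startsWith('#')) return match
    try {
      // The URL constructor resolves relative paths against the base
      // and returns absolute URLs as they are.
      const resolved = new URL(value, baseURL).href
      return `${attr}=${quote}${resolved}${quote}`
    } catch {
      // Keep values that cannot be parsed as URLs.
      return match
    }
  })
}

const sample = '<link href="/static/app.css"><img src="logo.png"><a href="#top">Top</a>'
const rewritten = absolutizePaths(sample, 'https://www.npmjs.com')
console.log(rewritten)
```

A regular expression keeps the sketch short. A real transform would more likely walk the DOM inside Puppeteer or use an HTML parser.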

waitFor

number - OPTIONAL

The number of milliseconds to wait before scraping the HTML. This gives the JavaScript on the page time to run. Defaults to 5000.

transformExternalLinks

boolean - OPTIONAL

Whether to rewrite relative paths using baseURL. Defaults to true.
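
The documented defaults for waitFor and transformExternalLinks can be pictured as a simple merge over the options you pass in. The helper below is a hypothetical sketch of that behavior, not the package's own code.

```javascript
// Defaults as documented above: wait 5000 ms, rewrite relative paths.
const PUPPETEER_DEFAULTS = { waitFor: 5000, transformExternalLinks: true }

// Hypothetical helper: caller-supplied values take precedence over the defaults.
function withDefaults(puppeteerOptions = {}) {
  return { ...PUPPETEER_DEFAULTS, ...puppeteerOptions }
}

// Only baseURL and waitFor are supplied, so transformExternalLinks falls back to true.
const resolvedOptions = withDefaults({ baseURL: 'https://www.npmjs.com', waitFor: 1000 })
console.log(resolvedOptions)
```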

launchOptions

Partial<PuppeteerOptions> - OPTIONAL

Launch options for Puppeteer.

launchArguments

string[] - OPTIONAL

Launch arguments for Puppeteer.
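
As an example of the two launch settings, the sketch below builds a DOCUMENT request that runs Chromium headless and without its sandbox, a common setup on container hosts. headless is a standard Puppeteer launch option and the two flags are real Chromium switches. How the package merges them into its own launch call is an assumption, and documentRequest is a hypothetical name.

```javascript
// Options for a DOCUMENT fetch with custom Puppeteer launch settings.
// `documentRequest` is a hypothetical name used only for this sketch.
const documentRequest = {
  targetURL: 'https://www.npmjs.com/package/solid-custom-scrollbars',
  type: 'DOCUMENT',
  puppeteerOptions: {
    baseURL: 'https://www.npmjs.com',
    // Standard Puppeteer launch option.
    launchOptions: { headless: true },
    // Chromium switches often required where sandboxing is unavailable.
    launchArguments: ['--no-sandbox', '--disable-setuid-sandbox']
  }
}

// Inside an async Express handler this would be used as:
//   const webpage = await fetch(documentRequest)
//   res.send(webpage)
```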

License

MIT