fetchfox v0.0.35

Getting started

Install the package and playwright:

npm i fetchfox
npx playwright install-deps
npx playwright install

Then use it. Here is the callback style:

import { fox } from 'fetchfox';

const results = await fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .run(null, (delta) => { console.log(delta.item) });
  
for (const result of results) {
  console.log('Item:', result.item);
}

If you prefer, you can use the streaming style:

import { fox } from 'fetchfox';

const stream = fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .stream();

for await (const delta of stream) {
  console.log(delta.item);
}
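
The two styles share the same pipeline steps, so you can mix and match them. As a sketch, assuming .limit() composes with .stream() the same way it does with .run(), you can cap a streamed scrape like this:

import { fox } from 'fetchfox';

// A sketch: stop after the first 3 items. This assumes .limit() behaves
// the same in a streamed pipeline as it does in the callback example above.
const stream = fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .stream();

for await (const delta of stream) {
  console.log(delta.item);
}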

Read on below for instructions on how to configure your API key and AI model.

Enter your API key

You'll need to provide an API key for the AI provider you're using, such as OpenAI. There are a few ways to do this.

The easiest option is to set the OPENAI_API_KEY environment variable. This will get picked up by the FetchFox library, and all AI calls will go through that key. To use this option, run your code like this:

OPENAI_API_KEY=sk-your-key node index.js

Alternatively, you can pass in your API key in code, like this:

import { fox } from 'fetchfox';

const results = await fox
  .config({ ai: { model: 'openai:gpt-4o-mini', apiKey: 'sk-your-key' }})
  .run(`https://news.ycombinator.com/news find links to comments, get basic data, export to out.jsonl`);

This will use OpenAI's gpt-4o-mini model and the API key you specify. You can also pass in models from other providers, like this:

const results = await fox
  .config({ ai: { model: 'anthropic:claude-3-5-sonnet-20240620', apiKey: 'your-anthropic-key' }})
  .run(`https://news.ycombinator.com/news find links to comments, get basic data, export to out.jsonl`);

Choose the AI model that best suits your needs.

Start prompting

The easiest approach is to use a single prompt, as in the example below.

import { fox } from 'fetchfox';

const results = await fox.run(
  `https://news.ycombinator.com/news find links to comments, get basic data, export to out.jsonl`);

For more control, you can specify the steps individually, as shown below.

import { fox } from 'fetchfox';

const results = await fox
  .init('https://news.ycombinator.com/news')
  .crawl('find links to the comment pages')
  .extract('get the following data: article name, top comment text, top commenter username')
  .schema({ articleName: '', commentText: '', username: '' })
  .export('out.jsonl');
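
The .jsonl extension indicates JSON Lines output: one JSON object per line of out.jsonl. As a minimal sketch, you can read the exported file back in plain Node like this:

import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Read out.jsonl back, parsing one JSON object per non-empty line.
const rl = createInterface({ input: createReadStream('out.jsonl') });

for await (const line of rl) {
  if (line.trim()) {
    console.log(JSON.parse(line));
  }
}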

You can chain steps to do more complicated scrapes. The example below does the following:

  1. Start on the GitHub page for the bitcoin project
  2. Find 10 commits
  3. Get data about them, including lines of code changed
  4. Filter for only the ones that changed at least 10 lines of code
  5. Get the authors of those commits, and find the repos those authors commit to

This scrape will take some time, so there is an option to output incremental results.

import { fox } from 'fetchfox';

const f = fox
  .config({ diskCache: '/tmp/fetchfox_cache' })
  .init('https://github.com/bitcoin/bitcoin/commits/master')
  .crawl('find urls of commits, limit: 10')
  .extract('get commit hash, author, and loc changed')
  .filter('commits that changed at least 10 lines')
  .crawl('get urls of the authors of those commits')
  .extract('get username and repos they commit to')
  .schema({ username: 'username', repos: ['array of repos'] });

const results = await f.run(null, ({ delta, index }) => {
  console.log(`Got incremental result on step ${index}:`, delta);
});

The library is modular, and you can use the components individually.

import { Crawler, SinglePromptExtractor } from 'fetchfox';

const ai = 'openai:gpt-4o-mini';
const crawler = new Crawler({ ai });
const extractor = new SinglePromptExtractor({ ai });

const url = 'https://news.ycombinator.com';
const questions = [
  'what is the article title?',
  'how many points does this submission have? only number',
  'how many comments does this submission have? only number',
  'when was this article submitted? convert to YYYY-MM-DD HH:mm{am/pm} format',
];

for await (const link of crawler.stream(url, 'comment links')) {
  console.log('Extract from:', link.url);
  for await (const item of extractor.stream(link.url, questions)) {
    console.log(item);
  }
}

Choosing the right AI model

FetchFox lets you swap in a variety of different AI providers and models. You can check the src/ai/... directory for the list of currently supported providers.

By default, FetchFox uses OpenAI's gpt-4o-mini model. We've found this model to provide a good tradeoff between cost, runtime, and accuracy. You can read more about benchmarking on our blog.
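
For example, to try a different model while keeping your API key in the environment, you can reuse the .config() call shown earlier. A minimal sketch, assuming the apiKey field can be omitted when OPENAI_API_KEY is set:

import { fox } from 'fetchfox';

// A sketch: select a different OpenAI model via .config(). The API key
// is assumed to be picked up from the OPENAI_API_KEY environment variable.
const results = await fox
  .config({ ai: { model: 'openai:gpt-4o' } })
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .run();

for (const result of results) {
  console.log('Item:', result.item);
}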

CLI

You can also use the command line tool. Install it globally:

npm install -g fetchfox

And then run the extract command:

fetchfox extract https://www.npmjs.com/package/@tinyhttp/cookie \
  'what is the package name?,what is the version number?,who is the main author?'

Or use npx instead:

npx fetchfox extract https://www.npmjs.com/package/@tinyhttp/cookie \
  'what is the package name?,what is the version number?,who is the main author?'
