1.0.9 • Published 4 years ago

puppeteer-scraping v1.0.9

License
MIT

puppeteer-scraping

Scrape anything with very few lines of code.

puppeteer-scraping is a framework to help you save time scraping any website with Puppeteer.

It uses puppeteer-extra-plugin-stealth under the hood.

Motivation

  • Scraping websites is often about following the same steps
  • We should code only what's unique to the scraped website: the scraping logic, the paths taken and the data extracted
  • Puppeteer is the way to go for JavaScript-rendered websites, but it could be easier to use

Installation

Using npm:

npm install puppeteer-scraping

Using yarn:

yarn add puppeteer-scraping

Usage

const scraping = require('puppeteer-scraping')
const puppeteer = require('puppeteer')

// HTTP handler (req, res) that runs the scrape and returns the results as JSON
module.exports = async (req, res) => {
  const { items } = await scraping({
    puppeteer,
    options: { headless: true },
    method: {
      startPage: 'https://example.com',
      goToPages: {
        // XPath selecting the href of each <a class="item"> link
        '//a[@class="item"]/@href': {
          items: {
            products: {
              // XPath of the element holding the product title
              productTitle: { path: '//h1' }
            }
          }
        }
      }
    }
  })

  res.json(items.products)
}
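
The nesting of the method object in the Usage example can be hard to read at a glance. The sketch below is not part of puppeteer-scraping: listFields is a hypothetical helper that walks an object of that shape and flattens it into one row per extracted field. It assumes, based only on the example, that each key of goToPages is an XPath for links to follow and that items maps a group name to named fields with a path. It never calls the library or launches a browser.

```javascript
// Hypothetical helper, not part of puppeteer-scraping. It only inspects a
// plain object shaped like the `method` option in the Usage example.
function listFields(method) {
  const fields = []
  // Assumption: each key of `goToPages` is an XPath selecting links to follow.
  for (const [linkPath, target] of Object.entries(method.goToPages || {})) {
    // Assumption: `items` maps a group name (e.g. "products") to its fields.
    for (const [group, groupFields] of Object.entries(target.items || {})) {
      // Each field is expected to carry a `path` (an XPath in the example).
      for (const [field, { path }] of Object.entries(groupFields)) {
        fields.push({ linkPath, group, field, path })
      }
    }
  }
  return fields
}

// Same structure as the `method` option in the Usage example.
const method = {
  startPage: 'https://example.com',
  goToPages: {
    '//a[@class="item"]/@href': {
      items: {
        products: {
          productTitle: { path: '//h1' }
        }
      }
    }
  }
}

console.log(listFields(method))
```

Read this way, the example declares a single field, productTitle, taken from //h1 on each page reached through the item links and grouped under products, which matches the items.products value returned by the handler.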