@opd/crawler

crawler

Web crawler based on Puppeteer

Install

npm install @opd/crawler

Use

import Crawler from '@opd/crawler'
// or commonjs
const Crawler = require('@opd/crawler').default

const crawler = new Crawler(options)

API

`new Crawler(options)`

Creates a crawler instance.

options: configuration for the crawler instance

parallel: maximum number of crawlers running at once; the default is 5
pageEvaluate: function evaluated on the current page (see Puppeteer); extra arguments are not supported yet
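As a rough illustration of the two options, the sketch below builds a hypothetical options object. The `parallel` value of 3 and the `h1` selector are assumptions chosen for the example, not values from this package. Because the function is evaluated in the page and cannot receive extra arguments, everything it needs is written inline.

```javascript
// Hypothetical options object for `new Crawler(options)`.
const options = {
  // Allow at most 3 crawlers at once (the documented default is 5).
  parallel: 3,
  // Evaluated in the page, so `document` here refers to the page's DOM.
  // No extra arguments are available, so the selector is hard-coded.
  pageEvaluate: () => ({
    title: document.title,
    headings: Array.from(document.querySelectorAll('h1'), (el) =>
      el.textContent.trim()
    ),
  }),
}
```

Whatever `pageEvaluate` returns is what would be expected to appear as the crawled result for that page, so returning a plain, serializable object is the safer choice.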

`crawler.launch([options])`

Launches the browser using puppeteer.launch; options, if given, are the launch options.

`crawler.queue(urls)`

Adds urls to the crawler queue.

Note: URLs are checked strictly, meaning each URL must start with http:// or https://.
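The rule in the note can be approximated before queueing. The helper below is a minimal sketch, not the library's own check: `isCrawlableUrl` and the sample list are hypothetical, and the library may be stricter or handle letter case differently.

```javascript
// Hypothetical helper mirroring the documented rule: a URL must start with
// "http://" or "https://". This is not the package's internal code.
const isCrawlableUrl = (url) =>
  typeof url === 'string' && /^https?:\/\//.test(url)

const candidates = [
  'https://example.com',
  'http://example.com/a',
  'example.com', // no scheme
  '//example.com', // protocol-relative
  'ftp://example.com', // wrong scheme
]

// Keep only the entries that satisfy the prefix rule.
const accepted = candidates.filter(isCrawlableUrl)
console.log(accepted)
// [ 'https://example.com', 'http://example.com/a' ]
```

Filtering a list up front like this avoids depending on how the queue treats entries that fail the check.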

`crawler.start([urls]): PageResult[]`

Starts crawling pages. If urls is provided, crawler.queue is called with it first.

const result = await crawler.start()
console.log(result)

// [
//   {
//     url, // page url
//     result // crawled result
//   }
// ]

Note: if you call start before launch, the browser is still launched, but without any extra launch options.
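Putting the calls together, a run might look like the sketch below. It makes several assumptions: `crawlTitles` is a hypothetical wrapper, `queue` is assumed to accept an array of URLs, `launch` is assumed to be safe to await, and `headless` is a standard Puppeteer launch option used only as an example. The `Crawler` class is passed in as a parameter so the sketch does not depend on how the package is loaded.

```javascript
// Hypothetical end-to-end helper: launch, queue, start, then format results.
async function crawlTitles(Crawler, urls) {
  const crawler = new Crawler({
    parallel: 2,
    // Runs in the page; takes no extra arguments.
    pageEvaluate: () => document.title,
  })

  // Launch explicitly so that the launch options take effect;
  // calling start() alone would launch without them.
  await crawler.launch({ headless: true })

  // Assumption: queue accepts an array of http(s) URLs.
  crawler.queue(urls)
  const pages = await crawler.start()

  // Each entry has the documented shape { url, result }.
  return pages.map(({ url, result }) => `${url}: ${result}`)
}
```

With the package installed, this could be called as `crawlTitles(require('@opd/crawler').default, ['https://example.com'])`.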

