1.7.0 • Published 6 months ago

@opd/crawler v1.7.0

License: MIT

crawler

Web crawler based on Puppeteer


Install

npm install @opd/crawler

Use

import Crawler from '@opd/crawler'
// or commonjs
const Crawler = require('@opd/crawler').default

const crawler = new Crawler(options)

API

new Crawler(options)

Creates a crawler instance.

options: configuration for the crawler instance

  • parallel: maximum number of crawlers running at the same time; defaults to 5
  • pageEvaluate: function evaluated on the current page (see the Puppeteer documentation); passing extra arguments to it is not supported yet
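
As a rough illustration of the two options above, the sketch below builds an options object. It assumes pageEvaluate runs in the page's browser context, as Puppeteer evaluation functions do, so document refers to the crawled page. The extracted fields (title, links) and the helper name createCrawler are hypothetical and not part of the package.

```javascript
// Hypothetical options object; only `parallel` and `pageEvaluate` come from the README.
const options = {
  // Run at most 2 crawlers at once instead of the default 5.
  parallel: 2,
  // Assumed to run on the crawled page, so `document` is the page's DOM.
  // It takes no parameters because extra args are not supported.
  pageEvaluate: () => ({
    title: document.title,
    links: Array.from(document.querySelectorAll('a'), (a) => a.href),
  }),
}

// Hypothetical helper: receives the Crawler class as a parameter so the
// snippet does not depend on how the package is loaded.
function createCrawler(Crawler) {
  return new Crawler(options)
}

// Usage (assuming the package is installed):
//   const crawler = createCrawler(require('@opd/crawler').default)
```

Keeping pageEvaluate free of outer variables matters under that assumption, since a function evaluated in the page cannot see values from the Node.js scope.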

crawler.launch([options])

Launches the browser using puppeteer.launch. The optional options argument holds the launch options.
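
A minimal sketch of an explicit launch follows. It assumes the options are forwarded unchanged to puppeteer.launch; headless and defaultViewport are standard Puppeteer launch options, while the helper name launchHeadless is hypothetical.

```javascript
// Hypothetical helper; `crawler` is an instance created with new Crawler(options).
async function launchHeadless(crawler) {
  // Assumption: these options reach puppeteer.launch as-is.
  await crawler.launch({
    headless: true,
    defaultViewport: { width: 1280, height: 720 },
  })
  return crawler
}

// Usage (assuming the package is installed):
//   const Crawler = require('@opd/crawler').default
//   const crawler = await launchHeadless(new Crawler({ parallel: 2 }))
```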

crawler.queue(urls)

Adds URLs to the crawler queue.

Note: URLs are checked strictly; each URL must start with http or https (the pattern https?).
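
Because of the strict check, it can help to screen URLs before queueing them. The sketch below is not the package's own validation: the regular expression is an assumption derived from the "must start with https?" wording, and partitionUrls is a hypothetical helper.

```javascript
// Assumed pattern: the README only says a URL must start with "https?".
const HTTP_URL = /^https?:\/\//

// Hypothetical helper: split candidate URLs into accepted and rejected ones
// before handing the accepted ones to crawler.queue.
function partitionUrls(urls) {
  const accepted = []
  const rejected = []
  for (const url of urls) {
    if (HTTP_URL.test(url)) {
      accepted.push(url)
    } else {
      rejected.push(url)
    }
  }
  return { accepted, rejected }
}

const { accepted, rejected } = partitionUrls([
  'https://example.com',
  'http://example.com/docs',
  'example.com',
  'ftp://example.com/file',
])
console.log(accepted) // [ 'https://example.com', 'http://example.com/docs' ]
console.log(rejected) // [ 'example.com', 'ftp://example.com/file' ]
```

Only the accepted list would then be passed to crawler.queue.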

crawler.start([urls]): PageResult[]

Starts crawling the queued pages. If urls is provided, crawler.queue(urls) is called first.

const result = await crawler.start()
console.log(result)

// [
//   {
//     url, // page url
//     result // crawled result
//   }
// ]

Note: if start is called before launch, the browser is still launched automatically, but without any extra launch options.
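
The launch-then-start flow described in this section can be sketched as follows. The helper crawlToMap and its launchOptions parameter are hypothetical; the sketch relies only on the documented launch and start methods and on the url and result fields of each PageResult.

```javascript
// Hypothetical helper; `crawler` is an instance created with new Crawler(options).
async function crawlToMap(crawler, urls, launchOptions) {
  // Launch explicitly only when launch options are given; otherwise
  // start launches the browser itself, without extra options.
  if (launchOptions) {
    await crawler.launch(launchOptions)
  }
  // Passing urls to start queues them before crawling begins.
  const pages = await crawler.start(urls)
  // Index the crawled results by page URL for easy lookup.
  return new Map(pages.map(({ url, result }) => [url, result]))
}

// Usage (assuming the package is installed):
//   const Crawler = require('@opd/crawler').default
//   const crawler = new Crawler({ pageEvaluate: () => document.title })
//   const titles = await crawlToMap(crawler, ['https://example.com'], { headless: true })
//   console.log(titles.get('https://example.com'))
```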
