1.7.0 • Published 6 months ago
@opd/crawler v1.7.0
crawler
Web crawler based on Puppeteer.
Install
npm install @opd/crawler
Use
import Crawler from '@opd/crawler'
// or commonjs
const Crawler = require('@opd/crawler').default
const crawler = new Crawler(options)
API
new Crawler(options)
Creates a crawler instance.
options: crawler instance config
  parallel: maximum number of crawlers running in parallel, default is 5
  pageEvaluate: function evaluated on the current page (see Puppeteer); extra arguments are not supported yet
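As a rough illustration of the two options above, the sketch below builds an options object. The shape of the returned value (title and links) is only an example, and it is an assumption that whatever pageEvaluate returns ends up as the crawled result for that page.

```javascript
// Illustrative pageEvaluate: it runs inside the page, so it may only rely on
// browser globals such as `document`, and it takes no extra arguments.
function pageEvaluate() {
  return {
    title: document.title,
    links: Array.from(document.querySelectorAll('a'), (a) => a.href),
  }
}

// Option names come from the README; the values here are illustrative only.
const options = {
  parallel: 3, // allow at most three crawlers at once (the default is 5)
  pageEvaluate,
}

// With the package installed, the options would be passed to the constructor:
// const Crawler = require('@opd/crawler').default
// const crawler = new Crawler(options)
```

Because extra arguments are not supported, anything the function needs has to be read from the page itself rather than passed in from Node.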
crawler.launch([options])
Launches the browser using puppeteer.launch.
crawler.queue(urls)
Adds URLs to the crawler queue.
Note: URLs are checked strictly, meaning each URL must start with https? (that is, http or https).
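To avoid surprises from the strict check, URLs can be filtered before they are queued. The helper below is hypothetical and not part of @opd/crawler; the exact pattern the library applies is not shown in this README, so the regular expression here is an assumption based on the note above.

```javascript
// Hypothetical pre-filter, not the library's own check: keeps only strings
// that start with http:// or https://.
function isCrawlableUrl(url) {
  return typeof url === 'string' && /^https?:\/\//.test(url)
}

const candidates = [
  'https://example.com',
  'http://example.com/docs',
  'ftp://example.com/file',
  '/relative/path',
]

// Only the first two entries survive the filter.
const accepted = candidates.filter(isCrawlableUrl)

// With a crawler instance available: crawler.queue(accepted)
```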
crawler.start([urls]): PageResult[]
Starts crawling pages. If urls is provided, crawler.queue is called first.
const result = await crawler.start()
console.log(result)
// [
// {
// url, // page url
// result // crawled result
// }
// ]
Note: if you call start before launch, the browser will still be launched, but with no extra launch options.
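Putting the calls together, a minimal sketch of the full flow might look like the following. The function name crawlAll is hypothetical, and the crawler class is passed in as a parameter so the snippet stays self-contained; in real use it would be the default export of @opd/crawler. Launching explicitly before start is what allows launch options to take effect.

```javascript
// Hypothetical wrapper around the documented calls: new Crawler(options),
// crawler.launch([options]), crawler.queue(urls), crawler.start().
async function crawlAll(CrawlerClass, urls, launchOptions) {
  const crawler = new CrawlerClass({
    parallel: 2,
    pageEvaluate: () => document.title, // evaluated in the page context
  })

  // Launch first so that launchOptions are forwarded; calling start() alone
  // would launch the browser without extra launch options.
  await crawler.launch(launchOptions)

  crawler.queue(urls)

  // Resolves to an array of { url, result } entries.
  return crawler.start()
}

// Usage, assuming the package is installed:
// const Crawler = require('@opd/crawler').default
// crawlAll(Crawler, ['https://example.com'], { headless: true })
//   .then((pages) => console.log(pages))
```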
Versions
1.7.0  6 months ago
1.6.2  2 years ago
1.6.1  2 years ago
1.6.0  3 years ago
1.5.1  3 years ago
1.5.0  3 years ago
1.4.0  3 years ago
1.3.2  4 years ago
1.3.1  4 years ago
1.3.0  4 years ago
1.2.2  4 years ago
1.2.1  4 years ago
1.2.0  4 years ago
1.1.4  4 years ago
1.1.3  4 years ago
1.1.1  4 years ago
1.1.2  4 years ago
1.1.0  5 years ago
1.0.0  5 years ago
0.0.6  5 years ago
0.0.5  5 years ago
0.0.4  5 years ago
0.0.2  5 years ago
0.0.1  5 years ago