moocher v1.0.0
Moocher
Web content scraper
Installation
$ npm install --save moocher
# or
$ yarn add moocher
Usage
new Moocher(urls, options);
urls {String|Array}: a single URL string or an array of URLs to scrape content from.
options {Object} (optional): the configuration object, which supports:
  limit {Number} (optional): the number of concurrent requests to make while scraping. Defaults to undefined, which does not enforce a concurrency limit (all requests run in parallel).
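The README only says that `limit` caps the number of concurrent requests and that leaving it `undefined` runs every request in parallel. As a rough illustration of what such a cap means (not Moocher's actual implementation, which the README does not show), the hypothetical `runWithLimit(tasks, limit)` helper below keeps at most `limit` async tasks in flight by starting that many workers. Each worker pulls the next unstarted task as soon as it finishes one.

```javascript
// Hypothetical helper, not part of moocher: runs async task functions with at
// most `limit` in flight. `limit === undefined` means no cap (all at once),
// mirroring the documented default of Moocher's `limit` option.
async function runWithLimit(tasks, limit) {
  const max = limit === undefined ? tasks.length : Math.max(1, limit);
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker claims the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(max, tasks.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // results stay in the same order as `tasks`
}

// Usage: five 10 ms tasks with a limit of 2; `peak` records the maximum
// number of tasks that were running at the same time.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let active = 0;
let peak = 0;
const tasks = [1, 2, 3, 4, 5].map((n) => async () => {
  active++;
  peak = Math.max(peak, active);
  await sleep(10);
  active--;
  return n * 10;
});
runWithLimit(tasks, 2).then((results) => {
  console.log(results, peak); // results: [10, 20, 30, 40, 50], peak: 2
});
```

In this sketch a rejected task rejects the whole run; a scraper that reports failures per URL, as Moocher does with its "error" event, would catch errors inside each task instead.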
API
Moocher emits the following events:
"mooch": emitted for each response. The callback receives the following arguments:
  $: the cheerio-loaded document, so you can use jQuery-style methods (cheerio implements a subset of jQuery's API) on the response document.
  url: the original URL passed to Moocher.
  response: the full response object.
"error": emitted when a single request fails.
"complete": emitted when Moocher has finished mooching all URLs.
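To make the event contract concrete, here is a hedged sketch of the order in which the three events could fire. It assumes Moocher behaves like a Node.js `EventEmitter`; the chained `.on(...)` calls in the example suggest this, but the README does not say so. `EventFlowSketch` and its injected `fetchPage(url)` function are hypothetical stand-ins: the sketch passes the raw page body to "mooch" where Moocher passes a cheerio-loaded `$`, and it ignores the `limit` option.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch of the documented event contract; not Moocher's source.
// `fetchPage(url)` is an assumed function returning a Promise of the page body.
class EventFlowSketch extends EventEmitter {
  constructor(urls, fetchPage) {
    super();
    // Accept a single URL string or an array of URLs, like Moocher's constructor.
    this.urls = Array.isArray(urls) ? urls : [urls];
    this.fetchPage = fetchPage;
  }

  async start() {
    await Promise.all(
      this.urls.map(async (url) => {
        try {
          const body = await this.fetchPage(url);
          // Moocher passes a cheerio `$` and the response object here instead.
          this.emit('mooch', body, url);
        } catch (err) {
          // A failed fetch (or a throwing "mooch" listener) is reported per URL.
          this.emit('error', err);
        }
      })
    );
    // Assumption: "complete" fires once every URL has succeeded or failed.
    this.emit('complete');
  }
}
```

If Moocher really is an `EventEmitter`, note that Node.js throws when an "error" event is emitted with no listener attached, so registering an "error" handler, as the example below does, matters. Whether Moocher itself still emits "complete" after a failed request is not stated in the README.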
Example
const Moocher = require('moocher'); // assumes the package's main export is the Moocher class
const titles = []; // collects the <h1> text of each mooched page
const mooch = new Moocher([
  'https://url-1.com',
  'http://url-2.com',
  'http://url-3.com',
  'https://url-4.com',
  'http://url-5.com'
], {
  limit: 2 // allow only 2 concurrent requests
});
mooch
  // emitted for each web page mooched
  .on('mooch', ($, url) => {
    const $h1 = $('h1');
    titles.push($h1.text());
  })
  // emitted if any request fails
  .on('error', (err) => console.error(err))
  // emitted when all urls have been mooched
  .on('complete', () => {
    console.log(`All titles have been mooched: ${titles.join(', ')}`);
  })
  // start mooching!
  .start();
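If you prefer `async`/`await` over event callbacks, the events can be wrapped in a Promise. The hypothetical `moochToPromise(mooch, onPage)` helper below is not part of moocher. It relies only on the chained `.on(...)` and `.start()` calls shown in the example: it collects whatever `onPage` returns for each "mooch" event, records "error" events, and resolves on "complete".

```javascript
// Hypothetical helper, not part of moocher. It uses only the chained
// .on(...) / .start() calls shown in the README example.
function moochToPromise(mooch, onPage) {
  return new Promise((resolve) => {
    const results = []; // values returned by onPage, one per "mooch" event
    const errors = []; // errors reported by "error" events
    mooch
      .on('mooch', (...args) => results.push(onPage(...args)))
      .on('error', (err) => errors.push(err))
      // Assumes "complete" fires even when some requests failed.
      .on('complete', () => resolve({ results, errors }))
      .start();
  });
}

// Usage with the real package (not run here):
// const mooch = new Moocher(['https://url-1.com', 'http://url-2.com'], { limit: 2 });
// const { results, errors } = await moochToPromise(mooch, ($, url) => $('h1').text());
```

The README does not say whether "complete" still fires after a failed request; if it does not, this promise would never settle, so check that behavior before relying on it.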