moocher v1.0.0
Moocher
Web content scraper
Installation
$ npm install --save moocher # or yarn add moocher
Usage
new Moocher(urls, options);
- urls {String|Array} a single URL string or an array of URLs to scrape content from.
- options {Object} (optional) the configuration object.
  - limit {Number} (optional) the number of concurrent requests to make while scraping. Defaults to undefined, which does not enforce a concurrency limit (all requests run in parallel).
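To make the effect of limit concrete, here is a minimal sketch of a concurrency cap in plain JavaScript. It is not Moocher's source: runWithLimit and tasks are hypothetical names, and each task stands in for one request. It only illustrates the documented behaviour, where a numeric limit caps how many requests are in flight and an undefined limit starts everything at once.

```javascript
// Illustrative only: NOT Moocher's implementation. `runWithLimit` and `tasks`
// are hypothetical names. Each task is a function that returns a Promise.
async function runWithLimit(tasks, limit) {
  // An undefined limit means "no cap": every task starts immediately.
  const cap = limit === undefined ? tasks.length : limit;
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps pulling the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Start at most `cap` workers, so at most `cap` tasks are in flight.
  const workers = [];
  for (let i = 0; i < Math.min(cap, tasks.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // results keep the order of the input tasks
}
```

With five tasks and a limit of 2, two tasks start right away and each of the remaining three starts as soon as an earlier one finishes.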
 
API
Moocher emits the following events:
- "mooch": Emitted for each response. The callback receives the following arguments:
  - $: The cheerio-loaded document, so you can use jQuery-style methods on the response document.
  - url: The original url passed to Moocher.
  - response: The full response object.
- "error": Emitted when a single request fails.
- "complete": Emitted when the moocher is done mooching.
Example
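// assumes the package's main export is the Moocher constructor
const Moocher = require('moocher');

// collects the <h1> text of each mooched page
const titles = [];

const mooch = new Moocher([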
  'https://url-1.com',
  'http://url-2.com',
  'http://url-3.com',
  'https://url-4.com',
  'http://url-5.com'
], {
  limit: 2 // allow only 2 concurrent requests
});
mooch
  // emitted for each web page mooched
  .on('mooch', ($, url) => {
    const $h1 = $('h1');
    titles.push($h1.text());
  })
  // emitted if any request fails
  .on('error', (err) => console.error(err))
  // emitted when all urls have been mooched
  .on('complete', () => {
    console.log(`All titles have been mooched: ${titles.join(', ')}`);
  })
  // start mooching!
  .start();
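
If you would rather await the whole run than collect results in an outer variable, the three events can be wrapped in a Promise. The sketch below is one possible approach, not part of moocher: moochAll and extract are hypothetical names, and it uses only the .on() and .start() calls and the event names shown in the example above. It also assumes "complete" fires once, after every url has either succeeded or failed.

```javascript
// Hypothetical helper, not part of moocher. It relies only on `.on()`,
// `.start()` and the "mooch" / "error" / "complete" events from this README.
function moochAll(moocher, extract) {
  return new Promise((resolve) => {
    const results = [];
    const errors = [];

    // `extract` receives the same ($, url, response) arguments as a "mooch" handler.
    moocher.on('mooch', ($, url, response) => {
      results.push({ url, value: extract($, url, response) });
    });

    // Record failures instead of rejecting, so one bad url does not hide the rest.
    moocher.on('error', (err) => {
      errors.push(err);
    });

    // Assumption: "complete" is emitted once, after all urls are done.
    moocher.on('complete', () => resolve({ results, errors }));

    moocher.start();
  });
}
```

A call might look like moochAll(new Moocher(urls, { limit: 2 }), ($) => $('h1').text()), which resolves to an object holding one { url, value } entry per successful page plus the list of errors.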