1.0.0 • Published 6 years ago

moocher v1.0.0

License
MIT

Moocher

Web content scraper

Installation

$ npm install --save moocher # or yarn add moocher

Usage

new Moocher(urls, options);
  • urls {String|Array} a single URL string or an array of URLs to scrape content from.
  • options {Object} (optional) the configuration object.
    • limit {Number} (optional) the maximum number of concurrent requests to make while scraping. Defaults to undefined, which enforces no concurrency limit (all requests run in parallel).
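
To illustrate what a concurrency limit like `limit` means in practice, here is a minimal sketch in plain JavaScript. It is not Moocher's implementation: `runWithLimit` and its worker-pool approach are hypothetical, shown only to make the "at most N requests in flight" behaviour concrete, including the "no limit means everything runs in parallel" default.

```javascript
// Hypothetical helper, NOT part of Moocher: runs async task functions with at
// most `limit` of them in flight at once. When `limit` is undefined, every
// task is started immediately (fully parallel).
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task nobody has claimed yet

  // Each worker keeps claiming the next unclaimed task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // No limit means one worker per task, so all tasks start right away.
  const workerCount = Math.min(
    limit === undefined ? tasks.length : limit,
    tasks.length
  );
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `tasks`, regardless of completion order
}
```

With `limit: 2`, the third task only starts once one of the first two has finished, which is the trade-off the option exposes: lower load on the target servers in exchange for a longer total run.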

API

Moocher emits the following events:

  • "mooch": Emitted once for each response. The callback receives the following arguments:
    • $: The cheerio-loaded document, so you can use jQuery-style methods on the response document.
    • url: The original url passed to Moocher.
    • response: The full response object.
  • "error": Emitted when a single request fails.
  • "complete": Emitted when the moocher is done mooching.
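
The events above can also be wrapped in a Promise when a single result is more convenient than three callbacks. The sketch below is an illustration, not part of Moocher: `moochAll` is a hypothetical helper, and `FakeMoocher` is a stand-in emitter that only mimics the documented events so the helper can run without network access. It assumes that "complete" still fires after individual "error" events, which the description above suggests but does not state outright.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper, NOT part of Moocher: gathers one extracted value per
// "mooch" event and resolves once "complete" fires.
function moochAll(moocher, extract) {
  return new Promise((resolve) => {
    const results = [];
    const errors = [];
    moocher
      .on('mooch', ($, url, response) => results.push(extract($, url, response)))
      .on('error', (err) => errors.push(err))
      .on('complete', () => resolve({ results, errors }))
      .start();
  });
}

// Stand-in for a Moocher instance, used only to exercise the helper.
// It maps each url to a fake <h1> title, or to null to simulate a failure.
class FakeMoocher extends EventEmitter {
  constructor(pages) {
    super();
    this.pages = pages;
  }

  start() {
    for (const [url, title] of Object.entries(this.pages)) {
      if (title === null) {
        this.emit('error', new Error(`request failed: ${url}`));
      } else {
        // Minimal fake of a cheerio document: $('h1').text() returns the title.
        const $ = () => ({ text: () => title });
        this.emit('mooch', $, url, {});
      }
    }
    this.emit('complete');
    return this;
  }
}
```

With a real instance, the call would look like `moochAll(new Moocher(urls, { limit: 2 }), ($) => $('h1').text())`, resolving to an object holding the collected titles and any errors.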

Example

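// assumes the package's main export is the Moocher constructor
const Moocher = require('moocher');

// collects the <h1> text of every page mooched
const titles = [];

const mooch = new Moocher([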
  'https://url-1.com',
  'http://url-2.com',
  'http://url-3.com',
  'https://url-4.com',
  'http://url-5.com'
], {
  limit: 2 // allow only 2 concurrent requests
});

mooch
  // emitted for each web page mooched
  .on('mooch', ($, url) => {
    const $h1 = $('h1');
    titles.push($h1.text());
  })
  // emitted if any request fails
  .on('error', (err) => console.error(err))
  // emitted when all urls have been mooched
  .on('complete', () => {
    console.log(`All titles have been mooched: ${titles.join(', ')}`);
  })
  // start mooching!
  .start();