HTML link crawler

Uses the simplecrawler module to return a buffer for each HTML link found on a domain.

Usage

Get all the HTML served from localhost:3000, stripping the 'fromSearch' query parameter so that URLs differing only in that parameter are not crawled as duplicates.

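const htmlLinkCrawler = require('html-link-crawler');

// Crawl the local site, ignoring the 'fromSearch' query parameter.
htmlLinkCrawler({url: 'http://localhost:3000/', ignoreQsParams: ['fromSearch']})
  .on('htmlFetchComplete', html => {
    // Keys are resource paths; log every path that was fetched.
    console.info(Object.keys(html));
  })
  .start();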
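
To illustrate why ignoring a query parameter removes duplicates, here is a minimal sketch using Node's built-in `URL` class. The `stripParams` helper and the sample URLs are hypothetical and are not part of html-link-crawler; the module's own handling of `ignoreQsParams` may differ.

```javascript
// Hypothetical helper (not part of html-link-crawler): drop the named
// query parameters from a URL and return the normalised string.
function stripParams(rawUrl, ignoredParams) {
  const url = new URL(rawUrl);
  for (const name of ignoredParams) {
    url.searchParams.delete(name);
  }
  return url.toString();
}

// Assumed sample URLs that differ only in the 'fromSearch' parameter.
const seen = new Set([
  'http://localhost:3000/products?fromSearch=true',
  'http://localhost:3000/products?fromSearch=false',
  'http://localhost:3000/products',
].map(u => stripParams(u, ['fromSearch'])));

// All three collapse to a single normalised URL.
console.info([...seen]);
```

Other query parameters are left untouched, so pages that genuinely differ (for example by a `page` parameter) would still count as distinct.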

Events

The returned object behaves exactly like a simplecrawler instance, with one additional event:

htmlFetchComplete - fired when the crawl completes. The handler receives an object whose keys are resource paths and whose values are the HTML fetched for each path.
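
As a rough sketch of what a handler might do with that object, the following summarises each page by path, title, and size. The `summarizePages` helper and the sample data are hypothetical; the sketch assumes each value is a string or a buffer that converts cleanly with `String()`, and it uses a simple regular expression rather than a real HTML parser.

```javascript
// Hypothetical helper: summarise an object shaped like the
// htmlFetchComplete payload (path -> HTML).
function summarizePages(htmlByPath) {
  return Object.entries(htmlByPath).map(([path, html]) => {
    const text = String(html); // accepts a string or a Buffer
    const match = text.match(/<title[^>]*>([^<]*)<\/title>/i);
    return {
      path,
      title: match ? match[1].trim() : null, // null when no <title> is found
      bytes: Buffer.byteLength(text),
    };
  });
}

// Assumed sample payload for illustration only.
const sample = {
  '/': '<html><head><title>Home</title></head></html>',
  '/about': '<p>no title</p>',
};

console.info(summarizePages(sample));
```

In a real crawl, the same function could be called from inside the `htmlFetchComplete` handler with the object it receives.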
