Mercator-crawler NPM

Mercator Crawler

Provides a URL-frontier, MetaData fetcher, URL Deduper, and is very plug and play friendly 😄.

To run the default Mercator Crawler with no options (this will fetch metadata and provide a readability like function that grabs the main content/article body):

import { Mercator } from "mercator-crawler";

(async () => {
	const mercator = new Mercator();

	// do not await this seedURL. You can only await it after you have called runToCompletion or iterated through all the data sent back.
	mercator.seedURL("https://www.wsj.com/articles/magnus-carlsen-ian-nepomniachtchi-world-chess-championship-computer-analysis-11639003641").then(x => {
		console.log(x);
	});

	await mercator.runToCompletion();
})();

Example 2:

import { Mercator } from "mercator-crawler";

(async () => {
	const mercator = new Mercator();

	// The sendURL can be awaited as it automatically runs to completion.
	const {articleBody, metadata} = await mercator.sendURL("https://www.wsj.com/articles/magnus-carlsen-ian-nepomniachtchi-world-chess-championship-computer-analysis-11639003641");
	
	console.log(articleBody);
	console.log(metadata);
})();