0.3.10 • Published 11 months ago

simple-node-site-crawler v0.3.10

License
ISC
Repository
github

Node-Site-Crawler

A simple Node.js module that crawls a domain and generates a list of its pages. This is very much an experimental work in progress.

Page Anatomy

{
	target: string;
	domain: string;
	source?: string;
	responseCode?: number;
	body?: string;
	links(): Array<string>;
	internalLinks(): Array<string>;
	externalLinks(): Array<string>;
}
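
The anatomy above separates internal from external links but does not say how the split is made. The following is a minimal sketch of one plausible rule, comparing each link's hostname against the crawled domain. The `classifyLinks` helper and the choice to count subdomains as internal are assumptions for illustration, not part of this package's API.

```typescript
// Hypothetical helper, not part of simple-node-site-crawler: splits absolute
// URLs into internal and external lists by comparing hostnames.
function classifyLinks(
	domain: string,
	links: string[],
): { internal: string[]; external: string[] } {
	const internal: string[] = [];
	const external: string[] = [];
	for (const link of links) {
		let host: string;
		try {
			host = new URL(link).hostname;
		} catch {
			continue; // Skip anything that is not a parseable absolute URL.
		}
		// Assumption: the bare domain and any of its subdomains count as internal.
		if (host === domain || host.endsWith(`.${domain}`)) {
			internal.push(link);
		} else {
			external.push(link);
		}
	}
	return { internal, external };
}

const sample = classifyLinks("jesseconner.ca", [
	"https://jesseconner.ca/about",
	"https://www.jesseconner.ca/blog",
	"https://example.com/",
	"not a url",
]);
console.log(sample.internal, sample.external);
```

The module itself may resolve relative links or treat subdomains differently, so check its behaviour before relying on this rule.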

Usage examples:

Crawling sites:

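import { Crawler } from "simple-node-site-crawler";

async function run() {
	const crawler = new Crawler(`jesseconner.ca`);

	await crawler.crawlSite();
}

// Report a failed crawl instead of leaving an unhandled promise rejection.
run().catch((error) => console.error(error));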

Checking status:

crawler.events.on("update", (status) => {
	if (status.isDone) {
		console.log("Done!");
		return;
	}
	console.log(
		`Crawling ${status.currentPage} (Pages crawled: ${status.pagesCrawled})`,
	);
});
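
The status object is not documented beyond the three fields used in the listener above. As a sketch, the formatting can be pulled into a small typed function so it is testable without running a crawl. The `CrawlStatus` interface and `describeStatus` function are hypothetical names, and the field types are inferred from the listener rather than taken from the package.

```typescript
// Assumed shape, inferred only from the fields the listener above reads.
interface CrawlStatus {
	isDone: boolean;
	currentPage?: string;
	pagesCrawled: number;
}

// Hypothetical formatter: turns one status update into a log line.
function describeStatus(status: CrawlStatus): string {
	if (status.isDone) {
		return `Done! (Pages crawled: ${status.pagesCrawled})`;
	}
	const page = status.currentPage ?? "(unknown)";
	return `Crawling ${page} (Pages crawled: ${status.pagesCrawled})`;
}

// Made-up updates standing in for real "update" events.
const updates: CrawlStatus[] = [
	{ isDone: false, currentPage: "https://jesseconner.ca/", pagesCrawled: 0 },
	{ isDone: false, currentPage: "https://jesseconner.ca/about", pagesCrawled: 1 },
	{ isDone: true, pagesCrawled: 2 },
];
const messages = updates.map((update) => describeStatus(update));
messages.forEach((message) => console.log(message));
```

With a real crawler this would presumably be wired up as `crawler.events.on("update", (status) => console.log(describeStatus(status)))`.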

Working with results:

import { Crawler } from "simple-node-site-crawler";
const crawler = new Crawler(`jesseconner.ca`);
const site = crawler.loadResults();

// Assumption: `source` is the page a link was found on and `target` is the linked URL.

// Find any pages not linked from the homepage.
const buriedPages = site.filter(
	(page) => page.source !== `https://jesseconner.ca/`,
);
buriedPages.forEach((page) =>
	console.log(`${page.target} (linked from ${page.source})`),
);

// Find any pages that are bad links (4xx or 5xx responses).
const missingPages = site.filter((page) => (page.responseCode ?? 0) > 399);
missingPages.forEach((page) =>
	console.log(`${page.target} (linked from ${page.source})`),
);
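
Building on the bad-link filter above, a common next step is to group broken targets by the page that links to them, so each page can be fixed in one pass. This is a sketch over made-up data. `PageRecord` is a trimmed-down copy of the Page shape, `brokenLinksBySource` is a hypothetical helper, and it assumes `source` holds the referring page.

```typescript
// Subset of the Page shape from "Page Anatomy"; only the fields used here.
interface PageRecord {
	target: string;
	source?: string;
	responseCode?: number;
}

// Hypothetical helper: maps each referring page to the broken targets it links to.
function brokenLinksBySource(pages: PageRecord[]): Record<string, string[]> {
	const grouped: Record<string, string[]> = {};
	for (const page of pages) {
		// Pages with no recorded response code are skipped, not counted as broken.
		if (page.responseCode === undefined || page.responseCode < 400) continue;
		const key = page.source ?? "(no source)";
		const targets = grouped[key] ?? [];
		targets.push(page.target);
		grouped[key] = targets;
	}
	return grouped;
}

// Made-up sample data for illustration only.
const samplePages: PageRecord[] = [
	{ target: "https://jesseconner.ca/", responseCode: 200 },
	{ target: "https://jesseconner.ca/old", source: "https://jesseconner.ca/", responseCode: 404 },
	{ target: "https://jesseconner.ca/gone", source: "https://jesseconner.ca/", responseCode: 410 },
	{ target: "https://jesseconner.ca/err", source: "https://jesseconner.ca/about", responseCode: 500 },
	{ target: "https://jesseconner.ca/pending", source: "https://jesseconner.ca/about" },
];

const report = brokenLinksBySource(samplePages);
for (const source of Object.keys(report)) {
	console.log(`${source} links to ${report[source].length} broken page(s)`);
}
```

In real use, the array returned by `crawler.loadResults()` would take the place of `samplePages`, provided its entries match the Page shape.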