0.3.8 • Published 1 year ago

simple-node-site-crawler v0.3.8

License: ISC
Repository: GitHub

Node-Site-Crawler

A simple Node.js module that crawls a domain and generates a list of its pages. It is experimental and very much a work in progress.

Page Anatomy

{
	target: string;
	domain: string;
	source?: string;
	responseCode?: number;
	body?: string;
	links(): Array<string>;
	internalLinks(): Array<string>;
	externalLinks(): Array<string>;
}
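
The split between `internalLinks()` and `externalLinks()` presumably comes down to comparing each link's hostname with `domain`. The package's own implementation is not shown here, so the following is only a minimal sketch of that idea. `splitLinks` is a hypothetical helper, and treating subdomains as internal is an assumption.

```javascript
// Hypothetical helper, not part of simple-node-site-crawler.
// Splits absolute URLs into internal and external lists for a given domain.
function splitLinks(links, domain) {
	const internal = [];
	const external = [];

	for (const link of links) {
		let url;
		try {
			url = new URL(link); // WHATWG URL, available globally in Node.js.
		} catch {
			continue; // Skip anything that is not an absolute URL.
		}

		// Ignore non-web schemes such as mailto: or tel:.
		if (url.protocol !== 'http:' && url.protocol !== 'https:') {
			continue;
		}

		// Assumption: subdomains (e.g. www.) count as internal.
		const isInternal =
			url.hostname === domain || url.hostname.endsWith(`.${domain}`);

		(isInternal ? internal : external).push(link);
	}

	return { internal, external };
}

// Made-up links for the example.
const split = splitLinks(
	[
		'https://jesseconner.ca/about',
		'https://www.jesseconner.ca/',
		'https://example.com/',
		'mailto:someone@example.com',
	],
	'jesseconner.ca'
);
console.log(split.internal);
console.log(split.external);
```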

Usage examples:

Crawling sites:

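	import { Crawler } from 'simple-node-site-crawler';

	async function run() {
		// Create a crawler for the domain to crawl.
		const crawler = new Crawler(`jesseconner.ca`);

		// Crawl the site and wait for it to finish.
		await crawler.crawlSite();
	}

	// Report any error instead of leaving the rejection unhandled.
	run().catch(console.error);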
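
To illustrate what crawling a domain into a page list involves, here is a minimal breadth-first sketch over an in-memory link graph. It is not the package's implementation: `crawlGraph` and the `fakeSite` data are hypothetical, and no network requests are made. Each entry records a `target` and the `source` page where the link was first found, mirroring the Page fields above.

```javascript
// Hypothetical illustration only; a real crawler fetches pages over HTTP.
// `getLinks(url)` must return the links found on the given page.
function crawlGraph(start, getLinks) {
	const pages = [{ target: start, source: undefined }];
	const seen = new Set([start]);

	// `pages` doubles as the queue: entries are appended while we iterate.
	for (let i = 0; i < pages.length; i++) {
		for (const link of getLinks(pages[i].target)) {
			if (seen.has(link)) continue; // Visit each page only once.
			seen.add(link);
			pages.push({ target: link, source: pages[i].target });
		}
	}

	return pages;
}

// Made-up site structure for the example.
const fakeSite = {
	'/': ['/about', '/blog'],
	'/about': ['/'],
	'/blog': ['/blog/post-1', '/about'],
	'/blog/post-1': [],
};

const pageList = crawlGraph('/', (url) => fakeSite[url] || []);
console.log(pageList.map((page) => `${page.target} <- ${page.source}`));
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, as `/about` and `/` do here.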

Checking status:

	crawler.events.on( 'update', ( status ) => {
		if ( status.isDone ) {
			console.log( 'Done!' );
			return;
		}
		console.log(
			`Crawling ${ status.currentPage } (Pages crawled: ${ status.pagesCrawled })`
		);
	} );
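
If a summary is more useful than one log line per page, the `update` payload can be accumulated instead. The sketch below shows one way to do that. `createProgressTracker` is a hypothetical helper, and it relies only on the `isDone`, `currentPage`, and `pagesCrawled` fields used above.

```javascript
// Hypothetical helper, not part of simple-node-site-crawler.
function createProgressTracker() {
	const state = { done: false, pagesCrawled: 0, lastPage: null };

	return {
		// Intended to be passed as the 'update' listener.
		handle(status) {
			if (status.isDone) {
				state.done = true;
				return;
			}
			state.lastPage = status.currentPage;
			state.pagesCrawled = status.pagesCrawled;
		},
		// Returns a copy so callers cannot mutate the internal state.
		summary() {
			return { ...state };
		},
	};
}

// Simulated updates with made-up values, standing in for a real crawl.
const tracker = createProgressTracker();
tracker.handle({ isDone: false, currentPage: 'https://jesseconner.ca/', pagesCrawled: 1 });
tracker.handle({ isDone: false, currentPage: 'https://jesseconner.ca/about', pagesCrawled: 2 });
tracker.handle({ isDone: true });
console.log(tracker.summary());
```

With a real crawler, this would be wired up as `crawler.events.on( 'update', tracker.handle )`, registered before `crawlSite()` is called so that no updates are missed.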

Working with results:

	import { Crawler } from 'simple-node-site-crawler';
	const crawler = new Crawler(`jesseconner.ca`);
	const site = crawler.loadResults();

	// Find any pages not linked from the homepage.
	const buriedPages = site.filter(page => page.source !== `https://jesseconner.ca/`);
	buriedPages.forEach(page => console.log(page.source));

	// Find any pages that are bad links.
	const missingPages = site.filter(page => page.responseCode > 399);
	missingPages.forEach(page => console.log(page.source));
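
When fixing bad links, it can help to group them by the page that contains them. The following is a minimal sketch, assuming (as the examples above suggest) that `source` is the page on which the link to `target` was found. `groupBrokenLinksBySource` is a hypothetical helper, and `sampleResults` is made-up data in the shape described under Page Anatomy.

```javascript
// Hypothetical helper, not part of simple-node-site-crawler.
// Maps each source page to the broken targets it links to.
function groupBrokenLinksBySource(pages) {
	const bySource = new Map();

	for (const page of pages) {
		// responseCode is optional; pages without one are skipped.
		if (!(page.responseCode > 399)) continue;

		const source = page.source || '(no source)';
		if (!bySource.has(source)) bySource.set(source, []);
		bySource.get(source).push(page.target);
	}

	return bySource;
}

// Made-up results for the example.
const sampleResults = [
	{ target: 'https://jesseconner.ca/', responseCode: 200 },
	{ target: 'https://jesseconner.ca/old', source: 'https://jesseconner.ca/', responseCode: 404 },
	{ target: 'https://jesseconner.ca/gone', source: 'https://jesseconner.ca/', responseCode: 410 },
	{ target: 'https://jesseconner.ca/about', source: 'https://jesseconner.ca/', responseCode: 200 },
];

for (const [source, targets] of groupBrokenLinksBySource(sampleResults)) {
	console.log(`${source} links to ${targets.length} broken page(s): ${targets.join(', ')}`);
}
```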

Versions

1 year ago: 0.3.8, 0.3.7, 0.3.6
2 years ago: 0.3.5, 0.3.4, 0.3.3, 0.3.2, 0.3.1, 0.3.0
3 years ago: 0.2.2, 0.2.1, 0.2.0, 0.1.16, 0.1.15, 0.1.14, 0.1.13, 0.1.12, 0.1.11, 0.1.10, 0.1.9, 0.1.8, 0.1.7, 0.1.6, 0.1.5, 0.1.4, 0.1.3, 0.1.2, 0.1.1, 0.1.0, 0.0.1