0.3.8 • Published 1 year ago
simple-node-site-crawler v0.3.8
Node-Site-Crawler
A simple Node.js module that crawls a domain and generates a list of its pages. This is very much an experimental work in progress.
Page Anatomy
{
    target: string;
    domain: string;
    source?: string;
    responseCode?: number;
    body?: string;
    links(): Array<string>;
    internalLinks(): Array<string>;
    externalLinks(): Array<string>;
}
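The three link methods are not documented beyond their signatures. As a rough sketch of what they might do, the following splits the links found in a page body into internal and external ones by comparing hostnames. PageSketch, extractLinks, isInternal and the regex-based extraction are all hypothetical and are not taken from the package, which may well use a proper HTML parser and different rules.

```typescript
// Hypothetical stand-in for the package's page shape. Field names follow the
// "Page Anatomy" listing; everything else here is an assumption.
interface PageSketch {
  target: string;
  domain: string;
  source?: string;
  responseCode?: number;
  body?: string;
}

// Collect href values from the HTML body with a simple regex (illustrative
// only) and resolve relative links against the page's own URL.
function extractLinks(page: PageSketch): string[] {
  const links: string[] = [];
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(page.body ?? "")) !== null) {
    try {
      links.push(new URL(match[1], page.target).href);
    } catch {
      // Skip values that cannot be parsed as URLs.
    }
  }
  return links;
}

// Treat a link as internal when its hostname is the domain or a subdomain of it.
function isInternal(link: string, domain: string): boolean {
  const host = new URL(link).hostname;
  return host === domain || host.endsWith(`.${domain}`);
}

function internalLinks(page: PageSketch): string[] {
  return extractLinks(page).filter((link) => isInternal(link, page.domain));
}

function externalLinks(page: PageSketch): string[] {
  return extractLinks(page).filter((link) => !isInternal(link, page.domain));
}

const examplePage: PageSketch = {
  target: "https://example.com/blog/",
  domain: "example.com",
  body: '<a href="/about">About</a> <a href="https://other.org/x">Other</a> <a href="post-1">Post</a>',
};

console.log(internalLinks(examplePage));
console.log(externalLinks(examplePage));
```

Comparing hostnames rather than raw string prefixes keeps relative links and subdomains on the internal side, which is one plausible reading of "internal"; the package may draw the line differently.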
Usage examples:
Crawling sites:
import { Crawler } from 'simple-node-site-crawler';

async function run() {
    const crawler = new Crawler(`jesseconner.ca`);
    await crawler.crawlSite();
}

run();
Checking status:
crawler.events.on( 'update', ( status ) => {
    if ( status.isDone ) {
        console.log( 'Done!' );
        return;
    }
    console.log(
        `Crawling ${ status.currentPage } (Pages crawled: ${ status.pagesCrawled })`
    );
} );
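For typed projects, the status handling above could be pulled into a small helper. This is only a sketch: the CrawlStatus interface and describeStatus function are hypothetical, the three field names are the ones used in the listener above, and their types and optionality are guesses.

```typescript
// Assumed shape of the object passed to 'update' listeners. Only these three
// fields appear in the example above; their types are guesses.
interface CrawlStatus {
  isDone: boolean;
  currentPage?: string;
  pagesCrawled?: number;
}

// Turn a status update into a single log line.
function describeStatus(status: CrawlStatus): string {
  const crawled = status.pagesCrawled ?? 0;
  if (status.isDone) {
    return `Done! (Pages crawled: ${crawled})`;
  }
  return `Crawling ${status.currentPage ?? "(unknown page)"} (Pages crawled: ${crawled})`;
}

console.log(
  describeStatus({ isDone: false, currentPage: "https://example.com/about", pagesCrawled: 3 })
);
console.log(describeStatus({ isDone: true, pagesCrawled: 12 }));
```

The helper could then be passed to the listener as crawler.events.on( 'update', ( status ) => console.log( describeStatus( status ) ) ). Registering the listener before calling crawlSite() seems the safer order, although the README does not say when the first update fires.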
Working with results:
import { Crawler } from 'simple-node-site-crawler';

const crawler = new Crawler(`jesseconner.ca`);
const site = crawler.loadResults();

// Find any pages not linked from the homepage.
const buriedPages = site.filter(page => page.source !== `https://jesseconner.ca/`);
buriedPages.forEach(page => console.log(page.source));

// Find any pages that are bad links (response code 400 or above).
const missingPages = site.filter(page => page.responseCode > 399);
missingPages.forEach(page => console.log(page.source));
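The same filters can be tried without running a crawl. The sketch below uses hand-written sample data in place of loadResults(). The PageRecord type and both helper names are hypothetical, and it assumes that source holds the URL of the page on which the link to target was found, which the README does not state explicitly.

```typescript
// Hypothetical subset of the page shape, limited to the fields the filters use.
interface PageRecord {
  target: string;
  source?: string;
  responseCode?: number;
}

// Pages whose recorded source is anything other than the given URL.
function notLinkedFrom(site: PageRecord[], sourceUrl: string): PageRecord[] {
  return site.filter((page) => page.source !== sourceUrl);
}

// Pages that came back with a 4xx or 5xx response code. Pages with no
// recorded response code are skipped rather than treated as errors.
function withErrorStatus(site: PageRecord[]): PageRecord[] {
  return site.filter(
    (page) => page.responseCode !== undefined && page.responseCode >= 400
  );
}

// Made-up sample data standing in for crawler.loadResults().
const sampleSite: PageRecord[] = [
  { target: "https://example.com/about", source: "https://example.com/", responseCode: 200 },
  { target: "https://example.com/old-post", source: "https://example.com/blog", responseCode: 404 },
  { target: "https://example.com/pending", source: "https://example.com/" },
];

for (const page of withErrorStatus(sampleSite)) {
  console.log(`${page.responseCode} ${page.target} (linked from ${page.source ?? "unknown"})`);
}
```

Checking responseCode against undefined first keeps the comparison valid under strict TypeScript settings, since the field is optional in the page shape.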