simple-node-site-crawler v0.3.10 • Published 2 years ago
Node-Site-Crawler
A simple Node.js module that crawls a domain and generates a list of its pages. This is very much an experimental work in progress.
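The README does not describe how the crawl itself works. One common way to crawl a domain and generate a page list is a breadth-first traversal: start at the homepage, follow only links on the same host, and visit each URL once. The sketch below shows that technique under those assumptions; it is not the library's implementation. `crawlDomain`, `CrawledPage`, and the `getLinks` callback are hypothetical names, and `getLinks` stands in for fetching a page over HTTP and extracting its absolute links.

```typescript
// Hypothetical breadth-first crawl; not the library's implementation.
// `getLinks` stands in for "fetch this URL and return the absolute links on it".
type GetLinks = (url: string) => string[];

interface CrawledPage {
  target: string; // URL that was visited
  source?: string; // page on which the link was first found (undefined for the start page)
}

function crawlDomain(domain: string, getLinks: GetLinks): CrawledPage[] {
  const start = `https://${domain}/`;
  const pages: CrawledPage[] = [{ target: start }];
  const seen = new Set<string>([start]);

  // `pages` doubles as the BFS queue: index `i` walks it while new pages are appended.
  for (let i = 0; i < pages.length; i++) {
    const current = pages[i];
    for (const link of getLinks(current.target)) {
      // Follow only links on the same host, and each URL only once.
      if (new URL(link).hostname !== domain || seen.has(link)) continue;
      seen.add(link);
      pages.push({ target: link, source: current.target });
    }
  }
  return pages;
}

// In-memory stand-in for a three-page site with one external link.
const site: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://other.org/"],
  "https://example.com/a": ["https://example.com/b", "https://example.com/"],
  "https://example.com/b": [],
};

const result = crawlDomain("example.com", (url) => site[url] ?? []);
console.log(result.map((p) => `${p.target} <- ${p.source ?? "(start)"}`));
```

Recording the first referring page as `source` is one plausible reading of the `source` field; the library may define it differently.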
Page Anatomy

Each crawled page is represented by an object with the following shape:

```typescript
{
  target: string;
  domain: string;
  source?: string;
  responseCode?: number;
  body?: string;
  links(): Array<string>;
  internalLinks(): Array<string>;
  externalLinks(): Array<string>;
}
```

Usage examples:
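The `links()`, `internalLinks()`, and `externalLinks()` methods in Page Anatomy are not documented further. The following minimal sketch shows one way an object of that shape could split its links. It assumes that `target` is the page's absolute URL, that `domain` is a bare host such as `example.com`, and that a link is internal when its host equals `domain`. `SketchPage` and its `hrefs` constructor argument are hypothetical; this is not the library's code.

```typescript
// Hypothetical class matching the Page Anatomy shape; field meanings are assumptions.
class SketchPage {
  target: string; // assumed: absolute URL of this page
  domain: string; // assumed: bare host being crawled, e.g. "example.com"
  source?: string;
  responseCode?: number;
  body?: string;
  private hrefs: string[]; // hypothetical: href values already extracted from `body`

  constructor(target: string, domain: string, hrefs: string[]) {
    this.target = target;
    this.domain = domain;
    this.hrefs = hrefs;
  }

  // Resolve each href against the page URL so relative links become absolute.
  links(): Array<string> {
    return this.hrefs.map((href) => new URL(href, this.target).toString());
  }

  // Assumption: a link is internal when its host equals the crawled domain.
  internalLinks(): Array<string> {
    return this.links().filter((link) => new URL(link).hostname === this.domain);
  }

  // Everything else (other hosts) counts as external.
  externalLinks(): Array<string> {
    return this.links().filter((link) => new URL(link).hostname !== this.domain);
  }
}

const page = new SketchPage("https://example.com/blog/", "example.com", [
  "/about",
  "post-1",
  "https://other.org/x",
]);
console.log(page.internalLinks()); // https://example.com/about, https://example.com/blog/post-1
console.log(page.externalLinks()); // https://other.org/x
```

A strict host comparison treats `www.example.com` as external; a real crawler may want to normalize such variants.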
Crawling sites:
```typescript
import { Crawler } from "simple-node-site-crawler";

async function run() {
  const crawler = new Crawler("jesseconner.ca");
  await crawler.crawlSite();
}

run();
```

Checking status:
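```typescript
// `crawler` is a Crawler instance created as in the previous example.
// Register the listener before calling crawlSite() so no updates are missed.
crawler.events.on("update", (status) => {
  if (status.isDone) {
    console.log("Done!");
    return;
  }
  console.log(
    `Crawling ${status.currentPage} (Pages crawled: ${status.pagesCrawled})`,
  );
});
```

Working with results: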
```typescript
import { Crawler } from "simple-node-site-crawler";

const crawler = new Crawler("jesseconner.ca");
const site = crawler.loadResults();

// Find any pages not linked from the homepage.
const buriedPages = site.filter(
  (page) => page.source !== "https://jesseconner.ca/",
);
buriedPages.forEach((page) => console.log(page.target));

// Find any broken links (pages that returned HTTP 400 or higher).
const missingPages = site.filter((page) => (page.responseCode ?? 0) >= 400);
missingPages.forEach((page) =>
  console.log(`${page.target} (linked from ${page.source})`),
);
```
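The two filters above can also be written as small typed functions and checked against sample data. This sketch assumes that `loadResults()` returns objects with the Page Anatomy fields, that `target` is a page's URL, and that `source` is the page on which it was found. `ResultPage`, `findBuriedPages`, `findBrokenLinks`, and the sample pages are hypothetical.

```typescript
// Hypothetical subset of the Page Anatomy fields; `target` is assumed to be the
// page URL and `source` the page on which it was found.
interface ResultPage {
  target: string;
  source?: string;
  responseCode?: number;
}

// Pages found somewhere other than the homepage. The start page itself
// (no source) is excluded so it is not reported as buried.
function findBuriedPages(pages: ResultPage[], homepage: string): ResultPage[] {
  return pages.filter((page) => page.source !== undefined && page.source !== homepage);
}

// HTTP 4xx and 5xx responses; pages without a recorded code are skipped.
function findBrokenLinks(pages: ResultPage[]): ResultPage[] {
  return pages.filter((page) => (page.responseCode ?? 0) >= 400);
}

// Hypothetical sample data standing in for loadResults().
const sample: ResultPage[] = [
  { target: "https://example.com/", responseCode: 200 },
  { target: "https://example.com/a", source: "https://example.com/", responseCode: 200 },
  { target: "https://example.com/b", source: "https://example.com/a", responseCode: 200 },
  { target: "https://example.com/gone", source: "https://example.com/a", responseCode: 404 },
];

for (const page of findBuriedPages(sample, "https://example.com/")) {
  console.log(`Buried: ${page.target}`);
}
for (const page of findBrokenLinks(sample)) {
  console.log(`Broken: ${page.target} (linked from ${page.source})`);
}
```

Reporting the `source` of a broken page points at the page that contains the bad link, which is usually the one that needs fixing.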