0.3.10 • Published 11 months ago
simple-node-site-crawler v0.3.10
Node-Site-Crawler
A simple Node.js module that crawls a domain and generates a list of its pages. This is very much an experimental work in progress.
Page Anatomy
{
  target: string;
  domain: string;
  source?: string;
  responseCode?: number;
  body?: string;
  links(): Array<string>;
  internalLinks(): Array<string>;
  externalLinks(): Array<string>;
}
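To make the link methods above more concrete, here is a minimal sketch of how a list of links might be split into internal and external ones. `splitLinks` is a hypothetical helper, not part of this package's API, and the rule it uses (a link is internal when its hostname equals the domain or is a subdomain of it) is an assumption; the module's own rule may differ.

```typescript
// Hypothetical helper, not part of simple-node-site-crawler.
// Assumption: "internal" means the hostname equals the domain
// or is a subdomain of it.
function splitLinks(
  domain: string,
  links: string[],
): { internal: string[]; external: string[] } {
  const internal: string[] = [];
  const external: string[] = [];
  for (const link of links) {
    let host: string;
    try {
      host = new URL(link).hostname;
    } catch {
      continue; // skip strings that are not absolute URLs
    }
    if (host === domain || host.endsWith(`.${domain}`)) {
      internal.push(link);
    } else {
      external.push(link);
    }
  }
  return { internal, external };
}
```

Calling `splitLinks("jesseconner.ca", page.links())` would then give two arrays comparable to what `internalLinks()` and `externalLinks()` return, under the assumption stated above.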
Usage examples:
Crawling sites:
import { Crawler } from "simple-node-site-crawler";

async function run() {
  const crawler = new Crawler(`jesseconner.ca`);
  await crawler.crawlSite();
}

run();
Checking Status:
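// Register the listener before calling crawler.crawlSite(),
// otherwise early updates are emitted with nobody listening.
crawler.events.on("update", (status) => {
  if (status.isDone) {
    console.log("Done!");
    return;
  }
  console.log(
    `Crawling ${status.currentPage} (Pages crawled: ${status.pagesCrawled})`,
  );
});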
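If the handler grows, it can help to move the message formatting into a pure function that is easy to test without running a crawl. The sketch below assumes the status object carries only the three fields used in the example above; `CrawlStatus` and `formatStatus` are hypothetical names, not exports of this package.

```typescript
// Hypothetical shape, inferred only from the fields used in the
// "update" handler above. The real status object may carry more.
interface CrawlStatus {
  isDone: boolean;
  currentPage?: string;
  pagesCrawled?: number;
}

// Turn a status update into a single log line.
function formatStatus(status: CrawlStatus): string {
  if (status.isDone) {
    return "Done!";
  }
  const page = status.currentPage ?? "(unknown)";
  const count = status.pagesCrawled ?? 0;
  return `Crawling ${page} (Pages crawled: ${count})`;
}
```

The listener then reduces to `crawler.events.on("update", (status) => console.log(formatStatus(status)));`.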
Working with results:
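import { Crawler } from "simple-node-site-crawler";

const crawler = new Crawler(`jesseconner.ca`);
const site = crawler.loadResults();

// Find any pages not linked from the homepage.
// The filter implies that source is the page a link was found on,
// so the buried page itself is page.target.
const buriedPages = site.filter(
  (page) => page.source !== `https://jesseconner.ca/`,
);
buriedPages.forEach((page) => console.log(page.target));

// Find any pages that are bad links (response code 400 or above).
// responseCode is optional, so pages without one are skipped.
const missingPages = site.filter((page) => (page.responseCode ?? 0) > 399);
missingPages.forEach((page) =>
  console.log(`${page.target} (linked from ${page.source})`),
);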
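When fixing bad links it is often more useful to see them grouped by the page that contains them. The following is a minimal sketch under the same assumption as the example above, namely that `source` is the page a link was found on. `PageRecord` and `brokenLinksBySource` are hypothetical names, and the sample URLs in the usage note are made up for illustration.

```typescript
// Hypothetical subset of the page shape from "Page Anatomy".
interface PageRecord {
  target: string;
  source?: string;
  responseCode?: number;
}

// Group the targets of bad links (response code 400 or above)
// by the page they were found on.
function brokenLinksBySource(pages: PageRecord[]): Record<string, string[]> {
  const bySource: Record<string, string[]> = {};
  for (const page of pages) {
    // Pages without a response code are treated as not broken.
    if ((page.responseCode ?? 0) < 400) continue;
    const key = page.source ?? "(no source)";
    if (!bySource[key]) {
      bySource[key] = [];
    }
    bySource[key].push(page.target);
  }
  return bySource;
}
```

For example, `brokenLinksBySource(site)` on a result set where `https://jesseconner.ca/` links to one page that returns 404 would yield an object with a single key, `https://jesseconner.ca/`, whose value is an array holding that one target.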