robots-parser
A specification-compliant robots.txt parser with wildcard (*) matching support.
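A minimal sketch of typical usage, following robots-parser's documented parse-then-query API (the robots.txt contents and URLs below are placeholders):

```js
const robotsParser = require('robots-parser');

// Parse a robots.txt file (fetched separately) for a given site.
const robots = robotsParser('https://example.com/robots.txt', [
  'User-agent: *',
  'Disallow: /private/*',
  'Crawl-delay: 5',
].join('\n'));

// Wildcard-aware allow/deny checks, per user agent.
robots.isAllowed('https://example.com/page.html', 'MyBot/1.0');         // true
robots.isDisallowed('https://example.com/private/a.html', 'MyBot/1.0'); // true
robots.getCrawlDelay('MyBot/1.0');                                      // 5
```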
Very straightforward, event-driven web crawler. Features a flexible queue interface and a basic cache mechanism with an extensible backend.
Crawler is a ready-to-use web spider with support for proxies, asynchronous operation, rate limiting, configurable request pools, server-side jQuery, and HTTP/2.
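As an illustration, a sketch against the classic callback API of the crawler package (these option names come from older major versions and may differ in the releases that added HTTP/2):

```js
const Crawler = require('crawler');

const c = new Crawler({
  maxConnections: 10, // size of the request pool
  rateLimit: 1000,    // minimum delay between requests, in ms
  callback: (error, res, done) => {
    if (error) {
      console.error(error);
    } else {
      const $ = res.$; // server-side jQuery (cheerio) over the response body
      console.log($('title').text());
    }
    done(); // signal that this task is finished
  },
});

c.queue('https://example.com/');
```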
JavaScript module that detects bots/crawlers/spiders via the user-agent string.
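Assuming this entry is the isbot package (an assumption, since the entry does not name it), detection is a single predicate over the User-Agent header; recent versions expose a named export, older ones export the function directly:

```js
const { isbot } = require('isbot'); // named export in recent versions

console.log(isbot('Googlebot/2.1 (+http://www.google.com/bot.html)')); // true
console.log(isbot(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)); // false
```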
A tiny Node module that quickly detects spiders/crawlers and comes with optional middleware for ExpressJS.
An ES6 adaptation of the original PHP library CrawlerDetect; it helps you detect bots/crawlers/spiders via the user agent.
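A sketch assuming the ES6 port keeps CrawlerDetect's isCrawler/getMatches surface, as the description suggests (verify against the package's current README):

```js
const { Crawler } = require('es6-crawler-detect');

const detector = new Crawler();

// Test a user-agent string directly, mirroring the PHP API.
if (detector.isCrawler('Googlebot/2.1 (+http://www.google.com/bot.html)')) {
  console.log('Matched crawler:', detector.getMatches());
}
```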
Uses different rules to try to determine whether a user agent string comes from a spider.
A web crawler for Node.js.
Extracts the article list from raw news HTML.
A simple email extractor for obfuscated emails.
Priority-based Semantic Web crawler.
ECMAScript parser that produces a Shift-format AST.
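This description matches the shift-parser package; assuming that is the module meant, a short sketch of its parseScript entry point (parseModule is the module-goal counterpart):

```js
const { parseScript } = require('shift-parser');

// The result is a Shift-format AST, not an ESTree one.
const tree = parseScript('const answer = 42;');
console.log(tree.type);               // "Script"
console.log(tree.statements[0].type); // "VariableDeclarationStatement"
```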
A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.
Crawler client.
A high-performance charting library.
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
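A sketch of Supercrawler's handler-based setup; the option names and the Url/getUrlList helpers follow its README, but treat them as assumptions to verify:

```js
const supercrawler = require('supercrawler');

const crawler = new supercrawler.Crawler({
  interval: 1000,             // rate limit: at most one request per second
  concurrentRequestsLimit: 5, // concurrency limit
});

// Handlers run per content type: one discovers links, one records pages.
crawler.addHandler('text/html', supercrawler.handlers.htmlLinkParser({
  hostnames: ['example.com'], // only follow links within this host
}));
crawler.addHandler('text/html', (context) => {
  console.log('Crawled', context.url);
});

// Seed the queue, then start crawling.
crawler.getUrlList()
  .insertIfNotExists(new supercrawler.Url('https://example.com/'))
  .then(() => crawler.start());
```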
JSpider 3 is a framework for crawling inside Chrome DevTools, with full crawler support included.
An easy-to-use Node web crawler that stores cookies, follows redirects, traverses pages, and submits forms.
A partial implementation of the W3C DOM API on top of an HTML5 parser and serializer.
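Assuming this entry is the domino module (which describes itself in these terms), a minimal sketch:

```js
const domino = require('domino');

// Parse HTML into a W3C-style Document without a full browser environment.
const window = domino.createWindow('<h1>Hello</h1><p class="x">world</p>');
const document = window.document;

console.log(document.querySelector('h1').textContent); // "Hello"
console.log(document.querySelectorAll('.x').length);   // 1
```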
Gets a list of local links from a root URL.