robots-parser
A specification-compliant robots.txt parser with wildcard (*) matching support.
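A minimal sketch of typical usage, following robots-parser's documented parse-then-query API (the robots.txt contents and URLs below are placeholders):

```js
const robotsParser = require('robots-parser');

// Parse a robots.txt file (fetched separately) for a given site.
const robots = robotsParser('https://example.com/robots.txt', [
  'User-agent: *',
  'Disallow: /private/*',
  'Crawl-delay: 5',
].join('\n'));

// Wildcard-aware allow/deny checks, per user agent.
robots.isAllowed('https://example.com/page.html', 'MyBot/1.0');         // true
robots.isDisallowed('https://example.com/private/a.html', 'MyBot/1.0'); // true
robots.getCrawlDelay('MyBot/1.0');                                      // 5
```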
Very straightforward, event-driven web crawler. Features a flexible queue interface and a basic cache mechanism with an extensible backend.
Crawler is a ready-to-use web spider with support for proxies, asynchronous operation, rate limiting, configurable request pools, server-side jQuery, and HTTP/2.
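As an illustration, a sketch against the classic callback API of the crawler package (these option names come from older major versions and may differ in the releases that added HTTP/2):

```js
const Crawler = require('crawler');

const c = new Crawler({
  maxConnections: 10, // size of the request pool
  rateLimit: 1000,    // minimum delay between requests, in ms
  callback: (error, res, done) => {
    if (error) {
      console.error(error);
    } else {
      const $ = res.$; // server-side jQuery (cheerio) over the response body
      console.log($('title').text());
    }
    done(); // signal that this task is finished
  },
});

c.queue('https://example.com/');
```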
JavaScript module that detects bots/crawlers/spiders via the user-agent string.
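Assuming this entry is the isbot package (an assumption, since the entry does not name it), detection is a single predicate over the User-Agent header; recent versions expose a named export, older ones export the function directly:

```js
const { isbot } = require('isbot'); // named export in recent versions

console.log(isbot('Googlebot/2.1 (+http://www.google.com/bot.html)')); // true
console.log(isbot(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)); // false
```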
A tiny Node module that quickly detects spiders/crawlers and comes with optional middleware for ExpressJS.
An ES6 adaptation of the original PHP library CrawlerDetect; it helps you detect bots/crawlers/spiders via the user agent.
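A sketch assuming the ES6 port keeps CrawlerDetect's isCrawler/getMatches surface, as the description suggests (verify against the package's current README):

```js
const { Crawler } = require('es6-crawler-detect');

const detector = new Crawler();

// Test a user-agent string directly, mirroring the PHP API.
if (detector.isCrawler('Googlebot/2.1 (+http://www.google.com/bot.html)')) {
  console.log('Matched crawler:', detector.getMatches());
}
```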
Uses different rules to try to determine whether a user agent string comes from a spider.
A web crawler for Node.js.
Extracts the article list from raw news HTML.
A simple email extractor for obfuscated emails.
Priority-based Semantic Web crawler.
ECMAScript parser that produces a Shift-format AST.
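This description matches the shift-parser package; assuming that is the module meant, a short sketch of its parseScript entry point (parseModule is the module-goal counterpart):

```js
const { parseScript } = require('shift-parser');

// The result is a Shift-format AST, not an ESTree one.
const tree = parseScript('const answer = 42;');
console.log(tree.type);               // "Script"
console.log(tree.statements[0].type); // "VariableDeclarationStatement"
```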
A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.
Crawler client.
A high-performance charting library.
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
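A sketch of Supercrawler's handler-based setup; the option names and the Url/getUrlList helpers follow its README, but treat them as assumptions to verify:

```js
const supercrawler = require('supercrawler');

const crawler = new supercrawler.Crawler({
  interval: 1000,             // rate limit: at most one request per second
  concurrentRequestsLimit: 5, // concurrency limit
});

// Handlers run per content type: one discovers links, one records pages.
crawler.addHandler('text/html', supercrawler.handlers.htmlLinkParser({
  hostnames: ['example.com'], // only follow links within this host
}));
crawler.addHandler('text/html', (context) => {
  console.log('Crawled', context.url);
});

// Seed the queue, then start crawling.
crawler.getUrlList()
  .insertIfNotExists(new supercrawler.Url('https://example.com/'))
  .then(() => crawler.start());
```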
JSpider 3 is a framework for crawling inside Chrome DevTools, with full crawler support included.
An easy-to-use Node web crawler that stores cookies, follows redirects, traverses pages, and submits forms.
A partial implementation of the W3C DOM API on top of an HTML5 parser and serializer.
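Assuming this entry is the domino module (which describes itself in these terms), a minimal sketch:

```js
const domino = require('domino');

// Parse HTML into a W3C-style Document without a full browser environment.
const window = domino.createWindow('<h1>Hello</h1><p class="x">world</p>');
const document = window.document;

console.log(document.querySelector('h1').textContent); // "Hello"
console.log(document.querySelectorAll('.x').length);   // 1
```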
Gets a list of local links from a root URL.