readdir-sync-recursive
Recursively crawl through directories
A simple image-crawling module that batch-searches local files for image links and downloads them; based on Node.js
Build an npm module from a file
NodeJS Crawler for Twitter
Moving or backing up your WordPress site to Blogger
custom scraping tool
Follows and collects breadcrumbs across the web
Small-scale webpage archiver
Crawl the web breadth-first from a seed url, statefully
Checks for mixed content a.k.a. HTTP references on HTTPS, very alpha.
The SeoCheck module is built to check for irregularities in your HTML file. You can customise your own set of rules to check that your end requirements are met.
crawls one or more pages on your site and tests for broken anchor, link, script and image links
A tool that uses headless Chrome to download dynamic pages.
A powerful crawler that supports different strategies for different URLs. It can recursively traverse all web pages in a site to a given depth; you can also choose not to crawl recursively.
billboard chart crawling module
A streaming directory traverser for node 4 or greater
This app will crawl and fully load a list of URLs or sitemap.xml(soon) using Puppeteer (aka headless Chromium). It's the ONLY crawler that (1) fully loads pages and (2) mimics browser HTTP headers to NGINX or Varnish. At the same time, it's optimized to n
a simple library for :spider: to create models from webpages :spider_web:
Electron based command line Star Wars opening crawl generator
Parses robots.txt files to provide meaningful, useful output as well as reporting syntax errors.