redis-web-crawler
Web crawler that builds a directed graph of links among connected sites. Runs on Node.js and stores data in Redis.
Fetch the pre-rendered content, metadata, links, and Open Graph data of a webpage, especially a Single-Page Application (SPA)
A Node.js package that provides a convenient wrapper around Puppeteer for handling browser automation tasks. This package simplifies common browser operations like navigation, downloads management, screenshots, and page interactions.
Stop website fingerprinting techniques
Crawl a website to generate a knowledge file for RAG
Crawl YouTube without an API key (search videos and channels, or get all videos of a channel or playlist)
Download README files from GitHub repository links
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
An easy, lightweight scraper for humans with many built-in features.
A simple node module to crawl a domain and generate a page list.
Generate comprehensive PDFs of entire websites, ideal for RAG.
A snazzy light Node.js image crawler laced with TypeScript goodness! 🕵️🦾
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The elite unit of sitemap.xml generation—precise, efficient, dominating. If RobotsForce1 is your air defense, this is your recon mission.
Generate a sitemap from any link you give it.
Package for querying some of the student information available in FATEC's SIGA
GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making it convenient to access and use the scraped information
Simple web crawler for populating the CDN cache after a deploy.
A CLI tool to crawl GitHub repositories and pull all names and email addresses from commit histories.