1.0.4 • Published 1 year ago

spidey-redis v1.0.4

License
ISC

Redis Spidey - Distributed Web Scraping Solution Powered by Redis

RedisSpidey combines Spidey and Redis to enable distributed crawling and web scraping. Its distributed architecture lets multiple instances run in parallel, all listening to the same Redis queue. RedisSpidey can also push scraped data back to Redis queues, so post-processing can be distributed as well.

Features

  • Distributed Crawling: Multiple crawler instances can run at the same time, all listening to the same queue.
  • RedisPipeline: Pushes crawled data back to Redis queues for distributed post-processing.
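
To make the two features above concrete, here is a minimal in-memory sketch of the idea, not the library's implementation. Plain arrays stand in for the Redis input and output queues, and every name in it (`InMemoryQueue`, `workOnce`, `runWorkers`, `ScrapedItem`) is hypothetical and not part of spidey-redis. It only illustrates that each URL is taken by exactly one worker and that results land in a shared output queue.

```typescript
// Conceptual in-memory model: arrays stand in for the Redis queues.
// None of these names come from spidey-redis; they are illustrative only.

type ScrapedItem = { url: string; worker: string };

class InMemoryQueue<T> {
  private items: T[] = [];

  push(item: T): void {
    this.items.push(item);
  }

  // Remove and return the oldest item, or undefined when the queue is empty.
  pop(): T | undefined {
    return this.items.shift();
  }

  // Return a copy of the queued items without removing them.
  snapshot(): T[] {
    return [...this.items];
  }
}

// One worker step: take a URL from the shared input queue and
// push a (fake) scraped record to the shared output queue.
function workOnce(
  worker: string,
  urls: InMemoryQueue<string>,
  data: InMemoryQueue<ScrapedItem>
): boolean {
  const url = urls.pop();
  if (url === undefined) {
    return false; // queue empty; a real instance would wait and poll again
  }
  data.push({ url, worker });
  return true;
}

// Let every worker take turns until the input queue is drained.
function runWorkers(
  workers: string[],
  urls: InMemoryQueue<string>,
  data: InMemoryQueue<ScrapedItem>
): void {
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const worker of workers) {
      if (workOnce(worker, urls, data)) {
        progressed = true;
      }
    }
  }
}
```

Because every worker pops from the same queue, no URL is processed twice. In a real deployment the workers would be separate processes and the queues would live in Redis.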

Installation

npm install spidey-redis

Options

RedisSpidey supports all Spidey options in addition to the following specific options.

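| Configuration | Type   | Description                                        | Default | Required                    |
|---------------|--------|----------------------------------------------------|---------|-----------------------------|
| redisUrl      | string | Redis URL, such as redis://localhost:6379          | null    | Yes                         |
| urlsKey       | string | Redis input queue name, such as urls:queue         | null    | Yes                         |
| dataKey       | string | Redis output queue name, such as data:queue        | null    | Yes, if using RedisPipeline |
| sleepDelay    | number | How long to wait for new items if the queue is empty | 5000ms  | No                          |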
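
The defaults and "Required" rules listed above can be written down as a small validation helper. This is only a sketch of how those rules read, assuming `sleepDelay` is given in milliseconds. The names `RedisOptionsInput`, `ResolvedRedisOptions`, and `resolveRedisOptions` are hypothetical and not exported by spidey-redis, and the library may validate its options differently.

```typescript
// Illustrative types only; they are not exported by spidey-redis.
interface RedisOptionsInput {
  redisUrl?: string;
  urlsKey?: string;
  dataKey?: string;
  sleepDelay?: number;
}

interface ResolvedRedisOptions {
  redisUrl: string;
  urlsKey: string;
  dataKey: string | null;
  sleepDelay: number;
}

// Default taken from the options list: 5000ms.
const DEFAULT_SLEEP_DELAY_MS = 5000;

// Apply the documented defaults and fail early on missing required options.
function resolveRedisOptions(
  input: RedisOptionsInput,
  usesRedisPipeline: boolean
): ResolvedRedisOptions {
  if (!input.redisUrl) {
    throw new Error("redisUrl is required");
  }
  if (!input.urlsKey) {
    throw new Error("urlsKey is required");
  }
  // dataKey is only required when scraped data is pushed back to Redis.
  if (usesRedisPipeline && !input.dataKey) {
    throw new Error("dataKey is required when using RedisPipeline");
  }
  return {
    redisUrl: input.redisUrl,
    urlsKey: input.urlsKey,
    dataKey: input.dataKey ?? null,
    sleepDelay: input.sleepDelay ?? DEFAULT_SLEEP_DELAY_MS,
  };
}
```

For example, calling `resolveRedisOptions({ redisUrl: 'redis://localhost:6379', urlsKey: 'urls:queue' }, false)` fills in the default `sleepDelay`, while passing `true` as the second argument without a `dataKey` throws.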

Usage

import { RedisSpidey, RedisPipeline } from 'spidey-redis';

class AmazonSpidey extends RedisSpidey {
  constructor() {
    super({
      // spidey options ...
      redisUrl: 'redis://localhost:6379',

      // Input queue
      urlsKey: 'amazon:urls',

      // Output queue
      dataKey: 'amazon:data',

      // Redis pipeline to push crawled data to data queue 
      pipelines: [RedisPipeline],
    });
  }
}

Conclusion

RedisSpidey is built for large-scale distributed web scraping and crawling. Multiple crawler instances share a single Redis input queue, and scraped data can be pushed to a Redis output queue for distributed post-processing, so crawling and post-processing can each be scaled by adding instances.

License

Spidey is licensed under the MIT License.

1.0.4 • 1 year ago
1.0.3 • 1 year ago
1.0.2 • 1 year ago
1.0.1 • 1 year ago
1.0.0 • 1 year ago