0.1.8 • Published 3 years ago

scrapix-cli v0.1.8

License
ISC
Repository
github

experimental

Description

Scrapix is an interactive CLI for scraping and downloading images from Google Images. At its core is the images-scraper package, which drives a Puppeteer-based headless browser to run the search and return the image URLs. Scrapix can help developers and non-developers build image datasets for computer vision tasks such as image classification, object detection, and face recognition.

Installation

$ npm install -g scrapix-cli

To resolve Puppeteer installation issues on Linux-based systems:

$ sudo npm install -g scrapix-cli --unsafe-perm=true

Usage

To start the CLI, type the scrapix command in your command prompt or terminal:

$ scrapix

Modes

  • default: The search keywords are defined directly in the terminal.
  • file: The search keywords are defined in a .json file.

Default mode: usage

Required parameters:

  • keywords: The names of the images you want to download, separated by commas (e.g. roses,daisy,hibiscus)
  • number: The number of images you want to download
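As an illustration of the input format described above, here is a minimal sketch of how a comma-separated keyword string and an image count could be normalized before use. The helper names parseKeywords and parseCount are hypothetical and are not part of scrapix-cli; the tool's actual parsing may differ.

```javascript
// Hypothetical helpers for illustration only; not scrapix-cli's own code.

// Split a comma-separated string into trimmed, non-empty keywords.
function parseKeywords(input) {
  return input
    .split(",")
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0);
}

// Convert the "number" answer to a positive integer, or throw.
function parseCount(input) {
  const n = Number(input);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`number must be a positive integer, got "${input}"`);
  }
  return n;
}

console.log(parseKeywords("roses, daisy,hibiscus"));
console.log(parseCount("20"));
```

Trimming each keyword means that input typed with spaces after the commas is treated the same as input typed without them.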

File mode: usage

Required parameters:

  • The name of a .json file in the base directory. The file lists each keyword and the number of images to scrape and download for it.

Structure of the .json file content:
{
    "images": [
        {"keyword": "roses", "number": 20},
        {"keyword": "daisy", "number": 25},
        {"keyword": "hibiscus", "number": 30}
    ]
}
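
Before passing a keyword file to the CLI, it can help to check that it is valid JSON and follows the structure shown above. The following is a minimal sketch under that assumption. The function name parseKeywordFile is hypothetical and is not part of scrapix-cli, and the checks reflect only the fields shown in this README.

```javascript
// Hypothetical validator for illustration only; not scrapix-cli's own code.
function parseKeywordFile(text) {
  // JSON.parse throws a SyntaxError on invalid JSON, including trailing commas.
  const data = JSON.parse(text);

  if (data === null || typeof data !== "object" || !Array.isArray(data.images)) {
    throw new Error('expected a top-level "images" array');
  }

  return data.images.map((entry, i) => {
    if (entry === null || typeof entry !== "object") {
      throw new Error(`images[${i}] must be an object`);
    }
    if (typeof entry.keyword !== "string" || entry.keyword.trim() === "") {
      throw new Error(`images[${i}].keyword must be a non-empty string`);
    }
    if (!Number.isInteger(entry.number) || entry.number <= 0) {
      throw new Error(`images[${i}].number must be a positive integer`);
    }
    return { keyword: entry.keyword.trim(), number: entry.number };
  });
}

const sample = '{"images": [{"keyword": "roses", "number": 20}, {"keyword": "daisy", "number": 25}]}';
console.log(parseKeywordFile(sample));
```

Note that strict JSON does not allow a comma after the last element of an array, so a file with a trailing comma would be rejected by JSON.parse.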

View Video Demonstration

TODOS

  • Image validation
  • Support for custom image processing actions, e.g. resizing, compression
  • Improve error handling
  • Support other search engines, e.g. Bing, Wikipedia
  • Improve general performance
  • Clean up the code and write better documentation

Contributing Guide

Coming soon