scrapix-cli

Description

Scrapix is an interactive CLI for scraping and downloading images from Google Images. At its core is the images-scraper package, which uses a Puppeteer-driven headless browser to run the search and return the image URLs. Developers and non-developers alike can use Scrapix to build image datasets for computer vision tasks such as image classification, object detection, and face recognition.

Installation

$ npm install -g scrapix-cli

If installing the `puppeteer` dependency fails on a Linux-based OS, run:

$ sudo npm install -g scrapix-cli --unsafe-perm=true

Usage

To start the CLI, run the `scrapix` command in your command prompt or terminal:

$ scrapix

Modes

Mode	Description
default	The search keywords are entered directly in the terminal.
file	The search keywords are read from a .json file.

Default mode: usage

Required parameters:

keywords: The search keywords for the images you want to download, separated by commas (e.g. roses,daisy,hibiscus)
number: The number of images you want to download
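The two default-mode parameters can be pictured as a list of keyword/number jobs, the same shape that file mode uses. The sketch below is a hypothetical helper, not scrapix's source: the function name `parseDefaultInput`, the whitespace trimming, the dropping of empty entries, and the assumption that the single number applies to every keyword are all illustrative choices.

```javascript
// Hypothetical sketch: turn default-mode answers into { keyword, number } jobs.
// Not scrapix's own code; the trimming/filtering rules are assumptions.
function parseDefaultInput(keywords, number) {
  const count = Number(number);
  // The number of images must be a positive whole number.
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error(`number must be a positive integer, got "${number}"`);
  }
  return keywords
    .split(",")                   // "roses, daisy" -> ["roses", " daisy"]
    .map((k) => k.trim())         // drop surrounding whitespace
    .filter((k) => k.length > 0)  // ignore empty entries such as "roses,,daisy"
    .map((keyword) => ({ keyword, number: count })); // assumed: same number for every keyword
}

console.log(parseDefaultInput("roses,daisy,hibiscus", 20));
```

With the example keywords above, this yields one job per flower name, each requesting 20 images.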

File mode: usage

Required parameters:

The name of a .json file in the base directory that lists each keyword and the number of images to scrape and download for it.

Structure of the .json file content

{
    "images": [
        {"keyword": "roses", "number": 20},
        {"keyword": "daisy", "number": 25},
        {"keyword": "hibiscus", "number": 30}
    ]
}
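Since a malformed keyword file only surfaces once the CLI tries to read it, it can help to check the file beforehand. The following is a minimal sketch, not part of scrapix: the helper `loadKeywordFile` is hypothetical, it assumes "base directory" means the current working directory, and its checks (a top-level `images` array whose entries have a non-empty string `keyword` and a positive integer `number`) are inferred from the structure shown above. JSON does not allow a trailing comma after the last array entry, so `JSON.parse` rejects such a file with a `SyntaxError`.

```javascript
// Hypothetical pre-flight check for a file-mode keyword file.
// Not scrapix's own code: the rules below are inferred from the README.
const fs = require("node:fs");
const path = require("node:path");

function loadKeywordFile(fileName, baseDir = process.cwd()) {
  // Assumption: "base directory" means the current working directory.
  const fullPath = path.resolve(baseDir, fileName);

  // JSON.parse throws a SyntaxError on invalid JSON, such as a trailing comma.
  const data = JSON.parse(fs.readFileSync(fullPath, "utf8"));

  if (!data || !Array.isArray(data.images)) {
    throw new Error(`${fileName}: expected a top-level "images" array`);
  }

  data.images.forEach((entry, i) => {
    if (!entry || typeof entry.keyword !== "string" || entry.keyword.trim() === "") {
      throw new Error(`${fileName}: images[${i}].keyword must be a non-empty string`);
    }
    if (!Number.isInteger(entry.number) || entry.number <= 0) {
      throw new Error(`${fileName}: images[${i}].number must be a positive integer`);
    }
  });

  return data.images;
}
```

For example, `loadKeywordFile("keywords.json")` (a hypothetical file name) returns the `images` array when the file is well formed, and otherwise throws an error naming the offending entry.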
