0.1.8 • Published 3 years ago
scrapix-cli v0.1.8
Description
Scrapix is an interactive CLI for easily scraping and downloading images from Google Images. The core of this CLI is the images-scraper package, which uses a Puppeteer-based headless browser to automatically run the search and return the image URLs. The scrapix CLI can help developers and non-developers build image datasets for computer vision tasks, e.g. image classification, object detection and face recognition.
Installation
$ npm install -g scrapix-cli
To resolve Puppeteer installation issues on Linux-based operating systems:
$ sudo npm install -g scrapix-cli --unsafe-perm=true
Usage
To start the CLI, type the scrapix command in your command prompt (CMD) or terminal:
$ scrapix
Modes
Mode | Description |
---|---|
default | The search keywords are entered directly in the terminal. |
file | The search keywords are defined in a .json file. |
Default mode: usage
Required parameters:
- keywords: The names of the images you want to download, separated by commas (e.g. roses,daisy,hibiscus)
- number: The number of images you want to download
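As a rough illustration of how the two default-mode parameters relate, the sketch below turns the comma-separated keywords and the number into one request per keyword. It assumes the number applies to each keyword (the same keyword/number shape the file mode uses); parseDefaultModeInput and ScrapeRequest are hypothetical names, not part of scrapix-cli.

```typescript
// Hypothetical shape of one scrape request; mirrors an entry of the
// file-mode .json ("keyword" + "number").
interface ScrapeRequest {
  keyword: string;
  number: number;
}

// Hypothetical helper, not scrapix-cli's actual code.
// Assumption: the single "number" parameter applies to every keyword.
function parseDefaultModeInput(keywords: string, number: string): ScrapeRequest[] {
  const count = Number(number);
  // Reject NaN, Infinity, zero, negative values and fractions.
  if (!isFinite(count) || count <= 0 || Math.floor(count) !== count) {
    throw new Error(`"number" must be a positive integer, got "${number}"`);
  }
  // Split on commas, trim surrounding whitespace and drop empty entries
  // (e.g. the one produced by a trailing comma).
  const names = keywords
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  if (names.length === 0) {
    throw new Error('"keywords" must contain at least one keyword');
  }
  return names.map((keyword) => ({ keyword, number: count }));
}

// Example: three keywords, 20 images each.
console.log(parseDefaultModeInput("roses,daisy,hibiscus", "20"));
```

Trimming and dropping empty entries is a defensive choice in this sketch: it tolerates input such as "roses, daisy," without producing a blank keyword.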
File mode: usage
Required parameters:
- The name of the .json file in the base directory that defines each keyword and the number of images to scrape and download for it.
Structure of the .json file content
{
  "images": [
    {"keyword": "roses", "number": 20},
    {"keyword": "daisy", "number": 25},
    {"keyword": "hibiscus", "number": 30}
  ]
}
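The file-mode format can be checked before any scraping starts. The sketch below parses the file content and verifies that every entry in images has a non-empty string keyword and a positive integer number. loadKeywordFile is a hypothetical helper and its validation rules are assumptions, not behaviour documented for scrapix-cli. Note that JSON.parse rejects trailing commas, so the last entry of the images array must not be followed by a comma.

```typescript
// One keyword entry, as in the "images" array of the .json file.
interface ImageEntry {
  keyword: string;
  number: number;
}

// Hypothetical validator (not scrapix-cli's own code). The checks below
// are assumptions about what a well-formed file should contain.
function loadKeywordFile(content: string): ImageEntry[] {
  // JSON.parse throws a SyntaxError on invalid JSON, e.g. a trailing comma.
  const data: unknown = JSON.parse(content);
  if (typeof data !== "object" || data === null || !Array.isArray((data as { images?: unknown }).images)) {
    throw new Error('expected an object with an "images" array');
  }
  const images = (data as { images: unknown[] }).images;
  return images.map((item: unknown, i: number): ImageEntry => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`images[${i}] must be an object`);
    }
    const { keyword, number: count } = item as { keyword?: unknown; number?: unknown };
    if (typeof keyword !== "string" || keyword.trim() === "") {
      throw new Error(`images[${i}].keyword must be a non-empty string`);
    }
    if (typeof count !== "number" || count <= 0 || Math.floor(count) !== count) {
      throw new Error(`images[${i}].number must be a positive integer`);
    }
    return { keyword, number: count };
  });
}

const sample = `{
  "images": [
    {"keyword": "roses", "number": 20},
    {"keyword": "daisy", "number": 25},
    {"keyword": "hibiscus", "number": 30}
  ]
}`;
console.log(loadKeywordFile(sample));
```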
TODOS
- Image validation
- Support for custom image processing actions, e.g. resizing and compression
- Improve error handling
- Provide support for other search engines, e.g. Bing and Wikipedia
- Improve general performance
- Clean up the code and improve the documentation
Contributing Guide
Coming soon