aljazeera-crawler v1.0.3

aljazeera-crawler
aljazeera-crawler is a command-line application that helps crawl the https://www.aljazeera.net/ website.
Installation
Either install the tool globally so that it is available on your system path:

    npm install -g aljazeera-crawler

Or run it directly with the help of npx:

    npx aljazeera-crawler [options]

Usage
For CLI options, use the -h (or --help) argument:

    aljazeera-crawler -h

    Al Jazeera Crawler
    Usage: aljazeera-crawler options

    Options:
          --version    Show version number                        boolean
      -t, --threshold  the minimum number of words to be crawled  number
      -d, --domain     the domain to crawl                        string
                       [choices: "politics", "economy", "culture", "sport",
                       "art", "technology", "heritage"]
      -h, --help       Show help                                  boolean

Suppose we want to crawl a minimum of 100,000 words in the technology domain.
We can use either the short options:

    aljazeera-crawler -t 100000 -d technology

Or the long options:

    aljazeera-crawler --threshold 100000 --domain technology

After that, a file named output-technology-100000.txt will be created.
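
To check the result against the requested threshold, a short Node.js script can count the words in the output file. This is only a sketch: it assumes the output is plain UTF-8 text with whitespace-separated words, which this README does not confirm, so the tool's own count may differ. `countWords` is a hypothetical helper and not part of aljazeera-crawler.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of aljazeera-crawler):
// counts whitespace-separated tokens in a string.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// File name taken from the example run; adjust it for other runs.
const file = "output-technology-100000.txt";

if (fs.existsSync(file)) {
  // Assumption: the file is UTF-8 encoded plain text.
  const words = countWords(fs.readFileSync(file, "utf8"));
  console.log(`${file}: ${words} words`);
} else {
  console.log(`${file} not found; run the crawler first`);
}
```

Run it with `node` from the directory that contains the output file.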
Domains
The domains that can currently be crawled are:
| Category | Link |
|---|---|
| politics سياسة | https://www.aljazeera.net/news/politics/ |
| economy اقتصاد | https://www.aljazeera.net/news/ebusiness/ |
| culture ثقافة | https://www.aljazeera.net/news/cultureandart/ |
| sport رياضة | https://www.aljazeera.net/sport/ |
| art فن | https://www.aljazeera.net/news/arts/ |
| technology تكنولوجيا | https://www.aljazeera.net/news/scienceandtechnology/ |
| heritage تراث | https://www.aljazeera.net/turath/ |
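
To crawl every domain in the table in one go, the CLI can be invoked once per domain from a small Node.js script. The sketch below uses only the `-t` and `-d` options documented above. The `buildArgs` helper, the `--run` switch and the fixed threshold of 100000 are assumptions made for illustration. By default the script only prints the commands it would run.

```javascript
const { spawnSync } = require("child_process");

// Domain names as listed in the table and in the --domain choices.
const DOMAINS = [
  "politics", "economy", "culture", "sport",
  "art", "technology", "heritage",
];

// Hypothetical helper: builds the npx argument list for one domain.
function buildArgs(domain, threshold) {
  if (!DOMAINS.includes(domain)) {
    throw new Error(`unknown domain: ${domain}`);
  }
  if (!Number.isInteger(threshold) || threshold <= 0) {
    throw new Error(`threshold must be a positive integer: ${threshold}`);
  }
  return ["aljazeera-crawler", "-t", String(threshold), "-d", domain];
}

// Dry run unless the script itself is started with --run
// (an illustrative switch of this script, not a crawler option).
const dryRun = !process.argv.includes("--run");

for (const domain of DOMAINS) {
  const args = buildArgs(domain, 100000);
  if (dryRun) {
    console.log(`npx ${args.join(" ")}`);
  } else {
    // On Windows, npx is a .cmd shim and needs a shell to launch.
    spawnSync("npx", args, {
      stdio: "inherit",
      shell: process.platform === "win32",
    });
  }
}
```

Running the domains one after another keeps the load on the website modest. Going by the single example in the Usage section, each run should leave its own output file.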
Licence
MIT