0.1.1 • Published 3 years ago

baha-anime-crawler v0.1.1

License: MIT
Repository: GitHub

baha-anime-crawler

A friendly crawler for anime-related data, with built-in concurrency support and rate limiting.

Both a CLI tool and a library are provided.

It is useful if you want to analyze the popularity of anime on Bahamut Anime.

❯ bac -h
Usage: bac [options]

Options:
  -o, --output <dir>    output directory (default: "data")
  -l, --list <file>     crawled list file
  -d, --details <file>  crawled details file
  -e, --episodes        crawl episodes
  -r, --ratings         crawl ratings
  -c, --comments        crawl comments
  -i, --indent          json indent spaces
  -C, --concurrent      max concurrent requests
  -I, --interval        rate limit window interval
  -L, --limit           rate limit max requests
  -h, --help            display help for command
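
The -C, -I and -L options describe two separate limits: a cap on requests in flight, and a cap on how many requests may start within a time window. The TypeScript sketch below shows one way such a pair of limits can be combined. It is not the package's implementation: RequestGate, tryStart and finish are hypothetical names, and treating the interval as milliseconds is an assumption, since the help text does not state a unit.

```typescript
// Hypothetical sketch: RequestGate is NOT part of baha-anime-crawler.
// It combines a concurrency cap (like -C) with a sliding-window rate
// limit (like -I and -L). The millisecond unit is an assumption.
class RequestGate {
  private active = 0;            // requests currently in flight
  private starts: number[] = []; // start times still inside the window
  private readonly maxConcurrent: number;
  private readonly intervalMs: number;
  private readonly limit: number;

  constructor(maxConcurrent: number, intervalMs: number, limit: number) {
    this.maxConcurrent = maxConcurrent;
    this.intervalMs = intervalMs;
    this.limit = limit;
  }

  // Returns true and records the request if both limits allow it at time `now`.
  tryStart(now: number): boolean {
    // Forget start times that have left the window.
    this.starts = this.starts.filter((t) => now - t < this.intervalMs);
    if (this.active >= this.maxConcurrent) return false;
    if (this.starts.length >= this.limit) return false;
    this.active += 1;
    this.starts.push(now);
    return true;
  }

  // Call when a request completes, freeing one concurrency slot.
  finish(): void {
    if (this.active > 0) this.active -= 1;
  }
}

// At most 2 requests in flight, at most 3 starts per 1000 ms window.
const gate = new RequestGate(2, 1000, 3);
console.log(gate.tryStart(0));    // true
console.log(gate.tryStart(10));   // true
console.log(gate.tryStart(20));   // false: two requests are in flight
gate.finish();
console.log(gate.tryStart(30));   // true: third start inside the window
gate.finish();
console.log(gate.tryStart(40));   // false: window already holds 3 starts
console.log(gate.tryStart(1000)); // true: the start at t=0 has expired
```

Passing the clock value in as an argument keeps the sketch deterministic. A real crawler would more likely read the current time itself and queue requests that cannot start yet instead of rejecting them.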

Install

pnpm, npm, yarn

pnpm i -g baha-anime-crawler

Build from source

git clone https://github.com/JacobLinCool/baha-anime-crawler.git
cd baha-anime-crawler
pnpm i && pnpm build && pnpm link -g

Dependency Between Stages

There are 3 main stages in this crawler.

  1. Meta list (ListCrawler)
  2. Anime Details (DetailCrawler)
  3. Extra Things (EpisodeCrawler, RatingCrawler, CommentCrawler)

By default, only ListCrawler and DetailCrawler are executed. To crawl the extra data, pass the corresponding options: -e for episodes, -r for ratings, and -c for comments.

                  ListCrawler
                    ↓
 CommentCrawler ← DetailCrawler → RatingCrawler
                    ↓
                  EpisodeCrawler
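
The dependency diagram above can be expressed as a small lookup table. The following TypeScript sketch resolves which stages run, and in what order, for a given set of options. The crawler names come from this README, but Stage, Flags, dependsOn and planStages are hypothetical names chosen for illustration and are not the library's API.

```typescript
// Hypothetical sketch: these types and functions are NOT exported by
// baha-anime-crawler; only the crawler names come from the README.
type Stage =
  | "ListCrawler"
  | "DetailCrawler"
  | "EpisodeCrawler"
  | "RatingCrawler"
  | "CommentCrawler";

interface Flags {
  episodes?: boolean; // corresponds to -e, --episodes
  ratings?: boolean;  // corresponds to -r, --ratings
  comments?: boolean; // corresponds to -c, --comments
}

// Each stage maps to the stage it depends on, mirroring the diagram.
const dependsOn: Record<Stage, Stage | null> = {
  ListCrawler: null,
  DetailCrawler: "ListCrawler",
  EpisodeCrawler: "DetailCrawler",
  RatingCrawler: "DetailCrawler",
  CommentCrawler: "DetailCrawler",
};

// Returns the stages to run, ordered so each one follows its dependency.
function planStages(flags: Flags = {}): Stage[] {
  // The two default stages always run.
  const wanted: Stage[] = ["ListCrawler", "DetailCrawler"];
  if (flags.episodes) wanted.push("EpisodeCrawler");
  if (flags.ratings) wanted.push("RatingCrawler");
  if (flags.comments) wanted.push("CommentCrawler");

  const ordered: Stage[] = [];
  const visit = (stage: Stage): void => {
    if (ordered.includes(stage)) return;
    const parent = dependsOn[stage];
    if (parent !== null) visit(parent); // schedule the dependency first
    ordered.push(stage);
  };
  wanted.forEach(visit);
  return ordered;
}

console.log(planStages({ ratings: true }).join(" -> "));
// prints "ListCrawler -> DetailCrawler -> RatingCrawler"
```

The three extra stages depend only on DetailCrawler and not on each other, so in this model they could run in any order once the details are available.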