# javadocs-scraper (v1.4.0)
A TypeScript library that scrapes information about Java objects from a Javadocs website.
Specifically, it scrapes data (name, description, URL, etc.) about, and links together:
- Packages
- Classes
- Interfaces
- Object Type Parameters (Object Generics), on classes and interfaces
- Enums
- Annotations
- Fields
- Methods
Some extra data, like method and field inheritance, is also calculated after scraping.
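Conceptually, inherited members can be derived by walking a class's ancestor chain. The sketch below illustrates the idea with simplified, assumed types (`ClassInfo` and `allMethods` are illustrative only, not this library's API):

```typescript
// Illustrative sketch: a class's effective methods are its own methods
// plus those of every ancestor. These types are NOT the library's real ones.
interface ClassInfo {
  name: string;
  methods: string[];
  parent?: ClassInfo;
}

function allMethods(cls: ClassInfo): string[] {
  // Own methods first, then recurse up the ancestor chain.
  return cls.parent
    ? [...cls.methods, ...allMethods(cls.parent)]
    : [...cls.methods];
}

const base: ClassInfo = { name: 'Base', methods: ['close'] };
const child: ClassInfo = { name: 'Child', methods: ['open'], parent: base };
console.log(allMethods(child)); // [ 'open', 'close' ]
```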
> [!CAUTION]
> Tested with Javadocs generated from Java 7 to Java 21. I cannot guarantee this will work with older or newer versions.
## Installation and Usage
- Install with your preferred package manager:

```bash
npm install javadocs-scraper
# or
yarn add javadocs-scraper
# or
pnpm add javadocs-scraper
```

- Instantiate a `Scraper`:

```ts
import { Scraper } from 'javadocs-scraper';

const scraper = Scraper.fromURL('https://...');
```

> [!NOTE]
> This package uses constructor dependency injection for every class. You can also instantiate `Scraper` with the `new` keyword, but you'll need to specify every dependency manually. The easier way is to use the `Scraper.fromURL()` method, which uses the default implementations.

> [!TIP]
> Alternatively, you can provide your own `Fetcher` to fetch data from the Javadocs:
>
> ```ts
> import type { Fetcher } from 'javadocs-scraper';
>
> class MyFetcher implements Fetcher { /** ... */ }
>
> const myFetcher = new MyFetcher('https://...');
> const scraper = Scraper.with({ fetcher: myFetcher });
> ```
- Use the `Scraper` to scrape information:

```ts
const javadocs: Javadocs = await scraper.scrape();

/** for example */
const myInterface = javadocs.getInterface('org.example.Interface');
```

> [!TIP]
> The `Javadocs` object uses discord.js' `Collection` class to store all the scraped data. This is an extension of `Map` with utility methods, like `find()`, `reduce()`, etc.
>
> These collections are also typed as mutable, so any modification will be reflected in the backing `Javadocs`. This is by design, since the library no longer uses this object once it's given to you, and doesn't care what you then do with it.
>
> Check the discord.js guide or the `Collection` docs for more info.
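Because `Collection` extends `Map`, lookups and iteration helpers compose naturally. A minimal, self-contained stand-in is sketched below (hand-rolled here so it runs without discord.js; the real `Collection` has many more utilities, and `MethodData` is a hypothetical shape, not this library's type):

```typescript
// Hand-rolled stand-in for discord.js' Collection: a Map with a find() helper.
// The real Collection also provides reduce(), map(), filter(), and more.
class MiniCollection<K, V> extends Map<K, V> {
  find(fn: (value: V, key: K) => boolean): V | undefined {
    for (const [key, value] of this) {
      if (fn(value, key)) return value;
    }
    return undefined;
  }
}

// Hypothetical shape of scraped method data, for illustration only.
interface MethodData {
  name: string;
  deprecated: boolean;
}

const methods = new MiniCollection<string, MethodData>([
  ['toString', { name: 'toString', deprecated: false }],
  ['finalize', { name: 'finalize', deprecated: true }],
]);

// Map behavior (get, size, iteration) and the extra helpers combine freely.
const firstDeprecated = methods.find((m) => m.deprecated);
console.log(firstDeprecated?.name); // finalize
```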
## Warnings

- Make sure not to spam a Javadocs website. It's your responsibility to not abuse the library, and to implement appropriate safeguards against abuse, like a cache.
- The `scrape()` method will take a while to scrape the entire website. Make sure to only run it when necessary, ideally only once in the entire program's lifecycle.
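One simple way to honor both warnings is to memoize the scrape result so the expensive call runs at most once per program lifecycle. A sketch, where `once` is a hypothetical helper (not part of this library):

```typescript
// Hypothetical helper: wrap an async factory so it executes at most once
// and every caller shares the same promise/result.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= factory());
}

// With the real library this would look like:
//   const getJavadocs = once(() => scraper.scrape());
// Demonstrated here with a stand-in async producer:
let calls = 0;
const getData = once(async () => {
  calls += 1;
  return { packages: ['org.example'] };
});

async function demo() {
  const a = await getData();
  const b = await getData();
  console.log(calls, a === b); // the factory ran only once; same object back
}
void demo();
```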
## Specifics

There are distinct types of objects that hold the library together:

- A `Fetcher`¹, which makes requests to the Javadocs website.
- Entities², which represent a scraped object.
- `QueryStrategies`¹, which query the website through cheerio. Needed since HTML classes and ids change between Javadoc versions.
- `Scrapers`¹, which scrape information from a given URL or cheerio object into a partial object.
- Partials², which represent a partially scraped object, that is, an object without circular references to other objects.
- A `ScraperCache`, which caches partial objects in memory.
- `Patchers`¹, which patch partials into full entities by linking them together.
- `Javadocs`, which is the final result of the scraping process.

¹ Replaceable via constructor injection.
² Only a type, not available at runtime.
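The split between partials and patchers is what makes circular references tractable: partials store only names, so they can be created in any order, and a patch pass then resolves those names into real object references. A self-contained sketch with simplified, assumed types (none of these names are the library's actual internals):

```typescript
// Illustrative types: a partial carries a superclass *name*, while a
// patched entity carries a reference to the actual superclass object.
interface PartialClass {
  name: string;
  superclassName?: string;
}
interface PatchedClass {
  name: string;
  superclass?: PatchedClass;
}

function patch(cache: Map<string, PartialClass>): Map<string, PatchedClass> {
  const out = new Map<string, PatchedClass>();
  // First pass: create one patched object per cached partial.
  for (const [name] of cache) out.set(name, { name });
  // Second pass: resolve names into references; order no longer matters,
  // and cycles are safe because every object already exists.
  for (const [name, partial] of cache) {
    if (partial.superclassName) {
      out.get(name)!.superclass = out.get(partial.superclassName);
    }
  }
  return out;
}

const cache = new Map<string, PartialClass>([
  ['org.example.Child', { name: 'org.example.Child', superclassName: 'org.example.Base' }],
  ['org.example.Base', { name: 'org.example.Base' }],
]);
const entities = patch(cache);
console.log(entities.get('org.example.Child')?.superclass?.name); // org.example.Base
```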
The scraping process occurs in the following steps:
1. A `QueryStrategy` is chosen by the `QueryStrategyFactory`.
2. The `RootScraper` iterates through every package in the Javadocs root.
3. Every package is fetched and passed to the `PackageScraper`.
4. The `PackageScraper` iterates through every class, interface, enum and annotation in the package, and passes them to the appropriate `Scraper`.
5. Each scraper creates a partial object and caches it in the `ScraperCache`.
6. Once everything is done, the `Scraper` uses the `Patchers` to patch the partial objects together, by passing the cache to each patcher.
7. The `Scraper` returns the patched objects in a `Javadocs` object.
> [!TIP]
> You can provide your own `QueryStrategyFactory` to change the way the `QueryStrategy` is chosen:
>
> ```ts
> import { OnlineFetcher } from 'javadocs-scraper';
>
> const myFetcher = new OnlineFetcher('https://...');
> const factory = new MyQueryStrategyFactory();
> const scraper = Scraper.with({
>   fetcher: myFetcher,
>   queryStrategyFactory: factory,
> });
> ```