mugshots-client v1.1.1
About
Unofficial Node.js client for mugshots.com. Exposes both a Readable Stream and an Async Iterator API for streaming Mugshot objects. 🚔👮
Usage
Install
npm i mugshots-client --save
Import
TypeScript
import { MugshotStream, Mugshot } from 'mugshots-client';
JavaScript (CommonJS)
const { MugshotStream } = require('mugshots-client');
API
Readable Stream API
import { MugshotStream, Mugshot } from 'mugshots-client';
(async () => {
  const mugshotStream = await MugshotStream({ maxChunkSize: 10 });
  console.log('Stream created.');
  mugshotStream.on('error', (error) => {
    console.log(error);
  });
  mugshotStream.on('close', () => {
    console.log('Stream closed.');
  });
  mugshotStream.on('data', (mugshots: Mugshot[]) => {
    console.log('data', mugshots);
  });
})();
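Node's object-mode Readable streams are also async iterables, so the same stream can be consumed with for await...of instead of 'data' listeners. A minimal self-contained sketch, using a plain Readable built with stream.Readable.from as a stand-in for MugshotStream:

```typescript
import { Readable } from 'node:stream';

// Stand-in for MugshotStream: any object-mode Readable behaves the same way.
// The records here are illustrative, not real data.
const mugshotStream = Readable.from([
  [{ name: 'John Doe', county: 'Example County' }],
  [{ name: 'Jane Roe', county: 'Example County' }],
]);

(async () => {
  // Each iteration yields one chunk, i.e. one Mugshot[] batch.
  for await (const mugshots of mugshotStream) {
    console.log('data', mugshots);
  }
})();
```

This avoids juggling 'data'/'error'/'close' listeners by hand; errors surface as rejections of the for await loop and can be handled with an ordinary try/catch.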
Async Iterator API
import * as puppeteer from 'puppeteer';
import {
CountyIterable,
MugshotUrlChunkIterable,
scrapeMugshots,
PagePool,
Mugshot
} from 'mugshots-client';
(async () => {
  const browser = await puppeteer.launch();
  const pagePool = PagePool(browser, { max: 10 });
  const page = await pagePool.acquire();
  const counties = await CountyIterable(page);
  for await (const county of counties) {
    const mugshotUrls = await MugshotUrlChunkIterable(page, county);
    for await (const chunk of mugshotUrls) {
      const mugshots = await scrapeMugshots(pagePool, chunk, { maxChunkSize: 20 });
      console.log(mugshots);
    }
  }
})();
Docs
MugshotStream
PagePool
CountyIterable
MugshotUrlChunkIterable
scrapeMugshot
scrapeMugshots
FAQ
Why'd you make this? Isn't www.mugshots.com immoral?
My goals are to:
1. Subvert mugshots.com by making the watermarked records it re-publishes from the public domain freely available for anyone to use
2. Bring attention to the moral implications of open records on the internet (more on NPR's Planet Money podcast, Episode 878: Mugshots For Sale)
3. Use this library for inequality and social justice research
Why'd you use Puppeteer? Isn't cheerio faster, and doesn't it use fewer resources?
I chose Puppeteer to provide a path toward obscuring scraping, and to future-proof this software against censorship or TOS changes.
Here is an article on making headless Chrome undetectable. My goal is to provide an API for building a scraper that is very hard to detect: manipulating the Chrome browser's behavior and properties to mimic a human user's browser leaves a site little signal with which to distinguish scraping from ordinary traffic.
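One well-known fingerprint is navigator.webdriver, which headless Chrome reports as true. A hedged sketch of masking it: the patch below is plain DOM-style code, and with puppeteer it would be installed via page.evaluateOnNewDocument (a real puppeteer API) so it runs before any page script. The hideWebdriver name is illustrative, not part of this library's API.

```typescript
// Hardening patch to install in the page context before any page script runs
// (with puppeteer, pass a self-contained function to page.evaluateOnNewDocument).
// Headless Chrome reports navigator.webdriver === true; this masks it.
const hideWebdriver = (nav: { webdriver?: boolean }) => {
  Object.defineProperty(nav, 'webdriver', { get: () => undefined });
  return nav;
};

// In the browser this would be called as hideWebdriver(navigator);
// here we demonstrate it on a plain object.
console.log(hideWebdriver({ webdriver: true }).webdriver); // undefined
```

Masking this one property is only a start; a realistic user agent, plugins, and timing behavior would also need attention for a scraper to blend in.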