mugshots-client

v1.1.1 • Published 5 years ago • License: GPL-3.0

About

Unofficial Node.js client for mugshots.com. Exposes both a Readable Stream and an Async Iterator API for streaming Mugshot objects. 🚔👮

Usage

Install

npm install mugshots-client

Import

TypeScript

import { MugshotStream, Mugshot } from 'mugshots-client';

JavaScript (CommonJS)

const { MugshotStream } = require('mugshots-client');

API

Readable Stream API

import { MugshotStream, Mugshot } from 'mugshots-client';

(async () => {
  const mugshotStream = await MugshotStream({ maxChunkSize: 10 });
  console.log('Stream created.');

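  // Log any error emitted by the stream.
  mugshotStream.on('error', (error) => {
    console.error(error);
  });

  // Fires once the stream has closed.
  mugshotStream.on('close', () => {
    console.log('Stream closed.');
  });

  // Each 'data' event delivers an array of Mugshot objects.
  mugshotStream.on('data', (mugshots: Mugshot[]) => {
    console.log('data', mugshots);
  });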
})();
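Assuming MugshotStream resolves to a standard Node.js Readable (the About section calls it a Readable Stream), its chunks could also be consumed with for await, because Node.js Readable streams implement Symbol.asyncIterator. The sketch below uses a stand-in stream built with Readable.from() so it runs without network access; MugshotLike and its fields are hypothetical placeholders, not the library's real Mugshot type.

```typescript
import { Readable } from 'node:stream';

// Hypothetical placeholder for the library's Mugshot type; real fields may differ.
interface MugshotLike {
  name: string;
  county: string;
}

// Collect every chunk from an object-mode Readable with `for await`.
// Works for any standard Node.js Readable, since Readable implements
// Symbol.asyncIterator.
async function collectChunks<T>(stream: Readable): Promise<T[][]> {
  const chunks: T[][] = [];
  for await (const chunk of stream) {
    chunks.push(chunk as T[]);
  }
  return chunks;
}

// Stand-in for a MugshotStream: Readable.from() builds an object-mode stream
// that emits each array below as a single chunk (the arrays are not flattened).
const fakeStream = Readable.from([
  [{ name: 'A', county: 'X' }, { name: 'B', county: 'X' }],
  [{ name: 'C', county: 'Y' }],
]);

collectChunks<MugshotLike>(fakeStream).then((chunks) => {
  console.log(`received ${chunks.length} chunks`); // received 2 chunks
});
```

Pick one consumption style per stream (either 'data' listeners or for await); the Node.js stream documentation advises against mixing them on the same stream.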

Async Iterator API

import * as puppeteer from 'puppeteer';
import {
  CountyIterable,
  MugshotUrlChunkIterable,
  scrapeMugshots,
  PagePool,
  Mugshot
} from 'mugshots-client';

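(async () => {
  // Launch headless Chrome and create a page pool (max: 10).
  const browser = await puppeteer.launch();
  const pagePool = PagePool(browser, { max: 10 });
  const page = await pagePool.acquire();

  // Walk every county, then every chunk of mugshot URLs within that county.
  const counties = await CountyIterable(page);
  for await (const county of counties) {
    const mugshotUrls = await MugshotUrlChunkIterable(page, county);
    for await (const chunk of mugshotUrls) {
      // Scrape the mugshots for this chunk of URLs using pages from the pool.
      const mugshots = await scrapeMugshots(pagePool, chunk, { maxChunkSize: 20 });
      console.log(mugshots);
    }
  }

  // Close the browser (and every page it opened) so the process can exit.
  await browser.close();
})();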
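The names MugshotUrlChunkIterable and maxChunkSize suggest that URLs are processed in fixed-size batches. A minimal sketch of that batching pattern as a generic async generator follows; chunked is a hypothetical helper, not part of mugshots-client, and the library's internals may work differently.

```typescript
// Hypothetical helper, not part of mugshots-client: groups items from any
// sync or async iterable into arrays of at most `size` elements.
async function* chunked<T>(
  source: AsyncIterable<T> | Iterable<T>,
  size: number,
): AsyncGenerator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Emit the final, possibly shorter, batch.
  if (batch.length > 0) {
    yield batch;
  }
}

// Example: 5 placeholder URLs in batches of 2 gives sizes 2, 2, 1.
async function demoChunked(): Promise<number[]> {
  const urls = ['u1', 'u2', 'u3', 'u4', 'u5'];
  const sizes: number[] = [];
  for await (const batch of chunked(urls, 2)) {
    sizes.push(batch.length);
  }
  return sizes;
}

demoChunked().then((sizes) => console.log(sizes)); // [ 2, 2, 1 ]
```

Because the generator yields lazily, only one batch is held in memory at a time, which fits a streaming scraper that walks many counties.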

Docs

MugshotStream
PagePool
CountyIterable
MugshotUrlIterable
scrapeMugshot
scrapeMugshots
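PagePool(browser, { max: 10 }) in the Async Iterator example presumably caps how many browser pages are open at once. A minimal sketch of that idea as a generic bounded pool, assuming acquire/release semantics; BoundedPool and its methods are hypothetical and not the library's PagePool API.

```typescript
// Hypothetical sketch, not mugshots-client's PagePool: creates at most `max`
// resources and queues acquire() calls until one is released.
class BoundedPool<T> {
  private idle: T[] = [];
  private created = 0;
  private waiters: Array<(resource: T) => void> = [];

  constructor(
    private readonly factory: () => Promise<T>,
    private readonly max: number,
  ) {}

  async acquire(): Promise<T> {
    const reused = this.idle.pop();
    if (reused !== undefined) return reused;
    if (this.created < this.max) {
      this.created += 1;
      try {
        return await this.factory();
      } catch (err) {
        this.created -= 1; // free the slot if creation failed
        throw err;
      }
    }
    // At capacity: wait until release() hands over a resource.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(resource: T): void {
    const next = this.waiters.shift();
    if (next) {
      next(resource); // hand directly to the oldest waiter
    } else {
      this.idle.push(resource);
    }
  }
}

async function demoPool(): Promise<string[]> {
  let n = 0;
  const pool = new BoundedPool(async () => `page-${++n}`, 2);
  const a = await pool.acquire();
  const b = await pool.acquire();
  const third = pool.acquire(); // waits: both pages are in use
  pool.release(a);
  const c = await third; // receives the released page
  return [a, b, c];
}

demoPool().then((ids) => console.log(ids)); // [ 'page-1', 'page-2', 'page-1' ]
```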

FAQ

Why'd you make this? Isn't www.mugshots.com immoral?

My goals are to:

1. Subvert mugshots.com by making the watermarked records it re-publishes from the public domain freely available for anyone to use.
2. Bring attention to the moral implications of open records on the internet. More on this in NPR's Planet Money podcast, Episode 878: Mugshots For Sale.
3. Use this library for inequality and social justice research.

Why'd you use Puppeteer? Isn't cheerio faster, and doesn't it use fewer resources?

I chose Puppeteer because it offers a path toward obscuring the scraping, which future-proofs this software against censorship or Terms of Service changes.

Here is an article on making headless Chrome undetectable. My goal is to provide an API for building an undetectable scraper: if we manipulate the Chrome browser's behavior and properties to mimic a human user's browser, scraping becomes very hard to detect.

Versions

1.1.1, 1.1.0, 1.0.4, 1.0.3, 1.0.1, 1.0.0 (all published 5 years ago)