postal-code-scraper v1.0.3

Postal Code Scraper

📌 Overview

Postal Code Scraper is an automated web scraper that extracts postal code data for countries worldwide and writes it to structured JSON files for use in applications.

The library uses Puppeteer for web scraping, Cheerio for HTML parsing, and p-limit for controlling concurrency.

🚀 Features

  • Scrape postal codes from any country
  • Scrape all countries in one go
  • Save results as JSON files for easy integration
  • Configurable settings (concurrency, retries, headless mode, etc.); see Configuration Options below
  • Structured postal code lookup generation
  • Fully asynchronous for optimized performance

📦 Installation

Install via npm:

npm install postal-code-scraper

Or with Yarn:

yarn add postal-code-scraper

📖 Usage Guide

1️⃣ Import the Library

ES Module (Recommended):

import { PostalCodeScraper } from "postal-code-scraper";

CommonJS:

const { PostalCodeScraper } = require("postal-code-scraper");

2️⃣ Scrape a Single Country

async function scrapeSingleCountry() {
    await PostalCodeScraper.scrapeCountry("Canada");
}

scrapeSingleCountry();

📌 Output files (saved in src/data by default):

  • Canada-postal-codes.json
  • Canada-lookup.json

3️⃣ Scrape All Countries

async function scrapeAllCountries() {
    await PostalCodeScraper.scrapeCountries();
}

scrapeAllCountries();

📌 This will fetch postal codes for every available country.

4️⃣ Customize Scraper Configuration

const customScraper = new PostalCodeScraper({
    concurrency: 10,       // Limit concurrent requests
    maxRetries: 3,         // Retry a failed request up to 3 times so data is not lost
    headless: false,       // Run Puppeteer with a visible browser window
    usePrettyName: true,   // Store data using country pretty names
    logger: console,       // Log to the console (the default is the library's own logger)
    directory: 'src/data'  // Folder where the output files are saved
});

async function run() {
    await customScraper.scrapeCountry("Germany");
}

run();

📁 Output Data Format

🔹 romania-postal-codes.json

{
  "cluj": {
    "agarbiciu": [
      "407146"
    ],
    "aghiresu": [
      "407005"
    ],
    "cluj-napoca": [
      "400001",
      "400002",
      "400003",
      "..."
    ]
  }
}
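
The nesting above (region, then city, then an array of codes) can be read with ordinary object access. The following is a minimal sketch, not part of the library: `getCodes` and `countCodes` are hypothetical helper names, and the inline sample stands in for a file you would normally load with `JSON.parse(fs.readFileSync(path, "utf8"))`.

```javascript
// Sample shaped like romania-postal-codes.json: region -> city -> array of codes.
const postalCodes = {
  cluj: {
    agarbiciu: ["407146"],
    aghiresu: ["407005"],
    "cluj-napoca": ["400001", "400002", "400003"],
  },
};

// Hypothetical helper: return the codes for one city,
// or an empty array when the region or city is not present.
function getCodes(data, region, city) {
  return data[region]?.[city] ?? [];
}

// Hypothetical helper: count every postal code in the file.
function countCodes(data) {
  return Object.values(data)
    .flatMap((cities) => Object.values(cities))
    .reduce((total, codes) => total + codes.length, 0);
}

console.log(getCodes(postalCodes, "cluj", "cluj-napoca"));
console.log(countCodes(postalCodes)); // 5
```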

🔹 romania-lookup.json

{
  "postalCodeMap": {
    "337563": "tamasesti_2",
    "337564": "valea_4",
    "400001": "cluj-napoca_1",
    "400002": "cluj-napoca_1",
    "400003": "cluj-napoca_1"
  },
  "regions": {
    "cluj-napoca_1": [
      "cluj",
      "cluj-napoca"
    ],
    "tamasesti_2": [
      "hunedoara",
      "tamasesti"
    ],
    "valea_4": [
      "hunedoara",
      "valea"
    ]
  }
}
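
Resolving a postal code with this file takes two steps: `postalCodeMap` gives a region key, and `regions` maps that key to its path. Here is a minimal sketch of that lookup, assuming the structure shown above. `resolvePostalCode` is a hypothetical helper, not a function exported by the package, and the inline object is a trimmed copy of the sample.

```javascript
// Trimmed sample shaped like romania-lookup.json.
const lookup = {
  postalCodeMap: {
    "337563": "tamasesti_2",
    "400001": "cluj-napoca_1",
  },
  regions: {
    "cluj-napoca_1": ["cluj", "cluj-napoca"],
    "tamasesti_2": ["hunedoara", "tamasesti"],
  },
};

// Hypothetical helper: return the region path for a postal code,
// or null when the code (or its region key) is unknown.
function resolvePostalCode(lookupData, code) {
  const regionKey = lookupData.postalCodeMap[code];
  if (regionKey === undefined) return null;
  return lookupData.regions[regionKey] ?? null;
}

console.log(resolvePostalCode(lookup, "400001")); // [ 'cluj', 'cluj-napoca' ]
console.log(resolvePostalCode(lookup, "999999")); // null
```

In the Romanian sample each path has two entries. Other countries may differ, so the sketch returns the whole array rather than assuming a fixed depth.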

🛠 Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| directory | string | src/data | The directory to save data |
| concurrency | number | 15 | Maximum concurrent requests to process |
| maxRetries | number | 5 | Number of retries for failed requests |
| headless | boolean | true | Run Puppeteer in headless mode |
| usePrettyName | boolean | false | Use country pretty names instead of default names |
| logger | object or null | Logger (custom implementation) | Handles event logging; set to null to disable logging |

ā“ FAQs

1. Where are the postal code files stored?

By default, they are saved in:

src/data/

Each country has two JSON files: one with raw postal codes and another with a structured lookup.

2. Can I scrape multiple countries at once?

Yes, using scrapeCountries(), which scrapes all countries automatically.
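
To scrape only a chosen subset of countries, one option is to call `scrapeCountry()` in a loop. The sketch below assumes only what the usage examples show: a scraper object with an async `scrapeCountry(name)` method. `scrapeSelected` is a hypothetical helper, and whether `scrapeCountry()` rejects on failure is an assumption.

```javascript
// Hypothetical helper: scrape the given countries one after another.
// `scraper` is any object exposing an async scrapeCountry(name) method.
// Returns the countries whose scrape threw, with the error message.
async function scrapeSelected(scraper, countries) {
  const failed = [];
  for (const country of countries) {
    try {
      await scraper.scrapeCountry(country);
    } catch (error) {
      // Assumption: a failed scrape surfaces as a rejected promise.
      failed.push({ country, message: String(error) });
    }
  }
  return failed;
}
```

With a configured instance such as `customScraper` from the usage guide, this would be called as `await scrapeSelected(customScraper, ["Canada", "Germany"])`. The countries run sequentially because each `scrapeCountry()` call already applies its own `concurrency` limit.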

3. Can I change the output directory?

Yes, by setting the directory option in the configuration.

4. Does this package work with TypeScript?

Yes! The package includes TypeScript types for better development experience.

5. How can I turn off logging?

Set the logger option in the configuration to null.

🏗 Future Enhancements

  • ✅ Support for exporting data as CSV
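
In the meantime, the nested postal-codes JSON can be flattened to CSV in a few lines. This is a rough sketch rather than the planned feature: `toCsv` and the column names `region`, `city`, and `postal_code` are hypothetical choices, and the input is assumed to follow the region, city, codes layout shown under Output Data Format.

```javascript
// Hypothetical helper: convert region -> city -> codes into CSV text.
function toCsv(data) {
  // Quote every field and double any embedded quotes (RFC 4180 style).
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;

  const rows = [["region", "city", "postal_code"]];
  for (const [region, cities] of Object.entries(data)) {
    for (const [city, codes] of Object.entries(cities)) {
      for (const code of codes) {
        rows.push([region, city, code]);
      }
    }
  }
  return rows.map((row) => row.map(quote).join(",")).join("\n");
}

const sample = { cluj: { agarbiciu: ["407146"], aghiresu: ["407005"] } };
console.log(toCsv(sample));
```

Quoting every field keeps the codes as strings in the file itself, which matters for countries whose postal codes start with a zero.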

🤝 Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue.

📜 License

MIT License © 2024