postal-code-scraper v1.0.3
Postal Code Scraper
š Overview
Postal Code Scraper is an automated web scraper designed to extract postal code data from countries worldwide. It efficiently fetches postal codes and organizes them into structured JSON files for easy use in applications.
This library uses Puppeteer for web scraping, Cheerio for HTML parsing, p-limit for controlling concurrency, ensuring accurate and efficient data extraction.
š Features
- Scrape postal codes from any country
- Scrape all countries in one go
- Save results as JSON files for easy integration
- Configurable settings (concurrency, retries, headless mode, etc.) <- read more below
- Structured postal code lookup generation
- Fully asynchronous for optimized performance
š¦ Installation
Install via npm:
npm install postal-code-scraper
Or with Yarn:
yarn add postal-code-scraper
š Usage Guide
1ļøā£ Import the Library
ES Module (Recommended):
import { PostalCodeScraper } from "postal-code-scraper";
CommonJS:
const { PostalCodeScraper } = require("postal-code-scraper");
2ļøā£ Scrape a Single Country
async function scrapeSingleCountry() {
await PostalCodeScraper.scrapeCountry("Canada");
}
scrapeSingleCountry();
š Output Files (saved in ):
Canada-postal-codes.json
Canada-lookup.json
3ļøā£ Scrape All Countries
async function scrapeAllCountries() {
await PostalCodeScraper.scrapeCountries();
}
scrapeAllCountries();
š This will fetch postal codes for every available country.
4ļøā£ Customize Scraper Configuration
const customScraper = new PostalCodeScraper({
concurrency: 10, // Limit concurrent requests
maxRetries: 3, // Max retries per request (if a request fails -> so we don't lose data)
headless: false, // Run Puppeteer in visible mode
usePrettyName: true, // Store data using country pretty names
logger: console // Enable console logging (default is own implemented)
directory: 'src/data' // Choose the folder where you want to save the data
});
async function run() {
await customScraper.scrapeCountry("Germany");
}
run();
š Output Data Format
š¹ romania-postal-codes.json
{
"cluj": {
"agarbiciu": [
"407146"
],
"aghiresu": [
"407005"
],
"cluj-napoca": [
"400001",
"400002",
"400003",
"...",
],
}
š¹ romania-lookup.json
{
"postalCodeMap": {
"337563": "tamasesti_2",
"337564": "valea_4",
"400001": "cluj-napoca_1",
"400002": "cluj-napoca_1",
"400003": "cluj-napoca_1",
},
"regions": {
"cluj-napoca_1": [
"cluj",
"cluj-napoca"
],
"tamasesti_2": [
"hunedoara",
"tamasesti"
],
"valea_4": [
"hunedoara",
"valea"
],
}
}
š Configuration Options
Option | Type | Default | Description |
---|---|---|---|
directory | string | src/data | The directory to save data |
concurrency | number | 15 | Maximum concurrent requests to process |
maxRetries | number | 5 | Number of retries for failed requests |
headless | boolean | true | Run Puppeteer in headless mode |
usePrettyName | boolean | false | Use country pretty names instead of default names |
logger | object null | Logger (custom implementation) | Handles event logging, can be set to null to disable logging |
ā FAQs
1. Where are the postal code files stored?
By default, they are saved in:
src/data/
Each country has two JSON files: one with raw postal codes and another with a structured lookup.
2. Can I scrape multiple countries at once?
Yes, using scrapeCountries()
, which scrapes all countries automatically.
3. Can I change the output directory?
Yes, by changing the directory
attribute in configuration.
4. Does this package work with TypeScript?
Yes! The package includes TypeScript types for better development experience.
5. How can I turn off logging?
You, by setting the logger
attribute in configuration to null
.
š Future Enhancements
- ā Support for exporting data as CSV
š¤ Contributing
Contributions are welcome! Feel free to submit a pull request or open an issue.
š License
MIT License Ā© 2024