sparse-geonames-ingest v1.0.0

sparse-geonames-ingest

scripts to generate sparse p2p archives for geonames data

to search the data generated by this tool, use sparse-geonames-search

Intermediate results from the load and sort steps are written to disk to save memory.

This tool can build an archive from allCountries.txt, a 1.5GB file (as of 2022) containing all geonames data. On my laptop this took 38 minutes and used several gigabytes of RAM. The smaller cities500.txt takes 10 seconds.

usage

usage: sparse-geonames-ingest COMMAND {OPTIONS}

sparse-geonames-ingest ingest {-i INFILE} -o OUTDIR

  Load, sort, and build in one command.

sparse-geonames-ingest load {-i INFILE} -o OUTDIR

  Read geonames newline-delimited text file from INFILE or "-" (stdin),
  writing output to OUTDIR.

sparse-geonames-ingest sort -o OUTDIR

  Sort loaded data from OUTDIR, writing results to OUTDIR.

sparse-geonames-ingest build -o OUTDIR (--size=SIZE)

  Build sorted data from OUTDIR, writing results to OUTDIR.
  Each file in the output is at most SIZE bytes.
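
To make the SIZE limit concrete, here is a minimal sketch of one way records could be grouped so that no output file exceeds a byte budget. It is an illustration only: `packBySize` is a hypothetical helper, and the greedy strategy is an assumption, not necessarily how the build step works internally.

```javascript
// Hypothetical helper, not part of sparse-geonames-ingest.
// Greedily groups records (strings) into chunks whose total
// UTF-8 byte length is at most `size`, preserving input order.
function packBySize (records, size) {
  var chunks = []
  var current = []
  var currentBytes = 0
  for (var i = 0; i < records.length; i++) {
    var bytes = Buffer.byteLength(records[i])
    if (bytes > size) {
      throw new Error('record ' + i + ' is larger than the size limit')
    }
    // Start a new chunk when adding this record would exceed the limit.
    if (currentBytes + bytes > size) {
      chunks.push(current)
      current = []
      currentBytes = 0
    }
    current.push(records[i])
    currentBytes += bytes
  }
  if (current.length > 0) chunks.push(current)
  return chunks
}
```

Because the input order is preserved, sorted records stay sorted across chunks, so a reader only needs to fetch the chunks that cover the range it is interested in.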

api

var sgingest = require('sparse-geonames-ingest')

var ingest = sgingest(opts)

  • opts.outdir - directory to use for output and temporary storage
  • opts.recordsLimit - number of records to buffer before flushing to disk
  • opts.lookupLimit - number of lookups to buffer before flushing to disk
  • opts.limit - number of records and lookups to buffer before flushing to disk
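
The limit options describe a buffer-then-flush pattern: items accumulate in memory and are written out once a threshold is reached. The sketch below illustrates that general pattern. `createBuffer` and `writeBatch` are hypothetical names, and this is not the library's actual implementation.

```javascript
// Hypothetical sketch of limit-based buffering; not the library's internals.
// `limit` is the number of items to hold in memory before flushing.
// `writeBatch` is called with an array of buffered items on each flush.
function createBuffer (limit, writeBatch) {
  var items = []

  // Hand any buffered items to writeBatch and empty the buffer.
  function flush () {
    if (items.length === 0) return
    var batch = items
    items = []
    writeBatch(batch)
  }

  // Add one item, flushing automatically once `limit` items are buffered.
  function push (item) {
    items.push(item)
    if (items.length >= limit) flush()
  }

  return { push: push, flush: flush }
}
```

A higher limit means fewer, larger writes at the cost of more memory. A final explicit flush is still needed to write out a partially filled buffer.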

ingest.write(line, cb)

Process one line of input from a geonames text file, calling cb(err) when complete.
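
Since write() reports completion through a callback, a caller with many lines needs some way to sequence the calls. The following is a minimal sketch that waits for each callback before writing the next line. `writeLines` is a hypothetical helper, and whether concurrent writes are safe is not documented here, so writing one line at a time is a conservative assumption.

```javascript
// Hypothetical helper, not part of sparse-geonames-ingest.
// Writes each line in order, waiting for the previous write to
// complete. Stops at the first error and passes it to cb.
function writeLines (ingest, lines, cb) {
  var index = 0
  function next (err) {
    if (err) return cb(err)
    if (index >= lines.length) return cb(null)
    ingest.write(lines[index++], next)
  }
  next(null)
}
```

Any object with a write(line, cb) method works here, which also makes the helper easy to exercise with a stub before pointing it at a real ingest instance.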

ingest.flush(cb)

Flush buffered records in memory to disk, calling cb(err) when complete.

ingest.sort(cb)

Sort the loaded data in the output directory, calling cb when complete.

ingest.build(opts={}, cb)

Build the archive from the sorted data in the output directory, calling cb when complete.
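
Once all lines have been written, the remaining API calls can be chained in the same order as the command-line steps. This is a minimal sketch: `finishIngest` is a hypothetical helper, and it assumes that sort() and build() pass an error as the first callback argument like write() and flush() do, and that flushing before sorting is the intended order.

```javascript
// Hypothetical helper, not part of sparse-geonames-ingest.
// Runs flush, then sort, then build, stopping at the first error.
// Assumes error-first callbacks for all three methods.
function finishIngest (ingest, buildOpts, cb) {
  ingest.flush(function (err) {
    if (err) return cb(err)
    ingest.sort(function (err) {
      if (err) return cb(err)
      ingest.build(buildOpts, cb)
    })
  })
}

// Possible usage with the real module (the outdir path is an example):
//   var ingest = require('sparse-geonames-ingest')({ outdir: '/tmp/geonames' })
//   ...write all lines with ingest.write(line, cb)...
//   finishIngest(ingest, {}, function (err) { if (err) console.error(err) })
```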

install

npm install sparse-geonames-ingest

license

bsd