1.0.20 • Published 3 years ago

License: MIT

wiki-import — Wikipedia AST to LevelDB

Parses pages from a Wikipedia XML dump into JSON AST using wikiparse and saves them into a LevelDB database.

Installation

npm i -g wiki-import

Usage

Download and unpack a *-pages-articles.xml.bz2 archive of your choosing from Wikimedia Downloads. Make sure you have enough free space for the database and run:

wiki-to-leveldb <pages-articles.xml[.bz2]> <dbPath> [workerCount = cpuCount] [--with-source]

for example:

lbzip2 -kd simplewiki-20220101-pages-articles.xml.bz2
wiki-to-leveldb simplewiki-20220101-pages-articles.xml simplewiki

The first step is optional. If you want to trade time for storage space, you can pass an *.xml.bz2 archive directly to wiki-to-leveldb for streaming decompression. This is about 1.5 times slower than unpacking with lbzip2 first, since lbzip2 is highly parallel.
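To make the command synopsis concrete, here is a minimal sketch of how its arguments could be interpreted. It is not taken from wiki-import's source: parseArgs is a hypothetical helper, and reading "cpuCount" as the number of logical CPUs reported by Node's os module is an assumption.

```javascript
const os = require('os');

// Hypothetical helper, not wiki-import's actual code: interprets arguments
// in the order given by the synopsis
//   <pages-articles.xml[.bz2]> <dbPath> [workerCount = cpuCount] [--with-source]
function parseArgs(argv) {
  const withSource = argv.includes('--with-source');
  const positional = argv.filter((arg) => !arg.startsWith('--'));
  const [dumpPath, dbPath, workerArg] = positional;

  if (!dumpPath || !dbPath) {
    throw new Error('expected <pages-articles.xml[.bz2]> and <dbPath>');
  }

  // Assumption: "cpuCount" means the number of logical CPUs (at least 1).
  const defaultWorkers = Math.max(1, os.cpus().length);
  const workerCount =
    workerArg === undefined ? defaultWorkers : Number.parseInt(workerArg, 10);

  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new Error(`invalid workerCount: ${workerArg}`);
  }

  return {
    dumpPath,
    dbPath,
    workerCount,
    withSource,
    // A .bz2 input would take the slower streaming-decompression path.
    compressed: dumpPath.endsWith('.bz2'),
  };
}

console.log(parseArgs(['simplewiki-20220101-pages-articles.xml', 'simplewiki']));
```

With only the two required arguments, the sketch falls back to one worker per logical CPU and leaves withSource off, mirroring the defaults shown in the synopsis.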

It takes about 4 minutes to import simplewiki (0.9 GB dump) on tmpfs on a 4-core CPU.
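As a rough guide for sizing larger imports, the figures above work out to about 0.225 GB of dump per minute. The sketch below extrapolates linearly from that single data point. Linear scaling is an assumption, estimateMinutes is a hypothetical helper, the 20 GB input is only an illustrative size, and real times will vary with hardware, storage, and dump contents.

```javascript
// Figures reported above: a 0.9 GB simplewiki dump imported in about
// 4 minutes on tmpfs with a 4-core CPU.
const REFERENCE_DUMP_GB = 0.9;
const REFERENCE_MINUTES = 4;

// Hypothetical helper: assumes import time grows linearly with dump size
// on comparable hardware. Treat the result as an order-of-magnitude guess.
function estimateMinutes(dumpSizeGb) {
  if (!(dumpSizeGb > 0)) {
    throw new RangeError('dumpSizeGb must be a positive number');
  }
  return dumpSizeGb * (REFERENCE_MINUTES / REFERENCE_DUMP_GB);
}

// Illustrative input only: a dump of 20 GB.
console.log(`${estimateMinutes(20).toFixed(0)} minutes (rough estimate)`);
```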
