1.0.20 • Published 1 year ago

License
MIT

wiki-import — Wikipedia AST to LevelDB

Parses pages from a Wikipedia XML dump into JSON AST using wikiparse and saves them into a LevelDB database.

Installation

npm i -g wiki-import

Usage

Download and unpack a *-pages-articles.xml.bz2 archive of your choosing from Wikimedia Downloads. Make sure you have enough free space for the database and run:

wiki-to-leveldb <pages-articles.xml[.bz2]> <dbPath> [workerCount = cpuCount] [--with-source]

for example:

lbzip2 -kd simplewiki-20220101-pages-articles.xml.bz2
wiki-to-leveldb simplewiki-20220101-pages-articles.xml simplewiki

The first step is optional: to trade time for storage space, you can pass an *.xml.bz2 archive directly to wiki-to-leveldb for streaming decompression. Expect the import to be about 1.5 times slower than when the dump is unpacked with lbzip2 first, because lbzip2 is highly parallel.
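As a rough sketch of the direct variant, the snippet below passes the compressed dump to wiki-to-leveldb and sets the optional workerCount argument from the usage line. The worker count of 4 and the guard around the call are illustrative assumptions, not recommendations from the package.

```shell
# Sketch: import straight from the compressed dump, skipping lbzip2.
dump=simplewiki-20220101-pages-articles.xml.bz2
db=simplewiki
workers=4   # illustrative value; per the usage line it defaults to the CPU count

# Guard (an assumption for this sketch): only run when the tool and the dump exist.
if command -v wiki-to-leveldb >/dev/null 2>&1 && [ -f "$dump" ]; then
  wiki-to-leveldb "$dump" "$db" "$workers"
else
  echo "wiki-to-leveldb or $dump not found; nothing imported" >&2
fi
```

Leaving out the third argument keeps the default worker count.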

It takes about 4 minutes to import simplewiki (0.9 GB dump) on tmpfs on a 4-core CPU.
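Since the usage notes ask for enough free space for the database, a quick check before a long import may help. The following is a minimal sketch: has_free_kb is a hypothetical helper, and using the dump's own size as the threshold is an assumption, because the actual database-to-dump size ratio is not stated here.

```shell
# Hypothetical helper: succeeds if the filesystem holding directory $1
# has at least $2 KiB available (POSIX df output, 4th column).
has_free_kb() {
  avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail" -ge "$2" ]
}

dump=simplewiki-20220101-pages-articles.xml
if [ -f "$dump" ]; then
  # Assumption: require at least as much free space as the dump occupies.
  need_kb=$(du -k "$dump" | cut -f1)
  if has_free_kb . "$need_kb"; then
    echo "free space looks sufficient for a first attempt"
  else
    echo "less free space than the dump size; consider another dbPath" >&2
  fi
fi
```

Adjust the threshold once you have seen how large the database grows for your dump.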
