0.1.0 • Published 7 years ago

wikipedia-edit-stream

Listen for page edit notifications from Wikipedia IRC and push them into a MongoDB collection to use as a test dataset.

Every Wikimedia Foundation installation of MediaWiki publishes one message per change event to an IRC channel named after its language and project, e.g. en.wikipedia, fr.wikipedia, or en.wikibooks.
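The channel naming described above can be sketched as a small helper. This is an illustration only, not code from the package: `channelFor` is a hypothetical name, and the leading `#` is an assumption based on the usual IRC channel convention.

```javascript
// Hypothetical helper: build the IRC channel name for a Wikimedia project.
// Assumes the channel is "#<language>.<project>", following the examples
// in the text (en.wikipedia, fr.wikipedia, en.wikibooks).
function channelFor(language = 'en', project = 'wikipedia') {
  return `#${language}.${project}`;
}

console.log(channelFor());                  // #en.wikipedia
console.log(channelFor('fr', 'wikipedia')); // #fr.wikipedia
console.log(channelFor('en', 'wikibooks')); // #en.wikibooks
```

The default arguments mirror the defaults this README documents for LANGUAGE and PROJECT.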

Testing the various pieces of the MongoDB ecosystem requires datasets of all shapes and sizes. The freely available, high-volume change data from Wikimedia is extremely useful for this: we can deploy new releases and configurations of MongoDB and put them under real-world pressure instead of relying on synthetic micro-benchmarks or machine-generated datasets.

Configuration

The following customizations are available by setting environment variables.

MONGODB_URL MongoDB deployment to persist changes to, e.g. mongodb://username:password@hostname:port/db.

MONGODB_COLLECTION Collection to populate. Default: edits.

LANGUAGE Two-letter language code of the Wikimedia project. Default: en.

PROJECT The Wikimedia project id to listen to. Default: wikipedia.
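As a rough illustration of how these variables and their defaults might be resolved, here is a minimal sketch. It is not taken from the package source: `loadConfig` and the property names on the returned object are hypothetical, and since no default is documented for MONGODB_URL the sketch leaves it undefined when unset.

```javascript
// Hypothetical sketch: resolve settings from environment variables,
// falling back to the defaults documented in this README.
function loadConfig(env = process.env) {
  return {
    // No default is documented for MONGODB_URL, so it may be undefined here.
    mongodbUrl: env.MONGODB_URL,
    collection: env.MONGODB_COLLECTION || 'edits',
    language: env.LANGUAGE || 'en',
    project: env.PROJECT || 'wikipedia'
  };
}

// Example with an explicit environment object instead of process.env.
const config = loadConfig({
  MONGODB_URL: 'mongodb://localhost:27018/wikipedia',
  LANGUAGE: 'fr'
});
console.log(config);
```

Passing the environment in as an argument keeps the function easy to exercise without touching the real process environment. Note that LANGUAGE is also read by GNU gettext on many systems, so it may already hold a locale value and is worth checking before a run.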

Deploy Your Own

CLI

npm i -g wikipedia-edit-stream mongodb-runner cross-env;
mongodb-runner start --name=wikipedia --port=27018;
cross-env MONGODB_URL=mongodb://localhost:27018/wikipedia wikipedia-edit-stream;

License

Apache 2.0