wikipedia-edit-stream v0.1.0
wikipedia-edit-stream
Listen for page edit notifications from Wikipedia IRC and push them into a MongoDB collection to use as a test dataset.
All changes to WikiMedia Foundation installations of MediaWiki publish
a message per change event to a corresponding IRC channel given a language
and project name, e.g. en.wikipedia
, fr.wikipedia
, en.wikibooks
, etc:
To test various pieces of the MongoDB ecosystem, we need datasets of all shapes and sizes, which makes the freely available, high volume change data from WikiMedia extremely useful as we can deploy new releases and configurations of MongoDB and start putting it under the real-world pressures instead of synthetic micro-benchmarks or machine generated datasets.
Configuration
The following customizations are available by setting environment variables.
MONGODB_URL
MongoDB deployment to persist changes to, e.g. mongodb://username:password@hostname:port/db
.
MONGODB_COLLECTION
Collection to populate Default: edits
.
LANGUAGE
Two letter language code of the WikiMedia project Default: en
.
PROJECT
The WikiMedia project id to listen to Default: wikipedia
.
Deploy Your Own
CLI
npm i -g wikipedia-edit-stream mongodb-runner cross-env;
mongodb-runner start --name=wikipedia --port=27018;
cross-env MONGODB_URL=mongodb://localhost:27018/wikipedia wikipedia-edit-stream;
License
Apache 2.0
7 years ago