0.7.0 • Published 7 years ago

couchit v0.7.0

Couchit

Couchit is a database iterator with tools to validate and manage documents in a CouchDB database.

Couchit runs a set of user-defined JavaScript functions against every document in a CouchDB database or view, or against a subset selected by a start key, an end key, or both. It comes with a built-in set of Utility Functions for helpful operations such as generating hashes, performing json-schema validation, and storing objects during iteration for post-processing.

Installation

Command Line Interface (CLI)

Couchit can run as a stand-alone application. Install it via npm:

npm install -g couchit

Then, to run:

couchit ./config.js

As a module

Couchit can also run as a module in another program. First, add it to your project's package.json:

npm install --save couchit

Then, include it in your program:

const Couchit = require('couchit');
const config = require('./config.js');
new Couchit().iterate(config, console.log);

Configuration

Couchit can be controlled by environment variables or a config file.

The preferred way is through environment variables. This allows Couchit to run without storing sensitive information in code, which is useful when running Couchit as an AWS Lambda function, for example.

Configuration Settings

The following table lists all configuration settings. If an environment variable is not set, its Default Value is used.

| Setting | Description | Default Value |
| --- | --- | --- |
| COUCHDB_ENDPOINT | URI and port of CouchDB server (does not include http://) | 'localhost:5984' |
| COUCHDB_DATABASE | CouchDB database | 'db' |
| COUCHDB_USERNAME | CouchDB user | 'couchdb' |
| COUCHDB_PASSWORD | CouchDB password | 'couchdb' |
| OPTS_INTERVAL | Number of ms to wait between page requests | 100 |
| OPTS_START_KEY | View start key | null |
| OPTS_END_KEY | View end key | null |
| OPTS_PAGE_SIZE | Number of documents to retrieve per batch | 1000 |
| OPTS_NUM_PAGES | Number of pages to retrieve | undefined |
| OPTS_BATCH_SIZE | Batch update size | 100 |
| OPTS_QUIET | Suppress report at end of run | false |
| OPTS_TASKS | JavaScript functions with tasks to run | { "count-docs": (util, doc) => { util.count('total-docs') } } |

You can use a combination of environment variables and config.js Default Values to run Couchit. Just remember that environment variables always override defaults.
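The override rule can be illustrated with a small sketch. The helper `resolveSettings` is hypothetical and is not part of Couchit; it only shows how a value from the environment would take precedence over a default, assuming settings are looked up by the names listed in the table. Keep in mind that values read from `process.env` are always strings.

```javascript
// Hypothetical helper, not part of Couchit: illustrates
// "environment variables override defaults".
function resolveSettings(defaults, env) {
  const resolved = {};
  for (const [name, fallback] of Object.entries(defaults)) {
    // Use the environment value when it is set, otherwise the default.
    resolved[name] = env[name] !== undefined ? env[name] : fallback;
  }
  return resolved;
}

const defaults = {
  COUCHDB_ENDPOINT: 'localhost:5984',
  COUCHDB_DATABASE: 'db',
  OPTS_PAGE_SIZE: 1000,
};

// A plain object stands in for process.env to keep the example deterministic.
const settings = resolveSettings(defaults, { COUCHDB_DATABASE: 'orders' });
console.log(settings);
```

In a real program you would pass `process.env` as the second argument.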

Setting an environment variable in Windows PowerShell

$env:COUCHDB_PASSWORD="couchdb"

Setting an environment variable in macOS/Linux bash

export COUCHDB_PASSWORD="couchdb"

Tasks

Tasks are how you validate and manage documents as they are iterated over. There are a number of built-in Utility Functions that can be used by calling their util method, for example, to get a document hash: util.hash(doc). Additional document functionality is provided via util.nano, which exposes nano document functions. See the dependent-updates task below for an example of how nano document functions can be used.

Utility Functions

On each document iteration, the following functions are available via the Util() object:

| Function | Description |
| --- | --- |
| audit | Add an object to the audit array, which is returned in the callback |
| count | Increment a counter associated with a particular key |
| dereference | Reference external json-schema file definitions from other files |
| log | Alias for console.log |
| hash | Generate a SHA256 hash for a given document, object, or string |
| nano | Exposes nano document functions |
| remove | Delete the document from the database |
| save | Save the document back to the database |
| validate | Validate a document using json-schema |
| incrementWaits | Set a wait. Used to ensure asynchronous processes can complete before the callback response |
| decrementWaits | Remove a wait |

Example Tasks

You can define any number of named tasks, each of which runs once per document. The following are examples of common tasks that you can adapt to your needs.

Count all documents:

This is a trivial example where a counter (total-docs) is incremented for each document retrieved.

{
    "count-docs": (util, doc) => util.count('total-docs'),
}

Audit data for later use:

By using util.audit(), you can store any object for use after processing has completed. This is useful for initiating a post-processing step that is based on the output of a Couchit run.

{
    "audit-bad-docs": (util, doc) => {
        if (doc.status && doc.status === 'bad') {
            const object = { bad_doc_id: doc._id, status: doc.status };
            util.audit(object);
        }
    }
}
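Because a task is just a function of (util, doc), the two tasks above can be tried out without a database. The following is a minimal sketch that uses a hypothetical stand-in for the util object (`createMockUtil` is not part of Couchit, and the real util may behave differently); it only mimics count, audit, and log closely enough to run tasks against sample documents.

```javascript
// Hypothetical test double for Couchit's util object.
function createMockUtil() {
  const counts = {};
  const audits = [];
  return {
    counts,
    audits,
    // Increment the counter stored under the given key.
    count: (key) => { counts[key] = (counts[key] || 0) + 1; },
    // Keep the object for inspection after the run.
    audit: (object) => { audits.push(object); },
    // Silent logger so the sketch produces no noise.
    log: () => {},
  };
}

const tasks = {
  'count-docs': (util, doc) => util.count('total-docs'),
  'audit-bad-docs': (util, doc) => {
    if (doc.status && doc.status === 'bad') {
      util.audit({ bad_doc_id: doc._id, status: doc.status });
    }
  },
};

// Sample documents made up for this example.
const sampleDocs = [
  { _id: 'a', status: 'ok' },
  { _id: 'b', status: 'bad' },
];

const mockUtil = createMockUtil();
for (const doc of sampleDocs) {
  for (const task of Object.values(tasks)) {
    task(mockUtil, doc);
  }
}
console.log(mockUtil.counts, mockUtil.audits);
```

Running tasks this way is a cheap check of their logic before pointing Couchit at a live database.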

Calculate the hash of part of a document:

By using util.hash(), you can generate a SHA256 hash of a document, object, or string. The example below hashes the whole document; to hash only part of it, pass the relevant sub-object instead.

{
    "hash-doc-contents": (util, doc) => {
        const hash = util.hash(doc);
        util.log('SHA256 hash of doc: ' + hash);
    }
}
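To show what hashing part of a document could look like outside Couchit, here is a sketch built on Node's crypto module. The function `sha256Hex` and the sample document are hypothetical, and serializing with JSON.stringify is an assumption; Couchit's util.hash may serialize values differently, so the two are not guaranteed to produce the same hash.

```javascript
const crypto = require('crypto');

// Assumption: non-string values are serialized with JSON.stringify
// before hashing. Property order therefore affects the result.
function sha256Hex(value) {
  const text = typeof value === 'string' ? value : JSON.stringify(value);
  return crypto.createHash('sha256').update(text).digest('hex');
}

// Sample document made up for this example.
const sampleDoc = {
  _id: 'order-1',
  _rev: '1-abc',
  items: ['apple', 'pear'],
  total: 12,
};

// Hash only the content fields, leaving out CouchDB metadata such as _rev,
// so the hash stays stable when the document is re-saved unchanged.
const { _id, _rev, ...contents } = sampleDoc;
const contentsHash = sha256Hex(contents);
console.log('SHA256 hash of contents: ' + contentsHash);
```

Excluding _rev is a design choice for change detection: the revision changes on every save, while the content hash changes only when the data does.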

Validate a schema:

Determine if a document is valid based on a json-schema specification. This task uses ajv for validation.

{
    "validate-schema": (util, doc) => {
        const schema = require('./test/schema.json');
        const data = require('./test/data.json');
        const valid = util.validate(schema, data);
        util.log(doc._id + ' is valid? ' + valid);
    }
}

Update a parent document by checking for existence of child docs:

Note that nano functions run asynchronously while documents are iterated over. To ensure complete stats and (optional) audit object tracking, call util.incrementWaits() to set a wait before calling the async function, and util.decrementWaits() to remove the wait upon completion.

{
    "dependent-updates": (util, doc) => {
      if (doc.childKeys) {
        const keys = doc.childKeys;

        // Set a wait
        util.incrementWaits();

        // Do a bulk get for all childKeys
        util.nano.fetch(keys, (err, result) => {
          if (err) {
            console.log(err);
            // Remove the wait on failure so the run can still complete
            util.decrementWaits();
            return;
          }

          const newChildKeys = [];
          let hasMissingChildKeys = false;

          // Only add docs to the newChildKeys that were found in the db
          result.rows.forEach(row => {
            if (row.doc) {
              newChildKeys.push(row.id);
            } else {
              hasMissingChildKeys = true;
              // Rows for missing docs carry the requested key
              util.log('Missing child doc: ' + row.key);
            }
          });

          if (!hasMissingChildKeys) {
            // Nothing to update; remove the wait
            util.decrementWaits();
            return;
          }

          doc.childKeys = newChildKeys;
          util.nano.insert(doc, (err, result) => {
            console.log(err || result);

            // Remove the wait once the update has completed
            util.decrementWaits();
          });
        });
      }
    }
}
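The reason every increment needs a matching decrement can be seen in a small model of a wait counter. The `WaitTracker` class below is a hypothetical sketch, not Couchit's implementation: it assumes the final report is held back until iteration has ended and the number of pending waits has dropped to zero.

```javascript
// Hypothetical model of a wait counter; Couchit's internals may differ.
class WaitTracker {
  constructor(onIdle) {
    this.pending = 0;
    this.iterationDone = false;
    this.onIdle = onIdle;
  }

  incrementWaits() {
    this.pending += 1;
  }

  decrementWaits() {
    this.pending -= 1;
    this.maybeFinish();
  }

  // Called once all documents have been iterated over.
  endOfIteration() {
    this.iterationDone = true;
    this.maybeFinish();
  }

  // Fire the final callback only when nothing is still in flight.
  maybeFinish() {
    if (this.iterationDone && this.pending === 0) {
      this.onIdle();
    }
  }
}

const events = [];
const tracker = new WaitTracker(() => events.push('report'));

tracker.incrementWaits();   // first async operation starts
tracker.incrementWaits();   // second async operation starts
tracker.endOfIteration();   // two waits still pending, so no report yet
events.push('iteration-done');
tracker.decrementWaits();   // first operation completes
tracker.decrementWaits();   // second completes, report fires
console.log(events);
```

In this model, a code path that returns without calling decrementWaits() leaves the counter above zero, and the final report never fires.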

Credits

  • Couchit is based on Couchtato; thanks to Cliffano Subagio for his work.