1.0.0-beta.4 • Published 2 years ago

node-bitcask v1.0.0-beta.4

License
ISC

Node-Bitcask

What's Bitcask, you ask?

"Bitcask^1 is an Erlang application that provides an API for storing and retrieving key/value data using log-structured hash tables that provide very fast access. The design of Bitcask was inspired, in part, by log-structured filesystems and log file merging." - riak docs

node-bitcask is an open-source Node.js implementation of the proposed storage engine. It is a log-structured hash table.

Log structured: A data structure that allows append operations only, to take advantage of the sequential write speed of traditional mechanical hard drives.

Hash tables: In-memory key-value pairs with O(1) average-case read and write complexity.
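
To make the two definitions concrete, here is a minimal sketch of how an append-only log and an in-memory hash table can work together. It is not node-bitcask's implementation: the TinyLogStore class, its record layout (raw value bytes only) and the temporary file path are assumptions made for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical minimal store: an append-only log file plus an in-memory index.
class TinyLogStore {
  constructor(logPath) {
    this.logPath = logPath;
    this.index = new Map(); // key -> { offset, length } of the latest value
    this.end = 0;           // byte offset where the next append will land
    fs.writeFileSync(logPath, '');
  }

  putSync(key, value) {
    const bytes = Buffer.from(String(value), 'utf8');
    fs.appendFileSync(this.logPath, bytes); // never overwrite, only append
    this.index.set(key, { offset: this.end, length: bytes.length });
    this.end += bytes.length;
  }

  getSync(key) {
    const entry = this.index.get(key);
    if (!entry) return null;
    const fd = fs.openSync(this.logPath, 'r');
    try {
      const out = Buffer.alloc(entry.length);
      fs.readSync(fd, out, 0, entry.length, entry.offset);
      return out.toString('utf8');
    } finally {
      fs.closeSync(fd);
    }
  }
}

const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'tinylog-'));
const store = new TinyLogStore(path.join(demoDir, 'data.log'));
store.putSync('zebra', 'striped');
store.putSync('zebra', 'black and white'); // the old bytes stay in the log
console.log(store.getSync('zebra'));       // prints "black and white"
```

The second put does not touch the first value on disk. It only moves the index entry, which is why a log-structured store eventually needs compaction to reclaim space.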

node-bitcask allows asynchronous and synchronous operations on such a log-structured hash table. Values are MD5-verified so that data is stored and returned accurately. It also features resilience to power outages through snapshots, vertical scalability, and space optimization through periodic compaction.
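
As an illustration of MD5 verification, the sketch below stores a checksum next to a value and recomputes it on read, using Node's built-in crypto module. The seal and verify helpers and the record shape are hypothetical; how node-bitcask lays out its checksums on disk is not described here.

```javascript
const crypto = require('crypto');

// Hypothetical helper: pair a value with the MD5 digest of its bytes.
function seal(value) {
  const checksum = crypto.createHash('md5').update(value, 'utf8').digest('hex');
  return { value, checksum };
}

// Hypothetical helper: recompute the digest and compare it to the stored one.
function verify(record) {
  const actual = crypto.createHash('md5').update(record.value, 'utf8').digest('hex');
  return actual === record.checksum;
}

const record = seal('striped');
const corrupted = { ...record, value: 'stripeX' }; // simulate a damaged value

console.log(verify(record));    // prints true
console.log(verify(corrupted)); // prints false
```

A mismatch tells the reader that the stored bytes changed after they were written, for example because of a partial write.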

Installation


Install via npm:

npm install node-bitcask

Instantiation


Using node-bitcask is extremely simple. node-bitcask can be imported with CommonJS require().

const nb = require('node-bitcask');
nb.put("zebra", "an African wild animal that looks like a horse, with black or brown and white lines on its body");
// get is asynchronous: the value is delivered to the callback
nb.get("zebra", (data) => console.log(data));

API


Inserting data

Data can be stored with:

put(key, data, callback)

key is a unique String that can be used to refer back to the data. put returns void and stores the data asynchronously.

Note: putSync(key, data) is also available for synchronous put operations.

Accessing data

To get back your data use:

get(key, callback)

get asynchronously finds the data referenced by key and, on success, passes it to the given callback. If no data is found (for example due to a deleted key, incomplete storage, a power outage, or a bug), the callback is invoked with null.

Note: getSync(key) is also available for synchronous get operations.
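
If you prefer async/await over callbacks, a get(key, callback) style call can be wrapped in a Promise. The sketch below uses a stand-in object, fakeStore, that only mimics the callback shape described above. It is not node-bitcask itself, and getAsync is a hypothetical helper.

```javascript
// Hypothetical stand-in with the same get(key, callback) shape as described
// above: the callback receives the data, or null when the key is unknown.
const fakeStore = {
  data: new Map([['zebra', 'striped']]),
  get(key, callback) {
    const value = this.data.has(key) ? this.data.get(key) : null;
    setImmediate(() => callback(value));
  },
};

// Wrap a get(key, callback) style store so the result can be awaited.
function getAsync(store, key) {
  return new Promise((resolve) => store.get(key, resolve));
}

async function main() {
  const found = await getAsync(fakeStore, 'zebra');
  const missing = await getAsync(fakeStore, 'unicorn');
  console.log(found, missing); // prints "striped null"
  return [found, missing];
}

const result = main();
```

A hand-written wrapper is used because util.promisify expects error-first callbacks, while the callback described here receives only the data.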

Exporting the database

To export the database's essential files, use:

exportDataSync(newLogFileDir, newKVFileDir)

logFile is the essential file that contains all the data in the database. kvFile keeps a snapshot of the in-memory kv-store, in case a power outage occurs or anything else goes wrong. exportDataSync accepts two fs.PathLike arguments, which are the destinations the data is copied to. It synchronously copies all the data to the given paths.
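
The role of the kvFile can be illustrated with a minimal sketch, assuming the in-memory store is a Map and the snapshot is a JSON file. Both the format and the helper names (saveSnapshotSync, loadSnapshotSync) are hypothetical; node-bitcask's actual snapshot format is not described here.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical snapshot format: the index Map serialized as JSON entries.
function saveSnapshotSync(index, snapshotPath) {
  const tmpPath = snapshotPath + '.tmp';
  fs.writeFileSync(tmpPath, JSON.stringify([...index]));
  // Rename over the old snapshot so a crash mid-write leaves the previous one intact.
  fs.renameSync(tmpPath, snapshotPath);
}

// Rebuild the in-memory index from the snapshot, or start empty if none exists.
function loadSnapshotSync(snapshotPath) {
  if (!fs.existsSync(snapshotPath)) return new Map();
  return new Map(JSON.parse(fs.readFileSync(snapshotPath, 'utf8')));
}

const snapshotDir = fs.mkdtempSync(path.join(os.tmpdir(), 'kv-snapshot-'));
const snapshotPath = path.join(snapshotDir, 'kv.json');

const liveIndex = new Map([['zebra', { offset: 0, length: 7 }]]);
saveSnapshotSync(liveIndex, snapshotPath);

// After a restart, the index is restored without scanning the whole log.
const restored = loadSnapshotSync(snapshotPath);
console.log(restored.get('zebra'));
```

Anything written after the last snapshot is in the log but not in the restored index, which is the gap mentioned under Known Issues.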

Importing previous database

To import previously exported data use:

importDataSync(oldLogFileDir, oldKVFileDir)

oldLogFileDir and oldKVFileDir are the paths that the logfile and the kvfile were copied to. importDataSync first synchronously copies the data to its desired directory. After the copy succeeds, the entire database is reconstructed from these files.
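
The copy step that export and import rely on can be sketched with Node's fs module. This is an illustration under assumed file names, not node-bitcask's code: copyDatabaseFilesSync and every path below are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: copy a log file and a snapshot file to new locations,
// creating the destination folders first.
function copyDatabaseFilesSync(logFile, kvFile, newLogFile, newKVFile) {
  fs.mkdirSync(path.dirname(newLogFile), { recursive: true });
  fs.mkdirSync(path.dirname(newKVFile), { recursive: true });
  fs.copyFileSync(logFile, newLogFile);
  fs.copyFileSync(kvFile, newKVFile);
}

// Set up a pretend database in a temporary folder.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'nb-export-'));
const logFile = path.join(root, 'live', 'data.log');
const kvFile = path.join(root, 'live', 'kv.json');
fs.mkdirSync(path.dirname(logFile), { recursive: true });
fs.writeFileSync(logFile, 'striped');
fs.writeFileSync(kvFile, '[["zebra",{"offset":0,"length":7}]]');

// "Export" both files to a backup folder.
const backupLog = path.join(root, 'backup', 'data.log');
const backupKV = path.join(root, 'backup', 'kv.json');
copyDatabaseFilesSync(logFile, kvFile, backupLog, backupKV);
console.log(fs.readFileSync(backupLog, 'utf8')); // prints "striped"
```

With node-bitcask itself, the round trip would presumably be a call to exportDataSync with two destination paths, followed later by importDataSync with the same two paths.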

Deleting a log

deleteLog(key)

Deletes the key. Note: after deletion, the data may still exist in the logfile for a short time.
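
Why deleted data can linger is easier to see in a small model. In the sketch below, which is an assumption-laden illustration rather than node-bitcask's algorithm, deleting a key only removes it from the index, and a hypothetical compact function later rewrites the log with live values only.

```javascript
// Hypothetical in-memory model: `log` holds the appended bytes, `index` maps
// each live key to the { offset, length } of its latest value.
function compact(log, index) {
  const chunks = [];
  const newIndex = new Map();
  let offset = 0;
  for (const [key, entry] of index) {
    // Copy only the bytes that the index still points at.
    const bytes = log.subarray(entry.offset, entry.offset + entry.length);
    chunks.push(bytes);
    newIndex.set(key, { offset, length: bytes.length });
    offset += bytes.length;
  }
  return { log: Buffer.concat(chunks), index: newIndex };
}

// The log holds an overwritten value, a live value, and a deleted value.
const log = Buffer.from('oldvalue' + 'striped' + 'gone');
const index = new Map([['zebra', { offset: 8, length: 7 }]]);

const compacted = compact(log, index);
console.log(log.length, compacted.log.length); // prints "19 7"
console.log(compacted.log.toString());         // prints "striped"
```

Until compaction runs, the bytes for 'oldvalue' and 'gone' are still in the log even though no key refers to them.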

Deleting the database

To delete all the data and the KV store use:

nb.unload()

Checking if key exists

To check if nb contains an undeleted key, use:

nb.contains(key)

Returns true if key is present in nb.

Checking if empty

nb.isEmpty()

Returns true if nb is empty. This can be because no write operation has been executed yet, or because every key has been deleted.

Getting total keys count

nb.size()

Returns an Integer equal to the count of active keys in nb.

Iterating over the keys

nb allows reading sequentially over all the active keys.

const keys = nb.keys()
for(let key of keys){
    console.log(nb.getSync(key));
}

This iterates over all keys in nb and reads their values synchronously.

Configuration


node-bitcask lets you configure where data is stored, how often compaction runs, and so on. Configuring node-bitcask is as simple as:

const nb = require("node-bitcask");
nb.configure({
    dataDir: "./some/arbitrary/folder",
    kvSnapshotPath: "./path/to/file",
    backupKVInterval: 1000, //in ms
    compactionInterval: 10000, //in ms
})

You can omit any key that you don't want to configure, and its value will stay at its default.
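
The "omitted keys keep their default" behaviour can be sketched as a simple merge. The default values below are placeholders, since the real defaults are not stated in this README, and resolveConfig is a hypothetical helper rather than part of node-bitcask.

```javascript
// Hypothetical defaults; the real default values are not stated in this README.
const DEFAULTS = Object.freeze({
  dataDir: './data',
  kvSnapshotPath: './data/kv.snapshot',
  backupKVInterval: 1000,    // in ms
  compactionInterval: 10000, // in ms
});

// Start from the defaults and overwrite only the keys the caller supplied.
function resolveConfig(userConfig = {}) {
  const resolved = { ...DEFAULTS };
  for (const [key, value] of Object.entries(userConfig)) {
    if (value !== undefined) resolved[key] = value;
  }
  return resolved;
}

const resolved = resolveConfig({ compactionInterval: 60000 });
console.log(resolved.compactionInterval); // prints 60000
console.log(resolved.dataDir);            // prints "./data"
```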

Some special notes


  • Old-time fans will remember that there used to be a Stream API for reading and writing large data, with putStream and getStream. Sadly these features had to go, as hashing such a large stream of data wouldn't be possible.

  • Async fs operations are handled by the libuv thread pool in Node.js, so their throughput depends on the size of that pool as well as on the number and speed of the CPU cores. The pool has 4 threads by default and can be resized with the UV_THREADPOOL_SIZE environment variable, so having more logical cores does not by itself mean more concurrent fs operations.^4

  • Why not cache frequently accessed keys? A cache such as Redis helps only if it runs on a separate physical machine. Running it on the same machine competes with node-bitcask for CPU threads, and since (as mentioned above) Node.js file system operations use a thread pool, one less available CPU thread will marginally reduce node-bitcask's performance. Running node-bitcask on its own dedicated hardware is likely the better choice.
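
To see what the thread pool note means on a given machine, the sketch below prints the logical core count next to the pool size Node.js would use for async fs work. It relies on the documented libuv default of 4 threads and the UV_THREADPOOL_SIZE environment variable; threadPoolSize is a hypothetical helper, not a Node.js or node-bitcask API.

```javascript
const os = require('os');

// libuv's thread pool has 4 threads unless UV_THREADPOOL_SIZE overrides it.
const DEFAULT_POOL_SIZE = 4;

// Hypothetical helper: read the configured pool size from an environment object.
function threadPoolSize(env = process.env) {
  const parsed = Number.parseInt(env.UV_THREADPOOL_SIZE, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_POOL_SIZE;
}

console.log('logical CPU cores:', os.cpus().length);
console.log('fs thread pool size:', threadPoolSize());
```

The variable has to be set before the pool is first used, so the usual approach is to set it when launching the process, for example UV_THREADPOOL_SIZE=8 node app.js.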

Known Issues


  • Sync reads and writes will fail during compaction. Prefer the async variants of both until a future update.
  • Writes made between the last snapshot and a power loss or SIGTERM will be present in the logfile, but the snapshot will hold no references to them.
  • Current storage format for data can be improved.

Changelogs


Version 1.0.0-beta.4

  • Optimised write performance for consecutive asynchronous writes to the same key, with fewer fs accesses. Prefer using async operations.
  • Faster reads when reading immediately after an async write.
  • Fixed sync operations during compaction.
  • Close the file descriptor opened during getSync().

Version 1.0.0-beta.3

  • Compaction fix for data exceeding the writeStream highWaterMark (1 KB).
  • Queue async get() and log() calls until compaction ends, for consistency.
  • Fixed deletion of already deleted keys.
  • Added helpful utils like isEmpty(), keys(), contains(key), and size() for better manipulation and querying.
  • Persist garbage-collectible information, so compaction can be efficient on an imported database.
  • Added a Known Issues section to the README.

Version 1.0.0-beta.2

  • Removed the Stream API.
  • Added MD5 checksums (crypto module) for verifying the integrity of written data.
  • Replaced setTimeout with setInterval for compaction.
  • Fixed the WriteStream used internally for compaction not emitting the close event; also moved the listener from the close event to the drain event.
  • Converted basic operations to asynchronous operations.
  • Added synchronous functions.
  • Added changelogs to the README.
  • Some more minor fixes.

Version 1.0.0-beta.1

  • Fixed undefined instance variables of node-bitcask that caused most operations to fail.
  • Removed test operations from index.js that were executed on importing node-bitcask.
  • Handled the case where a file doesn't exist.
  • Made internal variables and functions private.
  • Removed most of the debug logs.

References


  1. Bitcask - Riak Docs. ^1
  2. Designing Data-Intensive Applications by Martin Kleppmann (came across Bitcask while reading this one, great book). ^2
  3. Bitcask Paper. ^3