1.0.0 • Published 5 years ago

daval v1.0.0

daval

experimental document storage w/ messagepack, ajv, redis & elasticsearch

intro & summary

  • good parts
    • redis in-memory storage provides speed
    • redis single-threaded execution + transactions provide ACID compliance
    • redis disk storage provides persistence
    • messagepack allows low memory & disk consumption for redis
    • elasticsearch provides powerful search capability
  • trade-offs / gotchas
    • messagepack trades speed for memory & disk capacity
      • but lets you store all JS data types in redis
      • fixstr, str8/16/32
      • positive fixint, negative fixint
      • uint8/16/32/64, int8/16/32/64, float32/64
      • true, false, undefined, NaN, +Infinity, -Infinity
      • arrays, objects, nested arrays, nested objects
      • buffers, arraybuffers, typedarrays
      • dictionary support further reduces size
    • messagepack what-the-pack module buffer is 8KB by default
      • you can raise it up to 1GB, which is already overkill
    • redis trades durability for speed
      • redis appendfsync is everysec by default
      • you can set appendfsync to always to reverse trade-off
    • elasticsearch trades data consistency for search capability
      • elasticsearch index.refresh_interval is 1s by default
  • ideal use cases
    • you want fast ACID-compliant data updates
    • you want elasticsearch search capability
    • you can tolerate 1s data inconsistency between redis in-memory and disk storage
    • you can tolerate 1s data inconsistency between redis and elasticsearch
    • you have data updates that need to reflect in search immediately (ok)
    • you have data updates that do not need to reflect in search immediately (ok)
    • you have data that requires some schema (ok)
    • you have data that does not require schema (ok)
    • you have data that needs to exist in elasticsearch (ok)
    • you have data that does not need to exist in elasticsearch (ok)
  • todo
    • Type
      • make use of ajv schemas optional (ok)
      • provide option to index in elasticsearch (ok)
      • allow Query to check if Type is in elasticsearch (ok)
    • Entity & Transactions
      • redis & elasticsearch calls try-catch retry (planned)
    • Logging
      • local file (planned)
      • local db (planned)
      • third-party db (planned)

setup

  • spin up redis (4.x and up) instance
  • spin up elasticsearch (6.x and up) instance
  • add module
$ yarn add daval
  • create client instance
const Client = require('daval');
const client = new Client(
  // redis config
  {
    host: '127.0.0.1',
    port: 6379,
    password: 'password'
  },
  // elasticsearch config
  {
    host: 'localhost:9200'
  }
);
const { Type, Entity, Transaction, Query } = client;
  • optional configurations
const Client = require('daval');

// for details:
// https://www.npmjs.com/package/what-the-pack

// 16.8 MB buffer
Client.MessagePack.reallocate(2 ** 24);

// register 'name' word in dictionary
Client.MessagePack.register('name');

// initialize instance here..

Type class

  • constructor (label String, useElastic Boolean)
    • label gets transformed into lowercase
    • useElastic is either true or false
  • useSchema (schema Object)
    • schema must be a valid ajv json schema
    • returns self
  • example
const User = new Type('User', true)
  .useSchema({
    "properties": {
      "name": {
        "type": "string",
      },
      "age": {
        "type": "number",
        "minimum": 25,
      }
    }
  });
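
To make the effect of the schema above concrete, the sketch below hand-checks the two keywords it uses (type and minimum). It is a simplified stand-in, not ajv and not daval's validation path; checkAgainstSchema is a hypothetical helper, and it assumes keys missing from the data are allowed because the schema declares no required list.

```javascript
// Simplified stand-in for ajv: supports only "type" (string/number) and "minimum".
// checkAgainstSchema is a hypothetical helper, not part of daval or ajv.
const userSchema = {
  properties: {
    name: { type: 'string' },
    age: { type: 'number', minimum: 25 }
  }
};

function checkAgainstSchema(schema, data) {
  const errors = [];
  for (const [key, rule] of Object.entries(schema.properties)) {
    if (!(key in data)) continue; // no "required" list, so missing keys pass
    const value = data[key];
    if (rule.type && typeof value !== rule.type) {
      errors.push(`${key} must be ${rule.type}`);
    } else if (rule.minimum !== undefined && value < rule.minimum) {
      errors.push(`${key} must be >= ${rule.minimum}`);
    }
  }
  return errors;
}

console.log(checkAgainstSchema(userSchema, { name: 'alice' }));          // []
console.log(checkAgainstSchema(userSchema, { name: 'alice', age: 25 })); // []
console.log(checkAgainstSchema(userSchema, { name: 'alice', age: 24 })); // [ 'age must be >= 25' ]
console.log(checkAgainstSchema(userSchema, { name: 42 }));               // [ 'name must be string' ]
```

Note that age 25 passes because minimum is inclusive, which is why the Entity example can merge { age: 25 }.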

Entity class

  • constructor (t Type, id String)
    • t must be instance of Type
    • id must be unique for each Entity
  • upsert (data Object, refresh Boolean) async
    • data must pass this entity's Type schema if schema exists
    • absence of this entity's id generates a uuidv4 id
    • if refresh === true, client awaits update visibility in search before returning
  • merge (data Object, refresh Boolean) async
    • data must pass this entity's Type schema if schema exists
    • absence of this entity's id throws an error
    • uses lodash.merge to merge existing and new data
    • if refresh === true, client awaits update visibility in search before returning
  • exists () async Boolean
    • absence of this entity's id throws an error
  • fetch () async Object
    • fetched data must pass this entity's Type schema if schema exists
    • absence of this entity's id throws an error
  • delete (refresh Boolean) async
    • absence of this entity's id throws an error
    • if refresh === true, client awaits update visibility in search before returning
  • example
const alice = new Entity(User);
await alice.upsert({ name: 'alice' });
console.log('id', alice.id);
// e.g. d14a4dc9-e19c-48bf-94b5-1d820c7566d0
await alice.merge({ age: 25 });
console.log('exists', await alice.exists());
// true
console.log('fetch', await alice.fetch());
// { name: 'alice', age: 25 }
await alice.delete();
console.log('exists', await alice.exists());
// false
console.log('fetch', await alice.fetch());
// undefined
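
Because merge relies on lodash.merge, nested objects are combined recursively rather than replaced wholesale. The sketch below is a simplified stand-in covering plain nested objects only; deepMerge is a hypothetical helper, and the real lodash.merge additionally merges arrays by index and skips undefined source values.

```javascript
// Simplified stand-in for lodash.merge, covering plain nested objects only.
const isPlainObject = (v) =>
  v !== null && typeof v === 'object' && !Array.isArray(v);

function deepMerge(target, source) {
  for (const [key, value] of Object.entries(source)) {
    if (isPlainObject(value) && isPlainObject(target[key])) {
      deepMerge(target[key], value); // recurse instead of overwriting
    } else {
      target[key] = value;
    }
  }
  return target; // like lodash.merge, the target is mutated and returned
}

const stored = { name: 'alice', address: { city: 'Manila' } };
const patch = { age: 25, address: { zip: '1000' } };
console.log(deepMerge(stored, patch));
// { name: 'alice', address: { city: 'Manila', zip: '1000' }, age: 25 }
```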

Transaction class

  • with (...entities Entity) Transaction
    • entities are instances of Entity class
    • returns Transaction, for chaining
  • run (fn Function, refresh Boolean) async
    • fn is a function accepting single-parameter e
    • fn can also be an async function
    • parameter e is a function (e stands for entity)
    • used as e(x) where x is an instance of Entity
    • e(x) returns an object containing the entity's data, which can be modified
    • if fn gracefully returns (regardless of return value), all modifications to the e(x) objects will be committed
    • all data fetched are validated with schema if schema exists
    • all data modifications are validated with schema if schema exists
    • if refresh === true, client awaits update visibility in search before returning
  • example
const alice = new Entity(User);
const bob = new Entity(User);
await alice.upsert({ name: 'alice' });
await bob.upsert({ name: 'bob' });
await new Transaction()
  .with(alice, bob)
  .run(async (e) => {
    e(alice).age = 25;
    e(bob).age = 26;
  });
console.log({
  alice: await alice.fetch(),
  bob: await bob.fetch()
});
// {
//   alice: { name: 'alice', age: 25 },
//   bob: { name: 'bob', age: 26 }
// }
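
The commit-on-return behaviour of run can be illustrated with an in-memory sketch. This is not daval's implementation: store stands in for redis, entities are plain { id } objects, and runSketch is a hypothetical function. It only shows the idea that edits made through e(x) become visible when fn returns normally and are discarded when fn throws.

```javascript
// In-memory illustration of the e(x) pattern; not daval's implementation.
// `store` stands in for redis, and entities are plain { id } objects here.
const store = new Map([
  ['a1', { name: 'alice' }],
  ['b1', { name: 'bob' }]
]);

async function runSketch(entities, fn) {
  // Work on copies so nothing is visible until fn returns normally.
  const drafts = new Map(
    entities.map((x) => [x.id, JSON.parse(JSON.stringify(store.get(x.id)))])
  );
  const e = (x) => drafts.get(x.id);
  await fn(e); // a throw (or rejected promise) skips the commit below
  for (const [id, data] of drafts) store.set(id, data);
}

const aliceRef = { id: 'a1' };
const bobRef = { id: 'b1' };

async function demo() {
  await runSketch([aliceRef, bobRef], async (e) => {
    e(aliceRef).age = 25;
    e(bobRef).age = 26;
  });
  // Throwing inside fn leaves the stored data untouched.
  await runSketch([aliceRef, bobRef], (e) => {
    e(aliceRef).age = 99;
    throw new Error('abort');
  }).catch(() => {});
  return { alice: store.get('a1'), bob: store.get('b1') };
}

demo().then((result) => console.log(result));
// { alice: { name: 'alice', age: 25 }, bob: { name: 'bob', age: 26 } }
```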

Query class

  • constructor (...t Type)
    • t are Type instances to include in search
  • from (offset Integer)
    • offset is the number of records to skip
  • size (amount Integer)
    • amount is the number of records to return
  • sort (field String, direction String, mode String)
    • field is name of field to sort
    • direction must be asc or desc
    • mode must be min, max, sum, avg, or median
    • can be called multiple times to stack multiple sorts
  • range (field String)
    • returns Object with the following methods
    • gt(value Number) - greater than
    • gte(value Number) - greater than or equal
    • lt(value Number) - less than
    • lte(value Number) - less than or equal
  • matchAll ()
    • matches all documents
  • matchNone ()
    • matches no documents
  • scroll (duration DurationString, scrollId String)
    • duration, ie. 30s, 1m, 1h, 1d
    • scrollId is used in scroll continuation
  • sourceFilter (...fields String)
    • selects / specifies the field(s) to return
  • term (field String, value String)
    • finds documents that contain the exact term specified
  • terms (field String, ...values String)
    • filters documents that have fields that match any of the provided terms
  • run () async
    • returns an Object with the following properties
      • scrollId
      • ids
      • entities
      • data
      • hitsRetreived
      • hitsTotal
  • all methods aside from run() allow chaining
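
As a rough picture of how the chained methods listed above could accumulate into an Elasticsearch search request body, here is a minimal sketch. QuerySketch is a hypothetical class, not daval's Query; the body keys (from, size, sort, _source, query) follow the Elasticsearch request body format, and it assumes the range helpers return the query for further chaining. Only one query clause is kept at a time, for brevity.

```javascript
// Hypothetical builder: shows how chained calls could accumulate into an
// Elasticsearch search body. The class itself is not daval's Query.
class QuerySketch {
  constructor() {
    this.body = {};
  }
  from(offset) { this.body.from = offset; return this; }
  size(amount) { this.body.size = amount; return this; }
  sort(field, direction, mode) {
    const spec = { order: direction };
    if (mode) spec.mode = mode;
    (this.body.sort = this.body.sort || []).push({ [field]: spec });
    return this; // repeated calls stack additional sort clauses
  }
  sourceFilter(...fields) { this.body._source = fields; return this; }
  term(field, value) { this.body.query = { term: { [field]: value } }; return this; }
  range(field) {
    const bounds = {};
    this.body.query = { range: { [field]: bounds } };
    const set = (op) => (value) => { bounds[op] = value; return this; };
    return { gt: set('gt'), gte: set('gte'), lt: set('lt'), lte: set('lte') };
  }
}

const body = new QuerySketch()
  .from(10)
  .size(10)
  .sort('age', 'desc')
  .sourceFilter('name', 'age')
  .range('age').gte(25)
  .body;

console.log(JSON.stringify(body));
// {"from":10,"size":10,"sort":[{"age":{"order":"desc"}}],"_source":["name","age"],"query":{"range":{"age":{"gte":25}}}}
```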

Query coverage

  • No description / code for non-supported parts.
  • Search API
    • Request Body
      • from - sets offset
        • .from(10)
      • size - sets amount of records to return
        • .size(10)
      • sort - sorts by field, ascending or descending
        • .sort('age', 'asc')
        • .sort('age', 'desc')
        • .sort('age', 'desc', 'avg')
      • scroll - retrieve large numbers of results from a search request
        • .scroll('10m', scrollId)
      • source filtering - specifies fields to return
        • .sourceFilter('name', 'age')
    • suggesters
    • count
    • validate
    • explain
    • profiling
  • Query DSL
    • Full text queries (partial support)
      • match - filters fields with values
        • .match('name', 'josh')
        • .match('age', 25)
      • match_phrase
      • match_phrase_prefix
      • multi_match
      • common
      • query_string
      • simple_query_string
    • Term level queries (partial support)
      • term - finds documents that contain the exact term specified
        • .term('name', 'alice')
        • .term('name', 'bob')
      • terms - filters documents that have fields that match any of the provided terms
        • .terms('tags', 'Horror', 'Comedy')
        • .terms('tags', 'Urgent')
      • terms_set
      • range - greater than / less than filters
        • .range('age').gt(25)
        • .range('age').gte(25)
        • .range('age').lt(25)
        • .range('age').lte(25)
      • exists
      • prefix
      • wildcard
      • regexp
      • fuzzy
      • type
      • ids
    • Compound queries (no support)
      • constant_score
      • bool
      • dis_max
      • function_score
      • boosting
    • Joining queries (no support)
      • nested
      • has_child
      • has_parent
    • Geo queries (no support)
      • geo_shape
      • geo_bounding_box
      • geo_distance
      • geo_polygon
    • Specialized queries (no support)
      • more_like_this
      • script
      • percolate
      • wrapper
    • Span queries (no support)
      • span_term
      • span_multi
      • span_first
      • span_near
      • span_or
      • span_not
      • span_containing
      • span_within
      • field_masking_span
    • Misc (partial support)
      • match_all - matches all documents
        • .matchAll()
      • match_none - matches no documents
        • .matchNone()
      • minimum_should_match
      • multi term query rewrite
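
Scroll continuation, as covered above, might be driven by a loop like the following sketch. collectIds and fakeQueryFactory are hypothetical helpers; the loop relies only on the documented shape (scroll(duration, scrollId) and an async run() returning scrollId and ids). Passing an undefined scrollId on the first call and stopping on an empty ids page are assumptions, not confirmed daval behaviour.

```javascript
// Collects ids across scroll pages. `makeQuery` is a hypothetical factory that
// returns an object with scroll(duration, scrollId) and an async run().
async function collectIds(makeQuery, duration) {
  const all = [];
  let scrollId; // undefined on the first request (assumption)
  for (;;) {
    const page = await makeQuery().scroll(duration, scrollId).run();
    if (!page.ids || page.ids.length === 0) break; // assumed end-of-results signal
    all.push(...page.ids);
    scrollId = page.scrollId;
  }
  return all;
}

// Stand-in for a real Query so the loop can run without redis/elasticsearch.
function fakeQueryFactory(pages) {
  let next = 0;
  return () => ({
    scroll(duration, scrollId) { return this; },
    async run() {
      const ids = pages[next] || [];
      next += 1;
      return { scrollId: `scroll-${next}`, ids };
    }
  });
}

collectIds(fakeQueryFactory([['a', 'b'], ['c']]), '10m')
  .then((ids) => console.log(ids));
// [ 'a', 'b', 'c' ]
```

With daval, the factory would presumably look something like () => new Query(User).matchAll(), though that pairing is untested here.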

exposed clients

notes

  • on elasticsearch, we use type's label as 'index' and 'type' value, because:
    • es 7.x onwards will get rid of mapping types
    • it's currently recommended to use same 'index' and 'type' value in latest es 6.x
  • throwing an error within a transaction effectively aborts it
  • the Query class covers the basic query functionality of Google Cloud Datastore
    • set search offset
    • set search size (amount of results to return)
    • set search scroll id
    • specify which fields to return
    • sort items in ascending and descending
    • filter items with exact field values
    • filter items with greater than, less than, greater than or equal, and less than or equal
    • filter items with array fields containing specific values
  • Difference between match and term
    • The match query analyzes the input string and constructs more basic queries from that.
    • The term query matches exact terms.
    • With the default (standard) analyzer, text is lowercased at index time, so a document containing "CAT" is indexed as the term "cat". A match query for "CAT" analyzes its input and finds the document; a term query for "CAT" does not analyze its input and finds nothing.
  • 1,000 documents at 1 KB each is 1 MB, 1,000 documents at 100 KB each is 100 MB
  • on indices locked by storage, unlock with:
    • curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
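
The match versus term difference noted above can be sketched with a toy model of an analyzed text field. The analyze function here is a rough approximation of a lowercasing analyzer, written for illustration only; it is not Elasticsearch code, and matchQuery and termQuery are hypothetical names.

```javascript
// Toy model of an analyzed text field: the analyzer lowercases and splits on
// non-word characters, roughly what a lowercasing analyzer does.
const analyze = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);

// Index one document; the inverted index stores analyzed terms only.
const indexedTerms = new Set(analyze('CAT'));

// match analyzes the query text before lookup; term looks up the input as-is.
const matchQuery = (input) => analyze(input).some((t) => indexedTerms.has(t));
const termQuery = (input) => indexedTerms.has(input);

console.log(matchQuery('CAT')); // true
console.log(matchQuery('cat')); // true
console.log(termQuery('cat'));  // true
console.log(termQuery('CAT'));  // false
```

In this model an exact term lookup succeeds only for the lowercased form, because that is what was stored at index time.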

external references

license

MIT | @davalapar
