daval
v1.0.0 • Published 5 years ago
experimental document storage w/ messagepack, ajv, redis & elasticsearch
intro & summary
- good parts
  - redis in-memory storage provides speed
  - redis single-threadedness + transactions provide ACID compliance
  - redis disk storage provides persistence
  - messagepack allows low memory & disk consumption for redis
  - elasticsearch provides powerful search capability
- trade-offs / gotchas
  - messagepack trades speed for memory & disk capacity
    - but lets you store all JS data types in redis
      - fixstr, str8/16/32
      - positive fixint, negative fixint
      - uint8/16/32/64, int8/16/32/64, float32/64
      - true, false, undefined, NaN, +Infinity, -Infinity
      - arrays, objects, nested arrays, nested objects
      - buffers, arraybuffers, typedarrays
    - dictionary support further reduces size
    - what-the-pack module buffer is 8KB by default - you can set it up to 1GB, which is already overkill
  - redis trades durability for speed
    - redis appendfsync is everysec by default - you can set appendfsync to always to reverse the trade-off
  - elasticsearch trades data consistency for search capability
    - elasticsearch index.refresh_interval is 1s by default
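Both the redis and elasticsearch trade-offs above are tunable at runtime; a sketch of the relevant commands (the localhost host/port values and the `user` index name are assumptions - substitute your own):

```shell
# redis: fsync the append-only file on every write instead of once per second
# (reverses the durability-for-speed trade-off)
redis-cli CONFIG SET appendfsync always

# elasticsearch: tune how quickly writes become visible to search
# ("user" stands for your type's lowercased label)
curl -XPUT -H "Content-Type: application/json" \
  "http://localhost:9200/user/_settings" \
  -d '{"index.refresh_interval": "1s"}'
```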
- ideal use cases
  - you want fast ACID-compliant data updates
  - you want elasticsearch search capability
  - you can tolerate 1s data inconsistency between redis in-memory and disk storage
  - you can tolerate 1s data inconsistency between redis and elasticsearch
  - you have data updates that need to reflect in search immediately: ok
  - you have data updates that do not need to reflect in search immediately: ok
  - you have data that requires some schema: ok
  - you have data that does not require a schema: ok
  - you have data that needs to exist in elasticsearch: ok
  - you have data that does not need to exist in elasticsearch: ok
- todo
  - Type
    - make use of ajv schemas optional: ok
    - provide option to index in elasticsearch: ok
    - allow Query to check if Type is in elasticsearch: ok
  - Entity & Transaction
    - redis & elasticsearch calls try-catch retry: planned
  - Logging
    - local file: planned
    - local db: planned
    - third-party db: planned
setup
- spin up redis (4.x and up) instance
- spin up elasticsearch (6.x and up) instance
- add module
$ yarn add daval
- create client instance
const Client = require('daval');
const client = new Client(
  // redis config
  {
    host: '127.0.0.1',
    port: 6379,
    password: 'password'
  },
  // elasticsearch config
  {
    host: 'localhost:9200'
  }
);
const { Type, Entity, Transaction, Query } = client;
- optional configurations
const Client = require('daval');
// for details:
// https://www.npmjs.com/package/what-the-pack
// 16.8 MB buffer
Client.MessagePack.reallocate(2 ** 24);
// register 'name' word in dictionary
Client.MessagePack.register('name');
// initialize instance here..
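The register() call works by dictionary-encoding frequently repeated strings such as object keys. A toy plain-JS sketch of the idea (illustrative only, not what-the-pack's actual internals):

```javascript
// Toy dictionary encoding: registered words become small integer refs,
// so a frequently repeated key like 'name' costs ~1 byte instead of 5.
const dictionary = [];
const register = (word) => dictionary.push(word);

const encodeKey = (key) => {
  const index = dictionary.indexOf(key);
  return index >= 0 ? index : key; // small int instead of full string
};
const decodeKey = (ref) =>
  typeof ref === 'number' ? dictionary[ref] : ref;

register('name');
console.log(encodeKey('name'));      // 0
console.log(encodeKey('unlisted'));  // 'unlisted'
console.log(decodeKey(0));           // 'name'
```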
Type class
- constructor (label String, useElastic Boolean)
  - label gets transformed into lowercase
  - useElastic is either true or false
- useSchema (schema Object)
  - schema must be a valid ajv json schema
  - returns self
- example
const User = new Type('User', true)
  .useSchema({
    "properties": {
      "name": {
        "type": "string",
      },
      "age": {
        "type": "number",
        "minimum": 25,
      }
    }
  });
Entity class
- constructor (t Type, id String)
  - t must be an instance of Type
  - id must be unique for each Entity
- upsert (data Object, refresh Boolean) async
  - data must pass this entity's Type schema if schema exists
  - absence of this entity's id generates a uuidv4 id
  - if refresh === true, client awaits update visibility in search before returning
- merge (data Object, refresh Boolean) async
  - data must pass this entity's Type schema if schema exists
  - absence of this entity's id throws an error
  - uses lodash.merge to merge existing and new data
  - if refresh === true, client awaits update visibility in search before returning
- exists () async Boolean
  - absence of this entity's id throws an error
- fetch () async Object
  - fetched data must pass this entity's Type schema if schema exists
  - absence of this entity's id throws an error
- delete (refresh Boolean) async
  - absence of this entity's id throws an error
  - if refresh === true, client awaits update visibility in search before returning
- example
const alice = new Entity(User);
await alice.upsert({ name: 'alice' });
console.log('id', alice.id);
// ie. d14a4dc9-e19c-48bf-94b5-1d820c7566d0
await alice.merge({ age: 25 });
console.log('exists', await alice.exists());
// true
console.log('fetch', await alice.fetch());
// { name: 'alice', age: 25 }
await alice.delete();
console.log('exists', await alice.exists());
// false
console.log('fetch', await alice.fetch());
// undefined
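merge() uses lodash.merge, which deep-merges nested plain objects instead of overwriting top-level keys. A minimal sketch of that behavior (simplified - for instance, arrays are overwritten here rather than merged element-wise as lodash does):

```javascript
// Simplified deep merge: nested plain objects are merged recursively,
// everything else (including arrays, in this sketch) is overwritten.
const isPlainObject = (v) =>
  v !== null && typeof v === 'object' && !Array.isArray(v);

const deepMerge = (target, source) => {
  for (const key of Object.keys(source)) {
    if (isPlainObject(target[key]) && isPlainObject(source[key])) {
      deepMerge(target[key], source[key]);
    } else {
      target[key] = source[key];
    }
  }
  return target;
};

const existing = { name: 'alice', meta: { city: 'manila' } };
const update = { age: 25, meta: { zip: '1000' } };
console.log(deepMerge(existing, update));
// { name: 'alice', meta: { city: 'manila', zip: '1000' }, age: 25 }
```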
Transaction class
- with (...entities Entity) Transaction
  - entities are instances of Entity class
  - returns Transaction, for chaining
- run (fn Function, refresh Boolean) async
  - fn is a function accepting a single parameter e
  - fn can also be an async function
  - parameter e is a function (e stands for entity)
    - used as e(x) where x is an instance of Entity
    - e(x) returns an object containing the entity's data, which can be modified
  - if fn gracefully returns (regardless of return value), all modifications to the e(x) objects will be committed
  - all data fetched are validated with schema if schema exists
  - all data modifications are validated with schema if schema exists
  - if refresh === true, client awaits update visibility in search before returning
- example
const alice = new Entity(User);
const bob = new Entity(User);
await alice.upsert({ name: 'alice' });
await bob.upsert({ name: 'bob' });
await new Transaction()
  .with(alice, bob)
  .run(async (e) => {
    e(alice).age = 25;
    e(bob).age = 26;
  });
console.log({
  alice: await alice.fetch(),
  bob: await bob.fetch()
});
// {
//   alice: { name: 'alice', age: 25 },
//   bob: { name: 'bob', age: 26 }
// }
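The commit-on-graceful-return rule can be pictured as: hand mutable working copies to fn via e(x), then write everything back only if fn neither throws nor rejects. A storage-free sketch of the pattern (hypothetical internals, with a plain Map standing in for redis):

```javascript
// Simplified transaction runner; 'store' stands in for redis.
const runTransaction = (store, ids, fn) => {
  // working copies handed out through e(id)
  const copies = new Map(ids.map((id) => [id, { ...store.get(id) }]));
  const e = (id) => copies.get(id);
  fn(e); // if fn throws, the commit loop below never runs: transaction aborts
  // graceful return: commit every working copy
  for (const [id, data] of copies) store.set(id, data);
};

const store = new Map([
  ['alice', { name: 'alice' }],
  ['bob', { name: 'bob' }],
]);
runTransaction(store, ['alice', 'bob'], (e) => {
  e('alice').age = 25;
  e('bob').age = 26;
});
console.log(store.get('alice')); // { name: 'alice', age: 25 }
```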
Query class
- constructor (...t Type)
  - t are Type instances to include in search
- from (offset Integer)
  - offset is the number of records to offset
- size (amount Integer)
  - amount is the number of records to return
- sort (field String, direction String, mode String)
  - field is the name of the field to sort by
  - direction must be asc or desc
  - mode must be min, max, sum, avg, or median
  - can be called multiple times to stack multiple sorts
- range (field String)
  - returns Object with the following methods:
    - gt(value Number) - greater than
    - gte(value Number) - greater than or equal
    - lt(value Number) - less than
    - lte(value Number) - less than or equal
- matchAll ()
  - matches all documents
- matchNone ()
  - matches no documents
- scroll (duration String, scrollId String)
  - duration, e.g. 30s, 1m, 1h, 1d
  - scrollId is used in scroll continuation
- sourceFilter (...fields String)
  - selects / specifies the field(s) to return
- term (field String, value String)
  - finds documents that contain the exact term specified
- terms (field String, values String)
  - filters documents that have fields that match any of the provided terms
- run () async
  - returns an Object with the following properties: scrollId, ids, entities, data, hitsRetreived, hitsTotal
- all methods aside from run() allow chaining
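Under the hood, the chainable methods plausibly accumulate an elasticsearch request body. A toy builder sketching how a chain like .term(...).sort(...).from(0).size(10) could map onto the Search API body (hypothetical, not daval's actual internals):

```javascript
// Toy chainable builder mirroring the Query surface above.
class QuerySketch {
  constructor() {
    this.body = { query: { bool: { filter: [] } }, sort: [] };
  }
  from(offset) { this.body.from = offset; return this; }
  size(amount) { this.body.size = amount; return this; }
  term(field, value) {
    this.body.query.bool.filter.push({ term: { [field]: value } });
    return this;
  }
  sort(field, direction) {
    this.body.sort.push({ [field]: { order: direction } });
    return this;
  }
}

const body = new QuerySketch()
  .term('name', 'alice')
  .sort('age', 'asc')
  .from(0)
  .size(10)
  .body;
console.log(JSON.stringify(body));
```

Returning `this` from every method except the terminal one is what makes the chaining style above possible.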
Query coverage
- No description / code for non-supported parts.
- Search API
  - Request Body
    - from - sets offset: .from(10)
    - size - sets amount of records to return: .size(10)
    - sort - sorts by field, ascending or descending: .sort('age', 'asc'), .sort('age', 'desc'), .sort('age', 'desc', 'avg')
    - scroll - retrieve large numbers of results from a search request: .scroll('10m', scrollId)
    - source filtering - specifies fields to return: .sourceFilter('name', 'age')
    - not supported: suggesters, count, validate, explain, profiling
- Query DSL
  - Full text queries (partial support)
    - match - filters fields with values: .match('name', 'josh'), .match('age', 25)
    - not supported: match_phrase, match_phrase_prefix, multi_match, common, query_string, simple_query_string
  - Term level queries (partial support)
    - term - finds documents that contain the exact term specified: .term('name', 'alice'), .term('name', 'bob')
    - terms - filters documents that have fields that match any of the provided terms: .terms('tags', 'Horror', 'Comedy'), .terms('tags', 'Urgent')
    - range - greater than / less than filters: .range('age').gt(25), .range('age').gte(25), .range('age').lt(25), .range('age').lte(25)
    - not supported: terms_set, exists, prefix, wildcard, regexp, fuzzy, type, ids
  - Compound queries (no support): constant_score, bool, dis_max, function_score, boosting
  - Joining queries (no support): nested, has_child, has_parent
  - Geo queries (no support): geo_shape, geo_bounding_box, geo_distance, geo_polygon
  - Specialized queries (no support): more_like_this, script, percolate, wrapper
  - Span queries (no support): span_term, span_multi, span_first, span_near, span_or, span_not, span_containing, span_within, field_masking_span
  - Misc (partial support)
    - match_all - matches all documents: .matchAll()
    - match_none - matches no documents: .matchNone()
    - not supported: minimum_should_match, multi term query rewrite
exposed clients
- redis Object - ioredis client - https://www.npmjs.com/package/ioredis
- elastic Object - elasticsearch client - https://www.npmjs.com/package/elasticsearch
- clients are exposed for complex calls
notes
- on elasticsearch, we use type's label as 'index' and 'type' value, because:
- es 7.x onwards will get rid of mapping types
- it's currently recommended to use same 'index' and 'type' value in latest es 6.x
- throwing errors within transactions effectively aborts it
- the Query class covers the basic query functionality of Google Cloud Datastore
  - set search offset
  - set search size (amount of results to return)
  - set search scroll id
  - specify which fields to return
  - sort items in ascending and descending order
  - filter items with exact field values
  - filter items with greater than, less than, greater than or equal, and less than or equal
  - filter items with array fields containing specific values
- difference between match and term
  - the match query analyzes the input string and constructs more basic queries from that
  - the term query matches exact terms
  - if you have a document containing "CAT" and search for "cat", the match query will find it but the term query won't - that is, assuming your analysis config lowercases input, which it does by default
- 1,000 documents at 1 KB each is 1 MB, 1,000 documents at 100 KB each is 100 MB
- on indices locked by storage, unlock with:
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
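The match-versus-term note above can be demonstrated with a toy analyzer - match analyzes the query string before comparing against the indexed tokens, while term compares it verbatim (a conceptual sketch, not elasticsearch's implementation):

```javascript
// Toy standard analyzer: lowercase + whitespace tokenization.
const analyze = (text) => text.toLowerCase().split(/\s+/);

// Index side: documents are stored as analyzed tokens.
const indexedTokens = analyze('CAT videos'); // ['cat', 'videos']

// term query: the query string is NOT analyzed.
const termQuery = (tokens, value) => tokens.includes(value);
// match query: the query string IS analyzed first.
const matchQuery = (tokens, value) =>
  analyze(value).some((t) => tokens.includes(t));

console.log(termQuery(indexedTokens, 'CAT'));  // false: 'CAT' !== 'cat'
console.log(matchQuery(indexedTokens, 'CAT')); // true: analyzed to 'cat'
```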
external references
- redis client
- elasticsearch client
- messagepack module
- redis installation
- elasticsearch installation
license
MIT | @davalapar