
endeo


Encode and decode objects, arrays, and strings into bytes.

endeo => encode + decode + object

The majority of encode and decode work is done by packages enbyte and debyte. Their perspective is about the values they're given to encode and decode. For example, enbyte encodes {} as EMPTY_OBJECT and endeo encodes it as [OBJECT, TERMINATOR] (at top-level).
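
For example, a minimal sketch (assuming an endeo instance built as shown in C1) of that top-level difference for an empty object:

var result = endeo.encode({})

// enbyte alone would use the one-byte EMPTY_OBJECT specifier,
// but at top level {} becomes [OBJECT, TERMINATOR],
// i.e. bytes [251, 255] per the Encoding Specification below:
result.buffer // <Buffer fb ff>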

The endeo package has the over-arching perspective of encoding and decoding values in sequence and handling streaming. It's possible to use enbyte and debyte directly for a variety of uses. My main focus is on endeo and providing all the features.

I've separated the parts into their own packages so they can be used standalone and replaced with custom implementations in an endeo instance. Each also has its own GitHub repo with its own issues/PRs.

The various parts can be developed and released separately.

See packages:

  1. endeo
  2. enbyte
  3. debyte
  4. unstring
  5. @endeo/decoder
  6. @endeo/bytes
  7. @endeo/types
  8. @endeo/input
  9. @endeo/output

TODO: TCK

I have an @endeo/tck package under development. It's a "test compatibility kit" with data to use when testing an implementation of endeo's encoding to ensure it works properly.

The above packages have tests to ensure they work. When I finish the TCK, they will also be tested against it.

The TCK will allow alternate implementations to verify they adhere to the spec. Alternate implementations may be written in other languages used in the Node community, such as TypeScript, or in entirely different languages, such as Java and Go. I plan to make other language implementations eventually.

TODO: objen

I'm developing a package which reads an "object spec" from a JSON file.

I'm also going to make one which reads a Google Protocol Buffers ".proto" file to produce a "creator function" and "enhancers" for an "object spec". This will allow using ".proto" files with endeo to output endeo-style encoding (not protobuf encoding).

Install

# when using the standard implementations, use aggregator.
# endeo-std depends on all the standard implementations.
# usually, this is the one to use.
npm install --save endeo-std

# when specifying custom components.
# only use this when *replacing* some standard implementations.
npm install --save endeo

Table of Contents

A. Simplified Examples

  1. encode -> buffer -> decode
  2. encode/decode via transform streams

B. Progressively Enhanced Use

  1. generically encode any object
  2. reduce string bytes
  3. define an "object spec"
  4. add "object spec" enhancers
  5. specify types

C. API

  1. builder/constructor
  2. add() an object spec
  3. create an input/output
  4. encode() to buffer
  5. encoder() transform
  6. decode() from buffer
  7. decoder() transform

D. Vocabulary

E. Encoding Specification

  1. Indicator Byte
  2. Specifier Byte
  3. Compression Savings

F. Comparison

  1. Feature Table
  2. Compression Table

G. MIT License

A. Simplified Examples

A1. encode -> buffer -> decode

var result = endeo.encode({ key1: 123, key2: 'test' })

// result has an `error` property if encoding failed,
// and a `buffer` property when successful.
var buffer = result.buffer

// decode from the buffer, starting at index 0:
var object = endeo.decode(buffer, 0)

A2. encode/decode via transform streams

var encoder = endeo.encoder()
var decoder = endeo.decoder()

// for show (we wouldn't do this...)
encoder.pipe(decoder)

// add 'error' and 'data' listeners on either stream.
// encoder outputs Buffer.
// decoder outputs objects (object/array/String).

// this will be encoded into a Buffer,
// piped to decoder,
// and decoder will decode and push() it.
encoder.write({ key1: 123, key2: 'test' })

B. Progressively Enhanced Use

B1. generically encode any object

Any object can be encoded without special requirements.

var object = getSomeObject()

var buffer = endeo.encode(object)

// or, knowing it's an object use the specific function:
var buffer = endeo.object(object)

// or, via the encoder transform:
var encoder = endeo.encoder()
encoder.write(object)

B2. reduce string bytes

Replace strings with ID's to reduce bytes required.

An unstring instance determines which strings are replaced. It has configurable restrictions controlling which strings it auto-learns, and it also accepts strings at creation.

Provide expected strings for object keys and values.

// the keys are always strings,
// these values are strings.
var object = {
  key1: 'value1',
  key2: 'value2'
}

// add 2 of the 4 strings to unstring
endeo.unstring.add('key1', 'value1')

// when encoding, key1/value1 are replaced with ID's.
var buffer = endeo.encode(object)
// or, directly:
var buffer = endeo.object(object)

// Note, if the unstring's restrictions allow the other 2 strings
// to be "auto-learned" then they will be encoded as strings
// only this first time to tell the receiver to learn the string.
// subsequent times they'll be replaced with ID's as well.

B3. define an "object spec"

Configure endeo with an "object spec" for a known object key structure. Then it completely avoids sending keys.

Also, the "object spec" defines default values for the keys so default values are reduced to a single byte meaning "default".

An "object spec" is easily defined with an object. The keys map to the default values.

The object is provided from a function I'll call a "creator" function. It has multiple uses. Providing the object to define the "object spec" is one. It's also used at decoding time to create a new object to fill with the decoded values.

The "object spec" is remembered in endeo and retrieved when decoding. When encoding, it's provided to objectWithSpec() or embedded on the object with key $ENDEO_SPECIAL. A convenience method, spec.imprint() helps hide the property on an object or set it into a class's prototype.

// the creator function:
function createThing() {
  // create a new instance each time:
  return {
    key1: null,    // default = null
    key2: 123,     // default = 123
    key3: 'string' // default = 'string'
  }
}

// add the "object spec"
var spec = endeo.add(createThing)

// our object has the same keys.
var object = {
  key1: 'one',
  key2: 12345,
  key3: 'string'
}

// encode with the spec:
var buffer = endeo.objectWithSpec(object, spec)

// Or, embed the spec:

// manual imprint:
object = {
  $ENDEO_SPECIAL: spec,
  key1: 'one',
  key2: 12345,
  key3: 'string'
}

// imprint via spec:
spec.imprint(object)

// class prototype embedding:
spec.imprint(MyThing.prototype)
object = new MyThing(1, 2)

// now, both encode() and special() will find the spec
// and do the special encoding.
buffer = endeo.encode(object)
// Or, go right to encoding a special object
buffer = endeo.special(object)

B4. add "object spec" enhancers

Providing enhancers will alter how an "object spec" is used. Enhancers:

  1. type - A pre-defined enhancer. An @endeo/specials instance may be trained with types so the creator can reference them by name. Or, they can be specified in an enhancer's type property.
  2. encode - A custom encode function to use for the value instead of analyzing it to determine how to encode it. Specifying this speeds up encoding by avoiding value analysis. It can also allow a custom byte encoding for the value. The function params are (enbyte, value, output). See enbyte and @endeo/output.
  3. decode - A custom decode function. If you specify a custom encode() then use this to specify its decode().
  4. decoderNode - For streaming decode via @endeo/decoder. Encoding combines streaming (chunk encoding) and "put it all in one buffer" by using @endeo/output to handle both. The @endeo/input doesn't do the same for decoding, so there is decode() for "I have it all in one buffer ready to decode" and decoderNode for @endeo/decoder when streaming. See stating to understand how the node should be implemented. The function params are (control, nodes, context), as for all stating nodes.
  5. select - When a key's value is always one of a set of values, provide the values in an array via the select property. In the creator function, set the key's value to the default value as usual. This will encode the index of the value instead of the value itself.

How to specify enhancers:

function createThing() {
  return {
    custom1: null,
    custom2: null,
    fruit: 'orange'
  }
}

// get some pre-defined types:
var types = require('@endeo/types')

// teach our @endeo/specials about the 'day' type.
endeo.specials.addType('day', types.day)

// only provide enhancers for the keys you want to enhance.
var enhancers = {
  // let's say custom1 is always an int fitting within 2 bytes.
  // use the pre-defined 2 byte int type.
  // it has a custom encode().
  custom1: types.int2,

  // let's say custom2 is a date, without time component.
  // use the pre-defined 'day' type.
  // reference it via its name because we added it to specials.
  custom2: 'day',

  // fruit will be a select of a few fruits:
  // specify as { select: [ ... ] }, or, shortcut:
  fruit: [ 'apple', 'banana', 'orange', 'kiwi' ]
}

// now add the "object spec" with both creator and enhancers
var spec = endeo.add(createThing, enhancers)

var object = {
  custom1: 12345,
  custom2: new Date(2001, 2, 3), // March 3rd, 2001.
  fruit: 'banana'
}

// and encode it as shown previously.
var buffer = endeo.objectWithSpec(object, spec)

// the `custom1` value will always be two bytes and
// encoding will happen without analyzing it to
// determine it's a number, an int, an int requiring 2 bytes.
// so, faster.

// the `custom2` value will be encoded using 4 bytes.
// 2 for year, 1 for month, 1 for day-of-month.
// that's less than Date's usual 24 bytes when
// encoding it as '2001-03-03T00:00:00.000Z',
// or 7 bytes for the Date.getTime() number.

// the `fruit` value will be encoded as `1` because
// 'banana' is at index 1 in the fruit enhancer's array.
// if the value was 'orange' then it'd be encoded as
// DEFAULT.

B5. specify types

When encoding, endeo must analyze each value to determine how to encode it.

Specify every value's type in an "object spec" via enhancers to avoid all analysis work.

For common values use pre-defined types in @endeo/types.

Int values are a bit tricky. Endeo uses the fewest bytes needed to convey an int, which it determines via analysis. If you know your int will always fit into a certain number of bytes, you may specify its type and it will always use that number of bytes. This means a value which could have been encoded with fewer bytes will still use the larger number of bytes. And, if a value exceeds what can be conveyed with that number of bytes, it will be mangled. You decide which way to go.
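
For example, a sketch using the pre-defined int2 type from @endeo/types (the creator and key names here are made up for illustration):

var types = require('@endeo/types')

function createCounter() {
  return { count: 0 } // default = 0
}

// `count` will always be encoded in 2 bytes, with no value analysis:
var spec = endeo.add(createCounter, { count: types.int2 })

var buffer = endeo.objectWithSpec({ count: 300 }, spec)

// beware: a `count` too large for 2 bytes would be mangled.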

For complex values you may provide custom types, as described above in B4.

For inner objects with a consistent key structure, put them in the creator function's returned object, making them part of the "object spec" (see the sketch below). For varying keys, the inner part will be encoded via generic object encoding (with unstring support) unless you provide a custom type. Note, it's also possible for inner objects to be special objects with their own "object spec".
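
A minimal sketch of an inner object included in a creator (the key names are hypothetical):

function createOuter() {
  return {
    name: null,    // default = null
    // consistent inner keys become part of the "object spec":
    location: {
      lat: 0,      // default = 0
      lon: 0       // default = 0
    }
  }
}

var spec = endeo.add(createOuter)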

When all values of a special object have a custom type enhancer with an encode() then encoding will proceed without any value analysis.

C. API

C1. builder/constructor

Endeo exports a builder function which calls the constructor.

It accepts options to configure the inner components or replace them entirely with custom implementations.

Build with Standard Implementations

Basic build will try to use "standard implementations" available both individually and conveniently with package endeo-std.

Install the packages:

# individually:
npm install --save @endeo/bytes enbyte debyte unstring @endeo/specials @endeo/decoder

# via aggregator:
npm install --save endeo-std

Standard component implementation packages:

  1. unstring - string cache for sending ID's instead of the strings
  2. @endeo/specials - builds "object specs"
  3. @endeo/bytes - the byte markers
  4. enbyte - does the majority of encode work
  5. debyte - does the majority of decode work
  6. @endeo/decoder - transform for streaming decode work

var buildEndeo = require('endeo')

// to build with the standard implementations:
// do: npm install -S endeo-std
// and no custom options.
var endeo = buildEndeo()

// customize standard implementations:
endeo = buildEndeo({
  // customize unstring:
  // only tell it strings to use
  strings: [ 'some', 'strings' ],
  // Or, give an entire options object to unstring:
  unstringOptions: {
    // see the unstring package for all its options
    strings: [ 'same', 'thing' ],
    min: 2,
    max: 100
  },

  // customize @endeo/specials with types to start with:
  types: {
    // see @endeo/specials, and @endeo/types for an example
    some: { /* type */ }
  },

  // customize enbyte:
  // it receives the unstring instance and bytes.

  // customize debyte:
  // it receives the unstring, bytes, and specs

  // customize @endeo/decoder:
  // it receives: bytes, specs, types, unstring, and unstringOptions.

  // add object specs:
  specs: [
    /* some "object spec" instances, see @endeo/specials */
  ]
})

Build with Custom Implementations

Customize every inner component with an alternate implementation:

var buildEndeo = require('endeo')

var endeo = buildEndeo({
  Input: /* builder: function(buffer, index, options) */,
  Output: /* builder: function(writer, target) */,
  unstring: /* duck-typed unstring instance */,
  bytes: /* byte values object, see @endeo/bytes */,
  specials: /* duck-typed @endeo/specials instance */,
  enbyte: /* duck-typed enbyte instance */,
  debyte: /* duck-typed debyte instance */,
  encoder: /* function returns transform instance */,
  decoder: /* function returns transform, see @endeo/decoder */
})

C2. add() an object spec

Endeo uses an @endeo/specials instance to build an "object spec". It then retains the specs in an array and refers to them by ID, which is the index into that array.

A receiving endeo must have the same specs so the IDs map to the right specs.
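
For example, a minimal sketch (instance names hypothetical) of keeping spec IDs aligned between a sender and receiver:

var sender = buildEndeo()
var receiver = buildEndeo()

// add() the same specs in the same order on both sides,
// because a spec's ID is its index in the internal array:
var spec = sender.add(createThing)  // ID 0 on the sender
receiver.add(createThing)           // ID 0 on the receiver, too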

An "object spec" can be built with nothing more than a simple object (returned by a "creator" function).

It may also have "enhancers" which augment its operations.

A "special object" with the "object spec" "imprinted" on it may be given to encode(), object(), and special() to encode it. Both encode() and object() will test for the spec and find it. The special() will get the spec from the imprinted property and error if it's not there.

Here's how to use a "creator function" and then how to make encoding use the "object spec":

// a "creator" function builds a new object with
// the keys mapped to their default values.
function thing() {
  return {
    key1: 12345, // 12345 is now default value
    key2: null   // null is default value
  }
}

// teach it about the special object.
// not providing "enhancers" (arg 2)
var spec = endeo.add(thing)

var myThing = {
  key1: 2468,
  key2: 'test'
}

// this would encode `myThing` as a generic object:
var buffer = endeo.encode(myThing)
// Or:
buffer = endeo.object(myThing)

// and this errors because spec is missing:
buffer = endeo.special(myThing)

// ! provide the spec in one of 4 ways:

// 1. as an arg
buffer = endeo.objectWithSpec(myThing, spec)

// 2. by imprinting it
spec.imprint(myThing)
buffer = endeo.encode(myThing)
// Or:
buffer = endeo.special(myThing)

// 3. by setting it in object at creation time
myThing = {
  $ENDEO_SPECIAL: spec,
  key1: 2468,
  key2: 'test'
}

// 4. by imprinting it on a class's prototype
// imagine the extra stuff to make this a class is done.
function MyThing(key1, key2) {
  this.key1 = key1
  this.key2 = key2
}

spec.imprint(MyThing.prototype)
myThing = new MyThing(2468, 'test')

buffer = endeo.encode(myThing)
// Or:
buffer = endeo.special(myThing)

Here's how to provide "enhancers" for the "object spec":

// assume we have thing() function from above as the creator.

// grab some pre-defined types:
var types = require('@endeo/types')

// teach endeo the one type we're going to use below.
// this allows referring to it via its name.
endeo.specials.addType('int2', types.int2)

// "enhancers" is an object mapping the "object spec" keys
// to extra info.
var enhancers = {
  // specify a pre-defined type by name:
  key1: 'int2',
  // Or, provide the type directly (use one or the other;
  // with both, the later key1 wins in a JS object literal):
  key1: types.int2,

  // specify type here:
  key2: {
    // optional custom encode:
    encode: function (enbyte, value, output) {},

    // optional custom decode:
    decode: function (debyte, input) {},

    // optional custom decoderNode (for @endeo/decoder):
    decoderNode: function (control, nodes, context) {},

    // must be used exclusively, can't use others with this.
    // creates encode() decode() which uses index to refer
    // to which one is the value.
    select: [
      'some', 'values', 'to', 'choose', 'one', 'of'
    ],

    // optional, this will *combine* the info here with
    // the named pre-defined type.
    // these values override the one referenced.
    type: 'someType'
  }
}

// then provide it as the second arg to add().
var spec = endeo.add(thing, enhancers)

C3. create an input/output

Both input and output help with working with a buffer. Output goes beyond that: it can output buffer chunks for streaming, or combine them all into a single buffer for the final result.

An output may be reused in sequential encode operations. Endeo helps build one with a convenience method output(). It will use the standard implementation unless endeo was built with a custom one.

// this output will build up buffer chunks as encoding progresses.
var output = endeo.output()

// get all the chunks in one buffer:
var result = output.complete()
// result either has `error` or `buffer`
var buffer = result.buffer

// or, have the output written to a stream:
var output = endeo.output(writable.write, writable)
// Or, a transform:
var output = endeo.output(transform.push, transform)

// chunks will be sent as they fill up.
// flush out remaining content using the same function:
output.complete()

// control the chunk sizes:
output.size = 2048

// Note, written chunk size may vary when output decides
// to send a non-full chunk because it has a large value
// to send as its own chunk.
// You probably won't ever notice, but, I want to mention it
// in case someone decides to do something based on the idea
// the chunk size will always be the same. It won't.

// also, you may choose to *not* call output.complete() after
// giving it something to encode. Later encoding operations
// will fill the chunk and send it, eventually.
// be sure to at least call it once when you're done encoding
// everything.

An input tracks where in the buffer the decode operation is and extracts values. It may be reused by calling reset() with a new buffer and index. Endeo helps build one with a convenience method input(). It will use the standard implementation unless endeo was built with a custom one.

// provide a buffer and the index to start at.
var input = endeo.input(buffer, 0 /* , options */)

// you may provide the buffer/index via properties in the
// options (3rd arg).
// I originally had the options as the only arg,
// but, it seemed a waste when I was always using buffer/index
// all the time.
// so, they are now the first two args.
// I maintain an options 3rd arg in case someone wants to
// put the new buffer/index in an object to pass on to
// another place which receives that and then creates an Input
// (or resets one) with it.
var input = endeo.input(null, null, {
  buffer: someBuffer, index: 0
})

// reset the Input with a new buffer/index:
input.reset(newBuffer, 0)

C4. encode() to buffer

Endeo can encode a value into a single Buffer via multiple "entry points".

The "entry points":

  1. encode() - encodes any "top level" value (object, array, string)
  2. object() - encodes any object, special or generic
  3. objectWithSpec() - encodes only "special objects" with the provided "object spec"
  4. special() - encodes only a "special object" with an "imprinted" "object spec"
  5. array() - encodes an array
  6. string() - encodes a string

// encode()
endeo.encode({ some: 'object' })
endeo.encode([ 'some', 'array' ])
endeo.encode('some string')

// object()
endeo.object({ generic: 'object' })
endeo.object({ special: 'object (imprinted)' })
endeo.object({ // manual imprint:
  $ENDEO_SPECIAL: spec,
  /* key/values */
})

// objectWithSpec()
endeo.objectWithSpec({ some: 'object' }, spec)

// special()
endeo.special({ special: 'imprinted' })
endeo.special({ // manually imprinted
  $ENDEO_SPECIAL: spec,
  /* key/values */
})

// array()
endeo.array([ 'some', 'array' ])

// string()
endeo.string('some string')

All the above "entry points" create an output to gather all the chunks and provide the result as a single buffer.

To output the chunks to a stream, use encoder() as described below in C5. encoder() transform. Or, create your own output and call the inner versions of the "entry points":

var output = endeo.output(stream.write, stream)

endeo._encode(value, output)
endeo._object(value, output)
endeo._objectWithSpec(value, spec, output)
endeo._special(value, output)
endeo._array(value, output)
endeo._string(value, output)

// these all call output.complete() when they're done.
// the chunks are sent to the stream so the result
// returned is:  { success: true }

// you may continue to reuse `output`.

C5. encoder() transform

Endeo makes it easy to stream. The encoder() creates a new Transform you can use in pipelines or write to directly.

The encoder calls endeo.encode() to encode the objects it receives.

The @endeo/output then pushes buffer chunks as they fill up.

var encoder = endeo.encoder()

// usual event style:
encoder.on('error', function(error) {
  // ...
})
encoder.on('data', function(buffer) {
  // ...
})

// usual pipe():
source.pipe(encoder).pipe(target)

// or write to it directly:
encoder.write({ some: 'object' })
encoder.write([ 'some', 'array' ])

// a "string" is a "top level" value.
// however, writing a string requires writableObjectMode = false.
// and encoder defaults to writableObjectMode = true.
// so, if you want to use "top level" strings,
// then make an encoder with writableObjectMode set to false:
encoder = endeo.encoder({
  writableObjectMode: false
})

// then:
encoder.write('some string')

C6. decode() from buffer

Endeo can decode a value from a single Buffer via multiple "entry points".

To differentiate them from the encoding operations, the names are prefixed with 'de'.

The "entry points":

  1. decode() - decodes any "top level" value (object, array, string)
  2. deobject() - decodes any object, special or generic
  3. despecial() - decodes only a "special object" with an ID
  4. dearray() - decodes an array
  5. destring() - decodes a string

var buffer = getSomeEncodedBuffer()
result = endeo.decode(buffer, 0)
result = endeo.deobject(buffer, 0)
result = endeo.despecial(buffer, 0)
result = endeo.dearray(buffer, 0)
result = endeo.destring(buffer, 0)

All the above "entry points" create an input for the buffer and index.

Create your own input and call the inner versions of the "entry points":

var input = endeo.input(buffer, 0)

result = endeo._decode(input)
result = endeo._deobject(input)
result = endeo._despecial(input)
result = endeo._dearray(input)
result = endeo._destring(input)

// you may continue to reuse `input`
// by resetting it with a new buffer/index:
input.reset(newBuffer, 0)

C7. decoder() transform

Endeo makes it easy to stream. The decoder() creates a new Transform you can use in pipelines or write to directly.

The "standard implementation" for decoder is a Transform created by @endeo/decoder. It uses the stating package.

Write, or pipe, Buffers to the decoder and it will push() the decoded results, emitting "data" events.

var decoder = endeo.decoder()

// usual event style:
decoder.on('error', function(error) {
  // ...
})
decoder.on('data', function(result) {
  // ...
})

// usual pipe():
source.pipe(decoder).pipe(target)

// or write to it directly:
decoder.write(someBuffer)
decoder.write(anotherBuffer)

// a "string" is a "top level" value.
// however, push()'ing a string will error when
// writableObjectMode = true.
// at the moment, @endeo/decoder cheats by
// converting a string to a String so it's an object.
// in the future, it will have an alternate solution.

D. Vocabulary

Words and phrases I use while describing endeo stuff:

| word/phrase | description |
| --- | --- |
| endeo | name of the whole project, the spec, and the primary package |
| object spec | knows the sequence of keys, their default values, and optionally custom operations |
| special object | an object with an "object spec" |
| generic | an object without an "object spec" |
| creator | function returning a new object with keys and default values for an "object spec" |
| enhancer | extra information to augment an "object spec" beyond the key and default value |
| marker | a byte with specific meaning in endeo encoding, such as ARRAY |
| encoder | a transform stream which accepts objects and outputs Buffers |
| decoder | a transform stream which accepts Buffers (chunks of one thing, chunks with multiple things, or partials) and outputs objects (or arrays, or String, because a primitive string isn't an object) |
| auto-learn | an "unstring" instance may "learn" a new string when asked for its ID, if restrictions allow it |
| unstring | a package which caches strings, has configurable restrictions for auto-learning strings, and reduces bytes sent by replacing strings with their ID |
| specials | an instance of package @endeo/specials which can be trained with custom types and analyzes a "creator" and "enhancers" to produce an "object spec" |
| imprint | an "object spec" may be provided as an arg to objectWithSpec() along with the object value. To "imprint" it means to either set the spec into the object with key $ENDEO_SPECIAL, or use the spec's imprint() method to set it on an object or a class's prototype. The imprint() method sets $ENDEO_SPECIAL as a non-enumerable, non-writable property on the target |
| standard | the implementations I provide for each part of the endeo work. To allow using endeo with custom implementations, the endeo package doesn't have the "standard implementations" as dependencies. To install a single package which depends on all the "standard implementations", use the endeo-std package. It has no code content; it only depends on all the "standard implementations" so they'll be installed. It's a "package aggregator" |
| top level | endeo considers an object, array, or string to be a "top level" value. A "full chunk" has one of those three |
| full chunk | a group of bytes which can be decoded into an object, array, or string |
| entry point | one of the multiple functions to encode and decode. encode() and decode() are the most generic "entry points", capable of handling any "top level" value. There are other functions for specific types of "top level" value; when you know the type, you may use these to "get right to it" |
| known string | the strings in an "unstring" are "known strings" and can be replaced with their ID during encoding |

E. Encoding Specification

TODO: write up the spec. The below is part of it.

E1. Indicator Byte

The first byte of a "top level" value's encoded results is the "indicator byte".

A byte may have a value from 0 to 255. Here's what each means as an "indicator byte":

| byte/range | description |
| --- | --- |
| 0 - 249 | the numeric ID of the "special object" encoded in the following bytes |
| 250 | SPECIAL. It's a "special object" whose ID is 250 or greater, so read an int from the next byte(s) to get its ID |
| 251 | OBJECT. A generically encoded object: a series of string/value pairs followed by a TERMINATOR |
| 252 | ARRAY. A series of values followed by a TERMINATOR |
| 253 | STRING. Has three forms: [length, bytes], [GET_STRING, id], [NEW_STRING, id, length, bytes] |
| 254 - 255 | not used. An indicator byte starts something, so SUB_TERMINATOR and TERMINATOR aren't valid indicator bytes |

In the future I may use 249 to mean "I'm sending you an object spec to learn". For now, 249 is open for business.
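
For illustration, a sketch (assuming the standard build) checking the indicator and terminator bytes of a generically encoded object:

var result = endeo.encode({ some: 'object' })
var buffer = result.buffer

buffer[0]                 // 251 -- the OBJECT indicator byte
buffer[buffer.length - 1] // 255 -- the TERMINATOR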

E2. Specifier Byte

The first byte of a value is the "specifier byte". It specifies either the actual value or the info needed to read the following bytes to get the value.

A byte may be from 0 to 255. Here's what they mean as a "specifier byte":

| byte/range | description |
| --- | --- |
| 0 - 100 | represent themselves. 0 is 0. 1 is 1. 100 is 100 |
| 101 - 200 | represent -1 to -100 |
| 201 - 208 | positive int stored in a certain number of bytes. 201 means 1 byte; 208 means 8 bytes |
| 209 - 216 | negative int stored in a certain number of bytes. 209 means 1 byte; 216 means 8 bytes |
| 217 | 4 byte floating point number |
| 218 | 8 byte floating point number |
| 219 - 237 | unassigned |
| 238 | a series of 5 default values |
| 239 | the next value is an int specifying how many default values to use |
| 240 | the value is the default in the "object spec" for that key |
| 241 | null |
| 242 | true |
| 243 | false |
| 244 | an empty string, '' |
| 245 | an empty array, [] |
| 246 | an empty object, {}. Note, at the "top level", an empty object is [OBJECT, TERMINATOR] ([251, 255]) |
| 247 | raw bytes |
| 248 | next is info for a string "unstring" must learn for later: the ID (int), then the length (int), then the string's bytes |
| 249 | next is a string ID for a "known string" |
| 250 | SPECIAL. It's a "special object" whose ID is 250 or greater, so read an int from the next byte(s) to get its ID |
| 251 | OBJECT. A generically encoded object: a series of string/value pairs followed by a TERMINATOR |
| 252 | ARRAY. A series of values followed by a TERMINATOR |
| 253 | STRING. Has three forms: [length, bytes], [GET_STRING, id], [NEW_STRING, id, length, bytes] |
| 254 | SUB_TERMINATOR marks the end of an inner value (object or array) |
| 255 | TERMINATOR ends a "top level" value (except a string). SUB_TERMINATORs are collapsed into a TERMINATOR: if inner things end at the end of the "top level" value, there won't be a series of SUB_TERMINATORs followed by a TERMINATOR, only the TERMINATOR. Avoids the redundancy |
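
As a worked example derived from the tables above (the exact bytes are an assumption; framing details may differ):

// endeo.array([ 42, -7, null ]) should produce:
// [ 252, 42, 107, 241, 255 ]
//   252 = ARRAY indicator
//    42 = itself (0 - 100 represent themselves)
//   107 = -7 (101 - 200 represent -1 to -100)
//   241 = null
//   255 = TERMINATOR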

E3. Compression Savings

Compare encoding a standard JSON string with a length header to various levels of endeo encoding.

{ // the object I used:
  "key1": 1,
  "key2": 257,
  "key3": 65537,
  "key4": 16777217,
  "key5": 4294967297,
  "key6": 1099511627777,
  "key7": 281474976710657,
  "key8": "an unknown string",
  "key9": 'a known string',
  "key9": "orange",
  "key10": [
    "some", "array", 1, 1000, 1000000,
    { "object": "in the array" },
    [ "array", "in the array" ]
  ],
  "key11": {
    "key12": "inner object",
    "key13": 12345,
    "key14": -54321,
    "key15": new Date(2001, 2, 3, 0, 0, 0, 0),
    "array": [ "some", "array", "in inner object", 1, 55555, 99999999 ]
  }
}
| # | bytes | saved | % reduced | encoding method |
| --- | --- | --- | --- | --- |
| 1 | 407 | 0 | 0% | JSON.stringify() and buffer.write(); length is a 4 byte int |
| 2 | 292 | 115 | 28% | endeo generic object |
| 3 | 200 | 207 | 51% | endeo with some strings in unstring for replacement |
| 4 | 168 | 239 | 59% | endeo special object with only a basic spec (keys) |
| 5 | 163 | 244 | 60% | endeo special with some defaults (key4, key14) |
| 6 | 156 | 251 | 62% | endeo special with 'day' type and select for key9 |

Explanation:

  • The first big gain is from using fewer bytes to encode smaller ints
  • The next big gain is from replacing strings with IDs
  • Then a small gain from avoiding encoding all the main keys and a few other strings (see below)
  • Then a 1% gain by using defaults for "key4" and "key14"
  • Then a 2% gain by using a custom type for the Date and a "select" for "key9"

Strings Replaced

  • all main keys: "key1" thru "key15"
  • 'a known string'
  • 'array'
  • 'object'

Some Defaults

  • key4 = 16777217
  • key14 = -54321

F. Comparison

F1. Feature Table

Compare endeo features versus PSON and Google's protobuf.

PSON focuses on replacing strings with int ID's. It's not trying to have these other features.

Protobuf:

  • leaves streaming to be handled by the dev instead of supporting it directly (PSON too).
  • encodes an ID for each value so ones left out aren't encoded at all versus endeo encoding a NULL or DEFAULT for no-longer-used keys.
  • uses zig-zag int encoding versus endeo's "specifier byte" representing -100 to 100 or designating how many bytes to read for the int. Also, endeo "shifts" the values when stepping up to another byte count: the lowest number represented isn't 0, it's 1 more than the largest number representable with the previous number of bytes, because if the value were less, fewer bytes would have been used (see the sketch below).
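
A sketch of that shift (the exact ranges are an assumption, for illustration only):

// 0 - 100 encode themselves, so a 1-byte int (specifier 201)
// doesn't restart at 0; it can start at 101:
//   specifier 201 + 1 byte  -> 101 .. 356    (101 + 0..255)
//   specifier 202 + 2 bytes -> 357 .. 65892  (357 + 0..65535)
// any smaller value would have used the smaller form.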

TODO: fill in more endeo/protobuf features

| feature description | endeo | PSON | protobuf |
| --- | --- | --- | --- |
| reduced int bytes | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| replace strings | :white_check_mark: | :white_check_mark: | :x: |
| configurable string learn | :white_check_mark: | :x: | :x: |
| object definition | :white_check_mark: | :x: | :white_check_mark: |
| object def avoids keys | :white_check_mark: | :x: | :x: |
| 1 byte object IDs up to 249 | :white_check_mark: | :x: | :x: |
| provides streaming | :white_check_mark: | :x: | :x: |
| value choice array | :white_check_mark: | :x: | :white_check_mark: |
| value choice any default | :white_check_mark: | :x: | :x: |
| one-byte defaults | :white_check_mark: | :x: | :x: (?) |
| generate classes | :x: | :x: | :white_check_mark: |
| multiple languages | :x: | :x: | :white_check_mark: |

Instead of generating classes, I prefer the way endeo's "object spec" can be applied to an object during encoding without the object being built/generated. Hooking into a JS class is as easy as imprinting the spec on the prototype. In other languages, I see endeo providing the serializer/deserializer implementation hooked into that language's object serialization support, or working like it does in JS.

Also, it's going to be possible to generate classes from an endeo "object spec".

I plan to support multiple languages.

F2. Compression Table

TODO: do the compression shown in E3 with PSON and protobuf (via a .proto file) and show the bytes used by each.

G. MIT License
