object-serializer v0.0.2

License: Unlicense
Last release: 8 years ago

WARNING: This has hardly been tested so far, and both the API and the file
format are not yet stable and are subject to change. It should not be used
except for experimental purposes and for testing.

This module exports a constructor function, which should be called with
the new operator. It takes two arguments: the internal version (default 0)
and the user version (default -1).

The internal version is the number of standard types to implement, and is
an unsigned 16-bit number. It is written to the output, and when the data
is read back it is an error if it does not match. The standard types
implemented in this way are taken as a prefix of the following list:
Array, ArrayBuffer, Date. Additional standard types include Map, RegExp,
WeakMap, and WeakSet.

The user version is a signed 16-bit number and can be used for any purpose
you like. It is written to the output, and when the data is read back it
is an error if it does not match.

There is no asynchronous operation; all operations are synchronous. It is
designed to be usable for saving the state of a roguelike game or other
standalone game, or possibly the state of a MUD; it is not meant for
realtime applications. (If it were asynchronous, the values could change
while it is working, which would corrupt the serialization.)


=== Instance properties ===

These properties are available on serializer instances; some of them are
defined on the prototype for serializer instances.
Note: External values/types have to be defined in the same order for
serializing as for unserializing; otherwise unserializing will not work.

.defineObject(obj)
  Defines the given object or symbol as an external value which the
  serialized data can reference.

.defineStandardType(obj)
  If obj is a key in the standard type list, then defines obj.prototype as
  an external type with the standard implementation function.

.defineType(obj,fn)
  Given obj which is a prototype object and fn which is the implementation
  function, defines an external type, using the given function for
  serializing and unserializing objects with the given prototype.

.internalVersion
  The internal version number.

.serialize(stream,value)
  Serialize the given value using the given stream function.

.unserialize(stream)
  Unserialize using the given stream function, and return the value that
  has been read.

.userVersion
  The user version number.


=== External types ===

An implementation of an external type is a function that takes three
arguments. The first argument is the context (either a reading context or
a writing context) to use. The second argument is the object to write; in
the case of reading, it is an empty object with the correct prototype,
which you might or might not use. The third argument is the set function.

The set function does nothing during writing. During reading, it replaces
the empty object that was created with the object passed as its argument;
your function also needs to return that same object. If the object being
read is not the same object your function was given, you must call the set
function before returning, and before calling the .key, .value, or
.properties methods of the context object.

The implementation function returns the object read/written.
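To make the contract concrete, here is a minimal sketch of two implementation functions for hypothetical classes Point and Money. Point reuses the empty object it is given; Money instances are frozen, so reading builds a new instance and calls set. The makeMockWriter/makeMockReader helpers are invented stand-ins, not the module's ReadingContext/WritingContext: they only provide .reading, .writing, .float64 and .int32, pushing to and shifting from an array. The symmetric style in pointType assumes these methods return the value read/written and ignore their argument when reading.

```javascript
// Hypothetical classes, used only for illustration.
class Point {
  constructor(x, y) { this.x = x; this.y = y; }
}
class Money {
  constructor(cents) { this.cents = cents; Object.freeze(this); }
}

// Reuses the empty object (which already has Point.prototype).
// Assumes float64 writes its argument when writing, ignores it when reading,
// and returns the value in both cases, so one code path serves both directions.
function pointType(ctx, obj, set) {
  obj.x = ctx.float64(obj.x);
  obj.y = ctx.float64(obj.y);
  return obj;
}

// Money instances are frozen, so reading must create a new object and
// pass it to set() before returning it.
function moneyType(ctx, obj, set) {
  if (ctx.writing) {
    ctx.int32(obj.cents);
    return obj;
  }
  const fresh = new Money(ctx.int32());
  set(fresh); // replace the placeholder object with the real one
  return fresh;
}

// Hypothetical stand-ins for writing/reading contexts (NOT the module's API).
function makeMockWriter() {
  const log = [];
  const put = (x) => { log.push(x); return x; };
  return { reading: false, writing: true, log, float64: put, int32: put };
}
function makeMockReader(log) {
  let i = 0;
  const take = () => log[i++];
  return { reading: true, writing: false, float64: take, int32: take };
}

// Round trip through the mocks.
const w = makeMockWriter();
pointType(w, new Point(1.5, -2), () => {});
moneyType(w, new Money(250), () => {});
const r = makeMockReader(w.log);
const p = pointType(r, Object.create(Point.prototype), () => {});
const m = moneyType(r, Object.create(Money.prototype), () => {});
```

With the real module, such functions would be registered through .defineType, passing Point.prototype or Money.prototype as obj.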


=== Stream functions ===

When specifying the stream function for serialize/unserialize, you can
also specify a number or a non-callable object.

If a number is given, it is assumed to be a file descriptor number.

When serializing, a non-callable object is treated as a Node.js writable
stream, and its write method is called.

When unserializing, a non-callable object is treated as a Buffer,
ArrayBuffer, or typed array, which is read starting at the beginning.

The stream function takes one argument, which is a Buffer instance. When
serializing, it should write out the contents of that buffer; when
unserializing, it should read data into that buffer, filling its full
size.
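A minimal sketch of a pair of stream functions following this contract: one collects written chunks in memory, the other serves reads from a Buffer, always filling the requested buffer completely. The names makeCollectingStream and makeBufferReader are hypothetical, and copying each written chunk is a precaution based on the assumption that the caller might reuse its buffer. Passing a Buffer directly to unserialize already gives reading behavior like the second function, so this mainly serves as a template for custom data sources.

```javascript
// A writing stream function that collects every chunk it is handed.
// Chunks are copied in case the caller reuses its buffer (an assumption).
function makeCollectingStream(chunks) {
  return (buf) => {
    chunks.push(Buffer.from(buf));
  };
}

// A reading stream function that fills each requested buffer completely
// from `source`, continuing where the previous call stopped.
function makeBufferReader(source) {
  let pos = 0;
  return (buf) => {
    if (pos + buf.length > source.length) {
      throw new RangeError("unexpected end of serialized data");
    }
    source.copy(buf, 0, pos, pos + buf.length);
    pos += buf.length;
  };
}

// Simulate a serializer writing two chunks, then reading them back in
// differently sized pieces.
const chunks = [];
const write = makeCollectingStream(chunks);
write(Buffer.from([1, 2, 3]));
write(Buffer.from([4, 5]));
const data = Buffer.concat(chunks);

const read = makeBufferReader(data);
const first = Buffer.alloc(4);
read(first); // first now holds 1, 2, 3, 4
const second = Buffer.alloc(1);
read(second); // second now holds 5
```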

The return value of a stream function is irrelevant and is not used. (If
you explicitly provide your own stream function, it is possible to use its
return value for something in an external type implementation; I do not
see how that can be useful, but maybe you have a use for it.)


=== Static properties ===

The exported constructor itself has the following properties:

.ReadingContext(owner,stream)
  A function that is the constructor for a reading context, where owner is
  a serializer instance and stream is a reading stream function.

.WritingContext(owner,stream,root)
  A function that is the constructor for a writing context, where owner is
  a serializer instance, stream is a writing stream function, and root is
  the root value being serialized.

.prototype
  The prototype for serializers.

.standardTypes
  A WeakMap of standard types. The keys are functions which have a
  property called "prototype" designating the object which is the
  prototype for this standard type, and the values are functions
  which are called to implement this standard type. (It does include
  RegExp even though that is not in the list of automatics.)


=== Reading/writing contexts ===

The following properties exist on reading/writing context instances or on
the prototype for them. Such instances will be passed as the first
argument to a function for implementing external types.

Most methods return the value read/written; most ignore their argument
when used with a reading context. There are some special cases.

.buffer(buf)
  Read/write the given buffer (a Node.js Buffer instance) and return the
  buffer. For reading only, buf can instead be a number giving how many
  bytes to read; in that case a new buffer is returned.

.float32(x)
  Read/write a 32-bit floating point number.

.float64(x)
  Read/write a 64-bit floating point number.

.int16(x)
  Read/write a signed 16-bit integer.

.int32(x)
  Read/write a signed 32-bit integer.

.int8(x)
  Read/write a signed 8-bit integer.

.integer(x)
  Read/write a signed 32-bit integer using a variable-length
  representation. If most of the numbers are small but some are large,
  this results in a smaller file than using int32.

.key(x)
  Read/write a key, which is any string or symbol. It keeps track of keys
  used previously so that later uses of them are shorter, and it uses a
  special encoding for nonnegative integer keys. If you write int8(0) and
  then try to read it with key(), you will get null as the result (a zero
  byte ends a property list).

.owner
  The serializer that this context belongs to.

.properties(obj,omitkeys)
  Read/write the properties of obj (which must be specified even for
  reading), excluding those listed in omitkeys (ignored during reading).
  If specified, omitkeys is an object whose own keys (its prototype is
  ignored) name properties that should not be written, probably because
  they were already written by an external type implementation function.
  Returns obj (the object whose properties are read/written).

.queue(fn)
  Enqueue a function to be executed after the main value is finished. You
  can also enqueue from inside an enqueued function; the new function then
  executes after all other enqueued functions are finished. This is used
  internally to serialize weak sets/maps, but you can also use it in your
  own external type implementations. Enqueued functions are called with no
  arguments.
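The ordering rule for .queue can be modeled with a plain first-in, first-out queue. DeferredQueue and drain are hypothetical names for this sketch; it only illustrates the order in which enqueued functions run, not how the module attaches the queue to a context.

```javascript
// Sketch of the ordering described for .queue (not the module's code):
// functions run first-in, first-out after the main value is finished, and
// anything enqueued while draining runs after everything already queued.
class DeferredQueue {
  constructor() {
    this.pending = [];
  }
  queue(fn) {
    this.pending.push(fn);
  }
  // Called once the main value has been read or written.
  drain() {
    while (this.pending.length > 0) {
      const fn = this.pending.shift();
      fn(); // enqueued functions receive no arguments
    }
  }
}

const order = [];
const q = new DeferredQueue();
q.queue(() => {
  order.push("a");
  q.queue(() => order.push("c")); // enqueued during draining
});
q.queue(() => order.push("b"));
order.push("main value");
q.drain();
// order is ["main value", "a", "b", "c"]
```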

.reading
  True for reading contexts, or false for writing contexts.

.root
  Only for writing contexts; it is the root value being serialized.

.stream(buf)
  The stream function.

.string(x)
  Read/write a string. It does not have to be a valid Unicode text; any
  sequence of 16-bit characters can be used.

.uint16(x)
  Read/write an unsigned 16-bit integer.

.uint32(x)
  Read/write an unsigned 32-bit integer.

.uint8(x)
  Read/write an unsigned 8-bit integer.

.value(data)
  Serialize or unserialize any value. (Note: the encoding differs from
  that of methods such as .integer or .string; .value uses a different
  header than the other methods, some of which use no header at all.)

.writing
  True for writing contexts, or false for reading contexts.


=== File format ===

For an exact specification of the file format you must look at the
program; I am sorry if this document is incomplete or incorrect.

The file starts with a header of two little-endian 16-bit numbers: first
the internal version number and then the user version number. Immediately
after this header is the value being serialized.
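The header can be sketched with Node's Buffer methods. writeHeader and checkHeader are hypothetical helper names, not exports of this module; the error message and the returned offset are choices made for this sketch.

```javascript
// Sketch of the 4-byte header: internal version (unsigned 16-bit) then
// user version (signed 16-bit), both little-endian.
function writeHeader(internalVersion, userVersion) {
  const buf = Buffer.alloc(4);
  buf.writeUInt16LE(internalVersion, 0); // throws RangeError outside 0..65535
  buf.writeInt16LE(userVersion, 2); // throws RangeError outside -32768..32767
  return buf;
}

// Reads the header back and rejects data written with other versions,
// mirroring the "it is an error if it does not match" rule.
function checkHeader(buf, internalVersion, userVersion) {
  const gotInternal = buf.readUInt16LE(0);
  const gotUser = buf.readInt16LE(2);
  if (gotInternal !== internalVersion || gotUser !== userVersion) {
    throw new Error(
      `version mismatch: got ${gotInternal}/${gotUser}, ` +
        `expected ${internalVersion}/${userVersion}`
    );
  }
  return 4; // offset where the serialized value begins
}

// With the defaults (internal 0, user -1) the header is 00 00 FF FF.
const header = writeHeader(0, -1);
```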

A value is stored as a mode byte, possibly followed by other data
depending on the contents of the mode byte. The mode byte is split into
two nybbles. The high nybble can be:

[0] Short value
  No data follows. Low nybble specifies exact value:
  0 = undefined
  1 = null
  2 = false
  3 = true
  4 = +0
  5 = NaN
  6 = ""
  7 = -0
  8 = +Infinity
  9 = -Infinity
  10 = A new empty array (saved)
  11 = +1
  12 = +2
  13 = +3
  14 = A new symbol (saved)
  15 = -1
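Decoding this table is a direct lookup. In this sketch, encodeShort and decodeShort are hypothetical names, and the saved argument stands in for the list of saved values. Object.is is used so that +0 and -0 are told apart and NaN matches itself. Creating the new symbol without a description is an assumption.

```javascript
// Constant values for low nybbles 0-15 (high nybble 0). Slots 10 and 14
// create new saved objects and are handled separately.
const SHORT_CONSTANTS = new Map([
  [0, undefined], [1, null], [2, false], [3, true], [4, 0], [5, NaN],
  [6, ""], [7, -0], [8, Infinity], [9, -Infinity], [11, 1], [12, 2],
  [13, 3], [15, -1],
]);

// Returns the mode byte for a constant, or -1 if it has no short form.
// (New arrays and symbols use 10 and 14 but must also be saved, so a
// writer would handle them elsewhere.)
function encodeShort(v) {
  for (const [nybble, c] of SHORT_CONSTANTS) {
    if (Object.is(v, c)) return nybble;
  }
  return -1;
}

function decodeShort(modeByte, saved) {
  const low = modeByte & 0x0f;
  if (low === 10) {
    const a = []; // a new empty array, recorded as a saved value
    saved.push(a);
    return a;
  }
  if (low === 14) {
    const s = Symbol(); // a new symbol, recorded as a saved value
    saved.push(s);
    return s;
  }
  return SHORT_CONSTANTS.get(low);
}
```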

[1] Object (not using external types)
  The low nybble specifies what prototype should be used:
  0 = null
  1 = Use a value that follows
  2 = Use default prototype (Object.prototype)
  3-15 = An external value
  If 1, then another value follows before the property list.
  If 3-15, then the low nybble minus 3 is the external value number modulo
  13 (0-12), and a varint follows which is 31 less than the quotient
  (rounded down).
  After any extra bytes needed to define the prototype, the property list
  follows (described below). The new object is saved before reading
  anything else (including the prototype value if applicable).
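The arithmetic for an external-value prototype can be sketched as follows. encodePrototypeRef and decodePrototypeRef are hypothetical names; the sketch produces the mode byte and the integer that would then be varint-encoded, but not the varint bytes themselves.

```javascript
// Sketch of the prototype reference for mode bytes 0x13-0x1F:
// low nybble = 3 + (n mod 13), followed by a varint equal to floor(n / 13) - 31.
function encodePrototypeRef(n) {
  return { modeByte: 0x10 | (3 + (n % 13)), varint: Math.floor(n / 13) - 31 };
}

function decodePrototypeRef(modeByte, varint) {
  const low = modeByte & 0x0f;
  if (low < 3) throw new RangeError("prototype is not an external value");
  return (varint + 31) * 13 + (low - 3);
}

// External value 40: 40 = 3 * 13 + 1, so low nybble 4 and varint 3 - 31 = -28.
const ref = encodePrototypeRef(40);
```

Assuming the varint encoding described further below, varint values from -32 to 31 fit in one byte, so any external value number below 819 needs only one byte after the mode byte here.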

[2] External value
  An external value, identified by a typeid.

[3] Saved value
  Access a previously saved object or symbol which has been created during
  the unserialization. Identified by a typeid, where 0 means the first
  saved value, 1 is the second saved value, and so on.

[4] String of 8-bit characters
  A typeid which is one less than the number of characters, followed by
  the characters as one byte each.

[5] String of 16-bit characters
  A typeid which is one less than the number of characters, followed by
  the characters, each an unsigned little-endian 16-bit number.

[6] Signed 12-bit integer
  A signed 12-bit integer in big-endian format. The low nybble and the
  next byte together form the number.

[7] Signed 20-bit integer
  A signed 20-bit integer in big-endian format. The low nybble and the
  next two bytes together form the number.

[8] Signed 32-bit integer
  The low nybble is always zero. Followed by a signed 32-bit integer in
  big-endian format. If the low nybble isn't zero, reading a value in an
  external type implementation throws the value of the mode byte; this
  applies to both major types 8 and 9.

[9] Floating point number
  The low nybble is always zero. Followed by a 64-bit floating point
  number in big-endian format.
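A sketch of encoding and decoding these numeric forms follows. encodeInt and decodeNumber are hypothetical names; choosing the smallest form that fits, and throwing a RangeError rather than the raw mode byte, are assumptions of the sketch rather than documented behavior. A writer would presumably try the short values (+0, -0, +1, +2, +3, -1) before any of these.

```javascript
// Sketch of modes 6-8. encodeInt picks the smallest of the 12-, 20- and
// 32-bit forms (an assumption about how a writer chooses). Note that -0
// would come out as 0 here; it needs short value 7 instead.
function encodeInt(v) {
  if (!Number.isInteger(v) || v < -0x80000000 || v > 0x7fffffff) {
    throw new RangeError("not a signed 32-bit integer");
  }
  if (v >= -0x800 && v <= 0x7ff) {
    return Uint8Array.of(0x60 | ((v >> 8) & 0x0f), v & 0xff);
  }
  if (v >= -0x80000 && v <= 0x7ffff) {
    return Uint8Array.of(0x70 | ((v >> 16) & 0x0f), (v >> 8) & 0xff, v & 0xff);
  }
  const out = new Uint8Array(5);
  out[0] = 0x80; // low nybble must be zero
  new DataView(out.buffer).setInt32(1, v, false); // big-endian
  return out;
}

// Decodes modes 6-9 starting at bytes[pos]; returns the value and next offset.
function decodeNumber(bytes, pos) {
  const mode = bytes[pos];
  const high = mode >> 4;
  const low = mode & 0x0f;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  switch (high) {
    case 6: { // low nybble + 1 byte, sign-extended from bit 11
      const v = (low << 8) | bytes[pos + 1];
      return { value: v & 0x800 ? v - 0x1000 : v, next: pos + 2 };
    }
    case 7: { // low nybble + 2 bytes, sign-extended from bit 19
      const v = (low << 16) | (bytes[pos + 1] << 8) | bytes[pos + 2];
      return { value: v & 0x80000 ? v - 0x100000 : v, next: pos + 3 };
    }
    case 8:
    case 9:
      // The module reportedly throws the mode byte itself; a RangeError is used here.
      if (low !== 0) throw new RangeError(`bad mode byte 0x${mode.toString(16)}`);
      return high === 8
        ? { value: view.getInt32(pos + 1, false), next: pos + 5 }
        : { value: view.getFloat64(pos + 1, false), next: pos + 9 };
    default:
      throw new RangeError("not a numeric mode byte");
  }
}
```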

[10-15] Objects with external types
  The external type is identified by a typeid: the external type number is
  the typeid multiplied by six, plus the high nybble of the mode byte, minus
  ten. The data that follows depends on the definition of the external
  type.

A typeid consists of the low nybble of the mode byte and may be followed
by additional bytes. It is always an unsigned integer. If bit3 of the mode
byte is clear, no additional bytes follow and the value is the low 3 bits
of the mode byte. If bit3 is set, there are one or three more bytes (one
if bit2 is clear, three if bit2 is set). With one extra byte, the value is
the low 2 bits of the mode byte multiplied by 256, plus the extra byte,
plus 8. With three extra bytes, the low 2 bits of the mode byte and the
three extra bytes form a big-endian 26-bit unsigned integer.
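The typeid rules can be sketched as an encoder/decoder pair, together with two of their users: the string modes 4 and 5 and the external-type modes 10-15. All function names are hypothetical; the sketch assumes the writer picks the shortest typeid form and treats characters up to U+00FF as 8-bit.

```javascript
// Sketch of typeid encoding: `high` is the mode byte's high nybble and
// `id` the unsigned typeid. The shortest form is chosen (an assumption).
function encodeTypeid(high, id) {
  const h = (high & 0x0f) << 4;
  if (id < 8) return [h | id]; // bit3 clear: 0-7
  if (id < 1032) { // bit3 set, bit2 clear: 8-1031
    const v = id - 8;
    return [h | 0x08 | (v >> 8), v & 0xff];
  }
  if (id < 2 ** 26) { // bit3 and bit2 set: 26-bit big-endian
    return [h | 0x0c | (id >>> 24), (id >>> 16) & 0xff, (id >>> 8) & 0xff, id & 0xff];
  }
  throw new RangeError("typeid out of range");
}

function decodeTypeid(bytes, pos) {
  const mode = bytes[pos];
  if (!(mode & 0x08)) return { id: mode & 0x07, next: pos + 1 };
  if (!(mode & 0x04)) return { id: (mode & 0x03) * 256 + bytes[pos + 1] + 8, next: pos + 2 };
  const id = (mode & 0x03) * 2 ** 24 + bytes[pos + 1] * 2 ** 16 + bytes[pos + 2] * 256 + bytes[pos + 3];
  return { id, next: pos + 4 };
}

// Modes 4 and 5: the typeid is the length minus one, so "" uses short value 6.
function encodeStringValue(s) {
  if (s.length === 0) return [0x06];
  let wide = false;
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) wide = true;
  }
  const out = encodeTypeid(wide ? 5 : 4, s.length - 1);
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (wide) out.push(c & 0xff, c >> 8); // little-endian 16-bit
    else out.push(c);
  }
  return out;
}

// Modes 10-15: external type number n = typeid * 6 + high - 10.
function encodeExternalTypeMode(n) {
  return encodeTypeid(10 + (n % 6), Math.floor(n / 6));
}
function decodeExternalTypeNumber(bytes, pos) {
  const high = bytes[pos] >> 4;
  const { id, next } = decodeTypeid(bytes, pos);
  return { typeNumber: id * 6 + high - 10, next };
}
```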

A varint represents any signed 32-bit integer. If bit7 of the first byte
is set, the actual value is the bitwise complement of the rest of the
encoded value. Bit6 and bit5 give the size of the remaining data, and the
low 5 bits are the low 5 bits of the encoded number. Bit6 and bit5 of the
first byte are interpreted as follows:

[00]
  No more bytes follow.

[01]
  One byte follows, giving eight more bits (bit12-bit5) of the resulting
  number.

[10]
  Two bytes follow, forming a big-endian 16-bit number; multiply it by 32
  and add it to the low 5 bits.

[11]
  Three or four bytes follow. If the high bit of the first following byte
  is set, there are only three bytes; otherwise there are four. In either
  case they hold the remaining higher bits of the number in big-endian
  order, but the high bit of the first following byte is not part of the
  number.
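A sketch of this varint as an encoder/decoder pair follows, assuming values are plain JavaScript numbers in the signed 32-bit range and that the writer picks the shortest form. The function names are hypothetical, and the sketch follows the prose above rather than the program itself.

```javascript
// Sketch of the varint described above (function names are hypothetical).
function encodeVarint(v) {
  if (!Number.isInteger(v) || v < -0x80000000 || v > 0x7fffffff) {
    throw new RangeError("not a signed 32-bit integer");
  }
  const sign = v < 0 ? 0x80 : 0x00;
  const m = v < 0 ? ~v : v; // non-negative encoded value, below 2 ** 31
  const low = m & 0x1f; // low 5 bits go in the first byte
  const hi = Math.floor(m / 32); // remaining higher bits
  if (hi === 0) return [sign | low]; // [00]
  if (hi < 0x100) return [sign | 0x20 | low, hi]; // [01]
  if (hi < 0x10000) return [sign | 0x40 | low, hi >> 8, hi & 0xff]; // [10]
  if (hi < 0x800000) { // [11], three bytes, high bit of the first one set
    return [sign | 0x60 | low, 0x80 | (hi >> 16), (hi >> 8) & 0xff, hi & 0xff];
  }
  // [11], four bytes; hi < 2 ** 26, so the first byte's high bit is clear
  return [sign | 0x60 | low, hi >>> 24, (hi >> 16) & 0xff, (hi >> 8) & 0xff, hi & 0xff];
}

function decodeVarint(bytes, pos) {
  const first = bytes[pos];
  let m = first & 0x1f;
  let next = pos + 1;
  switch ((first >> 5) & 0x03) {
    case 1:
      m += bytes[next] * 32;
      next += 1;
      break;
    case 2:
      m += ((bytes[next] << 8) | bytes[next + 1]) * 32;
      next += 2;
      break;
    case 3:
      if (bytes[next] & 0x80) {
        m += (((bytes[next] & 0x7f) << 16) | (bytes[next + 1] << 8) | bytes[next + 2]) * 32;
        next += 3;
      } else {
        m += ((bytes[next] << 24) | (bytes[next + 1] << 16) | (bytes[next + 2] << 8) |
          bytes[next + 3]) * 32;
        next += 4;
      }
      break;
    default: // [00]: nothing follows
      break;
  }
  return { value: first & 0x80 ? ~m : m, next };
}
```

Because the sign bit stores a complement rather than a negation, -1 encodes as the single byte 0x80, and every value from -32 to 31 fits in one byte.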

A varstring consists of a varint followed by the data. If the number is
non-negative, it is the length of the string in 8-bit characters. If the
number is negative, its bitwise complement is the length of the string in
little-endian 16-bit characters.
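The length rule can be sketched as follows; the first function returns the integer that would be written as a varint together with the payload bytes. varstringParts and readVarstringPayload are hypothetical names, and choosing the 8-bit form whenever every code unit is at most 0xFF is an assumption. The sketch works on 16-bit code units, so lone surrogates survive, in line with the .string description.

```javascript
// Sketch: split a string into the number that is varint-encoded and the
// payload bytes that follow it. The varint bytes themselves are not produced here.
function varstringParts(s) {
  let wide = false;
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) wide = true;
  }
  if (!wide) {
    const payload = new Uint8Array(s.length);
    for (let i = 0; i < s.length; i++) payload[i] = s.charCodeAt(i);
    return { lengthField: s.length, payload };
  }
  const payload = new Uint8Array(2 * s.length);
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    payload[2 * i] = c & 0xff; // little-endian: low byte first
    payload[2 * i + 1] = c >> 8;
  }
  return { lengthField: ~s.length, payload }; // negative marks 16-bit characters
}

// Inverse: given the decoded varint and the bytes after it.
function readVarstringPayload(lengthField, bytes, pos) {
  let s = "";
  if (lengthField >= 0) {
    for (let i = 0; i < lengthField; i++) s += String.fromCharCode(bytes[pos + i]);
    return { value: s, next: pos + lengthField };
  }
  const count = ~lengthField;
  for (let i = 0; i < count; i++) {
    s += String.fromCharCode(bytes[pos + 2 * i] | (bytes[pos + 2 * i + 1] << 8));
  }
  return { value: s, next: pos + 2 * count };
}
```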

A property list consists of pairs of keys and values, terminated by a zero
byte. A key is encoded according to its first byte (in hex), as follows:

[01-7F] Existing keys
  Access one of the first 127 existing keys.

[80-D7] Short numeric
  Make a numeric key 0 to 87.

[D8] Long string
  Followed by a varstring which is the key. It is saved in the list of
  existing keys if its length is nonzero.

[D9] New symbol
  Make a new symbol and save it in the list of existing values (not in the
  list of existing keys).

[DA] Existing symbol
  Followed by a varint which is 31 less than an existing value number;
  this value is expected to be a symbol (not an object).

[DB] Long numeric
  Followed by a varint. The numeric key is 128 more than that number.

[DC] Symbol from external value
  Followed by a varint which is 31 less than an external value number;
  this value needs to be a symbol (not an object).

[DD-DF] Long existing keys
  Followed by a varint. The zero-based existing key number is
  (id-0xDD)+3*(varint+31), where id is the key byte.

[E0-FF] Short string
  Make a string of 1 to 32 characters (the key byte minus 0xDF gives the
  length) and store it in the list of existing keys unless its length is
  1. The string consists of 8-bit characters only.
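A sketch of how a reader might dispatch on the first byte of a key follows. classifyKeyByte and longExistingKeyIndex are hypothetical names. Mapping 0x01 to the first existing key (index 0) is an assumption, since the list above does not say whether those indices are zero-based; bytes that take a trailing varint or varstring only report what follows and do not read it.

```javascript
// Sketch: classify the first byte of a property-list key.
function classifyKeyByte(b) {
  if (b === 0x00) return { kind: "end" }; // terminates the property list
  if (b <= 0x7f) return { kind: "existingKey", index: b - 1 }; // assumption: 0x01 is the first key
  if (b <= 0xd7) return { kind: "numeric", key: b - 0x80 }; // keys 0-87
  switch (b) {
    case 0xd8: return { kind: "longString", follows: "varstring" };
    case 0xd9: return { kind: "newSymbol" };
    case 0xda: return { kind: "savedSymbol", follows: "varint" }; // saved value number - 31
    case 0xdb: return { kind: "longNumeric", follows: "varint" }; // key = varint + 128
    case 0xdc: return { kind: "externalSymbol", follows: "varint" }; // external value number - 31
    case 0xdd:
    case 0xde:
    case 0xdf: return { kind: "longExistingKey", follows: "varint", residue: b - 0xdd };
    default: return { kind: "shortString", length: b - 0xdf }; // 0xE0-0xFF: 1-32 8-bit chars
  }
}

// Zero-based index into the existing-key list for the 0xDD-0xDF forms.
function longExistingKeyIndex(b, varint) {
  return (b - 0xdd) + 3 * (varint + 31);
}
```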