1.0.0 • Published 8 years ago

xschema v1.0.0

Weekly downloads
1
License
Unlicense
Repository
github
Last release
8 years ago

xschema

The xschema library is a high performance data processing and validation library based on an extensible model/schema builder. It allows to build a schema that can be then used to process and validate any kind of JavaScript data (the root variable can be object, array, or any other primitive type). The library is designed for critical areas where the performance is important and even a minor overhead at validation side can cause service delays. The data validation and processing has been moved into extreme by using a JavaScript code-generator that generates the best possible data processing and validating functions for any user-defined schema. The most used JavaScript engines today have built-in JIT compiler so the code generated by xschema is then compiled by the VM into a machine code that will execute very fast and outperform all JavaScript data processing libraries that don't use such technique.

The performance is not the only aspect and feature offered by xschema. The library has been designed in a way that it should be very straightforward to define a schema, to reuse or inherit the existing one, and to create your own types that will extend the built-in functionality. There is no library that could satisfy all possible needs and use-cases so the possibility to extend the library is important. The library provides the most important types as full-featured built-ins.

The schema structure is always declarative and most of the schemas can be serialized back to JSON (xschema calls it a normalized JSON). The library also allows to associate a custom information called metadata with any field. Metadata is completely ignored by xschema library, but other tools can take advantage of it (for example you can associate a SQL table names with your entities and use them in your DB layer).

Additionally, xschema has several data processing options that help to deal with common problems like implementing data insertion, updating, deletion, and querying. Processing options can also be used to filter out objects' properties that are not defined (useful when extracting information from request's body or from more objects mixed together) and to accumulate all validation errors to have complete report of the validation.

Disclaimer

The xschema library has been designed to solve common, but also very specific problems. It's very fast and the support for metadata allows to simply extend it by new features. All built-in features are used in production and you will find many of them handy when implementing web services that do CRUD operations, because a single schema can be used to validate data that is inserted, updated, queried, or deleted. The library has been designed to be very fast, but is also very complete and configurable.

Introduction

the xschema library uses a declarative approach to build schemas, but it comes with its own syntax instead of relying on existing solutions like JSONSchema. The main reason for such move was to simplify the way schemas are defined by introducing shortcuts and directives that start with $ character. Shortcuts are used to simplify declaration of the most common concepts (for example an array of integers can be written as $type: "int[]") and directives are used to configure the type itself. Object's members are always defined without a $ prefix, but it's possible to define also members that start with $ by escaping it as \\$ (escaping and schema normalization is explained later).

var PersonSchema = xschema.schema({
  firstName  : { $type: "text", $maxLength: 64 },
  lastName   : { $type: "text", $maxLength: 64 },
  dateOfBirth: { $type: "date", $leapYear: false },

  active     : { $type: "bool" },
  score      : { $type: "int", $min: 0 },
  keywords   : { $type: "text[]" },
  bashrc     : { $type: "string", $maxLength: 4096 },

  address: {
    line1    : { $type: "text" },
    line2    : { $type: "text" },
    city     : { $type: "text" },
    zip      : { $type: "text" },
    country  : { $type: "text" }
  }
});

The example above defines a schema called PersonSchema, which is an object holding properties of various types specified by $type directive. Careful readers have noticed that the root object and nested address object have omitted the $type directive. It automatically uses object if no $type is provided, which allows to remove some verbosity in the schema declaration. Other directives like $min, $max, $leapYear, ..., are used to configure the type itself.

Confused by string vs text type? Well, string is any string in JavaScript in contrast to text, which is a string that doesn't contain \u0000-\u0008, \u000B-\u000C, and \u000E-\u001F characters. These characters have special meaning and in many cases their presence in your application's data is unwanted and may be dangerous.

Confused by [] suffix in keywords member? It's a xschema shortcut that defines an array, which can also be defined by using array type like this:

var KeywordsSchema = xschema.schema({
  $type: "array",
  $data: {
    $type: "text"
  }
});

var KeywordsSchema = xschema.schema({
  $type: "text[]"
});

Both schemas defined above are equivalent and internally normalized into the same structure.

Data Processing Concepts

The library comes with two base concepts that are used to work with data.

  • xschema.process(...) is a concept used to create a new data based on existing data. It's very useful in cases that more entities are mixed together in a single object and you need to separate/extract their content into independent objects. This happens for example in a request-body object. Data processing does not just validate the input data, but it can also sanity it before creating the output. If configured, you can trim/simplify input text, remove unknown properties, or insert fields having default values if they are not present.

  • xschema.test(...) is a concept used to test whether the given data conforms to the schema, but without using sanitizers.

Data Processing Options

Several data processing options exist that allow to use a single schema for multiple purposes. The default xschema.kNoOptions specifies no options and the schema is processed in a default way (i.e. strict mode).

The additional options are used to control:

  • Extraction Mode - Options xschema.kExtractTop, xschema.kExtractNested, and xschema.kExtractTop are used to control data extraction. Data extraction means extracting only specified properties from objects that can contain more properties that are not defined in the schema. It's useful when extracting parameters from a request body or to extract data that is known by schema, but without failing in cases that there is something that is not not known.

  • Delta Mode - Option xschema.kDeltaMode can be used to force nearly all properties to be optional. This is used in cases that you allow delta updates, but you still require some properties that specify DB keys to be present.

  • Error Accumulation - Option xschema.kAccumulateErrors is used in case that you want to get all errors that happened during data processing, but just the first one.

Data extraction options:

  • xschema.kExtractTop - Extract from top-level object only.
  • xschema.kExtractNested - Extract from nested object(s) only.
  • xschema.kExtractAll - Combination of xschema.kExtractTop and xschema.kExtractNested, which results in extraction from any "object".

TODO

Built-In Data Types

The following data types are built-in:

Type and AliasesJS TypeDescription
anyanyAny type (variant)
bool, booleanbooleanBoolean
double, numbernumberDouble precision floating point number
numericnumberDouble precision, used to distinguish from number when specifying a DB columns
int8number8-bit signed integer
uint8number8-bit unsigned integer
int16, shortnumber16-bit signed integer
uint16, ushortnumber16-bit unsigned integer
int32number32-bit signed integer
uint32number32-bit unsigned integer
int, integernumber53-bit signed integer (safe integer)
uintnumber53-bit unsigned integer (safe integer)
lat, latitudenumberLatitude value (-90...90)
lon, longitudenumberLongitude value (-180...180)
charstringString containing exactly 1 character
stringstringAny string
textstringA restricted multi-line string
textlinestringA restricted single-line string
bigintstringA string that contains a 64-bit integer
colorstringA color values specified by hash "#RRGGBB" or a CSS name
macstringMAC address
ipstringIPV4 or IPV6 address
ipv4stringIPV4 address
ipv6stringIPV6 address
datestringDate
datetimestringDate and time without milliseconds
datetime-msstringDate and time with milliseconds (ms)
datetime-usstringDate and time with microseconds (μs)
mapobjectMap type
objectobjectObject type, default if $type is not specified
arrayarrayArray type

Any

Any $type is specified as any.

Any type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$allowedany[]nullArray of values that are allowed. Any type allows to put anything into the $allowed array. If an array or object is put in there a deep comparison will be performed to verify if the input data conforms to it

Boolean

Boolean $type is specified as bool or boolean.

Boolean type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$allowedbool[]nullArray of boolean values that are allowed. This is useful to restrict the value to be always true or false, but it does nothing if the array is empty or both true and false values are specified.

Number and Integer

Number type $type is specified by the following type names and properties:

Type and AliasesMinimum ValueMaximum ValueDescription
double, numberNoneNoneDouble precision floating point
numericNoneNoneNumeric value (alias to double, but can be used to distinguish between double and numeric in case of describing DB schema)
int8-1281278-bit signed integer
uint802558-bit unsigned integer
int16, short-327683276716-bit signed integer
uint16, ushort06553516-bit unsigned integer
int32-2147483648214748364732-bit signed integer
uint320429496729532-bit unsigned integer
int, integer-9007199254740991900719925474099153-bit signed integer, matches Number.isSafeInteger() behavior
uint0900719925474099153-bit unsigned integer, matches Number.isSafeInteger() behavior
lat, latitude-9090Latitude (double precision)
lon, longitude-180180Longitude (double precision)

Number type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$allowednumber[]nullArray of numbers that are allowed. If this directive is used it cancels all directives that specify minimum, maximum, or any other number related constraints
$minnumbernullMinimum value (the number has to be greater or equal than $min)
$maxnumbernullMaximum value (the number has to be lesser or equal than $max)
$minExclusivenumbernullMinimum value is exclusive
$maxExclusivenumbernullMaximum value is exclusive
$divisibleBynumbernullThe number has to be divisible by this value (without a remainder)

Character

Character $type is specified as char and it's a string that has length equal to one (that is, one character long string).

Character type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the char can be an empty string
$allowedchar[]nullArray of characters that are allowed

String and Text

String $type is specified as string, text, or textline. If type string is specified any JavaScript string passes, however, if text type is specified the validator only passes if the string doesn't contain \u0000-\u0008, \u000B-\u000C, and \u000E-\u001F characters. Use text to disallow these characters that have special meaning and are in most cases unwanted (especially the \u0000 character). The textline type restricts text from using line and paragraph delimiter characters.

String/Text type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptybooltrueSpecifies if the string can be an empty
$allowedstring[]nullArray of strings that are allowed. If this directive is used it cancels all directives related to string length validation, except $empty directive, which always applies, regardless of other constraints
$lengthnumbernullExact string length
$minLengthnumbernullMinimum string length
$maxLengthnumbernullMaximum string length
$reRegExpnullRegular expression

BigInt

BigInt $type is specified as bigint. BigInt is a string that contains only ASCII digits (characters from 0 to 9) and an optional minus sign at the beginning. It allows to validate whether the number represented as a string doesn't overflow 64 bits and also allows to set a possible minimum and maximum value (also as string). BigInt can also be configured to allow more than 64-bits by using $min and $max directives, described below.

BigInt type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the string can be an empty
$allowedstring[]nullArray of strings that are allowed. If this directive is used it cancels $min and $max constraints
$minstringnullMinimum value (as string)
$maxstringnullMaximum value (as string)

Optionally, you can use xschema.misc.isBigInt(s, min, max) function to check whether a string value matches BigInt with optional min and max constraints. This function doesn't require a schema instance.

Color

Color $type is specified as color. Color is a string that matches #RGB, #RRGGBB or color-name format. It supports all color names that are defined by CSS specification and allows to include a dictionary having extra color names that you need to allow. Color names are case-insensitive by default.

Color type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the string can be an empty
$cssNamesbooltrueSpecifies if CSS color names are allowed
$extraNamessetnullA set (dictionary having key: true) that contains extra color names that are allowed

Credit Card

Credit card $type is specified as creditcard. It checks whether the string is a valid credit card number by using a LUHN algorithm. The validator doesn't accept dashes or any other characters used as separators.

Credit card type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the string can be an empty

ISBN

ISBN $type is specified as isbn. It checks whether the string is a valid ISBN number.

ISBN type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the string can be an empty
$formatstring""Specifies the ISBN format to accept. The default value null (or alternatively "") is used to accept any valid ISBN number. To restrict to a particular format use "10" or "13".

MAC Address

MAC address $type is specified as mac. MAC address is a string in form XX:XX:XX:XX:XX:XX that specifies a network MAC address.

MAC address type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the string can be an empty
$separatorchar:Specifies separator used between MAC address components

IP Address

IP address $type is specified as ip, ipv4, or ipv6. IP address is a string specifying a network IP address. The ip type matches both IPV4 and IPV6 addresses, in contrast to ipv4 and ipv6 types, that match IPV4 and IPV6 addresses, respectively.

IP address type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the string can be an empty
$allowPortboolfalseSpecifies id port is also allowed

UUID

UUID $type is specified as uuid. UUID validator is used to check whether the string contains a valid UUID number, that can be optionally surrounded by curly braces.

UUID type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the string can be an empty
$formatstring"rfc"Specifies the UUID format. It can be "rfc" to accept UUIDs in a "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" format, "windows" to accept UUIDs surrounded by curly brackets, or "any" to accept either RFC or WINDOWS format. If not specified or null "rfc" format is used.
$versionstringnullSpecifies the version of UUID to accept. Version is a string from "1" to "5". It can contain an optional "+" sign (like "3+") to accept UUIDs of a particular version and all newer.

DateTime

DateTime $type is specified as date, datetime, datetime-ms, and datetime-us. It's a formatted string that contains date, time, or date+time components. The validator extracts these components and validates whether they are correct. Leap years and leap seconds support is built-in and can be configured through directives.

DateTime defaults:

Date TypeFormatDescription
dateYYYY-MM-DDDate only
datetimeYYYY-MM-DD HH:mm:ssDate+time
datetime-msYYYY-MM-DD HH:mm:ss.SSSDate+time+ms
datetime-usYYYY-MM-DD HH:mm:ss.SSSSSSDate+time+μs

DateTime format options:

Format OptionFixed LengthRangeDescription
Yfalse1-9999Year (1-4 digits)
YYtrue00-99Year (2 digits)
YYYYtrue0001-9999Year (4 digits)
Mfalse1-12Month (1-2 digits)
MMtrue01-12Month (2 digits)
Dfalse1-31Day (1-2 digits)
DDtrue01-31Day (2 digits)
Hfalse0-23Hour (1-2 digits)
HHtrue00-23Hour (2 digits)
mfalse0-59Minute (1-2 digits)
mmtrue00-59Minute (2 digits)
sfalse0-60Second (1-2 digits)
sstrue00-60Second (2 digits)
SSStrue000-999Millisecond (3 digits)
SSSSSStrue000000-999999Microsecond (6 digits)
?trueAny other character requires exact match of that character, for example -, /, ., ,, etc...

DateTime type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$emptyboolfalseSpecifies if the string can be an empty
$formatstringnullSpecifies date+time format, see format options above
$leapYearbooltrueSpecifies whether to allow leap year date
$leapSecondboolfalseSpecifies whether to allow leap second date+time

Map

Map $type is specified as map. It's an object where all keys are strings and all values have the same type specified by $data directive.

Map type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the map can be null
$dataobjectnullSpecifies the schema of all map values.

Object

Object $type is specified as object or can be omitted completely. Object is a special type that allows to specify its members by using unprefixed keys (keys that don't start with "$"). For example the following specifies an object that has mandatory members (keys) "a" and "b":

var Schema = xschema({
  a: { $type: "int" },
  b: { $type: "int", $optional: true }
});

// These will pass.
xschema.test({ a: 1       }, Schema);
xschema.test({ a: 1, b: 2 }, Schema);

Object's directives and members can be mixed in the same definition, for example the following schema defines an object that has a nested object, which is optional:

var Schema = xschema({
  nested: {
    $optional: true,

    a: { $type: "int" },
    b: { $type: "int", $optional: true }
  }
});

// These will pass.
xschema.test({                        }, Schema);
xschema.test({ nested: { a: 1       } }, Schema);
xschema.test({ nested: { a: 1, b: 2 } }, Schema);

TODO

Object type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null

Array

Array $type is specified as array or by [] suffix (shortcut).

Array type directives:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the array can be null
$lengthintnullSpecifies an exact length of the array.
$minLengthintnullSpecifies the minimum length of the array.
$maxLengthintnullSpecifies the maximum length of the array.
$dataobjectnullSpecifies the schema of all array items.

Custom Types

TODO

Global Directives

Global directives can be applied to any $type:

Directive NameValueDefaultDescription
$nullboolfalseSpecifies if the value can be null
$fnfunctionnullA user-defined validation function. It should return true or "" on success and false or "ErrorCode" on failure. The function is always called on a processed object, if posible. For example, if an object is transformed to object containing less members, the user-function will be called with that object, not the input one.