0.5.0 • Published 6 months ago

entrails

Entrails is a smol library to find what you're looking for in arrays of objects.

What is this?

It is designed to have a concise, human-readable query language; an extensible predicate, coercion, and variable system; and no dependencies. It is meant to work on arrays of objects with primitive values, but may be extended to work with whatever value types you need.

Here's an example of a query that one might load from yaml (bringing your own yaml loader):

- filter:
  - date on or before $TODAY
  - date.year on or after 2020
  - state not equals draft
  - or:
    - tags includes poems
    - type in poem, poetry
    - any of authors starts with Matt
    - any backlinks:url starts with poetry
- sort:
  - date desc
  - title asc
- limit: 10

If all that natural-language, string-y stuff looks too imprecise for you, you can also provide your query as the data structure the above would be compiled to internally:

{
	filter: [
		{ get: "date", predicate: "lessThanOrEquals", target: "$TODAY" },
		{ get: ["date", "year"], predicate: "greaterThanOrEquals", target: "2020" },
		{ get: "state", invert: true, predicate: "equals", target: "draft" },
		{ or: [
			{ get: "tags", predicate: "includes", target: "poems" },
			{ get: "type", predicate: "in", target: ["poem", "poetry"] },
			{ coll: "any", get: "authors", predicate: "startsWith", target: "Matt" },
			{ coll: "any", get: "backlinks", getInto: "url", predicate: "startsWith", target: "poetry" }
		]}
	],
	sort: [
		{ get: "date", dir: "desc" },
		{ get: "title", dir: "asc" }
	],
	limit: 10
}

I have felt a need for something like this time and again in my work prototyping interface designs for working with data, and finally decided to coalesce and release the proto-ideas that led to it.

Who this is for

  1. You need a lightweight, extensible query engine that works against modestly-sized in-memory arrays.
  2. You like the yaml/text example above.
  3. You meet the criteria covered by section 2, 3 and 4 of the license.

Who should not use this

  1. You are not dealing with in-memory arrays. Perhaps at some point I'll make a SQL builder from this, but that's not on my roadmap.
  2. You are dealing with very large in-memory arrays and need performance above all else. I haven't done much in the way of performance tuning. I'm not opposed to doing this work but it's not on my roadmap.
  3. You want type safety above all else. This project was extracted from a use case that needed a query language that could work in a textarea, against data of widely varying shape. For this use case, the equivalence checking provided by JavaScript's == operator was sufficient; desired, even.
  4. You are not comfortable with the license or do not fall into the group covered under section 2. If you fall under section 2, 3 and 4 and would like to purchase a license for usage, please contact me.

Alternatives

I produced this work only after evaluating many other options for a lightweight language to query in-memory arrays. I found nothing that could meet my criteria.

  • ArrayQuery -- I tried to make this work, and took some inspiration from it. Ultimately I found it inflexible and unintuitive: the way it handles or logic, collections, and nesting all felt very counterintuitive to me.
  • Query -- provides MongoDB-style filtering capabilities. The audience for my project's use case did not understand this syntax at all, and wanted something more SQL-like. It also doesn't support sorting/limiting.
  • Inquiry -- uses JSONPath for its query language. If you think MongoDB-style queries are confusing, you ain't seen nothin' yet.
  • Crossfilter -- This is more of a database around large arrays, queryable via an API. It's also currently unmaintained. For my use case I would have had to make a DSL for it.
  • Datascript -- More of a database with datalog-style queries. As much as I like the idea of datalog, I find it very verbose in practice and for my use case would have had to retrofit it with many necessary things. Datascript also does not support sorting of materialized results as part of its datalog queries.

Install

This package is ESM only. In Node.js (version 16+), install it with npm:

npm install entrails

API

The default export is entrails. This package exports the following additional identifiers:

entrails(data, query, context)

Perform query on data under context.

Parameters

  • data: (Record<string, any>[]) -- an array containing the objects to sift through
  • query: (InputQuery) -- a record describing the query to perform
  • context: (Context, optional) -- a record containing additional information to evaluate both the query and data

Returns

  • result: (Record<string, any>[]) -- an array of the input objects matching the query criteria

defaultPredicates

An object (Record<string, function>) of the default predicate functions used by the query processor:

  • equals: asserts the leaf value is loosely equal (==) to the target value. Aliases: equal, =
  • exists: asserts the value is not loosely equivalent (==) to null. Aliases: exist
  • greaterthan: asserts the leaf value is > the target value. Aliases: after, greater than, gt, >
  • greaterthanorequals: asserts the leaf value is >= the target value. Aliases: on or after, greater than or equals, greater than or equal to, gte, >=
  • in: asserts the leaf value is included in the target value. Nominally, this assumes the target value is an Array; however, it may also be used to test that the leaf is a substring of a target string. No aliases.
  • includes: asserts that the leaf value is either a collection type that contains the target value, or a string of which the target is a substring. Aliases: contains, include
  • lessthan: asserts that the leaf value is < the target value. Aliases: before, less than, lt, <
  • lessthanorequal: asserts that the leaf value is <= the target value. Aliases: on or before, less than or equals, less than or equal to, lte, <=
  • startsWith: asserts that the leaf value is a string which starts with the target value. Aliases: starts with, starting with
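Read as code, each default predicate is a plain function of the leaf value and the target. The sketch below is a hypothetical approximation of a few of them, written only to show the loose (==) semantics described above. It is not the library's source, the object name sketchPredicates is made up, and edge cases may differ from entrails.

```javascript
// Hypothetical approximations of some default predicates, for illustration.
// Each receives (leaf, target) and returns a boolean.
const sketchPredicates = {
  // Loose equality: the number 2020 equals the string "2020".
  equals: (leaf, target) => leaf == target,
  // Anything other than null or undefined exists; no target is needed.
  exists: (leaf) => leaf != null,
  greaterthan: (leaf, target) => leaf > target,
  lessthan: (leaf, target) => leaf < target,
  // Leaf is a member of an Array target, or a substring of a string target.
  in: (leaf, target) =>
    Array.isArray(target)
      ? target.some((item) => item == leaf)
      : typeof target === "string" && target.includes(String(leaf)),
  // Leaf is an Array containing the target, or a string containing it.
  // (This sketch only treats Arrays as collections.)
  includes: (leaf, target) =>
    Array.isArray(leaf)
      ? leaf.some((item) => item == target)
      : typeof leaf === "string" && leaf.includes(String(target)),
  startsWith: (leaf, target) =>
    typeof leaf === "string" && leaf.startsWith(String(target)),
};
```

Loose equality is why a target parsed from text, such as "2020", can still match a numeric leaf value of 2020.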

defaultCollectionPredicates

An object (Record<string, function>) of the default collection predicate functions used by the query processor:

  • any: at least one item in the collection passes the given leaf predicate. Aliases: some
  • all: all items in the collection pass the given leaf predicate. Aliases: every
  • none: no items in the collection pass the given leaf predicate. An inversion of any. Aliases: no
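The three collection predicates map naturally onto Array methods. The sketch below is a hypothetical approximation using the (collection, checkFn) argument order described under Context; toItems loosely mirrors the handling documented for coll under FilterPredicate (Arrays as-is, Iterators via .values(), Objects via Object.values()). Both names are made up for illustration, and returning no items for non-collections is an assumption.

```javascript
// Hypothetical helper: turn the value at get into an array of items.
function toItems(value) {
  if (Array.isArray(value)) return value;
  // Set, Map, and other objects exposing a values() method.
  if (value != null && typeof value.values === "function") return Array.from(value.values());
  // Plain objects contribute their own enumerable values.
  if (value != null && typeof value === "object") return Object.values(value);
  return []; // assumption: non-collections yield no items
}

// Hypothetical approximations of any/all/none.
const sketchCollectionPredicates = {
  any: (collection, checkFn) => collection.some((item) => checkFn(item)),
  all: (collection, checkFn) => collection.every((item) => checkFn(item)),
  // "none" is the inversion of "any".
  none: (collection, checkFn) => !collection.some((item) => checkFn(item)),
};
```

Like Array.prototype.every, all is vacuously true for an empty collection in this sketch; whether entrails behaves the same is not stated above.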

makeFilterLineMatcher(context)

This function is exported for testing and UI feedback purposes.

Parameters

Returns

parseStringFilterLine(line, context)

This function is exported for testing and UI feedback purposes.

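Parameters

  • line: (FilterStringClause) -- value to parse
  • context: (Context, optional) -- supplies the separators and available predicates used while parsing

Returns

  • result: (FilterPredicate) -- the predicate parsed from line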

parseStringSortLine(line, context)

This function is exported for testing and UI feedback purposes.

Parameters

  • line: (SortStringClause) -- value to parse
  • context: (Context, optional) -- supplies the separator; if none is provided, the default separator, ., is used

Returns

  • result: (SortClause) -- the clause parsed from line

Definitions

InputQuery

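InputQuery is widely variable, but is an object with the following keys and their values:

  • filter: (Array<FilterPredicate|OrClause|AndClause|FilterStringClause>, optional) -- the criteria an object must meet to be included in the result
  • sort: (Array<SortClause|SortStringClause>, optional) -- sort clauses, applied in order
  • limit: (number, optional) -- the maximum number of results to return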

Context

Provides additional information when used in evaluating the query. It is an object with the following keys:

  • collectionPredicates: (Record<string, function>) -- custom collection predicate functions, which will be merged over the default collection predicates. These are in the vein of all, any, and none. Functions receive two arguments, collection and checkFn:

    • collection: the items to test validity of
    • checkFn: a function which will determine if an item successfully matches
  • hints: (Record<string, Hint>, optional) -- guidance on how to treat certain get values. Each key is a get path joined with the separator. If the get value for a predicate or sort clause is, for example, ["author", "name"], then with the default separator . the key should be "author.name".

  • intoSeparator: (string, optional) -- override the default : "into" separator for traversing collection items

  • predicates: (Record<string, function>) -- custom predicate functions, which will be merged over the default predicates. A predicate function takes two arguments: the leaf value, and the target from the query. The string parser expects a target for every predicate except exists and supplies it as a string, so predicate functions should work with string values for the second argument.

  • separator: (string, optional) -- override the default . separator with any string which does not contain whitespace, letters, or numbers.

  • variables: (Record<string, any>, optional) -- values for substitutions in target values when performing the query
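Putting these keys together, a Context might look like the hypothetical sketch below. The endswith predicate, the one collection predicate, the author.name hint, and the TODAY variable are illustrative names, not part of entrails; only the argument orders follow the descriptions above. The sketch assumes collection may arrive as any iterable, so it copies it with Array.from.

```javascript
// A hypothetical Context; none of these names ship with entrails.
const context = {
  predicates: {
    // Leaf value first, query target second; targets from the string
    // parser arrive as strings, so compare as strings.
    endswith: (leaf, target) =>
      typeof leaf === "string" && leaf.endsWith(String(target)),
  },
  collectionPredicates: {
    // Passes when exactly one item in the collection passes checkFn.
    one: (collection, checkFn) =>
      Array.from(collection).filter((item) => checkFn(item)).length === 1,
  },
  hints: {
    // Keyed by the get path joined with the default separator.
    "author.name": { dir: "asc" },
  },
  variables: {
    // Substituted wherever a target is "$TODAY"; YYYY-MM-DD in UTC.
    TODAY: new Date().toISOString().slice(0, 10),
  },
};

// Intended to be passed as the third argument:
// entrails(data, query, context)
```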

Hint

A Hint is an object with the following keys:

  • calls: (Record<string, function>, optional) -- used to provide additional "sub"-leaf values on what is otherwise a leaf value. The provided function is given the leaf value and produces the "sub"-leaf value. See the Calls Example to get a better sense of use cases.

    Note that calls are reached by the same syntax as a normal getter, and they take precedence over any field present in the data itself. For example, the getter "foo.bar" would, with a hint of { foo: { calls: { bar: () => ... }}}, perform the function in the hint rather than use any leaf value found at { foo: { bar: "value" }} in the input data.

  • coerce: (function, optional) -- used to coerce a potentially complex leaf value into a primitive value for predicate and sorting comparison. See the Coerce Example to get a better sense of use cases.

  • dir: (asc|desc, optional) -- the default sort direction to use for this get

Internal Context

It is used by support functions, and is essentially a copy of a given Context with the following additions:

FilterPredicate

This is an object with the following keys and values:

  • get: (string|Array<string>) -- a path of keys on the input objects leading to the value or collection to be evaluated. To traverse nested keys, either provide an Array of keys or join them with the separator (default .) in a single string. If a string is provided and no getInto value is specified, the intoSeparator (default :) is also used to derive a getInto value for traversing further into collection items.

    The following are equivalent with the default separator:

    • "author.name"
    • ["author", "name"]
  • coll: (string) -- a collection predicate, should the value at get be an Array, Iterator (uses .values()), or Object (uses Object.values()). The possible values are currently:

    • "any", "some": at least one member of the collection must pass the given predicate
    • "all", "every": all members of the collection must pass the given predicate
    • "no", "none": no member of the collection may pass the given predicate
  • getInto: (string|Array<string>) -- a path of keys on the input objects to traverse collection items to find the value to be evaluated. For traversing nested keys, one may either provide an Array of keys, or use the intoSeparator in a single string, which defaults to :.

    If getInto is provided and no coll is provided, the collection predicate all will be specified automatically.

  • invert: (boolean, optional) -- if true, performs a boolean NOT on the leaf predicate, specified below

  • predicate: (string) -- the test a leaf value must pass for this predicate to be considered successful. The leaf value is either the end result of traversing get or, when coll is provided and that value is a collection, each individual collection item. Must match the name of an available predicate: one of the default predicates, or one supplied through the Context's predicates.

  • target: (any|Array<any>, optional) -- the value against which the leaf values will be compared using predicate. Currently, the only default predicate with special handling for an array target is in; every other predicate compares directly against the provided target value. This is only optional for the exists predicate.

    If target is a string value that starts with a $ character and whose subsequent word characters match a key in the Context's variables collection, the matching variable is substituted when the query is performed.
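The shorthand rules above (splitting on the separator, deriving getInto from the intoSeparator, the implied all, and $ variable substitution) can be sketched as a single normalization step. normalizePredicate is a hypothetical helper, not something entrails exports; it assumes a $ target is substituted only when everything after the $ is word characters naming a key in variables.

```javascript
// Hypothetical helper: expand FilterPredicate shorthand into explicit form.
function normalizePredicate(pred, context = {}) {
  const separator = context.separator ?? ".";
  const intoSeparator = context.intoSeparator ?? ":";
  const variables = context.variables ?? {};
  const out = { ...pred };

  if (typeof out.get === "string") {
    let path = out.get;
    // Only derive getInto from the string when none was given explicitly.
    if (out.getInto === undefined && path.includes(intoSeparator)) {
      const [head, ...rest] = path.split(intoSeparator);
      path = head;
      out.getInto = rest;
    }
    out.get = path.split(separator);
  }
  if (typeof out.getInto === "string") {
    out.getInto = out.getInto.split(intoSeparator);
  }
  // getInto without coll implies the "all" collection predicate.
  if (out.getInto !== undefined && out.coll === undefined) {
    out.coll = "all";
  }
  // "$NAME" is replaced when NAME is a key in the Context's variables.
  if (typeof out.target === "string") {
    const match = /^\$(\w+)$/.exec(out.target);
    if (match && Object.prototype.hasOwnProperty.call(variables, match[1])) {
      out.target = variables[match[1]];
    }
  }
  return out;
}
```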

OrClause

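This is an object with a single key:

  • or: (Array<FilterPredicate|OrClause|AndClause|FilterStringClause>) -- the clause passes if at least one of its members passes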

AndClause

This is an object with a single key:

FilterStringClause

This is a string that will be parsed into a FilterPredicate by parseStringFilterLine. It is intended to read somewhat like natural English, for example:

  • date exists: {get: "date", predicate: "exists"}
  • date.year is greater than 2020: {get: ["date", "year"], predicate: "is greater than", target: "2020"}
  • any of tags equal poetry: {coll: "any", get: "tags", predicate: "equal", target: "poetry"}
  • type in poem, poetry: {get: "type", predicate: "in", target: ["poem", "poetry"]}
  • status not equal to draft: {get: "status", invert: true, predicate: "equal to", target: "draft"}
  • author.name equals 'Matthew Lyon': {get: ["author","name"], predicate: "equals", target: "Matthew Lyon"}
  • any authors:name starts with Matt: {coll: "any", get: ["authors"], getInto: ["name"], predicate: "starts with", target: "Matt"}

The string is parsed into the following whitespace-separated segments of a FilterPredicate, in this order:

  • coll, optional; may optionally include of to help with readability
  • get and optionally getInto
  • not or !, maps to invert: true, optional
  • predicate, must match one of the available predicate values verbatim, though case-insensitively
  • target value, parsed to encompass quoted values first and then separate values by commas. Stray whitespace is trimmed.
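A minimal sketch of the target-segment rule, assuming single or double quotes. It returns a bare string when only one value is present and an array otherwise, which matches the examples above but is an assumption about parseStringFilterLine. parseTarget is a hypothetical name, and the real parser may differ in edge cases.

```javascript
// Hypothetical sketch: quoted values are taken whole first, the remaining
// text is split on commas, and stray whitespace is trimmed.
function parseTarget(segment) {
  // Optional leading whitespace, then a '...' value, a "..." value,
  // or a run of non-comma characters.
  const pattern = /\s*(?:'([^']*)'|"([^"]*)"|([^,]+))/g;
  const values = [];
  for (const m of segment.matchAll(pattern)) {
    if (m[1] !== undefined) values.push(m[1]);
    else if (m[2] !== undefined) values.push(m[2]);
    else {
      const bare = m[3].trim();
      // Skip whitespace-only fragments such as the gap in "'a' , b".
      if (bare !== "") values.push(bare);
    }
  }
  return values.length === 1 ? values[0] : values;
}
```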

SortClause

This is an object with the following keys and values:

  • get: (string|Array<string>) -- a path of keys on the input object to find the value to be compared for sorting. Works exactly the same as get for FilterPredicate.

  • dir: (asc|desc, case-insensitive, optional) -- defaults to asc or a direction provided for the getter in the Context's hints.
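Applying several SortClauses amounts to a comparator that tries each clause in order until one distinguishes the two objects. The sketch below is hypothetical: sortByClauses is not an entrails export, it only accepts string get paths split on the default separator, it ignores coerce and calls hints, and it compares values with < and > rather than whatever entrails uses internally.

```javascript
// Hypothetical sketch of multi-clause sorting with hint-provided defaults.
function sortByClauses(items, clauses, hints = {}) {
  // Follow a dotted path such as "author.name", stopping at null/undefined.
  const read = (obj, path) =>
    path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), obj);

  const compare = (a, b) => {
    for (const { get, dir } of clauses) {
      // An explicit dir wins, then the hint for this get, then "asc".
      const direction = String(dir ?? hints[get]?.dir ?? "asc").toLowerCase();
      const sign = direction === "desc" ? -1 : 1;
      const x = read(a, get);
      const y = read(b, get);
      if (x < y) return -sign;
      if (x > y) return sign;
    }
    return 0; // every clause tied
  };

  // Copy first so the caller's array is not reordered.
  return [...items].sort(compare);
}
```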

SortStringClause

This is a string that will be parsed into a SortClause by parseStringSortLine. The parsing is much more rudimentary than for FilterStringClause: it currently trims surrounding whitespace, splits the string on whitespace, and assigns the first value to get and the second value to dir.
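The rule just described is small enough to sketch directly. parseSortLine is a hypothetical stand-in for parseStringSortLine: it leaves get as an unsplit string and takes no Context, whereas the real function accepts a Context and may also split get on the separator.

```javascript
// Hypothetical sketch: trim, split on whitespace, first token is get,
// second token (if any) is dir. dir is left as written, since SortClause
// treats it case-insensitively.
function parseSortLine(line) {
  const [get, dir] = line.trim().split(/\s+/);
  return dir === undefined ? { get } : { get, dir };
}
```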

Examples

Example: Coerce

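import assert from "node:assert"
import entrails from "entrails"

const input = [
  {idx: 0, date: new Date(Date.parse('2025-01-01'))},
  {idx: 1, date: new Date(Date.parse('2025-01-02'))},
]

const query = {filter: [{get: "date", predicate: "equals", target: "2025-01-01"}]}

const context = {
  hints: {
    // Date has no toISODate(); keep the YYYY-MM-DD part of the ISO string
    date: { coerce: d => d?.toISOString().slice(0, 10) }
  }
}

// Arrays must be compared by value, not by reference
assert.deepStrictEqual(entrails(input, query, context), [input[0]])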

The coerce function on the date getter transforms the Date leaf values into strings for easier comparison. It can also be used for sorting.

The coerce function is given a second argument: the root-level object on which this leaf function is being evaluated. This allows for additional context, default values, or synthesizing fields:

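import assert from "node:assert"
import entrails from "entrails"

const input = [
  { id: 0, scores: [1, 10, 11] },
  { id: 1, scores: [20, 13, 2] },
  { id: 2, scores: [50, 0, -3] }
]

const query = {
  filter: [{get: "totalScore", predicate: ">=", target: 30}],
  sort: [{get: "totalScore", dir: "desc"}]
}

const context = {
  hints: {
    // totalScore is not a field on the data; coerce synthesizes it from the
    // root object passed as the second argument (totals: 22, 35, 47)
    totalScore: { coerce: (_, { scores }) => scores.reduce((sum, n) => sum + n, 0) }
  }
}

assert.deepStrictEqual(entrails(input, query, context), [input[2], input[1]])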

Example: Calls

import assert from "node:assert"
import entrails from "entrails"

const input = [
  {idx: 0, name: "Alice"},
  {idx: 1, name: "Bob"},
  {idx: 2, name: "Carol"},
  {idx: 3, name: "Dave"},
  {idx: 4, name: "Eve"},
]

const query = {
  sort: [{get: "name.len", dir: "desc"}, {get: "name", dir: "desc"}],
}
// Sort by name length, descending; then name value, descending

const context = {
  hints: {
    name: { calls: { len: s => s.length } }
  }
}

assert.deepStrictEqual(
  entrails(input, query, context),
  [input[2], input[0], input[3], input[4], input[1]]
)
// Carol, Alice, Dave, Eve, Bob

The calls object provides additional leaves on complex primitives. This example may be simplistic, but one could easily provide a whole set of sub-leaf values: for a Date, for example, one could provide year, month, dayofmonth, weekday, or more, and these are only evaluated on demand.
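As a hedged sketch of the Date case, the hint below exposes the sub-leaf names mentioned above. The dateHints name and the choice of UTC getters (so results do not depend on the local time zone) are assumptions, not part of entrails.

```javascript
// Hypothetical hint for a "date" field holding Date objects.
const dateHints = {
  date: {
    calls: {
      year: (d) => d.getUTCFullYear(),
      month: (d) => d.getUTCMonth() + 1, // 1-12 rather than JavaScript's 0-11
      dayofmonth: (d) => d.getUTCDate(),
      weekday: (d) => d.getUTCDay(), // 0 = Sunday
    },
  },
};

// With context = { hints: dateHints }, a getter such as "date.year" would run
// the matching function on demand, for example:
// { filter: [{ get: "date.year", predicate: "equals", target: "2025" }] }
```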

Example: Traversing Collections

import assert from "node:assert"
import entrails from "entrails"

const input = [
  { authors: [{name: "Alice"}, {name: "Bob"}] },
  { authors: [{name: "Carol"}] },
  { authors: [{name: "Alice"}, {name: "Eve"}] },
]
const query = {
  filter: [{coll: "any", get: "authors:name", predicate: "equals", target: "Alice" }]
}

assert.deepStrictEqual(entrails(input, query), [input[0], input[2]])

Here, the default intoSeparator, :, is used in the get specifier to indicate that authors is the collection to process with any, and that name is the value to traverse to in each collection item before comparing with equals. This filter predicate may also be written as:

  • { coll: "any", get: "authors", getInto: "name", predicate: "equals", target: "Alice" }
  • "any authors:name equals Alice"
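For comparison, the selection performed by that filter predicate can be written out in plain JavaScript. This is only a sketch of the semantics (at least one author's name loosely equal to the target), not how entrails evaluates queries.

```javascript
const input = [
  { authors: [{ name: "Alice" }, { name: "Bob" }] },
  { authors: [{ name: "Carol" }] },
  { authors: [{ name: "Alice" }, { name: "Eve" }] },
];

// coll "any" becomes .some(); getInto "name" reads author.name;
// equals uses loose equality (==).
const matches = input.filter((item) =>
  (item.authors ?? []).some((author) => author?.name == "Alice")
);
```

Swapping .some() for .every() would correspond to coll "all", and negating .some() to coll "none".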

Contributing

I am currently licensing entrails to a commercial organization, and as such I am not comfortable accepting contributions without a contributor agreement. I am not currently set up to gather or store these, so I do not consider the project ready for contributions at this time.

TODO

  • top-level calls/hints

  • grouping

  • calls in hints, grouping
  • sort comparators in hints
  • scopes: default included and fragments

  • aggregates

  • window processing

  • fallback/default values for gets; either through hints or directly in the query

  • hints to cast numbers? doesn't seem to matter with localeCompare / weak equality
  • contributor agreement / code of conduct

Security

Security of any query language or DSL depends on many things, primarily on how it is used. Entrails is designed to filter, sort, and otherwise organize information in a read-only manner from an in-memory store provided at call time. It has a rudimentary variable system for call-time interpretation of variables. This is not like parameterized queries in database languages; rather, it is meant for the calling context to provide dynamic values that the query may use, such as the current date.

That said, entrails should not modify the data it is sifting through, nor provide direct access to potentially side-effecting functions on the objects it queries. If you believe you have discovered a bug of this nature, please, depending on perceived severity, file an issue or contact me. I am also available via Signal; please email for instructions.

License

ACSL © Matthew Lyon
