Tiny-tree NPM | npm.io

TinyTree

TinyTree is a zero-dependency library that efficiently searches for uniquely-identified items in a range. It includes two implementations with different performance characteristics. ArrayTree provides extremely fast access for data that rarely changes. BTree provides slightly slower access but performs well with highly volatile data.

Both trees expose the same API, so you can use them interchangeably.

Motivation

Let's say you have a list of customers, and you want to find everyone with id between 320 and 400. If you had your customers in a database, you might write a query like SELECT * from customers where id >=320 and id <= 400;. If you wanted this query to run fast, you'd create an index on the id field.

TinyTree provides this kind of indexed lookup for JavaScript objects.

One way to do this is to create an array that is sorted by id. Then you can use binary search to find quickly find the items you want. This is roughly what ArrayTree does, but it includes a number of methods to make these queries more convenient to write and provides methods for modifying the list.

Sorted arrays perform extremely well for searches, but they are expensive to update, especially if you are inserting items in the middle. BTree provides a performant alternative for volatile data based on the B-Tree data structure.

Usage

Creating a new tree

// Node.js
const tinyTree = require("tiny-tree"); // or import {ArrayTree, BTree} from "tiny-tree" if you're using ES modules
const bTree = new tinyTree.BTree();
const arrayTree = new tinyTree.ArrayTree();

<!--Browser-->
<script type="text/javascript" src="tiny-tree.js"></script>
<script type="text/javascript">
  const bTree = new tinyTree.BTree();
  const arrayTree = new tinyTree.ArrayTree();
</script>

Options

const tree = new BTree({degree: 17});

BTree can create trees of any degree; by default, trees are of degree 15, which provides reasonable performance most of the time. You can change this by passing options to the constructor:

degree (integer; default 15; min 3)—determines how many items are stored at each node. See the performance section below for guidance on setting the degree.

ArrayTree does not take any options.

Adding/replacing entries

tree.set(key, value);

Keys must be strings or numbers. They are compared with javascript's default comparison operators (<, >, =, etc).

Values can be any data type.

Bulk loading items

let mySortedData = [
  ["key-a", "value-for-key-a"],
  ["key-b", "value-for-key-b"],
  ["key-c", "value-for-key-c"],
  ["key-d", "value-for-key-d"],
];

tree.bulkLoad(mySortedData);

You can get substantially better loading time by bulk loading a sorted list of entries. Bulk loading is only allowed on empty trees.

The data must be a sorted array of key/value arrays as shown in the example.

For BTrees, bulk loading also leads to better query performance (roughly 50%) and memory usage.

Removing entries

tree.delete(key);

Removing all entries

tree.clear();

Getting a single value by key or index

tree.get(key);
// returns value for the given key or `undefined` if no value has been set for this key

tree.getByIndex(4); // get the fifth entry from the sorted entries

Both functions return undefined if the index or key is not in the tree.

Getting all entries in order

// get all entries
tree.toArray();
// output [["key-a", "value-a"], ["key-b", "value-b"], ["key-c", "value-c"], ...]

// get all values (no keys)
tree.toArray(null, true);
// output ["value-a", "value-b", "value-c", ...]

Note that the first argument to toArray is an optional bounds (see below).

Getting values only is substantially faster than getting keys and values together. See the Performance section for more information.

Getting a range of entries based on key

// get all entries where key >= "key-a" and key < "key-c"
let bounds = {min: "key-a", max: "key-c"};
tree.toArray(bounds);
// output [["key-a", "value-a"], ["key-b", "value-b"]]

// get just values where key >= "key-a" and key < "key-c"
let bounds = {min: "key-a", max: "key-c"};
tree.toArray(bounds, true);
// output ["value-a", "value-b"]

You can limit your results to just the keys in a certain range. BTrees and ArrayTrees are especially well-suited to this kind of query.

tree.toArray(bounds optional object, valuesOnly optional boolean)—get key/value pairs or values based on bounds.

When setting bounds, min is inclusive and max is exclusive. If you need to change this behavior, set minExclusive and/or maxInclusive instead. All min/max properties are optional. It is equivalent to set min/minInclusive and max/maxExclusive.

Examples:

{min: 100, max: 200}: 100 ≤ key < 200
{minInclusive: 100, max: 200}: 100 ≤ key < 200
{minExclusive: 100, max: 200}: 100 < key < 200
{min: 100, maxInclusive: 200}: 100 ≤ key ≤ 200
{min: 100}: 100 ≤ key
{max: 200}: 200 < key

Getting a range of entries based on index (sort position)

// get the top two entries
tree.toArrayByIndex(0, 2, true);
// output ["value-a", "value-b"]

// get the bottom two entries
tree.toArrayByIndex(tree.size - 3, 2, true);
// output ["value-y", "value-z"]

tree.toArrayByIndex(start integer, count integer, valuesOnly optional boolean)

This works very much like toArray, but instead of querying by key, you are querying by sort position. This is useful for determining the top N or bottom N entries.

Size and other statistics

tree.size; // get total number of values stored

tree.getStats(); // get detailed information about the BTree

If you are trying to tune your tree for performance, detailed statistics might be helpful.

For BTree, getStats returns an object with the following fields:

degree—the BTree's degree, as set in the constructor.
size—the total number of values stored. This is the same value as tree.size.
depth—the depth of the tree structure. Deeper trees can take longer to traverse. In general, trees with higher degrees will be shallower.
fillFactor—the proportion of keys that are non-empty. A fillFactor of 1 means that all keys are filled and there is no empty space. For a large tree with random data, values of 0.6 to 0.7 are typical.
nodes—the total number of nodes in the tree. In general, trees with higher degrees will need fewer nodes.
saturationFactor—the proportion of nodes that are completely filled. Adding keys to a saturated node will cause the tree to rebalance. Conversely, trees with a low saturation factor can accommodate new entries with higher performance. In general, trees of higher degree have a lower saturation factor.
keySlots—the total number of keys that the tree can store with its current node configuration.
filledKeySlots—the number of these slots that are filled. Note that fillFactor = filledKeySlots / keySlots.
saturatedNodes—the number of saturated nodes. Note that saturationFactor = saturatedNodes / nodes.

For ArrayTree, getStats is included mostly for compatibility and returns an object with the following field:

size—the total number of values stored. This is the same value as tree.size.

Performance

Performance is highly dependent on your data and access patterns, but there are some general rules that can help.

For gets and queries, ArrayTree is always faster than BTree, in some cases by more than 50x. If your data doesn't change much, you should use ArrayTree.
If your data changes a lot, you should use BTree. If you have more than 50,000 items or so with lots of adds and removes, ArrayTree may be unusably slow.
For queries, you should use valuesOnly = true if possible. For ArrayTree, this provides an enormous performance benefit (often 10x). For BTree, the performance boost is modest (typically around 5%).
For both ArrayTree and BTree, bulk loads are always faster and provide equal or better query performance. If you can bulk load your data, you should.

BTree Benchmarks

For BTrees, degree affects performance in a complicated way that depends on how you access data (especially the number of items you query at a time). Up to a point, higher-degree trees have slower inserts/deletes but faster lookups. The following table also illustrates the substantial benefits of bulk loading.

On a 2017 15" MacBook Pro, here are operations per second for typical operations.

Test	Degree 3	Degree 7	Degree 15	Degree 31	Degree 63
Bulk load 10,000 items	142 ops/s	232 ops/s	216 ops/s	159 ops/s	91 ops/s
Fill factor	99.9%	99.8%	99.7%	99.5%	99.0%
Depth	9	5	4	3	3
Get 1000 items by key	2,329 ops/s	2,855 ops/s	2,469 ops/s	1,887 ops/s	985 ops/s
Get 1000 items by index	5,317	8,659	11,038	12,504	9,649
Query 1000 ranges of size 100 by key	212	336	352	327	237
Query 1000 ranges of size 100 by index	259	446	521	465	427
Query 1000 ranges of size 1000 by key	28	50	61	60	64
Query 1000 ranges of size 1000 by index	29	51	64	72	75
Load 10,000 items in random order	82 ops/s	150 ops/s	162 ops/s	133 ops/s
Fill factor	67.0%	68.3%	69.4%	69.2%	69.5%
Depth	11	6	4	3	3
Get 1000 items by key	2,289 ops/s	2,646 ops/s	2,392 ops/s	1,749 ops/s	1,131 ops/s
Get 1000 items by index	3,932	7,443	9,708	10,981	8,889
Query 1000 ranges of size 100 by key	132 ops/s	233 ops/s	274 ops/s	273 ops/s	218 ops/s
Query 1000 ranges of size 100 by index	148	284	363	410	357
Query 1000 ranges of size 1000 by key	17	33	43	50	51
Query 1000 ranges of size 1000 by index	17	34	45	53	58
Bulk load 1,000,000 items	0.57 ops/s	0.74 ops/s	0.72 ops/s	0.57 ops/s	0.42 ops/s
Fill factor	100%	100%	100%	100%	100%
Depth	13	8	6	5	4
Get 1000 items by key	329 ops/s	420 ops/s	305 ops/s	186 ops/s	114 ops/s
Get 1000 items by index	1,694	2,518	2,875	2,620	2,092
Query 1000 ranges of size 100 by key	49	84	86	69	46
Query 1000 ranges of size 100 by index	61	146	229	342	294
Query 1000 ranges of size 1000 by key	7.1	16	23	27	22
Query 1000 ranges of size 1000 by index	7.5	18	29	37	41
Load 1,000,000 items in random order	0.17 ops/s	0.25 ops/s	0.25 ops/s	0.21 ops/s	0.15 ops/s
Fill factor	67.0%	68.1%	68.6%	69.2%	69.4%
Depth	16	9	6	5	4
Get 1000 items by key	310 ops/s	386 ops/s	354 ops/s	226 ops/s	147 ops/s
Get 1000 items by index	614	1,542	1,998	2,271	2,364
Query 1000 ranges of size 100 by key	22	48	68	66	51
Query 1000 ranges of size 100 by index	24	60	116	171	221
Query 1000 ranges of size 1000 by key	2.7	6.7	13	16	20
Query 1000 ranges of size 1000 by index	2.8	6.9	14	22	29

BTree vs ArrayTree benchmarks

ArrayTree provides very close to the theoretical best-case performance for queries. If your data changes rarely, you can get an enormous boost in performance by using it. Also, if your data is fairly small (less than 10,000 items), you should consider using it even if your data is volatile, provided that queries are significantly more common than updates.

ArrayTree is especially efficient at getting contiguous arrays of values, where it achieves nearly the same performance as raw array access.

Test	BTree (degree 15)	ArrayTree	Speedup (slowdown)
Bulk load 10k items, add/remove 5k items	71 ops/s	37 ops/s	(1.9x)
Get 1000 items by key	2,100	3,700	1.8x
Get 1000 items by index	9,800	540,000	55x
Query 1000 ranges of size 100 by key	300	1,500	5x
Query 1000 ranges of size 100 by index	450	12,000	34x
Query 1000 ranges of size 1000 by key	55	730	13x
Query 1000 ranges of size 1000 by index	59	1,400	24x
Bulk load 100k items, add/remove 50k items	3.0 ops/s	0.07 ops/s	(42x)
Get 1000 items by key	880	1,600	1.8x
Get 1000 items by index	4,700	300,000	64x
Query 1000 ranges of size 100 by key	130	430	3.3x
Query 1000 ranges of size 100 by index	340	1,900	5.6x
Query 1000 ranges of size 1000 by key	31	160	5.2x
Query 1000 ranges of size 1000 by index	41	580	14x
Bulk load 1M items, add/remove 500k items	0.16 ops/s	Did not finish	-
Get 1000 items by key	305	n/a	-
Get 1000 items by index	2,069	n/a	-
Query 1000 ranges of size 100 by key	71	n/a	-
Query 1000 ranges of size 100 by index	150	n/a	-
Query 1000 ranges of size 1000 by key	17	n/a	-
Query 1000 ranges of size 1000 by index	20	n/a	-

b-tree btree binary tree binary-tree search-tree tree data-structure balanced tree index

@infinitebrahmanuniverse/nolb-tiny-t @everything-registry/sub-chunk-2949

4 years ago

4 years ago

4 years ago

6 years ago

6 years ago

6 years ago