0.3.0 • Published 3 years ago

tiny-tree v0.3.0

Weekly downloads
2
License
MIT
Repository
github
Last release
3 years ago

TinyTree

TinyTree is a zero-dependency library that efficiently searches for uniquely-identified items in a range. It includes two implementations with different performance characteristics. ArrayTree provides extremely fast access for data that rarely changes. BTree provides slightly slower access but performs well with highly volatile data.

Both trees expose the same API, so you can use them interchangeably.

Motivation

Let's say you have a list of customers, and you want to find everyone with id between 320 and 400. If you had your customers in a database, you might write a query like SELECT * from customers where id >=320 and id <= 400;. If you wanted this query to run fast, you'd create an index on the id field.

TinyTree provides this kind of indexed lookup for JavaScript objects.

One way to do this is to create an array that is sorted by id. Then you can use binary search to find quickly find the items you want. This is roughly what ArrayTree does, but it includes a number of methods to make these queries more convenient to write and provides methods for modifying the list.

Sorted arrays perform extremely well for searches, but they are expensive to update, especially if you are inserting items in the middle. BTree provides a performant alternative for volatile data based on the B-Tree data structure.

Usage

Creating a new tree

// Node.js
const tinyTree = require("tiny-tree"); // or import {ArrayTree, BTree} from "tiny-tree" if you're using ES modules
const bTree = new tinyTree.BTree();
const arrayTree = new tinyTree.ArrayTree();
<!--Browser-->
<script type="text/javascript" src="tiny-tree.js"></script>
<script type="text/javascript">
  const bTree = new tinyTree.BTree();
  const arrayTree = new tinyTree.ArrayTree();
</script>
Options
const tree = new BTree({degree: 17});

BTree can create trees of any degree; by default, trees are of degree 15, which provides reasonable performance most of the time. You can change this by passing options to the constructor:

  • degree (integer; default 15; min 3)—determines how many items are stored at each node. See the performance section below for guidance on setting the degree.

ArrayTree does not take any options.

Adding/replacing entries

tree.set(key, value);

Keys must be strings or numbers. They are compared with javascript's default comparison operators (<, >, =, etc).

Values can be any data type.

Bulk loading items

let mySortedData = [
  ["key-a", "value-for-key-a"],
  ["key-b", "value-for-key-b"],
  ["key-c", "value-for-key-c"],
  ["key-d", "value-for-key-d"],
];

tree.bulkLoad(mySortedData);

You can get substantially better loading time by bulk loading a sorted list of entries. Bulk loading is only allowed on empty trees.

The data must be a sorted array of key/value arrays as shown in the example.

For BTrees, bulk loading also leads to better query performance (roughly 50%) and memory usage.

Removing entries

tree.delete(key);

Removing all entries

tree.clear();

Getting a single value by key or index

tree.get(key);
// returns value for the given key or `undefined` if no value has been set for this key
tree.getByIndex(4); // get the fifth entry from the sorted entries

Both functions return undefined if the index or key is not in the tree.

Getting all entries in order

// get all entries
tree.toArray();
// output [["key-a", "value-a"], ["key-b", "value-b"], ["key-c", "value-c"], ...]

// get all values (no keys)
tree.toArray(null, true);
// output ["value-a", "value-b", "value-c", ...]

Note that the first argument to toArray is an optional bounds (see below).

Getting values only is substantially faster than getting keys and values together. See the Performance section for more information.

Getting a range of entries based on key

// get all entries where key >= "key-a" and key < "key-c"
let bounds = {min: "key-a", max: "key-c"};
tree.toArray(bounds);
// output [["key-a", "value-a"], ["key-b", "value-b"]]

// get just values where key >= "key-a" and key < "key-c"
let bounds = {min: "key-a", max: "key-c"};
tree.toArray(bounds, true);
// output ["value-a", "value-b"]

You can limit your results to just the keys in a certain range. BTrees and ArrayTrees are especially well-suited to this kind of query.

tree.toArray(bounds optional object, valuesOnly optional boolean)—get key/value pairs or values based on bounds.

When setting bounds, min is inclusive and max is exclusive. If you need to change this behavior, set minExclusive and/or maxInclusive instead. All min/max properties are optional. It is equivalent to set min/minInclusive and max/maxExclusive.

Examples:

  • {min: 100, max: 200}: 100 ≤ key < 200
  • {minInclusive: 100, max: 200}: 100 ≤ key < 200
  • {minExclusive: 100, max: 200}: 100 < key < 200
  • {min: 100, maxInclusive: 200}: 100 ≤ key ≤ 200
  • {min: 100}: 100 ≤ key
  • {max: 200}: 200 < key

Getting a range of entries based on index (sort position)

// get the top two entries
tree.toArrayByIndex(0, 2, true);
// output ["value-a", "value-b"]

// get the bottom two entries
tree.toArrayByIndex(tree.size - 3, 2, true);
// output ["value-y", "value-z"]

tree.toArrayByIndex(start integer, count integer, valuesOnly optional boolean)

This works very much like toArray, but instead of querying by key, you are querying by sort position. This is useful for determining the top N or bottom N entries.

Size and other statistics

tree.size; // get total number of values stored
tree.getStats(); // get detailed information about the BTree

If you are trying to tune your tree for performance, detailed statistics might be helpful.

For BTree, getStats returns an object with the following fields:

  • degree—the BTree's degree, as set in the constructor.
  • size—the total number of values stored. This is the same value as tree.size.
  • depth—the depth of the tree structure. Deeper trees can take longer to traverse. In general, trees with higher degrees will be shallower.
  • fillFactor—the proportion of keys that are non-empty. A fillFactor of 1 means that all keys are filled and there is no empty space. For a large tree with random data, values of 0.6 to 0.7 are typical.
  • nodes—the total number of nodes in the tree. In general, trees with higher degrees will need fewer nodes.
  • saturationFactor—the proportion of nodes that are completely filled. Adding keys to a saturated node will cause the tree to rebalance. Conversely, trees with a low saturation factor can accommodate new entries with higher performance. In general, trees of higher degree have a lower saturation factor.
  • keySlots—the total number of keys that the tree can store with its current node configuration.
  • filledKeySlots—the number of these slots that are filled. Note that fillFactor = filledKeySlots / keySlots.
  • saturatedNodes—the number of saturated nodes. Note that saturationFactor = saturatedNodes / nodes.

For ArrayTree, getStats is included mostly for compatibility and returns an object with the following field:

  • size—the total number of values stored. This is the same value as tree.size.

Performance

Performance is highly dependent on your data and access patterns, but there are some general rules that can help.

  • For gets and queries, ArrayTree is always faster than BTree, in some cases by more than 50x. If your data doesn't change much, you should use ArrayTree.
  • If your data changes a lot, you should use BTree. If you have more than 50,000 items or so with lots of adds and removes, ArrayTree may be unusably slow.
  • For queries, you should use valuesOnly = true if possible. For ArrayTree, this provides an enormous performance benefit (often 10x). For BTree, the performance boost is modest (typically around 5%).
  • For both ArrayTree and BTree, bulk loads are always faster and provide equal or better query performance. If you can bulk load your data, you should.

BTree Benchmarks

For BTrees, degree affects performance in a complicated way that depends on how you access data (especially the number of items you query at a time). Up to a point, higher-degree trees have slower inserts/deletes but faster lookups. The following table also illustrates the substantial benefits of bulk loading.

On a 2017 15" MacBook Pro, here are operations per second for typical operations.

TestDegree 3Degree 7Degree 15Degree 31Degree 63
Bulk load 10,000 items142 ops/s232 ops/s216 ops/s159 ops/s91 ops/s
Fill factor99.9%99.8%99.7%99.5%99.0%
Depth95433
Get 1000 items by key2,329 ops/s2,855 ops/s2,469 ops/s1,887 ops/s985 ops/s
Get 1000 items by index5,3178,65911,03812,5049,649
Query 1000 ranges of size 100 by key212336352327237
Query 1000 ranges of size 100 by index259446521465427
Query 1000 ranges of size 1000 by key2850616064
Query 1000 ranges of size 1000 by index2951647275
Load 10,000 items in random order82 ops/s150 ops/s162 ops/s133 ops/s
Fill factor67.0%68.3%69.4%69.2%69.5%
Depth116433
Get 1000 items by key2,289 ops/s2,646 ops/s2,392 ops/s1,749 ops/s1,131 ops/s
Get 1000 items by index3,9327,4439,70810,9818,889
Query 1000 ranges of size 100 by key132 ops/s233 ops/s274 ops/s273 ops/s218 ops/s
Query 1000 ranges of size 100 by index148284363410357
Query 1000 ranges of size 1000 by key1733435051
Query 1000 ranges of size 1000 by index1734455358
Bulk load 1,000,000 items0.57 ops/s0.74 ops/s0.72 ops/s0.57 ops/s0.42 ops/s
Fill factor100%100%100%100%100%
Depth138654
Get 1000 items by key329 ops/s420 ops/s305 ops/s186 ops/s114 ops/s
Get 1000 items by index1,6942,5182,8752,6202,092
Query 1000 ranges of size 100 by key4984866946
Query 1000 ranges of size 100 by index61146229342294
Query 1000 ranges of size 1000 by key7.116232722
Query 1000 ranges of size 1000 by index7.518293741
Load 1,000,000 items in random order0.17 ops/s0.25 ops/s0.25 ops/s0.21 ops/s0.15 ops/s
Fill factor67.0%68.1%68.6%69.2%69.4%
Depth169654
Get 1000 items by key310 ops/s386 ops/s354 ops/s226 ops/s147 ops/s
Get 1000 items by index6141,5421,9982,2712,364
Query 1000 ranges of size 100 by key2248686651
Query 1000 ranges of size 100 by index2460116171221
Query 1000 ranges of size 1000 by key2.76.7131620
Query 1000 ranges of size 1000 by index2.86.9142229

BTree vs ArrayTree benchmarks

ArrayTree provides very close to the theoretical best-case performance for queries. If your data changes rarely, you can get an enormous boost in performance by using it. Also, if your data is fairly small (less than 10,000 items), you should consider using it even if your data is volatile, provided that queries are significantly more common than updates.

ArrayTree is especially efficient at getting contiguous arrays of values, where it achieves nearly the same performance as raw array access.

TestBTree (degree 15)ArrayTreeSpeedup (slowdown)
Bulk load 10k items, add/remove 5k items71 ops/s37 ops/s(1.9x)
Get 1000 items by key2,1003,7001.8x
Get 1000 items by index9,800540,00055x
Query 1000 ranges of size 100 by key3001,5005x
Query 1000 ranges of size 100 by index45012,00034x
Query 1000 ranges of size 1000 by key5573013x
Query 1000 ranges of size 1000 by index591,40024x
Bulk load 100k items, add/remove 50k items3.0 ops/s0.07 ops/s(42x)
Get 1000 items by key8801,6001.8x
Get 1000 items by index4,700300,00064x
Query 1000 ranges of size 100 by key1304303.3x
Query 1000 ranges of size 100 by index3401,9005.6x
Query 1000 ranges of size 1000 by key311605.2x
Query 1000 ranges of size 1000 by index4158014x
Bulk load 1M items, add/remove 500k items0.16 ops/sDid not finish-
Get 1000 items by key305n/a-
Get 1000 items by index2,069n/a-
Query 1000 ranges of size 100 by key71n/a-
Query 1000 ranges of size 100 by index150n/a-
Query 1000 ranges of size 1000 by key17n/a-
Query 1000 ranges of size 1000 by index20n/a-