@keeex/merkle NPM

@keeex/merkle

Manipulate merkle hash trees. This library creates balanced binary merkle tree with padding. It is possible to extract proof leading to the root of a tree by only providing a subset of the leaves. These proof can later be matched against the expected root without rebuilding the whole tree.

Requirement for development

This package requires the development-tools scripts to be available on the path.

Usage

A tree is initialized with leaves; the type of the leaves is actually free. To handle any kind of input, a data model must be provided. It is this model that takes care of converting data into an exploitable format and perform actual hash computations.

Datamodel

A data model implements conversion between the raw leaves data and a byte buffer. It also provides a way to locate a leaf from a shorthand identifier. Finally, hash computation are also delegated to the data model.

A basic example using JSON as leaves data, number as identifier and SHA-256 for hashing:

import {DataModel} from "@keeex/merkle/types.js";
import {digestImmediate, Algorithms} from "@keeex/crypto/lib/digest.js";
import {jsonStringifyCanonical} from "@keeex/js-utils/lib/dict.js";
import {buf82utf8, utf82buf8} from "@keeex/js-utils/lib/uint8array.js";

export interface RawDataType {
  id: number;
  payload: string;
}

export const dataModel: DataModel<RawDataType, integer> = {
  hash: async (input): Promise<Uint8Array> => digestImmediate(Algorithms.SHA256, input),
  isLeafDataId: (id, leafData) => leafData.id === id,
  leafToInput: leafData => utf82buf8(jsonStringifyCanonical(leafData)),
  inputToLeafdata: input => JSON.parse(buf82utf8(input)) as RawDataType,
  hashAlgorithm: Algorithms.SHA256,
};

It is important that conversion for raw leaf data to input buffer remains stable, so that each call with the same effective input value produces the exact same output.

Creating a tree

Trees are created by calling Tree#create(), providing the raw leaves data and the data model.

import Tree from "@keeex/merkle/tree.js";
import {dataModel, RawDataType} from "./test.js";

const leaves: Array<RawDataType> = [
  {id: 0, payload: "zero"},
  {id: 1, payload: "a"},
  {id: 2, payload: "b"},
];

const tree = await Tree.create(leaves, dataModel);

// Trees can be saved and loaded quicker than they are recreated
const treeData = await tree.save();
const loadedTree = await Tree.load(treeData, dataModel, false);
// Loaded tree integrity can be checked if from unknown source
const loadedTreeChecked = await Tree.load(treeData, dataModel, true);

Checking the integrity of a loaded tree amounts to recomputing all hashes, which can be expensive and can be avoided if the data source is trusted.

There are provisions to provide a different datamodel depending on the hash algorithm used in the saved tree by providing a function instead of an object. Such function receive the hash algorithm name as its first argument and must return either a DataModel instance or a promise resolving with one.

Creating autonomous proof

An autonomous proof is a string that can be combined to any amount of initial raw leaf data and linked back to the root of the tree. The size of an autonomous proof for one leaf is only dependent on the tree height (and indirectly to the number of leaves). Multi-leaves proof's size are more variable.

Creating and checking a proof is done as follow:

// Continuation of tree creation sample

// Single leaf proof
const proof = await tree.getProof(1);
const proofIsValid = await checkProof(leaves[1], dataModel, proof);

// Multi-leaf proof
const proof2 = await tree.getProof([0, 2]);
const proof2IsValid = await checkProof([leaves[0], leaves[1]], dataModel, proof);

Proof encoding

Single leaf proof

When extracting a leaf proof, the encoding follows the following scheme using a mix of js-utils's Marshaller class and strings.

The beginning of the proof, up to the second dollar sign, is expected to be used as reference for the proof/tree. It can be anchored on ledgers and signed.

The whole proof is built this way:

The string 1$ indicate that this is a version 1 proof
The root value of the tree is appended as a base64 string followed by #
The height of the tree is appended as a base 10 string followed by #
The hash algorithm used in the tree is appended (using identifiers from @keeex/crypto)
A $ is appended to mark the beginning of the proof itself
The proof is a binary string and is appended here as a base64 string. The binary content is built using Marshaller:
- The leaf is added using a byte to store the F char code followed by a uint32 indicating the leaf ID (starting at 0)
- For each step above the leaf, a byte is used to hold either L or R, followed by a buffer containing the sibling's node hash value
- The stream is ended by a $ character (in the marshaller)

A single-leaf proof using SHA256 for a tree of height 14 would use:

2 bytes for the version prefix
44+1 bytes for the root hash value
2+1 bytes for the tree height
7 bytes for the hash algorithm
1 byte for the $ separator
4 bytes of header for Marshaller
5 bytes for the leaf reference
37*13 bytes for each intermediate nodes
1 byte for the final marshalled data separator

The Marshaller part is then encoded in base64, which give a final size of Encoded in base64 this lead to an ASCII proof of 714 bytes.

To validate the proof, one would have to provide the initial leaf value (or its hash).

Multi-leaf proof

Multi-leaf proof are able to produce a proof for multiple leaf without disclosing the whole tree. Their format is based on the single-leaf proof format but add more markers. In addition to the L and R markers the l and r markers are added. When present, they are not followed by a node hash but by a subtree encoded the same way as for a single proof (the part between #), ending with a #. The leaf order used when creating and verifying a proof must be the same.

merkle hash tree

@keeex/tslib @keeex/utils

5 months ago

8 months ago

8 months ago