3.2.3 • Published 10 months ago

@cto.af/unicode-trie v3.2.3

Weekly downloads
-
License
MIT
Repository
github
Last release
10 months ago

@cto.af/unicode-trie

A data structure for fast Unicode character metadata lookup, ported from ICU This version was copied from https://github.com/foliojs/unicode-trie and modernized slightly.

Background

When implementing many Unicode algorithms such as text segmentation, normalization, bidi processing, etc., fast access to character metadata is crucial to good performance. There over a million code points in the Unicode standard, many of which produce the same result when looked up, so an array or hash table is not appropriate - those data structures are fast but would require a lot of memory. The data is generally grouped in ranges, so you could do a binary search, but that is not fast enough for some applications.

The International Components for Unicode (ICU) project came up with a data structure based on a Trie that provides fast access to Unicode metadata. The range data is precompiled to a serialized and flattened trie, which is then used at runtime to lookup the necessary data. According to my own tests, this is generally at least 50% faster than binary search, with not too much additional memory required.

Installation

npm install @cto.af/unicode-trie

Building a Trie

Unicode Tries are generally precompiled from data in the Unicode database for faster runtime performance. To build a Unicode Trie, use the UnicodeTrieBuilder class.

import {writeFile} from '@cto.af/unicode-trie/file';
// This will download a local copy of LineBreak.txt, parse it, and add the
// first field to the trie for each range of characters.
await writeFile('LineBreak.txt', {
  // This is the default transform.
  transform(lineBreak) { return lineBreak };
});

You can also pass in string values to set and setRange:

import {UnicodeTrieBuilder} from '@cto.af/unicode-trie/builder';
const t = new UnicodeTrieBuilder('XX', 'ER');
t.set(0x4567, 'FOO')
t.setRange(0x40, 0xe7, 'BAR')

The intent is that you might use a small number of strings, such as the names of Unicode property values. These strings are converted to small integers, and the mapping is stored into the compressed trie.

Using a precompiled Trie

Use the `@cto.af/unicode-trie-runtime' package to load the precompiled trie into memory.

License

MIT


Tests codecov

3.2.2

10 months ago

3.2.1

10 months ago

3.2.0

10 months ago

3.0.2

10 months ago

3.1.0

10 months ago

3.2.3

10 months ago

3.0.0

10 months ago

2.0.2

1 year ago

2.0.1

2 years ago

2.0.0

2 years ago

1.2.3

2 years ago

1.2.2

2 years ago

1.2.1

3 years ago

1.2.0

3 years ago

1.1.0

3 years ago

1.0.0

3 years ago