1.0.2 • Published 2 years ago

trie-fuzzy v1.0.2

Weekly downloads
-
License
MIT
Repository
-
Last release
2 years ago

trie-fuzzy

A TS implementation of the Trie data structure, including fuzzy (approximate) string matching.

Powered by DTS.

Features

  • Exact string search
  • Prefix search
  • Fuzzy search
  • Generator interfaces - no need to store all results in memory all at once
  • Runs in both NodeJS and in browsers
  • No dependencies

Usage

import { Trie } from 'trie-fuzzy';

const trie = new Trie([
  'armor',
  'armadillo',
  'armageddon',
  'artisan',
  'timer',
  'time',
  'tier',
  'dime',
  'fiber',
  'mime',
  'miner',
]);

trie.has('armor'); // true
trie.has('arm'); // false

for (const result of trie.prefixSearch('arm')) {
  console.log(result);
}
/*
armadillo
armarmageddon
armor
*/

for (const { key, distance } of trie.fuzzySearch('timer', 2)) {
  console.log(key, distance);
}
/*
dime 2
fiber 2
mime 2
miner 2
tier 1
time 1
timer 0
*/

API


Constructor

constructor(words: string[])

Builds a new Trie indexing all words in words.

words: a list of words that will be inserted in the Trie.


has - exact string matching

has(word: string) => boolean

Verifies if a given word is contained in the Trie's word set.

word: the word to be queried in the Trie.

Returns: true if word exists in the Trie, false otherwise.


prefixSearch - prefix search

*prefixSearch(prefix: string) => Generator<string>

Searches for all words in the Trie's set with a given prefix.

prefix: the prefix to be queried.

Returns: A Generator that iterates through all words in the Trie's set that start with prefix.


fuzzySearch - approximate string matching

*fuzzySearch(word: string, maxDistance: number = 1) => Generator<{ key: string, distance: number }>

Searches for all words in the Trie's set that are similar to a given word. Uses the Damerau-Levenshtein distance (or edit distance with character transposition) to measure similarity.

word: the word to be queried.

maxDistance: the threshold that defines the maximal edit distance between word and the returned results.

Returns: A Generator that iterates through all words in the Trie's set where the edit distance to word is lower than or equal to maxDistance. Each result is an object containing two keys: key holds the word from the Trie set that was matched, and distance holds the edit distance between the result and word.


Implementation Details

  • Trie is an immutable class - after a trie is built, no other words can be added to it nor removed from it. It was designed like this to speed up prefix search and also to leverage the benefits of immutability.

  • Every query operation in Trie is case sensitive - meaning that a Trie that contains the word KERFUFFLE will not return it if the user searches for kerfuffle (either through has, prefixSearch or fuzzySearch). It was designed like this for the sake of simplicity and to avoid the many edge cases that might arise - it's up to the user to clean up the keys before building a Trie and querying it. The code below is an example of how to perform case-insensitive queries in a Trie:

import { Trie } from 'trie-fuzzy';

const keys = [
  'Timer',
  'Time',
  'Tier',
  'Dime',
  'Fiber',
  'Mime',
  'Miner',
];

const cleanKey = (key: string) => key.toUpperCase();

const trie = new Trie(keys.map(cleanKey));

trie.has('timer'); // false
trie.has(cleanKey('timer')); // true
trie.has(cleanKey('TiMeR')); // true

Author


License: MIT