trie-louds v1.1.0
Trie-LOUDS
Readonly but memory-sufficient data structure for dictionaries by utilizing LOUDS.
Install
$ npm install --save trie-loudsAPI
static fromKeywordList(keys, verbose=false)
Build a tree from list of keywords.
Parameters
keys: string[]List of keywords.verbose: booleanIf true, some debug logs will be printed.
Returns
tree: ReadonlyTrieTree
Example
const {ReadonlyTrieTree} = require("trie-louds");
const tree = ReadonlyTrieTree.fromKeywordList(["She", "sells", "seashells", "by", "the", "seashore"]);contains(word)
Returns true if it contains the given word.
This is case-sensitive.
Parameters
word: string
Returns
answer: boolean
Example
const {ReadonlyTrieTree} = require("trie-louds");
const tree = ReadonlyTrieTree.fromKeywordList(["She", "sells", "seashells", "by", "the", "seashore"]);
console.log(tree.contains("She")); // true
console.log(tree.contains("she")); // falsegetValue(word)
Returns word's index in the keyword list.
If it doesn't contain word, this returns null.
Parameters
word: string
Returns
index: number|null
Example
const {ReadonlyTrieTree} = require("trie-louds");
const tree = ReadonlyTrieTree.fromKeywordList(["She", "sells", "seashells", "by", "the", "seashore"]);
console.log(tree.getValue("She")); // 0
console.log(tree.getValue("sells")); // 1
console.log(tree.getValue("seashells")); // 2
console.log(tree.getValue("sell")); // null (not found)getWords(prefix, limit=1000)
Search the words which have the given prefix.
If more than limit words are found, property hasMore become true.
Parameters
prefix: stringlimit: numberThe maximum number of words in the result.
Returns
result: SearchResultwords: string[]Found words.values: number[]Indices of found words.hasMore: booleanIf true, there are unsearched words.temporaryInfo?: TempInfoThis exists iffhasMoreistrue. SeegetMoreWords().
Example
const {ReadonlyTrieTree} = require("trie-louds");
const tree = ReadonlyTrieTree.fromKeywordList(["She", "sells", "seashells", "by", "the", "seashore"]);
console.log(tree.getWords("").words); // [ 'She', 'by', 'seashells', 'seashore', 'sells', 'the' ] (searched words are sorted)
const limited = tree.getWords("", 3);
console.log(limited.words); // [ 'She', 'by', 'seashells' ]
console.log(limited.hasMore); // truegetWords(prefix, setting)
You can set more detailed search settings.
Parameters
prefix: stringsetting: OptSearchSettinglimit?: numberDefault is1000. Same aslimitingetWords(prefix, limit).maxLength?: numberIf this exists, words longer than this will be excluded.minLength?: numberIf this exists, words shorter than this will be excluded.
Returns
result: SearchResultSame as the output ofgetWords(prefix, limit).
getMoreWords(temporaryInfo, limit=1000)
You can continue searching by calling this with temporaryInfo returned by getWords function.
Parameters
temporaryInfo: TempInfolimit: numberThe maximum number of words in the result.
Returns
result: SearchResultwords: string[]Found words.values: number[]Indices of found words.hasMore: booleanIf true, there are unsearched words.temporaryInfo?: TempInfoThis exists iffhasMoreistrue.
Example
const {ReadonlyTrieTree} = require("trie-louds");
const tree = ReadonlyTrieTree.fromKeywordList(["She", "sells", "seashells", "by", "the", "seashore"]);
const limited = tree.getWords("", 3);
console.log(limited.words); // [ 'She', 'by', 'seashells' ]
console.log(limited.hasMore); // true
console.log(tree.getMoreWords(limited.temporaryInfo).words); // [ 'seashore', 'sells', 'the' ]getMoreWords(temporaryInfo, setting)
You can set more detailed search settings.
Parameters
temporaryInfo: TempInfosetting: OptSearchSetting
Returns
result: SearchResult
countWords(prefix, setting)
Count the words which meets the given setting and prefix.
It will take the same computational cost as getWords function.
If you don't need detailed settings, countWordsFaster is suitable.
Parameters
prefix: stringsetting: OptSearchSettinglimit?: numberThis option doesn't work in this function. It will search all.maxLength?: numberIf this exists, words longer than this will be excluded.minLength?: numberIf this exists, words shorter than this will be excluded.
Returns
count: number
Example
const {ReadonlyTrieTree} = require("trie-louds");
const tree = ReadonlyTrieTree.fromKeywordList(["She", "sells", "seashells", "by", "the", "seashore"]);
console.log(tree.countWords("")); // 6
console.log(tree.countWords("s")); // 3countWordsFaster(prefix)
Counts the words which have the given prefix. This is much faster than countWords().
Parameters
prefix: string
Returns
count: number
dump()
You can dump the tree to Buffer.
Returns
buf: Buffer
Example
const fs = require("fs");
fs.writeFileSync("tree.dat", tree.dump());static load(buffer)
You can load the tree from dumped data.
Parameters
buffer: Buffer
Returns
tree: ReadonlyTrieTree
Example
const loadedTree = ReadonlyTrieTree.load(fs.readFileSync("tree.dat"));
console.log(loadedTree.getWords("sea").words); // [ 'seashells', 'seashore' ]Command
You can dump the tree data by command.
example
- run
trie-dump --input examples/keyword.txt --output examples/trie.dat - then you have
trie.datinexamples/folder. - execute:
const {ReadonlyTrieTree} = require("trie-louds");
const tree = ReadonlyTrieTree.loadFileSync("examples/trie.dat");
console.log(tree.getWords(""));enwiki trie tree
You can create the trie tree of wikipedia-en keywords.
> cat enwiki-20210220-pages-articles-multistream-index.txt | sed -e 's/.*://g' > enwiki-keywords.txt
> trie-dump --input ..\loudstest\enwiki-keywords.txt --output enwiki.datIn this case, we can store 20993072 words in this trie tree and dump it.
The size of enwiki-keywords.txt is about 495MiB and the size of enwiki.dat is about 512MiB.
const {ReadonlyTrieTree} = require("trie-louds");
const {readFileSync} = require("fs");
const tree = ReadonlyTrieTree.load(readFileSync("./enwiki.dat"));
console.log(process.memoryUsage());
console.log(tree.getWords("Undertale"));
--- output ---
{ rss: 682528768,
heapTotal: 10731520,
heapUsed: 5398648,
external: 658583438 }
{ words:
[ 'Undertale',
'Undertale (game)',
'Undertale (video game)',
'Undertale - Hopes and Dreams.ogg',
'Undertale 2',
'Undertale Combat Example.png',
'Undertale Kickstarter Promotional Art.png',
'Undertale character redirects to lists',
'Undertale fandom',
'Undertale soundtrack' ],
values:
[ 19574893,
15834775,
16217668,
15772089,
18495783,
15198496,
19989624,
17975151,
20721313,
18488239 ],
hasMore: false }And it takes about 651MiB when you load this trie tree on memory.