1.0.11 • Published 7 months ago

tynisearch v1.0.11

Weekly downloads
-
License
GPL-3.0-or-later
Repository
github
Last release
7 months ago

About

  • Tiny, and simple search module based on trie and aho-corasick using TypeScript.
  • Also, it supports serialization and de-serialization into string type.

Use case

  • I made this library for searching keywords my users enrolled from a title of a post without search engine library.

For me?

  • Making trie based on keywords of users
  • Save the serialized trie into in memory k-v storage like Redis.
  • Retrieve the serialized trie from the storage and de-serialize it when.
    • User enrolled new keywords
    • User deleted existed keywords
  • Search keywords in a title of a post using de-serialized trie

Installation

npm install tynisearch

Benchmark

TL;DR

result-of-dataset

In table

CalculationElapsed time (ms)
Searching1
Inserting (including building failure links)60
Serialization and de-serialization90

Dataset Description

  • 10,000 words, which consisted of words longer than 5 characters

Test Description

  • You can see the performance test code here

Example

Insert and delete

import { TyniSearch } from 'tynisearch';

const tyniSearch = new TyniSearch();

const words = ["fox", "dog"]
const titleToSearch = "The quick brown fox jumps over the lazy dog";

// insert words as a list
tyniSearch.insert(words);

const result = tyniSearch.searchInSentence(titleToSearch); 
console.log(result); // ["fox", "dog"]

// delete words as a list
tyniSearch.delete(words);

const resultAfterDeletion = tyniSearch.searchInSentence(titleToSearch);
console.log(resultAfterDeletion); // []

Serialization and De-serialization (SerDe)

const tyniSearch = new TyniSearch();

const wordList = ["fox", "dog"]
const titleToSearch = "The quick brown fox jumps over the lazy dog";

// insert words as a list
tyniSearch.insert(words);

// serialize to string
const ser = tyniSearch.serialize();

// de-serialize from serialized string
const de = TyniSearch.deserialize(ser);

// search in de-serialized trie
const result = de.serialize(titleToSearch);
console.log(result); // ["fox", "dog"]

Optional failure link calculation

  • By default, failure links are calculated when inserting, parsing, and deleting.
  • You can disable this behavior by passing false to the second argument() of their methods.
const wordList = ["fox", "dog", "cat", "cow", "doll"];
const anotherWordList = ["elephant", "tiger", "lion", "wolf", "bear"];

const tyniSearch = new TyniSearch();

// If you should insert and delete keywords frequently,
// you can disable failure link calculation to enhance performance.
tyniSearch.insert(keywordList, false);
tynisearch.delete(["fox", "cat"], false);
tynisearch.delete(["dog", "cow"], false);
tyniSearch.insert(anotherWordList, false);

// after all, you should build failure links manually
tyniSearch.buildFailureLinks();
const wordList = ["fox", "dog", "cat", "cow", "doll"];
const tyniSearch = new TyniSearch();

tyniSearch.insert(wordList);

const ser = tyniSearch.serialize();

// If you have to insert deserialize trie frequently,
// also you can disable failure link calculation to enhance performance.
const de = TyniSearch.deserialize(ser, false);

const anotherWordList = ["elephant", "tiger", "lion", "wolf", "bear"];
de.insert(anotherWordList, false);

const secondSer = de.serialize();

// like above, deserialize without buildFailureLinks param automatically builds failure links
const secondDe = TyniSearch.deserialize(secondSer);

Misc.

const tyniSearch = new TyniSearch();
const words = ["fox", "dog"]
tyniSearch.insert(words);

// getting the number of all nodes
tyniSearch.getNumberOfNodes(); // 2

// getting all saved keywords in trie
tyniSearch.getAllKeywords(); // ["fox", "dog"]

Points should be enhanced

Can not save failure links in serialization and de-serialization

Because of the circular reference problem, I can not save failure links in serialization and de-serialization. And I think there must be a better way to solve this problem.

You should build failure links using .buildFailureLinks() again after de-serialization.

Or, you can contribute to this project to solve this problem (I really want it!)

Calculating the failure links is not efficient

I think it is also better way to calculate failure graph when building failure links.

Currently, building failure graph is required after every insertion. In other words, you should call .buildFailureLinks() before run searchInSentence(sentence: string) method after any new words are inserted into.

Or, you can contribute to this project to solve this problem (I really want it!)(2)


Contacts

Email

  • pravda.kracota@gmail.com

github

LinkedIn

1.0.11

7 months ago

1.0.10

7 months ago

1.0.9

7 months ago

1.0.8

7 months ago

1.0.7

7 months ago

1.0.6

7 months ago

1.0.5

7 months ago

1.0.4

7 months ago

1.0.3

7 months ago

1.0.2

7 months ago

1.0.1

7 months ago

1.0.0

7 months ago