synchronous-autocomplete v3.0.0
synchronous-autocomplete
Fast, simple autocompletion. Also supports Levenshtein-based fuzzy search. Uses precomputed indexes to be fast.
Installing
npm install synchronous-autocompleteUsage
Let's build a simple search for our fruit stand. We assign a weight property to each of them because some are bought more often and we want to push their ranking in the search results.
const items = [ {
id: 'apple',
name: 'Juicy sour Apple.',
weight: 3
}, {
id: 'banana',
name: 'Sweet juicy Banana!',
weight: 2
}, {
id: 'pome',
name: 'Sour Pomegranate',
weight: 5
} ]Let's understand the terminology used by this tool:
- item: A thing to search for. In our example, apple, banana and pomegranate each are an item.
- weight: How important an item is.
- token: A word from the fully normalized item name. For example, to find an item named
Hey There!, you may process its name into the tokenshey&there. - fragment: A word from the normalized search query, which may partially match a token. E.g. the fragment
ther(from the search queryHey Ther) partially matches the tokenthere. - relevance: How well an item fits to the search query.
- score: A combination of an item's weight and relevance. Used to rank search results.
In order to be as fast and disk-space-efficient as possible, synchronous-autocomplete requires five indexes to be prebuilt from the list of items. Check the example code for more details on how to build them. For our example, they would look like this:
const tokens = { // internal item IDs, by token
juicy: [0, 1],
sour: [0, 3],
apple: [0],
sweet: [1],
banana: [1],
pomegranate: [3]
}
const weights = [ // item weights, by internal item ID
3, // apple
2, // banana
5 // pome
]
const nrOfTokens = [ // nr of tokens, by internal item ID
3, // apple
3, // banana
2 // pome
]
const scores = { // "uniqueness" of each token, by token
juicy: 2 / 3, // 2 out of 3 items have the token "juicy"
sour: 2 / 3,
apple: 1 / 3,
sweet: 1 / 3,
banana: 1 / 3,
pomegranate: 1 / 3
}
// In order to create smaller search indexes, we use numerical item IDs
// internally and maintain a mapping to their "real"/original IDs.
const originalIds = [
'apple',
'banana',
'pome'
]Next, we must define a function that normalizes search input into a list of fragments. Consider using this simple function:
import normalize from 'normalize-for-search'
const tokenize = (str) => {
return normalize(str).replace(/[^\w\s]/g, '').split(/\s+/g)
}Of course, you don't have to calculate the tokens & scores! Instead, use buildIndex to generate the data:
import {buildIndex} from 'synchronous-autocomplete/build.js'
const index = buildIndex(tokenize, items)Now, we can query our index:
import {createAutocomplete} from 'synchronous-autocomplete'
const autocomplete = createAutocomplete(index, tokenize)
autocomplete('bana')
// [ {
// relevance: 0.6666665555555555,
// score: 0.8399472266053544,
// weight: 2,
// } ]
autocomplete('sour')
// [ {
// id: 'pome',
// relevance: 1.8333335,
// score: 3.134956187236602,
// weight: 5,
// }, {
// id: 'apple',
// relevance: 1.2222223333333333,
// score: 1.762749635070118,
// weight: 3,
// } ]
autocomplete('aplle', 3, true) // note the typo
// [ {
// id: 'apple',
// relevance: 0.22222216666666667,
// score: 0.3204998243877813,
// weight: 3,
// } ]API
const index = buildIndex(tokenize, items)
const {tokens, scores, weights, nrOfTokens, originalIds} = indextokenizemust be a function that, given a search query, returns an array of fragments.itemsmust be an array of objects, each withid,name&weight.
const autocomplete = createAutocomplete(index, tokenize)
autocomplete(query, limit = 6, fuzzy = false, completion = true)tokensmust be an object with an array of internal item IDs per token.scoresmust be an object with a token score per token.weightsmust be an array with an item weight per internal item ID.nrOfTokensmust be an array with the number of tokens per internal item ID.originalIdsmust be an array with the (real) item ID per internal item ID.tokenizeis the same as withbuildIndex().
Storing the index as protocol buffer
Protocol buffers (a.k. protobufs) are a compact binary format for structured data serialization.
import {encodeIndex} from 'synchronous-autocomplete/encode.js'
import {writeFileSync, readFileSync} from 'node:fs'
// encode & write the index
const encoded = encodeIndex(index)
writeFileSync('index.pbf', encoded)
// read & decode the index
const decoded = decode(readFileSync('index.pbf'))Contributing
If you have a question or have difficulties using synchronous-autocomplete, please double-check your code and setup first. If you think you have found a bug or want to propose a feature, refer to the issues page.