0.0.4 • Published 29 days ago

vectorstore v0.0.4

Weekly downloads
-
License
MIT
Repository
github
Last release
29 days ago

Pure JavaScript implementation of a vector store with similarity search. Runs locally, in Node/Bun/Deno or even in your browser. Supports various embedding models. Open-source and free, no-cost.

  • ✅ Search for text similarities, locally, without API key, free of charge
  • ✅ Best in class performance; better than OpenAI (see "The Science" section)
  • ✅ Downloads the model automatically, caches it, executes offline afterwards
  • ✅ Runs Node.js and (soon) in the browser (large download though ~500 MB)
  • ✅ Uses the open-source nomic-embed-text-v1 text embedding model, 8192 token context window
  • ✅ Benchmarked: ~1 GiB memory usage at runtime
  • ✅ Fast! Inference < 0.05 sec. on average (per document)
  • ✅ Available as a simple API
  • ✅ Tree-shakable and side-effect free
  • ✅ Runs on Windows, Mac, Linux, CI tested
  • ✅ First class TypeScript support
  • ✅ Well tested (soon to be... ;-)
npm install
npm run demo

If you came here to understand the math behind the scenes, please head on to: https://towardsdatascience.com/text-embeddings-comprehensive-guide-afd97fce8fb5 where Mariya Mansurova wrote an excellent article on Text Embeddings.

Now let's dive deeper into metrics and open-source models: https://towardsdatascience.com/openai-vs-open-source-multilingual-embedding-models-e5ccb7c90f05

This is why I decided to use nomic-embed-text-v1. (Nomic-Embed): The model was designed by Nomic, and claims better performances than OpenAI Ada-002 and text-embedding-3-small while being only 0.55GB in size. Interestingly, the model is the first to be fully reproducible and auditable (open data and open-source training code).

https://huggingface.co/nomic-ai/nomic-embed-text-v1

  • yarn: yarn add vectorstore
  • npm: npm install vectorstore
import { createDocument, search, type Document } from "vectorstore";

// your text haystack to search for similarities ("database", "store")
const myDocuments = [
  {
    text: "foo",
    metaData: {
      id: 1,
    },
  },
  {
    text: "bar",
    metaData: {
      id: 2,
    },
  },
];

// vectorized documents to search in
const haystack: Array<Document> = [];

// first we need to turn the document text into vector emebeddings
for (const doc of myDocuments) {
  haystack.push(await createDocument(doc.text, doc.metaData));
}

// put the search string here
const needle = await createDocument("bar");

// now we can search for similarities between searchDocument and the haystack
const searchResults = await search(haystack, needle);

// search results come sorted, with a .doc (Document) and a .score
// if you want to keep track of the original text,
// just add the original text to the metaData
console.log(
  searchResults.map((result) => ({
    score: result.score,
    id: result.doc.metadata.id,
  })),
);

/** Prints:
 * [
  { score: 0.9999999999999999, id: 2 }, // "bar"
  { score: 0.3897944998952487, id: 1 }  // "foo"
]
 */

You can run this exact code as a demo when checking out this repository using git clone, run npm i followed by npm run demo

const { createDocument, search } = require('vectorstore')

// same API like ESM variant
0.0.4

29 days ago

0.0.3

29 days ago

0.0.2

29 days ago

0.0.1

29 days ago