5.44.0 • Published yesterday

@memberjunction/ai-vectors-memory

An in-memory vector similarity search and clustering service for MemberJunction. Provides six distance metrics, two clustering algorithms (K-Means and DBSCAN), and utility methods for evaluating clusters and managing vectors, all without requiring an external vector database.

Architecture

graph TD
    subgraph MemoryPkg["@memberjunction/ai-vectors-memory"]
        SVS["SimpleVectorService&lt;TMetadata&gt;"]

        subgraph Search["Similarity Search"]
            FN["FindNearest"]
            FS["FindSimilar"]
            FAT["FindAboveThreshold"]
        end

        subgraph Metrics["Distance Metrics"]
            COS["Cosine"]
            EUC["Euclidean"]
            MAN["Manhattan"]
            DOT["Dot Product"]
            JAC["Jaccard"]
            HAM["Hamming"]
        end

        subgraph Clustering["Clustering"]
            KM["K-Means (K-Means++)"]
            DBS["DBSCAN"]
            EM["Elbow Method"]
        end

        subgraph Evaluation["Evaluation"]
            SIL["Silhouette Score"]
            WCD["Within-Cluster Distance"]
            BCD["Between-Cluster Distance"]
            CENT["Find Centroid"]
        end
    end

    FN --> Metrics
    FS --> FN
    FAT --> FN
    KM --> Metrics
    DBS --> FN

    style MemoryPkg fill:#2d6a9f,stroke:#1a4971,color:#fff
    style Search fill:#2d8659,stroke:#1a5c3a,color:#fff
    style Metrics fill:#b8762f,stroke:#8a5722,color:#fff
    style Clustering fill:#7c5295,stroke:#563a6b,color:#fff
    style Evaluation fill:#2d8659,stroke:#1a5c3a,color:#fff

Installation

npm install @memberjunction/ai-vectors-memory

Overview

Unlike the other vector packages that depend on external vector databases (Pinecone, etc.), this package operates entirely in-memory. It is ideal for:

Lightweight similarity search without infrastructure overhead
AI agent note retrieval and session memory
Clustering analysis and data exploration
Prototyping and testing before deploying to a full vector database
Scenarios where the vector count fits comfortably in memory (tens of thousands)

The SimpleVectorService class is generic (SimpleVectorService<TMetadata>) for type-safe metadata access.

Quick Start

import { SimpleVectorService, VectorEntry } from '@memberjunction/ai-vectors-memory';

const service = new SimpleVectorService();

// Load vectors
service.LoadVectors([
    { key: 'doc1', vector: [0.1, 0.2, 0.3], metadata: { title: 'Document 1' } },
    { key: 'doc2', vector: [0.4, 0.5, 0.6], metadata: { title: 'Document 2' } },
    { key: 'doc3', vector: [0.7, 0.8, 0.9], metadata: { title: 'Document 3' } }
]);

// Find nearest neighbors
const results = service.FindNearest([0.15, 0.25, 0.35], 2);
results.forEach(r => console.log(`${r.key}: ${r.score.toFixed(3)}`));

Core Types

classDiagram
    class SimpleVectorService~TMetadata~ {
        +LoadVectors(entries) void
        +AddVector(key, vector, metadata?) void
        +AddOrUpdateVector(key, vector, metadata?) boolean
        +UpdateVector(key, updates) boolean
        +FindNearest(query, topK, threshold?, metric?, filter?) VectorSearchResult[]
        +FindSimilar(key, topK, threshold?, metric?, filter?) VectorSearchResult[]
        +FindAboveThreshold(query, threshold, metric?, filter?) VectorSearchResult[]
        +Similarity(key1, key2) number
        +CalculateDistance(a, b, metric?) number
        +KMeansCluster(k, maxIter?, metric?, tolerance?) ClusterResult
        +DBSCANCluster(epsilon, minPoints, metric?, filter?) ClusterResult
        +ElbowMethod(minK, maxK, metric?) Map
        +SilhouetteScore(result, metric?) number
        +WithinClusterDistance(result, metric?) number
        +BetweenClusterDistance(result, metric?) number
        +FindCentroid(vectors) number[]
        +Size : number
        +ExpectedDimensions : number
        +GetVector(key) number[]
        +GetMetadata(key) TMetadata
        +RemoveVector(key) boolean
        +ExportVectors() VectorEntry[]
        +Clear() void
        +Has(key) boolean
        +GetAllKeys() string[]
    }

    class VectorEntry~TMetadata~ {
        +key : string
        +vector : number[]
        +metadata? : TMetadata
    }

    class VectorSearchResult~TMetadata~ {
        +key : string
        +score : number
        +metadata? : TMetadata
    }

    class ClusterResult~TMetadata~ {
        +clusters : Map~number, string[]~
        +centroids? : Map~number, number[]~
        +outliers? : string[]
        +metadata? : ClusterMetadata
    }

    SimpleVectorService --> VectorEntry : stores
    SimpleVectorService --> VectorSearchResult : returns
    SimpleVectorService --> ClusterResult : returns

    style SimpleVectorService fill:#2d6a9f,stroke:#1a4971,color:#fff
    style VectorEntry fill:#2d8659,stroke:#1a5c3a,color:#fff
    style VectorSearchResult fill:#2d8659,stroke:#1a5c3a,color:#fff
    style ClusterResult fill:#7c5295,stroke:#563a6b,color:#fff

DistanceMetric Type

type DistanceMetric = 'cosine' | 'euclidean' | 'manhattan' | 'dotproduct' | 'jaccard' | 'hamming';

Distance Metrics

All metrics are normalized to a 0-1 range where 1 = most similar.

Metric	Best For	Formula
`cosine` (default)	Text embeddings, semantic search	`(dot(A,B) / (norm(A) * norm(B)) + 1) / 2`
`euclidean`	Physical measurements, specs	`1 / (1 + sqrt(sum((a-b)^2)))`
`manhattan`	Grid navigation, time series	`1 / (1 + sum(abs(a-b)))`
`dotproduct`	Recommendations, weighted scoring	`(tanh(dot(A,B) / sqrt(n)) + 1) / 2`
`jaccard`	Categorical/binary data, set comparison	`intersection / union`
`hamming`	Configuration drift, error detection	`1 - (differences / length)`
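
To make the table concrete, here is a minimal sketch that transcribes five of the formulas into standalone functions. The function names are hypothetical and this is not the package's internal code. `jaccard` is left out because the table does not say how sets are derived from numeric vectors, and the handling of zero vectors in `cosineScore` is an assumption.

```typescript
// Illustrative transcription of the formulas in the table above.
// These helpers are hypothetical; they are not the package's internal code.
type Vec = number[];

const dot = (a: Vec, b: Vec): number => a.reduce((s, x, i) => s + x * b[i], 0);
const norm = (a: Vec): number => Math.sqrt(dot(a, a));

// Cosine similarity lies in [-1, 1]; adding 1 and halving maps it to [0, 1].
function cosineScore(a: Vec, b: Vec): number {
    const denom = norm(a) * norm(b);
    if (denom === 0) return 0.5; // assumption: a zero vector is treated as orthogonal
    return (dot(a, b) / denom + 1) / 2;
}

// A distance d lies in [0, inf); 1 / (1 + d) maps it to (0, 1].
function euclideanScore(a: Vec, b: Vec): number {
    const d = Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
    return 1 / (1 + d);
}

function manhattanScore(a: Vec, b: Vec): number {
    const d = a.reduce((s, x, i) => s + Math.abs(x - b[i]), 0);
    return 1 / (1 + d);
}

// tanh squashes the scaled dot product into (-1, 1) before the shift to (0, 1).
function dotProductScore(a: Vec, b: Vec): number {
    return (Math.tanh(dot(a, b) / Math.sqrt(a.length)) + 1) / 2;
}

// Fraction of positions that hold the same value.
function hammingScore(a: Vec, b: Vec): number {
    if (a.length === 0) return 1; // assumption: empty vectors count as identical
    const differences = a.filter((x, i) => x !== b[i]).length;
    return 1 - differences / a.length;
}
```

Note that identical vectors score exactly 1 under `cosine`, `euclidean`, `manhattan`, and `hamming`, while `dotproduct` depends on vector magnitude and only approaches 1.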

Similarity Search

FindNearest

K-nearest neighbor search with optional threshold and metadata pre-filtering.

const results = service.FindNearest(
    queryVector,    // vector to search for
    10,             // topK results
    0.7,            // minimum similarity threshold
    'cosine',       // distance metric
    (meta) => meta.status === 'active'  // pre-filter by metadata
);

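Pre-filtering happens before similarity calculation, so scores are computed only for entries that pass the filter. The more selective the filter, the larger the saving compared with scoring every vector and filtering the results afterwards.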
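
The pre-filtering idea can be sketched in a few lines. This is an illustration only, not the package's implementation: `Entry`, `Scored`, and `findNearestPrefiltered` are hypothetical names, and skipping entries that have no metadata when a filter is supplied is an assumption.

```typescript
// Hypothetical sketch of pre-filtered top-K search; names are illustrative.
interface Entry<T> { key: string; vector: number[]; metadata?: T; }
interface Scored<T> { key: string; score: number; metadata?: T; }

// Cosine similarity mapped to [0, 1], as in the Distance Metrics table.
function cosineScore(a: number[], b: number[]): number {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    const denom = Math.sqrt(na) * Math.sqrt(nb);
    return denom === 0 ? 0.5 : (dot / denom + 1) / 2;
}

function findNearestPrefiltered<T>(
    entries: Entry<T>[],
    query: number[],
    topK: number,
    threshold = 0,
    filter?: (meta: T) => boolean
): Scored<T>[] {
    const scored: Scored<T>[] = [];
    for (const e of entries) {
        // Pre-filter: rejected entries never reach the similarity computation.
        // Assumption: entries without metadata are skipped when a filter is given.
        if (filter && (e.metadata === undefined || !filter(e.metadata))) continue;
        const score = cosineScore(query, e.vector);
        if (score >= threshold) scored.push({ key: e.key, score, metadata: e.metadata });
    }
    // Highest similarity first, then keep the best topK.
    return scored.sort((x, y) => y.score - x.score).slice(0, topK);
}
```

With a filter that passes m of n entries, the loop performs m similarity computations instead of n.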

FindSimilar

Find vectors similar to an existing stored vector (excludes the source vector from results).

const similar = service.FindSimilar('doc-123', 5, 0.8, 'cosine');

FindAboveThreshold

Return all vectors above a similarity threshold (no topK limit).

const matches = service.FindAboveThreshold(queryVector, 0.9, 'cosine');

Vector Management

// Add individual vectors
service.AddVector('key1', [0.1, 0.2, 0.3], { category: 'A' });

// Add or update (upsert)
const wasUpdate = service.AddOrUpdateVector('key1', [0.4, 0.5, 0.6]);

// Update in place (vector, metadata, or both)
service.UpdateVector('key1', { metadata: { category: 'B' } });

// Remove
service.RemoveVector('key1');

// Bulk load from array or Map
service.LoadVectors(new Map([['k1', [1, 2, 3]], ['k2', [4, 5, 6]]]));

// Export for persistence
const allVectors = service.ExportVectors();

Dimension validation is automatic: all vectors in a service instance must have the same dimensionality.
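
As a rough illustration of what dimension validation involves, the sketch below shows a Map-backed store that rejects vectors of the wrong length. `TinyVectorStore` is a hypothetical class, and letting the first vector fix the expected dimensionality is an assumption of this sketch rather than documented behavior of `SimpleVectorService`.

```typescript
// Hypothetical sketch of dimension validation in a Map-backed store.
class TinyVectorStore {
    private vectors = new Map<string, number[]>();
    private expectedDimensions: number | null = null;

    add(key: string, vector: number[]): void {
        if (vector.length === 0) {
            throw new Error(`Vector '${key}' is empty`);
        }
        if (this.expectedDimensions === null) {
            // Assumption: the first vector added fixes the dimensionality.
            this.expectedDimensions = vector.length;
        } else if (vector.length !== this.expectedDimensions) {
            throw new Error(
                `Vector '${key}' has ${vector.length} dimensions, expected ${this.expectedDimensions}`
            );
        }
        this.vectors.set(key, vector);
    }

    get size(): number {
        return this.vectors.size;
    }
}
```

Failing at insert time keeps mismatched vectors out of the store, so distance calculations never have to compare vectors of different lengths.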

Clustering Algorithms

K-Means (with K-Means++ Initialization)

Partitions vectors into K clusters by minimizing within-cluster variance.

const result = service.KMeansCluster(3, 100, 'euclidean', 0.0001);

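result.clusters.forEach((members, clusterId) => {
    // centroids is optional on ClusterResult, so guard the lookup
    const centroid = result.centroids?.get(clusterId);
    console.log(`Cluster ${clusterId}: ${members.length} members, centroid dims: ${centroid?.length ?? 'n/a'}`);
});

// metadata is optional as well
console.log(`Silhouette: ${result.metadata?.silhouetteScore?.toFixed(3) ?? 'n/a'}`);
console.log(`Converged in ${result.metadata?.iterations ?? 'n/a'} iterations`);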
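
For readers unfamiliar with K-Means++ initialization, the following is a minimal sketch of the standard seeding procedure: the first centroid is picked uniformly at random, and each later one is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. It is not the package's code; `kMeansPlusPlusInit` and its injectable `rng` parameter are hypothetical.

```typescript
// Sketch of K-Means++ seeding; `rng` is injectable so runs can be made repeatable.
function squaredDistance(a: number[], b: number[]): number {
    return a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0);
}

function kMeansPlusPlusInit(
    points: number[][],
    k: number,
    rng: () => number = Math.random
): number[][] {
    if (k < 1 || k > points.length) {
        throw new Error('k must be between 1 and the number of points');
    }
    // First centroid: uniform random pick.
    const centroids: number[][] = [points[Math.floor(rng() * points.length)]];
    while (centroids.length < k) {
        // Squared distance from each point to its closest chosen centroid.
        const d2 = points.map(p => Math.min(...centroids.map(c => squaredDistance(p, c))));
        const total = d2.reduce((s, x) => s + x, 0);
        // Sample the next centroid with probability proportional to d2.
        let r = rng() * total;
        let chosen = points.length - 1; // fallback if rounding leaves r >= 0
        for (let i = 0; i < points.length; i++) {
            r -= d2[i];
            if (r < 0) { chosen = i; break; }
        }
        centroids.push(points[chosen]);
    }
    return centroids;
}
```

Spreading the initial centroids this way tends to reduce the number of iterations needed and makes poor local optima less likely than purely random seeding.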

DBSCAN

Density-based clustering that automatically determines the number of clusters and identifies outliers.

const result = service.DBSCANCluster(
    0.3,            // epsilon (max distance for neighbors)
    3,              // minPoints (minimum cluster density)
    'euclidean',    // metric
    (meta) => meta.active  // optional pre-filter
);

console.log(`Found ${result.clusters.size} clusters`);
console.log(`Outliers: ${result.outliers?.length ?? 0}`);
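
The sketch below shows the textbook DBSCAN procedure on plain arrays, to illustrate how clusters grow from dense regions and why no cluster count is needed up front. It is not the package's implementation: the `dbscan` function is hypothetical, it returns a label per point rather than a ClusterResult, and it assumes `epsilon` is a raw Euclidean distance.

```typescript
function euclidean(a: number[], b: number[]): number {
    return Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
}

// Returns one label per point: 0, 1, 2, ... for clusters, -1 for outliers.
function dbscan(points: number[][], epsilon: number, minPoints: number): number[] {
    const UNVISITED = -2, NOISE = -1;
    const labels = new Array<number>(points.length).fill(UNVISITED);
    // The neighborhood includes the point itself. Scanning every point for
    // every query is what makes this version O(n^2).
    const neighborsOf = (i: number): number[] =>
        points.map((_, j) => j).filter(j => euclidean(points[i], points[j]) <= epsilon);

    let cluster = 0;
    for (let i = 0; i < points.length; i++) {
        if (labels[i] !== UNVISITED) continue;
        const seeds = neighborsOf(i);
        if (seeds.length < minPoints) { labels[i] = NOISE; continue; }
        labels[i] = cluster;
        // Expand the cluster through every density-reachable point.
        const queue = seeds.filter(j => j !== i);
        while (queue.length > 0) {
            const j = queue.shift()!;
            if (labels[j] === NOISE) labels[j] = cluster; // border point joins the cluster
            if (labels[j] !== UNVISITED) continue;
            labels[j] = cluster;
            const jNeighbors = neighborsOf(j);
            if (jNeighbors.length >= minPoints) queue.push(...jNeighbors); // j is a core point
        }
        cluster++;
    }
    return labels;
}
```

Points that are never reachable from a core point keep the label -1, which corresponds to the `outliers` list described above.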

Elbow Method

Find the optimal number of clusters by testing a range of K values.

const elbowData = service.ElbowMethod(2, 10, 'euclidean');
elbowData.forEach((inertia, k) => {
    console.log(`k=${k}: inertia=${inertia.toFixed(2)}`);
});
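
The map returned by ElbowMethod still has to be interpreted. One common heuristic, sketched here as an assumption rather than anything the package provides, is to pick the k where the inertia curve bends most sharply, measured by the discrete second difference. `pickElbow` is a hypothetical helper and assumes the map holds consecutive k values.

```typescript
// Hypothetical helper: picks the k where the inertia curve bends most sharply.
function pickElbow(inertiaByK: Map<number, number>): number | undefined {
    const ks = Array.from(inertiaByK.keys()).sort((a, b) => a - b);
    let bestK: number | undefined;
    let bestBend = -Infinity;
    // Interior points only: the bend needs a neighbor on each side.
    for (let i = 1; i < ks.length - 1; i++) {
        const prev = inertiaByK.get(ks[i - 1])!;
        const curr = inertiaByK.get(ks[i])!;
        const next = inertiaByK.get(ks[i + 1])!;
        const bend = prev - 2 * curr + next; // discrete second difference
        if (bend > bestBend) {
            bestBend = bend;
            bestK = ks[i];
        }
    }
    return bestK; // undefined when fewer than three k values were tested
}
```

Because the first and last k values have only one neighbor, they can never be selected, so the tested range should extend at least one step on each side of the plausible answer.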

Clustering Evaluation

graph LR
    CR["ClusterResult"] --> SIL["SilhouetteScore<br/>-1 to 1<br/>(higher = better)"]
    CR --> WCD["WithinClusterDistance<br/>0 to 1<br/>(lower = tighter)"]
    CR --> BCD["BetweenClusterDistance<br/>0 to 1<br/>(higher = more separated)"]
    CR --> CENT["FindCentroid<br/>mean vector"]

    style CR fill:#2d6a9f,stroke:#1a4971,color:#fff
    style SIL fill:#2d8659,stroke:#1a5c3a,color:#fff
    style WCD fill:#b8762f,stroke:#8a5722,color:#fff
    style BCD fill:#b8762f,stroke:#8a5722,color:#fff
    style CENT fill:#7c5295,stroke:#563a6b,color:#fff

Method	Returns	Interpretation
`SilhouetteScore`	-1 to 1	> 0.7 strong, 0.5-0.7 reasonable, 0.25-0.5 weak, < 0.25 no substantial structure
`WithinClusterDistance`	0 to 1	Lower = tighter clusters (more cohesive)
`BetweenClusterDistance`	0 to 1	Higher = better separated clusters
`FindCentroid`	number[]	Mean position of a vector set
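
To show where the -1 to 1 range of the silhouette score comes from, here is a minimal sketch of the standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The function below is hypothetical, works on plain arrays with Euclidean distance, and is not the package's SilhouetteScore method.

```typescript
function euclidean(a: number[], b: number[]): number {
    return Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
}

// Mean silhouette coefficient over all points; labels[i] is the cluster of points[i].
function silhouetteScore(points: number[][], labels: number[]): number {
    if (new Set(labels).size < 2) return 0; // undefined for one cluster; 0 by convention here
    let total = 0;
    for (let i = 0; i < points.length; i++) {
        // Accumulate distances from point i to the members of each cluster.
        const sums = new Map<number, { sum: number; count: number }>();
        for (let j = 0; j < points.length; j++) {
            if (j === i) continue;
            const s = sums.get(labels[j]) ?? { sum: 0, count: 0 };
            s.sum += euclidean(points[i], points[j]);
            s.count++;
            sums.set(labels[j], s);
        }
        const own = sums.get(labels[i]);
        if (!own) continue; // a singleton cluster contributes 0
        const a = own.sum / own.count; // cohesion
        let b = Infinity;              // separation: nearest other cluster
        sums.forEach((s, id) => {
            if (id !== labels[i]) b = Math.min(b, s.sum / s.count);
        });
        const m = Math.max(a, b);
        total += m === 0 ? 0 : (b - a) / m;
    }
    return total / points.length;
}
```

A point that sits much closer to its own cluster than to any other scores near 1, and a point assigned to the wrong cluster scores below 0.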

Typed Metadata

Use TypeScript generics for type-safe metadata access:

interface ProductMetadata {
    name: string;
    category: string;
    price: number;
}

const service = new SimpleVectorService<ProductMetadata>();

service.AddVector('prod1', embedding, { name: 'Widget', category: 'Tools', price: 29.99 });

const results = service.FindNearest(queryVector, 5);
results.forEach(r => {
    // r.metadata is typed as ProductMetadata | undefined, since metadata is optional
    if (r.metadata) {
        console.log(`${r.metadata.name}: $${r.metadata.price}`);
    }
});

Performance Characteristics

Operation	Complexity	Notes
AddVector / LoadVectors	O(1) per vector	Map-based storage
FindNearest (no filter)	O(n) similarity computations	Linear scan; scored results are then sorted
FindNearest (with filter)	O(m) where m <= n	Filter reduces the candidate set to m entries
KMeansCluster	O(n * k * iterations)	K-Means++ initialization
DBSCANCluster	O(n^2)	Neighborhood pre-computation

Memory usage: approximately (8 bytes * dimensions) + ~100 bytes of overhead per vector. Example: 10,000 vectors at 384 dimensions is 10,000 * (3,072 + 100) bytes, or roughly 31.7 MB.
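
The estimate is easy to reproduce for other sizes. The helper below is a back-of-envelope sketch using the figures quoted above; `estimateMemoryBytes` is a hypothetical function, and actual usage depends on the JavaScript engine.

```typescript
// Back-of-envelope estimate: 8 bytes per dimension plus a fixed per-vector overhead.
function estimateMemoryBytes(
    vectorCount: number,
    dimensions: number,
    overheadPerVector = 100 // approximate bookkeeping cost quoted above
): number {
    const BYTES_PER_NUMBER = 8; // JavaScript numbers are 64-bit floats
    return vectorCount * (BYTES_PER_NUMBER * dimensions + overheadPerVector);
}

const bytes = estimateMemoryBytes(10000, 384);
console.log(`${(bytes / 1e6).toFixed(1)} MB`); // 31.7 MB
```

At 1,536 dimensions the same 10,000 vectors come to about 124 MB, which is a useful check before loading large embedding sets into a single process.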

VectorDBBase Providers

This package ships two VectorDBBase driver implementations so the in-memory primitive can be consumed by the broader vector-sync / EntityDocument infrastructure without standing up a remote store:

`SimpleVectorDatabase`

In-process VectorDBBase driver that reads from an MJ: Vector Indexes row configured to point at any entity and field. Use when you have arbitrary entity rows with embeddings stored in a column and want to make them queryable through the SearchEngine cross-scope fusion path.

`SimpleVectorServiceProvider` (new in v5.38)

EntityDocument-keyed in-process driver, purpose-built for Provider.SearchEntities() and any other EntityDocument-backed search. Each "index" corresponds to one MJ: Entity Documents row; vectors come from MJ: Entity Record Documents.VectorJSON filtered by EntityDocumentID, and matches surface the underlying entity record's RecordID in their metadata (not the EntityRecordDocument PK).

import { SimpleVectorServiceProvider } from '@memberjunction/ai-vectors-memory';

const provider = new SimpleVectorServiceProvider();
const result = await provider.QueryIndex(
    { id: entityDocumentId, vector: queryEmbedding, topK: 10 },
    contextUser
);
// result.data.matches[i].metadata.RecordID is the parent record's ID

Lazy cache: Map<EntityDocumentID, LoadedIndex> with TTL eviction (default 15 minutes). After the vector-sync pipeline writes back fresh embeddings, call SimpleVectorServiceProvider.InvalidateIndex(entityDocumentId) for deterministic cache refresh; TTL is the safety net.
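
The caching behavior described above (load on first use, expire after a TTL, invalidate explicitly after writes) can be illustrated with a small generic class. This is a sketch under stated assumptions, not the provider's code: `TtlCache` is a hypothetical name, the loader is synchronous for brevity where a real one would be asynchronous, and the clock is injectable so expiry can be tested.

```typescript
// Hypothetical sketch of a lazily populated cache with TTL eviction and explicit invalidation.
class TtlCache<K, V> {
    private entries = new Map<K, { value: V; loadedAt: number }>();

    constructor(
        private ttlMs: number = 15 * 60 * 1000,   // 15 minutes, matching the default described above
        private now: () => number = Date.now      // injectable clock for tests
    ) {}

    // Returns the cached value, reloading when it is missing or older than the TTL.
    get(key: K, load: (key: K) => V): V {
        const hit = this.entries.get(key);
        if (hit && this.now() - hit.loadedAt < this.ttlMs) {
            return hit.value;
        }
        const value = load(key);
        this.entries.set(key, { value, loadedAt: this.now() });
        return value;
    }

    // Deterministic refresh: drop the entry so the next get() reloads it.
    invalidate(key: K): boolean {
        return this.entries.delete(key);
    }
}
```

Explicit invalidation gives an immediate refresh after a known write, while the TTL bounds staleness for writes the caller never hears about.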

Read-only: ingestion methods (CreateRecord, UpdateRecord, etc.) throw via the unsupported() path. The vector-sync pipeline writes EntityRecordDocument.VectorJSON directly; this driver just rehydrates from those rows.

When NOT to use: more than a few thousand EntityRecordDocument rows per EntityDocument, multi-process deployments (each process holds its own copy of the cache), or scenarios that need a real ANN index (HNSW / IVF). For those, configure a remote provider (Pinecone, Qdrant, pgvector) on the EntityDocument's VectorDatabaseID instead.

Dependencies

Package	Purpose
`@memberjunction/core`	`LogError`, `RunView`, `UserInfo`
`@memberjunction/global`	`RegisterClass` for VectorDBBase registrations
`@memberjunction/ai-vectordb`	`VectorDBBase` contract that the two providers implement

This package has minimal dependencies, making it lightweight and suitable for both server-side and client-side use.

Development

# Build
npm run build

# Development mode
npm run start

License

ISC