@memberjunction/ai-vector-sync v2.48.0
A robust MemberJunction package for synchronizing entities with vector databases by transforming entity records into vector representations using embedding models.
Overview
The @memberjunction/ai-vector-sync package provides a comprehensive solution for:
- Converting MemberJunction entities into vector embeddings
- Storing embeddings in vector databases (currently supports Pinecone)
- Managing the synchronization lifecycle between entities and their vector representations
- Supporting batch processing for large datasets
- Providing template-based document generation for vectorization
Installation
npm install @memberjunction/ai-vector-sync
Prerequisites
Before using this package, ensure you have:
SQL Database with MemberJunction Framework
- A properly configured SQL database with the MemberJunction framework installed
API Keys
- Embedding model API key (supports OpenAI, Mistral, etc.)
- Vector database API key (currently supports Pinecone)
Entity Configuration
- Entity Document record defined in MemberJunction
- Associated template for specifying which entity properties to vectorize
Core Features
Entity Vectorization
Transform entity records into high-dimensional vectors that capture the semantic meaning of the data.
Batch Processing
Efficiently handle large datasets with configurable batch sizes for:
- Record fetching
- Vectorization
- Database upsertion
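The batch sizes above control how many records move through each stage at a time. As a rough illustration of the idea (not the package's internal code), a generic chunking helper might look like the following; `chunk` is a hypothetical name introduced here for the example.

```typescript
// Hypothetical helper, not part of @memberjunction/ai-vector-sync:
// splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const result: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    result.push(items.slice(i, i + size));
  }
  return result;
}

// 120 record IDs with a batch size of 50 yield batches of 50, 50 and 20.
const recordIDs = Array.from({ length: 120 }, (_, i) => `record-${i}`);
const batches = chunk(recordIDs, 50);
console.log(batches.map((b) => b.length));
```

Smaller batches mean more round trips but less memory held at once, which is the trade-off to weigh when tuning the three batch settings.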
Template-Based Processing
Use MemberJunction templates to define which entity fields and relationships to include in vectorization.
Vector Database Integration
Seamlessly integrate with vector databases through the MemberJunction AI infrastructure.
Usage
Basic Entity Vectorization
import { EntityVectorSyncer } from '@memberjunction/ai-vector-sync';
import { UserInfo } from '@memberjunction/core';
// contextUser below is a UserInfo instance for the acting user, supplied by your application
// Initialize the syncer
const syncer = new EntityVectorSyncer();
// Configure the syncer (required before first use)
await syncer.Config(false, contextUser);
// Vectorize an entity
const params = {
  entityID: 'your-entity-id',
  entityDocumentID: 'your-entity-document-id',
  listBatchCount: 50,      // Optional: records per batch (default: 50)
  VectorizeBatchCount: 50, // Optional: vectorization batch size (default: 50)
  UpsertBatchCount: 50,    // Optional: upsert batch size (default: 50)
  StartingOffset: 0        // Optional: skip records for resuming
};
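// Start vectorization without awaiting it, so it runs in the background.
// Await the returned promise (or attach a .catch handler) if you need the result or errors.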
syncer.VectorizeEntity(params, contextUser);
Vectorizing a Specific List
// Vectorize only records within a specific list
const params = {
  entityID: 'your-entity-id',
  entityDocumentID: 'your-entity-document-id',
  listID: 'your-list-id', // Only vectorize records in this list
  listBatchCount: 100
};
await syncer.VectorizeEntity(params, contextUser);
Working with Entity Documents
// Get entity document by ID
const docByID = await syncer.GetEntityDocument('document-id');
// Get entity document by name
const docByName = await syncer.GetEntityDocumentByName('Document Name', contextUser);
// Get all active entity documents
const activeDocs = await syncer.GetActiveEntityDocuments();
// Get active documents for specific entities
const specificDocs = await syncer.GetActiveEntityDocuments(['Entity1', 'Entity2']);
Creating Default Entity Documents
import { VectorDatabaseEntity, AIModelEntity } from '@memberjunction/core-entities';
// Create a default entity document when one doesn't exist
const entityDoc = await syncer.CreateDefaultEntityDocument(
  entityID,
  vectorDatabase, // VectorDatabaseEntity instance
  aiModel         // AIModelEntity instance
);
API Reference
EntityVectorSyncer
The main class for entity vectorization operations.
Methods
Config(forceRefresh: boolean, contextUser?: UserInfo): Promise<void>
Configures the syncer and initializes required engines.
- forceRefresh: Force refresh of caches and engines
- contextUser: User context for operations
VectorizeEntity(params: VectorizeEntityParams, contextUser?: UserInfo): Promise<VectorizeEntityResponse>
Vectorizes entities based on provided parameters.
- params: Configuration for vectorization
- contextUser: Required user context
GetEntityDocument(entityDocumentID: string): Promise<EntityDocumentEntity | null>
Retrieves an entity document by ID.
GetEntityDocumentByName(entityDocumentName: string, contextUser?: UserInfo): Promise<EntityDocumentEntity | null>
Retrieves an entity document by name.
GetActiveEntityDocuments(entityNames?: string[]): Promise<EntityDocumentEntity[]>
Gets all active entity documents, optionally filtered by entity names.
CreateDefaultEntityDocument(entityID: string, vectorDatabase: VectorDatabaseEntity, aiModel: AIModelEntity): Promise<EntityDocumentEntity>
Creates a default entity document for the specified entity.
Types
VectorizeEntityParams
type VectorizeEntityParams = {
  entityID: string;             // Required: Entity to vectorize
  entityDocumentID?: string;    // Entity document configuration
  listID?: string;              // Optional: Specific list to vectorize
  listBatchCount?: number;      // Records per fetch batch (default: 50)
  VectorizeBatchCount?: number; // Vectorization batch size (default: 50)
  UpsertBatchCount?: number;    // Database upsert batch size (default: 50)
  StartingOffset?: number;      // Skip records for resuming
  CurrentUser?: UserInfo;       // User context
  options?: any;                // Additional options
}
EntitySyncConfig
type EntitySyncConfig = {
  EntityDocumentID: string;     // Entity document to use
  Interval: number;             // Sync interval in seconds
  RunViewParams: RunViewParams; // View parameters for fetching records
  IncludeInSync: boolean;       // Include in sync process
  LastRunDate: string;          // Last sync timestamp
  VectorIndexID: number;        // Vector index ID
  VectorID: number;             // Vector database ID
}
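The Interval, IncludeInSync and LastRunDate fields suggest that a scheduler can decide when an entry is due for its next sync. The sketch below shows one way such a check could be written. It is an illustration only: `isSyncDue` and `SyncSchedule` are hypothetical names, and it assumes LastRunDate holds a timestamp that `Date.parse` can read (for example ISO 8601).

```typescript
// Only the fields this sketch needs; the full EntitySyncConfig has more.
type SyncSchedule = {
  Interval: number;       // Sync interval in seconds
  IncludeInSync: boolean; // Include in sync process
  LastRunDate: string;    // Last sync timestamp (assumed parseable by Date.parse)
};

// Hypothetical helper: true when the entry is enabled and at least
// `Interval` seconds have elapsed since `LastRunDate`.
function isSyncDue(config: SyncSchedule, now: Date): boolean {
  if (!config.IncludeInSync) return false;
  const lastRun = Date.parse(config.LastRunDate);
  // Design choice for this sketch: an unparseable or empty date counts as "never run".
  if (Number.isNaN(lastRun)) return true;
  return now.getTime() - lastRun >= config.Interval * 1000;
}

const schedule: SyncSchedule = {
  Interval: 3600,
  IncludeInSync: true,
  LastRunDate: "2024-01-01T00:00:00Z",
};
console.log(isSyncDue(schedule, new Date("2024-01-01T00:30:00Z"))); // false
console.log(isSyncDue(schedule, new Date("2024-01-01T01:00:00Z"))); // true
```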
Architecture
Process Flow
- Entity Document Retrieval: Fetches configuration from Entity Document record
- Model and Database Configuration: Sets up embedding model and vector database
- Data Fetching: Retrieves entity records in batches
- Vectorization: Transforms records using embedding model
- Vector Upsertion: Stores vectors in database
- EntityRecordDocument Creation: Creates tracking records
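To make the flow above concrete, here is a minimal in-memory sketch of the middle steps for a single batch: render text from a record, embed it, upsert the vector, and return the IDs that a tracking record could reference. Everything in it is a stand-in invented for illustration (`renderTemplate`, `fakeEmbed`, `vectorStore`, `vectorizeBatch`); the real package uses MemberJunction templates, an actual embedding model and a vector database.

```typescript
type EntityRecord = { ID: string; Name: string; Description: string };
type VectorRecord = { id: string; values: number[] };

// Stand-in for template rendering: picks the fields to vectorize.
function renderTemplate(record: EntityRecord): string {
  return `${record.Name}: ${record.Description}`;
}

// Stand-in for an embedding model. NOT a real embedding: just a deterministic
// two-number summary so the sketch runs without an API key.
function fakeEmbed(text: string): number[] {
  return [text.length, text.split(" ").length];
}

// Stand-in for a vector database: an in-memory map keyed by vector ID.
const vectorStore = new Map<string, VectorRecord>();

// Vectorizes one batch and returns the IDs of the vectors that were stored.
function vectorizeBatch(batch: EntityRecord[]): string[] {
  const vectors = batch.map((r) => ({
    id: r.ID,
    values: fakeEmbed(renderTemplate(r)),
  }));
  for (const v of vectors) {
    vectorStore.set(v.id, v); // upsert: insert, or overwrite an existing ID
  }
  return vectors.map((v) => v.id);
}

const tracked = vectorizeBatch([
  { ID: "1", Name: "Acme", Description: "Industrial supplier" },
  { ID: "2", Name: "Globex", Description: "Energy company" },
]);
console.log(tracked, vectorStore.size);
```

Because upsertion is keyed by ID, running the same batch twice leaves the store unchanged in size, which is what makes re-running an interrupted job safe in this simplified model.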
Worker Architecture
The package uses a multi-worker architecture for efficient processing:
- VectorizeTemplates Worker: Handles template-based text generation and embedding
- UpsertVectors Worker: Manages vector database operations
- EntityRecordDocument Worker: Tracks vector-entity relationships
Configuration
Environment Variables
Create a .env file with:
# Database Configuration
DB_HOST=your-database-host
DB_PORT=1433
DB_USERNAME=your-username
DB_PASSWORD=your-password
DB_DATABASE=your-database
# API Keys
OPENAI_API_KEY=your-openai-key
MISTRAL_API_KEY=your-mistral-key
PINECONE_API_KEY=your-pinecone-key
PINECONE_HOST=your-pinecone-host
PINECONE_DEFAULT_INDEX=your-default-index
# User Configuration
CURRENT_USER_EMAIL=user@example.com
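A missing variable tends to surface late, in the middle of a long run, so it can help to check the environment up front. The sketch below is one possible check. `missingEnvVars` and `REQUIRED_VARS` are hypothetical names, and the choice of which variables count as required is an assumption; adjust the list to the providers you actually use.

```typescript
// Variable names taken from the .env listing above. Which of them are
// strictly required is an assumption made for this sketch.
const REQUIRED_VARS = [
  "DB_HOST",
  "DB_USERNAME",
  "DB_PASSWORD",
  "DB_DATABASE",
  "PINECONE_API_KEY",
  "PINECONE_HOST",
] as const;

// Hypothetical helper: returns the names of required variables
// that are unset, empty, or whitespace-only.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((key) => !env[key]?.trim());
}

const exampleEnv: Record<string, string | undefined> = {
  DB_HOST: "localhost",
  DB_USERNAME: "mj",
  DB_PASSWORD: "",
};
console.log(missingEnvVars(exampleEnv));
```

In a Node.js process you would pass `process.env` instead of `exampleEnv`, after the .env file has been loaded by whatever mechanism your application uses.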
Performance Considerations
- Long-Running Processes: Vectorization can take hours for large datasets
- Batch Sizes: Adjust batch sizes based on your system resources
- Asynchronous Processing: Consider running vectorization in background processes
- Memory Usage: Monitor memory usage for large batch sizes
Integration with MemberJunction
This package integrates seamlessly with:
- @memberjunction/core: Core entity and metadata functionality
- @memberjunction/ai: AI model abstractions
- @memberjunction/ai-vectordb: Vector database abstractions
- @memberjunction/templates: Template processing engine
Error Handling
The package includes comprehensive error handling:
- Validation of entity documents and templates
- Graceful handling of API failures
- Detailed logging through MemberJunction's logging system
Best Practices
- Start with Small Batches: Test with small batch sizes before processing large datasets
- Monitor Progress: Use MemberJunction's logging to track vectorization progress
- Handle Interruptions: Use StartingOffset to resume interrupted processes
- Template Design: Design templates to include relevant fields for semantic search
- Resource Management: Consider database and API rate limits when setting batch sizes
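As an illustration of the StartingOffset advice, the sketch below builds the parameters for a resumed run from those of an interrupted one. It is not part of the package: `resumeParams` and `ResumableParams` are hypothetical names, and it assumes that StartingOffset counts records and that records come back in the same order on every run.

```typescript
// Subset of VectorizeEntityParams used by this sketch.
type ResumableParams = {
  entityID: string;
  entityDocumentID?: string;
  listBatchCount?: number;
  StartingOffset?: number;
};

// Hypothetical helper: given the params of an interrupted run and the number
// of records known to have completed, returns params that skip those records.
function resumeParams(
  previous: ResumableParams,
  recordsCompleted: number
): ResumableParams {
  if (!Number.isInteger(recordsCompleted) || recordsCompleted < 0) {
    throw new Error("recordsCompleted must be a non-negative integer");
  }
  return {
    ...previous,
    StartingOffset: (previous.StartingOffset ?? 0) + recordsCompleted,
  };
}

// A run that started at offset 0 and finished 1500 records before stopping.
const interrupted: ResumableParams = {
  entityID: "your-entity-id",
  entityDocumentID: "your-entity-document-id",
  listBatchCount: 50,
};
const next = resumeParams(interrupted, 1500);
console.log(next.StartingOffset); // 1500
```

The resulting object can then be passed to VectorizeEntity in place of the original parameters. The count of completed records has to come from your own progress tracking, such as the log output.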
License
ISC - See LICENSE file for details
Author
MemberJunction.com