@cmmv/ai v0.0.8

License: MIT

@cmmv/ai is a module for integrating LLMs (Large Language Models) with tokenization, dataset creation for RAG (Retrieval-Augmented Generation), and FAISS-based vector search. It enables efficient code indexing and semantic search for models such as CodeLlama and DeepSeek Coder.

🚀 Features

✅ Tokenization & Code Mapping – Extracts structured tokens from TypeScript/JavaScript files.
✅ RAG Dataset Creation – Generates binary datasets for vector search.
✅ Vector Search with FAISS & Vector Databases – Supports Qdrant, Milvus, Neo4j.
✅ Hugging Face Integration – Uses transformers for embeddings.
✅ Custom Embedding Models – Supports WhereIsAI/UAE-Large-V1, MiniLM, CodeLlama, DeepSeek, and others.
✅ Database Integration – Supports Elasticsearch, Pinecone, Qdrant, PGVector, and others.
✅ LLM Integration – Supports OpenAI, Hugging Face, Ollama, DeepSeek, Groq, Gemini, and others.

⚙ Configuration

The module is configured via a .cmmv.config.cjs file (or equivalent environment variables).

require('dotenv').config();

module.exports = {
    env: process.env.NODE_ENV,

    ai: {
        huggingface: {
            token: process.env.HUGGINGFACE_HUB_TOKEN,
            localModelPath: './models',
            allowRemoteModels: true
        },
        tokenizer: {
            provider: "huggingface",
            model: "sentence-transformers/distilbert-base-nli-mean-tokens",
            indexSize: 768,
            useKeyBERT: false,
            chunkSize: 1000,
            chunkOverlap: 0,
            patterns: [
                //'../cmmv/**/*.ts',
                //'../cmmv/src/**/*.ts',
                //'../cmmv/packages/**/*.ts',
                //'../cmmv-*/**/*.ts',
                //'../cmmv-*/src/*.ts',
                //'../cmmv-*/src/**/*.ts',
                //'../cmmv-*/packages/**/*.ts',
                '../cmmv-*/**/*.md',
                '../cmmv-docs/docs/en/**/*.md'
            ],
            output: "./samples/data.bin",
            ignore: [
                "node_modules", "*.d.ts", "*.cjs",
                "*.spec.ts", "*.test.ts", "/tools/gulp/"
            ],
            exclude: [
                "cmmv-formbuilder", "cmmv-ui",
                "cmmv-language-tools", "cmmv-vue",
                "cmmv-reactivity", "cmmv-vite-plugin",
                "eslint.config.ts", "vitest.config.ts",
                "auto-imports.d.ts", ".d.ts", ".cjs",
                ".spec.ts", ".test.ts", "/tools/gulp/",
                "node_modules"
            ]
        },
        vector: {
            provider: "neo4j",
            qdrant: {
                url: 'http://localhost:6333',
                collection: 'embeddings'
            },
            neo4j: {
                url: "bolt://localhost:7687",
                username: process.env.NEO4J_USERNAME,
                password: process.env.NEO4J_PASSWORD,
                indexName: "vector",
                keywordIndexName: "keyword",
                nodeLabel: "Chunk",
                embeddingNodeProperty: "embedding"
            }
        },
        llm: {
            provider: "google",
            embeddingTopk: 10,
            model: "gemini-1.5-pro",
            textMaxTokens: 2048,
            apiKey: process.env.GOOGLE_API_KEY,
            language: 'pt-br'
        }
    }
};

| Path | Description | Default Value / Example |
|------|-------------|-------------------------|
| ai.huggingface.token | API token for Hugging Face Hub | process.env.HUGGINGFACE_HUB_TOKEN |
| ai.huggingface.localModelPath | Path for local models | ./models |
| ai.huggingface.allowRemoteModels | Allow downloading models from Hugging Face Hub | true |
| ai.tokenizer.provider | Tokenizer provider | "huggingface" |
| ai.tokenizer.model | Tokenizer model | "sentence-transformers/distilbert-base-nli-mean-tokens" |
| ai.tokenizer.indexSize | Token embedding index size | 768 |
| ai.tokenizer.useKeyBERT | Enable KeyBERT for keyword extraction | false |
| ai.tokenizer.chunkSize | Size of text chunks for processing | 1000 |
| ai.tokenizer.chunkOverlap | Overlap size between text chunks | 0 |
| ai.tokenizer.patterns | File patterns to scan for tokenization | ['../cmmv-*/**/*.md', '../cmmv-docs/docs/en/**/*.md'] |
| ai.tokenizer.output | Output file for tokenized data | "./samples/data.bin" |
| ai.tokenizer.ignore | File patterns to ignore | ["node_modules", "*.d.ts", "*.cjs", "*.spec.ts", "*.test.ts", "/tools/gulp/"] |
| ai.tokenizer.exclude | Files and directories to exclude | ["cmmv-formbuilder", "cmmv-ui", "cmmv-language-tools", "cmmv-vue", "cmmv-reactivity", "cmmv-vite-plugin", "eslint.config.ts", "vitest.config.ts", "auto-imports.d.ts", ".d.ts", ".cjs", ".spec.ts", ".test.ts", "/tools/gulp/", "node_modules"] |
| ai.vector.provider | Provider for vector storage | "neo4j" |
| ai.vector.qdrant.url | Qdrant service URL | "http://localhost:6333" |
| ai.vector.qdrant.collection | Collection name for Qdrant | "embeddings" |
| ai.vector.neo4j.url | Neo4j database URL | "bolt://localhost:7687" |
| ai.vector.neo4j.username | Neo4j username | process.env.NEO4J_USERNAME |
| ai.vector.neo4j.password | Neo4j password | process.env.NEO4J_PASSWORD |
| ai.vector.neo4j.indexName | Index name for vector storage | "vector" |
| ai.vector.neo4j.keywordIndexName | Index name for keyword search | "keyword" |
| ai.vector.neo4j.nodeLabel | Label for vectorized nodes | "Chunk" |
| ai.vector.neo4j.embeddingNodeProperty | Property storing vector embeddings | "embedding" |
| ai.llm.provider | LLM provider | "google" |
| ai.llm.embeddingTopk | Number of top-k results for embeddings | 10 |
| ai.llm.model | LLM model name | "gemini-1.5-pro" |
| ai.llm.textMaxTokens | Maximum tokens per request | 2048 |
| ai.llm.apiKey | API key for the LLM provider | process.env.GOOGLE_API_KEY |
| ai.llm.language | Default language | "pt-br" |

Download Models

1️⃣ Install Python

Before installing the Hugging Face CLI, ensure that Python is installed on your system.

Run the following command to install Python on Ubuntu:

sudo apt update && sudo apt install python3 python3-pip -y

For other operating systems, refer to the official Python download page.

2️⃣ Install Hugging Face CLI

Once Python is installed, install the Hugging Face CLI using pip:

pip3 install -U "huggingface_hub[cli]"

3️⃣ Ensure the CLI is Recognized

If your terminal does not recognize huggingface-cli, add ~/.local/bin to your system PATH:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Run the following command to verify installation:

huggingface-cli --help

If the command works, the installation was successful! 🎉

4️⃣ Authenticate with Hugging Face

To access and download models, you need to authenticate.

Run:

huggingface-cli login

You will be prompted to enter your Hugging Face access token.
Generate one at https://huggingface.co/settings/tokens.
Ensure the token has READ permissions.

📥 Downloading Models

To download a model, use the following command:

huggingface-cli download meta-llama/CodeLlama-7B-Python-hf --local-dir ./models/CodeLlama-7B

This will download the CodeLlama 7B Python model into the ./models/CodeLlama-7B directory.

For CMMV, set the model path in .cmmv.config.cjs:

huggingface: {
    token: process.env.HUGGINGFACE_HUB_TOKEN,
    localModelPath: './models',
    allowRemoteModels: false
},
tokenizer: {
    provider: "huggingface",
    model: "sentence-transformers/distilbert-base-nli-mean-tokens",
    indexSize: 768,
    chunkSize: 1000,
    chunkOverlap: 0,
},
llm: {
    provider: "google",
    embeddingTopk: 10,
    model: "gemini-1.5-pro",
    textMaxTokens: 2048,
    apiKey: process.env.GOOGLE_API_KEY,
    language: 'pt-br'
}

Now your environment is set up to use Hugging Face models with CMMV! 🚀

🔄 Converting Models

Some LLMs (Large Language Models) are not natively compatible with all inference frameworks. A key example is Google's Gemma, which is not directly supported by many tools. To use such models efficiently, you need to convert them to ONNX format.

ONNX (Open Neural Network Exchange) is an open format that optimizes models for efficient inference across multiple platforms. Many inference frameworks, such as ONNX Runtime, TensorRT, and OpenVINO, support ONNX for faster and more scalable deployment.

Before converting, install the necessary packages:

pip install -U "optimum[exporters]" onnx onnxruntime

To convert Google's Gemma 2B model, run:

python3 -m optimum.exporters.onnx --model google/gemma-2b ./models/gemma-2b-onnx
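After conversion, the exported model can be kept under the directory configured in huggingface.localModelPath so it is loaded locally instead of being fetched from the Hub. A minimal sketch, assuming converted models are resolved by their directory name under ./models (this resolution rule is an assumption, not documented behavior):

huggingface: {
    token: process.env.HUGGINGFACE_HUB_TOKEN,
    localModelPath: './models',   // ./models/gemma-2b-onnx from the command above lives here
    allowRemoteModels: false      // prefer local models over downloads from the Hub
},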

Common Embedding Models

| Embedding | Default Model | Requires API Key |
|-----------|---------------|------------------|
| Bedrock | amazon.titan-embed-text-v1 | Yes |
| Cohere | embed-english-v3.0 | No |
| DeepInfra | - | Yes |
| Doubao | - | Yes |
| Fireworks | nomic-ai/nomic-embed-text-v1.5 | Yes |
| HuggingFace | Xenova/all-MiniLM-L6-v2 | No |
| LlamaCpp | - (requires local model file) | No |
| OpenAI | text-embedding-3-large | Yes |
| Pinecone | multilingual-e5-large | No |
| Tongyi | - | Yes |
| Watsonx | - | Yes |
| Jina | jina-clip-v2 | Yes |
| MiniMax | embo-01 | No |
| Premai | - | No |
| Hunyuan | - | Yes |
| TensorFlow | - | No |
| TogetherAI | togethercomputer/m2-bert-80M-8k-retrieval | Yes |
| Voyage | voyage-01 | Yes |
| ZhipuAI | embedding-2 | Yes |
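For example, to use the Hugging Face MiniLM embedding listed above, only the tokenizer block needs to change. A minimal sketch (not an official example); note that indexSize must match the embedding dimension of the chosen model, which is 384 for Xenova/all-MiniLM-L6-v2:

tokenizer: {
    provider: "huggingface",
    model: "Xenova/all-MiniLM-L6-v2", // embedding model from the table above
    indexSize: 384,                   // embedding dimension of all-MiniLM-L6-v2
    chunkSize: 1000,
    chunkOverlap: 0
},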

Related references:
https://huggingface.co/models?pipeline_tag=feature-extraction&library=transformers.js&sort=downloads
https://v03.api.js.langchain.com/index.html

🧠 Tokenization - Extracting Code for RAG

The Tokenizer class scans directories, extracts tokens, and generates vector embeddings using a transformers model.

📌 Example Usage:

import { Application, Hook, HooksType } from '@cmmv/core';

class TokenizerSample {
    @Hook(HooksType.onInitialize)
    async start() {
        const { Tokenizer } = await import('@cmmv/ai');
        const tokenizer = new Tokenizer();
        tokenizer.start();
    }
}

Application.exec({
    services: [TokenizerSample],
});

🔹 How It Works

  1. Scans project directories based on the patterns config.
  2. Parses TypeScript/JavaScript/Markdown files, extracting functions, classes, enums, interfaces, constants, and decorators.
  3. Generates embeddings using Hugging Face models.
  4. Stores the dataset in a binary .bin file.
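For example, based on the options shown in the configuration section, a run that indexes only the markdown documentation and writes the dataset to ./samples/data.bin boils down to the patterns and output settings (a sketch reusing values from the configuration above):

tokenizer: {
    provider: "huggingface",
    model: "sentence-transformers/distilbert-base-nli-mean-tokens",
    patterns: [
        '../cmmv-*/**/*.md',
        '../cmmv-docs/docs/en/**/*.md'
    ],                            // files scanned in step 1
    output: "./samples/data.bin"  // binary dataset produced in step 4
},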

πŸ” Using KeyBERT

KeyBERT is an optional feature that enhances indexing by extracting relevant keywords. It helps refine search results in FAISS or vector databases, improving the accuracy of LLM queries.

Unlike TF-IDF, YAKE!, or RAKE, which rely on statistical methods, KeyBERT leverages BERT embeddings to generate more meaningful keywords. This results in better search filtering, leading to more precise LLM-based responses.

If KeyBERT is not enabled, the default keyword extraction method will be TF-IDF, which may not be as accurate but is significantly faster.

Before using KeyBERT, ensure you have Python 3 installed. Then, install KeyBERT using pip:

pip install keybert

Once installed, KeyBERT will be used during tokenization to generate filtering keywords. These keywords improve the ranking of indexed content, making vector-based search results more relevant.

If you prefer faster processing, you can disable KeyBERT, and the system will fall back to TF-IDF.

To enable KeyBERT, update your .cmmv.config.cjs file:

module.exports = {
    ai: {
        tokenizer: {
            useKeyBERT: true // Set to false to use TF-IDF instead
        }
    }
};

With KeyBERT enabled, search filtering becomes more context-aware, leading to more accurate LLM responses.

For more details on KeyBERT, see the official documentation: https://github.com/MaartenGr/KeyBERT.

📂 Dataset - FAISS & Vector Storage

The Dataset class manages vectorized storage for quick retrieval.

🔹 Current Functionality

✅ Saves embeddings in binary format (.bin).
✅ In-memory FAISS-based search.
✅ Support for Neo4j, Elasticsearch, PgVector, Qdrant.

📌 Dataset Storage Example

import { Dataset } from '@cmmv/ai';

const dataset = new Dataset();
dataset.save(); // Persists the in-memory embeddings to the binary .bin file
dataset.load(); // Loads the binary dataset back into memory for FAISS-based search

🧠 Vector Database Integration

To store and search embeddings efficiently, @cmmv/ai integrates with dedicated vector databases.

🔹 Supported Vector Databases

| Database | Open Source | Node.js Support | Storage Backend | Similarity Search |
|----------|-------------|-----------------|-----------------|-------------------|
| Qdrant | ✅ Yes | ✅ Yes (@qdrant/js-client-rest) | Disk/Memory | Cosine, Euclidean, Dot Product |
| Milvus | ✅ Yes | ✅ Yes (@zilliz/milvus2-sdk-node) | Disk/Memory | IVF_FLAT, HNSW, PQ |
| Neo4j | ✅ Yes (Community) | ✅ Yes (neo4j-driver) | GraphDB | Cypher-based vector search |
| Elasticsearch | ✅ Yes | ✅ Yes (@elastic/elasticsearch) | Disk | k-NN, Approximate Nearest Neighbors (ANN) |
| PGVector | ✅ Yes | ✅ Yes (pg) | PostgreSQL | Cosine, Euclidean, Inner Product |
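The active backend is chosen through ai.vector.provider. A minimal sketch switching the example configuration from Neo4j to Qdrant (only values already shown in the configuration section are used here):

vector: {
    provider: "qdrant",              // select the vector backend ("neo4j" in the example config)
    qdrant: {
        url: 'http://localhost:6333',
        collection: 'embeddings'
    }
},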

To run these databases locally, use the following Docker commands:

🔹 Qdrant

docker run -p 6333:6333 --name qdrant-server qdrant/qdrant
  • Runs a Qdrant server on port 6333.
  • API available at http://localhost:6333.

🔹 Milvus

docker run -p 19530:19530 --name milvus-server milvusdb/milvus
  • Runs Milvus on port 19530.
  • Requires Python/Node SDK for interaction.

🔹 Neo4j

docker run --publish=7474:7474 --publish=7687:7687 --volume=$HOME/neo4j/data:/data --name neo4j-server neo4j
  • Runs Neo4j on ports 7474 (HTTP) and 7687 (Bolt).
  • Data is stored persistently in $HOME/neo4j/data.

🔹 PGVector

docker run --name pgvector-db -e POSTGRES_USER=admin -e POSTGRES_PASSWORD=admin -e POSTGRES_DB=vector_db -p 5432:5432 -d ankane/pgvector
  • Runs PostgreSQL with PGVector on port 5432.
  • Default database is vector_db with user admin and password admin.

🔹 Elasticsearch

docker run -d --name elasticsearch -p 9200:9200 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:8.5.1
  • Runs Elasticsearch on port 9200.
  • Single-node mode is enabled for local use.

🤖 LLMs (Large Language Models)

The @cmmv/ai module includes support for multiple LLMs (Large Language Models), allowing flexible integration with different providers. Currently, the following models are supported:

  • ✅ DeepSeek – Optimized for programming and technical research.
  • ✅ Gemini (Google) – A multimodal LLM with advanced reasoning capabilities.
  • ✅ Hugging Face – Compatible with open-source models such as CodeLlama, MiniLM, DeepSeek, and more.
  • ✅ OpenAI (ChatGPT) – Integration with models like GPT-4 and GPT-3.5.
  • ✅ Ollama – Local model execution for privacy-focused applications.
  • ✅ Groq – High-speed inference with Llama-3, Mixtral, and Gemma models.

| LLM Provider | Default Model | Requires API Key |
|--------------|---------------|------------------|
| AI21 Labs | j1-jumbo, j1-large | Yes |
| Aleph Alpha | luminous-base, luminous-extended | Yes |
| Anthropic | claude-3-haiku-20240307 | Yes |
| AWS Bedrock | Various models (Claude, Mistral, etc.) | Yes |
| Cohere | command-xlarge-nightly, command-medium | Yes |
| DeepInfra | Various models | Yes |
| DeepSeek | deepseek-ai/deepseek-coder-7b | No |
| Fireworks | Various models | Yes |
| Google Gemini | gemini-1.5-pro | Yes |
| Google Vertex AI | text-bison@001 | Yes |
| Groq | llama3-8b, mixtral | Yes |
| Hugging Face | code-llama, MiniLM, etc. | No |
| Mistral AI | mistral-7b, mixtral | Yes |
| Ollama | llama3, mistral, gemma | No (local execution) |
| OpenAI | gpt-4, gpt-3.5 | Yes |
| Together AI | GPT-JT-6B-v1 | Yes |
| Vertex AI | text-bison@001 | Yes |

The search interface is accessible via the Search class, which performs semantic search using embeddings and generates context-aware responses.

https://v03.api.js.langchain.com/index.html

LLM Configuration

The LLM (Large Language Model) configuration is set within the .cmmv.config.cjs file. This section controls which LLM provider is used, the model parameters, and API credentials.

module.exports = {
    ai: {
        llm: {
            provider: "google",  // Options: "openai", "deepseek", "huggingface", "gemini", "ollama", "groq"
            model: "gemini-1.5-pro", // Default model for the selected provider
            embeddingTopk: 10, // Number of top-k results used for context retrieval
            textMaxTokens: 2048, // Maximum tokens per response
            apiKey: process.env.GOOGLE_API_KEY, // API key for the selected provider (if required)
            language: 'pt-br' // Default response language
        }
    }
}

| Path | Description | Default Value / Example |
|------|-------------|-------------------------|
| llm.provider | LLM provider to use | "google" ("openai", "ollama", "huggingface", "groq") |
| llm.model | LLM model used for responses | "gemini-1.5-pro" ("gpt-4", "deepseek-coder-7b") |
| llm.embeddingTopk | Number of relevant embeddings to retrieve | 10 |
| llm.textMaxTokens | Maximum tokens per request | 2048 |
| llm.apiKey | API key for accessing the LLM provider | process.env.GOOGLE_API_KEY (if required) |
| llm.language | Default language for responses | "pt-br" ("en", "es", etc.) |
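To switch providers, only the llm block needs to change. A sketch for OpenAI, assuming the key is exposed through an OPENAI_API_KEY environment variable (the variable name is illustrative):

llm: {
    provider: "openai",
    model: "gpt-4",                      // see the provider table above for default models
    embeddingTopk: 10,
    textMaxTokens: 2048,
    apiKey: process.env.OPENAI_API_KEY,  // illustrative environment variable name
    language: 'en'
}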

Integration with Search

The Search class enables queries in a vector database and returns LLM-based responses with contextual information.

import { Application, Hook, HooksType } from '@cmmv/core';

import {
    PromptTemplate,
    RunnableSequence,
    RunnablePassthrough,
    StringOutputParser,
    Embedding,
    Dataset,
    Search,
} from '@cmmv/ai';

class SearchSample {
    @Hook(HooksType.onInitialize)
    async start() {
        const question = 'How to create a CMMV controller?';

        const search = new Search();
        await search.initialize();

        const finalResult = await search.invoke(question);
        console.log(`LLM Response: `, finalResult.content);
    }
}

Application.exec({
    services: [SearchSample],
});

How the integration works

  • Vector search: Search queries the vector database (FAISS, Qdrant, Neo4j, etc.).
  • Context retrieval: The most relevant context is extracted and sent to the LLM.
  • Model execution: The LLM processes the query using the retrieved context and generates a response.
  • JSON response: The answer is formatted in JSON for easy manipulation.
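Since the answer is returned as the content of the result object, a caller can use it directly or, when the configured provider returns a JSON-formatted string, parse it first. A minimal sketch (the JSON shape is illustrative and not part of the documented API):

const finalResult = await search.invoke('How to create a CMMV controller?');

// finalResult.content holds the LLM answer; parsing only applies if it is a JSON string
try {
    const parsed = JSON.parse(finalResult.content);
    console.log(parsed);
} catch {
    console.log(finalResult.content); // plain-text answer
}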