ragify v1.0.10 • Published 3 months ago • License: ISC

šŸ“Œ Ragify

Ragify is a CLI tool for uploading PDF documents and storing their embeddings in a vector database (Pinecone or ChromaDB) for Retrieval-Augmented Generation (RAG) applications. It also provides a function that retrieves relevant chunks from the vector database and passes them to an LLM to generate answers grounded in the uploaded document.


šŸš€ Features

āœ… Supports Pinecone and ChromaDB as vector databases.
āœ… Splits PDF documents into chunks using LangChain.
āœ… Generates embeddings using OpenAI's text-embedding-3-large model.
āœ… Stores embeddings in the selected vector database for efficient retrieval.
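To make the splitting step concrete, here is an illustrative, self-contained sketch of fixed-size chunking with overlap, similar in spirit to LangChain's text splitters (Ragify's actual splitting is done by LangChain internally; `chunkText` is a hypothetical helper, not part of this package):

```javascript
// Naive fixed-size chunker with overlap (illustration only).
// Each chunk is `chunkSize` characters; consecutive chunks share
// `overlap` characters so sentence fragments aren't lost at boundaries.
function chunkText(text, chunkSize = 20, overlap = 5) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const chunks = chunkText("a".repeat(50), 20, 5);
console.log(chunks.length); // → 3
```

Real splitters also try to break on paragraph and sentence boundaries rather than at a fixed character count.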


šŸ“¦ Installation

Install Ragify using npm:

npm i ragify

šŸ› ļø What This Library Provides

This package provides two key functions:

  • uploadFile(filePath): Uploads a PDF file, generates embeddings, and stores them in the selected vector database.
  • askQuestion(query): Retrieves relevant information from the stored embeddings and uses an LLM to generate a response.

Currently, Pinecone and ChromaDB are the supported vector databases.
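The back end is selected via the DB_TYPE environment variable (see the next section). A hedged sketch of how such a switch could look internally (`selectStore` is hypothetical and not part of Ragify's API):

```javascript
// Illustration only: dispatch on DB_TYPE between the two supported stores.
function selectStore(dbType) {
  switch ((dbType || "").toLowerCase()) {
    case "pinecone":
      return { kind: "pinecone" };
    case "chroma":
      return { kind: "chroma" };
    default:
      throw new Error(`Unsupported DB_TYPE: ${dbType}`);
  }
}

console.log(selectStore("pinecone").kind); // → pinecone
```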


šŸŒŽ Environment Variables

Before using the library, set up your .env file with the required credentials.

For Pinecone

DB_TYPE=pinecone
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=your_index_name
PINECONE_ENV=your_pinecone_environment
OPENAI_API_KEY=your_open_ai_api_key
OPENAI_MODEL=your_desired_model              # optional; defaults to 'gpt-4'
OPENAI_EMBEDDING_MODEL=your_desired_model    # optional; defaults to 'text-embedding-3-large'

For ChromaDB

DB_TYPE=chroma
CHROMA_DB_URL=http://localhost:8000
COLLECTION_NAME=pdf_embeddings
OPENAI_API_KEY=your_open_ai_api_key
OPENAI_MODEL=your_desired_model    # optional; defaults to 'gpt-4'
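The model variables fall back to the defaults named above when unset. A small sketch of that fallback logic (`resolveModels` is a hypothetical helper for illustration, not part of Ragify):

```javascript
// Resolve model settings with the documented defaults when env vars are unset.
function resolveModels(env) {
  return {
    chatModel: env.OPENAI_MODEL || "gpt-4",
    embeddingModel: env.OPENAI_EMBEDDING_MODEL || "text-embedding-3-large",
  };
}

console.log(resolveModels({}));
// → { chatModel: 'gpt-4', embeddingModel: 'text-embedding-3-large' }
console.log(resolveModels(process.env)); // picks up your .env values if loaded
```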

šŸš€ Usage

To run the CLI tool:

node cli.js

Follow the prompts to select a database and provide the necessary details.

Alternatively, you can use the functions in your Node.js project:

import { uploadFile, askQuestion } from "ragify";

// Upload a PDF file
await uploadFile("./documents/example.pdf");

// Ask a question based on the document
const response = await askQuestion("What is the summary of the document?");
console.log(response);

šŸ“ How It Works

1ļøāƒ£ User selects a vector database (Pinecone/ChromaDB).
2ļøāƒ£ User provides the necessary database details.
3ļøāƒ£ PDF file is loaded and split into chunks using LangChain.
4ļøāƒ£ Embeddings are generated using the OpenAI API.
5ļøāƒ£ Embeddings are stored in the selected vector database.
6ļøāƒ£ When a query is made, relevant embeddings are retrieved and passed through an LLM to generate a response.


šŸ” Debugging Tips

If embeddings are not being stored correctly in Pinecone:

1ļøāƒ£ Check API Key

curl -X GET "https://api.pinecone.io/v1/whoami" -H "Api-Key: ${PINECONE_API_KEY}"

2ļøāƒ£ Check if Pinecone index exists

curl -X GET "https://controller.${PINECONE_ENV}.pinecone.io/databases" -H "Api-Key: ${PINECONE_API_KEY}"

3ļøāƒ£ Print Loaded Document Chunks

Modify uploadFile() to log the chunks after splitting and confirm the PDF was parsed:

console.log(allSplits[0]); // first chunk: a LangChain Document with pageContent and metadata

šŸ¤ Contributing

Contributions are welcome! Feel free to submit issues and pull requests to improve this library.


šŸ“œ License

This project is licensed under the MIT License.