@sammyl/ai-url-query v1.3.2
AI URL Query
AI URL Query is a Node.js and TypeScript library that enables you to ask questions about the content of any web URL using OpenAI’s embeddings and chat completion APIs. It fetches a webpage, extracts and segments its text content, generates embeddings for each segment, and then uses a vector database to efficiently retrieve and answer queries based on that content.
Table of Contents
- Features
- Installation
- Configuration
- Usage
- Project Structure
- API Overview
- Contributing
- License
Features
- Content Extraction & Segmentation: Uses axios and cheerio to fetch a URL and extract its text, then segments the text into chunks using a content segmentation agent powered by OpenAI’s chat completion API (see the extraction sketch after this list).
- Embeddings Generation: Wraps OpenAI’s embeddings API to generate vector representations of text chunks using the text-embedding-3-small model.
- Vector Database Integration: Stores and queries embeddings using a simple interface over vectra.
- Intelligent Query Tool: Provides a high-level tool (QueryURLTool) that integrates the content processor and OpenAI to answer natural language questions about the content of a given URL.
- TypeScript First: Built with TypeScript, featuring clear interface definitions (e.g., IContentProcessor, IContentChunker, IVectorDatabase, ITool) that make it easy to understand and extend the library’s functionality.
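For orientation, here is a minimal sketch of the fetch-and-extract step using axios and cheerio. The helper name fetchPageText is illustrative only and is not part of this library's exported API; the actual logic lives in src/content-processor.ts and may differ.

```ts
import axios from 'axios';
import * as cheerio from 'cheerio';

// Illustrative helper (not part of this library's API): fetch a page and
// return its visible text, stripped of scripts, styles, and extra whitespace.
async function fetchPageText(url: string): Promise<string> {
  const { data: html } = await axios.get<string>(url);
  const $ = cheerio.load(html);
  $('script, style, noscript').remove();
  return $('body').text().replace(/\s+/g, ' ').trim();
}
```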
Installation
Clone the Repository:
git clone https://github.com/sammyl720/ai-url-query.git
cd ai-url-query

Install Dependencies:
npm install

Build the Project (if needed):
npm run build

Run in Development Mode:
npm run dev
Configuration
Create a .env file in the root directory of the project and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here

This key is required for both generating embeddings and making chat completion requests.
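As a sketch of how the key is consumed, the official openai package reads OPENAI_API_KEY from the environment by default. The embedText helper below is illustrative, not this library's API, and assumes the dotenv package is available to load the .env file.

```ts
import 'dotenv/config'; // loads .env (assumes the dotenv package is installed)
import OpenAI from 'openai';

const openai = new OpenAI(); // picks up OPENAI_API_KEY from the environment

// Illustrative helper (not part of this library's API): embed one text chunk
// with the text-embedding-3-small model.
async function embedText(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}
```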
Usage
The library can be used as a standalone CLI tool or integrated into other applications. Below is an example of using the CLI tool provided in src/examples/cli.ts.
CLI Example
The CLI example demonstrates how to:
- Instantiate OpenAI and the required agents/tools.
- Fetch a URL's content.
- Segment the content and store embeddings.
- Accept a natural language question from the user and retrieve an answer based on the URL’s content.
Check out the CLI example in src/examples/cli.ts; a simplified sketch of the answering step follows below.
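Conceptually, the answering step passes the best-matching chunks to a chat completion. The sketch below hedges on the details: the relevantChunks argument stands in for the library's IContentProcessor retrieval, and the model name is an assumption, not necessarily what the CLI uses.

```ts
import OpenAI from 'openai';

const openai = new OpenAI();

// Illustrative sketch (not the library's API): answer a question given chunks
// already retrieved from the vector database for that question.
async function answerFromChunks(question: string, relevantChunks: string[]): Promise<string> {
  const context = relevantChunks.join('\n---\n');
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // illustrative model choice
    messages: [
      { role: 'system', content: `Answer the question using only this context:\n${context}` },
      { role: 'user', content: question },
    ],
  });
  return completion.choices[0].message.content ?? '';
}
```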
Project Structure
├── src
│ ├── agents
│ │ └── content-segmentation.ts # Implements content segmentation using OpenAI
│ ├── database
│ │ └── vector
│ │ └── vector-database.ts # Implements vector database using vectra
│ ├── tools
│ │ ├── embedding-generator.ts # OpenAI embeddings wrapper
│ │ └── query-url.ts # Tool to query URLs using the processed content
│ ├── content-processor.ts # Core logic for processing URL content (fetch, chunk, embed, store)
│ ├── helpers.ts # Helper constants and methods (e.g., INDEX_PATH)
│ ├── types.ts # TypeScript interface definitions
│ └── consts.ts # Various constants used across the project
├── .env # Environment configuration (OpenAI API key)
├── package.json
├── tsconfig.json
└── README.md

API Overview
The project defines several key interfaces to standardize functionality:
- IContentProcessor: Provides methods to store URL content embeddings and retrieve relevant matches for a query.
- IContentChunker: Splits long text into manageable chunks suitable for embedding.
- IVectorDatabase: Abstracts the storage and querying of embeddings.
- ITool & IToolHandler: Define how tools (like the URL query tool) are structured and executed, making it easier to integrate with OpenAI’s function calling mechanism.
- IEmbeddingsGenerator: A simple interface to generate embeddings for a given text input.
These abstractions allow you to easily swap out implementations or extend functionality.
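For example, a toy in-memory vector store using cosine similarity could back the same abstraction during tests. This is a sketch under the assumption that you adapt it to the actual IVectorDatabase method signatures in src/types.ts; it is not that interface itself.

```ts
// Toy cosine-similarity store; adapt it to the real IVectorDatabase signatures.
type StoredItem = { vector: number[]; text: string };

class InMemoryVectorStore {
  private items: StoredItem[] = [];

  add(vector: number[], text: string): void {
    this.items.push({ vector, text });
  }

  query(vector: number[], topK = 3): StoredItem[] {
    // Rank stored items by cosine similarity to the query vector.
    return [...this.items]
      .sort((a, b) => cosineSimilarity(b.vector, vector) - cosineSimilarity(a.vector, vector))
      .slice(0, topK);
  }
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}
```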
Contributing
Contributions are welcome! Feel free to open issues or submit pull requests if you have suggestions, bug fixes, or new features.
- Fork the repository.
- Create a feature branch (git checkout -b feature/my-feature).
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin feature/my-feature).
- Create a new Pull Request.
License
This project is licensed under the MIT License.
If you have any questions about the content or functionality of this library, feel free to open an issue or reach out. Happy coding!