Knowledge Base Indexer

A powerful tool for indexing, searching, and extracting content from local knowledge bases with LLM integration.

Features

File Structure Indexing: Recursively scan directories and build a searchable index of files
Fuzzy Search: Find relevant files using fuzzy search algorithms (similar to fzf)
Content Extraction: Extract and process content from various file formats
LLM Integration: Designed to work with LLMs for knowledge extraction and question answering
CLI Tools: Command-line tools for indexing, searching, and content extraction
Performance Options: In-memory file caching for faster processing of large knowledge bases
API: Programmatic access for integration with other tools

Installation

# Install globally
npm install -g knowledge-base-indexer

# Or install locally
npm install knowledge-base-indexer

Usage

Command Line Interface

Indexing a Knowledge Base

# Index a directory
kb-index --dir /path/to/knowledge/base

# Index with custom options
kb-index --dir /path/to/knowledge/base --exclude "*.tmp,*.log" --watch

# Index with in-memory caching for faster processing
kb-index --dir /path/to/knowledge/base --in-memory
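
To make the indexing step more concrete, here is a minimal sketch of a recursive scan with exclude patterns, using only the Node.js standard library. It is not the package's implementation. The names scanDirectory and globToRegExp are hypothetical, and the glob handling only understands "*".

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: turn a simple glob such as "*.tmp" into a RegExp.
// Only "*" is treated as a wildcard; everything else is matched literally.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Hypothetical helper: recursively collect regular files under `root`,
// skipping any file whose base name matches one of the exclude globs.
function scanDirectory(root, excludeGlobs = []) {
  const excludes = excludeGlobs.map(globToRegExp);
  const entries = [];

  const walk = (dir) => {
    for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
      const fullPath = path.join(dir, dirent.name);
      if (dirent.isDirectory()) {
        walk(fullPath);
      } else if (dirent.isFile() && !excludes.some((re) => re.test(dirent.name))) {
        const { size, mtimeMs } = fs.statSync(fullPath);
        // Store paths relative to the root so the index can be moved with the directory.
        entries.push({ path: path.relative(root, fullPath), size, mtimeMs });
      }
    }
  };

  walk(root);
  return entries.sort((a, b) => a.path.localeCompare(b.path));
}
```

Calling scanDirectory('/path/to/knowledge/base', ['*.tmp', '*.log']) would return one entry per file with its relative path, size, and modification time, roughly what the --exclude example above suggests.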

Searching the Knowledge Base

# Search for files
kb-search --query "machine learning"

# Search with fuzzy matching
kb-search --query "machine learning" --fuzzy --limit 10

# Search with in-memory caching for faster processing
kb-search --query "machine learning" --in-memory
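
As an illustration of what fzf-style fuzzy matching means, the sketch below scores a candidate path by checking that every query character appears in order, with bonuses for consecutive matches and matches at word boundaries. It is a simplified greedy matcher written for this explanation, not the algorithm the package uses (its dependency list mentions fuse.js, which this sketch does not call). fuzzyScore and fuzzySearch are hypothetical names.

```javascript
// Hypothetical helper: score `candidate` against `query`.
// Every query character (whitespace ignored) must appear in the candidate in order.
// Returns null when there is no match; higher scores are better.
function fuzzyScore(query, candidate) {
  const q = query.toLowerCase().replace(/\s+/g, '');
  const c = candidate.toLowerCase();
  let score = 0;
  let qi = 0;
  let prevMatch = -2;

  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] !== q[qi]) continue;
    score += 1;
    if (ci === prevMatch + 1) score += 2; // bonus: consecutive characters
    if (ci === 0 || /[\/\-_. ]/.test(c[ci - 1])) score += 3; // bonus: start of a word or path segment
    prevMatch = ci;
    qi++;
  }
  return qi === q.length ? score : null;
}

// Hypothetical helper: rank candidates by score and keep the best `limit` matches.
function fuzzySearch(query, candidates, limit = 10) {
  return candidates
    .map((candidate) => ({ candidate, score: fuzzyScore(query, candidate) }))
    .filter((result) => result.score !== null)
    .sort((a, b) => b.score - a.score || a.candidate.length - b.candidate.length)
    .slice(0, limit);
}
```

The limit parameter plays the same role as --limit in the CLI example above. A production matcher would typically search for the best alignment rather than the first one, which this greedy version does not attempt.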

Extracting Content

# Extract content from a file
kb-extract --file /path/to/file.md

# Extract content and answer a question
kb-extract --query "What is machine learning?" --context-files "ml-intro.md,deep-learning.md"

# Extract with in-memory caching for faster processing
kb-extract --file /path/to/file.md --in-memory
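
The --in-memory option is described only as file caching for faster processing. One plausible way to picture it is a cache keyed by absolute path that re-reads a file only when its modification time changes. The FileCache class below is a hypothetical sketch under that assumption, not the package's code.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical in-memory cache: file contents are kept in a Map and reused
// as long as the file's modification time has not changed.
class FileCache {
  constructor() {
    this.entries = new Map();
    this.hits = 0;
    this.misses = 0;
  }

  read(filePath) {
    const key = path.resolve(filePath);
    const { mtimeMs } = fs.statSync(key);
    const cached = this.entries.get(key);

    if (cached && cached.mtimeMs === mtimeMs) {
      this.hits++;
      return cached.content;
    }

    // First read, or the file changed on disk: load it and refresh the entry.
    const content = fs.readFileSync(key, 'utf8');
    this.entries.set(key, { mtimeMs, content });
    this.misses++;
    return content;
  }
}
```

The trade-off is the usual one: repeated reads avoid disk access at the cost of holding file contents in memory, which matters for large knowledge bases.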

Programmatic Usage

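// extractKnowledge is used below, so it is imported here as well
const {
  indexKnowledgeBase,
  searchKnowledgeBase,
  extractContent,
  extractKnowledge
} = require('knowledge-base-indexer');

// CommonJS modules do not support top-level await, so the calls run inside an async function
async function main() {
  // Index a knowledge base
  const index = await indexKnowledgeBase('/path/to/knowledge/base', {
    inMemory: true // Enable in-memory caching for faster processing
  });

  // Search for files
  const results = await searchKnowledgeBase(index, 'machine learning', { inMemory: true });

  // Extract content
  const content = await extractContent('/path/to/file.md', { inMemory: true });

  // Answer a question using content from relevant files
  // (example file names, matching the CLI example above)
  const relevantFiles = ['ml-intro.md', 'deep-learning.md'];
  const answer = await extractKnowledge('What is machine learning?', { files: relevantFiles, inMemory: true });
}

main().catch(console.error);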

Supported File Formats

Markdown (.md)
Text (.txt)
PDF (.pdf)
HTML (.html, .htm)
JSON (.json)
YAML (.yml, .yaml)
And more...
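
A tool that handles several formats usually picks an extractor based on the file extension. The sketch below shows that dispatch step for the formats listed above. The detectFormat function and the format labels are hypothetical and only illustrate the idea; how the package itself selects a parser is not documented here.

```javascript
const path = require('node:path');

// Hypothetical mapping from file extension to a format label,
// covering only the formats named in the list above.
const FORMATS_BY_EXTENSION = new Map([
  ['.md', 'markdown'],
  ['.txt', 'text'],
  ['.pdf', 'pdf'],
  ['.html', 'html'],
  ['.htm', 'html'],
  ['.json', 'json'],
  ['.yml', 'yaml'],
  ['.yaml', 'yaml'],
]);

// Return the format label for a file path, or null when the extension is unknown.
function detectFormat(filePath) {
  const extension = path.extname(filePath).toLowerCase();
  return FORMATS_BY_EXTENSION.get(extension) ?? null;
}
```

Returning null for unknown extensions lets the caller decide whether to skip the file or fall back to treating it as plain text.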

LLM Integration

Knowledge Base Indexer is designed to work with Large Language Models:

File Discovery: LLMs can use the indexer to find relevant files based on user queries
Content Extraction: Extract content from files to provide context to LLMs
Knowledge Synthesis: LLMs can synthesize information from multiple sources
Question Answering: Answer questions using knowledge extracted from files
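
The hand-off from content extraction to question answering can be sketched as a prompt-assembly step: extracted text from several files is concatenated under a size budget and followed by the question. The buildPrompt function, the prompt wording, and the character budget are all assumptions made for this example. The package's actual prompt format and any LLM client it uses are not described in this document.

```javascript
// Hypothetical helper: combine extracted documents and a question into one prompt.
// `documents` is an array of { name, content }; `maxChars` caps the total context size.
function buildPrompt(question, documents, maxChars = 4000) {
  const sections = [];
  let used = 0;

  for (const { name, content } of documents) {
    const remaining = maxChars - used;
    if (remaining <= 0) break; // budget exhausted: drop the remaining documents
    const excerpt = content.slice(0, remaining);
    sections.push(`### ${name}\n${excerpt}`);
    used += excerpt.length;
  }

  return [
    'Answer the question using only the context below.',
    '',
    ...sections,
    '',
    `Question: ${question}`,
  ].join('\n');
}
```

Truncating by characters is the simplest possible budget. A real integration would more likely count tokens and rank documents by relevance before cutting.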

License

MIT

Keywords: knowledge-base, indexing, search, llm, fuzzy-search, content-extraction, ai-assistant

Dependencies: chokidar, commander, fast-glob, fuse.js, gray-matter, js-yaml, jsdom, markdown-it, node-fetch, pdf-parse, sqlite3, turndown, yargs

Versions: 1.1.0, 1.0.0