1.1.0 • Published 5 months ago

knowledge-base-indexer v1.1.0

Knowledge Base Indexer

A tool for indexing, searching, and extracting content from local knowledge bases, designed for use with LLMs.

Features

  • File Structure Indexing: Recursively scan directories and build a searchable index of files
  • Fuzzy Search: Find relevant files using fuzzy search algorithms (similar to fzf)
  • Content Extraction: Extract and process content from various file formats
  • LLM Integration: Designed to work with LLMs for knowledge extraction and question answering
  • CLI Tools: Command-line tools for indexing, searching, and content extraction
  • Performance Options: In-memory file caching for faster processing of large knowledge bases
  • API: Programmatic access for integration with other tools
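
To make the File Structure Indexing feature concrete, here is a minimal sketch of a recursive directory scan using only Node.js built-ins. It is not the package's implementation: `scanDirectory` and the entry shape (`path`, `ext`, `size`) are hypothetical names chosen for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of knowledge-base-indexer):
// walk a directory tree and collect one entry per regular file.
function scanDirectory(rootDir) {
  const entries = [];
  const pending = [rootDir]; // explicit stack instead of recursion
  while (pending.length > 0) {
    const dir = pending.pop();
    for (const item of fs.readdirSync(dir, { withFileTypes: true })) {
      const fullPath = path.join(dir, item.name);
      if (item.isDirectory()) {
        pending.push(fullPath);
      } else if (item.isFile()) {
        entries.push({
          path: path.relative(rootDir, fullPath),
          ext: path.extname(item.name).toLowerCase(),
          size: fs.statSync(fullPath).size,
        });
      }
    }
  }
  // Sort by relative path so the index order is deterministic
  return entries.sort((a, b) => a.path.localeCompare(b.path));
}

// Demo on a throwaway directory with two files
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'kb-demo-'));
fs.mkdirSync(path.join(demoRoot, 'notes'));
fs.writeFileSync(path.join(demoRoot, 'readme.md'), '# Hello');
fs.writeFileSync(path.join(demoRoot, 'notes', 'ml.txt'), 'machine learning');
const demoIndex = scanDirectory(demoRoot);
console.log(demoIndex);
```

An explicit stack avoids deep recursion on heavily nested trees; symbolic links are skipped here because they are neither plain directories nor plain files in a `withFileTypes` listing.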

Installation

# Install globally
npm install -g knowledge-base-indexer

# Or install locally
npm install knowledge-base-indexer

Usage

Command Line Interface

Indexing a Knowledge Base

# Index a directory
kb-index --dir /path/to/knowledge/base

# Index with custom options
kb-index --dir /path/to/knowledge/base --exclude "*.tmp,*.log" --watch

# Index with in-memory caching for faster processing
kb-index --dir /path/to/knowledge/base --in-memory
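
The exact matching rules behind --exclude are not documented here. As an illustration only, the sketch below shows one plausible reading: a comma-separated list of glob patterns matched against each file's base name. `globToRegExp`, `parseExcludes`, and `isExcluded` are hypothetical helpers, not functions exported by the package.

```javascript
const path = require('path');

// Hypothetical: convert a simple glob ("*" and "?" only) to an anchored RegExp
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
  const pattern = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp('^' + pattern + '$');
}

// Hypothetical: split a value such as "*.tmp,*.log" into compiled patterns
function parseExcludes(value) {
  return value
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean)
    .map(globToRegExp);
}

// Assumption: patterns are tested against the base name, not the full path
function isExcluded(filePath, patterns) {
  const name = path.basename(filePath);
  return patterns.some((re) => re.test(name));
}

const excludePatterns = parseExcludes('*.tmp, *.log');
console.log(isExcluded('logs/app.log', excludePatterns));
console.log(isExcluded('notes/ml.md', excludePatterns));
```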

Searching the Knowledge Base

# Search for files
kb-search --query "machine learning"

# Search with fuzzy matching
kb-search --query "machine learning" --fuzzy --limit 10

# Search with in-memory caching for faster processing
kb-search --query "machine learning" --in-memory
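
For readers unfamiliar with fzf-style matching, the following sketch shows the basic idea: a query matches when its characters appear in order within a candidate path, and contiguous runs score higher. This is a simplified greedy scorer written for illustration; `fuzzyScore` and `fuzzySearch` are hypothetical names, and the package's actual ranking may differ.

```javascript
// Hypothetical scorer: returns null when `query` is not a subsequence of `text`,
// otherwise a score that grows when matched characters are adjacent.
function fuzzyScore(query, text) {
  const q = query.toLowerCase();
  const t = text.toLowerCase();
  let score = 0;
  let qi = 0;
  let lastMatch = -2; // index of the previous matched character in `t`
  for (let ti = 0; ti < t.length && qi < q.length; ti++) {
    if (t[ti] === q[qi]) {
      score += ti === lastMatch + 1 ? 2 : 1; // bonus for consecutive matches
      lastMatch = ti;
      qi++;
    }
  }
  return qi === q.length ? score : null;
}

// Rank candidate paths by score (ties broken alphabetically) and keep the top `limit`
function fuzzySearch(query, paths, limit = 10) {
  return paths
    .map((p) => ({ path: p, score: fuzzyScore(query, p) }))
    .filter((r) => r.score !== null)
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path))
    .slice(0, limit);
}

const hits = fuzzySearch('mlintro', ['ml-intro.md', 'deep-learning.md', 'notes/misc.txt']);
console.log(hits); // only 'ml-intro.md' contains m, l, i, n, t, r, o in order
```

The scorer takes the leftmost possible match for each character, so it does not always find the best-scoring alignment; real fuzzy finders typically use dynamic programming for that.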

Extracting Content

# Extract content from a file
kb-extract --file /path/to/file.md

# Extract content and answer a question
kb-extract --query "What is machine learning?" --context-files "ml-intro.md,deep-learning.md"

# Extract with in-memory caching for faster processing
kb-extract --file /path/to/file.md --in-memory
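
How --in-memory works internally is not described in this document. A common way to implement such an option is to keep file contents in a map so that repeated reads skip the disk; the sketch below illustrates that pattern under this assumption. `FileCache` is a hypothetical class, not part of the package's API.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical cache: file contents keyed by absolute path,
// re-read only when the file's modification time changes.
class FileCache {
  constructor() {
    this.entries = new Map();
    this.diskReads = 0; // counts actual reads, for demonstration
  }

  read(filePath) {
    const key = path.resolve(filePath);
    const mtimeMs = fs.statSync(key).mtimeMs;
    const cached = this.entries.get(key);
    if (cached && cached.mtimeMs === mtimeMs) {
      return cached.content; // cache hit: no file read
    }
    const content = fs.readFileSync(key, 'utf8');
    this.diskReads++;
    this.entries.set(key, { mtimeMs, content });
    return content;
  }
}

// Demo: the second read is served from memory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'kb-cache-'));
const demoFile = path.join(demoDir, 'note.md');
fs.writeFileSync(demoFile, '# Cached note');
const cache = new FileCache();
const first = cache.read(demoFile);
const second = cache.read(demoFile);
console.log(first === second, cache.diskReads); // prints: true 1
```

The trade-off is memory for speed: every cached file stays resident, so a large knowledge base may need a size limit or eviction policy that this sketch omits.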

Programmatic Usage

const {
  indexKnowledgeBase,
  searchKnowledgeBase,
  extractContent,
  extractKnowledge
} = require('knowledge-base-indexer');

async function main() {
  // Index a knowledge base
  const index = await indexKnowledgeBase('/path/to/knowledge/base', {
    inMemory: true // Enable in-memory caching for faster processing
  });

  // Search for files
  const results = await searchKnowledgeBase(index, 'machine learning', { inMemory: true });

  // Extract content
  const content = await extractContent('/path/to/file.md', { inMemory: true });

  // Answer a question using content from relevant files
  // (file names reused from the CLI example; substitute files from your own search results)
  const relevantFiles = ['ml-intro.md', 'deep-learning.md'];
  const answer = await extractKnowledge('What is machine learning?', { files: relevantFiles, inMemory: true });
}

// await is not allowed at the top level of a CommonJS module, hence the wrapper
main().catch(console.error);

Supported File Formats

  • Markdown (.md)
  • Text (.txt)
  • PDF (.pdf)
  • HTML (.html, .htm)
  • JSON (.json)
  • YAML (.yml, .yaml)
  • And more...
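
A tool that handles several formats usually picks an extractor based on the file extension. The sketch below shows such a lookup for the extensions listed above; `FORMAT_BY_EXTENSION` and `detectFormat` are hypothetical names, and the format labels are illustrative rather than values the package is known to use.

```javascript
const path = require('path');

// Hypothetical lookup table covering only the extensions listed above
const FORMAT_BY_EXTENSION = new Map([
  ['.md', 'markdown'],
  ['.txt', 'text'],
  ['.pdf', 'pdf'],
  ['.html', 'html'],
  ['.htm', 'html'],
  ['.json', 'json'],
  ['.yml', 'yaml'],
  ['.yaml', 'yaml'],
]);

// Returns a format label, or null for extensions not in the table
function detectFormat(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return FORMAT_BY_EXTENSION.has(ext) ? FORMAT_BY_EXTENSION.get(ext) : null;
}

console.log(detectFormat('notes/Intro.MD'));
console.log(detectFormat('archive.zip'));
```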

LLM Integration

Knowledge Base Indexer is designed to work with Large Language Models:

  1. File Discovery: LLMs can use the indexer to find relevant files based on user queries
  2. Content Extraction: Extract content from files to provide context to LLMs
  3. Knowledge Synthesis: LLMs can synthesize information from multiple sources
  4. Question Answering: Answer questions using knowledge extracted from files
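
The four steps above end with extracted content being handed to an LLM as context. As a rough illustration of that hand-off, the sketch below packs extracted documents into a single prompt within a character budget. `buildPrompt`, the document shape (`path`, `content`), and the prompt wording are all assumptions for this example; the call to the LLM itself is left out, and the document contents are placeholders.

```javascript
// Hypothetical helper: concatenate documents into one prompt,
// stopping before the context exceeds `maxChars` characters.
function buildPrompt(question, documents, maxChars = 4000) {
  const parts = [];
  let used = 0;
  for (const doc of documents) {
    const block = `--- ${doc.path} ---\n${doc.content}\n`;
    if (used + block.length > maxChars) break; // budget exhausted
    parts.push(block);
    used += block.length;
  }
  return `Answer using only the context below.\n\n${parts.join('\n')}\nQuestion: ${question}`;
}

// Placeholder documents standing in for extracted file content
const docs = [
  { path: 'ml-intro.md', content: 'Placeholder text extracted from ml-intro.md.' },
  { path: 'deep-learning.md', content: 'Placeholder text extracted from deep-learning.md.' },
];

const prompt = buildPrompt('What is machine learning?', docs);
console.log(prompt);
```

Labeling each block with its file path lets the model (and the reader of its answer) trace a statement back to its source file.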

License

MIT