codebase-indexer v1.0.0 • Published 4 months ago

Codebase Indexer

A tool for indexing, searching, and analyzing large codebases for LLM integration.

Features

  • Codebase Structure Indexing: Recursively scan directories and build a searchable index of files
  • Code Analysis: Parse source code to identify functions, classes, methods, and other structures
  • Code Search: Find relevant files, functions, and classes by query, with optional fuzzy matching
  • Context Extraction: Extract code with surrounding context for better understanding
  • LLM Integration: Format extracted code and context for inclusion in LLM prompts, including through an MCP server
  • Performance Optimization: In-memory caching for faster processing
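Structure indexing can be pictured as a recursive directory walk that records each file's relative path, extension, and size. The following is a minimal sketch using only Node's built-in `fs` and `path` modules. The `buildFileIndex` name, its `skipDirs` parameter, and the shape of each entry are assumptions for illustration, not this package's API.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical sketch: walk `rootDir` recursively and return one entry per file.
// Directories whose name is in `skipDirs` are never descended into.
function buildFileIndex(rootDir, skipDirs = new Set(['node_modules', '.git'])) {
  const entries = [];

  function walk(dir) {
    for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
      const fullPath = path.join(dir, dirent.name);
      if (dirent.isDirectory()) {
        if (!skipDirs.has(dirent.name)) walk(fullPath);
      } else if (dirent.isFile()) {
        entries.push({
          // Use forward slashes so paths look the same on every platform
          path: path.relative(rootDir, fullPath).split(path.sep).join('/'),
          ext: path.extname(dirent.name),
          size: fs.statSync(fullPath).size,
        });
      }
    }
  }

  walk(rootDir);
  // Sort so the index does not depend on directory read order
  entries.sort((a, b) => a.path.localeCompare(b.path));
  return entries;
}

// Example usage:
// console.log(buildFileIndex(process.cwd()));
```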

Installation

# Clone the repository
git clone https://github.com/yourusername/codebase-indexer.git
cd codebase-indexer

# Install dependencies
npm install

# Install globally (optional)
npm install -g .

Usage

Command Line Interface

Indexing a Codebase

# Index a codebase
codebase-index --dir /path/to/your/codebase

# Index with custom options
codebase-index --dir /path/to/your/codebase --exclude "node_modules/**,dist/**,*.test.js" --watch

# Index with in-memory caching for faster processing
codebase-index --dir /path/to/your/codebase --in-memory
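The `--exclude` option takes a comma-separated list of glob patterns, but exactly how the tool interprets them is not documented here. The sketch below assumes common conventions: `**` matches across directory separators, `*` matches within one path segment, and a pattern without a `/` (such as `*.test.js`) is matched against the file's base name at any depth. `globToRegExp` and `isExcluded` are hypothetical helpers.

```javascript
// Hypothetical sketch: convert a simple glob into an anchored RegExp.
// Supports `**`, `*`, and `?`; every other character is matched literally.
function globToRegExp(glob) {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      re += '.*'; // `**` may span directory separators
      i++;
    } else if (ch === '*') {
      re += '[^/]*'; // `*` stays within one path segment
    } else if (ch === '?') {
      re += '[^/]';
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Split a comma-separated --exclude value and test a relative path against it.
function isExcluded(relPath, excludeOption) {
  const patterns = excludeOption.split(',').map((p) => p.trim()).filter(Boolean);
  const baseName = relPath.split('/').pop();
  return patterns.some((pattern) => {
    const regex = globToRegExp(pattern);
    // Assumption: patterns without a slash match the base name at any depth
    return pattern.includes('/') ? regex.test(relPath) : regex.test(baseName);
  });
}
```

Under these assumptions, the exclude list from the example above would skip `node_modules/dep/index.js` and `src/auth/login.test.js` but still index `src/auth/login.js`.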

Searching a Codebase

# Search for files
codebase-search --dir /path/to/your/codebase --query "user authentication"

# Find a function definition
codebase-search --dir /path/to/your/codebase --function "getUserData"

# Find a class definition
codebase-search --dir /path/to/your/codebase --class "UserManager"

# Search with fuzzy matching and limit results
codebase-search --dir /path/to/your/codebase --query "auth" --fuzzy --limit 5
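The `--fuzzy` flag is described only as fuzzy matching. One simple technique, shown purely as an illustration and not necessarily what the indexer uses, is subsequence matching: every query character must appear in the candidate in order, with bonuses for consecutive characters and for matches at the start of a path or word segment. `fuzzyScore` and `fuzzySearch` are hypothetical names, and `limit` plays the role of `--limit`.

```javascript
// Hypothetical sketch of greedy subsequence fuzzy matching.
// Returns null if `query` is not a subsequence of `candidate`, else a score.
function fuzzyScore(query, candidate) {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  let score = 0;
  let qi = 0;
  let prevMatch = -2;
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] !== q[qi]) continue;
    score += 1;
    if (ci === prevMatch + 1) score += 2; // bonus for consecutive characters
    if (ci === 0 || /[\/_.\-\s]/.test(c[ci - 1])) score += 3; // bonus for segment starts
    prevMatch = ci;
    qi++;
  }
  return qi === q.length ? score : null;
}

// Rank candidates (highest score first, shorter paths break ties) and keep `limit`.
function fuzzySearch(query, candidates, limit = 10) {
  return candidates
    .map((item) => ({ item, score: fuzzyScore(query, item) }))
    .filter((r) => r.score !== null)
    .sort((a, b) => b.score - a.score || a.item.length - b.item.length)
    .slice(0, limit);
}
```

Greedy left-to-right matching keeps the sketch short but can miss a higher-scoring alignment; more thorough fuzzy matchers search over alignments with dynamic programming.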

Extracting Code

# Extract code from a file
codebase-extract --file /path/to/your/codebase/src/user.js

# Extract a function with context
codebase-extract --dir /path/to/your/codebase --function "getUserData"

# Extract a class with context
codebase-extract --dir /path/to/your/codebase --class "UserManager"

# Answer a query using codebase knowledge
codebase-extract --dir /path/to/your/codebase --query "How does user authentication work?"
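Extracting a function "with context" presumably means returning its source lines plus a few lines before and after. A minimal sketch, assuming the definition's start and end lines (1-based, inclusive) are already known from the index; `extractWithContext` is a hypothetical helper whose `contextLines` parameter mirrors the option of the same name in the programmatic example.

```javascript
// Hypothetical sketch: return lines [startLine, endLine] (1-based, inclusive)
// plus up to `contextLines` lines on each side, clamped to the file bounds.
function extractWithContext(source, startLine, endLine, contextLines = 3) {
  const lines = source.split(/\r?\n/);
  const from = Math.max(1, startLine - contextLines);
  const to = Math.min(lines.length, endLine + contextLines);
  const width = String(to).length;
  const text = lines
    .slice(from - 1, to)
    .map((line, i) => {
      const lineNo = from + i;
      // Mark lines belonging to the definition itself with '>'
      const marker = lineNo >= startLine && lineNo <= endLine ? '>' : ' ';
      return `${marker} ${String(lineNo).padStart(width)} | ${line}`;
    })
    .join('\n');
  return { from, to, text };
}
```

Keeping line numbers in the output lets an LLM, or a person, refer back to exact locations in the file.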

Programmatic Usage

const {
  indexCodebase,
  searchCodebase,
  findFunctionDefinition,
  extractFunction,
  extractKnowledge
} = require('codebase-indexer');

// In a CommonJS module, `await` is only valid inside an async function
async function main() {
  // Index a codebase
  const index = await indexCodebase('/path/to/your/codebase', {
    inMemory: true // Enable in-memory caching for faster processing
  });

  // Search for files with fuzzy matching
  const results = await searchCodebase(index, 'user authentication', { fuzzy: true });

  // Find a function definition
  const functions = await findFunctionDefinition(index, 'getUserData');

  // Extract a function with 3 lines of surrounding context
  const extractedFunctions = await extractFunction(index, 'getUserData', {
    contextLines: 3
  });

  // Extract knowledge to answer a query
  const knowledge = await extractKnowledge(index, 'How does user authentication work?', {
    maxFiles: 3,
    maxFunctions: 5
  });

  console.log({ results, functions, extractedFunctions, knowledge });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
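`extractKnowledge` is documented only by its arguments. As a rough illustration of what `maxFiles` could control, the sketch below ranks files by how many distinct query terms they contain and keeps the top `maxFiles`. The term-overlap scoring, the stop-word list, and the `tokenize` and `rankFilesForQuery` names are all assumptions; the package may rank results quite differently.

```javascript
// Hypothetical stop words dropped from natural-language queries.
const STOP_WORDS = new Set(['how', 'does', 'the', 'and', 'what', 'work', 'works']);

// Lowercase, split on non-alphanumerics, and drop stop words and short tokens.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 2 && !STOP_WORDS.has(t));
}

// `files` is an array of { path, content }; returns at most `maxFiles` matches.
function rankFilesForQuery(query, files, maxFiles = 3) {
  const terms = [...new Set(tokenize(query))];
  return files
    .map((file) => {
      const tokens = new Set(tokenize(`${file.path} ${file.content}`));
      const hits = terms.filter((t) => tokens.has(t));
      return { path: file.path, score: hits.length, hits };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path))
    .slice(0, maxFiles);
}
```

Exact token matching misses related forms such as `auth` versus `authentication`; stemming or prefix matching would be a natural refinement.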

MCP Server for LLM Integration

The codebase indexer includes an MCP (Model Context Protocol) server, which lets MCP-compatible LLM clients call the indexer's functions as tools.

# Start the MCP server
node mcp/server.js

This exposes the following tools to LLMs:

  • index_codebase: Index a codebase directory
  • search_codebase: Search an indexed codebase
  • find_function: Find a function definition
  • find_class: Find a class definition
  • extract_code: Extract code from a file
  • extract_knowledge: Extract knowledge from codebase to answer a query
  • list_folder_structure: List the folder structure of a codebase
  • list_file_functions: List all functions in a file
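MCP clients discover a server's tools with a `tools/list` request and invoke one with `tools/call`; both are JSON-RPC 2.0 messages. The sketch below shows only that dispatch step, with placeholder handlers for two of the tool names listed above. The handler bodies, input schemas, and argument names (`query`, `name`) are hypothetical, and a production server would more typically build on an MCP SDK that also handles transport framing.

```javascript
// Hypothetical tool registry: names follow the list above, handlers are placeholders.
const tools = new Map([
  ['search_codebase', {
    description: 'Search an indexed codebase',
    inputSchema: { type: 'object', properties: { query: { type: 'string' } }, required: ['query'] },
    handler: (args) => `Results for "${args.query}" would go here`,
  }],
  ['find_function', {
    description: 'Find a function definition',
    inputSchema: { type: 'object', properties: { name: { type: 'string' } }, required: ['name'] },
    handler: (args) => `Definition of ${args.name} would go here`,
  }],
]);

// Handle one parsed JSON-RPC 2.0 request object and return the response object.
function handleRequest(request) {
  const { id, method, params = {} } = request;
  if (method === 'tools/list') {
    const list = [...tools].map(([name, t]) => ({
      name,
      description: t.description,
      inputSchema: t.inputSchema,
    }));
    return { jsonrpc: '2.0', id, result: { tools: list } };
  }
  if (method === 'tools/call') {
    const tool = tools.get(params.name);
    if (!tool) {
      // Unknown tool names are reported as invalid params (-32602)
      return { jsonrpc: '2.0', id, error: { code: -32602, message: `Unknown tool: ${params.name}` } };
    }
    const text = tool.handler(params.arguments || {});
    return { jsonrpc: '2.0', id, result: { content: [{ type: 'text', text }] } };
  }
  // Any other method is outside the scope of this sketch
  return { jsonrpc: '2.0', id, error: { code: -32601, message: `Method not found: ${method}` } };
}
```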

Supported Languages

The codebase indexer currently supports the following languages:

  • JavaScript/TypeScript (full support)
  • Python (basic support)
  • Java (basic support)
  • C# (basic support)
  • Go (basic support)
  • Ruby (basic support)
  • PHP (basic support)
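How "basic" support is implemented is not described. One common lightweight approach is to scan each line with language-specific regular expressions, sketched below for three of the listed languages (Python, Go, Ruby). The patterns, the `DEFINITION_PATTERNS` table, and the `findDefinitions` helper are hypothetical, and line-level patterns will miss cases such as multi-line signatures.

```javascript
// Hypothetical line-level patterns keyed by file extension.
// Each pattern captures the definition's name in group 1.
const DEFINITION_PATTERNS = {
  '.py': [
    { kind: 'function', re: /^\s*(?:async\s+)?def\s+([A-Za-z_]\w*)\s*\(/ },
    { kind: 'class', re: /^\s*class\s+([A-Za-z_]\w*)/ },
  ],
  '.go': [
    // Matches `func Name(`, generic `func Name[`, and methods like `func (r *Repo) Name(`
    { kind: 'function', re: /^func\s+(?:\([^)]*\)\s*)?([A-Za-z_]\w*)\s*[(\[]/ },
    { kind: 'class', re: /^type\s+([A-Za-z_]\w*)\s+struct\b/ },
  ],
  '.rb': [
    { kind: 'function', re: /^\s*def\s+(?:self\.)?([A-Za-z_]\w*[?!=]?)/ },
    { kind: 'class', re: /^\s*class\s+([A-Z]\w*)/ },
  ],
};

// Return [{ kind, name, line }] for each line that matches a pattern.
function findDefinitions(ext, source) {
  const patterns = DEFINITION_PATTERNS[ext] || [];
  const results = [];
  source.split(/\r?\n/).forEach((text, i) => {
    for (const { kind, re } of patterns) {
      const m = re.exec(text);
      if (m) {
        results.push({ kind, name: m[1], line: i + 1 });
        break; // record at most one definition per line
      }
    }
  });
  return results;
}
```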

License

MIT