@elizaos/plugin-llama
Core LLaMA plugin for Eliza OS that provides local Large Language Model capabilities.
Overview
The LLaMA plugin serves as a foundational component of Eliza OS, providing local LLM capabilities using LLaMA models. It enables efficient and customizable text generation with both CPU and GPU support.
Features
- Local LLM Support: Run LLaMA models locally
- GPU Acceleration: CUDA support for faster inference
- Flexible Configuration: Customizable parameters for text generation
- Message Queuing: Efficient handling of multiple requests
- Automatic Model Management: Download and verification systems
Installation
npm install @elizaos/plugin-llama
Configuration
The plugin can be configured through environment variables:
Core Settings
LLAMALOCAL_PATH=your_model_storage_path
OLLAMA_MODEL=optional_ollama_model_name
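For example, these can go in a .env file at the project root before starting the agent; the values below are placeholders, not defaults:

# .env (placeholder values)
LLAMALOCAL_PATH=./models
OLLAMA_MODEL=llama3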
Usage
import { createLlamaPlugin } from "@elizaos/plugin-llama";
// Initialize the plugin
const llamaPlugin = createLlamaPlugin();
// Register with Eliza OS
elizaos.registerPlugin(llamaPlugin);
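After registration, completions are normally requested through the Eliza runtime. If you do call the service directly, the sketch below shows the general shape; it assumes llamaService is the registered LlamaService instance, that queueTextCompletion follows the signature used in Eliza's LlamaService (which may change between versions), and that the prompt and parameter values are placeholders.

// Hedged sketch of a direct call to the service; values are placeholders.
const response = await llamaService.queueTextCompletion(
    "Write a haiku about local inference.", // prompt / context
    0.7,        // temperature
    ["\n\n"],   // stop tokens
    0.5,        // frequency_penalty
    0.5,        // presence_penalty
    256         // max_tokens
);
console.log(response);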
Services
LlamaService
Provides local LLM capabilities using LLaMA models.
Technical Details
- Model: Hermes-3-Llama-3.1-8B (8-bit quantized)
- Source: Hugging Face (NousResearch/Hermes-3-Llama-3.1-8B-GGUF)
- Context Size: 8192 tokens
- Inference: CPU and GPU (CUDA) support
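For orientation, the sketch below shows roughly what such a setup looks like when driving node-llama-cpp directly with the same model and context size. The file name, storage path, and exact API (shown here in the node-llama-cpp v3 style) are assumptions and may differ from what the service does internally.

import path from "node:path";
import { getLlama, LlamaChatSession } from "node-llama-cpp";

// Assumed location of the downloaded GGUF file (file name is an assumption).
const modelPath = path.join(
    process.env.LLAMALOCAL_PATH ?? "./models",
    "Hermes-3-Llama-3.1-8B.Q8_0.gguf"
);

const llama = await getLlama(); // uses CUDA automatically when available
const model = await llama.loadModel({ modelPath });
const context = await model.createContext({ contextSize: 8192 });
const session = new LlamaChatSession({ contextSequence: context.getSequence() });

console.log(await session.prompt("Hello from a local model!"));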
Features
Text Generation
- Completion-style inference
- Temperature control
- Stop token configuration
- Frequency and presence penalties
- Maximum token limit control
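As an illustration of the knobs listed above, a completion request can be thought of as carrying a parameter object along these lines (field names are illustrative, not the plugin's exact interface):

// Illustrative shape of a completion request; field names are assumptions.
interface CompletionOptions {
    context: string;           // prompt text for completion-style inference
    temperature: number;       // sampling temperature, e.g. 0.7
    stop: string[];            // stop tokens that terminate generation
    frequency_penalty: number; // penalizes frequently repeated tokens
    presence_penalty: number;  // penalizes tokens already present in the output
    max_tokens: number;        // upper bound on generated tokens
}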
Model Management
- Automatic model downloading
- Model file verification
- Automatic retry on initialization failures
- GPU detection for acceleration
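The sketch below illustrates the download-verify-retry pattern in general terms using only Node built-ins; the helper names, checksum handling, and retry count are assumptions, not the plugin's actual implementation.

import { createHash } from "node:crypto";
import { existsSync } from "node:fs";
import { readFile, writeFile, rm } from "node:fs/promises";

// Hypothetical downloader using the global fetch available in Node 18+.
async function downloadModel(url: string, destination: string): Promise<void> {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`Download failed: ${res.status}`);
    await writeFile(destination, Buffer.from(await res.arrayBuffer()));
}

async function verifyChecksum(filePath: string, expectedSha256: string): Promise<boolean> {
    const hash = createHash("sha256").update(await readFile(filePath)).digest("hex");
    return hash === expectedSha256;
}

// Download the model if missing, verify it, and retry on corruption.
async function ensureModel(url: string, destination: string, expectedSha256: string, retries = 3): Promise<void> {
    for (let attempt = 1; attempt <= retries; attempt++) {
        if (!existsSync(destination)) {
            await downloadModel(url, destination);
        }
        if (await verifyChecksum(destination, expectedSha256)) return;
        await rm(destination, { force: true }); // corrupt file: remove and retry
    }
    throw new Error(`Model failed verification after ${retries} attempts`);
}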
Performance
- Message queuing system
- CUDA acceleration when available
- Configurable context size
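The message queuing idea can be pictured as a promise chain that serializes requests so only one completion runs at a time; this is a generic sketch, not the plugin's actual queue.

// Generic FIFO queue that runs async jobs (e.g. completion requests) one at a time.
class MessageQueue {
    private tail: Promise<unknown> = Promise.resolve();

    enqueue<T>(job: () => Promise<T>): Promise<T> {
        const result = this.tail.then(job, job);   // start after the previous job settles
        this.tail = result.catch(() => undefined); // keep the chain alive on errors
        return result;
    }
}

// Example: const text = await queue.enqueue(() => generate("Hello"));
const queue = new MessageQueue();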
Troubleshooting
Common Issues
- Model Initialization Failures
  Error: Model initialization failed
  - Verify model file exists and is not corrupted
  - Check available system memory
  - Ensure CUDA is properly configured (if using GPU)
- Performance Issues
  Warning: No CUDA detected - local response will be slow
  - Verify CUDA installation if using GPU
  - Check system resources
  - Consider reducing context size
Debug Mode
Enable debug logging for detailed troubleshooting:
process.env.DEBUG = "eliza:plugin-llama:*";
System Requirements
- Node.js 16.x or higher
- Minimum 8GB RAM recommended
- CUDA-compatible GPU (optional, for acceleration)
- Sufficient storage for model files
Performance Optimization
Model Selection
- Choose appropriate model size
- Use quantized versions when possible
- Balance quality vs speed
Resource Management
- Monitor memory usage
- Configure appropriate context size
- Optimize batch processing
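For memory monitoring, Node's built-in process.memoryUsage is enough for a first look; the interval and warning threshold below are arbitrary examples.

// Log heap and RSS every 30 seconds; the 8 GB threshold is an arbitrary example.
const GB = 1024 ** 3;
setInterval(() => {
    const { rss, heapUsed } = process.memoryUsage();
    console.log(`rss=${(rss / GB).toFixed(2)} GB, heapUsed=${(heapUsed / GB).toFixed(2)} GB`);
    if (rss > 8 * GB) {
        console.warn("High memory use: consider a smaller model or context size");
    }
}, 30_000);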
GPU Utilization
- Enable CUDA when available
- Monitor GPU memory
- Balance CPU/GPU workload
Support
For issues and feature requests, please:
- Check the troubleshooting guide above
- Review existing GitHub issues
- Submit a new issue with:
  - System information
  - Error logs
  - Steps to reproduce
Credits
This plugin integrates with and builds upon:
- LLaMA - Base language model
- node-llama-cpp - Node.js bindings
- GGUF - Model format
Special thanks to:
- The LLaMA community for model development
- The Node.js community for tooling support
- The Eliza community for testing and feedback
License
This plugin is part of the Eliza project. See the main project repository for license information.