@novastera-oss/llamarn v0.2.6 (Apache-2.0)

LlamaRN

⚠️ WORK IN PROGRESS: This package is currently under active development. Community help and feedback are greatly appreciated, especially in the areas mentioned in What Needs Help.

Goals

  • Provide a thin, reliable wrapper around llama.cpp for React Native
  • Maintain compatibility with llama.cpp server API where possible
  • Make it easy to run LLMs on mobile devices with automatic resource management
  • Keep the codebase simple and maintainable

Current Features

  • Basic model loading and inference
  • Metal support on iOS
  • OpenCL/Vulkan support on Android (in progress)
  • Automatic CPU/GPU detection
  • Chat completion with templates (including Jinja template support)
  • Embeddings generation
  • Function/tool calling support

What Needs Help

We welcome contributions, especially in these areas:

  1. Android GPU Testing and Detection:

    • Development of reliable GPU detection mechanism in React Native
    • Implementation of proper backend initialization verification
    • Creation of robust testing framework for GPU availability
    • Integration of OpenCL and Vulkan acceleration once detection is stable
    • Performance benchmarking and optimization for mobile GPUs
  2. CI Improvements:

    • Adding automated Android GPU tests to CI pipeline
    • Implementing device-specific testing strategies
    • Adding performance benchmarks to CI
  3. Tool Support:

    • Improving tool calling functionality for complex interactions
    • Better JSON validation and error handling
  4. Testing:

    • Automated testing using the example project
    • More comprehensive unit tests
    • Cross-device compatibility tests
  5. Documentation:

    • Improving examples and usage guides
    • More detailed performance considerations
  6. Performance:

    • Optimizing resource usage on different devices
    • Memory management improvements
    • Startup time optimization

If you're interested in helping with any of these areas, please check our Contributing Guide.

Installation

npm install @novastera-oss/llamarn

Developer Setup

If you're contributing to the library or running the example project, follow these setup steps:

Prerequisites

  1. Clone the repository and navigate to the project directory
  2. Ensure you have React Native development environment set up for your target platform(s)

Initial Setup

# Install dependencies
npm install

# Optional: clean up if you already had a previous version of llama.cpp
npm run clean-llama

# Initialize llama.cpp submodule and dependencies
npm run setup-llama-cpp

Android Development

  1. Build the native Android libraries:
# Build the external native libraries for Android
./scripts/build_android_external.sh
  2. Run the example project:
cd example
npm run android

iOS Development

  1. Navigate to the example project and install iOS dependencies:
cd example
cd ios

# Install CocoaPods dependencies
bundle exec pod install

# Or if not using Bundler:
# pod install

cd ..
  2. Run the example project:
npm run ios

Development Notes

  • Android: The build_android_external.sh script compiles llama.cpp for Android architectures and sets up the necessary native libraries. This step is required before running the Android example.

  • iOS: The iOS setup uses CocoaPods to manage native dependencies. The prebuilt llama.cpp framework is included in the repository.

  • Troubleshooting: If you encounter build issues, try cleaning your build cache:

    • Android: cd android && ./gradlew clean
    • iOS: cd example/ios && rm -rf build && rm Podfile.lock && pod install

Basic Usage

Simple Completion

import { initLlama } from '@novastera-oss/llamarn';

// Initialize the model
const context = await initLlama({
  model: 'path/to/model.gguf',
  n_ctx: 2048,
  n_batch: 512
});

// Generate a completion
const result = await context.completion({
  prompt: 'What is artificial intelligence?',
  temperature: 0.7,
  top_p: 0.95
});

console.log('Response:', result.text);

Chat Completion

import { initLlama } from '@novastera-oss/llamarn';

// Initialize the model
const context = await initLlama({
  model: 'path/to/model.gguf',
  n_ctx: 4096,
  n_batch: 512,
  use_jinja: true  // Enable Jinja template parsing
});

// Chat completion with messages
const result = await context.completion({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Tell me about quantum computing.' }
  ],
  temperature: 0.7,
  top_p: 0.95
});

console.log('Response:', result.text);
// For OpenAI-compatible format: result.choices[0].message.content

Chat with Tool Calling

import { initLlama } from '@novastera-oss/llamarn';

// Initialize the model with appropriate parameters for tool use
const context = await initLlama({
  model: 'path/to/model.gguf',
  n_ctx: 2048,
  n_batch: 512,
  use_jinja: true  // Enable template handling for tool calls
});

// Create a chat with tool calling
const response = await context.completion({
  messages: [
    { role: 'system', content: 'You are a helpful assistant that can access weather data.' },
    { role: 'user', content: 'What\'s the weather like in Paris?' }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get the current weather in a location',
        parameters: {
          type: 'object',
          properties: {
            location: {
              type: 'string',
              description: 'The city and state, e.g. San Francisco, CA'
            },
            unit: {
              type: 'string',
              enum: ['celsius', 'fahrenheit'],
              description: 'The unit of temperature to use'
            }
          },
          required: ['location']
        }
      }
    }
  ],
  tool_choice: 'auto',
  temperature: 0.7
});

// Check if the model wants to call a tool
if (response.choices?.[0]?.finish_reason === 'tool_calls' || (response.tool_calls?.length ?? 0) > 0) {
  const toolCalls = response.choices?.[0]?.message?.tool_calls || response.tool_calls;
  
  // Process each tool call
  if (toolCalls && toolCalls.length > 0) {
    console.log('Function call:', toolCalls[0].function.name);
    console.log('Arguments:', toolCalls[0].function.arguments);
    
    // Here you would handle the tool call and then pass the result back in a follow-up completion
  }
}
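After handling a tool call, the result is sent back as a `tool` role message in a follow-up completion. A minimal sketch of that step, assuming the OpenAI-compatible tool-call shape shown above (the `buildToolResultMessages` helper and the stubbed `get_weather` handler are illustrative, not part of the library):

```typescript
type ToolCall = {
  id?: string;
  function: { name: string; arguments: string };
};

// Turn each tool call into a `tool` role message carrying the handler's result.
function buildToolResultMessages(
  toolCalls: ToolCall[],
  handlers: Record<string, (args: any) => unknown>
) {
  return toolCalls.map((call) => {
    const args = JSON.parse(call.function.arguments);
    const result = handlers[call.function.name]?.(args);
    return {
      role: 'tool' as const,
      tool_call_id: call.id,
      content: JSON.stringify(result ?? { error: 'unknown tool' }),
    };
  });
}

// Example with a stubbed weather handler:
const toolMessages = buildToolResultMessages(
  [{ id: '1', function: { name: 'get_weather', arguments: '{"location":"Paris"}' } }],
  { get_weather: (a) => ({ location: a.location, temp_c: 18 }) }
);

// Append toolMessages (after the assistant message containing the tool calls)
// to the original messages array and call context.completion(...) again
// so the model can produce its final answer.
```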

Generating Embeddings

import { initLlama } from '@novastera-oss/llamarn';

// Initialize the model in embedding mode
const context = await initLlama({
  model: 'path/to/embedding-model.gguf',
  embedding: true,
  n_ctx: 2048
});

// Generate embeddings
const embeddingResponse = await context.embedding({
  input: "This is a sample text to embed"
});

console.log('Embedding:', embeddingResponse.data[0].embedding);
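The returned embedding is a plain array of numbers, so similarity between two texts can be computed directly in JavaScript. A small cosine-similarity helper (not part of the library):

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage with two embedding responses:
// const r1 = await context.embedding({ input: 'cats' });
// const r2 = await context.embedding({ input: 'kittens' });
// const score = cosineSimilarity(r1.data[0].embedding, r2.data[0].embedding);
```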

Model Path Handling

The module accepts different path formats depending on the platform:

iOS

  • Bundle path: models/model.gguf (if added to Xcode project)
  • Absolute path: /path/to/model.gguf

Android

  • Asset path: asset:/models/model.gguf
  • File path: file:///path/to/model.gguf
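In a cross-platform app you can select the path format at runtime. A sketch of such a helper (the function name and the `models/` directory are assumptions; pass `Platform.OS` from `react-native` as the first argument):

```typescript
// Pick the model path format per platform, following the conventions above:
// iOS uses a bundle-relative path, Android uses the asset:/ scheme.
function modelPath(os: string, fileName: string): string {
  return os === 'ios'
    ? `models/${fileName}`          // iOS bundle path (file added to the Xcode project)
    : `asset:/models/${fileName}`;  // Android asset path
}

// Usage:
// import { Platform } from 'react-native';
// const context = await initLlama({ model: modelPath(Platform.OS, 'model.gguf'), n_ctx: 2048 });
```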

About

This library is currently being used in Novastera's mobile application, demonstrating its capabilities in production environments. We're committed to enabling on-device LLM inference with no data leaving the user's device, helping developers build AI-powered applications that respect user privacy.

License

Apache 2.0

Acknowledgments

We acknowledge the following projects and communities that have contributed to the development of this library:

  • mybigday/llama.rn - A foundational React Native binding for llama.cpp that demonstrated the viability of on-device LLM inference in mobile applications.

  • ggml-org/llama.cpp - The core C++ library that enables efficient LLM inference, serving as the foundation for this project.

  • The test implementation of the Android Turbo Module (react-native-pure-cpp-turbo-module-library) provided valuable insights for our C++ integration.

These projects have significantly contributed to the open-source ecosystem, and we are committed to building upon their work while maintaining the same spirit of collaboration and innovation.
