
AgentOps BETA 🕵️

AI agents suck. We're fixing that.

Build your next agent with evals, observability, and replay analytics. AgentOps is the toolkit for evaluating and developing robust and reliable AI agents.

License: MIT

Quick Start

Install AgentOps:

npm install agentops

Add AgentOps to your code. Check out the example below.

import OpenAI from "openai";
import { Client } from 'agentops';

const openai = new OpenAI();                        // Add your API key here or in the .env

const agentops = new Client({
    apiKey: "<Insert AgentOps API Key>",            // Add your API key here or in the .env
    tags: ["abc", "success"],                       // Optionally add tags to your run
    patchApi: [openai]                              // Record LLM calls automatically (Only OpenAI is currently supported)
});

// agentops.patchApi(openai)                        // Alternatively, you can patch API calls later

// Sample OpenAI call (automatically recorded because openai was passed to "patchApi")
async function chat() {
    const completion = await openai.chat.completions.create({
        messages: [
            { "role": "system", "content": "You are a helpful assistant." },
            { "role": "user", "content": "Who won the world series in 2020?" },
            { "role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020." },
            { "role": "user", "content": "Where was it played?" }
        ],
        model: "gpt-3.5-turbo",
    });

    return completion;
}

// Sample other function
function original(x: string) {
    console.log(x);
    return 5;
}

// You can track other functions by wrapping the function.
const wrapped = agentops.wrap(original);
wrapped("hello");


chat().then(() => {
    agentops.endSession("Success"); // Make sure you end your session when your agent is done.
});
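The example above hard-codes the API key and always reports "Success". Below is a minimal sketch of a more defensive setup; it assumes the key is stored in a hypothetical AGENTOPS_API_KEY environment variable and that endSession also accepts a failure status string such as "Fail" (only "Success" appears in the example above).

import OpenAI from "openai";
import { Client } from 'agentops';

const openai = new OpenAI();                         // OPENAI_API_KEY read from the environment / .env

const agentops = new Client({
    apiKey: process.env.AGENTOPS_API_KEY!,           // Assumed env variable instead of a hard-coded key
    tags: ["example"],
    patchApi: [openai]                               // Record OpenAI calls automatically
});

async function runAgent() {
    try {
        await openai.chat.completions.create({
            messages: [{ "role": "user", "content": "Say hello." }],
            model: "gpt-3.5-turbo",
        });
        agentops.endSession("Success");              // Report a successful run
    } catch (err) {
        agentops.endSession("Fail");                 // "Fail" is an assumed status string
        throw err;
    }
}

runAgent();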

Time travel debugging 🔮

(coming soon!)

Agent Arena 🥊

(coming soon!)

Evaluations Roadmap 🧭

| Platform | Dashboard | Evals |
| --- | --- | --- |
| ✅ Python SDK | ✅ Multi-session and Cross-session metrics | 🚧 Evaluation playground + leaderboard |
| 🚧 Evaluation builder API | ✅ Custom event tag tracking | 🔜 Agent scorecards |
| ✅ Javascript/Typescript SDK | 🚧 Session replays | 🔜 Custom eval metrics |

Debugging Roadmap 🧭

| Performance testing | Environments | LAA (LLM augmented agents) specific tests | Reasoning and execution testing |
| --- | --- | --- | --- |
| ✅ Event latency analysis | 🔜 Non-stationary environment testing | 🔜 LLM non-deterministic function detection | 🚧 Infinite loops and recursive thought detection |
| ✅ Agent workflow execution pricing | 🔜 Multi-modal environments | 🔜 Token limit overflow flags | 🔜 Faulty reasoning detection |
| 🔜 Success validators (external) | 🔜 Execution containers | 🔜 Context limit overflow flags | 🔜 Generative code validators |
| 🔜 Agent controllers/skill tests | 🔜 Honeypot and prompt injection evaluation | 🔜 API bill tracking | 🔜 Error breakpoint analysis |
| 🔜 Information context constraint testing | 🔜 Anti-agent roadblocks (i.e. Captchas) | | |
| 🔜 Regression testing | | | |

Why AgentOps? 🤔

Our mission is to make sure your agents are ready for production.

Agent developers often work with little to no visibility into agent testing performance. This means their agents never leave the lab. We're changing that.

AgentOps is the easiest way to evaluate, grade, and test agents. Is there a feature you'd like to see AgentOps cover? Just raise it in the issues tab, and we'll work on adding it to the roadmap.
