0.0.2 • Published 15 days ago

License: MIT

evalz

evalz is a TypeScript package for model-graded evaluations with structured output. It uses Zod schemas to structure the evaluation of AI-generated responses, and it relies on OpenAI and instructor-js (@instructor-ai/instructor) to have a model grade those responses against custom criteria such as relevance, fluency, and completeness. Both simple (single-evaluator) and weighted (multi-evaluator) evaluations are supported.

Features

  • Structured Evaluation Models: Define your evaluation logic using Zod schemas to ensure data integrity throughout your application.
  • Flexible Evaluation Strategies: Supports score-based and binary evaluations, with customizable evaluators.
  • Easy Integration: Designed to fit into existing TypeScript projects with minimal setup.
  • Custom Evaluations: Define evaluation criteria tailored to your specific requirements.
  • Weighted Evaluations: Combine multiple evaluations with custom weights to calculate a composite score.
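
To illustrate how score-based and binary evaluations can relate, here is a minimal sketch that turns a 0-to-1 score into a pass/fail verdict. The helpers `clampScore` and `toBinary` and the default threshold of 0.5 are hypothetical and are not part of the evalz API; evalz may represent its results differently.

```typescript
// Hypothetical helpers for illustration only; not part of evalz.

// Keep a score inside the 0-to-1 range used by score-based evaluations.
function clampScore(score: number): number {
  if (Number.isNaN(score)) {
    throw new Error("score must be a number");
  }
  return Math.min(1, Math.max(0, score));
}

// Reduce a score to a binary verdict: true when it meets the threshold.
function toBinary(score: number, threshold = 0.5): boolean {
  return clampScore(score) >= threshold;
}

// An arrow function is used so that map's index argument is not
// mistaken for the threshold.
const verdicts = [0.2, 0.5, 0.9].map((s) => toBinary(s));
console.log(verdicts);
```

The threshold is a design choice: a stricter cut-off trades more false failures for fewer false passes.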

Installation

Install evalz together with openai, zod, and @instructor-ai/instructor using your preferred package manager:

npm install evalz openai zod @instructor-ai/instructor

bun add evalz openai zod @instructor-ai/instructor

pnpm add evalz openai zod @instructor-ai/instructor

Basic Usage

Creating an Evaluator

First, create an evaluator for assessing a single aspect of a response, such as its relevance:

import { createEvaluator } from "evalz";
import OpenAI from "openai";

// Credentials are read from environment variables.
const oai = new OpenAI({
  apiKey: process.env["OPENAI_API_KEY"],
  organization: process.env["OPENAI_ORG_ID"]
});

// Returns an evaluator that asks the model to rate relevance on a 0-to-1 scale.
function relevanceEval() {
  return createEvaluator({
    client: oai,
    model: "gpt-4",
    evaluationDescription: "Rate the relevance from 0 to 1."
  });
}
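
Evaluators for other criteria, such as fluency or completeness, can follow the same pattern. The sketch below only builds the option objects, so it runs without evalz or network access. It assumes evaluators differ solely in their `evaluationDescription`; the `EvaluatorOptions` type, the `optionsFor` helper, and the description wording are hypothetical, and the real option type in evalz may have more fields.

```typescript
// Hypothetical sketch: EvaluatorOptions mirrors only the two string fields
// shown in the example above (model, evaluationDescription).
type EvaluatorOptions = { model: string; evaluationDescription: string };

type Criterion = "relevance" | "fluency" | "completeness";

// Build the options for one criterion; the description wording is an assumption.
function optionsFor(criterion: Criterion, model = "gpt-4"): EvaluatorOptions {
  return {
    model,
    evaluationDescription: `Rate the ${criterion} from 0 to 1.`
  };
}

// One options object per criterion, keyed by name.
const optionsByCriterion: Record<Criterion, EvaluatorOptions> = {
  relevance: optionsFor("relevance"),
  fluency: optionsFor("fluency"),
  completeness: optionsFor("completeness")
};

console.log(optionsByCriterion.fluency.evaluationDescription);
```

Each entry could then be passed to createEvaluator along with the client, for example `createEvaluator({ client: oai, ...optionsByCriterion.fluency })`, assuming createEvaluator accepts the options shown in the example above.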

Conducting an Evaluation

Evaluate AI-generated content by passing the response data to your evaluator:

const evaluator = relevanceEval();

// yourResponseData is a placeholder for the response data you want graded.
const result = await evaluator({ data: yourResponseData });
console.log(result.scoreResults);

Weighted Evaluation

Combine multiple evaluators with specified weights for a comprehensive assessment:

import { createWeightedEvaluator } from "evalz";

// fluencyEval and completenessEval are assumed to be defined
// in the same way as relevanceEval.
const weightedEvaluator = createWeightedEvaluator({
  evaluators: {
    relevance: relevanceEval(),
    fluency: fluencyEval(),
    completeness: completenessEval()
  },
  // Each weight is keyed by the name of its evaluator; these sum to 1.
  weights: {
    relevance: 0.25,
    fluency: 0.25,
    completeness: 0.5
  }
});

const result = await weightedEvaluator({ data: yourResponseData });
console.log(result.scoreResults);
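
To make the idea of a composite score concrete, here is a minimal sketch of a weighted average over per-evaluator scores. It is an illustration only, assuming the composite is a weighted mean; the `compositeScore` helper and the sample scores are hypothetical, and evalz may compute its result differently.

```typescript
// Hypothetical helper for illustration only; not part of the evalz API.
type NamedNumbers = Record<string, number>;

function compositeScore(scores: NamedNumbers, weights: NamedNumbers): number {
  let totalWeight = 0;
  let weightedSum = 0;

  for (const [name, weight] of Object.entries(weights)) {
    const score = scores[name];
    if (score === undefined) {
      throw new Error(`missing score for "${name}"`);
    }
    totalWeight += weight;
    weightedSum += score * weight;
  }

  if (totalWeight <= 0) {
    throw new Error("weights must sum to a positive number");
  }

  // Dividing by the total weight keeps the result on the same scale as the
  // inputs even when the weights do not sum to exactly 1.
  return weightedSum / totalWeight;
}

// Sample scores are made up; the weights match the example above.
const composite = compositeScore(
  { relevance: 0.8, fluency: 0.6, completeness: 0.9 },
  { relevance: 0.25, fluency: 0.25, completeness: 0.5 }
);
console.log(composite.toFixed(2));
```

With these inputs the composite is approximately 0.8, and completeness contributes the most because it carries half of the total weight.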

Contributing

Contributions are welcome! Please submit a pull request or open an issue to propose changes or additions.

0.0.1 • 15 days ago
0.0.2 • 15 days ago
0.0.1--alpha.5 • 1 month ago
0.0.1--alpha.4 • 3 months ago
0.0.1--alpha.3 • 3 months ago
0.0.1--alpha.2 • 3 months ago
0.0.1--alpha.1 • 3 months ago