evalz v0.0.2
evalz
evalz is a TypeScript package for model-graded evaluations with structured output. You define evaluators with Zod schemas and use them to assess AI-generated responses against custom criteria such as relevance, fluency, and completeness. Under the hood, the package uses OpenAI and Instructor js (@instructor-ai/instructor) to run the structured evaluations, and it offers both simple and weighted evaluation mechanisms.
Features
- Structured Evaluation Models: Define your evaluation logic using Zod schemas to ensure data integrity throughout your application.
- Flexible Evaluation Strategies: Supports various evaluation strategies, including score-based and binary evaluations, with customizable evaluators.
- Easy Integration: Designed to integrate seamlessly with existing TypeScript projects, enhancing AI and data processing workflows with minimal setup.
- Custom Evaluations: Define evaluation criteria tailored to your specific requirements.
- Weighted Evaluations: Combine multiple evaluations with custom weights to calculate a composite score.
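To make the difference between score-based and binary evaluations concrete, here is a minimal sketch in plain TypeScript. The types and helpers (ScoreResult, BinaryResult, toNumeric, toBinary) and the 0.5 threshold are hypothetical and exist only for illustration; they are not evalz exports and do not describe how evalz represents results internally.

```typescript
// Hypothetical types for illustration; these are NOT evalz exports.
type ScoreResult = { kind: "score"; value: number }; // value expected in [0, 1]
type BinaryResult = { kind: "binary"; passed: boolean };
type EvalResult = ScoreResult | BinaryResult;

// Map either kind of result onto a number in [0, 1] so results can be compared.
function toNumeric(result: EvalResult): number {
  if (result.kind === "binary") {
    return result.passed ? 1 : 0;
  }
  // Clamp out-of-range scores instead of trusting the grader blindly.
  return Math.min(1, Math.max(0, result.value));
}

// Turn a graded score into a pass/fail decision using an assumed threshold.
function toBinary(result: ScoreResult, threshold = 0.5): BinaryResult {
  return { kind: "binary", passed: result.value >= threshold };
}

console.log(toNumeric({ kind: "binary", passed: true }));
console.log(toBinary({ kind: "score", value: 0.72 }).passed);
```

A score-based evaluation keeps more information, while a binary one is easier to act on; which fits better depends on the criterion being graded.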
Installation
Install evalz using your preferred package manager:
npm install evalz openai zod @instructor-ai/instructor
bun add evalz openai zod @instructor-ai/instructor
pnpm add evalz openai zod @instructor-ai/instructor
Basic Usage
Creating an Evaluator
First, create an evaluator for assessing a single aspect of a response, such as its relevance:
import { createEvaluator } from "evalz";
import OpenAI from "openai";
const oai = new OpenAI({
  apiKey: process.env["OPENAI_API_KEY"],
  organization: process.env["OPENAI_ORG_ID"]
});

function relevanceEval() {
  return createEvaluator({
    client: oai,
    model: "gpt-4",
    evaluationDescription: "Rate the relevance from 0 to 1."
  });
}
Conducting an Evaluation
Evaluate AI-generated content by passing the response data to your evaluator:
const evaluator = relevanceEval();
const result = await evaluator({ data: yourResponseData });
console.log(result.scoreResults);
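The snippet above does not show what yourResponseData looks like. As a rough sketch only, the example below assumes it is an array of prompt/completion pairs; the EvalItem interface and the buildEvalData helper are hypothetical names introduced here, and the actual shape should be checked against the evalz documentation.

```typescript
// ASSUMED item shape; verify against the evalz documentation before relying on it.
interface EvalItem {
  prompt: string;
  completion: string;
}

// Hypothetical helper: builds the data array and drops pairs with empty text,
// so the grader is not asked to score blank responses.
function buildEvalData(pairs: Array<[string, string]>): EvalItem[] {
  return pairs
    .map(([prompt, completion]) => ({
      prompt: prompt.trim(),
      completion: completion.trim()
    }))
    .filter((item) => item.prompt.length > 0 && item.completion.length > 0);
}

const yourResponseData = buildEvalData([
  ["What is the capital of France?", "Paris is the capital of France."],
  ["Summarize the article.", "   "] // dropped: the completion is blank
]);

console.log(yourResponseData.length);
```

Filtering before evaluation avoids spending model calls on inputs that cannot be graded meaningfully.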
Weighted Evaluation
Combine multiple evaluators with specified weights for a comprehensive assessment:
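import { createWeightedEvaluator } from "evalz";

// fluencyEval and completenessEval are assumed to be defined the same way as
// relevanceEval above, each with its own evaluationDescription.
const weightedEvaluator = createWeightedEvaluator({
  evaluators: {
    relevance: relevanceEval(),
    fluency: fluencyEval(),
    completeness: completenessEval()
  },
  // Each weight is keyed by the name of the evaluator it applies to.
  weights: {
    relevance: 0.25,
    fluency: 0.25,
    completeness: 0.5
  }
});

const weightedResult = await weightedEvaluator({ data: yourResponseData });
console.log(weightedResult.scoreResults);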
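For intuition about how a composite score can be derived from weights like these, here is a minimal sketch of a weighted average in plain TypeScript. The compositeScore function and the sample scores are hypothetical; this is not evalz's internal implementation, and dividing by the total weight is an assumption made here so the result stays in range even when the weights do not sum to 1.

```typescript
// Hypothetical helper for illustration; NOT part of evalz.
type NumberMap = Record<string, number>;

function compositeScore(scores: NumberMap, weights: NumberMap): number {
  const entries = Object.entries(weights);
  const totalWeight = entries.reduce((sum, [, weight]) => sum + weight, 0);
  if (totalWeight <= 0) {
    throw new Error("weights must sum to a positive number");
  }

  let weightedSum = 0;
  for (const [name, weight] of entries) {
    const score = scores[name];
    if (score === undefined) {
      throw new Error(`missing score for "${name}"`);
    }
    weightedSum += score * weight;
  }

  // Assumption: normalize by the total weight rather than requiring a sum of 1.
  return weightedSum / totalWeight;
}

// Made-up scores, combined with the weights used in the example above.
const composite = compositeScore(
  { relevance: 0.8, fluency: 0.6, completeness: 0.9 },
  { relevance: 0.25, fluency: 0.25, completeness: 0.5 }
);

console.log(composite.toFixed(2)); // prints "0.80"
```

With these weights, completeness contributes half of the final score, so a weak completeness rating lowers the composite twice as much as an equally weak relevance or fluency rating.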
Contributing
Contributions are welcome! Please submit a pull request or open an issue to propose changes or additions.