
Rehearsal

Prompt evaluation and regression testing

Modifying a prompt, even in the smallest details, can have a big impact on the output. Rehearsal makes it easy to run various tests and evaluations against LLM output. Use cases for Rehearsal include:

  • regression testing
  • QA
  • helping with prompt iteration

Installation

yarn add -D llm-rehearsal

Usage

One important aspect of Rehearsal is that it's completely agnostic of what's used to generate the text. Simply provide an async function that returns a {text: "llm response"} object:

import { rehearsal, expectations } from 'llm-rehearsal';

const { includesString } = expectations;

// Provide an LLM function
const { testCase, run } = rehearsal(async (input: { country: string }) => {
  // your custom code to call LLM here
  const textResponse = await callLLM({
    prompt: `What is the capital of ${input.country}?`,
  });
  return { text: textResponse }; // only requirement is to return llm response in `text` property
});

// Define test cases
testCase('France', {
  input: { country: 'France' },
  expect: [includesString('paris')],
});
testCase('Germany', {
  input: { country: 'Germany' },
  expect: [includesString('berlin')],
});

// Start test suite
run();

To run the tests, don't forget to call run() at the end and execute your file (with plain node for JS or ts-node for TS).
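
For example, assuming your test cases live in a hypothetical file named capitals.test.ts that ends with a call to run(), the suite can be started with:

npx ts-node capitals.test.ts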

Expectations for all test cases

To run expectations on all test cases, use expectForAll():

const { includesString, not } = expectations;

const { testCase, run, expectForAll } = rehearsal(
  async (input: { country: string }) => {
    // your custom code to call LLM here
    const textResponse = await callLLM({
      prompt: `What is the capital of ${input.country}?`,
    });
    return { text: textResponse }; // only requirement is to return llm response in `text` property
  },
);

// This expectation will be run for every testCase
expectForAll([not(includesString('as a large language model'))]);

Mixing expectations

Expectations can be composed with boolean logic:

import { rehearsal, expectations } from 'llm-rehearsal';
const { includesString, not, and, or } = expectations;

const { testCase } = rehearsal(llmFunction);

testCase("don't say yellow", {
  input: {
    /* input variables */
  },
  expect: [not(includesString('yellow'))],
});

testCase('potato/tomato', {
  input: {
    /* input variables */
  },
  expect: [or(includesString('potato'), includesString('tomato'))],
});

testCase('the cake is a lie', {
  input: {
    /* input variables */
  },
  expect: [and(includesString('cake'), includesString('lie'))],
});

Built-in expectations

  • includesString - checks if the LLM response contains a given string
  • matchesRegex - checks if the LLM response matches a given regular expression (see the sketch after this list)
  • not - negates an expectation
  • and - compose multiple expectations with AND logic
  • or - compose multiple expectations with OR logic
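
For instance, here is a minimal sketch of matchesRegex in a test case (assuming matchesRegex is destructured from expectations like the other built-ins; the test name and input are placeholders):

testCase('starts with a greeting', {
  input: {
    /* input variables */
  },
  expect: [matchesRegex(/^hello/i)],
});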

Coming soon:

  • includesWord - check for separate words, not just substrings
  • askGPT - perform evaluation through a GPT prompt

Custom expectations

Custom expectations can easily be created:

import { createExpectation } from 'llm-rehearsal';

const { isLongerThan } = createExpectation(
  'isLongerThan',
  (count: number) => (output) => {
    return output.text.length > count
      ? { pass: true }
      : {
          pass: false,
          message: `Expected output text to be > ${count} characters, but instead is ${output.text.length}`,
        };
  },
);

// use it like the built-in expectations
testCase('long output', {
  input: {
    /* input variables */
  },
  expect: [isLongerThan(9000)],
});

// custom expectations can also be composed with boolean logic:
testCase('long output with sandwich in it', {
  input: {
    /* input variables */
  },
  expect: [and(isLongerThan(9000), includesString('sandwich'))],
});

If your function returns more than just text (such as metadata or results of intermediate steps), you can create type-safe expectations:

import { rehearsal, expectations } from 'llm-rehearsal';

// notice that `createExpectation` is returned by the rehearsal() function,
// and is typed according to the input/output of the LLM function
const { testCase, createExpectation } = rehearsal(
  async (input: { country: string }) => {
    // your custom code to call LLM here
    const { textResponse, documents } = await callLLMChain({
      prompt: `What is the capital of ${input.country}?`,
    });
    return { text: textResponse, documents }; // we return more than just `text`
  },
);

const { usesDocuments } = createExpectation('usesDocuments', () => (output) => {
  return output.documents.length > 0 // output is properly typed
    ? { pass: true }
    : { pass: false, message: 'Expected documents to be returned, found none' };
});
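
Once defined, the custom expectation is used like any other (a minimal usage sketch; the test name is a placeholder):

testCase('uses retrieved documents', {
  input: { country: 'France' },
  expect: [usesDocuments()],
});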

Labels for expectations

To make test results more readable, expectations can be given a label:

testCase('my test case', {
  input: {},
  expect: [
    [includesString('banana'), 'include banana'],
    [matchesRegex(/^hello/), 'starts with "hello"'],
    // also works with composed expectations:
    [
      not(
        or(
          includesString('hamburger'),
          includesString('fries'),
          includesString('hotdog'),
          includesString('chicken nuggets'),
          includesString('burritos'),
        ),
      ),
      'no fastfood',
    ],
  ],
});

Describe

Just like most testing libraries, you can group test cases using describe:

import { rehearsal, expectations, describe } from 'llm-rehearsal';

const { includesString } = expectations;
const { testCase, run } = rehearsal(async (input: { country: string }) => {
  // your custom code to call LLM here
  const textResponse = await callLLM({
    prompt: `What is the capital of ${input.country}?`,
  });
  return { text: textResponse };
});

describe('Countries', () => {
  testCase('France', {
    input: { country: 'France' },
    expect: [includesString('paris')],
  });
  testCase('Germany', {
    input: { country: 'Germany' },
    expect: [includesString('berlin')],
  });
});

Note: describe does not support only yet; this should be supported in the future.

Only

To isolate a test case and run only that one (or a select few), use testCase.only:

testCase('France', {
  input: { country: 'France' },
  expect: [includesString('paris')],
});
testCase.only('Germany', {
  input: { country: 'Germany' },
  expect: [includesString('berlin')],
});

This will only run the Germany test case. Multiple test cases can be marked "only" to run a selected set.

Local development

To install a local build of Rehearsal, the recommended method is to use Yalc. Make sure to install yalc globally.

  1. Build the library: yarn build
  2. Publish to the yalc local store (does not leave your computer): yarn publish-local
  3. On the consuming side (the NodeJS project where you want to install Rehearsal): yalc install llm-rehearsal

Note
Keep in mind that Yalc will copy the package to the store, and then copy it again when installed on the consuming side. After a new build, you'll need to run yarn publish-local in this repository and also yalc update on the consuming side.
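
Putting the note above into commands, the typical iteration loop after a change looks like this:

# in this repository
yarn build
yarn publish-local

# in the consuming project
yalc update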
