@vedmalex/ai-connect

Licence: MIT
Version: 0.9.0
Deps: 0
Size: 5.4 MB
Vulns: 0
Weekly: 0

ai-connect

ai-connect is a Bun-first TypeScript library for unified access to AI providers from browser and local runtimes.

It models routes as provider + transport + account + credential + model, so one client can combine:

  • direct APIs for OpenAI, Anthropic, and Gemini
  • local-only ACP harness routes for Claude Code and Codex
  • key rotation, account rotation, cooldowns, retries, and fallback chains
  • portable file, PDF document, and image inputs across direct API and ACP paths
  • cooperative cancellation, pause-with-partial, and per-operation timeouts
  • incremental streaming deltas, live health checks, and read-only model probes
  • context-window resolution, client-safe route projection, and fan-out throttling

Status

Implemented today:

  • OpenAI API, Anthropic API, Gemini API
  • Claude Code ACP, Codex ACP
  • agy CLI, pi CLI, Claude/OpenClaude CLI, Codex CLI
  • OpenCode Server
  • browser and local client factories
  • env-backed key pools with delimiter-based rotation
  • image generation helpers for OpenAI and Gemini
  • text, image, and PDF document attachment inlining for direct API prompts
  • portable file normalization for paths, File, Blob, data URLs, and remote URIs
  • cancellation, pause, and per-operation timeouts on generate() and stream()
  • incremental streaming deltas for OpenAI SSE and ACP routes
  • live two-stage health checks and read-only broken-model probes
  • context-window resolution with a browser-safe curated model reference
  • client-safe route/candidate projection for untrusted UI/agent surfaces
  • per-route model allowlist modes, unknown-selector degrade policy, and a pluggable model selector
  • client-side fan-out throttling (concurrency, rate, lifetime call ceiling)

Built-In Provider Scope

Built-in HTTP handlers exist for:

  • openai
  • anthropic
  • gemini

gemini is the canonical provider id for the Google Gemini stack.

google is not a supported provider id. Use gemini.

Custom provider ids are accepted by config normalization, but they are not automatically backed by built-in HTTP handlers.

Use these rules:

  • for OpenAI-compatible APIs such as OpenRouter, keep provider: "openai" and override transport.baseUrl
  • for Anthropic-compatible APIs, keep provider: "anthropic" and override transport.baseUrl
  • for Gemini-compatible APIs, keep provider: "gemini" and override transport.baseUrl
  • use a truly custom provider id only when you also supply a custom handler, or when the route uses cli, acp, or server
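
As a hedged illustration of the first rule, the fragment below keeps the provider id openai and points the route at OpenRouter. The object form of transport with kind: "api" and baseUrl is an assumption extrapolated from the transport.baseUrl wording above and from the { kind, id } form used for ACP routes; the account id, model id, and env var name are placeholders.

```typescript
// Plain-object sketch; in real code this would be passed to defineConfig().
// Assumption: the api transport accepts an object form carrying `baseUrl`.
const openRouterConfig = {
  providers: {
    openai: {
      accounts: [
        {
          id: "openrouter", // placeholder account id
          transport: { kind: "api", baseUrl: "https://openrouter.ai/api/v1" },
          models: ["openai/gpt-4.1"], // placeholder model id
          credentials: [{ apiKeyEnv: "OPENROUTER_API_KEY" }], // placeholder env var
        },
      ],
    },
  },
};
```

The provider id stays openai so the built-in OpenAI HTTP handler is used; only the base URL changes.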

Install

This package is published to the npm registry as the public scoped package @vedmalex/ai-connect.

npm install @vedmalex/ai-connect

With Bun:

bun add @vedmalex/ai-connect

You can still consume it directly from GitHub if you need a specific unpublished commit:

{
  "dependencies": {
    "@vedmalex/ai-connect": "git+ssh://git@github.com/vedmalex/ai-connect.git#<commit-sha>"
  }
}

If your consumer uses Bun against the GitHub form, also add:

{
  "trustedDependencies": [
    "@vedmalex/ai-connect"
  ]
}

Full integration notes:

To build the workspace locally from source:

bun install

Reference Demos

This repository includes two full reference applications in monorepo form: a web demo and a local demo.

They are intended as copyable blueprints for real products. The web demo exposes settings through explicit windows and form controls. The local demo exposes settings through JSONC and a TUI workflow.

Run them from the repository root:

bun run dev:web-demo
bun run dev:local-demo

The shared contract used by both demos lives in:

Full demo guide:

Quick Start

import { createBrowserClient, defineConfig } from "@vedmalex/ai-connect/browser";

const client = createBrowserClient(
  defineConfig({
    providers: {
      openai: {
        accounts: [
          {
            id: "main",
            transport: "api",
            models: ["gpt-4.1"],
            credentials: [{ apiKeyEnv: "OPENAI_API_KEY" }],
          },
        ],
      },
    },
  }),
  {
    runtime: {
      getEnv: (name) => import.meta.env[name],
    },
  },
);

const result = await client.generate({
  messages: [{ role: "user", content: "Summarize this design brief." }],
});

console.log(result.text);

For built-in API providers, transport.baseUrl is optional. If you omit it, ai-connect uses the official upstream defaults:

  • openai -> https://api.openai.com/v1/...
  • anthropic -> https://api.anthropic.com/v1/...
  • gemini -> https://generativelanguage.googleapis.com/v1beta/...

Local ACP Example

import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";

const client = createLocalClient(
  defineConfig({
    providers: {
      anthropic: {
        accounts: [
          {
            id: "subscription",
            transport: {
              kind: "acp",
              id: "claude-code-acp",
            },
            models: ["claude-sonnet-4"],
          },
        ],
      },
    },
  }),
  {
    acp: {
      permissionMode: "approve-reads",
      commands: {
        "anthropic:claude-code-acp":
          "npx -y @agentclientprotocol/claude-agent-acp@^0.25.0",
      },
    },
  },
);

const result = await client.generate({
  messages: [{ role: "user", content: "Review this repository layout." }],
});

console.log(result.text);

If the host application is running in one folder but the inference should use another project folder as local context, pass workingDirectory per request:

const result = await client.generate({
  workingDirectory: "/Users/vedmalex/work/scancheck-target",
  messages: [{ role: "user", content: "Review this repository layout." }],
});

ACP Model Selection and Headless Harness-Noise Suppression

For headless / batch prompts (e.g. per-document extraction) the ACP transport drives an interactive coding agent. Two behaviours make that robust by default:

  • Model via the protocol. The route's model (routeHints.model ?? account model) is selected through a session/set_model call after session/new — you do not need to inject an ANTHROPIC_MODEL env var, and there is no env-driven "model switched" announcement leaking into the output. The call is sent only when the agent advertises a model catalog, the requested model is in its availableModels, and it differs from the current model; a model the agent does not advertise is surfaced as a warning (it is not silently replaced by the agent default).
  • Harness-noise suppression + guard. Known interactive-harness marker lines (a model-switch announcement, a Готов к работе ("Ready to work") / Жду … ("Waiting for …") idle greeting, <local-command-caveat> commentary) are filtered out of the answer text on both the generate and the streaming (delta) paths. If a turn yields only such harness chatter and no task output, it is surfaced as temporary_unavailable so the consumer can retry or fall back rather than receive the greeting as a successful generation.

Both behaviours are on by default and are controlled by three acp client options, because suppression and the guard toggle separately:

const client = createLocalClient(config, {
  acp: {
    selectModel: true,            // session/set_model from the route model (default true)
    suppressHarnessNoise: true,   // filter harness marker lines from text + deltas (default true)
    failOnHarnessOnlyTurn: true,  // harness-only turn → temporary_unavailable (default true)
  },
});
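
The model-selection rule described above can be sketched as a small decision function. This is an illustration under assumptions, not the library's code: ModelCatalog, SelectDecision, and decideModelSelection are hypothetical names, and the catalog shape is inferred from the availableModels wording.

```typescript
// Hypothetical shape of what an agent advertises after session/new.
type ModelCatalog = { availableModels: string[]; currentModel: string };

type SelectDecision =
  | { action: "set_model"; model: string }
  | { action: "skip" }
  | { action: "warn"; message: string };

function decideModelSelection(requested: string, catalog?: ModelCatalog): SelectDecision {
  // No advertised catalog: session/set_model is not sent.
  if (!catalog) return { action: "skip" };
  // Requested model is not advertised: warn instead of silently using the agent default.
  if (!catalog.availableModels.includes(requested)) {
    return { action: "warn", message: `model "${requested}" is not advertised by the agent` };
  }
  // Already on the requested model: nothing to do.
  if (catalog.currentModel === requested) return { action: "skip" };
  return { action: "set_model", model: requested };
}
```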

Limitation: the harness-noise filter / guard recognises a curated, locale-specific marker set (Claude harness, RU greetings). A reworded or other-locale greeting that still carries non-marker text is not classified as harness-only. The model-switch root cause is removed independently by selectModel.
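
A minimal sketch of the filter and the harness-only guard follows. The marker patterns are hypothetical stand-ins based on the examples named above (the real curated set lives in the library), and filterHarnessNoise is an illustrative name.

```typescript
// Hypothetical marker patterns; the library's curated set may differ.
const HARNESS_MARKERS: RegExp[] = [
  /^Готов к работе/, // idle greeting ("Ready to work")
  /^Жду /, // idle greeting ("Waiting for ...")
  /<local-command-caveat>/, // harness commentary
];

function isHarnessLine(line: string): boolean {
  return HARNESS_MARKERS.some((re) => re.test(line.trim()));
}

function filterHarnessNoise(text: string): { text: string; harnessOnly: boolean } {
  const lines = text.split("\n");
  const kept = lines.filter((line) => !isHarnessLine(line));
  const cleaned = kept.join("\n").trim();
  // Harness-only: at least one marker was removed and nothing else remains.
  const sawMarker = kept.length < lines.length;
  return { text: cleaned, harnessOnly: sawMarker && cleaned.length === 0 };
}
```

A greeting in another locale matches no pattern, so it passes through unchanged and the turn is not classified as harness-only, which is the limitation noted above.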

Dedicated provider-specific ACP examples:

Local CLI And Server Presets

Built-in local transport presets are available both as catalog entries and as exported preset metadata:

import {
  AI_CONNECT_DEFAULT_CLI_PRESETS,
  AI_CONNECT_DEFAULT_SERVER_PRESETS,
  getTextTransportPresetById,
  listTextProviderCatalog,
} from "@vedmalex/ai-connect";

const localCatalog = listTextProviderCatalog({ runtime: "local" });
const codexCli = getTextTransportPresetById("openai", "codex-cli");
const opencodeServer = AI_CONNECT_DEFAULT_SERVER_PRESETS.opencode;

For built-in CLI routes the shortest form is still the route id:

transport: {
  kind: "cli",
  id: "codex-cli",
}

If you want a custom route id but still want the built-in argv/parser/command defaults, set transport.cli.preset explicitly:

transport: {
  kind: "cli",
  id: "my-codex-wrapper",
  cli: {
    preset: "codex",
  },
}

CLI command resolution order is:

  1. transport.command
  2. createLocalClient(..., { cli: { commands } })
  3. transport.cli.preset
  4. built-in command mapped from provider + transport.id

Known local presets now include:

  • openai:codex-cli
  • anthropic:claude-cli
  • openclaude:openclaude-cli
  • pi:pi-cli
  • anthropic:claude-code-acp
  • opencode:opencode-server
  • opencode:opencode-acp

Custom CLI Providers

Custom CLI providers can be connected by describing the argv template and the parser:

import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";

const client = createLocalClient(
  defineConfig({
    providers: {
      customcli: {
        accounts: [
          {
            id: "my-cli",
            transport: {
              kind: "cli",
              id: "my-company-cli",
              command: "my-agent",
              cli: {
                argsTemplate: [
                  "run",
                  "--prompt",
                  "{prompt}",
                  "--model",
                  "{model}",
                  "--format",
                  "json",
                ],
                parser: {
                  kind: "json",
                  textPath: "payload.message",
                  usagePath: "metrics",
                  errorPath: "error.message",
                },
              },
            },
            models: ["my-model-v1"],
          },
        ],
      },
    },
  }),
);

The parser supports three kinds:

  • kind: "json" — parse stdout as a single JSON object; read the answer from textPath (plus optional usagePath / errorPath).

  • kind: "jsonl" — parse stdout as newline-delimited JSON; select the answer/usage/error lines with { path, wherePath, whereEquals } selectors.

  • kind: "text" — treat stdout as raw plain text and return it as result.text. For print-mode coding-agent CLIs that emit plain text, not JSON (no --output-format json flag, no ACP mode). Options:

    • trim (default true) — trim leading/trailing whitespace.
    • stripAnsi (default false) — strip ANSI escape sequences (spinner/color noise) before returning.

    Print-mode plain text carries no token information, so result.usage is absent (none is fabricated). An empty stdout on a non-zero exit still rejects with temporary_unavailable.
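
To make the json and text parser kinds concrete, here is a hedged sketch of both. It is not the library's parser: readPath, parseJsonStdout, and parseTextStdout are hypothetical names, dotted paths are assumed to address plain object keys only, and the ANSI pattern covers CSI sequences rather than the full escape grammar.

```typescript
// Read a dotted path such as "payload.message" from a parsed JSON value.
function readPath(value: unknown, path: string): unknown {
  let current: unknown = value;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// kind: "json" -> stdout is one JSON object; throws if stdout is not valid JSON.
function parseJsonStdout(stdout: string, textPath: string): string | undefined {
  const text = readPath(JSON.parse(stdout), textPath);
  return typeof text === "string" ? text : undefined;
}

// Matches CSI sequences (colors, cursor moves); not a complete ANSI grammar.
const ANSI_CSI = /\u001b\[[0-9;?]*[A-Za-z]/g;

// kind: "text" -> stdout is the answer; trim defaults to true, stripAnsi to false.
function parseTextStdout(stdout: string, opts: { trim?: boolean; stripAnsi?: boolean } = {}): string {
  const { trim = true, stripAnsi = false } = opts;
  const cleaned = stripAnsi ? stdout.replace(ANSI_CSI, "") : stdout;
  return trim ? cleaned.trim() : cleaned;
}
```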

import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";

// A custom print-mode coding-agent CLI ("agy") that writes the answer as raw text.
const client = createLocalClient(
  defineConfig({
    providers: {
      agy: {
        accounts: [
          {
            id: "local",
            transport: {
              kind: "cli",
              id: "agy-cli",
              command: "agy",
              cli: {
                argsTemplate: ["-p", "{prompt}", "--model", "{model}"],
                parser: { kind: "text" }, // raw stdout -> result.text
              },
            },
            models: ["default"],
          },
        ],
      },
    },
  }),
);

pi has a built-in CLI preset (pi-cli). The preset supplies the default command (pi), argsTemplate (["--print","--model","{model}","{prompt}"]), parser (kind: "text"), and discovery (via: "none"). The minimal config is therefore:

import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";

const client = createLocalClient(
  defineConfig({
    providers: {
      pi: {
        accounts: [
          {
            id: "local",
            transport: {
              kind: "cli",
              id: "pi-cli", // selects the built-in pi-cli preset
            },
            models: ["gemini-3.1-pro-low"],
          },
        ],
      },
    },
  }),
);

Supported placeholders in argsTemplate:

  • {prompt}
  • {model}
  • {output_file}

{output_file} is useful for CLIs like codex exec that stream JSONL to stdout but write the final assistant message to a file.
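
Placeholder expansion can be sketched as a per-token substitution. The function below is illustrative only: expandArgsTemplate and TemplateValues are hypothetical names, and leaving an unresolved placeholder untouched is an assumption rather than documented behavior.

```typescript
type TemplateValues = { prompt: string; model: string; output_file?: string };

// Substitute {prompt}, {model}, and {output_file} inside each argv token.
function expandArgsTemplate(template: string[], values: TemplateValues): string[] {
  return template.map((token) =>
    token.replace(/\{(prompt|model|output_file)\}/g, (match: string, name: keyof TemplateValues) => {
      const value = values[name];
      // Assumption: a placeholder with no supplied value is left as-is.
      return value === undefined ? match : value;
    }),
  );
}
```

Using a replacer function keeps characters such as `$` in the prompt literal, and each template token stays a single argv element, so a prompt containing spaces does not split into several arguments.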

CLI Model Discovery

A cli route exposes the same management interfaces as other providers: discoverModels / checkHealth / probeModels / listCandidateModels. The discovery source is resolved from transport.cli.discovery.via; when via is omitted it is chosen by the chain command → acp → static → none:

  • command — run a configured CLI sub-command that lists models and parse its stdout. Reuses the same json | jsonl | text formats; model fields are mapped with a models selector. Falls back to the static source on empty/failed output (fallback: "static" by default when models[] is present; set fallback: "none" to fail loud):

    transport: {
      kind: "cli",
      command: "agy",
      cli: {
        argsTemplate: ["-p", "{prompt}", "--model", "{model}"],
        parser: { kind: "text" },
        discovery: {
          command: {
            argsTemplate: ["models", "list", "--json"],
            parser: { kind: "json" },
            models: { path: "data", idPath: "id", namePath: "display_name", contextLengthPath: "context_window" },
          },
        },
      },
    },
    models: ["agy-pro", { id: "agy-fast", contextWindow: 200_000 }],

  • acp — delegate discovery to an ACP sidecar (the default for the built-in coding-agent presets).

  • static — build the catalog from the account's configured models[] (+ contextWindow). This is the default for a preset-less custom CLI that declares models[] (it previously reported not_supported); opt out with discovery: { via: "none" }.

  • none — no discovery.

Configured context window (GAP-A). A model entry's contextLength is filled from the route's configured contextWindow only when discovery did not already report one; the fill is monotonic, so a live discovered value always wins. Provenance is exposed on the typed field ModelInfo.contextWindowSource: "discovered" (read from a live API or CLI list-command record), "configured" (filled from the route's contextWindow), or undefined (unknown). A consumer that maps catalog.contextLength into the discovered slot of resolveModelContextWindow should do so only when contextWindowSource === "discovered". Treat "configured" as the configured input and undefined as unknown, and never promote either to the discovered slot; otherwise the precedence discovered > reference > configured > default no longer reflects where a value came from. (metadata.contextWindowSource: "configured" is retained as a back-compat alias on the configured-fill path.)
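
The provenance rule can be sketched as a small mapping from a catalog entry to resolver inputs. CatalogEntry, ResolverInput, and toResolverInput are hypothetical names for illustration; only the field names contextLength and contextWindowSource come from the text above.

```typescript
// Subset of a catalog entry; field names follow the paragraph above.
type CatalogEntry = {
  id: string;
  contextLength?: number;
  contextWindowSource?: "discovered" | "configured";
};

// Slots a consumer would pass on to resolveModelContextWindow.
type ResolverInput = { discovered?: number; configured?: number };

function toResolverInput(entry: CatalogEntry): ResolverInput {
  if (entry.contextLength === undefined) return {};
  switch (entry.contextWindowSource) {
    case "discovered":
      return { discovered: entry.contextLength };
    case "configured":
      return { configured: entry.contextLength };
    default:
      // Unknown provenance: do not promote the value to the discovered slot.
      return {};
  }
}
```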

Discovery diagnostics. A model-discovery route report (and its catalog) may carry a warnings: string[]. In particular, a cli discovery.via: "command" route whose list command fails/times out/returns nothing and falls back to its static models[] catalog records a warning there, so a degraded fallback is distinguishable from a healthy static catalog.

Current local transport scope:

  • cli: text generation, plus model discovery via a list command, a static config catalog, or an ACP sidecar
  • server: text generation plus provider-native model discovery

CLI File and Image Input

CLI routes can stage local attachment files into a temp directory and pass them to the subprocess as argv tokens or inline prompt references.

Client-level staging is configured under cli.staging:

const client = createLocalClient(config, {
  cli: {
    staging: {
      dir: "/tmp/my-staging",  // default: os.tmpdir()
      prefix: "ai-connect-",   // temp dir name prefix
      keep: true,               // retain per-invocation temp dir (debug only)
    },
  },
});

Staged files are written under <stagingDir>/attachments/ with sanitized basenames (path-traversal safe) and removed after the call completes (unless keep: true). Attachments that carry only a remote URI and no bytes degrade to the raw URI reference.

Per-route file input is declared under transport.cli.fileInput:

transport: {
  kind: "cli",
  id: "my-route",
  cli: {
    fileInput: {
      placement: "args",          // "args" (default) | "prompt"
      perFileArgs: ["@{path}"],   // argv tokens per file; {path} and {name} placeholders
      // mentionTemplate: "@{path}", // prompt placement: inserted per file (default "@{path}")
      // separator: " ",
      categories: ["image", "document", "text", "other"], // accepted categories (default: all)
      stagingDir: "/custom/dir",  // per-route override of cli.staging.dir
    },
  },
}

A {files} placeholder in argsTemplate expands to the full per-file argv block; it records a single telemetry key, not the staged absolute paths.
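
A hedged sketch of how perFileArgs and the {files} placeholder might combine is shown below. The function names and the StagedFile shape are hypothetical, and the sketch assumes {files} appears as a standalone argv token.

```typescript
type StagedFile = { path: string; name: string };

// Expand perFileArgs once per staged file, substituting {path} and {name}.
function expandPerFileArgs(perFileArgs: string[], files: StagedFile[]): string[] {
  const out: string[] = [];
  for (const file of files) {
    for (const token of perFileArgs) {
      out.push(token.replace(/\{(path|name)\}/g, (_match: string, key: "path" | "name") => file[key]));
    }
  }
  return out;
}

// Replace a standalone "{files}" token with the whole per-file argv block.
function expandFilesPlaceholder(template: string[], fileArgs: string[]): string[] {
  const out: string[] = [];
  for (const token of template) {
    if (token === "{files}") out.push(...fileArgs);
    else out.push(token);
  }
  return out;
}
```

With the codex-style perFileArgs of ["--image", "{path}"], two staged images yield four argv tokens, which are then spliced in wherever {files} sits in the template.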

Capability gate. A route advertises supportsImageInput only when its fileInput stages the image category, and supportsFileUpload only when it stages the document category. A route that does not accept a category rejects the attachment with unsupported_capability at routing time, before any subprocess is spawned.
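
The capability gate can be illustrated with a short sketch. The names are hypothetical, a missing fileInput block is assumed to mean no file input at all, and a plain Error stands in for the AiConnectError with code unsupported_capability.

```typescript
type FileCategory = "image" | "document" | "text" | "other";
const ALL_CATEGORIES: FileCategory[] = ["image", "document", "text", "other"];

type FileInputConfig = { categories?: FileCategory[] };

// Derive advertised capabilities from the route's fileInput block.
function routeCapabilities(fileInput?: FileInputConfig) {
  // No fileInput block: nothing is staged. Omitted categories: all are accepted.
  const accepted: FileCategory[] = fileInput ? (fileInput.categories ?? ALL_CATEGORIES) : [];
  return {
    accepted,
    supportsImageInput: accepted.includes("image"),
    supportsFileUpload: accepted.includes("document"),
  };
}

// Reject an attachment category at routing time, before any subprocess would spawn.
function assertAccepted(fileInput: FileInputConfig | undefined, category: FileCategory): void {
  if (!routeCapabilities(fileInput).accepted.includes(category)) {
    throw new Error(`unsupported_capability: ${category}`);
  }
}
```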

Built-in preset behavior:

  • pi preset — accepts attachments from all categories (image, document, text, other). Each file is passed as an @{path} argv token (perFileArgs: ["@{path}"]). Pass an image attachment and pi receives @/tmp/.../attachments/photo.png as an argument. PDFs are staged as-is with no document extraction; pi receives raw bytes — prefer images or text files for reliable results.

    const result = await client.generate({
      messages: [{ role: "user", content: "Describe this image." }],
      attachments: ["/path/to/photo.png"],
      // route is pi:cli:local or routeHints selects pi-cli
    });

  • codex preset — accepts images only (categories: ["image"]). Each image is passed as --image {path} (perFileArgs: ["--image", "{path}"]). A non-image attachment (e.g. a PDF) on a codex route is rejected with unsupported_capability before any spawn.

  • claude / openclaude presets — do not accept CLI file input by default. File input is opt-in: add an explicit transport.cli.fileInput block to your route config.

Mock Gateway

For API-level debugging you can run a local mock backend that behaves like a small OpenAI/Anthropic/Gemini proxy and captures the real finalized wire payloads:

bun run mock-gateway

It prints base URLs for:

  • OpenAI: http://127.0.0.1:8046/v1
  • Anthropic: http://127.0.0.1:8046/v1/messages
  • Gemini: http://127.0.0.1:8046/v1beta/models

The mock backend accepts any API key value and logs each captured request after ai-connect has already normalized it. Set MOCK_GATEWAY_VERBOSE=1 to print full request snapshots instead of only summaries.

To run it as a transparent MITM in front of a real upstream proxy:

MITM_UPSTREAM_ORIGIN=http://127.0.0.1:8045 bun run mock-gateway

In that mode it keeps the same local URLs, forwards requests upstream, and logs:

  • the finalized request payload
  • the upstream response payload
  • per-request total latency and upstream latency

This is useful both for direct API routes and for ACP harnesses that support gateway-style HTTP upstream configuration, because the harness can point at the MITM URL while ai-connect stays attached to the same local endpoint.

Rotation and Fallback

import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";

const client = createLocalClient(
  defineConfig({
    providers: {
      openai: {
        accounts: [
          {
            id: "main",
            transport: "api",
            models: ["gpt-4.1"],
            credentials: [
              {
                id: "pool",
                apiKeyEnv: "OPENAI_API_KEYS",
                apiKeyDelimiter: ",",
              },
            ],
          },
        ],
      },
      anthropic: {
        accounts: [
          {
            id: "subscription",
            transport: { kind: "acp", id: "claude-code-acp" },
            models: ["claude-sonnet-4"],
          },
        ],
      },
    },
    routing: {
      strategy: "round-robin",
      shuffleOnInit: true,
      fallback: {
        on: {
          rate_limit: [
            "rotate-credential",
            "rotate-account",
            "fallback-transport",
            "fallback-provider",
          ],
        },
      },
    },
  }),
);
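
The env-backed pool in the config above (one env var split by apiKeyDelimiter) can be pictured with the sketch below. It is a simplified model, not the library's router: parseKeyPool and KeyRotation are hypothetical names, and the cooldown bookkeeping is an assumption about how a cooled-down key might be skipped.

```typescript
// Split a delimiter-separated env value into a key pool, dropping blanks.
function parseKeyPool(raw: string | undefined, delimiter = ","): string[] {
  if (!raw) return [];
  return raw
    .split(delimiter)
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// Minimal round-robin cursor with per-key cooldowns (timestamps in ms).
class KeyRotation {
  private index = 0;
  private readonly keys: string[];
  private readonly cooldownUntil = new Map<string, number>();

  constructor(keys: string[]) {
    this.keys = keys;
  }

  cooldown(key: string, untilMs: number): void {
    this.cooldownUntil.set(key, untilMs);
  }

  // Returns the next key that is not cooling down, or undefined if all are.
  next(nowMs: number): string | undefined {
    for (let i = 0; i < this.keys.length; i++) {
      const key = this.keys[(this.index + i) % this.keys.length];
      if ((this.cooldownUntil.get(key) ?? 0) <= nowMs) {
        this.index = (this.index + i + 1) % this.keys.length;
        return key;
      }
    }
    return undefined;
  }
}
```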

Route pools accept several selector forms, but the safest form is:

  • provider:transport:account:model
  • or the full concrete route.id

Shorter selectors such as provider:account:model are convenience aliases. If the same account+model exists on multiple transports, the shorter form can match more than one route.
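
The ambiguity of the shorter form is easy to see in a sketch. The matcher below is a simplified illustration with hypothetical names; it assumes route ids follow provider:transport:account:model and that no segment contains a colon.

```typescript
type Route = { provider: string; transport: string; account: string; model: string };

const routeId = (r: Route): string => `${r.provider}:${r.transport}:${r.account}:${r.model}`;

// Match a 4-part (full) or 3-part (provider:account:model) selector against routes.
function matchSelector(selector: string, routes: Route[]): Route[] {
  const parts = selector.split(":");
  if (parts.length === 4) {
    return routes.filter((r) => routeId(r) === selector);
  }
  if (parts.length === 3) {
    const [provider, account, model] = parts;
    return routes.filter((r) => r.provider === provider && r.account === account && r.model === model);
  }
  return [];
}
```

When the same account and model exist on both an api and an acp transport, the three-part selector returns both routes while the four-part form returns exactly one.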

Three error codes are intentionally hard-terminal: they never rotate, retry, or fall back, and they never pollute route health:

  • aborted — the caller cancelled the operation
  • timeout — an operation deadline elapsed
  • fanout_limit — a client-side fan-out ceiling was exhausted

All other normalized error codes (rate_limit, quota_exhausted, temporary_unavailable, etc.) remain eligible for the rotation/retry/fallback chain you configure under routing.fallback.
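
The split between hard-terminal and fallback-eligible codes can be sketched as a lookup that short-circuits before consulting the configured chain. fallbackPlan and its types are hypothetical; the `on` argument mirrors the shape of routing.fallback.on shown earlier.

```typescript
const HARD_TERMINAL = new Set(["aborted", "timeout", "fanout_limit"]);

type FallbackAction =
  | "rotate-credential"
  | "rotate-account"
  | "fallback-transport"
  | "fallback-provider";

// Returns the actions to try, in order, for a normalized error code.
function fallbackPlan(
  code: string,
  on: Partial<Record<string, FallbackAction[]>>,
): FallbackAction[] {
  // Hard-terminal codes never rotate, retry, or fall back.
  if (HARD_TERMINAL.has(code)) return [];
  return on[code] ?? [];
}
```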

Cancellation, Pause, and Timeouts

generate(request, opts?) and stream(request, opts?) accept an optional second argument:

type GenerateCallOptions = {
  signal?: AbortSignal;
  pauseSignal?: AbortSignal;
  timeoutMs?: number;
};

Cancellation with an AbortSignal discards any in-flight partial and throws an AiConnectError with code aborted:

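import { AiConnectError } from "@vedmalex/ai-connect"; // assumed to be exported from the package root

const controller = new AbortController();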
setTimeout(() => controller.abort(), 5_000);

try {
  const result = await client.generate(
    { messages: [{ role: "user", content: "Long task..." }] },
    { signal: controller.signal },
  );
  console.log(result.text);
} catch (error) {
  if (error instanceof AiConnectError && error.code === "aborted") {
    console.log("cancelled");
  }
}

pauseSignal is a separate, cooperative signal. In stream(), firing it stops reading and yields a terminal { type: "paused", result } event that keeps everything accumulated so far:

const pause = new AbortController();

for await (const event of client.stream(
  { messages: [{ role: "user", content: "Stream a draft." }] },
  { pauseSignal: pause.signal },
)) {
  if (event.type === "delta") {
    process.stdout.write(event.text);
  } else if (event.type === "paused") {
    console.log("\npaused with partial:", event.result.text);
  } else if (event.type === "result") {
    console.log("\ndone:", event.result.text);
  }
}

In generate() a mid-call pause degenerates to aborted, because a non-streamed call cannot retain a partial. Abort always throws and discards; pause in stream() is the only way to keep a partial.

timeoutMs overrides the per-operation timeout tier for a single call. A value <= 0 or Infinity disables the timer. When the timer fires, the call throws an AiConnectError with code timeout. You can also set client-wide tier defaults:

const client = createLocalClient(config, {
  timeouts: {
    generateMs: 120_000, // generate / stream (default 120000)
    probeMs: 12_000,     // verify / discover* / checkHealth / probeModels (default 12000)
  },
});
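
How a per-call timeoutMs interacts with a tier default can be sketched as follows. effectiveTimeoutMs is a hypothetical helper, and applying the disable rule to the tier default as well as to the per-call value is an assumption.

```typescript
// Returns the timer duration in ms, or undefined when the timer is disabled.
function effectiveTimeoutMs(
  callTimeoutMs: number | undefined,
  tierDefaultMs: number,
): number | undefined {
  // A per-call value, when present, overrides the tier default.
  const ms = callTimeoutMs ?? tierDefaultMs;
  // <= 0 or Infinity means "no timer".
  if (ms <= 0 || ms === Infinity) return undefined;
  return ms;
}
```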

verify(), discoverModels(), and discoverAcpModels() accept the signal/timeoutMs subset of these options as their own second argument.

Files and Images

The unified request format supports:

  • attachments for text, image, and PDF document prompt inputs
  • image.size and image.rawPrompt for image generation routes
  • portable file inputs:
    • absolute local paths
    • browser File
    • browser Blob
    • data: URLs
    • remote file references with uri or a provider providerFileId

Example:

const result = await client.generate({
  operation: "image",
  messages: [{ role: "user", content: "Create a lotus architecture diagram" }],
  attachments: [
    new File(["project outline"], "brief.md", { type: "text/markdown" }),
  ],
  image: {
    size: "1280x720",
  },
});

console.log(result.attachments);

PDF and Document Input

PDF attachments (application/pdf) now route across the api transport family, not just ACP. Each built-in API handler maps a document attachment to its provider-native content block:

  • anthropic: a document block (base64 inline, Files-API file_id, or url)
  • openai: a file content block (inline file data or an uploaded Files-API file_id) alongside image_url for images
  • gemini: inlineData for inline bytes or fileData for an uploaded file URI

Oversize PDFs are uploaded to the provider's Files API and referenced by id (providerFileId); if that upload fails the handler falls back to the inline base64 path and records a warning. A route that cannot carry a document at all fails with a clean AiConnectError whose code is unsupported_capability.

const result = await client.generate({
  messages: [{ role: "user", content: "Summarize the attached report." }],
  attachments: ["/Users/vedmalex/work/reports/q3.pdf"],
});

console.log(result.text);

A previously-uploaded document can be referenced directly by its provider file id, skipping re-upload:

const result = await client.generate({
  messages: [{ role: "user", content: "What changed since the last revision?" }],
  attachments: [{ providerFileId: "file_abc123", mimeType: "application/pdf", name: "spec.pdf" }],
});

The portable-file primitives used for this are exported and browser-safe where the source allows it:

  • SUPPORTED_DOCUMENT_MIME_TYPES — the set of MIME types treated as documents (currently application/pdf)
  • portableFileCategory(file) — coarse "image" | "document" | "text" | "other" classification
  • isPortableDocumentFile(file) — convenience predicate for the document category
  • materializePortableFile(file) — one decode pass producing a PortableFilePayload (base64, dataUrl, uri, text, providerFileId carriers)
  • portableFileToBase64(file) — raw base64 of the file bytes (no data: prefix)

Path-based file access requires a local runtime; in a browser bundle use File, Blob, data: URLs, or remote references.

Wide Event Logging

The client supports opt-in structured logging in the "log once per request lifecycle" style described at loggingsucks.com.

import {
  createConsoleWideEventLogger,
  createLocalClient,
  defineConfig,
} from "@vedmalex/ai-connect";

const client = createLocalClient(
  defineConfig({
    providers: {
      openai: {
        accounts: [
          {
            id: "main",
            transport: "api",
            models: ["gpt-4.1"],
            credentials: [{ apiKeyEnv: "OPENAI_API_KEY" }],
          },
        ],
      },
    },
  }),
  {
    logging: {
      logger: createConsoleWideEventLogger(),
      sampling: {
        sampleRate: 0.1,
        slowOperationMs: 2_000,
        keepErrors: true,
        keepWarnings: true,
      },
      baseContext: {
        service_name: "my-app",
        environment: "production",
      },
    },
  },
);

await client.generate({
  messages: [{ role: "user", content: "Summarize this request." }],
  logContext: {
    request_id: "req-123",
    tenant_id: "acme",
    user_id: "u-42",
  },
});

What gets logged:

  • one canonical event per generate, stream, verify, discoverModels, discoverAcpModels, checkHealth, or probeModels call
  • request shape summary, not raw prompt content
  • selected route plus full fallback/retry attempt chain
  • duration, usage (including usage.calls), warnings, and verification issue codes
  • per-operation summaries: verification, modelDiscovery, health, and probe
  • caller-provided logContext for business identifiers

Helpers:

  • createConsoleWideEventLogger()
  • shouldEmitWideEvent()

Streaming Deltas

stream() yields a GenerateStreamEvent union:

type GenerateStreamEvent =
  | { type: "delta"; text: string }
  | { type: "result"; result: GenerateResult }
  | { type: "paused"; result: GenerateResult };

For routes with a real incremental producer (the OpenAI SSE handler and the ACP delta producer), stream() emits { type: "delta", text } tokens as they arrive and then a terminal { type: "result", result }. Routes without an incremental producer still yield a single terminal result. A cooperative pauseSignal ends the stream with a terminal { type: "paused", result } that keeps the accumulated partial (see Cancellation, Pause, and Timeouts).

for await (const event of client.stream({
  messages: [{ role: "user", content: "Write a haiku." }],
})) {
  if (event.type === "delta") {
    process.stdout.write(event.text);
  } else if (event.type === "result") {
    console.log("\n", event.result.usage);
  }
}

A stream carries zero or more delta events followed by one terminal event; paused and result are mutually exclusive terminals. Abort, by contrast, throws and discards partials; it never yields paused.
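
For consumers that want to fold a finished stream into one value, a reducer over the event union might look like the sketch below. GenerateResult is reduced to the single text field used here, and collectEvents and Collected are hypothetical names; the sketch takes an array so it stays synchronous.

```typescript
type GenerateResult = { text: string }; // reduced to the one field used here

type GenerateStreamEvent =
  | { type: "delta"; text: string }
  | { type: "result"; result: GenerateResult }
  | { type: "paused"; result: GenerateResult };

type Collected = { streamed: string; terminal?: "result" | "paused"; finalText?: string };

// Fold a sequence of stream events into the streamed text plus the terminal outcome.
function collectEvents(events: GenerateStreamEvent[]): Collected {
  const out: Collected = { streamed: "" };
  for (const event of events) {
    switch (event.type) {
      case "delta":
        out.streamed += event.text;
        break;
      case "result":
      case "paused":
        out.terminal = event.type;
        out.finalText = event.result.text;
        break;
    }
  }
  return out;
}
```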

Health Checks and Model Probes

Two read-only diagnostics complement verify() and discoverModels(). Neither mutates router health (no recordFailure/recordSuccess).

checkHealth(target?) runs a live two-stage check per route:

  1. endpoint reachability (api GET /models via discovery; acp/cli/server session via verify)
  2. a minimal bounded chat ping (max one token) that captures latencyMs

A Stage-1 failure short-circuits Stage-2 with detail "skipped: endpoint unreachable". Pass reachabilityOnly: true for the cheap Stage-1-only check on hot paths.

const report = await client.checkHealth({ transports: ["api"] });

for (const route of report.routes) {
  console.log(route.routeId, route.ok, route.model.latencyMs);
}

probeModels(target?, opts?) classifies each route::model tuple as broken vs transient. For api transports broken is HTTP-status-driven (400 <= status < 500 and status !== 429); 429, 5xx, and status-less transport errors are transient (broken: false). Results are served from a per-route TTL cache (default 5 minutes), with bounded concurrency (default 4), a per-probe timeout (default 8s), and opts.signal support to stop a fan-out mid-flight. probeModelsStream(target?, opts?) yields each ProbeModelResult as it settles.

const results = await client.probeModels(
  { transports: ["api"] },
  { concurrency: 6, timeoutMs: 5_000, forceRefresh: false },
);

const broken = results.filter((r) => r.broken);

The classification primitive is exported as classifyProbeOutcome, with the defaults PROBE_DEFAULT_CONCURRENCY, PROBE_DEFAULT_TIMEOUT_MS, and PROBE_DEFAULT_TTL_MS.
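
The HTTP-status rule for api transports can be written out as a small stand-in. This is not classifyProbeOutcome itself, whose signature is not shown here; classifyHttpProbe and ProbeClass are hypothetical names that only restate the rule above.

```typescript
type ProbeClass = { broken: boolean; transient: boolean };

// `status` is undefined for transport errors that carry no HTTP status.
function classifyHttpProbe(ok: boolean, status?: number): ProbeClass {
  if (ok) return { broken: false, transient: false };
  // Broken: a 4xx other than 429. Everything else that failed is transient.
  const broken = status !== undefined && status >= 400 && status < 500 && status !== 429;
  return { broken, transient: !broken };
}
```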

Context Window and Model Discovery

resolveModelContext(input, options?) returns the effective context window for a model (synchronous, no I/O), with a clear precedence: discovered > reference (curated table) > configured (per-model/route config) > default (8192). Results are cached per (baseUrl|transportId)::model; a cache hit returns the same value and source and ignores options.discovered.

const ctx = client.resolveModelContext(
  { provider: "openai", model: "gpt-4.1" },
  { discovered: 1_047_576 },
);

console.log(ctx.contextWindow, ctx.source, ctx.cached);

Configure a per-model context window in account config either at the account level (inherited by string-form models) or per model:

{
  id: "main",
  transport: "api",
  contextWindow: 128_000,
  models: ["gpt-4o", { id: "gpt-4.1", contextWindow: 1_047_576 }],
  credentials: [{ apiKeyEnv: "OPENAI_API_KEY" }],
}

Model discovery now also surfaces typed contextLength, free, and pricing fields on each discovered ModelInfo. The curated reference table and its parsers are browser-safe exports:

  • MODEL_REFERENCE and lookupModelRef(model)
  • resolveModelContextWindow({ discovered?, reference?, configured?, defaultContextWindow? })
  • extractModelContextLength(rawModelRecord)
  • detectModelFree(modelId, pricing?, rawModelRecord?)
  • parseModelPricing(rawModelRecord)
  • DEFAULT_CONTEXT_WINDOW (8192), normalizeModelKey, modelContextCacheKey
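
The precedence discovered > reference > configured > default can be sketched as a first-defined-wins scan. pickContextWindow is a hypothetical stand-in for resolveModelContextWindow; the input field names follow the export list above, while the returned source labels are assumed.

```typescript
const FALLBACK_CONTEXT_WINDOW = 8192; // the documented DEFAULT_CONTEXT_WINDOW value

type WindowInput = {
  discovered?: number;
  reference?: number;
  configured?: number;
  defaultContextWindow?: number;
};
type WindowSource = "discovered" | "reference" | "configured" | "default";

function pickContextWindow(input: WindowInput): { contextWindow: number; source: WindowSource } {
  // Candidates in precedence order; the first defined value wins.
  const ordered: Array<[WindowSource, number | undefined]> = [
    ["discovered", input.discovered],
    ["reference", input.reference],
    ["configured", input.configured],
  ];
  for (const [source, value] of ordered) {
    if (value !== undefined) return { contextWindow: value, source };
  }
  return {
    contextWindow: input.defaultContextWindow ?? FALLBACK_CONTEXT_WINDOW,
    source: "default",
  };
}
```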

Client-Safe Projection and Flexible Routing

For untrusted UI or agent-discovery surfaces, project routes without ever exposing credentials or baseUrl:

const publicRoutes = client.listPublicRoutes({ operation: "text" });
const candidates = client.listCandidateModels({ provider: "openai" });

listPublicRoutes() returns PublicRoute DTOs (built by explicit construction, never by spreading an internal route), and listCandidateModels() returns the same secret-free CandidateModel list offered to a model selector.
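
The difference between explicit construction and spreading is easy to show in isolation. The field names on InternalRoute and PublicRouteSketch below are hypothetical and do not reproduce the library's PublicRoute type.

```typescript
interface InternalRoute {
  id: string;
  provider: string;
  transport: string;
  model: string;
  baseUrl?: string; // must never reach an untrusted surface
  apiKey?: string;  // must never reach an untrusted surface
}

interface PublicRouteSketch {
  id: string;
  provider: string;
  transport: string;
  model: string;
}

function toPublicRoute(route: InternalRoute): PublicRouteSketch {
  // Copy an allowlist of fields by name. A spread ({ ...route }) would also
  // copy any secret-bearing field that exists now or is added later.
  return {
    id: route.id,
    provider: route.provider,
    transport: route.transport,
    model: route.model,
  };
}

const internal: InternalRoute = {
  id: "openai:api:main:gpt-4.1",
  provider: "openai",
  transport: "api",
  model: "gpt-4.1",
  baseUrl: "https://api.example.test/v1",
  apiKey: "sk-not-a-real-key",
};

console.log(Object.keys(toPublicRoute(internal))); // id, provider, transport, model
```

With an allowlist, adding a new internal field is safe by default: it stays private until someone deliberately adds it to the projection.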

Per-route routing flexibility is configured on the account:

  • modelAllowlistMode: "strict" | "shortlist", where strict (default) drops an undeclared routeHints.model and shortlist passes a verbatim requested model through on a synthetic copy that preserves the route id (never fragments health)
  • defaultResponseFormat — injected only when the caller did not supply parameters.responseFormat
  • systemPrompt — injected as a leading system message only when the caller authored no system message
  • contextMode: "workspace" | "clean" — execution-context mode (see Clean Context Mode)
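
The two "only when the caller did not supply one" rules can be sketched as a pure function. applyRouteDefaults and the simplified request and route shapes are hypothetical; the sketch only illustrates the injection conditions listed above.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };
type ResponseFormat = { type: string };

interface RouteDefaults {
  systemPrompt?: string;
  defaultResponseFormat?: ResponseFormat;
}

interface RequestLike {
  messages: Message[];
  parameters?: { responseFormat?: ResponseFormat };
}

function applyRouteDefaults(request: RequestLike, route: RouteDefaults): RequestLike {
  // Inject the route system prompt only when the caller authored none.
  const callerHasSystem = request.messages.some((m) => m.role === "system");
  const prompt = route.systemPrompt;
  const messages: Message[] =
    !callerHasSystem && prompt !== undefined
      ? [{ role: "system", content: prompt }, ...request.messages]
      : request.messages;

  // Inject the default response format only when the caller supplied none.
  const responseFormat = request.parameters?.responseFormat ?? route.defaultResponseFormat;
  const parameters =
    responseFormat !== undefined ? { ...request.parameters, responseFormat } : request.parameters;

  return { ...request, messages, parameters };
}

const route: RouteDefaults = {
  systemPrompt: "You are a concise assistant.",
  defaultResponseFormat: { type: "json_object" },
};

const shaped = applyRouteDefaults({ messages: [{ role: "user", content: "Hi" }] }, route);
console.log(shaped.messages[0].role, shaped.parameters?.responseFormat?.type);
```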

Unmatched route selectors are governed by routing.resolution.unknownSelector:

  • "error" (default): throw on an unmatched selector
  • "default": substitute the configured defaultRouteId for each unmatched selector
  • "off": silently drop the unmatched selector (degrade)

For example:

defineConfig({
  providers: {
    openai: {
      accounts: [
        {
          id: "main",
          transport: "api",
          models: ["gpt-4.1"],
          modelAllowlistMode: "shortlist",
          defaultResponseFormat: { type: "json_object" },
          systemPrompt: "You are a concise assistant.",
          credentials: [{ apiKeyEnv: "OPENAI_API_KEY" }],
        },
      ],
    },
  },
  routing: {
    resolution: {
      unknownSelector: "default",
      defaultRouteId: "openai:api:main:gpt-4.1",
    },
  },
});
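
The three unknownSelector policies can be sketched as a standalone resolver. resolveSelectors is a hypothetical helper; dropping an unmatched selector when the policy is "default" but no defaultRouteId is configured is an assumption of this sketch.

```typescript
type UnknownSelectorPolicy = "error" | "default" | "off";

function resolveSelectors(
  selectors: string[],
  knownRouteIds: ReadonlySet<string>,
  policy: UnknownSelectorPolicy = "error",
  defaultRouteId?: string,
): string[] {
  const resolved: string[] = [];
  for (const selector of selectors) {
    if (knownRouteIds.has(selector)) {
      resolved.push(selector);
      continue;
    }
    if (policy === "error") {
      throw new Error(`Unmatched route selector: ${selector}`);
    }
    if (policy === "default" && defaultRouteId !== undefined) {
      resolved.push(defaultRouteId); // substitute per unmatched selector
    }
    // policy "off" (or "default" with nothing configured): drop silently
  }
  return resolved;
}

const known = new Set(["openai:api:main:gpt-4.1"]);
const wanted = ["openai:api:main:gpt-4.1", "openai:api:main:missing"];

console.log(resolveSelectors(wanted, known, "off"));
console.log(resolveSelectors(wanted, known, "default", "openai:api:main:gpt-4.1"));
```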

Fan-Out Throttling

Client-side fan-out throttling bounds how aggressively a client issues calls. It is configured at the client level and can be overridden per request:

type FanoutPolicy = {
  maxConcurrency?: number;   // simultaneous in-flight calls (semaphore + FIFO fairness)
  requestsPerSecond?: number; // deterministic token bucket driven by runtime.now()
  maxCalls?: number;          // hard LIFETIME ceiling
};

Any unset field is unbounded. Exhausting maxCalls throws AiConnectError with code fanout_limit before route selection, so it never pollutes route health.

const client = createLocalClient(config, {
  fanout: { maxConcurrency: 4, requestsPerSecond: 10 },
});

await client.generate({
  messages: [{ role: "user", content: "..." }],
  fanout: { maxCalls: 100 }, // request-scoped, merged per-field over the client default
});

A per-request fanout merges per-field over the client default into a request-scoped limiter that never mutates the shared client limiter. The standalone limiter primitive is exported as createFanoutLimiter(policy, runtime); normalize a raw policy first with normalizeFanoutPolicy() (and mergeFanoutPolicy() to combine a base and override).
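
The per-field merge, the deterministic token bucket, and the lifetime ceiling can each be sketched without the library. All names below are hypothetical and are not the exported createFanoutLimiter, normalizeFanoutPolicy, or mergeFanoutPolicy. The bucket capacity (one second of budget) is an assumption, and the semaphore for maxConcurrency is omitted for brevity.

```typescript
type FanoutPolicySketch = {
  maxConcurrency?: number;
  requestsPerSecond?: number;
  maxCalls?: number;
};

// Per-field merge: an override field wins only when it is set.
function mergePolicy(base: FanoutPolicySketch, override: FanoutPolicySketch = {}): FanoutPolicySketch {
  return {
    maxConcurrency: override.maxConcurrency ?? base.maxConcurrency,
    requestsPerSecond: override.requestsPerSecond ?? base.requestsPerSecond,
    maxCalls: override.maxCalls ?? base.maxCalls,
  };
}

// Token bucket driven by an injected clock, so tests control time exactly.
function createTokenBucket(requestsPerSecond: number, now: () => number) {
  let tokens = requestsPerSecond; // assumed capacity: one second of budget
  let last = now();
  return {
    tryTake(): boolean {
      const current = now();
      const refill = ((current - last) / 1000) * requestsPerSecond;
      tokens = Math.min(requestsPerSecond, tokens + refill);
      last = current;
      if (tokens < 1) return false;
      tokens -= 1;
      return true;
    },
  };
}

// Hard lifetime ceiling, checked before any route would be selected.
function createCallCeiling(maxCalls?: number) {
  let used = 0;
  return {
    reserve(): void {
      if (maxCalls !== undefined && used >= maxCalls) {
        throw new Error("fanout_limit");
      }
      used += 1;
    },
  };
}

let clock = 0;
const bucket = createTokenBucket(2, () => clock);
console.log(bucket.tryTake(), bucket.tryTake(), bucket.tryTake()); // true true false
clock = 500; // half a second later, one token has refilled
console.log(bucket.tryTake()); // true
```

Driving the bucket from an injected clock is what makes the limiter reproducible in tests: no timers or sleeps are needed to observe a refill.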

Model Selector Hook

A consumer-supplied modelSelector runs before normal routing and picks a model from the eligible candidates:

const client = createLocalClient(config, {
  routeHints: {
    modelSelector: (question, candidateModels) => {
      // question carries text/messages/operation/routeHints (no secrets);
      // candidateModels is the secret-free CandidateModel list.
      if (question.text.length > 4_000) {
        return candidateModels.find((c) => c.model.includes("4.1"))?.model;
      }
      return undefined; // defer to normal routing
    },
    failOpen: false,
  },
});

Returning undefined defers to normal routing. An explicit routeHints.model always beats the selector (the hook is not even invoked). A thrown or rejected selector fails closed to validation_error by default; set failOpen: true to ignore it and fall through to normal routing instead. The selector may be async and LLM-backed.
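
The precedence rules for the hook can be sketched as a single decision function. chooseModel and its result shape are hypothetical, and the sketch is synchronous for brevity even though the real selector may be async.

```typescript
type Candidate = { model: string };
type Question = { text: string };
type SelectorSketch = (question: Question, candidates: Candidate[]) => string | undefined;

interface ChooseOptions {
  explicitModel?: string;
  modelSelector?: SelectorSketch;
  failOpen?: boolean;
}

type Choice = { model?: string; via: "explicit" | "selector" | "routing" };

function chooseModel(question: Question, candidates: Candidate[], opts: ChooseOptions): Choice {
  // An explicit model always wins; the selector is not even invoked.
  if (opts.explicitModel !== undefined) {
    return { model: opts.explicitModel, via: "explicit" };
  }
  if (!opts.modelSelector) return { via: "routing" };
  try {
    const picked = opts.modelSelector(question, candidates);
    // undefined means "defer to normal routing".
    return picked === undefined ? { via: "routing" } : { model: picked, via: "selector" };
  } catch {
    // Fail closed by default; failOpen falls through to normal routing.
    if (opts.failOpen) return { via: "routing" };
    throw new Error("validation_error: modelSelector threw");
  }
}

const candidates: Candidate[] = [{ model: "gpt-4o" }, { model: "gpt-4.1" }];
const preferLong: SelectorSketch = (q, list) =>
  q.text.length > 10 ? list.find((c) => c.model.includes("4.1"))?.model : undefined;

console.log(chooseModel({ text: "a fairly long question" }, candidates, { modelSelector: preferLong }));
console.log(chooseModel({ text: "short" }, candidates, { modelSelector: preferLong }));
```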

Clean Context Mode

contextMode is now generalized across all transports (previously ACP-only), set per account or per ACP launch:

  • "workspace" (default): ai-connect may inject its ambient launch context (cwd/skills/rules for ACP)
  • "clean": ai-connect injects nothing ambient; only the consumer messages/attachments plus explicit route config (systemPrompt, defaultResponseFormat) reach the wire

Clean mode suppresses ambient context, not explicit configuration: a route's systemPrompt and defaultResponseFormat are still applied in clean mode.
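
The split between ambient and explicit context can be illustrated with a toy assembler. assembleWire, the WireInput shape, and the ordering of the parts are hypothetical; the only point is which parts survive in each mode.

```typescript
type ContextModeSketch = "workspace" | "clean";

interface WireInput {
  ambient: string[];      // e.g. cwd rules or skills loaded by the launch context
  systemPrompt?: string;  // explicit route configuration
  userMessages: string[]; // consumer-authored content
}

function assembleWire(mode: ContextModeSketch, input: WireInput): string[] {
  // Ambient context is dropped in clean mode.
  const ambient = mode === "workspace" ? input.ambient : [];
  // Explicit route config is kept in both modes.
  const explicit = input.systemPrompt !== undefined ? [input.systemPrompt] : [];
  return [...ambient, ...explicit, ...input.userMessages];
}

const input: WireInput = {
  ambient: ["project rules"],
  systemPrompt: "You are a concise assistant.",
  userMessages: ["Summarize this file."],
};

console.log(assembleWire("workspace", input).length); // 3
console.log(assembleWire("clean", input).length);     // 2
```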

Usage Accounting

result.usage.calls counts the successful, usage-bearing model calls behind a result. It is seeded as +1 per reporting call (only when usage is actually reported) and summed across usage merges, so a result assembled from multiple rounds or a fallback chain reports the true call count. It is never fabricated — a route that reports no usage contributes no calls.

const result = await client.generate({
  messages: [{ role: "user", content: "Multi-round task." }],
});

console.log(result.usage?.calls, result.usage?.totalTokens);
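
The seeding and summing rule can be sketched independently of any transport. seedUsage, mergeUsage, and the inputTokens/outputTokens field names are hypothetical; only calls and totalTokens appear in the text above.

```typescript
interface UsageSketch {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  calls: number;
}

// A call that reports usage contributes calls: 1; a silent call contributes nothing.
function seedUsage(reported?: { inputTokens: number; outputTokens: number }): UsageSketch | undefined {
  if (!reported) return undefined;
  return {
    inputTokens: reported.inputTokens,
    outputTokens: reported.outputTokens,
    totalTokens: reported.inputTokens + reported.outputTokens,
    calls: 1,
  };
}

// Merging sums every field, so calls accumulates across rounds and fallbacks.
function mergeUsage(a?: UsageSketch, b?: UsageSketch): UsageSketch | undefined {
  if (!a) return b;
  if (!b) return a;
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
    totalTokens: a.totalTokens + b.totalTokens,
    calls: a.calls + b.calls,
  };
}

// Three rounds; the middle one reports no usage and adds no call.
const rounds = [
  seedUsage({ inputTokens: 10, outputTokens: 5 }),
  seedUsage(undefined),
  seedUsage({ inputTokens: 3, outputTokens: 2 }),
];

let total: UsageSketch | undefined;
for (const round of rounds) total = mergeUsage(total, round);

console.log(total?.calls, total?.totalTokens); // 2 20
```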

Robustness

Two robustness behaviors apply on the API path:

  • Strict structured output: parameters.responseFormat of { type: "json_schema", strict: true, ... } requests strict schema enforcement. If the upstream rejects the request with a 400 specifically because of response_format, the handler retries once with the format dropped, records a warning, and continues.
  • Deep error unwrapping: upstream error payloads are unwrapped up to three levels deep (cycle-safe, JSON-decoding stringified .error/.message payloads along the way) so the surfaced AiConnectError message is the real provider message, not an opaque envelope.
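
One way to implement bounded, cycle-safe unwrapping is a short recursion. unwrapErrorMessage is a hypothetical helper, not the library's internal code; how a "level" is counted here (one per descent into .error or .message) is an assumption of the sketch.

```typescript
// Decode a string only when it looks like a JSON object.
function parseJsonObject(text: string): object | undefined {
  const trimmed = text.trim();
  if (!trimmed.startsWith("{")) return undefined;
  try {
    const parsed: unknown = JSON.parse(trimmed);
    return typeof parsed === "object" && parsed !== null ? parsed : undefined;
  } catch {
    return undefined;
  }
}

function unwrap(value: unknown, depth: number, seen: Set<object>): string | undefined {
  if (typeof value === "string") {
    const decoded = parseJsonObject(value);
    // A plain string is the message; a stringified envelope is unwrapped further.
    if (decoded === undefined || depth <= 0) return value;
    return unwrap(decoded, depth, seen) ?? value;
  }
  if (typeof value !== "object" || value === null) return undefined;
  if (seen.has(value)) return undefined; // cycle guard
  seen.add(value);
  const record = value as Record<string, unknown>;
  if (depth <= 0) {
    // Out of budget: accept a direct string message, but descend no further.
    return typeof record.message === "string" ? record.message : undefined;
  }
  return unwrap(record.error, depth - 1, seen) ?? unwrap(record.message, depth - 1, seen);
}

function unwrapErrorMessage(payload: unknown, maxDepth = 3): string | undefined {
  return unwrap(payload, maxDepth, new Set<object>());
}

const envelope = { error: JSON.stringify({ error: { message: "Rate limited" } }) };
console.log(unwrapErrorMessage(envelope)); // Rate limited
```

A caller would typically fall back to a generic message when the helper returns undefined, for example when the payload is cyclic or nested deeper than the budget.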

Cross-Project Reuse

Several primitives are intentionally provider-agnostic, client-free where possible, and free of node:* imports so they ship cleanly in browser bundles:

  • Model reference: MODEL_REFERENCE, lookupModelRef, resolveModelContextWindow, extractModelContextLength, detectModelFree, parseModelPricing (pure data + functions, no client instance)
  • Probe classification: classifyProbeOutcome plus the PROBE_DEFAULT_* constants (HTTP-status-driven, provider-blind; the cache is owned and passed in by the caller)
  • Fan-out limiter: createFanoutLimiter(policy, runtime) (a deterministic token bucket + semaphore driven by runtime.now(), standalone with no client)
  • Abort context: the AbortContext/GenerateCallOptions contract and mapAbortError(reason) for deterministic aborted/timeout mapping
  • Usage accounting: the UsageInfo.calls summing rule (carried on the flat UsageInfo shape; any new transport adds calls: 1 in its usage guard and aggregation is automatic)

These are exported from both the default and @vedmalex/ai-connect/browser entry points; the browser entry exports everything except createLocalClient.

bs-search Migration

When consuming ai-connect from bs-search:

  • depend on the published package @vedmalex/ai-connect, or pin an unpublished commit via file:../ai-connect for local development
  • the cancellation contract mirrors the engine's existing convention: a caller signal aborts (discarding partials → aborted), while a separate pauseSignal cooperatively pauses a stream and keeps the partial — the same split as the engine's signal vs _pauseSignal
  • prefer the provider-agnostic primitives above (model reference, probe, fan-out, usage accounting) over re-implementing them, since they are client-free and browser-safe
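
The abort versus pause split can be sketched over a synchronous chunk source. consumeStream, SignalLike, and StreamOutcome are hypothetical names; SignalLike is a minimal structural type that a standard AbortSignal also satisfies, and checking signal before pauseSignal is an assumption.

```typescript
type SignalLike = { readonly aborted: boolean };

type StreamOutcome =
  | { status: "done"; text: string }
  | { status: "paused"; text: string } // partial output is kept
  | { status: "aborted" };             // partial output is discarded

function consumeStream(
  chunks: Iterable<string>,
  signal?: SignalLike,
  pauseSignal?: SignalLike,
): StreamOutcome {
  let text = "";
  for (const chunk of chunks) {
    // A hard abort wins and throws away whatever was collected.
    if (signal?.aborted) return { status: "aborted" };
    // A cooperative pause stops early but returns the partial text.
    if (pauseSignal?.aborted) return { status: "paused", text };
    text += chunk;
  }
  return { status: "done", text };
}

console.log(consumeStream(["Hel", "lo"])); // status "done", text "Hello"
console.log(consumeStream(["Hel", "lo"], { aborted: true })); // status "aborted"
```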

Browser vs Local

| Capability | Browser API routes | Local API routes | Local ACP routes |
| --- | --- | --- | --- |
| Text generation | yes | yes | yes |
| Image generation | yes | yes | depends on harness |
| Text attachments | yes | yes | yes |
| Image attachments | yes | yes | yes |
| PDF document attachments | yes (data URL / remote ref) | yes | depends on harness |
| Streaming deltas | yes (OpenAI SSE) | yes (OpenAI SSE) | yes (ACP delta producer) |
| Cancellation / pause / timeout | yes | yes | yes |
| Health check / model probe | yes | yes | yes |
| Context-window resolution | yes | yes | yes |
| Local file paths | no | yes | yes |
| Local command/session verification | no | yes | yes |
| Claude/Codex ACP | no | yes | yes |

Runtime Entry Points

Use explicit runtime entry points when you know the target in advance:

  • @vedmalex/ai-connect/browser
  • @vedmalex/ai-connect/node
  • @vedmalex/ai-connect/bun
  • @vedmalex/ai-connect/local

Notes:

  • @vedmalex/ai-connect/browser is the browser-safe bundle.
  • @vedmalex/ai-connect defaults to the full Node/Bun-oriented entry. Its package.json exports already resolve the node and bun conditions to the respective builds, so @vedmalex/ai-connect/node and @vedmalex/ai-connect/bun are just explicit aliases of the root "." export: both are built from the same src/index.ts entry (esbuild, node20 target). The node and bun outputs are byte-identical builds; the separate subpaths exist only for callers who want to name the target explicitly. Prefer the default "." import (plain @vedmalex/ai-connect) unless you have a specific reason to pin one.
  • @vedmalex/ai-connect/local is the focused local runtime entry with ACP support.

Public API

Main exports:

  • defineConfig
  • createClient
  • createBrowserClient
  • createLocalClient
  • preparePortableFile
  • buildImagePromptBundle
  • IMAGE_SIZE_PRESETS
  • AiConnectError, isAiConnectError, toAiConnectError, mapAbortError
  • createConsoleWideEventLogger
  • shouldEmitWideEvent

File primitives:

  • SUPPORTED_DOCUMENT_MIME_TYPES
  • portableFileCategory
  • isPortableDocumentFile
  • materializePortableFile
  • portableFileToBase64

Model-reference primitives (browser-safe):

  • MODEL_REFERENCE, lookupModelRef
  • resolveModelContextWindow, extractModelContextLength
  • detectModelFree, parseModelPricing
  • DEFAULT_CONTEXT_WINDOW, normalizeModelKey, modelContextCacheKey

Probe + fan-out primitives:

  • classifyProbeOutcome
  • PROBE_DEFAULT_CONCURRENCY, PROBE_DEFAULT_TIMEOUT_MS, PROBE_DEFAULT_TTL_MS
  • createFanoutLimiter, normalizeFanoutPolicy, mergeFanoutPolicy

Client methods:

  • generate(request, opts?)
  • stream(request, opts?)
  • verify(target?, opts?)
  • discoverModels(target?, opts?)
  • discoverAcpModels(target?, opts?)
  • checkHealth(target?)
  • probeModels(target?, opts?)
  • probeModelsStream(target?, opts?)
  • resolveModelContext(input, options?)
  • prepareFile(input)
  • listRoutes(filter?)
  • listPublicRoutes(filter?)
  • listCandidateModels(filter?)

generate() and stream() accept GenerateCallOptions { signal?, pauseSignal?, timeoutMs? }; verify()/discoverModels()/discoverAcpModels() accept the { signal?, timeoutMs? } subset. See the sections "Cancellation, Pause, and Timeouts", "Health Checks and Model Probes", and "Client-Safe Projection and Flexible Routing".

discoverAcpModels() opens the configured ACP route, runs the ACP handshake up to session/new, and returns the advertised availableModels and currentModelId per route.

discoverModels() is the unified catalog API for HTTP API, ACP, local server routes, and CLI routes that delegate discovery to an ACP sidecar. Use target.transports when you want only one transport family.

Current discovery support matrix:

  • api: supported
  • acp: supported
  • server: supported
  • cli: supported when the route config enables transport.cli.discovery, or when a built-in CLI preset maps discovery to ACP

Built-in CLI discovery defaults:

  • claude-cli -> claude-code-acp
  • codex-cli -> codex-acp
  • openclaude-cli -> no default discovery bridge
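
The preset mapping amounts to a small lookup table. discoveryBridgeFor is a hypothetical helper, and letting an explicitly configured bridge take precedence over the preset is an assumption; the text only says that either source enables discovery.

```typescript
// Built-in CLI presets and the ACP transport each one delegates discovery to.
const DEFAULT_DISCOVERY_BRIDGE: Record<string, string | undefined> = {
  "claude-cli": "claude-code-acp",
  "codex-cli": "codex-acp",
  "openclaude-cli": undefined, // no default discovery bridge
};

function discoveryBridgeFor(cliTransportId: string, configured?: string): string | undefined {
  // Assumed precedence: explicit route config first, then the preset.
  return configured ?? DEFAULT_DISCOVERY_BRIDGE[cliTransportId];
}

console.log(discoveryBridgeFor("codex-cli"));                   // codex-acp
console.log(discoveryBridgeFor("openclaude-cli"));              // undefined
console.log(discoveryBridgeFor("my-codex-wrapper", "codex-acp")); // codex-acp
```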

CLI discovery through ACP adds ACP-side prerequisites:

  • the ACP executable must exist
  • the ACP harness must be authenticated if that provider requires auth
  • verify() checks route plausibility and handler presence, but it does not perform a live discovery/auth handshake up front

For custom CLI wrappers, you can keep the public API uniform by delegating discovery to an ACP sidecar. For example, a Codex wrapper that delegates to codex-acp:

transport: {
  kind: "cli",
  id: "my-codex-wrapper",
  command: "/opt/bin/codex-wrapper",
  cli: {
    discovery: {
      via: "acp",
      acp: {
        providerId: "openai",
        transportId: "codex-acp",
      },
    },
  },
}

Or a Claude wrapper delegating to claude-code-acp:

transport: {
  kind: "cli",
  id: "my-claude-wrapper",
  command: "/opt/bin/claude-wrapper",
  cli: {
    discovery: {
      via: "acp",
      acp: {
        providerId: "anthropic",
        transportId: "claude-code-acp",
      },
    },
  },
}

ACP routes are treated as harness-owned connections:

  • ai-connect does not inject baseUrl
  • ai-connect does not inject provider API keys into ACP
  • the local ACP tool is responsible for its own auth/session and upstream routing

Tool semantics are intentionally split:

  • api routes support tool schema passthrough via parameters.tools
  • api routes also support client-managed tools through clientTools
  • clientTools are executed locally by ai-connect after the provider returns tool calls
  • parameters.tools remains the right path for upstream-managed tool schemas that are not executed by ai-connect
  • acp routes support harness-owned tool execution
  • acp routes do not currently forward request-defined tool schema from parameters.tools
  • cli and current built-in server routes do not support tool schema passthrough or tool execution

That distinction is also reflected in route capabilities:

  • supportsToolSchema
  • supportsToolExecution
  • supportsClientToolExecution

Client-managed tools can be registered on the client and then selected per request:

const client = createBrowserClient(config, {
  clientTools: [
    {
      type: "function",
      function: {
        name: "lookup_weather",
        description: "Return current weather for a city",
        parameters: {
          type: "object",
          properties: {
            city: { type: "string" },
          },
          required: ["city"],
        },
      },
      async execute(args, context) {
        return {
          data: {
            city: String(args.city),
            source: "local-cache",
            workingDirectory: context.workingDirectory,
          },
        };
      },
    },
  ],
});

const result = await client.generate({
  messages: [{ role: "user", content: "Check the weather in Moscow." }],
  clientTools: ["lookup_weather"],
});

Current limits:

  • clientTools are supported only for generate()
  • clientTools are currently supported only for text requests without attachments/image options
  • built-in local execution of clientTools is implemented for built-in API handlers: openai, anthropic, gemini

Context and MCP Semantics

ai-connect separates transport routing from harness-owned context loading.

ACP routes:

  • default to workspace context mode and default skills mode
  • launch from the configured local cwd, or from the current process cwd when no override is provided
  • a request-level workingDirectory overrides that cwd for the current inference call
  • can therefore pick up project-local context files, rules, and skills that the harness itself knows how to load
  • do not automatically inherit MCP servers from the host agent or from the current Codex session

Important ACP boundary:

  • ai-connect currently sends mcpServers: [] in ACP session/new
  • this means host-agent MCP integrations are not forwarded into the ACP harness automatically
  • if an ACP harness needs tools, skills, or MCP-style integrations, they must come from that harness's own configuration/environment

ACP clean mode:

  • transport.launch.contextMode: "clean" asks ai-connect to isolate cwd/home/config best-effort for supported harnesses
  • transport.launch.skillsMode: "disabled" asks ai-connect to suppress harness-owned skills/rules where supported
  • this is strongest for harnesses where ai-connect has provider-specific launch isolation; for others it is best-effort

CLI routes:

  • run as one-shot commands from cli.cwd ?? process.cwd()
  • a request-level workingDirectory overrides that cwd for the current inference call
  • can therefore use the current project folder as context if the underlying CLI tool inspects cwd
  • do not have a first-class workspace vs clean launch mode today
  • if you need a clean CLI run, use an isolated cli.cwd, custom cli.env, or a wrapper command

Server routes:

  • use whatever context model the local HTTP server implements
  • spawned local server processes use workingDirectory ?? server.cwd ?? process.cwd()
  • ai-connect does not define project-context semantics for the server process beyond launch cwd/env overrides
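
The working-directory fallback shared by CLI and server routes reduces to one nullish-coalescing chain. resolveCwd is a hypothetical helper; the process cwd is passed in as an argument so the sketch has no runtime dependency.

```typescript
function resolveCwd(
  processCwd: string,
  configuredCwd?: string,           // cli.cwd or server.cwd from route config
  requestWorkingDirectory?: string, // request-level override
): string {
  // Request override first, then route config, then the process cwd.
  return requestWorkingDirectory ?? configuredCwd ?? processCwd;
}

console.log(resolveCwd("/home/me"));                      // /home/me
console.log(resolveCwd("/home/me", "/srv/app"));          // /srv/app
console.log(resolveCwd("/home/me", "/srv/app", "/tmp/x")); // /tmp/x
```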

ACP usage statistics are exposed on result.usage when the harness provides them. ai-connect currently normalizes:

  • OpenCode ACP usage_update (used, size, cost)

Examples

Example execution notes:

  • local examples run verify() first and print missing prerequisites clearly
  • ACP and other live network examples only execute the real prompt when AI_CONNECT_RUN_EXAMPLE=1 is set
  • browser examples should be run in an actual browser runtime, not from Bun/Node CLI

Local Test Server Preset

If you are targeting the local gateway at 127.0.0.1:8045, configure direct API routes with transport.baseUrl.

import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";

// Read the local gateway key from the environment — never hardcode a key.
const LOCAL_TEST_API_KEY = process.env.LOCAL_TEST_API_KEY ?? "";

const client = createLocalClient(
  defineConfig({
    providers: {
      openai: {
        accounts: [
          {
            id: "local-openai",
            transport: {
              kind: "api",
              baseUrl: "http://127.0.0.1:8045/v1",
            },
            models: ["gpt-oss-120b-medium"],
            credentials: [{ apiKey: LOCAL_TEST_API_KEY }],
          },
        ],
      },
      anthropic: {
        accounts: [
          {
            id: "local-anthropic",
            transport: {
              kind: "api",
              baseUrl: "http://127.0.0.1:8045/v1/messages",
            },
            models: ["claude-sonnet-4-6"],
            credentials: [{ apiKey: LOCAL_TEST_API_KEY }],
          },
        ],
      },
      gemini: {
        accounts: [
          {
            id: "local-gemini",
            transport: {
              kind: "api",
              baseUrl: "http://127.0.0.1:8045/v1beta/models",
            },
            models: ["gemini-3.1-flash-lite", "gemini-3.1-flash-image"],
            credentials: [{ apiKey: LOCAL_TEST_API_KEY }],
          },
        ],
      },
    },
  }),
);

const catalog = await client.discoverModels({
  transports: ["api"],
});

console.log(
  catalog.routes.flatMap((route) => route.availableModels.map((model) => model.modelId)),
);

Publishing

Full release runbook (OIDC trusted publishing, cutting a release, caveats): docs/publishing.md.

@vedmalex/ai-connect ships as a public scoped npm package. The package metadata enforces the publish boundary:

  • publishConfig.access is public, which is required for a scoped name to publish without an explicit --access public flag.
  • prepublishOnly runs bun run check && bun run build before any publish.
    • bun run check runs bun run typecheck followed by the full test suite, including every Deterministic Simulation Testing (DST) scenario (DST scenarios are plain bun:test cases, so bun test exercises them transitively).
    • bun run build compiles the runtime bundles and the type declarations.

Because prepublishOnly gates on the entire check + build pipeline, a publish is only possible when the whole suite is green and the distributable output is freshly built.

Only the built output ships. files is restricted to dist, so src, tests, and workspace tooling are excluded from the published tarball. You can confirm the contents before publishing:

npm pack --dry-run

Both npm publish and bun publish honor the prepublishOnly gate.

Keywords