ai-connect
ai-connect is a Bun-first TypeScript library for unified access to AI providers from browser and local runtimes.
It models routes as provider + transport + account + credential + model, so one client can combine:
- direct APIs for OpenAI, Anthropic, and Gemini
- local-only ACP harness routes for Claude Code and Codex
- key rotation, account rotation, cooldowns, retries, and fallback chains
- portable file, PDF document, and image inputs across direct API and ACP paths
- cooperative cancellation, pause-with-partial, and per-operation timeouts
- incremental streaming deltas, live health checks, and read-only model probes
- context-window resolution, client-safe route projection, and fan-out throttling
Status
Implemented today:
- OpenAI API, Anthropic API, Gemini API
- Claude Code ACP, Codex ACP
- agy CLI, pi CLI, Claude/OpenClaude CLI, Codex CLI
- OpenCode Server
- browser and local client factories
- env-backed key pools with delimiter-based rotation
- image generation helpers for OpenAI and Gemini
- text, image, and PDF document attachment inlining for direct API prompts
- portable file normalization for paths, File, Blob, data URLs, and remote URIs
- cancellation, pause, and per-operation timeouts on generate() and stream()
- incremental streaming deltas for OpenAI SSE and ACP routes
- live two-stage health checks and read-only broken-model probes
- context-window resolution with a browser-safe curated model reference
- client-safe route/candidate projection for untrusted UI/agent surfaces
- per-route model allowlist modes, unknown-selector degrade policy, and a pluggable model selector
- client-side fan-out throttling (concurrency, rate, lifetime call ceiling)
Built-In Provider Scope
Built-in HTTP handlers exist for:
- openai
- anthropic
- gemini
gemini is the canonical provider id for the Google Gemini stack.
google is not a supported provider id. Use gemini.
Custom provider ids are accepted by config normalization, but they are not automatically backed by built-in HTTP handlers.
Use these rules:
- for OpenAI-compatible APIs such as OpenRouter, keep provider: "openai" and override transport.baseUrl
- for Anthropic-compatible APIs, keep provider: "anthropic" and override transport.baseUrl
- for Gemini-compatible APIs, keep provider: "gemini" and override transport.baseUrl
- use a truly custom provider id only when you also supply a custom handler, or when the route uses cli, acp, or server
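The provider-id rules above reduce to a small decision. The sketch below encodes them under the assumption that only the three built-in ids have HTTP handlers; needsCustomHandler and BUILT_IN_API_PROVIDERS are hypothetical names for illustration, not ai-connect exports.

```typescript
// Hypothetical helper (not part of ai-connect): does this provider id + transport
// kind need a consumer-supplied handler, per the README rules?
type TransportKind = "api" | "cli" | "acp" | "server";

const BUILT_IN_API_PROVIDERS = new Set(["openai", "anthropic", "gemini"]);

function needsCustomHandler(providerId: string, kind: TransportKind): boolean {
  // Local transports (cli / acp / server) never go through the built-in HTTP handlers.
  if (kind !== "api") return false;
  // Built-in ids are covered even when transport.baseUrl points at a compatible proxy.
  return !BUILT_IN_API_PROVIDERS.has(providerId);
}
```

So an OpenRouter route stays provider "openai" with a baseUrl override, while an id such as "google" on the api transport would need its own handler.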
Install
This package is published to the npm registry as the public scoped package @vedmalex/ai-connect.
npm install @vedmalex/ai-connect
With Bun:
bun add @vedmalex/ai-connect
You can still consume it directly from GitHub if you need a specific unpublished commit:
{
"dependencies": {
"@vedmalex/ai-connect": "git+ssh://git@github.com/vedmalex/ai-connect.git#<commit-sha>"
}
}
If your consumer uses Bun against the GitHub form, also add:
{
"trustedDependencies": [
"@vedmalex/ai-connect"
]
}
Full integration notes:
To build the workspace locally from source:
bun install
Reference Demos
This repository includes two full reference applications in monorepo form:
They are intended as copyable blueprints for real products. The web demo exposes settings through explicit windows and form controls. The local demo exposes settings through JSONC and a TUI workflow.
Run them from the repository root:
bun run dev:web-demo
bun run dev:local-demo
The shared contract used by both demos lives in:
Full demo guide:
Quick Start
import { createBrowserClient, defineConfig } from "@vedmalex/ai-connect/browser";
const client = createBrowserClient(
defineConfig({
providers: {
openai: {
accounts: [
{
id: "main",
transport: "api",
models: ["gpt-4.1"],
credentials: [{ apiKeyEnv: "OPENAI_API_KEY" }],
},
],
},
},
}),
{
runtime: {
getEnv: (name) => import.meta.env[name],
},
},
);
const result = await client.generate({
messages: [{ role: "user", content: "Summarize this design brief." }],
});
console.log(result.text);
For built-in API providers, transport.baseUrl is optional. If you omit it, ai-connect uses the official upstream defaults:
- openai -> https://api.openai.com/v1/...
- anthropic -> https://api.anthropic.com/v1/...
- gemini -> https://generativelanguage.googleapis.com/v1beta/...
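The default-vs-override rule can be pictured as a lookup table. This is a minimal sketch, assuming an explicit transport.baseUrl always wins; effectiveBaseUrl and the trailing-slash trimming are illustrative choices, not documented library behavior.

```typescript
// Hypothetical sketch: the documented upstream defaults as a lookup table.
const DEFAULT_BASE_URLS: Record<string, string> = {
  openai: "https://api.openai.com/v1",
  anthropic: "https://api.anthropic.com/v1",
  gemini: "https://generativelanguage.googleapis.com/v1beta",
};

function effectiveBaseUrl(provider: string, override?: string): string {
  // An explicit transport.baseUrl wins (e.g. a compatible proxy); trailing "/" trimmed here.
  if (override) return override.replace(/\/+$/, "");
  const fallback = DEFAULT_BASE_URLS[provider];
  if (!fallback) throw new Error(`no default base URL for provider "${provider}"`);
  return fallback;
}
```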
Local ACP Example
import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";
const client = createLocalClient(
defineConfig({
providers: {
anthropic: {
accounts: [
{
id: "subscription",
transport: {
kind: "acp",
id: "claude-code-acp",
},
models: ["claude-sonnet-4"],
},
],
},
},
}),
{
acp: {
permissionMode: "approve-reads",
commands: {
"anthropic:claude-code-acp":
"npx -y @agentclientprotocol/claude-agent-acp@^0.25.0",
},
},
},
);
const result = await client.generate({
messages: [{ role: "user", content: "Review this repository layout." }],
});
console.log(result.text);
If the host application is running in one folder but the inference should use another project folder as local context, pass workingDirectory per request:
const result = await client.generate({
workingDirectory: "/Users/vedmalex/work/scancheck-target",
messages: [{ role: "user", content: "Review this repository layout." }],
});
ACP model selection + headless harness-noise suppression
For headless / batch prompts (e.g. per-document extraction) the ACP transport drives an interactive coding agent. Two behaviours make that robust by default:
- Model via the protocol. The route's model (routeHints.model ?? account model) is selected through a session/set_model call after session/new, so you do not need to inject an ANTHROPIC_MODEL env var, and no env-driven "model switched" announcement leaks into the output. The call is sent only when the agent advertises a model catalog, the requested model is in its availableModels, and it differs from the current model; a model the agent does not advertise is surfaced as a warning (it is not silently replaced by the agent default).
- Harness-noise suppression + guard. Known interactive-harness marker lines (a model-switch announcement, a Готов к работе / Жду … idle greeting, <local-command-caveat> commentary) are filtered out of the answer text on both the generate and the streaming (delta) paths. If a turn yields only such harness chatter and no task output, it is surfaced as temporary_unavailable so the consumer can retry or fall back rather than receiving the greeting as a successful generation.
All three are on by default and can be toggled via acp client options:
const client = createLocalClient(config, {
acp: {
selectModel: true, // session/set_model from the route model (default true)
suppressHarnessNoise: true, // filter harness marker lines from text + deltas (default true)
failOnHarnessOnlyTurn: true, // harness-only turn → temporary_unavailable (default true)
},
});
Limitation: the harness-noise filter / guard recognises a curated, locale-specific marker set (Claude harness, RU greetings). A reworded or other-locale greeting that still carries non-marker text is not classified as harness-only. The model-switch root cause is removed independently by selectModel.
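Both ACP behaviours can be sketched as pure functions. This is a minimal illustration under stated assumptions: planSetModel, filterHarnessNoise, and the ModelCatalog shape are hypothetical, and the marker regexes are examples only, not the library's curated set (the model-switch announcement pattern is omitted because its exact wording is not documented here).

```typescript
// Hypothetical sketch of the two ACP behaviours; not ai-connect source.
type ModelCatalog = { currentModelId: string; availableModels: string[] };

type SetModelPlan =
  | { action: "skip" }
  | { action: "set_model"; modelId: string }
  | { action: "warn"; warning: string };

function planSetModel(requested: string | undefined, catalog?: ModelCatalog): SetModelPlan {
  // No requested model, or the agent advertises no catalog: nothing to send.
  if (!requested || !catalog) return { action: "skip" };
  // Unadvertised model: warn instead of silently accepting the agent default.
  if (!catalog.availableModels.includes(requested)) {
    return { action: "warn", warning: `model "${requested}" is not advertised by the agent` };
  }
  // Already the current model: no session/set_model round trip needed.
  if (catalog.currentModelId === requested) return { action: "skip" };
  return { action: "set_model", modelId: requested };
}

// Illustrative marker patterns only.
const HARNESS_MARKERS: RegExp[] = [/^Готов к работе/, /^Жду /, /<local-command-caveat>/];

function filterHarnessNoise(text: string): { text: string; harnessOnly: boolean } {
  const kept = text
    .split("\n")
    .filter((line) => !HARNESS_MARKERS.some((re) => re.test(line.trim())));
  const cleaned = kept.join("\n").trim();
  // Non-empty input that was entirely harness chatter -> caller maps it to temporary_unavailable.
  return { text: cleaned, harnessOnly: cleaned.length === 0 && text.trim().length > 0 };
}
```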
Dedicated provider-specific ACP examples:
- examples/acp-codex.ts
Dedicated clientTools examples:
Local CLI And Server Presets
Built-in local transport presets are available both as catalog entries and as exported preset metadata:
import {
AI_CONNECT_DEFAULT_CLI_PRESETS,
AI_CONNECT_DEFAULT_SERVER_PRESETS,
getTextTransportPresetById,
listTextProviderCatalog,
} from "@vedmalex/ai-connect";
const localCatalog = listTextProviderCatalog({ runtime: "local" });
const codexCli = getTextTransportPresetById("openai", "codex-cli");
const opencodeServer = AI_CONNECT_DEFAULT_SERVER_PRESETS.opencode;
For built-in CLI routes the shortest form is still the route id:
transport: {
kind: "cli",
id: "codex-cli",
}
If you want a custom route id but still want the built-in argv/parser/command defaults, set transport.cli.preset explicitly:
transport: {
kind: "cli",
id: "my-codex-wrapper",
cli: {
preset: "codex",
},
}
CLI command resolution order is:
1. transport.command
2. createLocalClient(..., { cli: { commands } })
3. transport.cli.preset
4. built-in command mapped from provider + transport.id
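The resolution order reads naturally as a chain of nullish fallbacks. A minimal sketch, assuming client-level commands are keyed as "provider:transportId" (as the ACP commands map is) and using small illustrative preset and built-in tables; resolveCliCommand and both tables are hypothetical.

```typescript
// Hypothetical sketch of the documented CLI command resolution order.
type CliRoute = {
  provider: string;
  transport: { id: string; command?: string; cli?: { preset?: string } };
};

// Illustrative tables only; the library's real preset metadata is richer.
const PRESET_COMMANDS: Record<string, string> = { codex: "codex", pi: "pi" };
const BUILT_IN_COMMANDS: Record<string, string> = { "openai:codex-cli": "codex", "pi:pi-cli": "pi" };

function resolveCliCommand(
  route: CliRoute,
  clientCommands: Record<string, string> = {},
): string | undefined {
  const key = `${route.provider}:${route.transport.id}`;
  const preset = route.transport.cli?.preset;
  return (
    route.transport.command ?? // 1. explicit transport.command
    clientCommands[key] ?? // 2. createLocalClient(..., { cli: { commands } })
    (preset ? PRESET_COMMANDS[preset] : undefined) ?? // 3. transport.cli.preset
    BUILT_IN_COMMANDS[key] // 4. built-in mapping for provider + transport.id
  );
}
```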
Known local presets now include:
- openai:codex-cli
- anthropic:claude-cli
- openclaude:openclaude-cli
- pi:pi-cli
- anthropic:claude-code-acp
- opencode:opencode-server
- opencode:opencode-acp
Custom CLI Providers
Custom CLI providers can be connected by describing the argv template and the parser:
import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";
const client = createLocalClient(
defineConfig({
providers: {
customcli: {
accounts: [
{
id: "my-cli",
transport: {
kind: "cli",
id: "my-company-cli",
command: "my-agent",
cli: {
argsTemplate: [
"run",
"--prompt",
"{prompt}",
"--model",
"{model}",
"--format",
"json",
],
parser: {
kind: "json",
textPath: "payload.message",
usagePath: "metrics",
errorPath: "error.message",
},
},
},
models: ["my-model-v1"],
},
],
},
},
}),
);
The parser supports three kinds:
- kind: "json" parses stdout as a single JSON object and reads the answer from textPath (plus optional usagePath / errorPath).
- kind: "jsonl" parses stdout as newline-delimited JSON and selects the answer/usage/error lines with { path, wherePath, whereEquals } selectors.
- kind: "text" treats stdout as raw plain text and returns it as result.text. Use it for print-mode coding-agent CLIs that emit plain text rather than JSON (no --output-format json flag, no ACP mode). Options:
  - trim (default true): trim leading/trailing whitespace.
  - stripAnsi (default false): strip ANSI escape sequences (spinner/color noise) before returning.
Print-mode plain text carries no token information, so result.usage is absent (none is fabricated). An empty stdout on a non-zero exit still rejects with temporary_unavailable.
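The three parser kinds can be sketched in a few lines. This is a minimal illustration, not ai-connect's parser: getPath, selectJsonl, and parseText are hypothetical names, and two choices are assumptions, namely "the last matching jsonl line wins" and "unparsable jsonl lines are skipped".

```typescript
// Walk a dotted path such as "payload.message" through nested objects.
function getPath(value: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (node, key) =>
      node !== null && typeof node === "object" ? (node as Record<string, unknown>)[key] : undefined,
    value,
  );
}

type JsonlSelector = { path: string; wherePath?: string; whereEquals?: unknown };

function selectJsonl(stdout: string, selector: JsonlSelector): unknown {
  let found: unknown;
  for (const line of stdout.split("\n")) {
    if (!line.trim()) continue;
    let record: unknown;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // assumption: ignore non-JSON noise lines
    }
    // Only lines whose discriminator matches, e.g. type === "message".
    if (selector.wherePath && getPath(record, selector.wherePath) !== selector.whereEquals) continue;
    const picked = getPath(record, selector.path);
    if (picked !== undefined) found = picked; // last match wins (assumption)
  }
  return found;
}

// CSI escape sequences: colour codes, cursor movement, spinner redraws.
const ANSI_CSI = /\u001b\[[0-9;?]*[ -\/]*[@-~]/g;

function parseText(stdout: string, opts: { trim?: boolean; stripAnsi?: boolean } = {}): string {
  const { trim = true, stripAnsi = false } = opts;
  let text = stripAnsi ? stdout.replace(ANSI_CSI, "") : stdout;
  if (trim) text = text.trim();
  return text;
}
```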
import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";
// A custom print-mode coding-agent CLI ("agy") that writes the answer as raw text.
const client = createLocalClient(
defineConfig({
providers: {
agy: {
accounts: [
{
id: "local",
transport: {
kind: "cli",
id: "agy-cli",
command: "agy",
cli: {
argsTemplate: ["-p", "{prompt}", "--model", "{model}"],
parser: { kind: "text" }, // raw stdout -> result.text
},
},
models: ["default"],
},
],
},
},
}),
);
pi has a built-in CLI preset (pi-cli). The preset supplies the default command (pi), argsTemplate (["--print","--model","{model}","{prompt}"]), parser (kind: "text"), and discovery (via: "none"). The minimal config is therefore:
import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";
const client = createLocalClient(
defineConfig({
providers: {
pi: {
accounts: [
{
id: "local",
transport: {
kind: "cli",
id: "pi-cli", // selects the built-in pi-cli preset
},
models: ["gemini-3.1-pro-low"],
},
],
},
},
}),
);
Supported placeholders in argsTemplate:
- {prompt}
- {model}
- {output_file}
{output_file} is useful for CLIs like codex exec that stream JSONL to stdout but write the final assistant message to a file.
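Placeholder expansion works per argv token, so a prompt with spaces or quotes stays a single argument and never passes through a shell. A minimal sketch with a hypothetical expandArgs helper; failing on a placeholder with no value is an assumption.

```typescript
// Hypothetical sketch: expand {prompt}, {model}, {output_file} inside each argv token.
type Placeholders = { prompt: string; model: string; output_file?: string };

function expandArgs(template: string[], values: Placeholders): string[] {
  return template.map((token) =>
    // A replacer function avoids "$&"-style special patterns in the inserted value.
    token.replace(/\{(prompt|model|output_file)\}/g, (_match, name: keyof Placeholders) => {
      const value = values[name];
      if (value === undefined) throw new Error(`placeholder {${name}} has no value`);
      return value;
    }),
  );
}
```

For example, the pi-cli template ["--print", "--model", "{model}", "{prompt}"] becomes four argv entries with the prompt intact as the last one.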
CLI Model Discovery
A cli route exposes the same management interfaces as other providers — discoverModels / checkHealth / probeModels / listCandidateModels. The discovery source is resolved from transport.cli.discovery.via; when via is omitted it is chosen by the chain command → acp → static → none:
- command: run a configured CLI sub-command that lists models and parse its stdout. It reuses the same json | jsonl | text formats, and model fields are mapped with a models selector. On empty or failed output it falls back to the static source (fallback: "static" is the default when models[] is present; set fallback: "none" to fail loudly):
  transport: {
    kind: "cli",
    command: "agy",
    cli: {
      argsTemplate: ["-p", "{prompt}", "--model", "{model}"],
      parser: { kind: "text" },
      discovery: {
        command: {
          argsTemplate: ["models", "list", "--json"],
          parser: { kind: "json" },
          models: { path: "data", idPath: "id", namePath: "display_name", contextLengthPath: "context_window" },
        },
      },
    },
  },
  models: ["agy-pro", { id: "agy-fast", contextWindow: 200_000 }],
- acp: delegate discovery to an ACP sidecar (the default for the built-in coding-agent presets).
- static: build the catalog from the account's configured models[] (+ contextWindow). This is the default for a preset-less custom CLI that declares models[] (it previously reported not_supported); opt out with discovery: { via: "none" }.
- none: no discovery.
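The "via omitted" chain can be sketched as a sequence of checks. The input shape is an assumption (flags rather than ai-connect's real config types), and resolveDiscoveryVia is a hypothetical name.

```typescript
// Hypothetical sketch of the default chain command → acp → static → none.
type DiscoveryVia = "command" | "acp" | "static" | "none";

type CliDiscoveryInput = {
  via?: DiscoveryVia;
  hasListCommand: boolean; // transport.cli.discovery.command is configured
  hasAcpSidecar: boolean; // e.g. a built-in coding-agent preset
  staticModelCount: number; // length of the account's models[]
};

function resolveDiscoveryVia(input: CliDiscoveryInput): DiscoveryVia {
  if (input.via) return input.via; // explicit choice, including "none"
  if (input.hasListCommand) return "command";
  if (input.hasAcpSidecar) return "acp";
  if (input.staticModelCount > 0) return "static";
  return "none";
}
```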
Configured context window (GAP-A). A model entry's contextLength is surfaced from the route's configured contextWindow only when discovery did not already report one (monotonic — a live discovered value always wins). Provenance is exposed on the typed field ModelInfo.contextWindowSource: "discovered" (read from a live API/cli list-command record), "configured" (surfaced from the route's contextWindow), or undefined (unknown). A consumer mapping catalog.contextLength into resolveModelContextWindow's discovered slot should do so ONLY when contextWindowSource === "discovered" — treat "configured" as the configured input and undefined as unknown (do not promote it to the discovered slot) — so the precedence discovered > reference > configured > default stays honest. (metadata.contextWindowSource: "configured" is retained as a back-compat alias on the configured-fill path.)
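On the consumer side, the provenance rule amounts to a guard before filling the discovered slot. A minimal sketch; CatalogEntry is a reduced stand-in for ModelInfo, and discoveredSlot / configuredSlot are hypothetical helper names.

```typescript
// Hypothetical consumer-side helpers: only live ("discovered") lengths may fill the
// discovered slot of resolveModelContextWindow; "configured" stays a configured input.
type CatalogEntry = {
  id: string;
  contextLength?: number;
  contextWindowSource?: "discovered" | "configured";
};

function discoveredSlot(entry: CatalogEntry): number | undefined {
  return entry.contextWindowSource === "discovered" ? entry.contextLength : undefined;
}

function configuredSlot(entry: CatalogEntry): number | undefined {
  return entry.contextWindowSource === "configured" ? entry.contextLength : undefined;
}
```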
Discovery diagnostics. A model-discovery route report (and its catalog) may carry a warnings: string[]. In particular, a cli discovery.via: "command" route whose list command fails/times out/returns nothing and falls back to its static models[] catalog records a warning there, so a degraded fallback is distinguishable from a healthy static catalog.
Current local transport scope:
- cli: text generation, plus model discovery via a list command, a static config catalog, or an ACP sidecar
- server: text generation plus provider-native model discovery
CLI File and Image Input
CLI routes can stage local attachment files into a temp directory and pass them to the subprocess as argv tokens or inline prompt references.
Client-level staging is configured under cli.staging:
const client = createLocalClient(config, {
cli: {
staging: {
dir: "/tmp/my-staging", // default: os.tmpdir()
prefix: "ai-connect-", // temp dir name prefix
keep: true, // retain per-invocation temp dir (debug only)
},
},
});
Staged files are written under <stagingDir>/attachments/ with sanitized basenames (path-traversal safe) and removed after the call completes (unless keep: true). Attachments that carry only a remote URI and no bytes degrade to the raw URI reference.
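"Path-traversal safe" basenames can be pictured as follows. This is a minimal sketch under the assumption of a conservative character whitelist; sanitizeBasename and stagedPath are hypothetical, and the exact policy ai-connect applies is not documented here.

```typescript
// Hypothetical sketch of path-traversal-safe naming for staged attachments.
function sanitizeBasename(name: string, fallback = "attachment"): string {
  // Keep only the last path segment, treating both "/" and "\" as separators.
  const last = name.split(/[\\/]/).pop() ?? "";
  // Replace anything outside a conservative character set.
  const safe = last.replace(/[^A-Za-z0-9._-]/g, "_");
  // Never resolve to the directory itself or its parent.
  return safe === "" || safe === "." || safe === ".." ? fallback : safe;
}

function stagedPath(stagingDir: string, name: string): string {
  return `${stagingDir.replace(/\/+$/, "")}/attachments/${sanitizeBasename(name)}`;
}
```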
Per-route file input is declared under transport.cli.fileInput:
transport: {
kind: "cli",
id: "my-route",
cli: {
fileInput: {
placement: "args", // "args" (default) | "prompt"
perFileArgs: ["@{path}"], // argv tokens per file; {path} and {name} placeholders
// mentionTemplate: "@{path}", // prompt placement: inserted per file (default "@{path}")
// separator: " ",
categories: ["image", "document", "text", "other"], // accepted categories (default: all)
stagingDir: "/custom/dir", // per-route override of cli.staging.dir
},
},
}
A {files} placeholder in argsTemplate expands to the full per-file argv block; it records a single telemetry key, not the staged absolute paths.
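Per-file expansion and the {files} placeholder can be sketched together. Assumptions: {files} must be a whole argv token, and {path} / {name} are substituted in a single pass; perFileBlock and expandFilesPlaceholder are hypothetical names.

```typescript
// Hypothetical sketch of per-file argv expansion for staged attachments.
type StagedFile = { path: string; name: string };

function perFileBlock(perFileArgs: string[], files: StagedFile[]): string[] {
  return files.flatMap((file) =>
    // Single-pass substitution so a path containing "{name}" is not re-expanded.
    perFileArgs.map((token) =>
      token.replace(/\{(path|name)\}/g, (_m, key: "path" | "name") => file[key]),
    ),
  );
}

function expandFilesPlaceholder(argsTemplate: string[], block: string[]): string[] {
  // A token that is exactly "{files}" expands in place to zero or more argv entries.
  return argsTemplate.flatMap((token) => (token === "{files}" ? block : [token]));
}
```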
Capability gate. A route advertises supportsImageInput only when its fileInput stages the image category, and supportsFileUpload only when it stages the document category. A route that does not accept a category rejects the attachment with unsupported_capability at routing time, before any subprocess is spawned.
Built-in preset behavior:
- pi preset: accepts attachments from all categories (image, document, text, other). Each file is passed as an @{path} argv token (perFileArgs: ["@{path}"]). Pass an image attachment and pi receives @/tmp/.../attachments/photo.png as an argument. PDFs are staged as-is with no document extraction, so pi receives raw bytes; prefer images or text files for reliable results.
  const result = await client.generate({
    messages: [{ role: "user", content: "Describe this image." }],
    attachments: ["/path/to/photo.png"], // route is pi:cli:local or routeHints selects pi-cli
  });
- codex preset: accepts images only (categories: ["image"]). Each image is passed as --image {path} (perFileArgs: ["--image", "{path}"]). A non-image attachment (e.g. a PDF) on a codex route is rejected with unsupported_capability before any spawn.
- claude / openclaude presets: do not accept CLI file input by default. File input is opt-in: add an explicit transport.cli.fileInput block to your route config.
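The capability gate and the preset categories above can be sketched as follows. The category table mirrors the documented presets; UnsupportedCapabilityError stands in for an AiConnectError with code unsupported_capability, and all names here are hypothetical.

```typescript
// Hypothetical sketch of the routing-time capability gate.
type FileCategory = "image" | "document" | "text" | "other";

const PRESET_CATEGORIES: Record<string, FileCategory[]> = {
  pi: ["image", "document", "text", "other"],
  codex: ["image"],
  claude: [], // opt-in via an explicit transport.cli.fileInput
  openclaude: [],
};

function capabilities(categories: FileCategory[]) {
  return {
    supportsImageInput: categories.includes("image"),
    supportsFileUpload: categories.includes("document"),
  };
}

class UnsupportedCapabilityError extends Error {
  readonly code = "unsupported_capability";
}

function assertAccepts(categories: FileCategory[], attachment: FileCategory): void {
  // Checked while routing, before any subprocess is spawned.
  if (!categories.includes(attachment)) {
    throw new UnsupportedCapabilityError(`route does not accept "${attachment}" attachments`);
  }
}
```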
Mock Gateway
For API-level debugging you can run a local mock backend that behaves like a small OpenAI/Anthropic/Gemini proxy and captures the real finalized wire payloads:
bun run mock-gateway
It prints base URLs for:
- OpenAI: http://127.0.0.1:8046/v1
- Anthropic: http://127.0.0.1:8046/v1/messages
- Gemini: http://127.0.0.1:8046/v1beta/models
The mock backend accepts any API key value and logs each captured request after ai-connect has already normalized it. Set MOCK_GATEWAY_VERBOSE=1 to print full request snapshots instead of only summaries.
To run it as a transparent MITM in front of a real upstream proxy:
MITM_UPSTREAM_ORIGIN=http://127.0.0.1:8045 bun run mock-gateway
In that mode it keeps the same local URLs, forwards requests upstream, and logs:
- the finalized request payload
- the upstream response payload
- per-request total latency and upstream latency
This is useful both for direct API routes and for ACP harnesses that support gateway-style HTTP upstream configuration, because the harness can point at the MITM URL while ai-connect stays attached to the same local endpoint.
Rotation and Fallback
import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";
const client = createLocalClient(
defineConfig({
providers: {
openai: {
accounts: [
{
id: "main",
transport: "api",
models: ["gpt-4.1"],
credentials: [
{
id: "pool",
apiKeyEnv: "OPENAI_API_KEYS",
apiKeyDelimiter: ",",
},
],
},
],
},
anthropic: {
accounts: [
{
id: "subscription",
transport: { kind: "acp", id: "claude-code-acp" },
models: ["claude-sonnet-4"],
},
],
},
},
routing: {
strategy: "round-robin",
shuffleOnInit: true,
fallback: {
on: {
rate_limit: [
"rotate-credential",
"rotate-account",
"fallback-transport",
"fallback-provider",
],
},
},
},
}),
);
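The OPENAI_API_KEYS pool above is one env var split on apiKeyDelimiter. A minimal sketch of parsing and round-robin rotation; parseKeyPool and createRoundRobin are hypothetical, and the real router additionally tracks cooldowns and route health.

```typescript
// Hypothetical sketch of an env-backed key pool with round-robin rotation.
function parseKeyPool(raw: string | undefined, delimiter = ","): string[] {
  return (raw ?? "")
    .split(delimiter)
    .map((key) => key.trim())
    .filter((key) => key.length > 0); // drop blanks from stray delimiters
}

function createRoundRobin<T>(items: T[], startIndex = 0): () => T {
  if (items.length === 0) throw new Error("empty pool");
  let next = startIndex % items.length;
  return () => {
    const item = items[next];
    next = (next + 1) % items.length;
    return item;
  };
}
```

A random startIndex is one rough way to avoid every process starting on the same key, loosely comparable to shuffleOnInit.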
Route pools accept several selector forms, but the safest form is:
- provider:transport:account:model
- or the full concrete route.id
Shorter selectors such as provider:account:model are convenience aliases. If the same account+model exists on multiple transports, the shorter form can match more than one route.
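The ambiguity is easy to see with a matcher. This sketch assumes route ids follow provider:transport:account:model (as in "openai:api:main:gpt-4.1") and that model names contain no colons; matchSelector and RouteRef are hypothetical.

```typescript
// Hypothetical selector matcher showing why the three-part form can be ambiguous.
type RouteRef = { id: string; provider: string; transport: string; account: string; model: string };

function matchSelector(selector: string, routes: RouteRef[]): RouteRef[] {
  const exact = routes.filter((r) => r.id === selector);
  if (exact.length > 0) return exact;
  const parts = selector.split(":");
  if (parts.length === 4) {
    const [provider, transport, account, model] = parts;
    return routes.filter(
      (r) => r.provider === provider && r.transport === transport && r.account === account && r.model === model,
    );
  }
  if (parts.length === 3) {
    // provider:account:model ignores the transport, so it may match several routes.
    const [provider, account, model] = parts;
    return routes.filter((r) => r.provider === provider && r.account === account && r.model === model);
  }
  return [];
}
```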
Three error codes are intentionally hard-terminal: they never rotate, retry, or fall back, and they never pollute route health:
- aborted: the caller cancelled the operation
- timeout: an operation deadline elapsed
- fanout_limit: a client-side fan-out ceiling was exhausted
All other normalized error codes (rate_limit, quota_exhausted, temporary_unavailable, etc.) remain eligible for the rotation/retry/fallback chain you configure under routing.fallback.
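The terminal-code rule can be sketched as a filter in front of the configured fallback actions. The action names mirror the routing.fallback.on example; nextActions and countsAgainstRouteHealth are hypothetical helpers.

```typescript
// Hypothetical sketch of how hard-terminal codes bypass the fallback chain.
type FallbackAction = "rotate-credential" | "rotate-account" | "fallback-transport" | "fallback-provider";

const HARD_TERMINAL = new Set(["aborted", "timeout", "fanout_limit"]);

function nextActions(code: string, on: Partial<Record<string, FallbackAction[]>>): FallbackAction[] {
  // Hard-terminal codes never rotate, retry, or fall back.
  if (HARD_TERMINAL.has(code)) return [];
  return on[code] ?? [];
}

function countsAgainstRouteHealth(code: string): boolean {
  // Hard-terminal codes are not route failures.
  return !HARD_TERMINAL.has(code);
}
```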
Cancellation, Pause, and Timeouts
generate(request, opts?) and stream(request, opts?) accept an optional second argument:
type GenerateCallOptions = {
signal?: AbortSignal;
pauseSignal?: AbortSignal;
timeoutMs?: number;
};
Cancellation with an AbortSignal discards any in-flight partial and throws an AiConnectError with code aborted:
import { AiConnectError } from "@vedmalex/ai-connect";
const controller = new AbortController();
setTimeout(() => controller.abort(), 5_000);
try {
const result = await client.generate(
{ messages: [{ role: "user", content: "Long task..." }] },
{ signal: controller.signal },
);
console.log(result.text);
} catch (error) {
if (error instanceof AiConnectError && error.code === "aborted") {
console.log("cancelled");
}
}
pauseSignal is a separate, cooperative signal. In stream(), firing it stops reading and yields a terminal { type: "paused", result } event that keeps everything accumulated so far:
const pause = new AbortController();
for await (const event of client.stream(
{ messages: [{ role: "user", content: "Stream a draft." }] },
{ pauseSignal: pause.signal },
)) {
if (event.type === "delta") {
process.stdout.write(event.text);
} else if (event.type === "paused") {
console.log("\npaused with partial:", event.result.text);
} else if (event.type === "result") {
console.log("\ndone:", event.result.text);
}
}
In generate() a mid-call pause degenerates to aborted, because a non-streamed call cannot retain a partial. Abort always throws and discards; pause in stream() is the only way to keep a partial.
timeoutMs overrides the per-operation timeout tier for a single call. Setting it to a value <= 0 or to Infinity disables the timer. A fired timeout throws AiConnectError with code timeout. You can also set client-wide tier defaults:
const client = createLocalClient(config, {
timeouts: {
generateMs: 120_000, // generate / stream (default 120000)
probeMs: 12_000, // verify / discover* / checkHealth / probeModels (default 12000)
},
});
verify(), discoverModels(), and discoverAcpModels() accept the signal/timeoutMs subset of these options as their own second argument.
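How a per-call timeoutMs and a caller signal might combine is sketched below. The tier defaults come from the text above; effectiveTimeoutMs and withDeadline are hypothetical helpers, and treating NaN like Infinity is an assumption.

```typescript
// Hypothetical sketch: resolve a per-call timeout and merge it with a caller signal.
function effectiveTimeoutMs(override: number | undefined, tierDefaultMs: number): number | undefined {
  const value = override ?? tierDefaultMs;
  // <= 0, Infinity (and NaN, by assumption) disable the timer entirely.
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

function withDeadline(signal: AbortSignal | undefined, timeoutMs: number | undefined) {
  const controller = new AbortController();
  const onAbort = () => controller.abort(signal?.reason);
  if (signal?.aborted) onAbort();
  else signal?.addEventListener("abort", onAbort, { once: true });
  const timer =
    timeoutMs === undefined ? undefined : setTimeout(() => controller.abort(new Error("timeout")), timeoutMs);
  return {
    signal: controller.signal,
    // Always dispose, so a finished call leaves no pending timer or listener behind.
    dispose: () => {
      if (timer !== undefined) clearTimeout(timer);
      signal?.removeEventListener("abort", onAbort);
    },
  };
}
```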
Files and Images
The unified request format supports:
- attachments for text, image, and PDF document prompt inputs
- image.size and image.rawPrompt for image generation routes
- portable file inputs:
  - absolute local paths
  - browser File
  - browser Blob
  - data: URLs
  - remote file references with uri or a provider providerFileId
Example:
const result = await client.generate({
operation: "image",
messages: [{ role: "user", content: "Create a lotus architecture diagram" }],
attachments: [
new File(["project outline"], "brief.md", { type: "text/markdown" }),
],
image: {
size: "1280x720",
},
});
console.log(result.attachments);
PDF and Document Input
PDF attachments (application/pdf) now route across the api transport family, not just ACP. Each built-in API handler maps a document attachment to its provider-native content block:
- anthropic: a document block (base64 inline, Files-API file_id, or url)
- openai: a file content block (inline file data or an uploaded Files-API file_id) alongside image_url for images
- gemini: inlineData for inline bytes or fileData for an uploaded file URI
Oversize PDFs are uploaded to the provider's Files API and referenced by id (providerFileId); if that upload fails the handler falls back to the inline base64 path and records a warning. A route that cannot carry a document at all fails with a clean AiConnectError whose code is unsupported_capability.
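The upload-then-fallback behaviour can be sketched as a small planner. The inline size limit is an assumption (the README does not say where "oversize" begins), the upload callback is synchronous only for brevity, and planDocumentDelivery is a hypothetical name.

```typescript
// Hypothetical sketch of inline-vs-upload delivery for a PDF attachment.
type DocumentDelivery =
  | { kind: "file-reference"; providerFileId: string }
  | { kind: "inline-base64"; warning?: string };

function planDocumentDelivery(
  sizeBytes: number,
  inlineLimitBytes: number,
  upload: () => string, // returns a provider file id or throws
): DocumentDelivery {
  if (sizeBytes <= inlineLimitBytes) return { kind: "inline-base64" };
  try {
    return { kind: "file-reference", providerFileId: upload() };
  } catch (error) {
    // Upload failed: fall back to inline bytes and surface a warning instead of failing.
    return { kind: "inline-base64", warning: `Files API upload failed: ${String(error)}` };
  }
}
```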
const result = await client.generate({
messages: [{ role: "user", content: "Summarize the attached report." }],
attachments: ["/Users/vedmalex/work/reports/q3.pdf"],
});
console.log(result.text);
A previously-uploaded document can be referenced directly by its provider file id, skipping re-upload:
const result = await client.generate({
messages: [{ role: "user", content: "What changed since the last revision?" }],
attachments: [{ providerFileId: "file_abc123", mimeType: "application/pdf", name: "spec.pdf" }],
});
The portable-file primitives used for this are exported and browser-safe where the source allows it:
- SUPPORTED_DOCUMENT_MIME_TYPES: the set of MIME types treated as documents (currently application/pdf)
- portableFileCategory(file): coarse "image" | "document" | "text" | "other" classification
- isPortableDocumentFile(file): convenience predicate for the document category
- materializePortableFile(file): one decode pass producing a PortableFilePayload (base64, dataUrl, uri, text, providerFileId carriers)
- portableFileToBase64(file): raw base64 of the file bytes (no data: prefix)
Path-based file access requires a local runtime; in a browser bundle use File, Blob, data: URLs, or remote references.
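A coarse MIME-based classification in the spirit of portableFileCategory might look like this. It is a sketch only: categoryFromMime is hypothetical, and the real function may also inspect file names or bytes.

```typescript
// Hypothetical sketch of coarse MIME classification into four categories.
type PortableCategory = "image" | "document" | "text" | "other";

const DOCUMENT_MIME_TYPES = new Set(["application/pdf"]);

function categoryFromMime(mimeType: string | undefined): PortableCategory {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const mime = (mimeType ?? "").toLowerCase().split(";")[0].trim();
  if (mime.startsWith("image/")) return "image";
  if (DOCUMENT_MIME_TYPES.has(mime)) return "document";
  if (mime.startsWith("text/")) return "text";
  return "other";
}
```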
Wide Event Logging
The client supports opt-in structured logging in the "log once per request lifecycle" style described at loggingsucks.com.
import {
createConsoleWideEventLogger,
createLocalClient,
defineConfig,
} from "@vedmalex/ai-connect";
const client = createLocalClient(
defineConfig({
providers: {
openai: {
accounts: [
{
id: "main",
transport: "api",
models: ["gpt-4.1"],
credentials: [{ apiKeyEnv: "OPENAI_API_KEY" }],
},
],
},
},
}),
{
logging: {
logger: createConsoleWideEventLogger(),
sampling: {
sampleRate: 0.1,
slowOperationMs: 2_000,
keepErrors: true,
keepWarnings: true,
},
baseContext: {
service_name: "my-app",
environment: "production",
},
},
},
);
await client.generate({
messages: [{ role: "user", content: "Summarize this request." }],
logContext: {
request_id: "req-123",
tenant_id: "acme",
user_id: "u-42",
},
});
What gets logged:
- one canonical event per generate, stream, verify, discoverModels, discoverAcpModels, checkHealth, or probeModels call
- request shape summary, not raw prompt content
- selected route plus full fallback/retry attempt chain
- duration, usage (including usage.calls), warnings, and verification issue codes
- per-operation summaries: verification, modelDiscovery, health, and probe
- caller-provided logContext for business identifiers
Helpers:
- createConsoleWideEventLogger()
- shouldEmitWideEvent()
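The sampling options read like a tail-sampling rule: keep every interesting event, sample the rest. A minimal sketch with an injectable random source; keepEvent is hypothetical and is not the library's shouldEmitWideEvent, whose signature is not shown here.

```typescript
// Hypothetical tail-sampling decision for one wide event.
type SamplingPolicy = {
  sampleRate: number;
  slowOperationMs: number;
  keepErrors: boolean;
  keepWarnings: boolean;
};

type WideEventSummary = { durationMs: number; error?: unknown; warnings: string[] };

function keepEvent(event: WideEventSummary, policy: SamplingPolicy, random: () => number = Math.random): boolean {
  if (policy.keepErrors && event.error !== undefined) return true;
  if (policy.keepWarnings && event.warnings.length > 0) return true;
  if (event.durationMs >= policy.slowOperationMs) return true; // always keep slow operations
  return random() < policy.sampleRate; // uniform sample of everything else
}
```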
Streaming Deltas
stream() yields a GenerateStreamEvent union:
type GenerateStreamEvent =
| { type: "delta"; text: string }
| { type: "result"; result: GenerateResult }
| { type: "paused"; result: GenerateResult };
For routes with a real incremental producer (the OpenAI SSE handler and the ACP delta producer), stream() emits { type: "delta", text } tokens as they arrive and then a terminal { type: "result", result }. Routes without an incremental producer still yield a single terminal result. A cooperative pauseSignal ends the stream with a terminal { type: "paused", result } that keeps the accumulated partial (see Cancellation, Pause, and Timeouts).
for await (const event of client.stream({
messages: [{ role: "user", content: "Write a haiku." }],
})) {
if (event.type === "delta") {
process.stdout.write(event.text);
} else if (event.type === "result") {
console.log("\n", event.result.usage);
}
}
delta and result may interleave; paused and result are mutually exclusive terminals. Abort, by contrast, throws and discards partials — it never yields paused.
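A consumer often folds the event stream into state. This reducer sketch uses a reduced stand-in for GenerateResult (only text) and hypothetical names; it accumulates deltas and records whichever terminal arrives, making no stronger ordering claim than the text above.

```typescript
// Hypothetical reducer over a simplified GenerateStreamEvent union.
type GenerateResultLike = { text: string };
type StreamEvent =
  | { type: "delta"; text: string }
  | { type: "result"; result: GenerateResultLike }
  | { type: "paused"; result: GenerateResultLike };

type StreamState = { partial: string; terminal?: "result" | "paused"; final?: GenerateResultLike };

function reduceStream(state: StreamState, event: StreamEvent): StreamState {
  if (event.type === "delta") return { ...state, partial: state.partial + event.text };
  // Both terminals carry a full result; "paused" keeps what was accumulated so far.
  return { ...state, terminal: event.type, final: event.result };
}
```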
Health Checks and Model Probes
Two read-only diagnostics complement verify() and discoverModels(). Neither mutates router health (no recordFailure/recordSuccess).
checkHealth(target?) runs a live two-stage check per route:
1. endpoint reachability (api: GET /models via discovery; acp/cli/server: a session via verify)
2. a minimal bounded chat ping (max one token) that captures latencyMs
A Stage-1 failure short-circuits Stage-2 with detail "skipped: endpoint unreachable". Pass reachabilityOnly: true for the cheap Stage-1-only check on hot paths.
const report = await client.checkHealth({ transports: ["api"] });
for (const route of report.routes) {
console.log(route.routeId, route.ok, route.model.latencyMs);
}
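The two-stage flow with its short-circuit can be sketched with injected stages. The stages are synchronous here for brevity (real checks do network or process I/O), and twoStageCheck is a hypothetical name.

```typescript
// Hypothetical sketch of the two-stage health check.
type StageOutcome = { ok: boolean; detail?: string; latencyMs?: number };

function twoStageCheck(
  reachability: () => StageOutcome,
  chatPing: () => StageOutcome,
  reachabilityOnly = false,
): { endpoint: StageOutcome; model?: StageOutcome; ok: boolean } {
  const endpoint = reachability();
  if (!endpoint.ok) {
    // Stage 1 failed: do not spend a model call on an unreachable endpoint.
    return { endpoint, model: { ok: false, detail: "skipped: endpoint unreachable" }, ok: false };
  }
  if (reachabilityOnly) return { endpoint, ok: true };
  const model = chatPing();
  return { endpoint, model, ok: model.ok };
}
```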
probeModels(target?, opts?) classifies each route::model tuple as broken vs transient. For api transports broken is HTTP-status-driven (400 <= status < 500 and status !== 429); 429, 5xx, and status-less transport errors are transient (broken: false). Results are served from a per-route TTL cache (default 5 minutes), with bounded concurrency (default 4), a per-probe timeout (default 8s), and opts.signal support to stop a fan-out mid-flight. probeModelsStream(target?, opts?) yields each ProbeModelResult as it settles.
const results = await client.probeModels(
{ transports: ["api"] },
{ concurrency: 6, timeoutMs: 5_000, forceRefresh: false },
);
const broken = results.filter((r) => r.broken);
The classification primitive is exported as classifyProbeOutcome, with the defaults PROBE_DEFAULT_CONCURRENCY, PROBE_DEFAULT_TIMEOUT_MS, and PROBE_DEFAULT_TTL_MS.
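The documented api classification rule and a per-key TTL cache can be restated in a few lines. This is not the exported classifyProbeOutcome; isBrokenStatus, createTtlCache, and the "routeId::model" key shape are hypothetical.

```typescript
// Hypothetical restatement of the api-transport broken/transient rule.
function isBrokenStatus(status: number | undefined): boolean {
  // No HTTP status (DNS failure, reset, timeout) is transient.
  if (status === undefined) return false;
  return status >= 400 && status < 500 && status !== 429;
}

// Minimal TTL cache with an injectable clock.
function createTtlCache<V>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: V; expiresAt: number }>();
  return {
    get(key: string): V | undefined {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (hit.expiresAt <= now()) {
        entries.delete(key); // expired: force a fresh probe
        return undefined;
      }
      return hit.value;
    },
    set(key: string, value: V): void {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}
```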
Context Window and Model Discovery
resolveModelContext(input, options?) returns the effective context window for a model (synchronous, no I/O), with a clear precedence: discovered > reference (curated table) > configured (per-model/route config) > default (8192). Results are cached per (baseUrl|transportId)::model; a cache hit returns the same value and source and ignores options.discovered.
const ctx = client.resolveModelContext(
{ provider: "openai", model: "gpt-4.1" },
{ discovered: 1_047_576 },
);
console.log(ctx.contextWindow, ctx.source, ctx.cached);
Configure a per-model context window in account config either at the account level (inherited by string-form models) or per model:
{
id: "main",
transport: "api",
contextWindow: 128_000,
models: ["gpt-4o", { id: "gpt-4.1", contextWindow: 1_047_576 }],
credentials: [{ apiKeyEnv: "OPENAI_API_KEY" }],
}
Model discovery now also surfaces typed contextLength, free, and pricing fields on each discovered ModelInfo. The curated reference table and its parsers are browser-safe exports:
- MODEL_REFERENCE and lookupModelRef(model)
- resolveModelContextWindow({ discovered?, reference?, configured?, defaultContextWindow? })
- extractModelContextLength(rawModelRecord)
- detectModelFree(modelId, pricing?, rawModelRecord?)
- parseModelPricing(rawModelRecord)
- DEFAULT_CONTEXT_WINDOW (8192), normalizeModelKey, modelContextCacheKey
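The precedence discovered > reference > configured > default can be restated as a first-valid pick. This is a hypothetical re-implementation, not the exported resolveModelContextWindow; ignoring non-positive or non-finite inputs is an assumption.

```typescript
// Hypothetical sketch of the documented context-window precedence.
type ContextSource = "discovered" | "reference" | "configured" | "default";

function pickContextWindow(input: {
  discovered?: number;
  reference?: number;
  configured?: number;
  defaultContextWindow?: number;
}): { contextWindow: number; source: ContextSource } {
  // Assumption: only positive, finite numbers count as a usable value.
  const valid = (n: number | undefined): n is number => typeof n === "number" && Number.isFinite(n) && n > 0;
  if (valid(input.discovered)) return { contextWindow: input.discovered, source: "discovered" };
  if (valid(input.reference)) return { contextWindow: input.reference, source: "reference" };
  if (valid(input.configured)) return { contextWindow: input.configured, source: "configured" };
  return { contextWindow: input.defaultContextWindow ?? 8192, source: "default" };
}
```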
Client-Safe Projection and Flexible Routing
For untrusted UI or agent-discovery surfaces, project routes without ever exposing credentials or baseUrl:
const publicRoutes = client.listPublicRoutes({ operation: "text" });
const candidates = client.listCandidateModels({ provider: "openai" });
listPublicRoutes() returns PublicRoute DTOs (built by explicit construction, never by spreading an internal route), and listCandidateModels() returns the same secret-free CandidateModel list offered to a model selector.
Per-route routing flexibility is configured on the account:
- modelAllowlistMode ("strict" | "shortlist"): strict (default) drops an undeclared routeHints.model; shortlist passes a verbatim requested model through on a synthetic copy that preserves the route id (so route health is never fragmented)
- defaultResponseFormat: injected only when the caller did not supply parameters.responseFormat
- systemPrompt: injected as a leading system message only when the caller authored no system message
- contextMode ("workspace" | "clean"): execution-context mode (see Clean Context Mode)
Unmatched route selectors are governed by routing.resolution.unknownSelector:
- "error" (default): throw on an unmatched selector
- "default": substitute the configured defaultRouteId for each unmatched selector
- "off": silently drop the unmatched selector (degrade)
defineConfig({
providers: {
openai: {
accounts: [
{
id: "main",
transport: "api",
models: ["gpt-4.1"],
modelAllowlistMode: "shortlist",
defaultResponseFormat: { type: "json_object" },
systemPrompt: "You are a concise assistant.",
credentials: [{ apiKeyEnv: "OPENAI_API_KEY" }],
},
],
},
},
routing: {
resolution: {
unknownSelector: "default",
defaultRouteId: "openai:api:main:gpt-4.1",
},
},
});
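The allowlist modes and the unknownSelector policy can be sketched as two small functions. Shapes are illustrative; selector matching is reduced to exact ids, and throwing when "default" has no defaultRouteId is an assumption.

```typescript
// Hypothetical sketch of modelAllowlistMode and unknownSelector handling.
type RouteLike = { id: string; models: string[]; model: string };

function applyAllowlist(route: RouteLike, requested: string | undefined, mode: "strict" | "shortlist"): RouteLike {
  if (requested === undefined) return route;
  if (route.models.includes(requested)) return { ...route, model: requested };
  if (mode === "strict") return route; // undeclared hint is dropped
  // shortlist: synthetic copy keeps the same id, so health stays on one route
  return { ...route, model: requested };
}

function resolveSelectors(
  selectors: string[],
  known: Set<string>,
  policy: "error" | "default" | "off",
  defaultRouteId?: string,
): string[] {
  const out: string[] = [];
  for (const selector of selectors) {
    if (known.has(selector)) {
      out.push(selector);
    } else if (policy === "error") {
      throw new Error(`unknown route selector: ${selector}`);
    } else if (policy === "default") {
      if (!defaultRouteId) throw new Error('unknownSelector "default" requires defaultRouteId');
      out.push(defaultRouteId);
    } // "off": drop silently
  }
  return out;
}
```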
Fan-Out Throttling
Client-side fan-out throttling bounds how aggressively a client issues calls. It is configured at the client level and can be overridden per request:
type FanoutPolicy = {
maxConcurrency?: number; // simultaneous in-flight calls (semaphore + FIFO fairness)
requestsPerSecond?: number; // deterministic token bucket driven by runtime.now()
maxCalls?: number; // hard LIFETIME ceiling
};
Any unset field is unbounded. Exhausting maxCalls throws AiConnectError with code fanout_limit before route selection, so it never pollutes route health.
const client = createLocalClient(config, {
fanout: { maxConcurrency: 4, requestsPerSecond: 10 },
});
await client.generate({
messages: [{ role: "user", content: "..." }],
fanout: { maxCalls: 100 }, // request-scoped, merged per-field over the client default
});
A per-request fanout merges per-field over the client default into a request-scoped limiter that never mutates the shared client limiter. The standalone limiter primitive is exported as createFanoutLimiter(policy, runtime); normalize a raw policy first with normalizeFanoutPolicy() (and mergeFanoutPolicy() to combine a base and override).
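The rate and lifetime parts of a fan-out policy can be sketched with an injected clock. This is not the exported createFanoutLimiter: the concurrency semaphore is omitted, the bucket capacity (one second's worth of tokens) is an assumption, and createLimiter / mergePolicy are hypothetical.

```typescript
// Hypothetical deterministic token bucket + lifetime ceiling.
type FanoutPolicy = { requestsPerSecond?: number; maxCalls?: number };

function mergePolicy(base: FanoutPolicy, override: FanoutPolicy = {}): FanoutPolicy {
  // Per-field merge: a field set in the override wins; unset fields keep the base value.
  return {
    requestsPerSecond: override.requestsPerSecond ?? base.requestsPerSecond,
    maxCalls: override.maxCalls ?? base.maxCalls,
  };
}

function createLimiter(policy: FanoutPolicy, now: () => number) {
  const rps = policy.requestsPerSecond;
  let tokens = rps ?? 0;
  let last = now();
  let calls = 0;
  return {
    tryAcquire(): "ok" | "wait" | "exhausted" {
      if (policy.maxCalls !== undefined && calls >= policy.maxCalls) return "exhausted";
      if (rps !== undefined) {
        const t = now();
        tokens = Math.min(rps, tokens + ((t - last) / 1000) * rps); // refill since last call
        last = t;
        if (tokens < 1) return "wait";
        tokens -= 1;
      }
      calls += 1;
      return "ok";
    },
  };
}
```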
Model Selector Hook
A consumer-supplied modelSelector runs before normal routing and picks a model from the eligible candidates:
const client = createLocalClient(config, {
routeHints: {
modelSelector: (question, candidateModels) => {
// question carries text/messages/operation/routeHints (no secrets);
// candidateModels is the secret-free CandidateModel list.
if (question.text.length > 4_000) {
return candidateModels.find((c) => c.model.includes("4.1"))?.model;
}
return undefined; // defer to normal routing
},
failOpen: false,
},
});
Returning undefined defers to normal routing. An explicit routeHints.model always beats the selector (the hook is not even invoked). A thrown or rejected selector fails closed to validation_error by default; set failOpen: true to ignore it and fall through to normal routing instead. The selector may be async and LLM-backed.
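The precedence rules can be summarized in one async function. This is a sketch under assumptions: pickModel, ValidationError, and the trimmed Question and CandidateModel shapes are hypothetical stand-ins, and the real client surfaces a validation_error rather than this local error class.

```typescript
// Hypothetical sketch of modelSelector precedence; not ai-connect's implementation.
type CandidateModel = { model: string };
type Question = { text: string };
type ModelSelector = (
  question: Question,
  candidates: CandidateModel[],
) => string | undefined | Promise<string | undefined>;

class ValidationError extends Error {}

async function pickModel(opts: {
  explicitModel?: string;
  selector?: ModelSelector;
  failOpen?: boolean;
  question: Question;
  candidates: CandidateModel[];
}): Promise<string | undefined> {
  // An explicit model wins and the selector is never invoked.
  if (opts.explicitModel !== undefined) return opts.explicitModel;
  if (!opts.selector) return undefined;
  try {
    // undefined from the selector means "defer to normal routing".
    return await opts.selector(opts.question, opts.candidates);
  } catch (cause) {
    if (opts.failOpen) return undefined; // ignore the failure and fall through
    throw new ValidationError(`modelSelector failed: ${String(cause)}`); // fail closed
  }
}
```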
Clean Context Mode
contextMode is now generalized across all transports (previously ACP-only), set per account or per ACP launch:
- "workspace" (default): ai-connect may inject its ambient launch context (cwd/skills/rules for ACP)
- "clean": ai-connect injects nothing ambient; only the consumer messages/attachments plus explicit route config (systemPrompt, defaultResponseFormat) reach the wire
Clean mode suppresses ambient context, not explicit configuration: a route's systemPrompt and defaultResponseFormat are still applied in clean mode.
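The difference between the two modes comes down to which inputs are allowed onto the wire. A minimal sketch, assuming a simplified payload shape; assemblePayload and WirePayload are hypothetical, and real transports encode these pieces very differently.

```typescript
// Hypothetical sketch: which inputs reach the wire in each contextMode.
type ContextMode = "workspace" | "clean";

interface WirePayload {
  system?: string;
  responseFormat?: { type: string };
  ambient: string[];
  messages: string[];
}

function assemblePayload(
  mode: ContextMode,
  ambient: string[], // e.g. cwd, skills, rules discovered at launch
  route: { systemPrompt?: string; defaultResponseFormat?: { type: string } },
  messages: string[],
): WirePayload {
  return {
    // Explicit route configuration is applied in both modes.
    system: route.systemPrompt,
    responseFormat: route.defaultResponseFormat,
    // Only "workspace" lets ambient launch context through.
    ambient: mode === "workspace" ? ambient : [],
    messages,
  };
}
```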
Usage Accounting
result.usage.calls counts the successful, usage-bearing model calls behind a result. It is seeded as +1 per reporting call (only when usage is actually reported) and summed across usage merges, so a result assembled from multiple rounds or a fallback chain reports the true call count. It is never fabricated — a route that reports no usage contributes no calls.
const result = await client.generate({
messages: [{ role: "user", content: "Multi-round task." }],
});
console.log(result.usage?.calls, result.usage?.totalTokens);
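The seeding and summing rule can be sketched with two small functions. seedUsage and mergeUsage are hypothetical names, and UsageInfo is trimmed here to calls and totalTokens; the real UsageInfo carries more fields.

```typescript
// Hypothetical sketch of the calls-summing rule.
type UsageInfo = { calls?: number; totalTokens?: number };

// Seed: a call contributes calls: 1 only when it actually reported usage.
function seedUsage(reported: { totalTokens: number } | undefined): UsageInfo | undefined {
  return reported ? { calls: 1, totalTokens: reported.totalTokens } : undefined;
}

// Merge: sum field by field; a missing side contributes nothing.
function mergeUsage(a?: UsageInfo, b?: UsageInfo): UsageInfo | undefined {
  if (!a) return b;
  if (!b) return a;
  return {
    calls: (a.calls ?? 0) + (b.calls ?? 0),
    totalTokens: (a.totalTokens ?? 0) + (b.totalTokens ?? 0),
  };
}

// Three rounds, one of which reported no usage: the total counts two calls.
const rounds = [seedUsage({ totalTokens: 120 }), seedUsage(undefined), seedUsage({ totalTokens: 80 })];
const total = rounds.reduce<UsageInfo | undefined>((acc, u) => mergeUsage(acc, u), undefined);
```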
Robustness
Two robustness behaviors apply on the API path:
- Strict structured output: parameters.responseFormat of { type: "json_schema", strict: true, ... } requests strict schema enforcement. If the upstream rejects the request with a 400 specifically because of response_format, the handler performs a one-shot graceful retry with the format dropped, records a warning, and continues.
- Deep error unwrapping: upstream error payloads are unwrapped up to three levels deep (cycle-safe, JSON-decoding stringified .error/.message payloads along the way) so the surfaced AiConnectError message is the real provider message, not an opaque envelope.
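A cycle-safe unwrapper of the kind described above might look like this. It is a sketch assuming envelopes nest the provider message under .error and/or .message, possibly as JSON-encoded strings; unwrapErrorMessage, the way it counts levels, and its fallback text are hypothetical.

```typescript
// Hypothetical sketch of deep, cycle-safe error unwrapping; not ai-connect's code.
function unwrapErrorMessage(payload: unknown, maxLevels = 3): string {
  const seen = new Set<object>();
  let current: unknown = payload;
  let best: string | undefined;

  for (let level = 0; level <= maxLevels; level++) {
    if (typeof current === "string") {
      // Try to decode a stringified JSON envelope; a plain string is the message.
      const text = current;
      try {
        current = JSON.parse(text);
      } catch {
        return text;
      }
      if (typeof current === "string") return current;
    }
    if (current === null || typeof current !== "object") break;
    if (seen.has(current)) break; // cycle guard
    seen.add(current);
    const record = current as Record<string, unknown>;
    if (typeof record.message === "string") best = record.message;
    current = record.error ?? record.message; // descend one level
    if (current === undefined) break;
  }
  return best ?? "Unknown provider error";
}
```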
Cross-Project Reuse
Several primitives are intentionally provider-agnostic, client-free where possible, and free of node:* imports so they ship cleanly in browser bundles:
- Model reference: MODEL_REFERENCE, lookupModelRef, resolveModelContextWindow, extractModelContextLength, detectModelFree, parseModelPricing (pure data + functions, no client instance)
- Probe classification: classifyProbeOutcome plus the PROBE_DEFAULT_* constants (HTTP-status-driven, provider-blind; the cache is owned and passed in by the caller)
- Fan-out limiter: createFanoutLimiter(policy, runtime) (a deterministic token bucket + semaphore driven by runtime.now(), standalone with no client)
- Abort context: the AbortContext/GenerateCallOptions contract and mapAbortError(reason) for deterministic aborted/timeout mapping
- Usage accounting: the UsageInfo.calls summing rule (carried on the flat UsageInfo shape; any new transport adds calls: 1 in its usage guard and aggregation is automatic)
These are exported from both the default and @vedmalex/ai-connect/browser entry points (everything except createLocalClient).
bs-search Migration
When consuming ai-connect from bs-search:
- depend on the published package @vedmalex/ai-connect, or, for local development, point at an unpublished local checkout via file:../ai-connect
- the cancellation contract mirrors the engine's existing convention: a caller signal aborts (discarding partials → aborted), while a separate pauseSignal cooperatively pauses a stream and keeps the partial; this is the same split as the engine's signal vs _pauseSignal
- prefer the provider-agnostic primitives above (model reference, probe, fan-out, usage accounting) over re-implementing them, since they are client-free and browser-safe
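The signal vs pauseSignal split can be illustrated with a consumer loop over streamed text deltas. This sketch assumes deltas arrive as an AsyncIterable<string> and that both signals are standard AbortSignals checked between deltas; consumeDeltas and AbortedError are hypothetical, not part of ai-connect.

```typescript
// Hypothetical sketch of abort (discard partial) vs pause (keep partial).
type StreamOutcome =
  | { status: "completed"; text: string }
  | { status: "paused"; text: string };

class AbortedError extends Error {
  readonly code = "aborted";
}

async function consumeDeltas(
  deltas: AsyncIterable<string>,
  opts: { signal?: AbortSignal; pauseSignal?: AbortSignal } = {},
): Promise<StreamOutcome> {
  let text = "";
  for await (const delta of deltas) {
    // Abort wins over pause and discards everything accumulated so far.
    if (opts.signal?.aborted) throw new AbortedError("aborted by caller");
    // Pause stops cooperatively and returns the partial text.
    if (opts.pauseSignal?.aborted) return { status: "paused", text };
    text += delta;
  }
  return { status: "completed", text };
}
```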
Browser vs Local
| Capability | Browser API routes | Local API routes | Local ACP routes |
|---|---|---|---|
| Text generation | yes | yes | yes |
| Image generation | yes | yes | depends on harness |
| Text attachments | yes | yes | yes |
| Image attachments | yes | yes | yes |
| PDF document attachments | yes (data URL / remote ref) | yes | depends on harness |
| Streaming deltas | yes (OpenAI SSE) | yes (OpenAI SSE) | yes (ACP delta producer) |
| Cancellation / pause / timeout | yes | yes | yes |
| Health check / model probe | yes | yes | yes |
| Context-window resolution | yes | yes | yes |
| Local file paths | no | yes | yes |
| Local command/session verification | no | yes | yes |
| Claude/Codex ACP | no | yes | yes |
Runtime Entry Points
Use explicit runtime entry points when you know the target in advance:
- @vedmalex/ai-connect/browser
- @vedmalex/ai-connect/node
- @vedmalex/ai-connect/bun
- @vedmalex/ai-connect/local
Notes:
- @vedmalex/ai-connect/browser is the browser-safe bundle.
- @vedmalex/ai-connect defaults to the full Node/Bun-oriented entry. Its package.json exports already resolve the node and bun conditions to the respective builds, so @vedmalex/ai-connect/node and @vedmalex/ai-connect/bun are just explicit aliases of "."; both are built from the same src/index.ts entry (esbuild, node20 target). The node and bun outputs are byte-identical builds; the separate subpaths exist only for callers who want to name the target explicitly. Prefer the default "." import unless you have a specific reason to pin one.
- @vedmalex/ai-connect/local is the focused local runtime entry with ACP support.
Public API
Main exports:
- defineConfig
- createClient
- createBrowserClient
- createLocalClient
- preparePortableFile
- buildImagePromptBundle
- IMAGE_SIZE_PRESETS
- AiConnectError, isAiConnectError, toAiConnectError, mapAbortError
- createConsoleWideEventLogger
- shouldEmitWideEvent
File primitives:
- SUPPORTED_DOCUMENT_MIME_TYPES
- portableFileCategory
- isPortableDocumentFile
- materializePortableFile
- portableFileToBase64
Model-reference primitives (browser-safe):
- MODEL_REFERENCE, lookupModelRef
- resolveModelContextWindow, extractModelContextLength
- detectModelFree, parseModelPricing
- DEFAULT_CONTEXT_WINDOW, normalizeModelKey, modelContextCacheKey
Probe + fan-out primitives:
- classifyProbeOutcome
- PROBE_DEFAULT_CONCURRENCY, PROBE_DEFAULT_TIMEOUT_MS, PROBE_DEFAULT_TTL_MS
- createFanoutLimiter, normalizeFanoutPolicy, mergeFanoutPolicy
Client methods:
- generate(request, opts?)
- stream(request, opts?)
- verify(target?, opts?)
- discoverModels(target?, opts?)
- discoverAcpModels(target?, opts?)
- checkHealth(target?)
- probeModels(target?, opts?)
- probeModelsStream(target?, opts?)
- resolveModelContext(input, options?)
- prepareFile(input)
- listRoutes(filter?)
- listPublicRoutes(filter?)
- listCandidateModels(filter?)
generate() and stream() accept GenerateCallOptions { signal?, pauseSignal?, timeoutMs? }; verify(), discoverModels(), and discoverAcpModels() accept the { signal?, timeoutMs? } subset. See the "Cancellation, Pause, and Timeouts", "Health Checks and Model Probes", and "Client-Safe Projection and Flexible Routing" sections.
discoverAcpModels() opens the configured ACP route, runs the ACP handshake up to session/new, and returns the advertised availableModels and currentModelId per route.
discoverModels() is the unified catalog API for HTTP API, ACP, local server routes, and CLI routes that delegate discovery to an ACP sidecar. Use target.transports when you want only one transport family.
Current discovery support matrix:
- api: supported
- acp: supported
- server: supported
- cli: supported when the route config enables transport.cli.discovery, or when a built-in CLI preset maps discovery to ACP
Built-in CLI discovery defaults:
- claude-cli -> claude-code-acp
- codex-cli -> codex-acp
- openclaude-cli -> no default discovery bridge
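The discovery rules above reduce to a lookup with an explicit override. A sketch only: resolveCliDiscovery and BUILT_IN_CLI_DISCOVERY are hypothetical, and giving transport.cli.discovery precedence over a built-in preset is an assumption.

```typescript
// Hypothetical sketch of resolving a CLI route's ACP discovery bridge.
type AcpTarget = { providerId: string; transportId: string };

// Built-in defaults from the list above; openclaude-cli has no bridge.
const BUILT_IN_CLI_DISCOVERY: Record<string, string | undefined> = {
  "claude-cli": "claude-code-acp",
  "codex-cli": "codex-acp",
  "openclaude-cli": undefined,
};

function resolveCliDiscovery(
  transportId: string,
  configured?: { via: "acp"; acp: AcpTarget },
): string | undefined {
  // Assumption: explicit route config (transport.cli.discovery) wins over presets.
  if (configured) return configured.acp.transportId;
  return BUILT_IN_CLI_DISCOVERY[transportId];
}
```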
CLI discovery through ACP adds ACP-side prerequisites:
- the ACP executable must exist
- the ACP harness must be authenticated if that provider requires auth
- verify() checks route plausibility and handler presence, but it does not perform a live discovery/auth handshake up front
For custom CLI wrappers you can make the public API stay uniform by delegating discovery to an ACP sidecar. For example, a Codex wrapper that delegates to codex-acp:
transport: {
kind: "cli",
id: "my-codex-wrapper",
command: "/opt/bin/codex-wrapper",
cli: {
discovery: {
via: "acp",
acp: {
providerId: "openai",
transportId: "codex-acp",
},
},
},
}
Or a Claude wrapper delegating to claude-code-acp:
transport: {
kind: "cli",
id: "my-claude-wrapper",
command: "/opt/bin/claude-wrapper",
cli: {
discovery: {
via: "acp",
acp: {
providerId: "anthropic",
transportId: "claude-code-acp",
},
},
},
}
ACP routes are treated as harness-owned connections:
- ai-connect does not inject baseUrl
- ai-connect does not inject provider API keys into ACP
- the local ACP tool is responsible for its own auth/session and upstream routing
Tool semantics are intentionally split:
- api routes support tool schema passthrough via parameters.tools
- api routes also support client-managed tools through clientTools
- clientTools are executed locally by ai-connect after the provider returns tool calls
- parameters.tools remains the right path for upstream-managed tool schemas that are not executed by ai-connect
- acp routes support harness-owned tool execution
- acp routes do not currently forward request-defined tool schema from parameters.tools
- cli and current built-in server routes do not support tool schema passthrough or tool execution
That distinction is also reflected in route capabilities:
- supportsToolSchema
- supportsToolExecution
- supportsClientToolExecution
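These flags make it straightforward to filter routes before sending a tool-bearing request. The RouteInfo shape, the sample route ids, and the sample flag values below are assumptions for illustration (api's supportsToolExecution in particular is assumed false); only the three capability names come from the list above.

```typescript
// Hypothetical sketch: selecting routes by capability flag.
type RouteCapabilities = {
  supportsToolSchema: boolean;
  supportsToolExecution: boolean;
  supportsClientToolExecution: boolean;
};
type RouteInfo = { id: string; capabilities: RouteCapabilities };
type CapabilityFlag = keyof RouteCapabilities;

function routesWith(routes: RouteInfo[], flag: CapabilityFlag): string[] {
  return routes.filter((route) => route.capabilities[flag]).map((route) => route.id);
}

// Illustrative routes loosely following the tool semantics described above.
const sample: RouteInfo[] = [
  { id: "openai:api:main:gpt-4.1", capabilities: { supportsToolSchema: true, supportsToolExecution: false, supportsClientToolExecution: true } },
  { id: "acp-route", capabilities: { supportsToolSchema: false, supportsToolExecution: true, supportsClientToolExecution: false } },
  { id: "cli-route", capabilities: { supportsToolSchema: false, supportsToolExecution: false, supportsClientToolExecution: false } },
];
```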
Client-managed tools can be registered on the client and then selected per request:
const client = createBrowserClient(config, {
clientTools: [
{
type: "function",
function: {
name: "lookup_weather",
description: "Return current weather for a city",
parameters: {
type: "object",
properties: {
city: { type: "string" },
},
required: ["city"],
},
},
async execute(args, context) {
return {
data: {
city: String(args.city),
source: "local-cache",
workingDirectory: context.workingDirectory,
},
};
},
},
],
});
const result = await client.generate({
messages: [{ role: "user", content: "Check the weather in Moscow." }],
clientTools: ["lookup_weather"],
});
Current limits:
- clientTools are supported only for generate()
- clientTools are currently supported only for text requests without attachments or image options
- built-in local execution of clientTools is implemented for the built-in API handlers: openai, anthropic, gemini
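Local execution of clientTools follows the usual tool-call loop: call the provider, run any requested tools locally, append their results, and call again until the provider replies with text. A minimal sketch with a fake provider; runToolLoop, the message shapes, and the maxRounds cap are hypothetical, and provider-specific tool-call message formatting is omitted.

```typescript
// Hypothetical sketch of a client-managed tool loop; not ai-connect's handler code.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ProviderTurn = { text?: string; toolCalls?: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string; toolCallId?: string };
type LocalTool = (args: Record<string, unknown>) => Promise<unknown>;

async function runToolLoop(
  callProvider: (messages: Message[]) => Promise<ProviderTurn>,
  tools: Record<string, LocalTool>,
  messages: Message[],
  maxRounds = 4,
): Promise<string> {
  const history = [...messages]; // do not mutate the caller's array
  for (let round = 0; round < maxRounds; round++) {
    const turn = await callProvider(history);
    // No tool calls means the provider produced its final answer.
    if (!turn.toolCalls?.length) return turn.text ?? "";
    for (const call of turn.toolCalls) {
      const tool = tools[call.name];
      if (!tool) throw new Error(`Unknown client tool: ${call.name}`);
      const result = await tool(call.args); // executed locally, never by the provider
      history.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(result) });
    }
  }
  throw new Error("Tool loop exceeded maxRounds");
}
```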
Context and MCP Semantics
ai-connect separates transport routing from harness-owned context loading.
ACP routes:
- default to workspace context mode and default skills mode
- launch from the configured local cwd, or from the current process cwd when no override is provided
- a request-level workingDirectory overrides that cwd for the current inference call
- can therefore pick up project-local context files, rules, and skills that the harness itself knows how to load
- do not automatically inherit MCP servers from the host agent or from the current Codex session
Important ACP boundary:
- ai-connect currently sends mcpServers: [] in ACP session/new
- this means host-agent MCP integrations are not forwarded into the ACP harness automatically
- if an ACP harness needs tools, skills, or MCP-style integrations, they must come from that harness's own configuration/environment
ACP clean mode:
- transport.launch.contextMode: "clean" asks ai-connect to isolate cwd/home/config on a best-effort basis for supported harnesses
- transport.launch.skillsMode: "disabled" asks ai-connect to suppress harness-owned skills/rules where supported
- isolation is strongest for harnesses where ai-connect has provider-specific launch isolation; for the others it is best-effort
CLI routes:
- run as one-shot commands from cli.cwd ?? process.cwd()
- a request-level workingDirectory overrides that cwd for the current inference call
- can therefore use the current project folder as context if the underlying CLI tool inspects cwd
- do not have a first-class workspace vs clean launch mode today
- if you need a clean CLI run, use an isolated cli.cwd, a custom cli.env, or a wrapper command
Server routes:
- use whatever context model the local HTTP server implements
- spawned local server processes use workingDirectory ?? server.cwd ?? process.cwd()
- ai-connect does not define project-context semantics for the server process beyond launch cwd/env overrides
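The cwd precedence for the three local transport families can be written as one lookup. resolveLaunchCwd and its option names are hypothetical; process.cwd() is passed in as processCwd so the sketch stays runtime-agnostic.

```typescript
// Hypothetical helper mirroring the launch-cwd precedence described above.
type LaunchKind = "acp" | "cli" | "server";

type LaunchCwdOptions = {
  workingDirectory?: string; // request-level override
  acpCwd?: string; // configured ACP launch cwd
  cliCwd?: string; // cli.cwd
  serverCwd?: string; // server.cwd
};

function resolveLaunchCwd(kind: LaunchKind, processCwd: string, opts: LaunchCwdOptions): string {
  const configured = { acp: opts.acpCwd, cli: opts.cliCwd, server: opts.serverCwd }[kind];
  // Request-level workingDirectory, then route config, then the process cwd.
  return opts.workingDirectory ?? configured ?? processCwd;
}
```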
ACP usage statistics are exposed on result.usage when the harness provides them. ai-connect currently normalizes:
- OpenCode ACP usage_update (used, size, cost)
Examples
See:
- examples/acp-claude.ts
- examples/acp-codex.ts
- examples/browser-basic.ts
- examples/browser-client-tools.ts
- examples/cancellation-timeout.ts
- examples/clean-context.ts
- examples/fanout-throttle.ts
- examples/health-probe.ts
- examples/image-edit-test.ts
- examples/image-test.ts
- examples/image-workflow.ts
- examples/local-acp.ts
- examples/local-client-tools.ts
- examples/local-test-server.ts
- examples/model-context.ts
- examples/model-selector.ts
- examples/pdf-document-input.ts
- examples/public-routes.ts
- examples/rotation-fallback.ts
- examples/streaming-deltas.ts
- examples/usage-accounting.ts
- examples/wide-event-logging.ts
Example execution notes:
- local examples run verify() first and print missing prerequisites clearly
- ACP and other live network examples only execute the real prompt when AI_CONNECT_RUN_EXAMPLE=1 is set
- browser examples should be run in an actual browser runtime, not from the Bun/Node CLI
Local Test Server Preset
If you are targeting the local gateway at 127.0.0.1:8045, configure direct API routes with transport.baseUrl.
import { createLocalClient, defineConfig } from "@vedmalex/ai-connect";
// Read the local gateway key from the environment — never hardcode a key.
const LOCAL_TEST_API_KEY = process.env.LOCAL_TEST_API_KEY ?? "";
const client = createLocalClient(
defineConfig({
providers: {
openai: {
accounts: [
{
id: "local-openai",
transport: {
kind: "api",
baseUrl: "http://127.0.0.1:8045/v1",
},
models: ["gpt-oss-120b-medium"],
credentials: [{ apiKey: LOCAL_TEST_API_KEY }],
},
],
},
anthropic: {
accounts: [
{
id: "local-anthropic",
transport: {
kind: "api",
baseUrl: "http://127.0.0.1:8045/v1/messages",
},
models: ["claude-sonnet-4-6"],
credentials: [{ apiKey: LOCAL_TEST_API_KEY }],
},
],
},
gemini: {
accounts: [
{
id: "local-gemini",
transport: {
kind: "api",
baseUrl: "http://127.0.0.1:8045/v1beta/models",
},
models: ["gemini-3.1-flash-lite", "gemini-3.1-flash-image"],
credentials: [{ apiKey: LOCAL_TEST_API_KEY }],
},
],
},
},
}),
);
const catalog = await client.discoverModels({
transports: ["api"],
});
console.log(
catalog.routes.flatMap((route) => route.availableModels.map((model) => model.modelId)),
);
Publishing
Full release runbook (OIDC trusted publishing, cutting a release, caveats): docs/publishing.md.
@vedmalex/ai-connect ships as a public scoped npm package. The package metadata enforces the publish boundary:
- publishConfig.access is public, which is required for a scoped name to publish without an explicit --access public flag.
- prepublishOnly runs bun run check && bun run build before any publish.
- bun run check runs bun run typecheck followed by the full test suite, including every Deterministic Simulation Testing (DST) scenario (DST scenarios are plain bun:test cases, so bun test exercises them transitively).
- bun run build compiles the runtime bundles and the type declarations.
Because prepublishOnly gates on the entire check + build pipeline, a publish is only possible when the whole suite is green and the distributable output is freshly built.
Only the built output ships. files is restricted to dist, so src, tests, and workspace tooling are excluded from the published tarball. You can confirm the contents before publishing:
npm pack --dry-run
Both npm publish and bun publish honor the prepublishOnly gate.