1.0.2 • Published 1 month ago

@silyze/async-audio-openai v1.0.2

License
MIT

Async Audio OpenAI

@silyze/async-audio-openai is a full‑duplex wrapper around the OpenAI Realtime API, built on @silyze/async-audio-stream.
It streams µ‑law (G.711) audio to the /v1/realtime WebSocket endpoint, receives delta/response frames, and exposes them through the familiar AudioStream interface, so you can plug it straight into the rest of the @silyze/* async‑stream ecosystem.

  • ✔️ 8 kHz µ‑law audio codec via @silyze/async-audio-format-ulaw
  • ✔️ Real‑time speech transcripts (transcript stream)
  • ✔️ writeMessage() helper for sending text prompts
  • ✔️ Bridging OpenAI Function Calls with @silyze/async-openai-function-stream
  • ✔️ DTMF helper for telephony integrations
  • ✔️ Abort‑signal aware and back‑pressure friendly
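
For readers unfamiliar with the codec named above, the following is a minimal sketch of textbook G.711 µ-law encoding of 16-bit PCM samples. It is not the code of @silyze/async-audio-format-ulaw; the names linearToULaw and encodeULaw are hypothetical and exist only for illustration.

```typescript
// Textbook G.711 µ-law companding (illustrative; not the package's codec).
const ULAW_BIAS = 0x84; // 132, added before the segment search
const ULAW_CLIP = 32635; // largest magnitude that survives the bias

// Encode one signed 16-bit PCM sample into one µ-law byte.
function linearToULaw(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0x00;
  const biased = Math.min(Math.abs(sample), ULAW_CLIP) + ULAW_BIAS;

  // Find the segment (exponent): position of the highest set bit, 7 down to 0.
  let exponent = 7;
  let mask = 0x4000;
  while (exponent > 0 && (biased & mask) === 0) {
    exponent--;
    mask >>= 1;
  }

  // Four bits below the leading bit form the mantissa.
  const mantissa = (biased >> (exponent + 3)) & 0x0f;

  // µ-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Encode a whole buffer of 16-bit PCM samples (hypothetical helper).
function encodeULaw(pcm: Int16Array): Uint8Array {
  const out = new Uint8Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = linearToULaw(pcm[i]);
  }
  return out;
}
```

With this sketch, silence (a sample of 0) encodes to 0xFF, and the extremes 32767 and -32768 encode to 0x80 and 0x00 respectively. Each sample becomes one byte, so 8 kHz audio occupies 8000 bytes per second.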

Install

npm install @silyze/async-audio-openai \
            @silyze/async-openai-function-stream  # ← if you need function calls

Requires Node ≥ 18 and the WS polyfill (peerDependency) when used outside browsers.


Quick Start

import OpenAiStream, {
  OpenAiVoice,
  OpenAiSessionConfig,
} from "@silyze/async-audio-openai";

const key = process.env.OPENAI_API_KEY!;
const voice: OpenAiVoice = "alloy";

const session: OpenAiSessionConfig = {
  voice,
  model: "gpt-4o-audio-preview",
  instructions: "You are a helpful voice assistant.",
};

// 1.  Create the realtime connection & audio stream.
const { stream, socket } = await OpenAiStream.create(key, session, []);

// 2.  Capture microphone audio as 8 kHz µ‑law and pipe it to OpenAI.
//     getMicrophoneStream() is a placeholder for your own audio source.
getMicrophoneStream().pipe(stream);

// 3.  Handle transcript lines.
stream.transcript.forEach((line) => {
  console.log(`[${line.source}] ${line.content}`);
});
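
If your audio source delivers one large µ-law buffer rather than a stream, it can help to split it into small frames before writing. The sketch below assumes 20 ms frames, a common telephony convention; the package itself does not state a required frame size, and splitIntoFrames is a hypothetical helper, not part of its API.

```typescript
// 8 kHz µ-law uses one byte per sample, so 1 ms of audio is 8 bytes.
const ULAW_SAMPLE_RATE_HZ = 8000;

// Split a µ-law buffer into frames of `frameMs` milliseconds (hypothetical helper).
// The final frame may be shorter than the others.
function splitIntoFrames(audio: Uint8Array, frameMs: number = 20): Uint8Array[] {
  const frameBytes = Math.max(1, Math.floor((ULAW_SAMPLE_RATE_HZ * frameMs) / 1000));
  const frames: Uint8Array[] = [];
  for (let offset = 0; offset < audio.length; offset += frameBytes) {
    // subarray() creates a view on the same memory; no copy is made.
    frames.push(audio.subarray(offset, offset + frameBytes));
  }
  return frames;
}
```

Each resulting frame could then be passed to stream.write(), one call per frame, awaiting every write so that back-pressure is respected.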

OpenAI Function Calls

Combine OpenAiStream with @silyze/async-openai-function-stream to transparently proxy function calls:

import {
  createJsonStreamFunctions,
  FunctionStream,
} from "@silyze/async-openai-function-stream";

// Expose local functions to the AI
const { tools, collection } = createJsonStreamFunctions(
  [
    {
      type: "function",
      name: "get_time",
      description: "Return the current ISO date/time.",
      parameters: { type: "object", properties: {}, required: [] },
    },
  ],
  wire.reverse()
);

const wire = new FunctionStream(collection); // JSON ⬅️➡️  OpenAI frames
wire.pipe(stream);

// Tell OpenAI about them:
await stream.update(session, tools);

// Later – handle calls:
wire.forEach((frame) => {
  // frame = { id, name, value }
  if (frame.name === "get_time") {
    wire.write({ ...frame, value: new Date().toISOString() });
  }
});
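
Once more than one function is exposed, the if-chain in the handler above can be replaced by a lookup table. The following is a self-contained sketch of that idea; it only assumes the { id, name, value } frame shape shown in the comment above. The CallFrame type (including id being a string), the handlers table, and answerFrame are hypothetical names, not part of either package.

```typescript
// Assumed frame shape, taken from the "{ id, name, value }" comment above.
interface CallFrame {
  id: string; // assumption: the real id type may differ
  name: string;
  value: unknown;
}

type CallHandler = (args: unknown) => unknown;

// One entry per function exposed to the model.
const handlers: Record<string, CallHandler> = {
  get_time: () => new Date().toISOString(),
};

// Build the reply frame for a call, or undefined if no handler is registered.
function answerFrame(
  frame: CallFrame,
  table: Record<string, CallHandler>
): CallFrame | undefined {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(table, frame.name)) {
    return undefined;
  }
  return { ...frame, value: table[frame.name](frame.value) };
}
```

Inside the forEach callback you would then call answerFrame(frame, handlers) and write the result back to the wire when it is defined.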

Class Reference

class OpenAiStream implements AudioStream

| Property / Method | Type | Description |
| --- | --- | --- |
| format | ULawFormat | Always µ‑law 8 kHz. |
| ready | Promise<void> | Resolves when the session.updated event arrives. |
| transcript | AsyncReadStream<Transcript> | Emits {source, content} for both user & agent transcription lines. |
| write(chunk) | (Buffer) → Promise<void> | Write raw µ‑law audio. |
| writeMessage(role, text) | ( "user" \| "system", string ) → Promise<void> | Send a text message to the conversation. |
| handleFunctionCalls(functionStream) | Promise<void> | Bidirectional pipe of OpenAI function‑call frames. |
| handleDtmf(dtmfStream, signal) | Promise<void> | Converts DTMF digits into system messages. |
| update(config, tools) | Promise<void> | Hot‑update session parameters (voice, temperature, …). |
| static create(key, cfg, tools) | Promise<{stream, socket}> | Opens the WS & performs initial session.update. |
| static from(ws) | Promise<OpenAiStream> | Wrap an existing WebSocket. |
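
To illustrate how ready and writeMessage() fit together, here is a sketch that forwards keypad digits as system messages, loosely mirroring what handleDtmf is described as doing. It is written against a small local interface rather than the real class, so it runs without the package. MessageWriter, describeDigit, forwardDigits, and the message wording are all hypothetical; the package's own handleDtmf may format digits differently.

```typescript
// The subset of OpenAiStream used here, per the table above.
interface MessageWriter {
  ready: Promise<void>;
  writeMessage(role: "user" | "system", text: string): Promise<void>;
}

// Hypothetical wording; the real package may phrase this differently.
function describeDigit(digit: string): string {
  return `The caller pressed ${digit} on the keypad.`;
}

// Wait for the session to be ready, then send one system message per digit.
async function forwardDigits(stream: MessageWriter, digits: string[]): Promise<void> {
  await stream.ready;
  for (const digit of digits) {
    // Awaiting each write keeps the messages in order.
    await stream.writeMessage("system", describeDigit(digit));
  }
}
```

Because the function depends only on the two members it uses, an OpenAiStream instance should satisfy MessageWriter structurally, and a stub can stand in for it in tests.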

Session Config

interface OpenAiSessionConfig {
  voice: OpenAiVoice; // "alloy" | "echo" | …
  instructions: string; // system prompt
  model: string; // e.g. "gpt-4o-audio-preview"
  temperature?: number; // default 0.8
  transcriptionModel?: string; // default "whisper-1"
  functions?: FunctionTool[]; // OpenAI function‑call tools
}
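
The optional fields fall back to the defaults noted in the comments above. As a sketch of what that means for a caller, the helper below fills them in explicitly. SessionConfigSketch is a local stand-in for OpenAiSessionConfig (with functions omitted and voice widened to string), and withDefaults is a hypothetical helper; how the package applies its defaults internally is not documented here.

```typescript
// Local stand-in for OpenAiSessionConfig, without the `functions` field.
interface SessionConfigSketch {
  voice: string;
  instructions: string;
  model: string;
  temperature?: number;
  transcriptionModel?: string;
}

// Return a copy with the documented defaults applied to missing fields.
function withDefaults(cfg: SessionConfigSketch): Required<SessionConfigSketch> {
  return {
    ...cfg,
    // `??` keeps an explicit 0 and only replaces undefined or null.
    temperature: cfg.temperature ?? 0.8,
    transcriptionModel: cfg.transcriptionModel ?? "whisper-1",
  };
}
```

For example, a config that sets only voice, instructions, and model comes back with temperature 0.8 and transcriptionModel "whisper-1", while an explicit temperature of 0 is preserved.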

Related Packages

| Package | Purpose |
| --- | --- |
| @silyze/async-audio-stream | Core duplex audio interfaces. |
| @silyze/async-audio-format-ulaw | µ‑law codec used internally. |
| @silyze/async-openai-function-stream | JSON‑RPC bridge for OpenAI Function Calls. |