0.4.0 • Published 5 months ago

@pipecat-ai/openai-realtime-webrtc-transport v0.4.0

OpenAI RealTime WebRTC Transport

A real-time WebRTC transport implementation for interacting with OpenAI's Realtime API, supporting bidirectional audio and unidirectional text communication.

Installation

npm install \
@pipecat-ai/client-js \
@pipecat-ai/openai-realtime-webrtc-transport

Overview

The OpenAIRealTimeWebRTCTransport is a fully functional RTVI Transport. It connects directly to the OpenAI Realtime API's voice-to-voice service over WebRTC, and it handles media device management, audio/video streams, and connection state.

Features

  • Real-time bidirectional communication with OpenAI Realtime API
  • Input device management
  • Audio streaming support
  • Text message support
  • Automatic reconnection handling
  • Configurable generation parameters
  • Support for initial conversation context

Usage

Basic Setup

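import { RTVIClientOptions } from '@pipecat-ai/client-js';
import { OpenAIRealTimeWebRTCTransport, OpenAIServiceOptions } from '@pipecat-ai/openai-realtime-webrtc-transport';

const options: OpenAIServiceOptions = {
  api_key: 'YOUR_API_KEY',
  // `settings` is the session config field declared on OpenAIServiceOptions
  // (see Configuration Options)
  settings: {
    instructions: 'you are a confused jellyfish',
  },
};

// Pass the transport to the RTVI client through its options
const transport = new OpenAIRealTimeWebRTCTransport(options);
const RTVIConfig: RTVIClientOptions = {
  transport,
  ...
};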

Configuration Options

/**********************************
 * OpenAI-specific types
 *   types and comments below are based on:
 *     gpt-4o-realtime-preview-2024-12-17
 **********************************/
type JSONSchema = { [key: string]: any };
export type OpenAIFunctionTool = {
  type: "function";
  name: string;
  description: string;
  parameters: JSONSchema;
};

export type OpenAIServerVad = {
  type: "server_vad";
  create_response?: boolean; // defaults to true
  interrupt_response?: boolean; // defaults to true
  prefix_padding_ms?: number; // defaults to 300ms
  silence_duration_ms?: number; // defaults to 500ms
  threshold?: number; // range (0.0, 1.0); defaults to 0.5
};

export type OpenAISemanticVAD = {
  type: "semantic_vad";
  eagerness?: "low" | "medium" | "high" | "auto"; // defaults to "auto", equivalent to "medium"
  create_response?: boolean; // defaults to true
  interrupt_response?: boolean; // defaults to true
};

export type OpenAISessionConfig = Partial<{
  modalities?: string;
  instructions?: string;
  voice?:
    | "alloy"
    | "ash"
    | "ballad"
    | "coral"
    | "echo"
    | "sage"
    | "shimmer"
    | "verse";
  input_audio_noise_reduction?: {
    type: "near_field" | "far_field";
  } | null; // defaults to null/off
  input_audio_transcription?: {
    model: "whisper-1" | "gpt-4o-transcribe" | "gpt-4o-mini-transcribe";
    language?: string;
    prompt?: string[] | string; // gpt-4o models take a string
  } | null; // we default this to gpt-4o-transcribe
  turn_detection?: OpenAIServerVad | OpenAISemanticVAD | null; // defaults to server_vad
  temperature?: number;
  max_tokens?: number | "inf";
  tools?: Array<OpenAIFunctionTool>;
}>;

export interface OpenAIServiceOptions {
  api_key: string;
  model?: string;
  initial_messages?: LLMContextMessage[];
  settings?: OpenAISessionConfig;
}
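
As an illustration of how the options above fit together, here is a minimal sketch of a session config that turns on semantic VAD and registers one function tool. The types are trimmed local copies of the ones listed above so the snippet compiles on its own; in an application you would import them from the package instead. The `get_weather` tool, its schema, and the `SessionConfigSketch` name are hypothetical, chosen only for illustration.

```typescript
// Trimmed local copies of the types listed above (assumption: kept local so the
// sketch is self-contained; import the real types from the package in practice).
type OpenAIFunctionTool = {
  type: "function";
  name: string;
  description: string;
  parameters: { [key: string]: unknown };
};

type OpenAISemanticVAD = {
  type: "semantic_vad";
  eagerness?: "low" | "medium" | "high" | "auto";
  create_response?: boolean;
  interrupt_response?: boolean;
};

// Hypothetical subset of OpenAISessionConfig, limited to the fields used below.
type SessionConfigSketch = {
  instructions?: string;
  voice?: "alloy" | "ash" | "ballad" | "coral" | "echo" | "sage" | "shimmer" | "verse";
  turn_detection?: OpenAISemanticVAD | null;
  temperature?: number;
  tools?: OpenAIFunctionTool[];
};

// Hypothetical tool: the name, description, and schema are invented for this example.
const getWeatherTool: OpenAIFunctionTool = {
  type: "function",
  name: "get_weather",
  description: "Look up the current weather for a city.",
  parameters: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
};

const settings: SessionConfigSketch = {
  instructions: "you are a confused jellyfish",
  voice: "coral",
  // Semantic VAD with low eagerness, instead of the default server_vad
  turn_detection: { type: "semantic_vad", eagerness: "low" },
  temperature: 0.8,
  tools: [getWeatherTool],
};

// Print the config that would be passed as `settings` in OpenAIServiceOptions
console.log(JSON.stringify(settings, null, 2));
```

An object shaped like `settings` would be passed as the `settings` field of `OpenAIServiceOptions`, next to `api_key`.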

Sending Messages

import { LLMHelper } from '@pipecat-ai/client-js';

// At setup time (assumes an existing RTVIClient instance named rtviClient)...
const llmHelper = new LLMHelper({});
rtviClient.registerHelper("llm", llmHelper);
// The 'llm' name in the call above isn't used by this transport.
// That value is specific to working with a Pipecat pipeline.

// At the time of sending a message...
// Send a text prompt message
llmHelper.appendToMessages({ role: "user", content: 'Hello OpenAI!' });

Handling Events

The transport implements the various RTVI event handlers. Check out the docs or samples for more info.

Updating Session Configuration

transport.updateSessionConfig({
  instructions: 'you are an over-sharing neighbor',
  input_audio_noise_reduction: {
    type: 'near_field'
  }
});

API Reference

Methods

  • initialize(): Set up the transport and establish connection
  • sendMessage(message): Send a text message
  • handleUserAudioStream(data): Stream audio data to the model
  • disconnectLLM(): Close the connection
  • sendReadyMessage(): Signal ready state

States

The transport can be in one of the following states:

  • "disconnected"
  • "initializing"
  • "initialized"
  • "connecting"
  • "connected"
  • "ready"
  • "disconnecting"
  • "error"
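
The state names above can be captured as a string-literal union so that application code is type-checked against them. This is a minimal sketch: `TRANSPORT_STATES`, `TransportState`, `isTransportState`, and `canSendInput` are hypothetical names defined here, not exports of the package, and treating only "ready" as the state in which input should be sent is an assumption about application policy.

```typescript
// The eight state names listed above, as a readonly tuple.
const TRANSPORT_STATES = [
  "disconnected",
  "initializing",
  "initialized",
  "connecting",
  "connected",
  "ready",
  "disconnecting",
  "error",
] as const;

// Union type derived from the tuple: "disconnected" | "initializing" | ...
type TransportState = (typeof TRANSPORT_STATES)[number];

// Type guard for state strings arriving from outside the type system (e.g. logs).
function isTransportState(value: string): value is TransportState {
  return (TRANSPORT_STATES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical UI policy (assumption): only allow user input once the transport is "ready".
function canSendInput(state: TransportState): boolean {
  return state === "ready";
}

console.log(isTransportState("ready"));   // true
console.log(isTransportState("paused"));  // false
console.log(canSendInput("connecting"));  // false
```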

Error Handling

The transport includes comprehensive error handling for:

  • Connection failures
  • WebRTC connection errors
  • API key validation
  • Message transmission errors
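
This README does not show how the transport validates the API key internally, so an application may want its own check before constructing the transport. The sketch below is one possible application-side guard, not the transport's implementation: `assertApiKey` and `ServiceOptionsSketch` are hypothetical names, and rejecting the `YOUR_API_KEY` placeholder is an assumption based on the Basic Setup example.

```typescript
// Hypothetical local type: only the fields this check needs from OpenAIServiceOptions.
type ServiceOptionsSketch = { api_key: string; model?: string };

// Hypothetical application-side guard: fail fast on an empty or placeholder key
// before any connection attempt is made.
function assertApiKey(options: ServiceOptionsSketch): void {
  const key = options.api_key.trim();
  if (key.length === 0 || key === "YOUR_API_KEY") {
    throw new Error("OpenAI API key is missing or still set to the placeholder value");
  }
}

try {
  assertApiKey({ api_key: "YOUR_API_KEY" });
} catch (err) {
  // Surface the problem to the user instead of attempting to connect
  console.error(err instanceof Error ? err.message : String(err));
}
```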

License

BSD-2-Clause
