@pipecat-ai/openai-realtime-webrtc-transport v0.4.0
OpenAI RealTime WebRTC Transport
A real-time WebRTC transport implementation for interacting with the OpenAI Realtime API, supporting bidirectional audio and unidirectional text communication.
Installation
npm install \
  @pipecat-ai/client-js \
  @pipecat-ai/openai-realtime-webrtc-transport
Overview
The OpenAIRealTimeWebRTCTransport is a fully functional RTVI transport. It connects directly to the OpenAI Realtime API over WebRTC for voice-to-voice conversations, and it handles media device management, audio/video streams, and connection state management.
Features
- Real-time bidirectional communication with OpenAI Realtime API
- Input device management
- Audio streaming support
- Text message support
- Automatic reconnection handling
- Configurable generation parameters
- Support for initial conversation context
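As a rough illustration of the "configurable generation parameters" and "initial conversation context" features, the sketch below builds an options object that seeds the conversation and sets a few session parameters. To stay self-contained it re-declares a small subset of the types listed under Configuration Options. The `{ role, content }` shape used for `LLMContextMessage` and the `makeOptions` helper are assumptions for illustration, not part of the package's API.

```typescript
// Local stand-ins for a subset of the documented types.
// ASSUMPTION: LLMContextMessage is modeled as a simple { role, content } pair.
type LLMContextMessage = { role: "system" | "user" | "assistant"; content: string };

type OpenAISemanticVAD = {
  type: "semantic_vad";
  eagerness?: "low" | "medium" | "high" | "auto";
};

type OpenAISessionConfig = Partial<{
  instructions: string;
  voice: "alloy" | "ash" | "ballad" | "coral" | "echo" | "sage" | "shimmer" | "verse";
  temperature: number;
  turn_detection: OpenAISemanticVAD | null;
}>;

interface OpenAIServiceOptions {
  api_key: string;
  model?: string;
  initial_messages?: LLMContextMessage[];
  settings?: OpenAISessionConfig;
}

// Hypothetical helper: seeds the conversation with one user turn and
// sets instructions, voice, temperature, and semantic turn detection.
function makeOptions(apiKey: string): OpenAIServiceOptions {
  return {
    api_key: apiKey,
    initial_messages: [
      { role: "user", content: "Introduce yourself in one sentence." },
    ],
    settings: {
      instructions: "you are a confused jellyfish",
      voice: "coral",
      temperature: 0.8,
      turn_detection: { type: "semantic_vad", eagerness: "low" },
    },
  };
}
```

An object built this way would be passed to `new OpenAIRealTimeWebRTCTransport(...)` in the same way as in Basic Setup.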
Usage
Basic Setup
import { RTVIClient, RTVIClientOptions } from '@pipecat-ai/client-js';
import { OpenAIRealTimeWebRTCTransport, OpenAIServiceOptions } from '@pipecat-ai/openai-realtime-webrtc-transport';

// Session settings go under `settings` (see OpenAIServiceOptions).
const options: OpenAIServiceOptions = {
  api_key: 'YOUR_API_KEY',
  settings: {
    instructions: 'you are a confused jellyfish',
  },
};

const transport = new OpenAIRealTimeWebRTCTransport(options);

const rtviConfig: RTVIClientOptions = {
  transport,
  // ...
};
const rtviClient = new RTVIClient(rtviConfig);
Configuration Options
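The comments on `OpenAIServerVad` in the type listing that follows record the default used for each omitted field. As a sketch (not part of the package), the hypothetical `resolveServerVad` helper below makes those defaults explicit, which can be handy when logging the effective turn-detection settings. The default values are copied from those comments; the helper and the `SERVER_VAD_DEFAULTS` constant are assumptions.

```typescript
type OpenAIServerVad = {
  type: "server_vad";
  create_response?: boolean;
  interrupt_response?: boolean;
  prefix_padding_ms?: number;
  silence_duration_ms?: number;
  threshold?: number;
};

// Defaults as documented in the OpenAIServerVad type comments.
const SERVER_VAD_DEFAULTS = {
  create_response: true,
  interrupt_response: true,
  prefix_padding_ms: 300,
  silence_duration_ms: 500,
  threshold: 0.5,
};

// Hypothetical helper: fills omitted fields with the documented defaults and
// rejects a threshold outside the documented open interval (0.0, 1.0).
function resolveServerVad(vad: OpenAIServerVad): Required<OpenAIServerVad> {
  const resolved = { ...SERVER_VAD_DEFAULTS, ...vad };
  if (!(resolved.threshold > 0 && resolved.threshold < 1)) {
    throw new RangeError(`threshold must be in (0.0, 1.0), got ${resolved.threshold}`);
  }
  return resolved;
}

const vad = resolveServerVad({ type: "server_vad", silence_duration_ms: 800 });
console.log(vad.prefix_padding_ms, vad.silence_duration_ms); // 300 800
```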
/**********************************
* OpenAI-specific types
* types and comments below are based on:
* gpt-4o-realtime-preview-2024-12-17
**********************************/
type JSONSchema = { [key: string]: any };

export type OpenAIFunctionTool = {
  type: "function";
  name: string;
  description: string;
  parameters: JSONSchema;
};

export type OpenAIServerVad = {
  type: "server_vad";
  create_response?: boolean; // defaults to true
  interrupt_response?: boolean; // defaults to true
  prefix_padding_ms?: number; // defaults to 300ms
  silence_duration_ms?: number; // defaults to 500ms
  threshold?: number; // range (0.0, 1.0); defaults to 0.5
};

export type OpenAISemanticVAD = {
  type: "semantic_vad";
  eagerness?: "low" | "medium" | "high" | "auto"; // defaults to "auto", equivalent to "medium"
  create_response?: boolean; // defaults to true
  interrupt_response?: boolean; // defaults to true
};

export type OpenAISessionConfig = Partial<{
  modalities?: string;
  instructions?: string;
  voice?:
    | "alloy"
    | "ash"
    | "ballad"
    | "coral"
    | "echo"
    | "sage"
    | "shimmer"
    | "verse";
  input_audio_noise_reduction?: {
    type: "near_field" | "far_field";
  } | null; // defaults to null/off
  input_audio_transcription?: {
    model: "whisper-1" | "gpt-4o-transcribe" | "gpt-4o-mini-transcribe";
    language?: string;
    prompt?: string[] | string; // gpt-4o models take a string
  } | null; // we default this to gpt-4o-transcribe
  turn_detection?: OpenAIServerVad | OpenAISemanticVAD | null; // defaults to server_vad
  temperature?: number;
  max_tokens?: number | "inf";
  tools?: Array<OpenAIFunctionTool>;
}>;

export interface OpenAIServiceOptions {
  api_key: string;
  model?: string;
  initial_messages?: LLMContextMessage[];
  settings?: OpenAISessionConfig;
}
Sending Messages
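import { LLMHelper } from '@pipecat-ai/client-js';

// At setup time, register an LLM helper with the RTVI client
// (rtviClient is the RTVIClient created with this transport).
const llmHelper = new LLMHelper({});
rtviClient.registerHelper("llm", llmHelper);
// The "llm" service name passed above is not used by this transport;
// it only matters when working with a Pipecat pipeline.

// At the time of sending a message, append a user text prompt:
llmHelper.appendToMessages({ role: "user", content: 'Hello OpenAI!' });
Handling Events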
The transport implements the various RTVI event handlers. Check out the docs or samples for more info.
Updating Session Configuration
transport.updateSessionConfig({
  instructions: 'you are an over-sharing neighbor',
  input_audio_noise_reduction: {
    type: 'near_field',
  },
});
API Reference
Methods
- initialize(): Set up the transport and establish the connection
- sendMessage(message): Send a text message
- handleUserAudioStream(data): Stream audio data to the model
- disconnectLLM(): Close the connection
- sendReadyMessage(): Signal the ready state
States
The transport can be in one of the following states:
- "disconnected"
- "initializing"
- "initialized"
- "connecting"
- "connected"
- "ready"
- "disconnecting"
- "error"
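Application code that reacts to these states may find it convenient to model them as a TypeScript union. The sketch below is illustrative only: the `TransportState` alias, the `STARTUP_ORDER` list (which assumes the usual startup sequence follows the order in which the states are listed above), and the `startupProgress` and `isConnectedOrReady` helpers are hypothetical and are not exported by the package.

```typescript
// Hypothetical union of the documented state strings.
type TransportState =
  | "disconnected"
  | "initializing"
  | "initialized"
  | "connecting"
  | "connected"
  | "ready"
  | "disconnecting"
  | "error";

// ASSUMPTION: a normal startup walks through the states in listed order.
const STARTUP_ORDER: readonly TransportState[] = [
  "disconnected",
  "initializing",
  "initialized",
  "connecting",
  "connected",
  "ready",
];

// Position of a state in the assumed startup sequence, or -1 for states
// outside it ("disconnecting", "error").
function startupProgress(state: TransportState): number {
  return STARTUP_ORDER.indexOf(state);
}

// True for "connected" and "ready" under the assumed ordering.
function isConnectedOrReady(state: TransportState): boolean {
  return startupProgress(state) >= startupProgress("connected");
}

console.log(isConnectedOrReady("connecting")); // false
```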
Error Handling
The transport includes comprehensive error handling for:
- Connection failures
- WebRTC connection errors
- API key validation
- Message transmission errors
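Automatic reconnection is listed as a feature, but its retry policy is not described here. If an application wants its own policy for initial connection failures, a generic retry-with-exponential-backoff wrapper is one option. This sketch is not the transport's internal mechanism: `withRetry`, `RetryOptions`, and the `connect` callback are hypothetical names, and `connect` stands in for whatever call starts the connection in your app.

```typescript
// Hypothetical options for the retry wrapper.
interface RetryOptions {
  maxAttempts: number; // total attempts including the first; should be >= 1
  baseDelayMs: number; // wait before the 2nd attempt; doubles after each failure
  maxDelayMs: number;  // upper bound on any single wait
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `connect` until it resolves or attempts run out, waiting
// baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ... (capped at maxDelayMs)
// between attempts. Rethrows the last error if every attempt fails.
async function withRetry<T>(connect: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt === opts.maxAttempts) break;
      const delay = Math.min(opts.baseDelayMs * 2 ** (attempt - 1), opts.maxDelayMs);
      await sleep(delay);
    }
  }
  throw lastError;
}
```

In an application, `connect` might be `() => rtviClient.connect()` for an RTVIClient built with this transport. Capping the delay keeps a long outage from producing multi-minute waits between attempts.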
License
BSD 2-Clause