# NVIDIA Easy AR TTS – TypeScript/JavaScript Streaming Client
`@fciannella/nvidia-easy-ar-tts-client` is a tiny, zero-dependency helper library that makes it trivial to talk to NVIDIA's Easy AR Text-To-Speech HTTP endpoint and play the audio back while it is still streaming.

The package is delivered as both ESM and CommonJS, ships its own TypeScript declarations (no `@types` package needed) and can be used from the browser or Node.js.
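For example, a CommonJS project can pull in the same client with `require` (ESM `import` syntax is used in the rest of this README):

```js
// CommonJS consumers automatically receive the dist/index.cjs.js build
const { TTSClient } = require("@fciannella/nvidia-easy-ar-tts-client");
```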
## Installation
```bash
# with npm
npm install @fciannella/nvidia-easy-ar-tts-client

# or with yarn
yarn add @fciannella/nvidia-easy-ar-tts-client
```

## Quick start (browser)
```ts
import { TTSClient } from "@fciannella/nvidia-easy-ar-tts-client";

// Create the client once and reuse it for all requests
const tts = new TTSClient({
  apiUrl: "https://riva.nvidia.com/tts", // <-- your Easy AR TTS endpoint
  apiKey: "YOUR_SECRET_TOKEN"            // optional – only if your endpoint is protected
});

await tts.play({
  text: "Hello there – I'm streaming while I speak!", // required
  voice: "English-US.Female-1",                       // required
  emotion: "neutral",                                 // optional
  description: "friendly voice"                       // optional (display name shown in UI)
});
```

Under the hood the library will:
- `POST` the synthesis request and keep the connection open.
- Parse the server-sent events (SSE) coming back from the service.
- Convert each `audio_chunk` to `Float32Array` samples.
- Feed the samples to an `AudioWorklet` so you can listen in real time.
> ❗ **AudioWorklet requirement** – streaming playback relies on the Web Audio API, so the quick start above needs to run in a modern browser (Chrome, Edge, Firefox, Safari, …).
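If you want to observe the stream while it plays, `tts.play` accepts an optional per-chunk callback (documented in the API reference below). A minimal sketch, computing a rough RMS level meter from each chunk as it is played:

```ts
import { TTSClient } from "@fciannella/nvidia-easy-ar-tts-client";

const tts = new TTSClient({ apiUrl: "https://riva.nvidia.com/tts" });

// Log a loudness estimate for every chunk while the audio is playing.
await tts.play(
  { text: "Streaming with a live level meter.", voice: "English-US.Female-1" },
  (chunk) => {
    let sum = 0;
    for (const s of chunk.samples) sum += s * s;
    const rms = Math.sqrt(sum / Math.max(1, chunk.samples.length));
    console.log(`level: ${rms.toFixed(3)}${chunk.isLastChunk ? " (last chunk)" : ""}`);
  }
);
```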
## Usage from Node.js
You can still stream and save the audio in Node.js – you just won't hear it in real time.
```ts
import { TTSClient } from "@fciannella/nvidia-easy-ar-tts-client";
import { writeFileSync } from "node:fs";

const tts = new TTSClient({ apiUrl: "https://riva.nvidia.com/tts" });

const chunks = [];
for await (const chunk of tts.synthesize({
  text: "This file has been assembled in Node.js!",
  voice: "English-US.Male-1",
})) {
  if (chunk.samples.length) chunks.push(chunk);
}

const wavBlob = tts.assembleWav(chunks);
writeFileSync("out.wav", Buffer.from(await wavBlob.arrayBuffer()));
console.log("Saved → out.wav");
```

## API reference
### `new TTSClient(options)`
| option | type | required | description |
|---|---|---|---|
| `apiUrl` | `string` | yes | Base URL of the Easy AR TTS endpoint (e.g. `https://riva.nvidia.com/tts`). Do not include a trailing slash – the library will strip it anyway. |
| `apiKey` | `string` | no | Bearer token that will be sent as `Authorization: Bearer <token>` if provided. |
### `tts.play(synthOptions, onChunk?)` → `Promise<void>`
Streams audio, plays it immediately through an `AudioWorklet`, and resolves when synthesis is complete.
| synthOptions field | type | required | description |
|---|---|---|---|
| `text` | `string` | yes | Text to be spoken. |
| `voice` | `string` | yes | Actor/voice identifier as expected by your service. |
| `emotion` | `string` | no | Optional emotion code accepted by the API. |
| `description` | `string` | no | Free-form description (shown in dashboards, logs, …). |
`onChunk` (optional) is a callback that will be invoked for every `AudioChunk` that is played. An `AudioChunk` looks like this:
```ts
interface AudioChunk {
  samples: Float32Array; // PCM samples in the range –1…+1
  isFirstChunk: boolean;
  isLastChunk: boolean;
}
```
### `tts.synthesize(synthOptions)` → `AsyncIterable<AudioChunk>`
Low‑level method that yields chunks as soon as they arrive from the network. Useful when you need manual control (e.g. saving to disk, visualising a waveform, custom DSP, …).
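For instance, the `isFirstChunk` / `isLastChunk` flags make it easy to bracket the stream; a small sketch that measures time-to-first-audio and total synthesis time:

```ts
const t0 = Date.now();

for await (const chunk of tts.synthesize({
  text: "How long does this take?",
  voice: "English-US.Female-1",
})) {
  if (chunk.isFirstChunk) console.log(`first audio after ${Date.now() - t0} ms`);
  if (chunk.isLastChunk) console.log(`stream complete after ${Date.now() - t0} ms`);
}
```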
### `tts.assembleWav(chunks)` → `Blob`
Utility that concatenates the received `Float32Array`s and returns a WAV file in a `Blob`. In the browser you can generate an object URL with `URL.createObjectURL(blob)`; in Node.js convert it to a `Buffer` as shown above.
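As an illustration, in the browser you could offer the assembled file as a download; everything here besides `assembleWav` is plain DOM API:

```ts
// Assemble the collected chunks into a WAV Blob and trigger a download.
const blob = tts.assembleWav(chunks);
const url = URL.createObjectURL(blob);

const a = document.createElement("a");
a.href = url;
a.download = "speech.wav";
a.click();

URL.revokeObjectURL(url); // release the object URL once the download has started
```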
## Building from source
```bash
git clone https://github.com/fciannella/nvidia-easy-ar-tts-client.git
cd nvidia-easy-ar-tts-client
npm install
npm run build
```

The build step uses `tsup` to generate:
```
dist/
├── index.js      # ESM (imports)
├── index.cjs.js  # CommonJS (requires)
├── index.d.ts    # Types
└── …
```

## License
MIT © 2024 Francesco Ciannella
## Acknowledgements
This library is an independent, open‑source project and is not affiliated with NVIDIA in any way. All trademarks belong to their respective owners.
## Streaming Chat (text + audio)
The repository also ships `AudioChatClient` — a thin wrapper around the OpenAI SDK that hits an NVIDIA Cloud Function endpoint which returns both text and audio chunks in real time.
### Prerequisites
```bash
npm install speaker   # required only in Node.js to play audio

# set three env vars used by the client / CLI script
export NVCF_KEY=<your_ngc_token>
export OPENAI_PROXY_KEY=<inner_openai_key_expected_by_the_service>
export NVCF_CHAT_BASE_URL=<full_invocation_url>   # optional (defaults to sample URL)
```

### 1. Programmatic usage
```ts
import { AudioChatClient } from "@fciannella/nvidia-easy-ar-tts-client";
import Speaker from "speaker";

const speaker = new Speaker({
  channels: 1,
  sampleRate: 44_100,
  bitDepth: 32,
  signed: true,
  float: true,
});

const chat = new AudioChatClient({
  systemPrompt: "You are a helpful assistant.",
  actorName: "Emma World Class",
  emotion: "Narrative",
  ngcKey: process.env.NVCF_KEY!,
  proxyKey: process.env.OPENAI_PROXY_KEY!,
  baseURL: process.env.NVCF_CHAT_BASE_URL, // optional override
});

await chat.chat("Hi there!", {
  onText: chunk => process.stdout.write(chunk),
  onAudio: pcm => speaker.write(Buffer.from(pcm.buffer)),
});
```

### 2. Built-in CLI helper
```bash
npm run chat              # REPL — press Enter on an empty line to quit
npm run chat -- "Hello!"  # one-off single turn
```

The script streams the assistant reply to stdout while simultaneously playing audio through your default output device.
#### Additional CLI flags
| flag | description |
|---|---|
| `--ulaw` | Convert the 44.1 kHz PCM stream returned by NVIDIA into 8 kHz G.711 μ-law. Handy if you need to forward the audio to Twilio. |
| `--ulawFile <path>` | In combination with `--ulaw`, dumps the raw μ-law bytes to the given file for further inspection / integration tests. |
Example:
```bash
# Interactive chat with μ-law output + dump to tmp.raw
yarn chat -- --ulaw --ulawFile tmp.raw
```

### 3. Streaming straight into Twilio (μ-law example)
```ts
import { AudioChatClient } from "@fciannella/nvidia-easy-ar-tts-client";
import WebSocket from "ws"; // npm install ws

// Outbound Media Stream coming from the <Stream> TwiML verb
const twilioSocket = new WebSocket("wss://<your-twilio-stream-url>");

const chat = new AudioChatClient({
  systemPrompt: "You are Chris, the upbeat assistant…",
  actorName: "Emma World Class",
  emotion: "Narrative",
  ngcKey: process.env.NVCF_KEY!,
  proxyKey: process.env.OPENAI_PROXY_KEY!,
  // ↓ ask the client to down-sample + μ-law encode on-the-fly
  outputEncoding: "ulaw",
});

await chat.chat("Hi Twilio!", {
  onAudio: ulaw => twilioSocket.send(ulaw), // Uint8Array (PCMU 8 kHz)
});
```

The helper will take care of:
- Converting NVIDIA's 44.1 kHz PCM16 → Float32.
- Down-sampling to 8 kHz using linear interpolation.
- Encoding to G.711 μ-law (`Uint8Array`).
The resulting bytes can be sent straight to a Twilio Programmable Voice / Media Stream without any extra transcoding.
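For reference, the two signal-processing steps look roughly like the sketch below. This is an illustrative re-implementation, not the library's internal code; the function names (`downsampleLinear`, `floatToMulaw`) are made up for the example:

```ts
// Down-sample Float32 PCM (range -1…+1) from inRate to outRate via linear interpolation.
function downsampleLinear(input: Float32Array, inRate: number, outRate: number): Float32Array {
  const ratio = inRate / outRate;
  const out = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac; // interpolate between neighbours
  }
  return out;
}

// Encode one 16-bit PCM sample as a G.711 μ-law byte (standard segment/mantissa scheme).
function linearToMulaw(sample: number): number {
  const BIAS = 0x84; // 132
  const sign = sample < 0 ? 0x80 : 0x00;
  let s = Math.min(Math.abs(sample), 32635) + BIAS;
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff; // μ-law bytes are stored inverted
}

// Convert a Float32 buffer to μ-law bytes ready for a PCMU 8 kHz media stream.
function floatToMulaw(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    out[i] = linearToMulaw(Math.round(clamped * 32767));
  }
  return out;
}

// Putting it together: 44.1 kHz Float32 in, 8 kHz μ-law out.
const toUlaw = (pcm44k: Float32Array) => floatToMulaw(downsampleLinear(pcm44k, 44_100, 8_000));
```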