@fciannella/nvidia-asr-client v0.1.9 (MIT)

NVIDIA ASR Client

Minimal cross-platform wrapper around the NVIDIA Riva streaming ASR WebSocket API, with optional client-side silence detection.

Features

  • Runs in Node.js and in browsers (Node.js needs the optional ws dependency; see below)
  • Built-in audio resampling
  • Support for multiple input formats (Float32, 16-bit PCM, G.711 μ-law)
  • Client-side silence detection to determine when utterances are complete
  • Minimal footprint, with no external dependencies in the browser

Installation

npm install @fciannella/nvidia-asr-client

Usage (Browser)

Modern ES Modules Approach

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <!-- Prevent browser caching during development -->
  <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
  <meta http-equiv="Pragma" content="no-cache">
  <meta http-equiv="Expires" content="0">
</head>
<body>
  <div id="transcript"></div>
  <button id="startBtn">Start</button>
  <button id="stopBtn" disabled>Stop</button>

  <script type="module">
    // Dynamic import with cache-busting during development
    const moduleUrl = './node_modules/@fciannella/nvidia-asr-client/dist/index.js?' + Date.now();
    const { NvidiaAsrClient } = await import(moduleUrl);
    
    let asr = null;
    let stopFn = null;
    
    async function startASR() {
      // Setup ASR client
      asr = new NvidiaAsrClient({
        websocketUrl: 'wss://your-riva-endpoint/v1/speech_recognition/streaming_multi',
        languageCode: 'en-US', // Change to 'it-IT', 'es-ES', etc. if supported by server
        silenceTimeout: 1.5,
        closeOnSilence: false,
      });
      
      asr.on('partial', (e) => {
        document.getElementById('transcript').textContent = e.text;
      });
      
      asr.on('final', (e) => {
        document.getElementById('transcript').textContent = e.text;
      });
      
      // Connect and setup WebAudio
      await asr.connect();
      const audioContext = new (window.AudioContext || window.webkitAudioContext)();
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const source = audioContext.createMediaStreamSource(stream);
      // Note: ScriptProcessorNode is deprecated in favor of AudioWorklet, but it
      // remains the simplest way to tap raw samples and still works in current browsers
      const processor = audioContext.createScriptProcessor(4096, 1, 1);
      
      processor.onaudioprocess = (e) => {
        const float32Data = new Float32Array(e.inputBuffer.getChannelData(0));
        asr.write(float32Data, audioContext.sampleRate);
      };
      
      source.connect(processor);
      processor.connect(audioContext.destination);
      
      // Return cleanup function
      return () => {
        processor.disconnect();
        source.disconnect();
        stream.getTracks().forEach(track => track.stop());
        asr.finish();
        setTimeout(() => asr.end(), 1500);
      };
    }
    
    document.getElementById('startBtn').addEventListener('click', async () => {
      document.getElementById('startBtn').disabled = true;
      document.getElementById('stopBtn').disabled = false;
      stopFn = await startASR();
    });
    
    document.getElementById('stopBtn').addEventListener('click', () => {
      if (stopFn) {
        stopFn();
        stopFn = null;
        document.getElementById('startBtn').disabled = false;
        document.getElementById('stopBtn').disabled = true;
      }
    });
  </script>
</body>
</html>

A complete example is available in examples/browser-example.html.

Notes on Browser Usage

  1. WebSocket Endpoint: Ensure your Riva server allows cross-origin requests from your web application.
  2. Caching: During development, use cache-busting techniques as shown in the example.
  3. Language Selection: The server must support the language code you specify. Not all deployments support all languages.
  4. Audio Context: Modern browsers require a user gesture (like a button click) before allowing audio capture.

Usage (Node.js)

For Node.js usage, you'll need to install the optional dependencies:

npm install ws mic

import { NvidiaAsrClient } from '@fciannella/nvidia-asr-client';
import mic from 'mic';

const SAMPLE_RATE = 16000;

const asr = new NvidiaAsrClient({
  websocketUrl: 'wss://your-riva-endpoint/v1/speech_recognition/streaming_multi',
  languageCode: 'en-US',
  silenceTimeout: 1.5,
  closeOnSilence: false,
});

asr.on('partial', (e) => {
  process.stdout.write(`\r[${e.serverFinal ? 'FINAL' : 'PARTIAL'}] ${e.text}        `);
});

asr.on('final', (e) => {
  console.log(`\n[USER_FINAL] ${e.text}`);
});

asr.on('silence', () => {
  console.log('\n--- silence detected ---');
});

asr.on('error', (err) => console.error('ASR error', err));

(async () => {
  await asr.connect();

  const micInstance = mic({
    rate: String(SAMPLE_RATE),
    channels: '1',
    encoding: 'signed-integer',
    bitwidth: 16,
    endian: 'little',
    fileType: 'raw',
  });

  const stream = micInstance.getAudioStream();
  stream.on('data', (buf) => {
    // convert Int16 PCM -> Float32 [-1,1]
    const int16 = new Int16Array(buf.buffer, buf.byteOffset, buf.byteLength / 2);
    const float32 = new Float32Array(int16.length);
    for (let i = 0; i < int16.length; i++) float32[i] = int16[i] / 0x8000;
    asr.write(float32, SAMPLE_RATE);
  });

  micInstance.start();

  process.on('SIGINT', () => {
    micInstance.stop();
    asr.finish();
    setTimeout(() => process.exit(0), 1500);
  });
})();

API

Constructor

new NvidiaAsrClient(options: NvidiaAsrOptions)

Options

interface NvidiaAsrOptions {
  websocketUrl?: string;            // Your Riva endpoint URL (required in practice)
  languageCode?: string;            // Default: 'en-US'
  silenceTimeout?: number;          // Seconds of inactivity before finalizing
  closeOnSilence?: boolean;         // Default: true
  inputFormat?: 'f32' | 'pcm_s16' | 'g711_ulaw'; // Default: 'f32'
  inputSampleRate?: number;         // Default: 16000
  targetSampleRate?: number;        // Default: 16000
}
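
For telephony audio, inputFormat: 'g711_ulaw' lets you pass 8-bit μ-law bytes directly. For reference, the sketch below shows roughly what G.711 μ-law expansion to Float32 involves, following the standard companding tables; this is an illustrative standalone decoder, not the library's internal code:

```javascript
// Illustrative G.711 μ-law decoder: expands 8-bit μ-law bytes to Float32 in [-1, 1).
function ulawToFloat32(u8) {
  const out = new Float32Array(u8.length);
  for (let i = 0; i < u8.length; i++) {
    const u = ~u8[i] & 0xff;          // μ-law bytes are transmitted bit-inverted
    const sign = u & 0x80;
    const exponent = (u >> 4) & 0x07;
    const mantissa = u & 0x0f;
    // Reconstruct the linear magnitude (bias of 0x84 = 132, per G.711)
    const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    const pcm = sign ? -magnitude : magnitude;
    out[i] = pcm / 32768;             // scale the 16-bit range to [-1, 1)
  }
  return out;
}
```

With the g711_ulaw input format, the client performs an expansion like this for you before resampling and streaming.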

Methods

  • connect(): Promise - Opens the WebSocket and sends the configuration packet
  • write(chunk, sampleRate?): void - Sends audio data to the ASR service
  • finish(): void - Signals end-of-audio but keeps the socket open so remaining results can arrive
  • end(): void - Flushes the EOS marker and closes the WebSocket immediately
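
When the sampleRate passed to write() differs from targetSampleRate, the client resamples before sending. Linear interpolation, sketched below, is one common way to do this; the library's built-in resampler may use a different algorithm, so treat this purely as an illustration of the concept:

```javascript
// Illustrative linear-interpolation resampler for Float32 audio
// (not the library's internal implementation).
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return input;
  const outLength = Math.round(input.length * toRate / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples consumed per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac; // blend the two neighbors
  }
  return out;
}
```

This is why the browser example can pass audioContext.sampleRate (often 44100 or 48000) straight into write(): the mismatch with the 16000 Hz target is handled inside the client.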

Events

  • partial: { text: string, serverFinal: boolean }
  • final: { text: string }
  • silence: Emitted when silence is detected
  • error: Error event

Troubleshooting

Language Support

If specifying a non-English language code (e.g., 'it-IT', 'es-ES') doesn't result in transcription in that language, the issue is likely on the server side:

  1. The server may not have that language model loaded
  2. The server may be configured to ignore client language settings
  3. The specific language may not be supported by your Riva deployment

Contact your Riva server administrator to confirm which languages are available.

Browser Caching

When developing or updating the client, use cache-busting techniques:

  1. Add a timestamp query parameter to dynamic imports: import(`...?v=${Date.now()}`)
  2. Use cache control meta tags in your HTML
  3. Run your development server with cache disabled (e.g., http-server -c-1)
  4. Use browser developer tools to clear cache and perform hard reloads

License

MIT
