@fciannella/nvidia-asr-client v0.1.9 (MIT)

NVIDIA ASR Client

Minimal cross-platform wrapper around the NVIDIA Riva streaming ASR WebSocket API, with optional client-side silence detection.

Features

  • Runs in Node.js and in browsers (Node.js needs the optional ws dependency; see below)
  • Built-in audio resampling
  • Support for multiple input formats (Float32, 16-bit PCM, G.711 μ-law)
  • Client-side silence detection to determine when utterances are complete
  • Minimal footprint, with no external dependencies in the browser

Installation

npm install @fciannella/nvidia-asr-client

Usage (Browser)

Modern ES Modules Approach

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <!-- Prevent browser caching during development -->
  <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
  <meta http-equiv="Pragma" content="no-cache">
  <meta http-equiv="Expires" content="0">
</head>
<body>
  <div id="transcript"></div>
  <button id="startBtn">Start</button>
  <button id="stopBtn" disabled>Stop</button>

  <script type="module">
    // Dynamic import with cache-busting during development
    const moduleUrl = './node_modules/@fciannella/nvidia-asr-client/dist/index.js?' + Date.now();
    const { NvidiaAsrClient } = await import(moduleUrl);
    
    let asr = null;
    let stopFn = null;
    
    async function startASR() {
      // Setup ASR client
      asr = new NvidiaAsrClient({
        websocketUrl: 'wss://your-riva-endpoint/v1/speech_recognition/streaming_multi',
        languageCode: 'en-US', // Change to 'it-IT', 'es-ES', etc. if supported by server
        silenceTimeout: 1.5,
        closeOnSilence: false,
      });
      
      asr.on('partial', (e) => {
        document.getElementById('transcript').textContent = e.text;
      });
      
      asr.on('final', (e) => {
        document.getElementById('transcript').textContent = e.text;
      });
      
      // Connect and setup WebAudio
      await asr.connect();
      const audioContext = new (window.AudioContext || window.webkitAudioContext)();
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const source = audioContext.createMediaStreamSource(stream);
      // Note: ScriptProcessorNode is deprecated in favor of AudioWorklet, but it
      // remains the simplest way to tap raw samples and still works in current browsers
      const processor = audioContext.createScriptProcessor(4096, 1, 1);
      
      processor.onaudioprocess = (e) => {
        const float32Data = new Float32Array(e.inputBuffer.getChannelData(0));
        asr.write(float32Data, audioContext.sampleRate);
      };
      
      source.connect(processor);
      processor.connect(audioContext.destination);
      
      // Return cleanup function
      return () => {
        processor.disconnect();
        source.disconnect();
        stream.getTracks().forEach(track => track.stop());
        asr.finish();
        setTimeout(() => asr.end(), 1500);
      };
    }
    
    document.getElementById('startBtn').addEventListener('click', async () => {
      document.getElementById('startBtn').disabled = true;
      document.getElementById('stopBtn').disabled = false;
      stopFn = await startASR();
    });
    
    document.getElementById('stopBtn').addEventListener('click', () => {
      if (stopFn) {
        stopFn();
        stopFn = null;
        document.getElementById('startBtn').disabled = false;
        document.getElementById('stopBtn').disabled = true;
      }
    });
  </script>
</body>
</html>

A complete example is available in examples/browser-example.html.

Notes on Browser Usage

  1. WebSocket Endpoint: Ensure your Riva server allows cross-origin requests from your web application.
  2. Caching: During development, use cache-busting techniques as shown in the example.
  3. Language Selection: The server must support the language code you specify. Not all deployments support all languages.
  4. Audio Context: Modern browsers require a user gesture (like a button click) before allowing audio capture.

Usage (Node.js)

For Node.js usage, you'll need to install the optional dependencies:

npm install ws mic

import { NvidiaAsrClient } from '@fciannella/nvidia-asr-client';
import mic from 'mic';

const SAMPLE_RATE = 16000;

const asr = new NvidiaAsrClient({
  websocketUrl: 'wss://your-riva-endpoint/v1/speech_recognition/streaming_multi',
  languageCode: 'en-US',
  silenceTimeout: 1.5,
  closeOnSilence: false,
});

asr.on('partial', (e) => {
  process.stdout.write(`\r[${e.serverFinal ? 'FINAL' : 'PARTIAL'}] ${e.text}        `);
});

asr.on('final', (e) => {
  console.log(`\n[USER_FINAL] ${e.text}`);
});

asr.on('silence', () => {
  console.log('\n--- silence detected ---');
});

asr.on('error', (err) => console.error('ASR error', err));

(async () => {
  await asr.connect();

  const micInstance = mic({
    rate: String(SAMPLE_RATE),
    channels: '1',
    encoding: 'signed-integer',
    bitwidth: 16,
    endian: 'little',
    fileType: 'raw',
  });

  const stream = micInstance.getAudioStream();
  stream.on('data', (buf) => {
    // convert Int16 PCM -> Float32 [-1,1]
    const int16 = new Int16Array(buf.buffer, buf.byteOffset, buf.byteLength / 2);
    const float32 = new Float32Array(int16.length);
    for (let i = 0; i < int16.length; i++) float32[i] = int16[i] / 0x8000;
    asr.write(float32, SAMPLE_RATE);
  });

  micInstance.start();

  process.on('SIGINT', () => {
    micInstance.stop();
    asr.finish();
    setTimeout(() => process.exit(0), 1500);
  });
})();

API

Constructor

new NvidiaAsrClient(options: NvidiaAsrOptions)

Options

interface NvidiaAsrOptions {
  websocketUrl?: string;            // Your Riva endpoint URL (required in practice)
  languageCode?: string;            // Default: 'en-US'
  silenceTimeout?: number;          // Seconds of inactivity before finalizing
  closeOnSilence?: boolean;         // Default: true
  inputFormat?: 'f32' | 'pcm_s16' | 'g711_ulaw'; // Default: 'f32'
  inputSampleRate?: number;         // Default: 16000
  targetSampleRate?: number;        // Default: 16000
}
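
For telephony audio, inputFormat: 'g711_ulaw' lets you pass 8-bit μ-law bytes directly. For reference, the sketch below shows roughly what G.711 μ-law expansion to Float32 involves, following the standard companding tables; this is an illustrative standalone decoder, not the library's internal code:

```javascript
// Illustrative G.711 μ-law decoder: expands 8-bit μ-law bytes to Float32 in [-1, 1).
function ulawToFloat32(u8) {
  const out = new Float32Array(u8.length);
  for (let i = 0; i < u8.length; i++) {
    const u = ~u8[i] & 0xff;          // μ-law bytes are transmitted bit-inverted
    const sign = u & 0x80;
    const exponent = (u >> 4) & 0x07;
    const mantissa = u & 0x0f;
    // Reconstruct the linear magnitude (bias of 0x84 = 132, per G.711)
    const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    const pcm = sign ? -magnitude : magnitude;
    out[i] = pcm / 32768;             // scale the 16-bit range to [-1, 1)
  }
  return out;
}
```

With the g711_ulaw input format, the client performs an expansion like this for you before resampling and streaming.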

Methods

  • connect(): Promise - Opens the WebSocket and sends the configuration packet
  • write(chunk, sampleRate?): void - Sends audio data to the ASR service
  • finish(): void - Signals end-of-audio but keeps the socket open so remaining results can arrive
  • end(): void - Flushes the EOS marker and closes the WebSocket immediately
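
When the sampleRate passed to write() differs from targetSampleRate, the client resamples before sending. Linear interpolation, sketched below, is one common way to do this; the library's built-in resampler may use a different algorithm, so treat this purely as an illustration of the concept:

```javascript
// Illustrative linear-interpolation resampler for Float32 audio
// (not the library's internal implementation).
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return input;
  const outLength = Math.round(input.length * toRate / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples consumed per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac; // blend the two neighbors
  }
  return out;
}
```

This is why the browser example can pass audioContext.sampleRate (often 44100 or 48000) straight into write(): the mismatch with the 16000 Hz target is handled inside the client.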

Events

  • partial: { text: string, serverFinal: boolean }
  • final: { text: string }
  • silence: Emitted when silence is detected
  • error: Error event

Troubleshooting

Language Support

If specifying a non-English language code (e.g., 'it-IT', 'es-ES') doesn't result in transcription in that language, the issue is likely on the server side:

  1. The server may not have that language model loaded
  2. The server may be configured to ignore client language settings
  3. The specific language may not be supported by your Riva deployment

Contact your Riva server administrator to confirm which languages are available.

Browser Caching

When developing or updating the client, use cache-busting techniques:

  1. Add a timestamp query parameter to dynamic imports: import(`...?v=${Date.now()}`)
  2. Use cache control meta tags in your HTML
  3. Run your development server with cache disabled (e.g., http-server -c-1)
  4. Use browser developer tools to clear cache and perform hard reloads

License

MIT
