0.2.0 • Published 7 months ago
@ericedouard/vad-node-realtime v0.2.0
Forked from https://github.com/ricky0123/vad, which supports the web and React. The upstream project used to support Node, but not in real time. This fork targets Node and adds real-time processing.
See the project home for more details.
Features
- Real-time and non-real-time voice activity detection
- Built on the Silero VAD model
- Easy to use API
- Works completely offline
- Efficient processing for server environments
Installation
npm install @ericedouard/vad-node-realtime
Usage
Real-time VAD
Use RealTimeVAD when you need to process audio chunks in real time, such as receiving audio from a client application:
const { RealTimeVAD } = require('@ericedouard/vad-node-realtime');

async function example() {
  // Create a new RealTimeVAD instance
  const vad = await RealTimeVAD.new({
    onSpeechStart: () => {
      console.log('Speech started');
    },
    onSpeechEnd: (audio) => {
      console.log('Speech ended, received audio of length:', audio.length);
      // Process the audio data here
    },
    // Optional: customize VAD parameters
    positiveSpeechThreshold: 0.6,
    negativeSpeechThreshold: 0.4,
    minSpeechFrames: 4,
  });

  // Start processing
  vad.start();

  // When you receive audio chunks from your source, call this handler.
  // It must be async because it awaits processAudio().
  async function onAudioChunkReceived(audioChunk) {
    // audioChunk should be a Float32Array whose sample rate matches
    // the sampleRate option (default: 16000 Hz)
    await vad.processAudio(audioChunk);
  }

  // When you're done with the stream:
  await vad.flush(); // Process any remaining audio
  vad.destroy(); // Clean up resources
}

example();
Non-real-time VAD
For processing entire audio files or pre-recorded chunks:
const { NonRealTimeVAD } = require('@ericedouard/vad-node-realtime');

// audioData is a Float32Array of audio samples
// sampleRate is the sample rate of the audio
async function example(audioData, sampleRate) {
  const vad = await NonRealTimeVAD.new();

  for await (const { audio, start, end } of vad.run(audioData, sampleRate)) {
    console.log(`Speech detected from ${start}ms to ${end}ms`);
    // Process detected speech segment
  }
}
API Reference
RealTimeVAD
- RealTimeVAD.new(options): Create a new RealTimeVAD instance
- start(): Start processing audio
- pause(): Pause processing audio
- processAudio(audioData): Process a chunk of audio data
- flush(): Process any remaining audio and trigger final callbacks
- reset(): Reset the VAD state
- destroy(): Clean up resources
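Audio arriving from a client is often raw 16-bit PCM rather than the Float32Array that processAudio(audioData) takes. The following is a minimal sketch of a conversion step, assuming signed little-endian 16-bit mono input and assuming the VAD expects samples normalized to roughly [-1, 1], as is conventional for Float32 audio. The helper name pcm16ToFloat32 is hypothetical and not part of this package.

```javascript
// Hypothetical helper (not part of the package): convert 16-bit signed
// little-endian PCM bytes into a Float32Array of normalized samples.
function pcm16ToFloat32(buffer) {
  const sampleCount = Math.floor(buffer.length / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Scale from the int16 range [-32768, 32767] to roughly [-1, 1).
    samples[i] = buffer.readInt16LE(i * 2) / 32768;
  }
  return samples;
}

// Example input: three samples (zero, max positive, min negative).
const pcm = Buffer.alloc(6);
pcm.writeInt16LE(0, 0);
pcm.writeInt16LE(32767, 2);
pcm.writeInt16LE(-32768, 4);
const floats = pcm16ToFloat32(pcm);
```

The result could then be passed along as `await vad.processAudio(pcm16ToFloat32(chunk))`, where `chunk` is a Node Buffer received from your audio source.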
RealTimeVADOptions
- sampleRate: Sample rate of the input audio (default: 16000; inputs with other sample rates are automatically resampled)
- onSpeechStart: Callback when speech starts
- onSpeechEnd: Callback when speech ends, with the audio data
- onVADMisfire: Callback when speech was detected but was too short
- onFrameProcessed: Callback after each frame is processed
- positiveSpeechThreshold: Threshold for detecting speech (0-1)
- negativeSpeechThreshold: Threshold for detecting silence (0-1)
- minSpeechFrames: Minimum number of frames to consider as speech
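To build intuition for how positiveSpeechThreshold, negativeSpeechThreshold, and minSpeechFrames might interact, here is a simplified two-threshold (hysteresis) detector over per-frame speech probabilities. This is an illustrative sketch only and not the package's actual implementation, which may apply additional logic. The function detectSegments and its event shape are hypothetical.

```javascript
// Illustrative only: NOT the package's implementation.
// Input is one speech probability (0-1) per frame.
function detectSegments(probabilities, options = {}) {
  const {
    positiveSpeechThreshold = 0.6,
    negativeSpeechThreshold = 0.4,
    minSpeechFrames = 4,
  } = options;

  const events = [];
  let speaking = false;
  let speechFrames = 0;

  probabilities.forEach((p, frame) => {
    // Speech begins once the probability reaches the upper threshold.
    if (!speaking && p >= positiveSpeechThreshold) {
      speaking = true;
      speechFrames = 0;
      events.push({ type: 'speechStart', frame });
    }
    if (speaking) {
      if (p < negativeSpeechThreshold) {
        // Probability fell below the lower threshold: the segment ends.
        // Segments shorter than minSpeechFrames are reported as misfires.
        speaking = false;
        events.push({
          type: speechFrames >= minSpeechFrames ? 'speechEnd' : 'misfire',
          frame,
        });
      } else {
        // Values between the two thresholds keep the segment alive.
        speechFrames++;
      }
    }
  });
  return events;
}

const demoEvents = detectSegments([0.1, 0.7, 0.5, 0.8, 0.9, 0.2, 0.7, 0.1]);
```

In this sketch the gap between the two thresholds keeps a segment from flickering on and off when the probability hovers near a single cutoff, and minSpeechFrames separates real speech from short bursts.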
License
ISC