Audio-sentence-detector NPM

Audio Sentence Detector

An advanced audio sentence detection library that uses voice activity detection, silence analysis, and acoustic features to segment audio into sentences.

Installation

npm install audio-sentence-detector

Usage

const AudioSentenceDetector = require('audio-sentence-detector');

// Create detector with custom options
const detector = new AudioSentenceDetector({
    minSilenceDuration: 0.5,
    silenceThreshold: 0.01
});

// Process audio buffer
const sentences = await detector.detect(audioBuffer);

Configuration Options

The AudioSentenceDetector constructor accepts an options object with the following parameters:

Basic Sentence Detection Options

Option	Default	Description
`minSilenceDuration`	`0.5`	Minimum duration of silence (in seconds) to be considered a sentence boundary
`silenceThreshold`	`0.01`	RMS threshold below which audio is considered silence
`minSentenceLength`	`1`	Minimum length of a sentence in seconds
`maxSentenceLength`	`15`	Maximum length of a sentence in seconds
`windowSize`	`2048`	Size of the analysis window in samples
`idealSentenceLength`	`5`	Ideal length of a sentence in seconds (used for probability calculations)
`idealSilenceDuration`	`0.8`	Ideal duration of silence between sentences
`allowGaps`	`true`	Whether to allow gaps between sentences
`minSegmentLength`	`0`	Minimum length for merged segments
`alignToAudioBoundaries`	`false`	Whether to align sentences with audio file boundaries

Voice Detection Options

Option	Default	Description
`fundamentalFreqMin`	`85`	Minimum fundamental frequency for voice detection (Hz)
`fundamentalFreqMax`	`255`	Maximum fundamental frequency for voice detection (Hz)
`voiceActivityThreshold`	`0.4`	Threshold for voice activity detection
`minVoiceActivityDuration`	`0.1`	Minimum duration of voice activity (seconds)
`energySmoothing`	`0.95`	Smoothing factor for energy calculations
`formantEmphasis`	`0.7`	Emphasis factor for formant detection
`zeroCrossingRateThreshold`	`0.3`	Threshold for zero-crossing rate in voice detection

Debug Option

Option	Default	Description
`debug`	`false`	Enable debug logging

Return Value

The detect() method returns an array of sentence objects, each containing:

{
    index: number,          // Index of the sentence
    start: number,          // Start time in seconds
    end: number,           // End time in seconds
    duration: number,      // Duration in seconds
    probability: number    // Confidence score (0-1)
}

Example

const AudioSentenceDetector = require('audio-sentence-detector');

// Create detector with custom settings
const detector = new AudioSentenceDetector({
    minSilenceDuration: 0.3,
    silenceThreshold: 0.02,
    minSentenceLength: 1.5,
    maxSentenceLength: 10,
    debug: true
});

// Process audio file
const fs = require('fs');
const audioBuffer = fs.readFileSync('speech.wav');

try {
    const sentences = await detector.detect(audioBuffer);
    console.log('Detected sentences:', sentences);
} catch (error) {
    console.error('Error processing audio:', error);
}

License

MIT

audio speech detection sentence voice silence-detection sentence-detection wav speech-analysis audio-processing

2 years ago

2 years ago

2 years ago

2 years ago

2 years ago

2 years ago