@maia-id/maleo v1.1.2
MALEO: Multi Platform Speaker Diarization
A JavaScript library for speaker diarization: the process of partitioning an audio stream into segments according to speaker identity.
Features
- Audio preprocessing with customizable options
- CPU, GPU (CUDA), WebGPU, and WASM support
- Progress tracking during inference
- Flexible audio input handling
- Silence removal and audio normalization capabilities
Prerequisites
GPU Support
If you plan to use GPU acceleration, ensure you have the required CUDA libraries installed:
libcublasLt.so.12
For CUDA installation instructions, refer to the NVIDIA cuDNN Installation Guide.
Installation
npm install @maia-id/maleo
Usage
Basic Example
import { SpeakerDiarization } from '@maia-id/maleo';
// Example usage
const example = async () => {
  const speakerDiarization = new SpeakerDiarization();
  const result = await speakerDiarization.inference({
    audio: './examples/audio.wav', // File path (Node.js)
    language: 'en',
    device: 'cpu', // One of 'cpu', 'cuda', 'webgpu', or 'wasm'
    audioOptions: {
      targetSampleRate: 16000, // Hz
      normalizeAudio: true,
      removeSilence: true,
      silenceThreshold: -50, // dB
    },
    // Called with progress updates during inference
    progress_callback: (progress) => console.log('Progress:', progress),
  });
  // Print one row per detected speaker segment
  console.table(result.segments);
};
example();
Running the Example
node examples/inference.js
Configuration Options
Audio Options
| Option | Type | Default | Description |
|---|---|---|---|
| targetSampleRate | number | 16000 | Target sample rate for audio processing |
| normalizeAudio | boolean | true | Whether to normalize audio amplitude |
| removeSilence | boolean | true | Whether to remove silence segments |
| silenceThreshold | number | -50 | Threshold (in dB) for silence detection |
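To make the normalizeAudio, removeSilence, and silenceThreshold options more concrete, here is a minimal sketch of what peak normalization and dB-threshold silence removal can look like on raw samples. It is not the library's implementation: the helper names normalizePeak and removeSilence, the 20 ms frame size, and the use of RMS level per frame are all assumptions made for illustration.

```javascript
// Illustrative helpers only; NOT taken from @maia-id/maleo.

// Peak normalization: scale samples so the loudest one reaches `targetPeak`.
function normalizePeak(samples, targetPeak = 1.0) {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return Float32Array.from(samples); // all-silent input: nothing to scale
  const gain = targetPeak / peak;
  return Float32Array.from(samples, (s) => s * gain);
}

// Silence removal: drop fixed-size frames whose RMS level is below `thresholdDb`
// (dB relative to full scale). `samples` must be a Float32Array.
function removeSilence(samples, sampleRate = 16000, thresholdDb = -50, frameMs = 20) {
  const frameSize = Math.max(1, Math.round((sampleRate * frameMs) / 1000));
  const thresholdAmp = 10 ** (thresholdDb / 20); // -50 dB is about 0.00316 of full scale
  const kept = [];
  for (let i = 0; i < samples.length; i += frameSize) {
    const frame = samples.subarray(i, i + frameSize);
    let sumSquares = 0;
    for (const s of frame) sumSquares += s * s;
    const rms = Math.sqrt(sumSquares / frame.length);
    if (rms >= thresholdAmp) kept.push(frame);
  }
  // Concatenate the surviving frames into one contiguous buffer.
  const out = new Float32Array(kept.reduce((n, f) => n + f.length, 0));
  let offset = 0;
  for (const f of kept) {
    out.set(f, offset);
    offset += f.length;
  }
  return out;
}

// Usage: half a second of a quiet 440 Hz tone followed by half a second of silence.
const demoRate = 16000;
const demoInput = new Float32Array(demoRate);
for (let i = 0; i < demoRate / 2; i++) {
  demoInput[i] = 0.25 * Math.sin((2 * Math.PI * 440 * i) / demoRate);
}
const demoTrimmed = removeSilence(demoInput, demoRate, -50);
const demoNormalized = normalizePeak(demoTrimmed);
console.log(demoInput.length, '->', demoNormalized.length, 'samples');
```

A threshold closer to 0 dB removes more audio, so it is worth checking on a short clip that quiet speakers are not being dropped.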
Inference Options
| Option | Type | Description |
|---|---|---|
| audio | string | Path to the audio file |
| language | string | Language of the audio (the basic example passes 'en') |
| audioOptions | object | Audio preprocessing settings (see Audio Options) |
| device | 'cpu' \| 'cuda' \| 'webgpu' \| 'wasm' | Processing device to use |
| progress_callback | function | Callback for tracking progress |
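Since the device option accepts different values depending on where the code runs, it can help to choose it at runtime. The sketch below is a hypothetical helper (pickDevice is not part of the library) that maps the current environment to one of the four documented strings. It assumes WebGPU is detected through navigator.gpu, that browsers without WebGPU should fall back to 'wasm', and that CUDA must be requested explicitly because it cannot be probed from plain JavaScript.

```javascript
// Hypothetical helper, not part of @maia-id/maleo: returns one of the
// documented `device` strings based on what the runtime exposes.
function pickDevice(env = globalThis, { preferCuda = false } = {}) {
  const isNode = Boolean(env.process && env.process.versions && env.process.versions.node);
  if (isNode) {
    // CUDA libraries cannot be detected from plain JavaScript, so the caller opts in.
    return preferCuda ? 'cuda' : 'cpu';
  }
  // Browsers that support WebGPU expose it as navigator.gpu.
  if (env.navigator && env.navigator.gpu) return 'webgpu';
  // Assumption: use WASM in browsers without WebGPU.
  return 'wasm';
}

// Options object in the shape shown in the basic example.
const inferenceOptions = {
  audio: './examples/audio.wav',
  device: pickDevice(),
  progress_callback: (progress) => console.log('Progress:', progress),
};
console.log('Selected device:', inferenceOptions.device);
```

Passing a fake environment object as the first argument makes the helper easy to test without a browser.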
Output Format
The inference method resolves to a result object whose segments property holds entries with the following structure:
interface Segment {
  start: number; // Start time in seconds
  end: number; // End time in seconds
  speaker: string; // Speaker identifier
  confidence: number; // Confidence score
}
Citation
If you use this library in your research, please cite:
@inproceedings{irawan2025cross,
  title = {Cross-Platform Speaker Diarization: Evaluating the Scalability of Maleo},
  author = {Eka Tresna Irawan and Ardi Mardiana and Dedy Hariyadi and I Putu Agus Eka Pratama},
  booktitle = {International Conference on Discoveries in Applied Sciences \& Advanced Technology 2025},
  year = {2025}
}
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgments
- NVIDIA for CUDA support