1.1.0 • Published 9 months ago

@maia-id/maleo v1.1.0

License
MIT

Speaker Diarization

A JavaScript library for speaker diarization: partitioning an audio stream into segments according to speaker identity.

Features

  • Audio preprocessing with customizable options
  • CPU, GPU, and WebGPU support
  • Progress tracking during inference
  • Flexible audio input handling
  • Silence removal and audio normalization capabilities

Prerequisites

GPU Support

If you plan to use GPU acceleration, ensure you have the required CUDA libraries installed:

libcublasLt.so.12

For CUDA installation instructions, refer to the NVIDIA cuDNN Installation Guide.

Installation

npm install speaker-diarization

Usage

Basic Example

import { SpeakerDiarization } from 'speaker-diarization';

// Example usage
const example = async () => {
    const speakerDiarization = new SpeakerDiarization();
    const result = await speakerDiarization.inference({
        audio: './examples/audio.wav',  // File path for Node.js
        device: 'cpu', // or 'cuda'
        audioOptions: {
            targetSampleRate: 16000,
            normalizeAudio: true,
            removeSilence: true,
            silenceThreshold: -50,
        },
        progress_callback: (progress) => console.log('Progress:', progress)
    });

    console.table(result.segments);
};

example();

Running the Example

node examples/inference.js

Configuration Options

Audio Options

Option            Type     Default  Description
targetSampleRate  number   16000    Target sample rate (Hz) for audio processing
normalizeAudio    boolean  true     Whether to normalize audio amplitude
removeSilence     boolean  true     Whether to remove silence segments
silenceThreshold  number   -50      Threshold (in dB) for silence detection
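
To make the audio options more concrete, the sketch below shows one common way a dB threshold and peak normalization can be applied to raw samples. It is an illustration only, not the library's internal preprocessing: the function names, the RMS-per-frame approach, and the 400-sample frame size (25 ms at 16000 Hz) are all assumptions.

```javascript
// Illustrative only: NOT the library's internal preprocessing code.
// All function names and the frame-based approach are assumptions.

// Convert a level in dB (relative to full scale) to a linear amplitude.
function dbToAmplitude(db) {
    return Math.pow(10, db / 20);
}

// Scale samples so that the loudest one has absolute value `peak`.
function peakNormalize(samples, peak = 1.0) {
    let max = 0;
    for (const s of samples) max = Math.max(max, Math.abs(s));
    if (max === 0) return Float32Array.from(samples); // all-silent input
    const gain = peak / max;
    return Float32Array.from(samples, (s) => s * gain);
}

// Drop fixed-size frames whose RMS level falls below the threshold.
function removeSilentFrames(samples, thresholdDb = -50, frameSize = 400) {
    const threshold = dbToAmplitude(thresholdDb);
    const kept = [];
    for (let i = 0; i < samples.length; i += frameSize) {
        const frame = samples.slice(i, i + frameSize);
        let sumSquares = 0;
        for (const s of frame) sumSquares += s * s;
        const rms = Math.sqrt(sumSquares / frame.length);
        if (rms >= threshold) kept.push(...frame);
    }
    return Float32Array.from(kept);
}
```

Under this interpretation, a silenceThreshold of -50 corresponds to a linear amplitude of roughly 0.003, so only very quiet frames would be discarded.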

Inference Options

Option             Type            Description
audio              string          Path to the audio file
device             'cpu' | 'cuda'  Processing device to use
progress_callback  function        Callback for tracking progress
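
Since the device option accepts either 'cpu' or 'cuda', one possible pattern is to try the GPU first and retry on the CPU if that fails. The helper below is hypothetical and not part of the library. It assumes that an unavailable CUDA runtime causes inference to reject, which this README does not confirm.

```javascript
// Hypothetical helper, not part of the library.
// `engine` is any object exposing an async inference(options) method,
// for example a SpeakerDiarization instance.
async function inferenceWithFallback(engine, options) {
    try {
        return await engine.inference({ ...options, device: 'cuda' });
    } catch (err) {
        // Assumption: a missing or unusable CUDA runtime surfaces as a rejection.
        console.warn('CUDA inference failed, retrying on CPU:', err.message);
        return engine.inference({ ...options, device: 'cpu' });
    }
}
```

A call such as inferenceWithFallback(new SpeakerDiarization(), { audio: './examples/audio.wav' }) would then pass every other option through unchanged.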

Output Format

The inference method resolves to a result object whose segments array contains objects with the following structure:

interface Segment {
    start: number;      // Start time in seconds
    end: number;        // End time in seconds
    speaker: string;    // Speaker identifier
    confidence: number; // Confidence score
}
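
As a sketch of how segments of this shape might be post-processed, the helpers below total the speaking time per speaker and merge consecutive segments from the same speaker. Both function names and the 0.5-second default gap are illustrative assumptions, not part of the library.

```javascript
// Illustrative helpers, not part of the library.

// Sum speaking time (in seconds) per speaker.
function speakingTimeBySpeaker(segments) {
    const totals = new Map();
    for (const { start, end, speaker } of segments) {
        totals.set(speaker, (totals.get(speaker) ?? 0) + (end - start));
    }
    return totals;
}

// Merge consecutive segments from the same speaker when the gap between
// them is at most `maxGap` seconds. A merged segment keeps the other
// fields (such as confidence) of the first segment in the run.
function mergeAdjacentSegments(segments, maxGap = 0.5) {
    const sorted = [...segments].sort((a, b) => a.start - b.start);
    const merged = [];
    for (const seg of sorted) {
        const last = merged[merged.length - 1];
        if (last && last.speaker === seg.speaker && seg.start - last.end <= maxGap) {
            last.end = Math.max(last.end, seg.end);
        } else {
            merged.push({ ...seg }); // copy so the input is not mutated
        }
    }
    return merged;
}
```

Merging before display can make the console.table output easier to read when a single speaker's turn has been split into many short segments.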

Citation

If you use this library in your research, please cite:

@inproceedings{irawan2025cross,
  title = {Cross-Platform Speaker Diarization: Evaluating the Scalability of Maleo},
  author = {Eka Tresna Irawan and Ardi Mardiana and Dedy Hariyadi and I Putu Agus Eka Pratama},
  booktitle = {International Conference on Discoveries in Applied Sciences \& Advanced Technology 2025},
  year = {2025}
}

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • NVIDIA for CUDA support