whisper-node v1.1.1
Node.js bindings for OpenAI's Whisper. Transcription is done locally.
Features
- Output transcripts to JSON (also .txt .srt .vtt)
- Optimized for CPU (Including Apple Silicon ARM)
- Timestamp precision down to a single word
Installation
- Add dependency to project
npm install whisper-node
- Download whisper model of choice
npx whisper-node download
Requirement for Windows: install the make command.
Usage
import whisper from 'whisper-node';
const transcript = await whisper("example/sample.wav");
console.log(transcript); // output: [ {start,end,speech} ]
Output (JSON)
[
{
"start": "00:00:14.310", // time stamp begin
"end": "00:00:16.480", // time stamp end
"speech": "howdy" // transcription
}
]
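The transcript array above can be post-processed directly. As an illustrative sketch (the srtFromTranscript helper below is not part of whisper-node), here is one way the JSON output could be converted to SRT subtitle text:

```typescript
// Shape of each transcript entry returned by whisper-node.
interface TranscriptLine {
  start: string;  // e.g. "00:00:14.310"
  end: string;    // e.g. "00:00:16.480"
  speech: string; // transcribed text
}

// Hypothetical helper: convert the transcript array to SRT subtitle text.
// SRT uses a comma as the millisecond separator, so "." is replaced.
function srtFromTranscript(lines: TranscriptLine[]): string {
  return lines
    .map((line, i) =>
      `${i + 1}\n` +
      `${line.start.replace(".", ",")} --> ${line.end.replace(".", ",")}\n` +
      `${line.speech}\n`
    )
    .join("\n");
}

const srt = srtFromTranscript([
  { start: "00:00:14.310", end: "00:00:16.480", speech: "howdy" },
]);
console.log(srt);
```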
Usage with Additional Options
import whisper from 'whisper-node';
const filePath = "example/sample.wav"; // required
const options = {
  modelName: "base.en", // default
  // modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
  whisperOptions: {
    language: 'auto',         // default (use 'auto' for auto-detect)
    gen_file_txt: false,      // outputs .txt file
    gen_file_subtitle: false, // outputs .srt file
    gen_file_vtt: false,      // outputs .vtt file
    word_timestamps: true     // timestamp for every word
    // timestamp_size: 0      // cannot use along with word_timestamps: true
  }
};
const transcript = await whisper(filePath, options);
Files must be .wav with a 16kHz sample rate.
Use FFmpeg to convert an example .mp3 with this command: ffmpeg -i input.mp3 -ar 16000 output.wav
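If you script the conversion from Node, the FFmpeg invocation above can be built programmatically. A minimal sketch, assuming ffmpeg is on your PATH (the ffmpegArgs and toWav16k helpers are illustrative, not part of whisper-node):

```typescript
import { execFileSync } from "child_process";

// Build the FFmpeg argument list for a 16kHz mono 16-bit PCM .wav
// (mono and 16-bit PCM are assumptions based on whisper.cpp's input format).
function ffmpegArgs(input: string, output: string): string[] {
  return ["-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output];
}

// Illustrative wrapper: convert input audio before transcription.
function toWav16k(input: string, output: string): void {
  execFileSync("ffmpeg", ffmpegArgs(input, output));
}

// toWav16k("input.mp3", "output.wav");
```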
Roadmap
- Support projects not using TypeScript
- Allow custom directory for storing models
- Config files as alternative to model download cli
- Remove path, shelljs and prompt-sync package for browser, react-native expo, and webassembly compatibility
- fluent-ffmpeg to support more audio formats
- pyannote diarization for speaker names
- Implement WhisperX as optional alternative model for diarization and higher precision timestamps (as alternative to C++ version)
- Add option for viewing detected language as described in Issue 16
- Include TypeScript types in d.ts file
- Add support for language option
- Add support for transcribing audio streams as already implemented in whisper.cpp
Modifying whisper-node
npm run dev
- runs nodemon and tsc on '/src/test.ts'
npm run build
- runs tsc, outputs to '/dist' and gives sh permission to 'dist/download.js'
Acknowledgements