deepgram-media-transcriber v1.0.8
# Deepgram Media Transcriber
A robust TypeScript package for transcribing audio/video files using Deepgram's API, with speaker diarization and subtitle generation (SRT/VTT) capabilities.
## Features
- 🎙️ Audio/video transcription with speaker identification
- 📄 Multiple output formats: SRT, VTT, Plain Text
- ⏱️ Automatic splitting of long utterances (>15s)
- 🔊 Supports various audio formats (automatic conversion to MP3)
- 🎯 Accurate word-level timing information
- 🗣️ Speaker confidence scores (when available)
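
The README does not describe how long utterances are split. One plausible approach, given word-level timings, is to cut at a word boundary whenever the running segment would exceed the limit. The sketch below illustrates that idea only; it is an assumption, not the package's code, and `Word`, `splitUtterance`, and `maxSeconds` are hypothetical names.

```typescript
// Hypothetical word shape: text plus start/end times in seconds.
interface Word {
  word: string;
  start: number;
  end: number;
}

// Group timed words into segments, cutting only at word boundaries.
// Each segment aims to stay within maxSeconds; a single word longer
// than the limit still forms its own segment.
function splitUtterance(words: Word[], maxSeconds = 15): Word[][] {
  const segments: Word[][] = [];
  let current: Word[] = [];

  for (const w of words) {
    // Close the current segment if this word would push it past the limit.
    if (current.length > 0 && w.end - current[0].start > maxSeconds) {
      segments.push(current);
      current = [];
    }
    current.push(w);
  }
  if (current.length > 0) segments.push(current);
  return segments;
}
```

Each returned segment can then be rendered as its own subtitle cue, using the first word's `start` and the last word's `end`.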
## Installation

    npm install deepgram-media-transcriber

## Quick Start
```typescript
import { transcribeMedia } from 'deepgram-media-transcriber';

async function main() {
  const results = await transcribeMedia(
    '/path/to/your/media/file.mp4',
    'your-deepgram-api-key'
  );

  console.log('Formatted Text:', results.formattedText);
  console.log('SRT Subtitles:', results.srt);
  console.log('VTT Subtitles:', results.vtt);
}

main().catch(console.error);
```

## API Documentation

### transcribeMedia(filePath: string, deepgramApiKey: string, keepAudioFile?: boolean)

#### Parameters

- `filePath` : Path to media file (supports MP3, WAV, MP4, MOV, etc.)
- `deepgramApiKey` : Your Deepgram API key
- `keepAudioFile` : Keep converted audio file (default: false)

#### Returns

    {
      transcript: any;        // Raw Deepgram response
      formattedText: string;  // Speaker-formatted plain text
      srt: string;            // SRT formatted subtitles
      vtt: string;            // VTT formatted subtitles
      audioFilePath?: string; // Path to converted audio (if kept)
    }

## Output Formats

### SRT Format Example

    1
    00:00:00,000 --> 00:00:04,120
    Speaker 0: Let's start with the main agenda items...

    2
    00:00:04,240 --> 00:00:07,800
    Speaker 1: I agree, we should prioritize...

### VTT Format Example

    WEBVTT

    1
    00:00:00.000 --> 00:00:04.120
    Speaker 0: Let's start with the main agenda items...

    2
    00:00:04.240 --> 00:00:07.800
    Speaker 1: I agree, we should prioritize...

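
In the SRT and VTT examples, the cue timestamps differ only in the separator before the milliseconds (a comma in SRT, a period in VTT). The sketch below shows one way to produce both from a time in seconds. `formatTimestamp` and `formatCue` are hypothetical helper names and are not part of this package's API.

```typescript
type SubtitleFormat = 'srt' | 'vtt';

// Format a time in seconds as HH:MM:SS<sep>mmm.
// SRT uses a comma before the milliseconds; WebVTT uses a period.
function formatTimestamp(seconds: number, format: SubtitleFormat): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, '0');
  const sep = format === 'srt' ? ',' : '.';
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(ms, 3)}`;
}

// Render one numbered cue: index line, timing line, then the text.
function formatCue(
  index: number,
  start: number,
  end: number,
  text: string,
  format: SubtitleFormat
): string {
  const timing = `${formatTimestamp(start, format)} --> ${formatTimestamp(end, format)}`;
  return `${index}\n${timing}\n${text}`;
}
```

Cues are separated by a blank line in both formats, and a VTT file additionally starts with a `WEBVTT` line.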
### Text Format Example

    Speaker 0: Let's start with the main agenda items...
    Speaker 1: I agree, we should prioritize...

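
If you need the plain-text output as structured data, it can be parsed back into speaker turns. This is a minimal sketch that assumes every line follows the `Speaker N: text` pattern shown in the example; `SpeakerTurn` and `parseFormattedText` are hypothetical names, not exports of this package.

```typescript
interface SpeakerTurn {
  speaker: number;
  text: string;
}

// Parse "Speaker N: text" lines into objects.
// Lines that do not match the pattern are skipped.
function parseFormattedText(formattedText: string): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const line of formattedText.split(/\r?\n/)) {
    const match = /^Speaker (\d+):\s*(.*)$/.exec(line.trim());
    if (match) {
      turns.push({ speaker: Number(match[1]), text: match[2] });
    }
  }
  return turns;
}
```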
## Configuration

### Deepgram API Setup

- Get an API key from the Deepgram Console
- Enable the following features in your Deepgram project:
  - Speaker Diarization
  - Punctuation
  - Utterance Detection
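
For reference, Deepgram's pre-recorded transcription API exposes these three features as per-request query parameters (`diarize`, `punctuate`, `utterances`). The sketch below builds such a request URL. Whether this package sends exactly these parameters is an assumption, and `buildListenUrl` is a hypothetical helper.

```typescript
// Build a request URL for Deepgram's pre-recorded endpoint with
// diarization, punctuation, and utterance detection turned on.
function buildListenUrl(base = 'https://api.deepgram.com/v1/listen'): string {
  const params = new URLSearchParams({
    diarize: 'true',    // speaker diarization
    punctuate: 'true',  // punctuation
    utterances: 'true', // utterance detection
  });
  return `${base}?${params.toString()}`;
}
```

A direct call would POST the audio bytes to this URL with an `Authorization: Token <your-api-key>` header. When using this package you only pass the key to `transcribeMedia`.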
## Browser Usage

This package also supports browser environments using ffmpeg.wasm:

    import { transcribeMediaBrowser } from 'deepgram-media-transcriber/browser';

    async function processMedia() {
      // Cast so TypeScript knows the element exposes `files`
      const fileInput = document.getElementById('fileInput') as HTMLInputElement | null;
      const file = fileInput?.files?.[0];
      if (!file) {
        console.error('No file selected');
        return;
      }

      try {
        const { formattedText, srt, vtt } = await transcribeMediaBrowser(
          file,
          'your-deepgram-api-key'
        );
        console.log('Formatted Text:', formattedText);
        console.log('SRT Subtitles:', srt);
        console.log('VTT Subtitles:', vtt);
      } catch (error) {
        // `error` is `unknown` under strict TypeScript settings
        console.error('Processing failed:', error instanceof Error ? error.message : error);
      }
    }

### Browser Considerations

- The browser version uses ffmpeg.wasm, which requires loading WebAssembly modules
- Cross-Origin Resource Sharing (CORS) must be properly configured when loading the WebAssembly modules
- The browser version doesn't support the `keepAudioFile` option, as files are processed in memory
## Audio Conversion

The package automatically:

- Converts non-MP3 files to high-quality MP3
- Maintains original audio quality (44.1 kHz sample rate)
- Handles stereo-to-mono conversion when needed
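
The exact FFmpeg options the package uses are not documented here. As an illustration of the conversion described above, the sketch below assembles a typical FFmpeg argument list for extracting audio to MP3 at 44.1 kHz, with an optional mono downmix. `buildFfmpegArgs` is a hypothetical helper and the quality setting is an assumption.

```typescript
// Build FFmpeg arguments that extract the audio track and encode it as MP3.
function buildFfmpegArgs(input: string, output: string, mono = false): string[] {
  const args = [
    '-y',                     // overwrite the output file if it exists
    '-i', input,              // input media file
    '-vn',                    // ignore any video stream
    '-ar', '44100',           // 44.1 kHz sample rate
    '-codec:a', 'libmp3lame', // MP3 encoder
    '-q:a', '2',              // high-quality variable bitrate (assumed setting)
  ];
  if (mono) args.push('-ac', '1'); // downmix to a single channel
  args.push(output);
  return args;
}
```

The resulting array can be passed to `child_process.spawn` together with the path to an FFmpeg binary, such as the one provided by `ffmpeg-static`.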
## Error Handling
The package throws specific errors for:
- Invalid file paths
- Deepgram API errors
- FFmpeg conversion failures
- Invalid audio formats
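
The error class names are not documented here, so the sketch below relies only on the standard `Error` type. It shows one way to handle a failed call without assuming the shape of the thrown value; `errorMessage`, `attempt`, and `Result` are hypothetical names.

```typescript
// Turn any thrown value into a readable message.
function errorMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  return String(error);
}

type Result<T> = { ok: true; value: T } | { ok: false; message: string };

// Run an async task and report failure as a value instead of an exception.
async function attempt<T>(task: () => Promise<T>): Promise<Result<T>> {
  try {
    return { ok: true, value: await task() };
  } catch (error) {
    return { ok: false, message: errorMessage(error) };
  }
}
```

For example, `await attempt(() => transcribeMedia(path, key))` yields either the transcription result or a message that can be logged or shown to the user.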
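## Example Usage

    import { writeFileSync } from 'fs';
    import { transcribeMedia } from 'deepgram-media-transcriber';

    async function processMedia() {
      try {
        // Fail early if the key is missing; process.env values may be undefined
        const apiKey = process.env.DEEPGRAM_KEY;
        if (!apiKey) throw new Error('DEEPGRAM_KEY is not set');

        const { formattedText, srt, vtt } = await transcribeMedia(
          'interview.mp4',
          apiKey,
          true // keepAudioFile: keep the converted audio file
        );

        writeFileSync('interview.txt', formattedText);
        writeFileSync('interview.srt', srt);
        writeFileSync('interview.vtt', vtt);
        console.log('Processing complete!');
      } catch (error) {
        // `error` is `unknown` under strict TypeScript settings
        console.error('Processing failed:', error instanceof Error ? error.message : error);
      }
    }

    processMedia();
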
## Development

### Build

    npm run build

### Contribution

- Clone the repository
- Install dependencies: `npm install`
- Implement features/fixes
- Write tests (coming soon)
- Submit a PR
## License
MIT © Ernesto Voltaggio
Note: This package requires FFmpeg for audio conversion. The ffmpeg-static dependency is included automatically.