1.0.8 • Published 10 months ago

deepgram-media-transcriber v1.0.8

Deepgram Media Transcriber

A robust TypeScript package for transcribing audio/video files using Deepgram's API, with speaker diarization and subtitle generation (SRT/VTT) capabilities.

Features

  • 🎙️ Audio/video transcription with speaker identification
  • 📄 Multiple output formats: SRT, VTT, Plain Text
  • ⏱️ Automatic splitting of long utterances (>15s)
  • 🔊 Supports various audio formats (automatic conversion to MP3)
  • 🎯 Accurate word-level timing information
  • 🗣️ Speaker confidence scores (when available)
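
The automatic splitting of long utterances can be pictured in terms of word-level timings. The following is a minimal sketch, assuming each word carries `start` and `end` times in seconds and that a new chunk begins whenever adding a word would push the chunk past the limit. The `Word` type and `splitByDuration` function are hypothetical names for illustration; the package's actual splitting rule may differ.

```typescript
// Hypothetical shape: one transcribed word with timings in seconds.
interface Word {
  text: string;
  start: number;
  end: number;
}

// Group words into chunks whose duration does not exceed maxSeconds.
// A single word longer than maxSeconds still forms its own chunk.
function splitByDuration(words: Word[], maxSeconds = 15): Word[][] {
  const chunks: Word[][] = [];
  let current: Word[] = [];

  for (const word of words) {
    const chunkStart = current.length > 0 ? current[0].start : word.start;
    if (current.length > 0 && word.end - chunkStart > maxSeconds) {
      chunks.push(current);
      current = [];
    }
    current.push(word);
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Example: three words spanning 0 s to 20 s end up in two chunks.
const sampleWords: Word[] = [
  { text: 'first', start: 0, end: 8 },
  { text: 'second', start: 8, end: 14 },
  { text: 'third', start: 14, end: 20 },
];
const wordChunks = splitByDuration(sampleWords);
console.log(wordChunks.map((c) => c.map((w) => w.text).join(' ')));
```

Splitting on word boundaries keeps every subtitle cue aligned with real speech timings rather than cutting at an arbitrary 15-second mark.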

Installation

npm install deepgram-media-transcriber

## Quick Start

```typescript
import { transcribeMedia } from 'deepgram-media-transcriber';

async function main() {
  const results = await transcribeMedia(
    '/path/to/your/media/file.mp4',
    'your-deepgram-api-key'
  );

  console.log('Formatted Text:', results.formattedText);
  console.log('SRT Subtitles:', results.srt);
  console.log('VTT Subtitles:', results.vtt);
}

main().catch(console.error);
```

API Documentation

transcribeMedia(filePath: string, deepgramApiKey: string, keepAudioFile?: boolean)

Parameters

  • filePath: Path to the media file (supports MP3, WAV, MP4, MOV, etc.)
  • deepgramApiKey: Your Deepgram API key
  • keepAudioFile: Keep the converted audio file (default: false)

Returns

{
  transcript: any;         // Raw Deepgram response
  formattedText: string;   // Speaker-formatted plain text
  srt: string;             // SRT formatted subtitles
  vtt: string;             // VTT formatted subtitles
  audioFilePath?: string   // Path to converted audio (if kept)
}
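
Because `transcript` is typed `any`, it can help to narrow it before use. The sketch below assumes the raw response follows Deepgram's pre-recorded response layout, where `results.utterances` entries carry `speaker`, `start`, `end`, and `transcript` when utterances and diarization are requested, and that the package passes it through unmodified. `DeepgramUtterance`, `getUtterances`, and `speakingTime` are hypothetical names, not part of the package.

```typescript
// Assumed subset of a Deepgram utterance; only the fields used below.
interface DeepgramUtterance {
  speaker?: number;
  start: number;
  end: number;
  transcript: string;
}

// Safely pull utterances out of the untyped response.
function getUtterances(raw: unknown): DeepgramUtterance[] {
  const utterances = (raw as { results?: { utterances?: unknown } } | null)
    ?.results?.utterances;
  return Array.isArray(utterances) ? (utterances as DeepgramUtterance[]) : [];
}

// Total speaking time in seconds per speaker index.
function speakingTime(raw: unknown): Map<number, number> {
  const totals = new Map<number, number>();
  for (const u of getUtterances(raw)) {
    const speaker = u.speaker ?? 0;
    totals.set(speaker, (totals.get(speaker) ?? 0) + (u.end - u.start));
  }
  return totals;
}

// Hand-written sample data, not real API output.
const sampleResponse = {
  results: {
    utterances: [
      { speaker: 0, start: 0, end: 4, transcript: "Let's start..." },
      { speaker: 1, start: 4, end: 7, transcript: 'I agree...' },
      { speaker: 0, start: 7, end: 9, transcript: 'Good.' },
    ],
  },
};
console.log(speakingTime(sampleResponse));
```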

Output Formats

SRT Format Example

1
00:00:00,000 --> 00:00:04,120
Speaker 0: Let's start with the main agenda items...

2
00:00:04,240 --> 00:00:07,800
Speaker 1: I agree, we should prioritize...
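
To show how cues like these can be produced, here is a minimal sketch that formats timings and assembles numbered SRT blocks. The `Cue` shape and the `formatSrtTime` and `toSrt` helpers are assumptions for illustration, not the package's internal code.

```typescript
// Hypothetical cue shape: speaker index, timings in seconds, spoken text.
interface Cue {
  speaker: number;
  start: number;
  end: number;
  text: string;
}

// Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds).
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build numbered SRT blocks separated by blank lines.
function toSrt(cues: Cue[]): string {
  return cues
    .map(
      (cue, i) =>
        `${i + 1}\n${formatSrtTime(cue.start)} --> ${formatSrtTime(cue.end)}\n` +
        `Speaker ${cue.speaker}: ${cue.text}`
    )
    .join('\n\n');
}

console.log(
  toSrt([
    { speaker: 0, start: 0, end: 4.12, text: "Let's start with the main agenda items..." },
    { speaker: 1, start: 4.24, end: 7.8, text: 'I agree, we should prioritize...' },
  ])
);
```

Working in whole milliseconds before splitting into hours, minutes, and seconds avoids rounding drift between the fields.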

VTT Format Example

WEBVTT

1
00:00:00.000 --> 00:00:04.120
Speaker 0: Let's start with the main agenda items...

2
00:00:04.240 --> 00:00:07.800
Speaker 1: I agree, we should prioritize...
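
Compared with SRT, the VTT output differs only in the `WEBVTT` header and the `.` millisecond separator. A minimal sketch of that conversion follows; `srtToVtt` is a hypothetical helper and the package may well generate VTT directly instead.

```typescript
// Convert SRT text to WebVTT: add the header and switch the millisecond
// separator from ',' to '.' on timing lines only.
function srtToVtt(srt: string): string {
  const timing = /^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$/;
  const body = srt
    .split(/\r?\n/)
    .map((line) => line.replace(timing, '$1.$2 --> $3.$4'))
    .join('\n');
  return `WEBVTT\n\n${body}`;
}

const sampleSrt = [
  '1',
  '00:00:00,000 --> 00:00:04,120',
  "Speaker 0: Let's start with the main agenda items...",
].join('\n');
console.log(srtToVtt(sampleSrt));
```

Anchoring the pattern to whole timing lines leaves commas inside the spoken text untouched.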

Text Format Example

Speaker 0: Let's start with the main agenda items...

Speaker 1: I agree, we should prioritize...
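
The plain-text layout is one `Speaker N:` paragraph per turn, separated by blank lines. As a sketch, and assuming consecutive utterances from the same speaker are merged into one paragraph (the source does not say whether the package does this), it could be built as follows. `Turn` and `toSpeakerText` are hypothetical names.

```typescript
// Hypothetical shape: one utterance attributed to a speaker index.
interface Turn {
  speaker: number;
  text: string;
}

// Merge back-to-back utterances from the same speaker, then render
// each merged turn as "Speaker N: text" separated by blank lines.
function toSpeakerText(turns: Turn[]): string {
  const merged: Turn[] = [];
  for (const turn of turns) {
    const last = merged[merged.length - 1];
    if (last && last.speaker === turn.speaker) {
      last.text += ` ${turn.text}`;
    } else {
      merged.push({ ...turn }); // copy so the input is not mutated
    }
  }
  return merged.map((t) => `Speaker ${t.speaker}: ${t.text}`).join('\n\n');
}

console.log(
  toSpeakerText([
    { speaker: 0, text: "Let's start with the main agenda items..." },
    { speaker: 1, text: 'I agree, we should prioritize...' },
  ])
);
```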

Configuration

Deepgram API Setup

  1. Get an API key from the Deepgram Console
  2. Enable the following features in your Deepgram project:
    • Speaker Diarization
    • Punctuation
    • Utterance Detection
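
In Deepgram's API, these three features are normally requested per call through the `diarize`, `punctuate`, and `utterances` query parameters of the pre-recorded `/v1/listen` endpoint. The sketch below shows what such a request looks like; whether the package sets these parameters itself is not stated here, so treat `buildListenUrl` and `buildHeaders` as hypothetical helpers for illustration only.

```typescript
// Build the request URL for Deepgram's pre-recorded endpoint with the
// three features listed above switched on via query parameters.
function buildListenUrl(): string {
  const params = new URLSearchParams({
    diarize: 'true',
    punctuate: 'true',
    utterances: 'true',
  });
  return `https://api.deepgram.com/v1/listen?${params.toString()}`;
}

// Headers for an MP3 upload; Deepgram expects "Token <key>" authorization.
function buildHeaders(apiKey: string): Record<string, string> {
  return {
    Authorization: `Token ${apiKey}`,
    'Content-Type': 'audio/mpeg',
  };
}

console.log(buildListenUrl());
```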

Browser Usage

This package also supports browser environments using ffmpeg.wasm:

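import { transcribeMediaBrowser } from 'deepgram-media-transcriber/browser';

async function processMedia() {
  // Assumes the page contains <input type="file" id="fileInput">
  const fileInput = document.getElementById('fileInput') as HTMLInputElement | null;
  const file = fileInput?.files?.[0];
  if (!file) {
    console.error('No file selected');
    return;
  }

  try {
    const { formattedText, srt, vtt } = await transcribeMediaBrowser(
      file,
      'your-deepgram-api-key'
    );

    console.log('Formatted Text:', formattedText);
    console.log('SRT Subtitles:', srt);
    console.log('VTT Subtitles:', vtt);
  } catch (error) {
    // Caught values are typed unknown in strict TypeScript, so narrow first
    console.error('Processing failed:', error instanceof Error ? error.message : error);
  }
}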

Browser Considerations

  • The browser version uses ffmpeg.wasm which requires loading WebAssembly modules
  • Cross-Origin Resource Sharing (CORS) must be properly configured when loading the WebAssembly modules
  • The browser version doesn't support the keepAudioFile option as files are processed in memory
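
One common cross-origin requirement, offered here as an assumption about the setup rather than something this package documents: multi-threaded builds of ffmpeg.wasm rely on `SharedArrayBuffer`, which browsers expose only on cross-origin isolated pages. The sketch below lists the two response headers that enable isolation and a runtime check; `isolationHeaders` and `isCrossOriginIsolated` are hypothetical helper names.

```typescript
// Response headers that put a page into a cross-origin isolated context,
// which browsers require before exposing SharedArrayBuffer.
function isolationHeaders(): Record<string, string> {
  return {
    'Cross-Origin-Opener-Policy': 'same-origin',
    'Cross-Origin-Embedder-Policy': 'require-corp',
  };
}

// True only when running in a cross-origin isolated browser context.
// Outside a browser the flag is absent, so this returns false.
function isCrossOriginIsolated(): boolean {
  const scope = globalThis as unknown as { crossOriginIsolated?: boolean };
  return scope.crossOriginIsolated === true;
}

console.log(isolationHeaders());
console.log('isolated:', isCrossOriginIsolated());
```

The headers must be sent by whatever server delivers the HTML page; the check can run before loading the WebAssembly modules to give a clearer error message.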

Audio Conversion

The package automatically:

  • Converts non-MP3 files to high-quality MP3
  • Preserves audio quality by encoding at a 44.1 kHz sample rate
  • Handles stereo-to-mono conversion when needed
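
The conversion steps above map onto a handful of standard FFmpeg options. A minimal sketch of an argument list that would perform them is shown below; the exact flags the package passes are not documented here, so `buildFfmpegArgs` and its quality setting are assumptions.

```typescript
// Build FFmpeg arguments for converting any input to MP3.
// Flag choices are illustrative, not necessarily those the package uses.
function buildFfmpegArgs(input: string, output: string, mono = false): string[] {
  const args = [
    '-y',                     // overwrite the output file if it exists
    '-i', input,              // input media (audio or video)
    '-vn',                    // drop any video stream
    '-ar', '44100',           // 44.1 kHz sample rate
    '-codec:a', 'libmp3lame', // MP3 encoder
    '-q:a', '2',              // high-quality VBR setting
  ];
  if (mono) args.push('-ac', '1'); // downmix to a single channel
  args.push(output);
  return args;
}

console.log(buildFfmpegArgs('interview.mp4', 'interview.mp3', true).join(' '));
```

In Node.js, an array like this can be handed to `child_process.spawn` together with the binary path exported by `ffmpeg-static`.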

Error Handling

The package throws specific errors for:

  • Invalid file paths
  • Deepgram API errors
  • FFmpeg conversion failures
  • Invalid audio formats
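
Since the error classes themselves are not listed, callers can only rely on the thrown value being catchable. In strict TypeScript a caught value is typed `unknown`, so it needs narrowing before `.message` is read. The sketch below shows one way to do that; `toErrorMessage` and `attempt` are hypothetical helpers, not exports of the package.

```typescript
// Convert any thrown value into a readable message.
function toErrorMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  if (typeof error === 'string') return error;
  return 'Unknown error';
}

type Attempt<T> = { ok: true; value: T } | { ok: false; message: string };

// Run an async task and return either its value or an error message,
// so callers can branch without a try/catch at every call site.
async function attempt<T>(task: () => Promise<T>): Promise<Attempt<T>> {
  try {
    return { ok: true, value: await task() };
  } catch (error) {
    return { ok: false, message: toErrorMessage(error) };
  }
}

console.log(toErrorMessage(new Error('FFmpeg conversion failed')));
```

A call such as `attempt(() => transcribeMedia(path, key))` would then yield a result object whose `ok` flag tells the two cases apart.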

Example Usage

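import { writeFileSync } from 'fs';
import { transcribeMedia } from 'deepgram-media-transcriber';

async function processMedia() {
  try {
    // Read the API key from the environment instead of hard-coding it
    const apiKey = process.env.DEEPGRAM_KEY;
    if (!apiKey) {
      throw new Error('DEEPGRAM_KEY environment variable is not set');
    }

    const { formattedText, srt, vtt } = await transcribeMedia(
      'interview.mp4',
      apiKey,
      true // keepAudioFile: keep the converted audio file on disk
    );

    // Save each output format alongside the source media
    writeFileSync('interview.txt', formattedText);
    writeFileSync('interview.srt', srt);
    writeFileSync('interview.vtt', vtt);

    console.log('Processing complete!');
  } catch (error) {
    // Caught values are typed unknown in strict TypeScript, so narrow first
    console.error('Processing failed:', error instanceof Error ? error.message : error);
  }
}

processMedia();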

Development

Build

npm run build

Contribution

  1. Clone the repository
  2. Install dependencies: npm install
  3. Implement features/fixes
  4. Write tests (coming soon)
  5. Submit PR

License

MIT © Ernesto Voltaggio

Note: This package requires FFmpeg for audio conversion. The ffmpeg-static dependency is included automatically.
