1.0.8 • Published 8 months ago

Deepgram Media Transcriber

A robust TypeScript package for transcribing audio/video files using Deepgram's API, with speaker diarization and subtitle generation (SRT/VTT) capabilities.

Features

  • 🎙️ Audio/video transcription with speaker identification
  • 📄 Multiple output formats: SRT, VTT, Plain Text
  • ⏱️ Automatic splitting of long utterances (>15s)
  • 🔊 Supports various audio formats (automatic conversion to MP3)
  • 🎯 Accurate word-level timing information
  • 🗣️ Speaker confidence scores (when available)
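
The splitting logic itself is not documented here. As a minimal sketch of how an utterance longer than 15 seconds could be split at word boundaries, assume each word carries start and end times in seconds. The names `TimedWord`, `splitLongUtterance`, and `maxSeconds` are hypothetical and are not part of the package's API.

```typescript
// Hypothetical shape: one transcribed word with timings in seconds.
interface TimedWord {
  word: string;
  start: number;
  end: number;
}

// Greedily group words into chunks. A new chunk starts whenever adding
// the next word would stretch the current chunk past maxSeconds.
function splitLongUtterance(words: TimedWord[], maxSeconds = 15): TimedWord[][] {
  const chunks: TimedWord[][] = [];
  let current: TimedWord[] = [];
  let chunkStart = 0;

  for (const w of words) {
    if (current.length > 0 && w.end - chunkStart > maxSeconds) {
      chunks.push(current);
      current = [];
    }
    if (current.length === 0) {
      chunkStart = w.start; // first word of a fresh chunk sets its start time
    }
    current.push(w);
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}
```

Each resulting chunk can then become its own subtitle cue, using the first word's `start` and the last word's `end` as the cue timings.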

Installation

npm install deepgram-media-transcriber

Quick Start

```typescript
import { transcribeMedia } from 'deepgram-media-transcriber';

async function main() {
  const results = await transcribeMedia(
    '/path/to/your/media/file.mp4',
    'your-deepgram-api-key'
  );

  console.log('Formatted Text:', results.formattedText);
  console.log('SRT Subtitles:', results.srt);
  console.log('VTT Subtitles:', results.vtt);
}

main().catch(console.error);
```

API Documentation

transcribeMedia(filePath: string, deepgramApiKey: string, keepAudioFile?: boolean)

Parameters

  • filePath: Path to media file (supports MP3, WAV, MP4, MOV, etc.)
  • deepgramApiKey: Your Deepgram API key
  • keepAudioFile: Keep converted audio file (default: false)

Returns

{
  transcript: any;         // Raw Deepgram response
  formattedText: string;   // Speaker-formatted plain text
  srt: string;             // SRT formatted subtitles
  vtt: string;             // VTT formatted subtitles
  audioFilePath?: string   // Path to converted audio (if kept)
}

Output Formats

SRT Format Example

1
00:00:00,000 --> 00:00:04,120
Speaker 0: Let's start with the main agenda items...

2
00:00:04,240 --> 00:00:07,800
Speaker 1: I agree, we should prioritize...

VTT Format Example

WEBVTT

1
00:00:00.000 --> 00:00:04.120
Speaker 0: Let's start with the main agenda items...

2
00:00:04.240 --> 00:00:07.800
Speaker 1: I agree, we should prioritize...
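
The two subtitle examples above differ mainly in the millisecond separator: SRT uses a comma, WebVTT uses a period. A small sketch of a timestamp formatter that covers both follows; `formatTimestamp` is a hypothetical helper for illustration, not a function exported by this package.

```typescript
// Format a time in seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT).
function formatTimestamp(seconds: number, format: 'srt' | 'vtt'): string {
  // Work in whole milliseconds to avoid floating-point drift.
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);

  const pad = (n: number, width: number): string => String(n).padStart(width, '0');
  const separator = format === 'srt' ? ',' : '.';

  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${separator}${pad(ms, 3)}`;
}

console.log(formatTimestamp(4.12, 'srt')); // 00:00:04,120
console.log(formatTimestamp(7.8, 'vtt'));  // 00:00:07.800
```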

Text Format Example

Speaker 0: Let's start with the main agenda items...

Speaker 1: I agree, we should prioritize...
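
Text in this shape can be assembled from diarized utterances. The sketch below assumes each utterance has a numeric `speaker` and a `transcript` string, and it merges consecutive utterances from the same speaker into one paragraph. Both the `SpeakerUtterance` type and the merging behavior are assumptions for illustration; the package may format its output differently.

```typescript
// Hypothetical shape: one diarized utterance.
interface SpeakerUtterance {
  speaker: number;
  transcript: string;
}

// Merge consecutive utterances from the same speaker into one turn,
// then emit one "Speaker N: ..." paragraph per turn.
function formatSpeakerText(utterances: SpeakerUtterance[]): string {
  const turns: SpeakerUtterance[] = [];
  for (const u of utterances) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === u.speaker) {
      last.transcript += ' ' + u.transcript;
    } else {
      turns.push({ ...u }); // copy so the caller's objects are not mutated
    }
  }
  return turns.map((t) => `Speaker ${t.speaker}: ${t.transcript}`).join('\n\n');
}
```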

Configuration

Deepgram API Setup

  1. Get an API key from the Deepgram Console
  2. Enable the following features in your Deepgram project:
    • Speaker Diarization
    • Punctuation
    • Utterance Detection
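
For reference, Deepgram's pre-recorded HTTP API exposes these three features as the query parameters `diarize`, `punctuate`, and `utterances` on the `/v1/listen` endpoint. Whether this package sets them on your behalf is not stated here, so treat the following as a sketch of what such a request URL looks like. `buildListenUrl` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: build a Deepgram pre-recorded transcription URL
// with the three features listed above expressed as query parameters.
function buildListenUrl(options: {
  diarize: boolean;
  punctuate: boolean;
  utterances: boolean;
}): string {
  const query = [
    `diarize=${options.diarize}`,
    `punctuate=${options.punctuate}`,
    `utterances=${options.utterances}`,
  ].join('&');
  return `https://api.deepgram.com/v1/listen?${query}`;
}

console.log(buildListenUrl({ diarize: true, punctuate: true, utterances: true }));
```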

Browser Usage

This package also supports browser environments using ffmpeg.wasm:

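import { transcribeMediaBrowser } from 'deepgram-media-transcriber/browser';

async function processMedia() {
  // Assumes an <input type="file" id="fileInput"> element exists on the page
  const fileInput = document.getElementById('fileInput') as HTMLInputElement | null;
  const file = fileInput?.files?.[0];
  if (!file) {
    console.error('No file selected');
    return;
  }

  try {
    const { formattedText, srt, vtt } = await transcribeMediaBrowser(
      file,
      'your-deepgram-api-key'
    );

    console.log('Formatted Text:', formattedText);
    console.log('SRT Subtitles:', srt);
    console.log('VTT Subtitles:', vtt);
  } catch (error) {
    // A caught value is not guaranteed to be an Error, so narrow it first
    console.error('Processing failed:', error instanceof Error ? error.message : error);
  }
}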

Browser Considerations

  • The browser version uses ffmpeg.wasm which requires loading WebAssembly modules
  • Cross-Origin Resource Sharing (CORS) must be properly configured when loading the WebAssembly modules
  • The browser version doesn't support the keepAudioFile option as files are processed in memory

Audio Conversion

The package automatically:

  • Converts non-MP3 files to high-quality MP3
  • Maintains original audio quality (44.1kHz sample rate)
  • Handles stereo-to-mono conversion when needed
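
The exact FFmpeg invocation used by the package is not shown here. As a rough sketch of a conversion matching the list above, the helper below builds an argument list from standard FFmpeg options (`-vn`, `-ar`, `-ac`, `-codec:a libmp3lame`, `-q:a`). The function name `buildMp3ConversionArgs` and the chosen quality setting are assumptions, not the package's actual configuration.

```typescript
// Hypothetical helper: FFmpeg arguments for converting any media file
// to MP3 at 44.1 kHz, optionally downmixing to mono.
function buildMp3ConversionArgs(
  inputPath: string,
  outputPath: string,
  mono = false
): string[] {
  const args = [
    '-y',                     // overwrite the output file if it exists
    '-i', inputPath,          // input media file
    '-vn',                    // drop any video stream
    '-ar', '44100',           // 44.1 kHz sample rate
    '-codec:a', 'libmp3lame', // encode audio as MP3
    '-q:a', '2',              // high-quality VBR setting (assumed value)
  ];
  if (mono) {
    args.push('-ac', '1');    // downmix to a single channel
  }
  args.push(outputPath);
  return args;
}
```

In Node.js, an argument list like this could be passed to `child_process.spawn` together with the binary path exported by `ffmpeg-static`.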

Error Handling

The package throws specific errors for:

  • Invalid file paths
  • Deepgram API errors
  • FFmpeg conversion failures
  • Invalid audio formats
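
The concrete error classes are not listed here, so callers cannot rely on a specific type. In strict TypeScript a caught value is typed `unknown`, which means it needs narrowing before `.message` can be read. A minimal sketch, where `describeError` is a hypothetical helper:

```typescript
// Turn any caught value into a printable message without assuming its type.
function describeError(error: unknown): string {
  if (error instanceof Error) {
    return error.message;
  }
  if (typeof error === 'string') {
    return error;
  }
  return 'Unknown error';
}
```

Inside a `catch (error)` block, `console.error('Processing failed:', describeError(error))` then works for any of the failure cases above.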

Example Usage

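import { writeFileSync } from 'fs';
import { transcribeMedia } from 'deepgram-media-transcriber';

async function processMedia() {
  try {
    // Fail early with a clear message if the API key is missing
    const apiKey = process.env.DEEPGRAM_KEY;
    if (!apiKey) {
      throw new Error('DEEPGRAM_KEY environment variable is not set');
    }

    const { formattedText, srt, vtt } = await transcribeMedia(
      'interview.mp4',
      apiKey,
      true // keepAudioFile: keep the converted audio on disk
    );

    writeFileSync('interview.txt', formattedText);
    writeFileSync('interview.srt', srt);
    writeFileSync('interview.vtt', vtt);

    console.log('Processing complete!');
  } catch (error) {
    // A caught value is not guaranteed to be an Error, so narrow it first
    console.error('Processing failed:', error instanceof Error ? error.message : error);
  }
}

processMedia();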

Development

Build

npm run build

Contribution

  1. Clone repository
  2. Install dependencies: npm install
  3. Implement features/fixes
  4. Write tests (coming soon)
  5. Submit PR

License

MIT © Ernesto Voltaggio

Note: This package requires FFmpeg for audio conversion. The ffmpeg-static dependency is included automatically.
