1.0.3 • Published 9 months ago
pptx-content-extractor v1.0.3
PPTX Content Extractor
PPTX Content Extractor is a Node.js library for extracting slides, notes, and media content (e.g., images) from .pptx files. This tool leverages JSZip for unpacking .pptx archives and xml2js for parsing XML-based content.
Features
- Extract text content from PowerPoint slides (
.pptx). - Retrieve media files (e.g., images) embedded in the presentation.
- Extract speaker notes for each slide.
- Modular structure for extracting specific content types (slides, media, or notes).
Installation
Install the library via npm:
npm install --save pptx-content-extractorUsage
Full Extraction
Extract all slides, media, and notes from a .pptx file:
import { extractPptx } from 'pptx-content-extractor';
(async () => {
const result = await extractPptx('/path/to/presentation.pptx');
console.log('Slides:', result.slides);
console.log('Media:', result.media);
console.log('Notes:', result.notes);
})();Extract specific content
Slides
import { extractPptxSlides } from 'pptx-content-extractor';
(async () => {
const slides = await extractPptxSlides('/path/to/presentation.pptx');
console.log('Slides:', slides);
})();Media
import { extractPptxMedia } from 'pptx-content-extractor';
(async () => {
const media = await extractPptxMedia('/path/to/presentation.pptx');
console.log('Media:', media);
})();Notes
import { extractPptxNotes } from 'pptx-content-extractor';
(async () => {
const notes = await extractPptxNotes('/path/to/presentation.pptx');
console.log('Notes:', notes);
})();API
extractPptx(filePath: string): Promise<ParsedPowerPoint>
Extracts slides, media, and notes from a .pptx file.
filePath: Path to the.pptxfile.- Returns: A
Promise<ParsedPowerPoint>containing:slides: An array of parsed slides.media: An array of media content.notes: An array of parsed notes.
extractPptxSlides(filePath: string): Promise<ParsedSlide[]>
Extracts only the slides.
filePath: Path to the.pptxfile.- Returns: A
Promise<ParsedSlide[]>containing parsed slides.
extractPptxMedia(filePath: string): Promise<ParsedMedia[]>
Extracts only the media content.
filePath: Path to the.pptxfile.- Returns: A
Promise<ParsedMedia[]>containing media content.
extractPptxNotes(filePath: string): Promise<ParsedNote[]>
Extracts only the notes.
filePath: Path to the.pptxfile.- Returns: A
Promise<ParsedNote[]>containing parsed notes.
Types
ParsedContent
Base interface for parsed content.
export interface ParsedContent {
name: string;
content: unknown;
}ParsedPptx
export interface ParsedPowerPoint {
slides: ParsedSlide[];
media: ParsedMedia[];
notes: ParsedNote[];
}ParsedSlide
export interface ParsedSlide extends ParsedContent {
content: { id: string; type: string; text: string[] }[];
mediaNames: string[] // names of media file e.g. ['image23.jpeg']
}ParsedMedia
export interface ParsedMedia extends ParsedContent {
content: string; // Base64-encoded media content
}ParsedNote
export interface ParsedNote extends ParsedContent {
content: string;
}