1.0.3 • Published 6 months ago

pptx-content-extractor v1.0.3

License
MIT

PPTX Content Extractor

PPTX Content Extractor is a Node.js library that extracts slide text, speaker notes, and embedded media (e.g., images) from .pptx files. It uses JSZip to unpack the .pptx archive and xml2js to parse the XML content inside it.

Features

  • Extract text content from PowerPoint slides (.pptx).
  • Retrieve media files (e.g., images) embedded in the presentation.
  • Extract speaker notes for each slide.
  • Modular structure for extracting specific content types (slides, media, or notes).

Installation

Install the library via npm:

npm install --save pptx-content-extractor

Usage

Full Extraction

Extract all slides, media, and notes from a .pptx file:

import { extractPptx } from 'pptx-content-extractor';

(async () => {
  const result = await extractPptx('/path/to/presentation.pptx');
  console.log('Slides:', result.slides);
  console.log('Media:', result.media);
  console.log('Notes:', result.notes);
})();

Extract Specific Content

Slides

import { extractPptxSlides } from 'pptx-content-extractor';

(async () => {
  const slides = await extractPptxSlides('/path/to/presentation.pptx');
  console.log('Slides:', slides);
})();

Media

import { extractPptxMedia } from 'pptx-content-extractor';

(async () => {
  const media = await extractPptxMedia('/path/to/presentation.pptx');
  console.log('Media:', media);
})();

Notes

import { extractPptxNotes } from 'pptx-content-extractor';

(async () => {
  const notes = await extractPptxNotes('/path/to/presentation.pptx');
  console.log('Notes:', notes);
})();

API

extractPptx(filePath: string): Promise<ParsedPowerPoint>

Extracts slides, media, and notes from a .pptx file.

  • filePath: Path to the .pptx file.
  • Returns: A Promise<ParsedPowerPoint> containing:
    • slides: An array of parsed slides.
    • media: An array of media content.
    • notes: An array of parsed notes.

extractPptxSlides(filePath: string): Promise<ParsedSlide[]>

Extracts only the slides.

  • filePath: Path to the .pptx file.
  • Returns: A Promise<ParsedSlide[]> containing parsed slides.

extractPptxMedia(filePath: string): Promise<ParsedMedia[]>

Extracts only the media content.

  • filePath: Path to the .pptx file.
  • Returns: A Promise<ParsedMedia[]> containing media content.

extractPptxNotes(filePath: string): Promise<ParsedNote[]>

Extracts only the notes.

  • filePath: Path to the .pptx file.
  • Returns: A Promise<ParsedNote[]> containing parsed notes.

Types

ParsedContent

Base interface for parsed content.

export interface ParsedContent {
  name: string;
  content: unknown;
}

ParsedPowerPoint

export interface ParsedPowerPoint {
  slides: ParsedSlide[];
  media: ParsedMedia[];
  notes: ParsedNote[];
}
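
Each slide lists its media by file name in mediaNames, so the media used by a slide can be looked up in the media array. The sketch below assumes that ParsedMedia.name holds the same file name that appears in mediaNames (for example 'image23.jpeg'); the README does not state this explicitly. The interfaces are trimmed local copies of the types in this section, and mediaForSlide is a hypothetical helper, not part of the library.

```typescript
// Trimmed local copies of the README's types, so the sketch is self-contained.
interface ParsedSlide {
  name: string;
  content: { id: string; type: string; text: string[] }[];
  mediaNames: string[];
}

interface ParsedMedia {
  name: string;
  content: string; // Base64-encoded media content
}

// Hypothetical helper (not part of the library).
// Assumption: ParsedMedia.name matches the entries in ParsedSlide.mediaNames.
function mediaForSlide(slide: ParsedSlide, media: ParsedMedia[]): ParsedMedia[] {
  // Index the media once so each lookup by name is a map access.
  const byName = new Map(media.map((m) => [m.name, m] as const));
  return slide.mediaNames
    .map((name) => byName.get(name))
    // Drop names that have no matching media entry.
    .filter((m): m is ParsedMedia => m !== undefined);
}
```

With the result of extractPptx, this could be called as mediaForSlide(result.slides[0], result.media). Names with no matching media entry are skipped rather than raising an error.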

ParsedSlide

export interface ParsedSlide extends ParsedContent {
  content: { id: string; type: string; text: string[] }[];
  mediaNames: string[]; // Names of the media files used by the slide, e.g. ['image23.jpeg']
}
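
Since a slide's content is an array of elements that each carry an array of text strings, a common follow-up step is to flatten it into plain text. A minimal sketch, with slideToPlainText as a hypothetical helper (not part of the library) and a trimmed local copy of the ParsedSlide type:

```typescript
// Trimmed local copy of the README's ParsedSlide type.
interface ParsedSlide {
  name: string;
  content: { id: string; type: string; text: string[] }[];
  mediaNames: string[];
}

// Hypothetical helper: joins all non-empty text strings of a slide,
// one per line, in the order they appear in slide.content.
function slideToPlainText(slide: ParsedSlide): string {
  const lines: string[] = [];
  for (const element of slide.content) {
    for (const line of element.text) {
      const trimmed = line.trim();
      if (trimmed.length > 0) {
        lines.push(trimmed);
      }
    }
  }
  return lines.join("\n");
}
```

The values stored in type are not documented here, so the sketch does not filter on them.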

ParsedMedia

export interface ParsedMedia extends ParsedContent {
  content: string; // Base64-encoded media content
}
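
Because content is Base64-encoded, it has to be decoded before it can be written back out as a binary file. The sketch below uses Node's built-in Buffer and fs/promises modules; decodeMedia and saveMedia are hypothetical helpers, not part of the library, and the ParsedMedia interface is a trimmed local copy.

```typescript
import { Buffer } from "node:buffer";
import { writeFile } from "node:fs/promises";
import { basename, join } from "node:path";

// Trimmed local copy of the README's ParsedMedia type.
interface ParsedMedia {
  name: string;
  content: string; // Base64-encoded media content
}

// Hypothetical helper: decodes the Base64 payload into raw bytes.
function decodeMedia(media: ParsedMedia): Buffer {
  return Buffer.from(media.content, "base64");
}

// Hypothetical helper: writes one media entry into outputDir and returns the path.
// basename() guards against a name that happens to contain directory parts.
async function saveMedia(media: ParsedMedia, outputDir: string): Promise<string> {
  const target = join(outputDir, basename(media.name));
  await writeFile(target, decodeMedia(media));
  return target;
}
```

saveMedia expects outputDir to exist already; it does not create the directory.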

ParsedNote

export interface ParsedNote extends ParsedContent {
  content: string;
}
