0.3.0 • Published 4 months ago
@transformgovsg/pdf2md v0.3.0
pdf2md
A CLI tool that converts PDF files into Markdown using Azure Document Intelligence for text extraction and OpenAI for Markdown formatting.
✨ Features
- 📜 Converts PDFs into Markdown format
- 🧠 Uses Azure Document Intelligence for text extraction
- 🤖 Enhances Markdown formatting with OpenAI LLM
- 🛠️ Simple CLI usage
📋 Prerequisites
Before using pdf2md, ensure you have the following:
- ✅ Node.js installed
- ✅ Azure Document Intelligence API credentials
- ✅ OpenAI API credentials
- ✅ Ensure you have the following environment variables set in your system or in a
.env
file in the current working directory:
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=<your_azure_endpoint>
AZURE_DOCUMENT_INTELLIGENCE_API_KEY=<your_azure_api_key>
OPENAI_API_BASE_URL=<your_openai_base_url>
OPENAI_API_KEY=<your_openai_api_key>
OPENAI_CHAT_MODEL=<your_openai_chat_model> # Default: o3-mini
⚡ Quick Start
To convert a PDF file to Markdown without installing locally, run:
pnpm dlx @transformgovsg/pdf2md <path-to-pdf>
Alternatively, using yarn or npm:
yarn dlx @transformgovsg/pdf2md <path-to-pdf>
npx @transformgovsg/pdf2md <path-to-pdf>
Example:
pnpm dlx @transformgovsg/pdf2md ./path/to/document.pdf
📜 License
This project is licensed under the AGPL-3.0-only License.