0.1.3 • Published 9 months ago
page-piranha v0.1.3
Page Piranha 🦈
Use LLMs to convert PDFs to text, markdown, or JSON. By default, Page Piranha uses the Gemini 2.0 Flash model.
Features
- Convert PDFs to plain text, markdown, or JSON
- Support for local files and remote URLs
- Pipe-friendly CLI interface
- Progress indicators and colorful output
- Configurable output directory
- Optional custom prompts for fine-tuned conversions
Installation
npm install page-piranha
Environment Setup
Page Piranha requires Google Cloud Platform credentials to use Vertex AI. Create a .env file with the following variables:
GCP_PROJECT=your_gcp_project
GCP_LOCATION=your_gcp_location
GOOGLE_APPLICATION_CREDENTIALS=path_to_your_gcp_credentials_file
CLI Usage
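Before the first run, it can help to confirm that the three variables from the environment setup are present, so a missing value is reported by name before any request is made. The sketch below is not part of Page Piranha: `missingEnvVars` is a hypothetical helper, and it assumes all three variables are required.

```typescript
// Hypothetical helper, not part of page-piranha.
// The variable names come from the .env example in Environment Setup.
const REQUIRED_VARS = [
  "GCP_PROJECT",
  "GCP_LOCATION",
  "GOOGLE_APPLICATION_CREDENTIALS",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank,
// in the order they are listed in REQUIRED_VARS.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node script this could be called as `missingEnvVars(process.env)` once the `.env` file has been loaded; an empty array means all three values are set.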
Basic usage:
.bin/page-piranha -f input.pdf -m text -o output
Options:
- -f, --file <file>: The PDF file to convert (required)
- -m, --mode <mode>: Conversion mode: text, markdown, or json (default: text)
- -o, --outDir <directory>: Output directory (default: out)
- -t, --tee: Output to both file and stdout
- -v, --verbose: Enable verbose logging
- -p, --prompt <prompt>: Additional hints for conversion
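For scripts that shell out to the CLI, the flags can be assembled from a typed options object. This is a minimal sketch under stated assumptions: `CliOptions` and `buildArgs` are hypothetical names that do not exist in the package, and only the long-form flags documented in the options list are used.

```typescript
// Hypothetical wrapper types, not part of page-piranha.
// Each field mirrors one flag from the options list.
type Mode = "text" | "markdown" | "json";

interface CliOptions {
  file: string; // -f, --file (required)
  mode?: Mode; // -m, --mode
  outDir?: string; // -o, --outDir
  tee?: boolean; // -t, --tee
  verbose?: boolean; // -v, --verbose
  prompt?: string; // -p, --prompt
}

// Builds an argument array rather than a single command string,
// so the prompt text needs no shell quoting.
function buildArgs(options: CliOptions): string[] {
  const args = ["--file", options.file];
  if (options.mode !== undefined) args.push("--mode", options.mode);
  if (options.outDir !== undefined) args.push("--outDir", options.outDir);
  if (options.tee) args.push("--tee");
  if (options.verbose) args.push("--verbose");
  if (options.prompt !== undefined) args.push("--prompt", options.prompt);
  return args;
}
```

The resulting array could be passed to Node's `child_process.spawn` together with the path to the `page-piranha` binary.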
Examples:
Convert to text
.bin/page-piranha -f document.pdf -m text
Convert to markdown with custom output directory
.bin/page-piranha -f document.pdf -m markdown -o converted
Convert to JSON and pipe to jq
.bin/page-piranha -f assets/demo.pdf -m json -p "Make sure to use camel case. This is an invoice. Feel free to nest fields" -t | jq
Programmatic Usage
Page Piranha can be used programmatically in your TypeScript/JavaScript projects:
import { PagePiranha } from 'page-piranha';
import { JorEl } from 'jorel';
// Initialize
const jorEl = new JorEl({ vertexAi: true });
const piranha = new PagePiranha(jorEl);
// Convert to text
const text = await piranha.toText('document.pdf');
// Convert to markdown with additional prompt
const markdown = await piranha.toMarkdown('document.pdf', 'Focus on headers and lists');
// Convert to JSON
const json = await piranha.toJson('document.pdf');
API Reference
PagePiranha Class
Constructor
constructor(jorEl: JorEl, options?: PagePiranhaOptions)
Methods
toText(fileOrFiles: string | Buffer, additionalPrompt?: string): Promise<string>
toMarkdown(fileOrFiles: string | Buffer, additionalPrompt?: string): Promise<string>
toJson(fileOrFiles: string | Buffer, additionalPrompt?: string): Promise<object>
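Because `toJson` resolves to a plain `object`, calling code usually has to check the shape before reading fields. The sketch below shows one way to do that with a type guard. The `Invoice` interface and `isInvoice` function are hypothetical and not part of the package; the actual fields depend on the PDF and on any additional prompt.

```typescript
// Hypothetical invoice shape, used only for illustration.
interface Invoice {
  invoiceNumber: string;
  total: number;
}

// Type guard: returns true only when both fields exist with the expected types.
function isInvoice(value: unknown): value is Invoice {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["invoiceNumber"] === "string" &&
    typeof record["total"] === "number" &&
    Number.isFinite(record["total"])
  );
}
```

After `const json = await piranha.toJson('document.pdf')`, a check such as `if (isInvoice(json)) { ... }` lets TypeScript treat `json.total` as a number inside the branch, while anything else falls through to error handling.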
License
MIT