Groq-ocr NPM | npm.io

Disclaimer
Installation
Usage
How it works
Models
Roadmap
Credit

Disclaimer

This project is still in development‼️

Multi-page PDF support is experimental and work in progress.

PDF support relies on the pdftopic library, which requires Node.js >= 12 and ImageMagick.

JSON mode might fail with a json_validate_failed error.
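Since JSON mode can fail, callers may want to fall back to markdown output. The following is a minimal sketch of such a fallback, not part of groq-ocr. It assumes the failure surfaces as a thrown Error whose message contains "json_validate_failed"; the actual error shape is not documented here. The names OcrFn, isJsonValidateError and ocrWithJsonFallback are hypothetical, and the OCR function is passed in as a parameter so the sketch runs without the package installed.

```typescript
// Hypothetical helper, not part of groq-ocr: retry in markdown mode when JSON mode fails.
type OcrOptions = { filePath: string; jsonMode?: boolean };
type OcrFn = (options: OcrOptions) => Promise<string>;

// Assumption: the failure surfaces as a thrown Error whose message
// contains the string "json_validate_failed".
function isJsonValidateError(err: unknown): boolean {
  return err instanceof Error && err.message.includes("json_validate_failed");
}

async function ocrWithJsonFallback(
  ocrFn: OcrFn,
  filePath: string
): Promise<{ output: string; format: "json" | "markdown" }> {
  try {
    // First attempt: ask for structured JSON.
    const output = await ocrFn({ filePath, jsonMode: true });
    return { output, format: "json" };
  } catch (err) {
    // Unrelated errors (bad API key, missing file, ...) propagate unchanged.
    if (!isJsonValidateError(err)) throw err;
    // Second attempt: plain markdown output.
    const output = await ocrFn({ filePath, jsonMode: false });
    return { output, format: "markdown" };
  }
}
```

If the package's ocr function resolves to a string (an assumption), it could be passed directly as ocrFn, for example ocrWithJsonFallback(ocr, "./receipt.png").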

Installation

Run npm i groq-ocr to use it as an npm package.

Run npm i -g groq-ocr to use it as a CLI.

Usage

Use as NPM package:

import { ocr, GroqVisionModel } from "groq-ocr";
const result = await ocr({
  filePath: "./filepath.jpg", // Allowed formats: jpg, jpeg, png, pdf.
  apiKey: process.env.GROQ_API_KEY, // Get your API key from https://console.groq.com/
  model: GroqVisionModel.LLAMA_32_90B, // available models: LLAMA_32_11B, LLAMA_32_90B. Default: LLAMA_32_11B
  jsonMode: false, // Default: false. Set to true to get JSON output.
  additionalInstructions: "Additional instructions to be included in the prompt.", // Use to give custom instructions to the model.
});

ocr options:

filePath (required): Path to image/PDF file or URL
  - Supported formats: .jpg, .jpeg, .png, .pdf
apiKey (optional): Groq API key
  - Defaults to GROQ_API_KEY environment variable
model (optional): Vision model to use
  - GroqVisionModel.LLAMA_32_11B (default) - Llama 3.2 11B Vision Preview
  - GroqVisionModel.LLAMA_32_90B - Llama 3.2 90B Vision Preview
jsonMode (optional): Return structured JSON instead of markdown
  - Defaults to false
additionalInstructions (optional): Additional instructions to be included in the prompt.
  - Defaults to "" - use to give custom instructions to the model.
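To make the defaults above concrete, here is a minimal sketch of how they could be applied. The OcrOptions interface and the resolveOcrOptions helper are hypothetical and only mirror the documented options; the package's real types and behavior may differ. In particular, throwing when no API key is found is an assumption.

```typescript
// Local copy of the enum values listed under "Models", so the sketch is self-contained.
enum GroqVisionModel {
  LLAMA_32_11B = "llama-3.2-11b-vision-preview",
  LLAMA_32_90B = "llama-3.2-90b-vision-preview",
}

// Hypothetical type mirroring the documented options.
interface OcrOptions {
  filePath: string;
  apiKey?: string;
  model?: GroqVisionModel;
  jsonMode?: boolean;
  additionalInstructions?: string;
}

type ResolvedOcrOptions = Required<OcrOptions>;

// Hypothetical helper: fills in the documented defaults.
// `env` is passed explicitly (e.g. process.env) to keep the function pure.
function resolveOcrOptions(
  options: OcrOptions,
  env: Record<string, string | undefined>
): ResolvedOcrOptions {
  const apiKey = options.apiKey ?? env["GROQ_API_KEY"];
  if (!apiKey) {
    // Assumption: a missing key is treated as an error.
    throw new Error("No API key: pass apiKey or set GROQ_API_KEY");
  }
  return {
    filePath: options.filePath,
    apiKey,
    model: options.model ?? GroqVisionModel.LLAMA_32_11B,
    jsonMode: options.jsonMode ?? false,
    additionalInstructions: options.additionalInstructions ?? "",
  };
}
```

Calling resolveOcrOptions({ filePath: "./scan.pdf" }, process.env) would yield the 11B model, markdown output and empty additional instructions, with the key taken from the environment.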

Use as CLI:

Either set your Groq API key as an environment variable:

export GROQ_API_KEY=your-api-key

Or provide it as a CLI option with the -k flag when running commands.

CLI Examples

# Basic usage
groq-ocr -f image.jpg

# Output as JSON
groq-ocr -f scan.pdf -j

# Save to file
groq-ocr -f receipt.png -o result.txt

# Use specific model and API key
groq-ocr -f document.jpg -m llama-3.2-90b-vision-preview -k your-api-key

CLI Options

-f, --file <path> (required): Path to input image/PDF file
-k, --api-key <key>: Groq API key (defaults to GROQ_API_KEY env var)
-m, --model <model>: Vision model to use:
  - llama-3.2-11b-vision-preview (default)
  - llama-3.2-90b-vision-preview
-j, --json: Output in JSON format instead of markdown
-o, --output <path>: Write result to file instead of console
-V, --version: Display version number
-h, --help: Display help information

How it works

This library and CLI use multimodal models with vision capabilities provided by Groq to run OCR on images and PDFs and return markdown or JSON.

PDFs are converted to images using pdftopic.
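As a rough illustration of the step above, the sketch below builds the kind of request body a vision OCR call might send. The message shape follows the OpenAI-compatible chat format that Groq accepts. The function name buildOcrRequest and the prompt wording are hypothetical and not taken from the groq-ocr source; the real prompt and internals may differ.

```typescript
// Hypothetical sketch of the request body for a vision OCR call.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
  response_format?: { type: "json_object" };
}

function buildOcrRequest(
  imageUrl: string, // remote URL, or a base64 data URL for a local image
  model: string,
  jsonMode: boolean,
  additionalInstructions = ""
): ChatRequest {
  const format = jsonMode ? "JSON" : "markdown";
  // Illustrative prompt only; custom instructions are appended at the end.
  const prompt =
    `Extract all text from this image and return it as ${format}. ${additionalInstructions}`.trim();
  const request: ChatRequest = {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
  // JSON mode asks the API for a JSON object instead of free-form text.
  if (jsonMode) request.response_format = { type: "json_object" };
  return request;
}
```

For a PDF, each page image produced by the conversion step would presumably go through a request like this one, with the results joined afterwards.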

Models

The plan is to support all models with vision capabilities provided by Groq (see the Groq vision models documentation).

Currently supported models:

enum GroqVisionModel {
  LLAMA_32_11B = "llama-3.2-11b-vision-preview",
  LLAMA_32_90B = "llama-3.2-90b-vision-preview",
}
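When a model ID arrives as a plain string (for example from the CLI's -m flag), it can be checked against the enum before use. The helper below is a hypothetical sketch, not part of the package; parseModel is an illustrative name, and the enum is repeated so the block is self-contained.

```typescript
enum GroqVisionModel {
  LLAMA_32_11B = "llama-3.2-11b-vision-preview",
  LLAMA_32_90B = "llama-3.2-90b-vision-preview",
}

// Hypothetical helper: validate a model ID string against the supported models.
function parseModel(id: string | undefined): GroqVisionModel {
  // No model given: use the documented default.
  if (id === undefined) return GroqVisionModel.LLAMA_32_11B;
  // String enums have no reverse mapping, so Object.values yields only the IDs.
  const known = Object.values(GroqVisionModel) as string[];
  if (!known.includes(id)) {
    throw new Error(`Unsupported model "${id}". Expected one of: ${known.join(", ")}`);
  }
  return id as GroqVisionModel;
}
```

Validating early gives a clear error message instead of a failed API call when the model ID is misspelled.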

Roadmap

Add support for local images OCR
Add support for remote images OCR
Add support for single page PDFs
Add support for JSON output in addition to markdown
Add CLI
Extend prompt with custom instructions
Add support for multi-page PDFs OCR (Available but experimental)

Credit

This project was heavily inspired by llama-ocr.

groq-ocr v1.0.6
