@fnet/ocr-text-coords v0.1.1
@fnet/ocr-text-coords
This project extracts text and the corresponding bounding box coordinates from images using the Tesseract.js library. It accepts file paths, URLs, or Base64-encoded strings as input and can recognize multiple languages.
How It Works
The project uses Tesseract.js to perform Optical Character Recognition (OCR) on a given image. You provide an image through a file path, URL, or Base64 string, along with a language specification. The project then processes the image, identifying text and marking each word's location with a bounding box. This information is returned as a structured object containing both the extracted text and coordinates.
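To make the returned structure concrete, here is a minimal sketch of what a result might look like and how its coordinates could be used. The field names `bbox`, `x0`, `y0`, `x1`, and `y1` are assumptions borrowed from the way Tesseract.js reports word boxes; this document does not specify the exact shape, so inspect the actual return value before relying on them. `sampleResult` and `boxSize` are hypothetical names used only for illustration.

```javascript
// Hypothetical result, shaped like the object described above.
// ASSUMPTION: each word carries a `bbox` with x0/y0 (top-left) and x1/y1 (bottom-right).
const sampleResult = {
  text: 'Hello world',
  words: [
    { text: 'Hello', bbox: { x0: 10, y0: 20, x1: 70, y1: 40 } },
    { text: 'world', bbox: { x0: 80, y0: 20, x1: 140, y1: 40 } },
  ],
};

// Convert corner coordinates into a position-plus-size rectangle,
// which is often easier to use when drawing overlays.
function boxSize({ x0, y0, x1, y1 }) {
  return { left: x0, top: y0, width: x1 - x0, height: y1 - y0 };
}

for (const word of sampleResult.words) {
  const { left, top, width, height } = boxSize(word.bbox);
  console.log(`${word.text}: left=${left} top=${top} width=${width} height=${height}`);
}
```

A position-plus-size form like this maps directly onto canvas or CSS rectangles, which is a common next step after OCR.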
Key Features
- Text Extraction: Automatically extracts text from images.
- Bounding Box Coordinates: Provides the coordinates for each detected word.
- Multiple Input Formats: Works with file paths, URLs, or Base64 image data.
- Language Support: Allows recognition in multiple languages by specifying language codes.
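The three input formats listed above can be told apart by inspecting the string. The following is a minimal sketch of one plausible approach, not the library's own logic; `classifyImageInput` is a hypothetical helper, and it assumes Base64 input arrives as a `data:image/...;base64,` URI, as in the Base64 example in the Developer Guide.

```javascript
// Hypothetical helper: guess which kind of input a string represents.
// This is NOT the library's implementation, only an illustration.
function classifyImageInput(imageInput) {
  if (typeof imageInput !== 'string' || imageInput.length === 0) {
    throw new TypeError('imageInput must be a non-empty string');
  }
  // http:// or https:// prefixes are treated as remote URLs.
  if (/^https?:\/\//i.test(imageInput)) return 'url';
  // ASSUMPTION: Base64 data is passed as a data URI with an image MIME type.
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(imageInput)) return 'base64';
  // Everything else is treated as a file path.
  return 'path';
}

console.log(classifyImageInput('https://example.com/image.jpg')); // url
console.log(classifyImageInput('data:image/png;base64,iVBORw0KGgo=')); // base64
console.log(classifyImageInput('./local-image.png')); // path
```

Note that a raw Base64 string without the `data:` prefix would fall through to `'path'` in this sketch; whether the library accepts such input is not stated here.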
Conclusion
This project offers a straightforward way to extract and locate text within images, applicable to a variety of formats and languages. It provides a simple interface to access Tesseract.js capabilities, returning structured results that can be easily integrated into further processing or analysis tasks.
Developer Guide for @fnet/ocr-text-coords
Overview
The @fnet/ocr-text-coords library helps developers extract text from images together with the bounding box coordinates of each detected word. It performs Optical Character Recognition (OCR) with Tesseract.js and accepts images provided as file paths, URLs, or Base64 strings. It is intended for applications that need to extract text from visual media.
Installation
You can install the @fnet/ocr-text-coords library using either npm or yarn:
npm install @fnet/ocr-text-coords
Or with yarn:
yarn add @fnet/ocr-text-coords
Usage
The library provides a simple public function that you can use to extract text from images:
Basic Usage
First, import the library into your project:
import extractTextWithCoords from '@fnet/ocr-text-coords';
You can then use the function to process an image:
(async () => {
  try {
    const result = await extractTextWithCoords({ imageInput: 'path/to/your/image.jpg', language: 'eng' });
    console.log(result.text); // Outputs full extracted text
    console.log(result.words); // Outputs array of words with bounding boxes
  } catch (error) {
    console.error('Error extracting text:', error.message);
  }
})();
Parameters
- imageInput: A string representing the image input. This can be a file path, a URL, or a Base64-encoded string.
- language (optional): A string specifying the language(s) for OCR. The default is "eng". You can specify multiple languages by joining their codes with a plus sign (e.g., "eng+tur").
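The plus-sign convention for multiple languages can be sketched with a small helper. `buildLanguageParam` is a hypothetical function, not part of the library; it only shows how a list of codes might be turned into the string form described above, falling back to the documented default of "eng".

```javascript
// Hypothetical helper: join language codes into the "eng+tur" form.
function buildLanguageParam(codes = ['eng']) {
  // Trim whitespace and drop empty entries.
  const cleaned = codes.map((code) => code.trim()).filter((code) => code.length > 0);
  // Remove duplicates while preserving the original order.
  const unique = [...new Set(cleaned)];
  // Fall back to the documented default when nothing usable remains.
  if (unique.length === 0) return 'eng';
  return unique.join('+');
}

console.log(buildLanguageParam(['eng', 'tur'])); // eng+tur
console.log(buildLanguageParam([' eng ', 'eng', 'spa'])); // eng+spa
console.log(buildLanguageParam([])); // eng
```

The resulting string could then be passed as the `language` parameter.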
Examples
Here are a few practical examples demonstrating typical use cases:
Extracting Text from a Local File
import extractTextWithCoords from '@fnet/ocr-text-coords';
// Process an image from a local file
(async () => {
  const result = await extractTextWithCoords({ imageInput: './local-image.png' });
  console.log(result);
})();
Extracting Text from a URL
import extractTextWithCoords from '@fnet/ocr-text-coords';
// Process an image from a URL
(async () => {
  const result = await extractTextWithCoords({ imageInput: 'https://example.com/image.jpg', language: 'eng+spa' });
  console.log(result);
})();
Extracting Text from a Base64 String
import extractTextWithCoords from '@fnet/ocr-text-coords';
// Process an image from a Base64 string
(async () => {
  const base64String = 'data:image/png;base64,iVBORw0KGgoAAAANS...';
  const result = await extractTextWithCoords({ imageInput: base64String });
  console.log(result);
})();
Acknowledgement
This library uses Tesseract.js for OCR processing. Tesseract.js is an open-source JavaScript OCR library, built on the Tesseract engine and supported by an active developer community.
Input Schema
$schema: https://json-schema.org/draft/2020-12/schema
type: object
properties:
  imageInput:
    type: string
    description: The path, URL, or Base64 string of the image to be processed.
  language:
    type: string
    description: The language code(s) to use for OCR (e.g., "eng", "tur", or "eng+tur").
    default: eng
required:
  - imageInput
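The schema above can be mirrored by a small hand-written check before calling the library. This is a minimal sketch under the assumption that callers want to validate arguments themselves; `normalizeInput` is a hypothetical helper, and whether the library performs the same validation internally is not stated in this document.

```javascript
// Hypothetical helper mirroring the schema: `imageInput` is a required string,
// `language` is an optional string that defaults to "eng".
function normalizeInput(input) {
  if (input === null || typeof input !== 'object') {
    throw new TypeError('input must be an object');
  }
  if (typeof input.imageInput !== 'string') {
    throw new TypeError('imageInput is required and must be a string');
  }
  // Apply the schema default when `language` is omitted.
  const language = input.language === undefined ? 'eng' : input.language;
  if (typeof language !== 'string') {
    throw new TypeError('language must be a string');
  }
  return { imageInput: input.imageInput, language };
}

console.log(normalizeInput({ imageInput: './local-image.png' }));
console.log(normalizeInput({ imageInput: './local-image.png', language: 'eng+tur' }));
```

For stricter checking, a JSON Schema validator could be used in place of this hand-rolled version.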