office-text-extractor
Yet another library to extract text from MS Office and PDF files
Line segmentation algorithm for GCP Vision OCR.
n8n nodes for Unstract services including LLMWhisperer and Unstract API
Fork of office-text-extractor with unreleased changes that include browser support
A simple OCR library with image preprocessing, URL/base64 support, and multi-language OCR.
A lightweight toolkit for extracting, searching, and processing PDF text efficiently.
React Native library to perform OCR on images
A Node.js wrapper for the Python EasyOCR library
Easily extract text from digital PDF files with coordinates and font sizes included, and optionally group text by lines or render scanned PDFs to canvas/PNG.
Easily extract text from digital PDF files with coordinates and font sizes included, and optionally group text by lines.
A Node.js library that extracts and structures text from HTML files for full-text search indexing.
A powerful web crawler that extracts content from web pages and converts them to clean Markdown format, with support for code blocks and GitHub Flavored Markdown
MCP server for JinaAI reader
MCP server for JinaAI search
MCP server for JinaAI grounding
MCP server for Svelte docs
MCP server for Vectorize.io.
A text-extraction package for docx, pdf, and pptx files
A powerful text parser library for extracting, processing, and manipulating text data in various formats.