Text-extraction | npm.io

Yet another library to extract text from MS Office and PDF files

text-extraction get-text parser ms-office ms-excel ms-word ms-powerpoint xlsx docx pptx

3.0.3 • Published 1 year ago

Line segmentation algorithm for GCP Vision OCR.

segmentation gcp google vision ocr google-vision text-extraction algorithm

1.0.0 • Published 5 years ago

Yet another library to extract text from MS Office and PDF files

text-extraction get-text parser ms-office ms-excel ms-word ms-powerpoint xlsx docx pptx

3.0.4 • Published 2 years ago

n8n nodes for Unstract services including LLMWhisperer and Unstract API

n8n-community-node-package unstract llmwhisperer n8n n8n-node document-processing text-extraction

0.1.1 • Published 6 months ago

Fork of office-text-extractor with unreleased changes that include browser support

text-extraction get-text parser ms-office ms-excel ms-word ms-powerpoint xlsx docx pptx

3.1.4 • Published 1 year ago

A simple OCR library with image preprocessing, URL/base64 support, and multi-language OCR.

ocr tesseract text-extraction image-to-text ocr-easy

1.0.6 • Published 7 months ago

A lightweight toolkit for extracting, searching, and processing PDF text efficiently.

pdf text-extraction pdf-toolkit pdf-processing

1.1.0 • Published 10 months ago

React Native library to perform OCR on images

react-native ocr optical-character-recognition text-recognition image-processing image-to-text scanning document-scanning mobile-ocr text-extraction

0.2.4 • Published 2 years ago

A Node.js wrapper for the Python EasyOCR library

ocr easyocr optical-character-recognition image-processing text-extraction document-analysis python-wrapper image-to-text

1.0.9 • Published 12 months ago

Easily extract text from digital PDF files with coordinates and font sizes included, and optionally group text by lines or render scanned PDFs to canvas/PNG.

pdf-reader text-extraction pdf-rag bbox pdf pdf-typescript bun pdf-digital pdf-scan pdf-canvas

4.3.1 • Published 7 months ago

Easily extract text from digital PDF files with coordinates and font sizes included, and optionally group text by lines.

pdf-reader text-extraction pdf-rag bbox pdf pdf-typescript bun pdf-digital pdfjs

1.0.0 • Published 7 months ago

A Node.js library that extracts and structures text from HTML files for full-text search indexing.

html parsing text-extraction full-text-search indexing anchor headings node.js cheerio filesystem

1.1.1 • Published 2 years ago

A powerful web crawler that extracts content from web pages and converts them to clean Markdown format, with support for code blocks and GitHub Flavored Markdown

web-crawler markdown html-to-markdown readability content-extraction playwright cli-tool web-scraping gfm turndown

1.0.11 • Published 10 months ago

MCP server for JinaAI reader

mcp model-context-protocol jinaai reader web-content documentation content-extraction text-extraction llm ai

0.0.4 • Published 8 months ago

MCP server for JinaAI search

mcp model-context-protocol jinaai search web-content documentation content-extraction text-extraction llm ai

0.0.2 • Published 8 months ago

MCP server for JinaAI grounding

mcp model-context-protocol jinaai grounding web-content documentation content-extraction text-extraction llm ai

0.0.2 • Published 8 months ago

MCP server for Svelte docs

mcp model-context-protocol jinaai reader web-content documentation content-extraction text-extraction llm ai

0.0.11 • Published 8 months ago

MCP server for Vectorize.io.

mcp vectorize retrieval metadata-extraction text-extraction

0.2.0 • Published 6 months ago

A text-extraction package for docx, pdf and pptx files

parser docx pdf pptx text-extraction document-parser

1.0.0 • Published 9 months ago

A powerful text parser library for extracting, processing, and manipulating text data in various formats.

advance-text-parser string-manipulation data-processing text-analysis text-extraction

1.0.2 • Published 9 months ago