0.1.25 • Published 1 year ago
@sk-global/text-extractor v0.1.25
Extracts text from HTML, PDF, image, and other files.
Currently supported types:
- HTML, using html-to-text
- PDF, using pdf-parse
- Image (PNG, JPEG, GIF, BMP, TIFF, ICO, SVG), using tesseract.js for OCR
- ... and more to come
Installation
npm install @sk-global/text-extractor
Usage
CommonJS
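const { fromUrl, fromBufferWithMimeType, fromBuffer } = require('@sk-global/text-extractor');

async function main() {
  // `buffer` is assumed to be a Buffer holding the file contents, loaded elsewhere.
  // fromUrl: extract text from a file at a URL
  const textFromUrl = await fromUrl('https://www.digital.go.jp/assets/contents/node/basic_page/field_ref_resources/d6cfdcdd-75e4-460c-9ec0-af4f952e03d5/20210906_meeting_promoting_01.pdf');
  // fromBufferWithMimeType: extract text from a buffer with an explicit MIME type
  const textFromImage = await fromBufferWithMimeType(buffer, 'image/png');
  // fromBuffer: extract text from a buffer without specifying a MIME type
  const textFromBuffer = await fromBuffer(buffer);
}

main().catch(console.error);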
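When the file lives on disk, fromBufferWithMimeType needs a MIME type alongside the buffer. The following is a minimal sketch of one way to derive it from the file extension. The names MIME_BY_EXTENSION, mimeTypeForPath and extractFromFile are hypothetical helpers, not part of the package, and the MIME strings are the standard ones for the listed formats; only 'image/png' is confirmed by the example above, so the others are an assumption about what the package accepts.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical lookup table: standard MIME types for the formats the README lists.
// Whether the package accepts each of these exact strings is an assumption.
const MIME_BY_EXTENSION = new Map([
  ['.html', 'text/html'],
  ['.htm', 'text/html'],
  ['.pdf', 'application/pdf'],
  ['.png', 'image/png'],
  ['.jpg', 'image/jpeg'],
  ['.jpeg', 'image/jpeg'],
  ['.gif', 'image/gif'],
  ['.bmp', 'image/bmp'],
  ['.tif', 'image/tiff'],
  ['.tiff', 'image/tiff'],
  ['.ico', 'image/vnd.microsoft.icon'],
  ['.svg', 'image/svg+xml'],
]);

// Returns the MIME type for a file path, or null if the extension is not in the table.
function mimeTypeForPath(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return MIME_BY_EXTENSION.get(ext) ?? null;
}

// Reads a file and hands its contents plus the guessed MIME type to `extract`.
// `extract` is injected so the helper does not depend on the package directly;
// in practice it would be fromBufferWithMimeType.
async function extractFromFile(filePath, extract) {
  const mimeType = mimeTypeForPath(filePath);
  if (mimeType === null) {
    throw new Error(`Unsupported file extension: ${filePath}`);
  }
  const buffer = await fs.promises.readFile(filePath);
  return extract(buffer, mimeType);
}
```

With the package installed, a call would look like extractFromFile('./report.pdf', fromBufferWithMimeType). Passing the extractor in as an argument keeps the helper easy to test with a stub.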
ES6
import { fromUrl, fromBufferWithMimeType, fromBuffer } from '@sk-global/text-extractor';
Roadmap
- Add support for more file types
- Add support for options passed to the underlying libraries