0.0.2 • Published 10 months ago

query-file-content v0.0.2

License: ISC

query-file-content

A module to query file content from PDF, DOC, DOCX, and ODT formats, built on the textract module.

Install

npm install query-file-content
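
This page does not document the module's own API, so the following is only a minimal sketch of the underlying idea, calling textract directly rather than query-file-content. `findMatches` and `queryFile` are hypothetical helper names introduced for illustration; `textract.fromFileWithPath` and the `preserveLineBreaks` option come from textract itself, which is assumed to be installed separately (`npm install textract`).

```javascript
// Hypothetical helper (not part of query-file-content): returns every line
// of `text` that contains `query`, compared case-insensitively.
function findMatches(text, query) {
  const needle = query.toLowerCase();
  return text
    .split(/\r?\n/)
    .map((line, index) => ({ line: index + 1, text: line.trim() }))
    .filter((entry) => entry.text.toLowerCase().includes(needle));
}

// Hypothetical helper: extracts the text of a file with textract, then
// searches it. textract is loaded lazily so findMatches works without it.
function queryFile(filePath, query, callback) {
  const textract = require('textract');
  // preserveLineBreaks keeps line structure so matches can be reported per line.
  textract.fromFileWithPath(filePath, { preserveLineBreaks: true }, (error, text) => {
    if (error) return callback(error);
    callback(null, findMatches(text, query));
  });
}

// Example call (the file name is a placeholder):
// queryFile('./report.pdf', 'invoice', (error, matches) => {
//   if (error) throw error;
//   console.log(matches);
// });
```

Keeping the search step separate from extraction means the matching logic can be tested on plain strings, without any of the external tools listed under Extraction Requirements.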

Extraction Requirements

Note: if any of the requirements below are missing, textract will still run and extract every file type it can handle. Missing a requirement does not prevent you from using textract; it only prevents extraction of the file types that depend on it.

  • PDF extraction requires pdftotext to be installed.
  • DOC extraction requires antiword to be installed, unless on OS X, in which case textutil (installed by default) is used.
  • RTF extraction requires unrtf to be installed, unless on OS X, in which case textutil (installed by default) is used.
  • PNG, JPG and GIF extraction requires tesseract to be available. Images need to be clear, high-DPI, and made up almost entirely of text for tesseract to extract the text accurately.
  • DXF extraction requires drawingtotext to be available.
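
To see which of the optional tools above are present on a machine, a small Node.js check along these lines may help. It is a sketch, not part of textract or query-file-content: `TOOLS`, `isOnPath`, and `checkTools` are names made up for this example, and the check only looks for an executable file with the expected name in the directories listed in `PATH`.

```javascript
const fs = require('fs');
const path = require('path');

// Tool names taken from the requirements list; the format labels are for display.
// On OS X, antiword and unrtf are not needed because textutil is used instead.
const TOOLS = {
  pdftotext: 'PDF',
  antiword: 'DOC',
  unrtf: 'RTF',
  tesseract: 'PNG, JPG, GIF',
  drawingtotext: 'DXF',
};

// Returns true if an executable file called `name` exists in any directory
// of `envPath` (defaults to the current PATH).
function isOnPath(name, envPath = process.env.PATH || '') {
  const exts = process.platform === 'win32' ? ['', '.exe', '.cmd', '.bat'] : [''];
  return envPath.split(path.delimiter).filter(Boolean).some((dir) =>
    exts.some((ext) => {
      const candidate = path.join(dir, name + ext);
      try {
        fs.accessSync(candidate, fs.constants.X_OK);
        return fs.statSync(candidate).isFile();
      } catch {
        return false;
      }
    })
  );
}

// Builds one report entry per tool.
function checkTools() {
  return Object.keys(TOOLS).map((name) => ({
    name,
    formats: TOOLS[name],
    available: isOnPath(name),
  }));
}

for (const tool of checkTools()) {
  const status = tool.available ? 'found' : 'missing';
  console.log(`${tool.name} (${tool.formats}): ${status}`);
}
```

A tool reported as missing only affects its own formats; extraction of the other file types is unaffected.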

Versions

  • 0.0.2: published 10 months ago
  • 0.0.1: published 10 months ago