0.0.2 • Published 2 years ago
query-file-content
A module for querying file content in PDF, DOC, DOCX, and ODT formats, built on the textract module.
Install
npm install query-file-content
Extraction Requirements
Note: if any of the requirements below are missing, textract will still run and extract every file type it is capable of handling. A missing requirement does not prevent you from using textract; it only prevents extraction of the file types that depend on it.
PDF extraction requires pdftotext be installed.
DOC extraction requires antiword be installed, unless on OSX, in which case textutil (installed by default) is used.
RTF extraction requires unrtf be installed, unless on OSX, in which case textutil (installed by default) is used.
PNG, JPG and GIF require tesseract to be available. Images need to be pretty clear, high DPI and made almost entirely of just text for tesseract to be able to accurately extract the text.
DXF extraction requires drawingtotext be available.
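
The requirements above amount to a lookup from file type to external tool, with an OSX exception for DOC and RTF. The sketch below shows one way to express that lookup. It is only an illustration: `requiredTool` and the two tables are hypothetical names and are not part of query-file-content or textract. It also assumes that "OSX" corresponds to Node's `process.platform === "darwin"`.

```javascript
// Hypothetical helper, NOT part of query-file-content or textract.
// Maps a file extension to the external tool named in the requirements list.
const REQUIRED_TOOL = new Map([
  ["pdf", "pdftotext"],
  ["doc", "antiword"],
  ["rtf", "unrtf"],
  ["png", "tesseract"],
  ["jpg", "tesseract"],
  ["gif", "tesseract"],
  ["dxf", "drawingtotext"],
]);

// Assumption: on OSX ("darwin" in Node), DOC and RTF use textutil instead.
const OSX_TEXTUTIL_TYPES = new Set(["doc", "rtf"]);

// Returns the tool name for a file, or null when the file has no extension
// or its type does not appear in the requirements list.
function requiredTool(fileName, platform = process.platform) {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  if (platform === "darwin" && OSX_TEXTUTIL_TYPES.has(ext)) return "textutil";
  return REQUIRED_TOOL.get(ext) ?? null;
}

console.log(requiredTool("report.pdf", "linux"));  // pdftotext
console.log(requiredTool("letter.doc", "darwin")); // textutil
```

A `null` result only means the type is not in the list above. It does not say whether that type can be extracted.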