0.2.0 • Published 6 years ago

pdf-search v0.2.0

pdf-searcher

Traverses a folder, parses the PDFs it encounters, and reports those whose text matches a search pattern. Use it from the command line, e.g.:

pdfsearch 'Registration certificate' --maxpages 4 -a 07/24 -b 08/01 -i
Searching C:\Users\bsamm\Google Drive\Scanned for files matching /Registration\s+certificate/gim with less than 4 pages, created (strictly) between "7/24/2018, 12:00:00 AM" and "8/1/2018, 12:00:00 AM"

> C:\Users\bsamm\Google Drive\Scanned\2018_07_25_07_47_00.pdf Matching Content REGISTRATION  CERTIFICATE
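The sample run above suggests three steps: the search phrase becomes a whitespace-tolerant regexp with the flags gim, the -a/-b dates resolve to local midnight, and files are kept only if they have fewer than --maxpages pages and were created strictly between the two dates. The following is a minimal sketch of that logic, not the package's actual code. Every name in it (phraseToRegExp, parseMonthDay, passesFilters, the file object's fields) is hypothetical, and both escaping regex metacharacters in the phrase and defaulting a missing year to the current one are assumptions.

```javascript
// Hypothetical helpers; none of these names come from pdf-search itself.

// Turn a search phrase into a regexp. Assumption: metacharacters are escaped,
// then any run of whitespace in the phrase matches any run in the text.
function phraseToRegExp(phrase) {
  const escaped = phrase.trim().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(escaped.replace(/\s+/g, '\\s+'), 'gim');
}

// Parse "MM/DD" into a Date at local midnight.
// Assumption: a missing year means the current year.
function parseMonthDay(text, year = new Date().getFullYear()) {
  const [month, day] = text.split('/').map(Number);
  return new Date(year, month - 1, day);
}

// Keep a file only if it has fewer than maxPages pages and was created
// strictly after `after` and strictly before `before`.
function passesFilters(file, { maxPages, after, before }) {
  if (maxPages !== undefined && file.pages >= maxPages) return false;
  if (after && file.created <= after) return false;
  if (before && file.created >= before) return false;
  return true;
}

// Made-up sample data mirroring the run shown above.
const re = phraseToRegExp('Registration certificate');
const file = {
  pages: 1,
  created: new Date(2018, 6, 25, 7, 47),
  text: 'REGISTRATION  CERTIFICATE',
};
const filters = {
  maxPages: 4,
  after: parseMonthDay('07/24', 2018),
  before: parseMonthDay('08/01', 2018),
};

// search() ignores the regexp's lastIndex, so the g flag causes no surprises.
if (passesFilters(file, filters) && file.text.search(re) !== -1) {
  console.log('Matching Content', file.text.match(re)[0]);
}
```

Running the filters before the regexp match is a deliberate choice in this sketch: it rules files out on the cheap checks first, so a real implementation could skip the expensive step of extracting text from PDFs that cannot match anyway.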

Install with: npm install -g pdf-search

This is a pretty rough implementation, thrown together on a Sunday afternoon after I got tired of digging through a folder of scanned PDFs when my scanner already OCRs the documents. Why not use some custom third-party search software? I want to get around to integrating with a node-opencv lib, because nowadays OpenCV has Tesseract built in. I should then be able to scan images and image-only PDFs and pull out text to match against the regexp as well. That'd be a neat script, right?

Licensed under MIT by Benjamin Sammons. Have fun.