1.0.1 • Published 4 years ago

@warren-bank/node-hocr-resizer v1.0.1

License
GPL-2.0
Repository
github

hocr-resizer

Command-line utility for resizing the coordinates in an hOCR (HTML-formatted OCR text) file.

Use Case:

  • let's say that we've already done the following:
    • tesseract was used to generate hOCR data from a set of high-resolution images
    • hocr-pdf was used to generate a PDF document that contains:
      • one visible image per page
      • ocr text in an invisible layer that makes the images searchable
  • what if:
    • the size of the PDF is too big
  • we can:
    • use ImageMagick to resize all images to a lower resolution
  • the problem:
    • the coordinates in the hOCR files no longer correspond to the dimensions of the images
      • the aspect ratio hasn't changed
      • the width and height of the images (in pixels) have decreased
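
To make the problem concrete, here is a minimal sketch of what rescaling hOCR coordinates can look like. It is not the code used by hocr-resizer; the function name scaleBboxes and the regex-based approach are assumptions made for illustration. It relies only on the hOCR convention that coordinates are stored in title attributes as "bbox x0 y0 x1 y1".

```javascript
// Hypothetical helper (not part of hocr-resizer): multiply every
// "bbox x0 y0 x1 y1" found in an hOCR string by the given scale factors.
function scaleBboxes(hocr, scaleX, scaleY) {
  return hocr.replace(
    /\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/g,
    (match, x0, y0, x1, y1) => {
      const scaled = [
        Math.round(Number(x0) * scaleX),
        Math.round(Number(y0) * scaleY),
        Math.round(Number(x1) * scaleX),
        Math.round(Number(y1) * scaleY),
      ];
      return `bbox ${scaled.join(' ')}`;
    }
  );
}

// A word box taken from an image that is then resized to half its width and height.
const sample =
  "<span class='ocrx_word' title='bbox 200 400 600 480; x_wconf 95'>hello</span>";
console.log(scaleBboxes(sample, 0.5, 0.5));
```

Rounding to whole pixels is a choice made in this sketch; other properties in the title attribute (such as x_wconf) are left untouched.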

Existing Solution:

Reason for Yet-Another Solution:

  • I fkn hate Ruby, and its enormous non-portable runtime
  • ..why not?

Installation:

npm install --global @warren-bank/node-hocr-resizer

Usage:

hocr-resizer <options>

options:
========
"--help"
    Print a help message describing all command-line options.

"-v"
"--version"
    Display the version.

"-w" <integer>
"--width" <integer>
    [required] Width of new/resized image.

"-h" <integer>
"--height" <integer>
    [optional] Height of new/resized image.
    Default: calculated from old aspect ratio and new/resized width.

"-i" <filepath>
"--input" <filepath>
    [required] Filepath to input hOCR file.

"-o" <filepath>
"--output" <filepath>
    [optional] Filepath to output hOCR file.
    Default: overwrite input hOCR file.

Example:

  • overwrite an hOCR file with updated coordinates based on the same aspect ratio and a new image width of 1275px (i.e. 150dpi @ 8.5"):
      hocr-resizer -w 1275 -i '/path/to/file.hocr'
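
The arithmetic behind this example can be sketched as follows. The helper names pixelWidth and defaultHeight are hypothetical, the 2550x3300 source size (a letter page at 300dpi) is an assumed input, and the rounding is a guess at how a default height could be derived from the old aspect ratio; the tool itself may round differently.

```javascript
// Hypothetical helpers, shown only to illustrate the numbers in the example.

// Pixel width of a page scanned at a given resolution.
function pixelWidth(dpi, inches) {
  return Math.round(dpi * inches);
}

// Height that preserves the old aspect ratio at the new width.
function defaultHeight(oldWidth, oldHeight, newWidth) {
  return Math.round((oldHeight * newWidth) / oldWidth);
}

const newWidth = pixelWidth(150, 8.5);
console.log(newWidth); // 1275
console.log(defaultHeight(2550, 3300, newWidth)); // 1650
```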

Legal: