1.0.0 • Published 2 years ago

aws-lambda-tesseract-french v1.0.0

aws-lambda-tesseract

Tesseract 5.1 (with French training data) to fit inside AWS Lambda

Forked from https://github.com/shelfio/aws-lambda-tesseract. All credit goes to shelf.io: I only compiled Tesseract 5.1 with French language data, changed the parameters passed to the CLI, and published the result.

Inspired by chrome-aws-lambda & lambda-scanner-ocr

Install

$ yarn add aws-lambda-tesseract-french

Works with the Node.js 16.x runtime and is compiled with Tesseract 5.1.0. For now, only x86_64 CPUs are supported.

How does it work?

This package contains an archive with Tesseract 5.1 compiled for use in the AWS Lambda environment.

When a Lambda function starts, the package unpacks the archive containing the binary into the /tmp folder, and it makes sure this happens only once per cold start.
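
The "only once per cold start" behavior can be pictured as a cached setup step. The sketch below is an illustration of that idea, not this package's actual code: `oncePerColdStart`, `ensureTesseract`, and the `/tmp/tesseract` path are hypothetical names chosen for the example. It relies on module-level state surviving between warm invocations of the same Lambda container.

```javascript
// Hypothetical helper (not part of this package's API): wraps an expensive
// setup function so that it runs at most once per process.
function oncePerColdStart(setup) {
  let started = false;
  let result;

  return function ensure() {
    if (!started) {
      // setup may return a promise; the same promise is then shared by
      // every caller, so concurrent invocations do not unpack twice.
      result = setup();
      // Mark as started only after setup() returned, so a synchronous
      // throw allows a retry on the next call.
      started = true;
    }
    return result;
  };
}

// Placeholder for the real extraction step. A real implementation would
// unpack the bundled archive into /tmp here.
const ensureTesseract = oncePerColdStart(async () => {
  return '/tmp/tesseract'; // assumed location, for illustration only
});
```

Note that in this sketch a rejected promise stays cached, so a failed extraction would not be retried until the next cold start.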

Usage

const {getTextFromImage, isSupportedFile} = require('aws-lambda-tesseract-french');

module.exports.handler = async event => {
  // assuming there is a photo.jpg inside /tmp dir
  // original file will be deleted afterwards

  if (!isSupportedFile('/tmp/photo.jpg')) {
    return false;
  }

  return getTextFromImage('/tmp/photo.jpg');
};
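
Because the original file is deleted after processing, it may be worth passing a throwaway copy when the source image is still needed. A minimal sketch, assuming a hypothetical helper named `makeDisposableCopy` (it is not provided by this package):

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: copies the image into a fresh temporary directory
// and returns the path of the copy, leaving the source file untouched.
function makeDisposableCopy(srcPath) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ocr-'));
  const copyPath = path.join(dir, path.basename(srcPath));
  fs.copyFileSync(srcPath, copyPath);
  return copyPath;
}
```

The handler could then call `getTextFromImage(makeDisposableCopy('/tmp/photo.jpg'))`, so that only the copy is removed. Keep in mind that the copy also counts against the Lambda's /tmp storage.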

isSupportedFile checks that the file has an image-like extension and that the extension is not on the list of formats Tesseract cannot read.
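
A check of this kind can be sketched as two extension lists. The function name `looksSupported` and both lists below are illustrative assumptions, not the lists this package actually ships with.

```javascript
const path = require('path');

// Illustrative lists only; the package's real lists may differ.
const IMAGE_EXTENSIONS = new Set([
  '.jpg', '.jpeg', '.png', '.tif', '.tiff', '.bmp', '.gif', '.svg', '.psd',
]);
const UNSUPPORTED_EXTENSIONS = new Set(['.svg', '.psd']);

// Hypothetical stand-in for isSupportedFile: true when the extension looks
// like an image and is not on the unsupported list.
function looksSupported(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return IMAGE_EXTENSIONS.has(ext) && !UNSUPPORTED_EXTENSIONS.has(ext);
}
```

A check like this inspects only the file name, not the file contents, so a mislabeled file would still pass it.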

Compile It Yourself

See compile-tesseract.sh

Smoke-test the build by running the test.sh script.

Publish

$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags

License

MIT © Shelf