0.1.1 • Published 1 year ago

duplicate-document-image-finder v0.1.1

duplicate-document-image-finder

A JavaScript library to find duplicate document images.

How does it work?

It extracts the text from each image using OCR, then uses the Levenshtein distance to calculate the similarity between the texts of two images.
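
To make the comparison step concrete, here is a minimal sketch of a Levenshtein-based similarity score. The function names `levenshtein` and `similarity`, and the normalization by the longer string's length, are assumptions for illustration; the library's actual implementation and threshold may differ.

```typescript
// Illustrative sketch only: not the library's own code.

// Levenshtein distance via dynamic programming, keeping one previous row.
function levenshtein(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1,        // deletion
          curr[j - 1] + 1,    // insertion
          prev[j - 1] + cost  // substitution (or match)
        )
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 means identical text, 0 means nothing in common.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty texts are treated as identical
  return 1 - levenshtein(a, b) / longest;
}
```

Two scans of the same document usually produce OCR text that differs by only a few characters, so a score close to 1 suggests a duplicate. Where to draw the cutoff is a tuning decision that this sketch does not settle.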

Online demo

Methods and Interfaces

  • find. Finds the duplicate images. You can optionally pass your own OCR results.

    async find(images: HTMLImageElement[], textLinesOfImages?: TextLine[][], progressCallback?: any): Promise<HTMLImageElement[]>
  • TextLine

    export interface TextLine{
      x:number;
      y:number;
      width:number;
      height:number;
      text:string;
    }
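
If you supply your own OCR results, `textLinesOfImages` holds one `TextLine[]` per image. The sketch below shows what such data could look like and how the lines of one image might be flattened into a single string for comparison. The helper `linesToText`, the sample values, and the top-to-bottom, left-to-right ordering are all assumptions for illustration; the README does not say how the library orders or joins lines, or whether the outer array must follow the order of `images`.

```typescript
// Same shape as the TextLine interface documented above.
interface TextLine {
  x: number;
  y: number;
  width: number;
  height: number;
  text: string;
}

// Hypothetical helper: join the lines of one image in reading order
// (top to bottom, then left to right), skipping empty lines.
function linesToText(lines: TextLine[]): string {
  return lines
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.y - b.y || a.x - b.x)
    .map((line) => line.text.trim())
    .filter((text) => text.length > 0)
    .join("\n");
}

// Made-up OCR results for two images; assumed to be index-aligned with `images`.
const textLinesOfImages: TextLine[][] = [
  [
    { x: 40, y: 120, width: 300, height: 24, text: "Total: 42.00" },
    { x: 40, y: 30, width: 200, height: 24, text: "Invoice 001" },
  ],
  [
    { x: 38, y: 32, width: 202, height: 25, text: "Invoice 001" },
    { x: 41, y: 118, width: 298, height: 24, text: "Total: 42.00" },
  ],
];
```

An array shaped like this is what you would pass as the second argument of `find`. In this sample both images flatten to the same text, which is the situation the library is designed to detect.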

Install

Via NPM:

npm install duplicate-document-image-finder

Via CDN:

<script type="module">
  import { DuplicateDocumentImageFinder } from 'https://cdn.jsdelivr.net/npm/duplicate-document-image-finder/dist/duplicate-document-image-finder.js';
</script>

License

MIT
