pdf-template-parse v0.0.5
A cross-browser, front-end JavaScript PDF parser with a template engine that converts PDF documents into organized data objects.
Live Demo: Click Here
Install
Install with npm:
npm install pdf-template-parse
Install with yarn:
yarn add pdf-template-parse
Introduction
This module exposes two functions:
1 - pdfParse (character & location extraction)
import { pdfParse } from 'pdf-template-parse';
pdfParse takes a pdf file and returns a promise. The promise resolves with all the character data (character code, text, x, y, width) found in the provided document, so you can process the raw data yourself.
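As an illustration of processing the raw data yourself, here is a minimal sketch that groups characters into lines of text. It assumes the resolved data can be reduced to a flat array of records shaped like { text, x, y, width }; the exact shape pdfParse returns may differ, and groupIntoLines, yTolerance, and the sample data are hypothetical names made up for this example.

```javascript
// Hypothetical helper: groups character records into lines of text.
// Assumes each record looks like { text, x, y, width }; the real shape
// returned by pdfParse may differ.
function groupIntoLines(chars, yTolerance = 2) {
  const lines = [];
  // Sort by y so that lines come out in ascending y order.
  const sorted = [...chars].sort((a, b) => a.y - b.y);
  for (const char of sorted) {
    // Reuse an existing line if this character sits at (nearly) the same y.
    const line = lines.find((l) => Math.abs(l.y - char.y) <= yTolerance);
    if (line) {
      line.chars.push(char);
    } else {
      lines.push({ y: char.y, chars: [char] });
    }
  }
  // Within each line, order characters by x and join their text.
  return lines.map((l) =>
    l.chars
      .sort((a, b) => a.x - b.x)
      .map((c) => c.text)
      .join('')
  );
}

// Made-up sample data standing in for pdfParse output.
const sampleChars = [
  { text: 'i', x: 80, y: 100, width: 4 },
  { text: 'H', x: 70, y: 100, width: 8 },
  { text: 'o', x: 70, y: 120, width: 6 },
  { text: 'k', x: 77, y: 121, width: 6 },
];

console.log(groupIntoLines(sampleChars)); // two lines: 'Hi' and 'ok'
```

The tolerance lets characters whose y values differ slightly (for example because of font metrics) land on the same line; tune it to the document at hand.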
2 - pdfTemplateParse (character extraction & templating)
import pdfTemplateParse from 'pdf-template-parse';
pdfTemplateParse takes a pdf file and a template file and returns a promise. The promise resolves with all the values / tables declared in the template file. (See the example below for a sample template file.)
Example Usage
Example 1: helloWorldDemo.pdf
Sample PDF download: helloWorldDemo.pdf
import { pdfParse } from 'pdf-template-parse';
import pdf from './samplePdf/helloWorldDemo.pdf';
// pdfParse returns a promise, so wait for it to resolve before logging.
pdfParse(pdf).then((characterData) => {
  console.log({ characterData });
});
Output: (console screenshot)
** Note: the promise will not resolve if the browser tab is not visible.
Example 2: helloWorldDemo.pdf w/ template file
Template file: helloWorldDemo.json
{
  "captureList": [
    {
      "name": "1",
      "type": "value",
      "rules": {
        "all": {
          "bounds": {
            "top": 220,
            "left": 70,
            "bottom": 230,
            "right": 140
          }
        }
      }
    },
    {
      "name": "2",
      "type": "value",
      "rules": {
        "all": {
          "bounds": {
            "top": 220,
            "left": 150,
            "bottom": 230,
            "right": 200
          }
        }
      }
    },
    {
      "name": "1+2",
      "type": "value",
      "rules": {
        "all": {
          "bounds": {
            "top": 220,
            "left": 70,
            "bottom": 230,
            "right": 200
          }
        }
      }
    }
  ]
}
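To make the template easier to read, here is a rough sketch of what a "bounds" rule could mean: keep the characters positioned inside the box and join their text. This is an assumption about the semantics, not the library's actual implementation; captureValue, the record shape { text, x, y }, and the character positions are all hypothetical.

```javascript
// Hypothetical sketch of a "bounds" rule: keep characters whose x/y
// position falls inside the box, then join them left to right.
// Assumes records shaped like { text, x, y } and top <= bottom;
// this is not the library's actual implementation.
function captureValue(chars, bounds) {
  return chars
    .filter(
      (c) =>
        c.x >= bounds.left &&
        c.x <= bounds.right &&
        c.y >= bounds.top &&
        c.y <= bounds.bottom
    )
    .sort((a, b) => a.x - b.x)
    .map((c) => c.text)
    .join('');
}

// Made-up character positions, chosen to fit the template bounds.
const chars = [
  { text: 'H', x: 72, y: 225 },
  { text: 'i', x: 80, y: 225 },
  { text: 'y', x: 152, y: 225 },
  { text: 'o', x: 160, y: 225 },
  { text: '!', x: 300, y: 225 }, // outside every box
];

// Same structure and numbers as helloWorldDemo.json.
const template = {
  captureList: [
    { name: '1', type: 'value', rules: { all: { bounds: { top: 220, left: 70, bottom: 230, right: 140 } } } },
    { name: '2', type: 'value', rules: { all: { bounds: { top: 220, left: 150, bottom: 230, right: 200 } } } },
    { name: '1+2', type: 'value', rules: { all: { bounds: { top: 220, left: 70, bottom: 230, right: 200 } } } },
  ],
};

const result = {};
for (const item of template.captureList) {
  result[item.name] = captureValue(chars, item.rules.all.bounds);
}
// result['1'] is 'Hi', result['2'] is 'yo', result['1+2'] is 'Hiyo'
console.log(result);
```

Under this reading, the third entry ("1+2") spans both of the smaller boxes, which is why it captures the two values together.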
Code:
import pdfTemplateParse from 'pdf-template-parse';
import pdf from './samplePdf/helloWorldDemo.pdf';
import template from './sampleFile/helloWorldDemo.json';
// pdfTemplateParse returns a promise, so wait for it to resolve before logging.
pdfTemplateParse(pdf, template).then((data) => {
  console.log({ data });
});
Output: (console screenshot)
** Note: the promise will not resolve if the browser tab is not visible.
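Because the promise may stay pending while the tab is hidden, one possible way to avoid waiting forever is to race it against a timer. The sketch below is a generic pattern and not part of this module; withTimeout and its error message are hypothetical, and a never-resolving promise stands in for a stalled parse call.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`No result after ${ms} ms (is the tab hidden?)`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a parse call that never resolves (e.g. tab not visible).
const neverResolves = new Promise(() => {});

withTimeout(neverResolves, 50).catch((err) => {
  console.log(err.message);
});
```

In real use the first argument would be the promise returned by pdfParse or pdfTemplateParse. Note that the timeout only stops the caller from waiting; it does not cancel the underlying work.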
Todo
- Add tests
- Replace char_offset option with character map detection
- Add value validation
- Add template validation
- Add Node.js support (either remove the canvas dependency or add a Node canvas package)
Authors
- Thomas J. Herzog - https://github.com/tomrule007
License 📄
This project is licensed under the MIT License - see the LICENSE file for details.