10.9.0 • Published 16 days ago

@pdftron/data-extraction v10.9.0

License
Commercial

@pdftron/data-extraction

This package is meant to be used together with @pdftron/pdfnet-node to add Apryse IDP data extraction support. For usage details, see the guide: https://docs.apryse.com/documentation/core/guides/intelligent-data-extraction/

For further reading, check out our blog post on the project: https://apryse.com/blog/introducing-automated-data-extraction-pdf-idp

Supported platform, Node.js, and Electron versions

This package depends on unmanaged add-on binaries, which are not cross-platform. The following are currently supported:

  • OS: Linux (excluding Alpine), Windows (x64)
  • Node.js version: 8 - 18
  • Electron version: 6 - 19

Installation will fail if your OS, Node.js version, or Electron version is not supported.
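
Because installation fails on unsupported environments, it can help to check the environment before installing. The following is a minimal sketch, not part of the package: `isSupportedEnvironment` and `currentEnvironment` are hypothetical helpers, the version ranges are copied from the list above and treated as inclusive major versions, and detecting Alpine through `/etc/alpine-release` is an assumption about how that distribution identifies itself.

```javascript
const fs = require('fs');

// Hypothetical helper: mirrors the support matrix listed in this README.
// Version ranges are treated as inclusive major versions (an assumption).
function isSupportedEnvironment(env) {
  const major = (version) => parseInt(String(version).split('.')[0], 10);

  // Linux (except Alpine) or 64-bit Windows
  const osOk =
    (env.platform === 'linux' && !env.isAlpine) ||
    (env.platform === 'win32' && env.arch === 'x64');

  // Node.js 8 - 18
  const nodeOk = major(env.nodeVersion) >= 8 && major(env.nodeVersion) <= 18;

  // Electron 6 - 19, only checked when running inside Electron
  const electronOk =
    env.electronVersion === undefined ||
    (major(env.electronVersion) >= 6 && major(env.electronVersion) <= 19);

  return osOk && nodeOk && electronOk;
}

// Hypothetical helper: collects the values for the current process.
function currentEnvironment() {
  return {
    platform: process.platform,
    arch: process.arch,
    nodeVersion: process.versions.node,
    electronVersion: process.versions.electron, // undefined outside Electron
    // Assumption: Alpine Linux ships an /etc/alpine-release file
    isAlpine: process.platform === 'linux' && fs.existsSync('/etc/alpine-release'),
  };
}

console.log('Supported environment:', isSupportedEnvironment(currentEnvironment()));
```

Keeping the check as a pure function of an `env` object makes it easy to test against environments other than the one it runs in.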

Usage

Add the @pdftron/data-extraction package as a dependency in your package.json.

In your @pdftron/pdfnet-node code, after PDFNet has been initialized, include the following line:

await PDFNet.addResourceSearchPath("./node_modules/@pdftron/data-extraction/lib")

Here is an example of data extraction that includes the addResourceSearchPath call.

const fs = require('fs');
const { PDFNet } = require('@pdftron/pdfnet-node');

const licenseKey = 'Insert license key here';
const inputFile = 'Insert input file location here';

async function main() {
    // Register the data-extraction add-on binaries with PDFNet
    await PDFNet.addResourceSearchPath('./node_modules/@pdftron/data-extraction/lib');

    const engine = PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure;

    // Extract document structure as a JSON file
    console.log('Extract document structure as a JSON file');

    let outputFile = 'out/paragraphs_and_tables.json';
    await PDFNet.DataExtractionModule.extractData(inputFile, outputFile, engine);

    console.log('Result saved in ' + outputFile);

    // Extract document structure as a JSON string
    console.log('Extract document structure as a JSON string');

    outputFile = 'out/tagged.json';
    const json = await PDFNet.DataExtractionModule.extractDataAsString(inputFile, engine);

    // fs.writeFileSync does not create missing directories, so out/ must exist
    fs.writeFileSync(outputFile, json);
    console.log('Result saved in ' + outputFile);
}

PDFNet.runWithCleanup(main, licenseKey)
    .catch(function (error) {
        // Log the error itself; JSON.stringify() of an Error instance yields "{}"
        console.error('Error:', error);
    })
    .then(function () { return PDFNet.shutdown(); });
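
The example above writes into an out/ folder, and `fs.writeFileSync` throws if that folder is missing. As a sketch of one way to handle this, the hypothetical helper `saveJsonString` below (not part of PDFNet) checks that a string is valid JSON, creates the target folder, and then writes the file. It makes no assumption about the structure of the extracted JSON. Note that the `recursive` option of `fs.mkdirSync` requires Node.js 10.12 or later.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: validates a JSON string and saves it, creating the
// output folder first if it does not exist.
function saveJsonString(outputFile, json) {
  // Throws a SyntaxError before anything is written if the string is not JSON
  JSON.parse(json);

  // Create out/ (or any missing parent folders); requires Node.js >= 10.12
  fs.mkdirSync(path.dirname(outputFile), { recursive: true });

  fs.writeFileSync(outputFile, json);
  return outputFile;
}

// Usage inside main(), with `json` returned by extractDataAsString:
//   saveJsonString('out/tagged.json', json);
```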

A larger code sample can be found here

To get started, please see the documentation at https://www.pdftron.com/documentation/nodejs/get-started/integration.

Licensing

Please go to https://docs.apryse.com/documentation/core/info/license/ to obtain a demo or production license.

10.9.0

16 days ago

10.9.0-beta

16 days ago

10.8.0

1 month ago

10.8.0-beta

1 month ago

10.6.0

5 months ago

10.6.0-beta

5 months ago

10.5.0

7 months ago

10.5.0-beta

7 months ago

9.5.0-beta

8 months ago

10.2.0-2

11 months ago

10.2.0-3

11 months ago

10.2.0-1

11 months ago

10.2.0

11 months ago

10.1.1-3

12 months ago

10.1.1-2

12 months ago

10.1.1-1

12 months ago

10.1.1

12 months ago