0.0.7 • Published 9 years ago

pdf-textstring v0.0.7

License
ISC
Repository
github

PDF-TextString

This is a Node.js module that extracts the text from a PDF. If the PDF has no text attached, you get null back. To use this module, you must install pdftotext and pdffonts from Xpdf.

Installation

Linux

pdftotext and pdffonts are included in the poppler-utils package. To install poppler-utils, run:

  • apt-get install poppler-utils

Windows

On Windows, you can simply download the executables from Xpdf.
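
Before calling the module, it can help to confirm that both tools can actually be started. The following is a minimal sketch, not part of pdf-textstring: the helper name isBinaryAvailable is hypothetical, and it assumes that passing -v only makes the tools print their version and exit. It checks only whether the process could be spawned, not the exit code.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper (not part of pdf-textstring).
// Returns true if the command could be started at all.
function isBinaryAvailable(command) {
  const result = spawnSync(command, ['-v'], { stdio: 'ignore' });
  // result.error is set (for example ENOENT) when the executable cannot be spawned.
  return !result.error;
}

// Collect the tools that could not be found on the PATH.
const missing = ['pdftotext', 'pdffonts'].filter(function (name) {
  return !isBinaryAvailable(name);
});

if (missing.length > 0) {
  console.log('Not found on PATH: ' + missing.join(', '));
}
```

If a tool is reported as missing, install it as described above or point the module at the executable explicitly.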

Usage

The path to the PDF must be an absolute path.
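
If you only have a relative path, Node's path module can turn it into an absolute one. This is a small sketch; the helper name toAbsolutePdfPath and the file name docs/report.pdf are made up for illustration and are not part of pdf-textstring.

```javascript
const path = require('path');

// Hypothetical helper (not part of pdf-textstring).
// Resolves pdfPath against baseDir (default: the current working directory).
// A path that is already absolute is returned in normalized form.
function toAbsolutePdfPath(pdfPath, baseDir) {
  return path.resolve(baseDir || process.cwd(), pdfPath);
}

// 'docs/report.pdf' is a placeholder file name.
const pdf_path = toAbsolutePdfPath('docs/report.pdf');
console.log(pdf_path);
```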

Windows

First add:

var pdftext = require('pdf-textstring'); 

And then just do the following:

pdftext.pdftotext(pdf_path, function (err, data) {
  if (err) {
    console.log(err);
  } else {
    console.log(data);
  }
});

If pdftotext and/or pdffonts aren't in the Windows PATH, you can tell the module where the executables are located:

pdftext.setBinaryPath_PdfToText("AbsolutePath/To/Binary");
pdftext.setBinaryPath_PdfFont("AbsolutePath/To/Binary");
pdftext.pdftotext(pdf_path, function (err, data) {
  if (err) {
    console.log(err);
  } else {
    console.log(data);
  }
});

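I recommend using the path module to build the absolute paths:

var path = require('path');
// require.main.filename replaces the deprecated process.mainModule.filename
var AbsolutePathToApp = path.dirname(require.main.filename);
// path.join inserts the correct separator for the current platform
var pathToPdftotext = path.join(AbsolutePathToApp, "binaries", "pdftotext.exe");
var pathToPdffonts = path.join(AbsolutePathToApp, "binaries", "pdffonts.exe");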
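
The computed paths can then be handed to the two setter functions. The sketch below bundles both steps into one helper; configureBinaries is a hypothetical name, and the binaries directory layout is an assumption carried over from the snippet above. The module is passed in as an argument so the helper itself does not depend on how it was loaded.

```javascript
const path = require('path');

// Hypothetical helper (not part of pdf-textstring).
// pdftext: the object returned by require('pdf-textstring')
// binDir:  absolute path of the directory that holds the executables
function configureBinaries(pdftext, binDir) {
  // Assumption: the executables carry the .exe suffix only on Windows.
  const ext = process.platform === 'win32' ? '.exe' : '';
  const pdftotextPath = path.join(binDir, 'pdftotext' + ext);
  const pdffontsPath = path.join(binDir, 'pdffonts' + ext);

  // Both setters are the ones documented in this README.
  pdftext.setBinaryPath_PdfToText(pdftotextPath);
  pdftext.setBinaryPath_PdfFont(pdffontsPath);

  return { pdftotextPath: pdftotextPath, pdffontsPath: pdffontsPath };
}

// Usage, assuming the executables live in <app dir>/binaries:
//   var pdftext = require('pdf-textstring');
//   configureBinaries(pdftext, path.join(path.dirname(require.main.filename), 'binaries'));
```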

Linux

First add:

var pdftext = require('pdf-textstring'); 

And then just do the following:

pdftext.pdftotext(pdf_path, function (err, data) {
  if (err) {
    console.log(err);
  } else {
    console.log(data);
  }
});
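
On either platform, the callback can report three outcomes: an error, null for a PDF with no text attached, or the extracted text. The following sketch keeps them apart; extractText and the shape of its result object are hypothetical, and it assumes only the (err, data) callback shown above.

```javascript
// Hypothetical wrapper (not part of pdf-textstring).
// pdftext: the object returned by require('pdf-textstring')
// done:    callback receiving (err, { hasText, text })
function extractText(pdftext, pdfPath, done) {
  pdftext.pdftotext(pdfPath, function (err, data) {
    if (err) {
      return done(err);
    }
    if (data === null) {
      // The PDF has no text attached (for example a scanned document).
      return done(null, { hasText: false, text: '' });
    }
    done(null, { hasText: true, text: data });
  });
}

// Usage:
//   var pdftext = require('pdf-textstring');
//   extractText(pdftext, pdf_path, function (err, result) {
//     if (err) { return console.log(err); }
//     console.log(result.hasText ? result.text : 'No text in this PDF');
//   });
```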