pdf-paragraph-parser v0.0.7
PDF Paragraph Parser
This module is a PDF parser that splits pages into paragraph objects. The output is an array of JSON paragraph objects, each with a `page` number and the paragraph `text`.
For example:
[
  {
    "page": 1,
    "text": "The Mysterious Island\nby Jules Verne 1874"
  },
  {
    "page": 1,
    "text": "Chapter 1"
  },
  {
    "page": 1,
    "text": "Hello"
  },
  {
    "page": 1,
    "text": "World!"
  }
]
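Since every paragraph object carries its page number, the output is easy to post-process. The sketch below groups paragraph texts by page. `groupByPage` is a hypothetical helper written for illustration, not part of the module, and it assumes only the `{ page, text }` shape shown above.

```javascript
// Hypothetical helper, not part of pdf-paragraph-parser:
// groups paragraph objects of the shape { page, text } by page number.
function groupByPage(paragraphs) {
  const pages = new Map();
  for (const { page, text } of paragraphs) {
    if (!pages.has(page)) pages.set(page, []);
    pages.get(page).push(text);
  }
  return pages;
}

// Sample data copied from the example output above.
const sample = [
  { page: 1, text: 'The Mysterious Island\nby Jules Verne 1874' },
  { page: 1, text: 'Chapter 1' },
  { page: 1, text: 'Hello' },
  { page: 1, text: 'World!' },
];

const byPage = groupByPage(sample);
console.log(byPage.get(1).length); // 4
```

A `Map` keeps pages in the order they first appear, which should match document order if the parser emits paragraphs sequentially.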
Getting started
npm install pdf-paragraph-parser
const paragraphParser = require('pdf-paragraph-parser');
paragraphParser('path/to/input/file', '%')
.then((data) => console.log(data))
.catch((err) => console.error(err));
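The call above returns a promise, so it can also be consumed with `async`/`await`. This is a minimal sketch under that assumption only. `countParagraphs` is a hypothetical wrapper, and the parser is passed in as the `parse` argument so the snippet does not depend on the package being installed.

```javascript
// Sketch only: `parse` is expected to behave like the module's exported
// function, i.e. return a promise resolving to [{ page, text }, ...].
async function countParagraphs(parse, inputPath) {
  try {
    // '%' is copied verbatim from the example above; its meaning is not
    // documented in this README.
    const paragraphs = await parse(inputPath, '%');
    return paragraphs.length;
  } catch (err) {
    console.error(err);
    throw err; // let the caller decide how to recover
  }
}
```

With the package installed, this could be called as `countParagraphs(require('pdf-paragraph-parser'), 'path/to/input/file')`.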
Enjoy