0.0.7 • Published 6 years ago

pdf-paragraph-parser v0.0.7

License
MIT
Repository
gitlab

PDF Paragraph Parser

This module is a PDF parser that splits pages into paragraphs. The output is an array of JSON paragraph objects, each holding the page number and the paragraph text.

For example:

[
  {
    "page": 1,
    "text": "The Mysterious Island\nby Jules Verne 1874"
  },
  {
    "page": 1,
    "text": "Chapter 1"
  },
  {
    "page": 1,
    "text": "Hello"
  },
  {
    "page": 1,
    "text": "World!"
  }
]
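Since every paragraph object carries its page number, the array can be regrouped per page with a few lines of code. The following is a minimal sketch: `groupByPage` is a hypothetical helper, not part of pdf-paragraph-parser, and the sample data is made up for illustration.

```javascript
// Hypothetical helper (not part of pdf-paragraph-parser):
// collects the text of each paragraph under its page number.
function groupByPage(paragraphs) {
  const pages = new Map();
  for (const { page, text } of paragraphs) {
    if (!pages.has(page)) {
      pages.set(page, []);
    }
    pages.get(page).push(text);
  }
  return pages;
}

// Made-up sample in the same shape as the parser output.
const sample = [
  { page: 1, text: 'Chapter 1' },
  { page: 1, text: 'Hello' },
  { page: 2, text: 'World!' },
];

const byPage = groupByPage(sample);
console.log(byPage.get(1)); // texts of the paragraphs found on page 1
```

A `Map` keeps the pages in the order they first appear, which matches the order of the parser output.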

Getting started

npm install pdf-paragraph-parser

const paragraphParser = require('pdf-paragraph-parser');

paragraphParser('path/to/input/file', '%')
    .then((data) => console.log(data))
    .catch((err) => console.error(err));
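The same call can also be written with async/await. This is only a sketch: `printParagraphs` is a hypothetical wrapper, not part of the package, and the parser is passed in as an argument so the function can be tried with a stub. The second argument `'%'` is copied unchanged from the call above.

```javascript
// Hypothetical wrapper (not part of pdf-paragraph-parser).
// `parser` is expected to behave like the module's exported function:
// it takes a file path and a second argument and returns a promise
// that resolves to an array of { page, text } objects.
async function printParagraphs(parser, pdfPath) {
  // '%' is the second argument from the example above, copied as-is.
  const paragraphs = await parser(pdfPath, '%');
  paragraphs.forEach(({ page, text }) => {
    console.log(`[page ${page}] ${text}`);
  });
  return paragraphs.length;
}

// With the real module:
// printParagraphs(require('pdf-paragraph-parser'), 'path/to/input/file')
//     .catch((err) => console.error(err));
```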

Enjoy
