1.2.1 • Published 1 year ago
@bsorrentino/pdf-tools v1.2.1
pdf-tools
Tools to extract and transform data from PDF files,
inspired by the pdf-to-markdown project.
Installation
npm install @bsorrentino/pdf-tools -g
Requirements
- Node.js >= 16
- pdf-tools uses canvas, a Cairo-backed Canvas implementation for Node.js, so check its requirements as well
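The Node.js requirement above can be checked before installing. This is only a sketch: the helper name `meets_requirement` is made up for illustration, and it does not check the native libraries that canvas needs.

```shell
# Hypothetical helper: succeeds when the given Node.js major version is >= 16.
meets_requirement() {
  [ "$1" -ge 16 ]
}

if command -v node >/dev/null 2>&1; then
  # process.versions.node looks like "18.17.0"; keep only the major part.
  major="$(node -p 'process.versions.node.split(".")[0]')"
  if meets_requirement "$major"; then
    echo "Node.js $major is supported"
  else
    echo "Node.js $major is too old; pdf-tools needs >= 16"
  fi
else
  echo "Node.js is not installed"
fi
```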
pdftools Commands
common options
-o, --outdir [folder] output folder (default: "out")
pdfximages
extract images (as png) from a pdf and save them to the given folder
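A minimal sketch of a typical call, using the `pxi` alias and the common `-o` option. The file name `report.pdf` and the folder `out/images` are placeholders chosen for this example; the guard is only there so the script fails gracefully when the tool or the file is missing.

```shell
# Placeholder names (assumptions): replace with your own PDF and folder.
pdf="report.pdf"
outdir="out/images"

if ! command -v pdftools >/dev/null 2>&1; then
  echo "pdftools is not installed; run: npm install @bsorrentino/pdf-tools -g"
elif [ ! -f "$pdf" ]; then
  echo "input file $pdf not found"
else
  # 'pxi' is the short alias of 'pdfximages'; -o selects the output folder.
  pdftools pxi -o "$outdir" "$pdf"
fi
```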
Usage:
pdftools pdfximages|pxi [options] <pdf>
pdf2images
create an image (as png) for each pdf page
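Because the command takes a single PDF, rendering a whole folder needs a loop. The sketch below assumes a hypothetical `./pdfs` folder and a made-up helper name `render_all`; when `pdftools` is not on the PATH it only prints the command it would have run.

```shell
# Hypothetical helper: render every *.pdf in the given folder, one call per file.
render_all() {
  dir="$1"
  count=0
  for pdf in "$dir"/*.pdf; do
    [ -f "$pdf" ] || continue   # the glob matched nothing
    if command -v pdftools >/dev/null 2>&1; then
      pdftools p2i "$pdf"       # 'p2i' is the short alias of 'pdf2images'
    else
      echo "would run: pdftools p2i $pdf"
    fi
    count=$((count + 1))
  done
  echo "processed $count file(s)"
}

render_all "./pdfs"
```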
Usage:
pdftools pdf2images|p2i <pdf>
pdf2md
convert pdf to markdown format.
Usage:
pdftools pdf2md|p2md [options] <pdf>
Options:
-ps, --pageseparator [separator] add page separator (default: "---")
--imageurl [url prefix] image url prefix
--stats print stats information
--debug print debug information
Conversion to Markdown
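As a rough illustration, a conversion that combines the pdf2md options with the common `--outdir` option might look like the sketch below. The input file, the output folder and the URL prefix are placeholders invented for this example, and the exact layout of the generated files is not described here.

```shell
# Placeholder values (assumptions): adjust to your own document and site.
pdf="report.pdf"
outdir="out/report"
image_prefix="https://example.com/assets/"

if command -v pdftools >/dev/null 2>&1 && [ -f "$pdf" ]; then
  # Use "***" instead of the default "---" page separator and print stats.
  pdftools pdf2md --outdir "$outdir" --pageseparator "***" \
    --imageurl "$image_prefix" --stats "$pdf"
else
  echo "pdftools or $pdf not available; nothing converted"
fi
```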
supported features
- Detect headers
- Detect and extract images
- Extract plain text
- Extract fonts and allow custom mapping through a generated file <document name>.font.json
  - Supported fonts: bold, italic, monospace, bold+italic
- Detect code blocks (i.e. ```)
- Detect external links
TO DO
- Detect TOC