@ozymandiasthegreat/react-native-pdfjs v2.11.330-1
react-native-pdfjs
This is a React Native port of Mozilla's excellent PDF.js.
Due to the lack of some web APIs in React Native, namely proper canvas support, this port started out as a parser only. For rendering PDFs in your app, I suggest you check out react-native-pdf for a complete reader solution, or react-native-pdf-light for a component you can use to build your own viewer.
Rendering to canvas is now ported, albeit with some limitations. You must use @flyskywhy/react-native-gcanvas with this library, as react-native-canvas changes the API too much (it makes it async). GCanvas is an incomplete implementation of the canvas API and has some outstanding bugs, therefore not every PDF will render correctly. There's also a memory leak somewhere, so rendering many pages or complicated documents might crash your app.
Rendering the text layer is now supported as well. It also requires GCanvas, as well as react-native-text-size, since GCanvas does not implement measureText(). You'll also need a DOM polyfill. You can use whichever you like as long as it supports basic DOM structure and methods. (There's a polyfill included with GCanvas.)
If you want a reader with selectable text and custom actions, you can use @alentoma/react-native-selectable-text to render DOM nodes returned by PDF.js. This has the downside of being unable to select all text, as selection is limited to a single node. That's because React Native doesn't support absolute layout inside a Text element, so you can't wrap text nodes in a single parent without messing up the layout.
What works
So, what's the use of a parser? Well, you can implement full text search, for example, or anything else that requires textual information from PDFs. You can also retrieve the outline (think table of contents) and metadata, like the title of the document and its author.
You can now also render PDF pages to canvas for display, but this is buggy and should be considered experimental.
If you want to display a PDF with selectable text and custom actions (see above), you should use react-native-pdf-light to render the page with scaling set to "fitWidth", calculate the viewport scale based on the component/window width, and render the text layer on top.
See below for example.
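The scale calculation mentioned above can be sketched as a small helper. `fitWidthViewport` is a hypothetical name, not part of this library; it only assumes that `page` exposes PDF.js's `getViewport({ scale })` method.

```javascript
// Hypothetical helper: returns the scale and viewport that make a page
// exactly `targetWidth` units wide.
function fitWidthViewport(page, targetWidth) {
  // The unscaled viewport gives the page's natural width.
  const unscaled = page.getViewport({ scale: 1 });
  const scale = targetWidth / unscaled.width;
  return { scale, viewport: page.getViewport({ scale }) };
}
```

In a component you would typically pass `Dimensions.get("window").width` (or the measured width of the containing view) as `targetWidth`.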
What doesn't work
Anything requiring canvas. React Native doesn't implement all web APIs, and canvas is among those missing. There are even two canvas plugins for React Native, but neither is sufficiently complete to work with PDF.js.
The way PDF.js works is that it renders fragments of the document on a secondary canvas, scales them, and then arranges these secondary canvases onto the main (visible) canvas. This port now skips the secondary canvas, renders images to a GImage instance, and then scales and renders that.
The aforementioned plugins, react-native-canvas and @flyskywhy/react-native-gcanvas, both lack support for creating a secondary (non-visible) canvas. The former is more complete, but it is implemented via WebView, so performance is lacking. The latter is based on Alibaba's GCanvas and is very buggy. It has impressive performance, but basic methods like canvas.getImageData() are not implemented. (This is only mostly true now.)
Because of this, rendering the document was originally not possible using react-native-pdfjs. Rendering the text layer was also not possible, as it relies on canvas for text measurements.
Rendering is now possible, but it is buggy and should be avoided. The text layer renders well enough for most use cases.
One thing that still doesn't work is font loading. We can load fonts embedded in the PDF and use them in any component except GCanvas. Therefore, if you want accurate fonts, you should set disableFontFace: true.
Usage
There are two ways to use this library: with workers and without them. Using workers requires an additional library and provides slightly speedier parsing. It also doesn't block the UI thread while parsing, so it's the recommended way for release builds.
Another thing to keep in mind is that PDF.js doesn't work with mobile file systems. If you're using react-native-fs (which is what I'm using for testing), you can only get binary files as a base64 string. PDF.js, on the other hand, expects a URI (a file path doesn't work) or a Uint8Array of data. For that, I suggest you install the buffer polyfill (npm i buffer) and do Buffer.from(base64str, "base64").
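As a minimal sketch of that conversion, the helper below turns a base64 string into a plain Uint8Array. `base64ToUint8Array` is a hypothetical name introduced here for illustration; the only library call it relies on is `Buffer.from` from the buffer package.

```javascript
const { Buffer } = require("buffer");

// Hypothetical helper: decode a base64 string into a standalone Uint8Array.
function base64ToUint8Array(base64str) {
  const buf = Buffer.from(base64str, "base64");
  // Copy into a fresh Uint8Array so the result owns its own memory.
  return new Uint8Array(buf);
}
```

With react-native-fs this would be used roughly as `base64ToUint8Array(await RNFS.readFile(path, "base64"))`, where `path` points at your PDF file, and the result can be passed as `data` to `pdfjs.getDocument()`.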
Without workers (Fake worker)
In your entry file (most likely ./index.js) define global.MAIN_THREAD = true, then just import the library with import * as pdfjs from "@ozymandiasthegreat/react-native-pdfjs".
Without this definition, the main thread check fails and so does fake worker loading. Worse, it fails without errors, so you end up with a non-functioning library and no clues.
With workers (react-native-threads)
First, install react-native-threads. The library hasn't been updated in a while and install instructions are outdated in light of autolinking. You can find up-to-date instructions here.
Once it's set up and working, import PDF.js with import * as pdfjs from "@ozymandiasthegreat/react-native-pdfjs"; and set up the worker port. Since PDF.js is not aware of the Thread class, you have to instantiate the thread yourself and pass it to PDF.js.
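import { Thread } from "react-native-threads";
import * as pdfjs from "@ozymandiasthegreat/react-native-pdfjs";
// PDF.js cannot create the thread itself, so hand it a ready-made one
pdfjs.GlobalWorkerOptions.workerPort = new Thread("./node_modules/@ozymandiasthegreat/react-native-pdfjs/dist/pdf.worker.js");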
Examples
These examples assume that you have your PDF byte data in a buffer/Uint8Array called pdfData.
Getting the text
const doc = await pdfjs.getDocument({ data: pdfData }).promise;
const promises = [];
for (let i = 1; i <= doc.numPages; i++) { // PDF pages start at 1
  promises.push(doc.getPage(i));
}
// Load all pages, then extract the text content of each one
const pages = await Promise.all(promises)
  .then((pages) => Promise.all(pages.map((page) => page.getTextContent())));
// One line of text per page
const text = pages.map((page) => page.items.map((item) => item.str).join(" ")).join("\n");
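Building on the snippet above, a very simple full text search could look like the sketch below. `findPagesContaining` is a hypothetical helper, not part of this library; it assumes `pages` is the array of `getTextContent()` results, each with an `items` array whose entries carry a `str` field.

```javascript
// Hypothetical helper: returns the 1-based numbers of pages whose text
// contains `query` (case-insensitive).
function findPagesContaining(pages, query) {
  const needle = query.toLowerCase();
  const hits = [];
  pages.forEach((page, i) => {
    const text = page.items.map((item) => item.str).join(" ").toLowerCase();
    if (text.includes(needle)) {
      hits.push(i + 1); // PDF pages start at 1
    }
  });
  return hits;
}
```

Because text items are joined with a single space, a word that the PDF splits across two items will not match. A real search feature would likely need smarter joining.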
Getting metadata
const doc = await pdfjs.getDocument({ data: pdfData }).promise;
const { info } = await doc.getMetadata();
console.log(info["Title"]);
Getting the table of contents (Outline)
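const doc = await pdfjs.getDocument({ data: pdfData }).promise;
// getOutline() resolves to null when the PDF has no outline
const outline = (await doc.getOutline()) || [];
const toc = [];
for (const node of outline) {
  const label = node.title;
  // node.dest[0] references the target page; entries without an explicit destination get null
  const index = Array.isArray(node.dest) ? await doc.getPageIndex(node.dest[0]) : null;
  toc.push({ label, index });
}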
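The loop above only visits top-level entries. In PDF.js, outline nodes can carry nested children in `items`, and `dest` may be a named destination (a string) rather than an array. The following is a sketch of a recursive walk under those assumptions; `resolvePageIndex` and `flattenOutline` are hypothetical helper names, and `doc` is expected to provide PDF.js's `getPageIndex()` and `getDestination()`.

```javascript
// Hypothetical helper: resolve an outline destination to a 0-based page index.
async function resolvePageIndex(doc, dest) {
  if (typeof dest === "string") {
    dest = await doc.getDestination(dest); // named destination
  }
  if (!Array.isArray(dest)) return null;
  const target = dest[0];
  if (Number.isInteger(target)) return target; // already a page index
  if (target && typeof target === "object") return doc.getPageIndex(target);
  return null;
}

// Hypothetical helper: depth-first walk producing a flat list with depth info.
async function flattenOutline(doc, nodes, depth = 0, out = []) {
  for (const node of nodes || []) {
    const index = await resolvePageIndex(doc, node.dest);
    out.push({ label: node.title, index, depth });
    await flattenOutline(doc, node.items, depth + 1, out);
  }
  return out;
}
```

Usage would be along the lines of `const toc = await flattenOutline(doc, await doc.getOutline());`, with `depth` available for indenting nested entries.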
Rendering to canvas
import React, { useEffect, useState } from "react";
import { Dimensions, View } from "react-native";
import * as pdfjs from "@ozymandiasthegreat/react-native-pdfjs";
import { Thread } from "react-native-threads";
import { GCanvasView } from "@flyskywhy/react-native-gcanvas";

const PDFView = ({ data }) => {
  const [canvas, setCanvas] = useState(null);

  useEffect(() => {
    if (canvas) {
      pdfjs.GlobalWorkerOptions.workerPort = new Thread("./node_modules/@ozymandiasthegreat/react-native-pdfjs/dist/pdf.worker.js");
      (async () => {
        const doc = await pdfjs.getDocument({ data, disableFontFace: true }).promise;
        const page = await doc.getPage(1);
        // Scale the page so that it fits the window width
        let viewport = page.getViewport({ scale: 1 });
        const scale = Dimensions.get("window").width / viewport.width;
        viewport = page.getViewport({ scale });
        const canvasContext = canvas.getContext("2d");
        // GCanvas context has no ref to canvas, put it there
        canvasContext.canvas = canvas;
        await page.render({ viewport, canvasContext }).promise;
      })();
    }
  }, [canvas, data]);

  return (
    <View>
      <GCanvasView onCanvasCreate={setCanvas}></GCanvasView>
    </View>
  );
};
Rendering text layer
import React, { useEffect, useState } from "react";
import { Dimensions, StyleSheet, View } from "react-native";
import * as pdfjs from "@ozymandiasthegreat/react-native-pdfjs";
import { Thread } from "react-native-threads";
import { GCanvasView } from "@flyskywhy/react-native-gcanvas";
import { DOMParser } from "@xmldom/xmldom";
import { SelectableText } from "@alentoma/react-native-selectable-text";

const PDFView = ({ data }) => {
  const [canvas, setCanvas] = useState(null);
  const [spans, setSpans] = useState([]);

  useEffect(() => {
    // Minimal DOM for PDF.js to attach the text layer nodes to
    const domParser = new DOMParser();
    const dom = domParser.parseFromString("<html><body/></html>");
    const body = dom.childNodes[0].childNodes[0];

    if (canvas) {
      pdfjs.GlobalWorkerOptions.workerPort = new Thread("./node_modules/@ozymandiasthegreat/react-native-pdfjs/dist/pdf.worker.js");
      (async () => {
        const doc = await pdfjs.getDocument({ data, disableFontFace: true }).promise;
        const page = await doc.getPage(1);
        // Scale the page so that it fits the window width
        let viewport = page.getViewport({ scale: 1 });
        const scale = Dimensions.get("window").width / viewport.width;
        viewport = page.getViewport({ scale });
        const textContent = await page.getTextContent();
        const textDivs = [];
        await pdfjs.renderTextLayer({
          canvas,
          container: body,
          viewport,
          textContent,
          textDivs,
        }).promise;
        // Turn each DOM span into an absolutely positioned, selectable text component
        setSpans(textDivs.map((span, i) => {
          const fontSize = parseFloat(span.style.fontSize);
          const top = parseFloat(span.style.top);
          const left = parseFloat(span.style.left);
          const fontFamily = span.style.fontFamily;
          return (
            <SelectableText
              key={`${i}:${span.textContent}`}
              value={span.textContent}
              style={{ color: "transparent", position: "absolute", fontFamily, fontSize, left, top }}
              menuItems={["Log"]}
              onSelection={(e) => console.log(e)}
            ></SelectableText>
          );
        }));
      })();
    }
  }, [canvas, data]);

  return (
    <View>
      <GCanvasView onCanvasCreate={setCanvas} style={styles.canvas}></GCanvasView>
      {spans}
    </View>
  );
};

// Keep the canvas invisible and out of the way
const screen = Dimensions.get("screen");
const styles = StyleSheet.create({
  canvas: {
    position: "absolute",
    top: -screen.height * 3,
    width: screen.width,
    height: screen.height,
    opacity: 0,
  },
});
Bugs
If you encounter a problem, don't hesitate to open an issue. If it's rendering related, please include the problematic PDF.