@ridi/epub-parser

Common EPUB2 data parser for Ridibooks services

Features

- EPUB2 parsing
- EPUB3 parsing
- Package validation (optional)
- Unzipping the EPUB file while parsing (optional)
- Reading files
  - Extract the inner HTML of the body in spines (optional)
  - Change the base path of spines, CSS, and inline styles (optional)
  - Customize CSS and inline styles (optional)
  - Truncate the inner HTML of the body in spines (optional)
  - Minify HTML, CSS, and inline styles (optional)
- Encryption and decryption when parsing, reading, or unzipping
- More specs
  - encryption.xml
  - manifest.xml
  - metadata.xml
  - rights.xml
  - signatures.xml
- Debug mode
- Environments
  - Node
  - CLI
  - Browser
- Online demo

Install

npm install @ridi/epub-parser

Usage

Basic:

import { EpubParser } from '@ridi/epub-parser';
// or const { EpubParser } = require('@ridi/epub-parser');

// Pass either an EPUB file or a directory containing an unzipped EPUB.
const parser = new EpubParser('./foo/bar.epub'); // or new EpubParser('./unzippedPath')
parser.parse(/* parseOptions */).then((book) => {
  // Read the contents of every spine.
  parser.readItems(book.spines /*, readOptions */).then((results) => {
    // ...
  });
  // ...
});

With AesCryptor:

import { EpubParser, CryptoProvider, AesCryptor } from '@ridi/epub-parser';
// or const { EpubParser, CryptoProvider, AesCryptor } = require('@ridi/epub-parser');

const { Purpose } = CryptoProvider;
const { Mode, Padding } = AesCryptor;

class ContentCryptoProvider extends CryptoProvider {
  constructor(key) {
    super();
    this.cryptor = new AesCryptor(Mode.ECB, { key });
  }

  getCryptor(filePath, purpose) {
    return this.cryptor;
  }

  // When used as follows:
  // const provider = new ContentCryptoProvider(...);
  // const parser = new EpubParser('encrypted.epub', provider);
  // const book = await parser.parse({ unzipPath: ... });
  // const firstSpine = await parser.readItem(book.spines[0]);
  //
  // run() is called like this:
  // 1. run(data, 'encrypted.epub', Purpose.READ_IN_DIR)
  // 2. run(data, 'META-INF/container.xml', Purpose.READ_IN_ZIP)
  // 3. run(data, 'OEBPS/content.opf', Purpose.READ_IN_ZIP)
  // ...
  // 4. run(data, 'mimetype', Purpose.WRITE)
  // ...
  // 5. run(data, 'OEBPS/Text/Section0001.xhtml', Purpose.READ_IN_DIR)
  //
  run(data, filePath, purpose) {
    const cryptor = this.getCryptor(filePath, purpose);
    const padding = Padding.AUTO;
    if (purpose === Purpose.READ_IN_DIR) {
      return cryptor.decrypt(data, { padding });
    } else if (purpose === Purpose.WRITE) {
      return cryptor.encrypt(data, { padding });
    }
    return data;
  }
}

// `key` is the AES key for your content, supplied by your application.
const cryptoProvider = new ContentCryptoProvider(key);
const parser = new EpubParser('./encrypted.epub', cryptoProvider); // or './unzippedPath'
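To see what an ECB cryptor does to the bytes in `run()`, here is a minimal sketch that uses Node's built-in `node:crypto` instead of the library's AesCryptor. The `Purpose` object, `makeRun`, and the 16-byte key are hypothetical stand-ins, and whether Node's default PKCS#7 padding matches `Padding.AUTO` is an assumption. The dispatch mirrors `ContentCryptoProvider.run()`: decrypt on READ_IN_DIR, encrypt on WRITE, pass through otherwise.

```javascript
const crypto = require('node:crypto');

// Hypothetical stand-ins for CryptoProvider.Purpose values (illustration only).
const Purpose = { READ_IN_DIR: 'read_in_dir', READ_IN_ZIP: 'read_in_zip', WRITE: 'write' };

// AES-128-ECB; ECB needs no IV, so null is passed. Node pads with PKCS#7 by default.
function encryptEcb(key, data) {
  const cipher = crypto.createCipheriv('aes-128-ecb', key, null);
  return Buffer.concat([cipher.update(data), cipher.final()]);
}

function decryptEcb(key, data) {
  const decipher = crypto.createDecipheriv('aes-128-ecb', key, null);
  return Buffer.concat([decipher.update(data), decipher.final()]);
}

// Same shape as ContentCryptoProvider.run(): decrypt files read from the
// unzipped directory, encrypt files being written, leave everything else alone.
function makeRun(key) {
  return (data, filePath, purpose) => {
    if (purpose === Purpose.READ_IN_DIR) return decryptEcb(key, data);
    if (purpose === Purpose.WRITE) return encryptEcb(key, data);
    return data;
  };
}
```

Round-tripping the 20-byte `mimetype` payload through `WRITE` and then `READ_IN_DIR` yields the original bytes; the ciphertext is padded to 32 bytes.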

Log level setting:

import { EpubParser, LogLevel } from '@ridi/epub-parser';

const parser = new EpubParser('./foo/bar.epub', cryptoProvider, LogLevel.VERBOSE);
// or const parser = new EpubParser('./foo/bar.epub', LogLevel.VERBOSE);
parser.logger.logLevel = LogLevel.VERBOSE; // SILENT, ERROR, WARN (default), INFO, DEBUG, VERBOSE

API

parse(parseOptions)

Returns a Promise that resolves to an EpubBook:

EpubBook: an instance with the metadata, spine list, table of contents, etc.

Or throws an exception.

parseOptions: `?object`

readItem(item, readOptions)

Returns a Promise that resolves to a string or Buffer:

SpineItem, CssItem, InlineCssItem, NcxItem, SvgItem:
- string
Other items:
- Buffer

Or throws an exception.

item: `Item` (see: Item Types)

readOptions: `?object`

readItems(items, readOptions)

Returns a Promise that resolves to a string[] or Buffer[]:

SpineItem, CssItem, InlineCssItem, NcxItem, SvgItem:
- string[]
Other items:
- Buffer[]

Or throws an exception.

items: `Item[]` (see: Item Types)

readOptions: `?object`

unzip(unzipPath, overwrite)

Returns a Promise<boolean>:

true if the unzip succeeded or the file has already been unzipped.

Or throws an exception.

unzipPath: `string`

overwrite: `boolean`

onProgress = callback(step, totalStep, action)

Reports the parser's progress through a callback.

const { Action } = EpubParser; // PARSE, READ_ITEMS
parser.onProgress = (step, totalStep, action) => {
  console.log(`[${action}] ${step} / ${totalStep}`);
};

Model

EpubBook

titles: string[]
creators: Author[]
subjects: string[]
description: ?string
publisher: ?string
contributors: Author[]
dates: DateTime[]
type: ?string
format: ?string
identifiers: Identifier[]
source: ?string
languages: string[]
relation: ?string
rights: ?string
version: Version
metas: Meta[]
items: Item[]
spines: SpineItem[]
ncx: ?NcxItem
fonts: FontItem[]
cover: ?ImageItem
images: ImageItem[]
styles: CssItem[]
guides: Guide[]
deadItems: DeadItem[]
toRaw(): object

Author

name: ?string
fileAs: ?string
role: string (Default: Author.Roles.UNDEFINED)
toRaw(): object

Author.Roles

Type	Value
UNDEFINED	undefined
UNKNOWN	unknown
ADAPTER	adp
ANNOTATOR	ann
ARRANGER	arr
ARTIST	art
ASSOCIATEDNAME	asn
AUTHOR	aut
AUTHOR_IN_QUOTATIONS_OR_TEXT_EXTRACTS	aqt
AUTHOR_OF_AFTER_WORD_OR_COLOPHON_OR_ETC	aft
AUTHOR_OF_INTRODUCTIONOR_ETC	aui
BIBLIOGRAPHIC_ANTECEDENT	ant
BOOK_PRODUCER	bkp
COLLABORATOR	clb
COMMENTATOR	cmm
DESIGNER	dsr
EDITOR	edt
ILLUSTRATOR	ill
LYRICIST	lyr
METADATA_CONTACT	mdc
MUSICIAN	mus
NARRATOR	nrt
OTHER	oth
PHOTOGRAPHER	pht
PRINTER	prt
REDACTOR	red
REVIEWER	rev
SPONSOR	spn
THESIS_ADVISOR	ths
TRANSCRIBER	trc
TRANSLATOR	trl

DateTime

value: ?string
event: string (Default: DateTime.Events.UNDEFINED)
toRaw(): object

DateTime.Events

Type	Value
UNDEFINED	undefined
UNKNOWN	unknown
CREATION	creation
MODIFICATION	modification
PUBLICATION	publication

Identifier

value: ?string
scheme: string (Default: Identifier.Schemes.UNDEFINED)
toRaw(): object

Identifier.Schemes

Type	Value
UNDEFINED	undefined
UNKNOWN	unknown
DOI	doi
ISBN	isbn
ISBN13	isbn13
ISBN10	isbn10
ISSN	issn
UUID	uuid
URI	uri

Guide

title: ?string
type: string (Default: Guide.Types.UNDEFINED)
href: ?string
item: ?Item
toRaw(): object

Guide.Types

Type	Value
UNDEFINED	undefined
UNKNOWN	unknown
COVER	cover
TITLE_PAGE	title-page
TOC	toc
INDEX	index
GLOSSARY	glossary
ACKNOWLEDGEMENTS	acknowledgements
BIBLIOGRAPHY	bibliography
COLOPHON	colophon
COPYRIGHT_PAGE	copyright-page
DEDICATION	dedication
EPIGRAPH	epigraph
FOREWORD	foreword
LOI	loi
LOT	lot
NOTES	notes
PREFACE	preface
TEXT	text

Item Types

Item

id: ?string
href: ?string
mediaType: ?string
size: ?number
isFileExists: boolean (size !== undefined)
toRaw(): object

SpineItem (extend Item)

index: number (Default: undefined)
isLinear: boolean (Default: true)
styles: ?CssItem[]
first: ?SpineItem
prev: ?SpineItem
next: ?SpineItem
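The `first`, `prev`, and `next` fields let spines be walked in reading order like a linked list. A minimal sketch of how such links could be wired on plain objects, assuming `first` points to the book's first spine and `prev`/`next` are undefined at the ends; `linkSpines` and `readingOrder` are hypothetical helpers, not part of the library:

```javascript
// Hypothetical helper: sets index/first/prev/next on spine-like objects in reading order.
function linkSpines(spines) {
  spines.forEach((spine, i) => {
    spine.index = i;
    spine.first = spines[0];
    spine.prev = i > 0 ? spines[i - 1] : undefined;
    spine.next = i < spines.length - 1 ? spines[i + 1] : undefined;
  });
  return spines;
}

// Walking forward from the first spine visits every spine exactly once.
function readingOrder(spines) {
  const hrefs = [];
  for (let s = spines[0]; s; s = s.next) hrefs.push(s.href);
  return hrefs;
}
```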

NcxItem (extend Item)

navPoints: NavPoint[]

CssItem (extend Item)

namespace: string

InlineCssItem (extend CssItem)

style: string (Default: '')

ImageItem (extend Item)

isCover: boolean (Default: false)

SvgItem (extend ImageItem)

FontItem (extend Item)

DeadItem (extend Item)

reason: string (Default: DeadItem.Reason.UNDEFINED)

DeadItem.Reason

Type	Value
UNDEFINED	undefined
UNKNOWN	unknown
NOT_EXISTS	not_exists
NOT_SPINE	not_spine
NOT_NCX	not_ncx
NOT_SUPPORT_TYPE	not_support_type

NavPoint

id: ?string
label: ?string
src: ?string
anchor: ?string
depth: number (Default: 0)
children: NavPoint[]
spine: ?SpineItem
toRaw(): object
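Because each NavPoint carries a `depth` and nested `children`, a table of contents can be rendered with a depth-first walk. A minimal sketch over plain objects shaped like NavPoint (for a parsed book the input would be `book.ncx.navPoints`); `flattenNavPoints` is a hypothetical helper:

```javascript
// Hypothetical helper: flattens a NavPoint-shaped tree (label, depth, children)
// into indented lines, depth-first in document order.
function flattenNavPoints(navPoints, out = []) {
  for (const point of navPoints) {
    out.push(`${'  '.repeat(point.depth)}${point.label}`);
    flattenNavPoints(point.children, out);
  }
  return out;
}
```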

Version

major: number
minor: number
patch: number
toString(): string

Parse Options

validatePackage: `boolean`

If true, the package is validated against the IDPF requirements listed below.

Used only if the input is an EPUB file.

- The ZIP header should not be corrupt.
- The mimetype file must be the first file in the archive.
- The mimetype file should not be compressed.
- The mimetype file should contain only the string application/epub+zip.
- The mimetype file should not use the extra field feature of the ZIP format.

Default: false
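The mimetype checks above can all be read off the first ZIP local file header. A minimal sketch (not the library's implementation) following the PKZIP local-header layout: signature at offset 0, compression method at 8, compressed size at 18, name length at 26, extra-field length at 28, name at 30. It assumes the entry does not use a trailing data descriptor, so the sizes in the local header are valid; `checkMimetypeEntry` is a hypothetical name.

```javascript
// Sketch: validates the first entry of an EPUB archive (a Buffer) against the
// mimetype rules. Returns a list of problems; an empty list means it passed.
function checkMimetypeEntry(zip) {
  // Local file header signature "PK\x03\x04"; the fixed header is 30 bytes.
  if (zip.length < 30 || zip.readUInt32LE(0) !== 0x04034b50) {
    return ['zip header is corrupt'];
  }
  const method = zip.readUInt16LE(8); // 0 = stored (uncompressed)
  const compressedSize = zip.readUInt32LE(18);
  const nameLength = zip.readUInt16LE(26);
  const extraLength = zip.readUInt16LE(28);
  const name = zip.toString('latin1', 30, 30 + nameLength);
  if (name !== 'mimetype') {
    return ['mimetype is not the first file'];
  }
  const errors = [];
  if (method !== 0) errors.push('mimetype is compressed');
  if (extraLength !== 0) errors.push('mimetype uses the extra field');
  const dataStart = 30 + nameLength + extraLength;
  const content = zip.toString('latin1', dataStart, dataStart + compressedSize);
  if (content !== 'application/epub+zip') errors.push('mimetype content is not application/epub+zip');
  return errors;
}
```

Passing the bytes of a conforming EPUB (for example from `fs.readFileSync`) should yield an empty array.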

allowNcxFileMissing: `boolean`

If false, parsing stops when the NCX file does not exist.

Default: true

unzipPath: `?string`

If specified, the EPUB is unzipped to that path.

Used only if the input is an EPUB file.

Default: undefined

overwrite: `boolean`

If true, existing files in unzipPath are overwritten when unzipping.

Used only if unzipPath is specified.

Default: true

parseStyle: `boolean`

If true, the styles used by each spine are listed in SpineItem.styles, and one namespace is assigned per CSS file or inline style.

Otherwise, CssItem.namespace and SpineItem.styles are undefined.

In every list, InlineCssItem is always positioned after CssItem (EpubBook.styles, EpubBook.items, SpineItem.styles, ...).

Default: true

styleNamespacePrefix: `string`

Prepends the given string to each namespace for identification.

Only available if parseStyle is true.

Default: 'ridi_style'
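The parseStyle and styleNamespacePrefix rules (one namespace per stylesheet, inline styles ordered after CSS files, prefix prepended) can be sketched as below. The `${prefix}${index}` numbering scheme and the `assignNamespaces` helper are hypothetical; the library's actual namespace format may differ.

```javascript
// Hypothetical sketch: assigns one namespace per stylesheet, keeping inline
// styles after external CSS files, with the configurable prefix prepended.
function assignNamespaces(cssItems, inlineCssItems, prefix = 'ridi_style') {
  const ordered = [...cssItems, ...inlineCssItems];
  ordered.forEach((item, i) => {
    item.namespace = `${prefix}${i}`; // assumed numbering scheme
  });
  return ordered;
}
```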

additionalInlineStyle: `?string`

If specified, the given inline style is added to all spines.

Only available if parseStyle is true.

Default: undefined

Read Options

force: `boolean`

If true, any exceptions that occur within the parser are ignored.

Default: false

basePath: `?string`

If specified, the base path of the paths used in spines and CSS is changed.

HTML: SpineItem

...
  <!-- Before -->
  <div>
    <img src="../Images/cover.jpg">
  </div>
  <!-- After -->
  <div>
    <img src="{basePath}/OEBPS/Images/cover.jpg">
  </div>
...

CSS: CssItem, InlineCssItem

/* Before */
@font-face {
  font-family: NotoSansRegular;
  src: url("../Fonts/NotoSans-Regular.ttf");
}
/* After */
@font-face {
  font-family: NotoSansRegular;
  src: url("{basePath}/OEBPS/Fonts/NotoSans-Regular.ttf");
}

Default: undefined
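The rewrite in these examples amounts to resolving each relative reference against the directory of the item that contains it, then prefixing basePath. A minimal sketch, assuming item hrefs are relative to the archive root (e.g. `OEBPS/Text/Section0001.xhtml`) and that absolute URLs, root-relative paths, and fragments are left untouched (an assumption); `rebase` is a hypothetical helper:

```javascript
const path = require('node:path');

// Sketch: rewrites a reference found in itemHref so that it points under basePath.
function rebase(basePath, itemHref, ref) {
  // Assumption: URLs with a scheme (http:, data:, ...), root paths, and fragments are kept.
  if (/^(?:[a-z][a-z0-9+.-]*:|\/|#)/i.test(ref)) return ref;
  const resolved = path.posix.normalize(path.posix.join(path.posix.dirname(itemHref), ref));
  return `${basePath}/${resolved}`;
}
```

For example, `../Images/cover.jpg` referenced from `OEBPS/Text/Section0001.xhtml` resolves to `{basePath}/OEBPS/Images/cover.jpg`, matching the HTML example.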

extractBody: `boolean|function`

If true, only the body element is extracted; otherwise, the full string is returned. If a function is given instead of true, it is used to transform the body.

false:

'<!doctype><html>\n<head>\n</head>\n<body style="background-color: #000000;">\n  <p>Extract style</p>\n  <img src=\"../Images/api-map.jpg\"/>\n</body>\n</html>'

true:

'<body style="background-color: #000000;">\n  <p>Extract style</p>\n  <img src=\"../Images/api-map.jpg\"/>\n</body>'

function:

readOptions.extractBody = (innerHTML, attrs) => {
  // Copy the body's attributes onto the new wrapper element.
  const string = attrs.map((attr) => ` ${attr.key}="${attr.value}"`).join('');
  return `<article${string}>${innerHTML}</article>`;
};

'<article style="background-color: #000000;">\n  <p>Extract style</p>\n  <img src=\"../Images/api-map.jpg\"/>\n</article>'

Default: false
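The function form receives the body's inner HTML and its attributes as `{ key, value }` pairs. A rough, regex-based sketch of how those two arguments could be produced before the callback is invoked; it assumes a single well-formed `<body>` element with double-quoted attribute values, and a real implementation would use an HTML parser. `extractBody` here is a hypothetical standalone function, not the library's internals.

```javascript
// Sketch: splits out the <body> attributes and inner HTML, then calls transform.
function extractBody(html, transform) {
  const match = /<body\b([^>]*)>([\s\S]*)<\/body>/i.exec(html);
  if (!match) return html; // no body element: return the input unchanged
  const attrs = [...match[1].matchAll(/([\w:-]+)="([^"]*)"/g)]
    .map(([, key, value]) => ({ key, value }));
  return transform(match[2], attrs);
}
```

Combined with the `<article>` callback above, it produces the string shown for the function case.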

serializedAnchor: `boolean`

If true, the file path in each anchor inside a spine is replaced with the index of the target spine.

...
<spine toc="ncx">
  <itemref idref="Section0001.xhtml"/> <!-- index: 0 -->
  <itemref idref="Section0002.xhtml"/> <!-- index: 1 -->
  <itemref idref="Section0003.xhtml"/> <!-- index: 2 -->
  ...
</spine>
...

<!-- Before -->
<a href="./Text/Section0002.xhtml#title">Chapter 2</a>
<!-- After -->
<a href="1#title">Chapter 2</a>

Default: false
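The serializedAnchor rewrite can be sketched as a lookup from spine href to spine index. This minimal version assumes link targets are resolved relative to the directory of the spine that contains them and that spine hrefs are archive-root paths; links that do not point at a spine (external URLs, same-document fragments) are left alone. `serializeAnchors` is a hypothetical helper, and the regex ignores edge cases such as single-quoted attributes.

```javascript
const path = require('node:path');

// Sketch: rewrites <a href="file#frag"> to <a href="index#frag"> for spine targets.
function serializeAnchors(html, currentHref, spineHrefs) {
  const indexOf = new Map(spineHrefs.map((href, i) => [href, i]));
  const dir = path.posix.dirname(currentHref);
  return html.replace(/(<a\b[^>]*\bhref=")([^"#]*)(#[^"]*)?"/gi, (whole, head, file, hash = '') => {
    if (!file) return whole; // same-document fragment such as "#top"
    const index = indexOf.get(path.posix.normalize(path.posix.join(dir, file)));
    return index === undefined ? whole : `${head}${index}${hash}"`;
  });
}
```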

ignoreScript: `boolean`

If true, all scripts within the HTML are ignored.

Default: false
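Assuming "ignored" means the script elements are dropped from the returned HTML, a rough regex-based sketch looks like this. Self-closing tags are removed first so that a `<script .../>` cannot swallow the markup up to a later `</script>`; the sketch also assumes no `</script>` appears inside string literals, and a parser-based approach is more robust. `stripScripts` is a hypothetical helper.

```javascript
// Sketch: removes self-closing and paired <script> elements from an HTML string.
function stripScripts(html) {
  return html
    .replace(/<script\b[^>]*\/>/gi, '') // <script src="..."/>
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, ''); // <script>...</script>
}
```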

removeAtrules: `string[]`

Removes the specified at-rules.

Default: []

removeTagSelector: `string[]`

Removes selectors that point to the specified tags.

Default: []

removeIdSelector: `string[]`

Removes selectors that point to the specified IDs.

Default: []

removeClassSelector: `string[]`

Removes selectors that point to the specified classes.

Default: []
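The four remove options can be pictured as filters over a stylesheet's rules. The sketch below uses a hypothetical, simplified model (at-rules carry a `name`, style rules carry a list of selector strings) rather than a real CSS AST, and it assumes at-rule names are given without the leading `@`. A rule whose selectors are all removed is dropped entirely. `filterRules` is a hypothetical helper.

```javascript
// Sketch over a simplified rule list; a real implementation would walk a parsed CSS AST.
function filterRules(rules, opts = {}) {
  const { removeAtrules = [], removeTagSelector = [], removeIdSelector = [], removeClassSelector = [] } = opts;
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const patterns = [
    // A tag must start the selector or follow whitespace, a combinator, a comma, or "(".
    ...removeTagSelector.map((t) => new RegExp(`(^|[\\s>+~,(])${escape(t)}(?![\\w-])`, 'i')),
    ...removeIdSelector.map((id) => new RegExp(`#${escape(id)}(?![\\w-])`)),
    ...removeClassSelector.map((c) => new RegExp(`\\.${escape(c)}(?![\\w-])`)),
  ];
  const out = [];
  for (const rule of rules) {
    if (rule.type === 'atrule') {
      if (!removeAtrules.includes(rule.name)) out.push(rule);
      continue;
    }
    const selectors = rule.selectors.filter((sel) => !patterns.some((re) => re.test(sel)));
    if (selectors.length > 0) out.push({ ...rule, selectors });
  }
  return out;
}
```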

License

MIT
