4.2.0 • Published 4 years ago

jmdict-streaming-parser v4.2.0

Weekly downloads: 189
License: MIT
Repository: gitlab
Last release: 4 years ago

jmdict-streaming-parser

Streaming parser for JMdict and related files.

API

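import { createGunzip } from 'zlib'
import { createReadStream } from 'fs'
import { JmdictTransform } from 'jmdict-streaming-parser'
import { pipeline } from 'stream'

// Stream style: decompress the dictionary and parse it while it is being read.
// pipeline() takes a completion callback as its last argument and returns the last stream.
pipeline(
  createReadStream("JMdict.gz"),
  createGunzip(),
  new JmdictTransform(),
  err => { if (err) console.error(err) }
).on('data', data => { console.log(data) })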

JmdictTransform

class JmdictTransform extends Duplex

A duplex stream that accepts XML data on its writable side and emits plain objects on its readable side, following the rules in § Object structure.

transform = new JmdictTransform(opts?: DuplexOptions)

Each object streamed from the transform has one of the following 4 types. The type name is stored in the property type, and the data itself is stored in the property data.

type === entity

An object with the keys name and value, representing a detected entity.

type === mdate

The modification date of the file, if detected, as a string.

type === entities

The value of transform.entities when mdate is encountered.

type === node

A child element of the root node, transformed according to the rules in § Object structure.
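
As a rough illustration of consuming these objects, the sketch below dispatches on the type property. The sample array, the collect helper, and the shape of each data value (including the date string and the entry contents) are assumptions made for the example, not output captured from the library.

```javascript
// Hypothetical sample of streamed objects; real ones come from JmdictTransform.
const sample = [
  { type: 'entity', data: { name: 'n', value: 'noun (common) (futsuumeishi)' } },
  { type: 'mdate', data: '2020-01-01' }, // placeholder date string
  { type: 'entities', data: { n: 'noun (common) (futsuumeishi)' } },
  { type: 'node', data: { ent_seq: ['1000000'] } },
]

// Hypothetical helper: sorts streamed objects into buckets by their type.
function collect(chunks) {
  const result = { entities: {}, mdate: null, nodes: [] }
  for (const { type, data } of chunks) {
    switch (type) {
      case 'entity':
        result.entities[data.name] = data.value
        break
      case 'mdate':
        result.mdate = data
        break
      case 'entities':
        // Snapshot of transform.entities; ignored in this sketch.
        break
      case 'node':
        result.nodes.push(data)
        break
      default:
        throw new Error(`Unexpected type: ${type}`)
    }
  }
  return result
}

const summary = collect(sample)
console.log(summary.mdate, summary.nodes.length)
```

With a real stream, the body of the loop could run inside the 'data' handler instead of over an array.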

Object structure

Each result object is transformed from the source XML.

  • Text nodes are transformed into a string value keyed by $text.
    • If the parent XML element has a text node as its only child, the resulting object is collapsed into just a string containing the text.
      • This exploits the fact that JMdict does not mix text nodes and XML elements under the same parent.
    • Text nodes whose sole content is a newline are ignored.
  • Each XML element is transformed into an object and appended to an array in its parent object, keyed by the name of the XML element.
    • Attributes of the element are merged into the object.
  • Children of the root node are streamed as output.
  • Entities are represented by the entity name.

This generalization is deliberate: it allows files similar to JMdict to be parsed as well.
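
The rules above can be sketched with a small converter over an in-memory tree. This is an illustration only, not the library's implementation: the input shape ({ name, attrs, children }), the toObject helper, and the sample entry are hypothetical. It also assumes that an element is collapsed to a bare string only when it has no attributes, which the rules do not state explicitly, and it leaves out entity handling.

```javascript
// Hypothetical element shape: { name, attrs, children }, where each child
// is either a string (text node) or another element.
function toObject(element) {
  // Text nodes whose sole content is a newline are ignored.
  const children = element.children.filter(child => child !== '\n')
  const hasAttrs = Object.keys(element.attrs).length > 0

  // Assumption: collapse to a bare string only when no attributes would be lost.
  if (children.length === 1 && typeof children[0] === 'string' && !hasAttrs) {
    return children[0]
  }

  const result = { ...element.attrs } // attributes are merged into the object
  for (const child of children) {
    if (typeof child === 'string') {
      result.$text = child // text nodes are keyed by $text
    } else {
      // Child elements are appended to an array keyed by the element name.
      if (!result[child.name]) result[child.name] = []
      result[child.name].push(toObject(child))
    }
  }
  return result
}

// Hypothetical entry, loosely modelled on a JMdict <entry> element.
const entry = {
  name: 'entry',
  attrs: {},
  children: [
    '\n',
    { name: 'ent_seq', attrs: {}, children: ['1000000'] },
    '\n',
    { name: 'sense', attrs: {}, children: [
      { name: 'gloss', attrs: { 'xml:lang': 'eng' }, children: ['example'] },
    ] },
  ],
}

const converted = toObject(entry)
console.log(JSON.stringify(converted))
// {"ent_seq":["1000000"],"sense":[{"gloss":[{"xml:lang":"eng","$text":"example"}]}]}
```

Note that every child element ends up in an array, even when it occurs only once, so consumers can always index or iterate without checking the type first.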

transform.entities

Maps entity names to entity values.
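
Because entities appear in the output by name, a lookup in this mapping recovers the full value. The sketch below is a guess at how that might look: describe is a hypothetical helper, the sample mapping stands in for transform.entities, and since it is not stated here whether the mapping is a Map or a plain object, the helper accepts both.

```javascript
// Hypothetical stand-in for transform.entities.
const entities = {
  n: 'noun (common) (futsuumeishi)',
  adv: 'adverb (fukushi)',
}

// Hypothetical helper: returns the entity value for a name,
// or the name itself when no such entity is known.
function describe(name, mapping) {
  const value = mapping instanceof Map ? mapping.get(name) : mapping[name]
  return value === undefined ? name : value
}

console.log(describe('n', entities))
console.log(describe('unknown', entities))
```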
