1.2.0 • Published 1 year ago

xml-obj-transform-stream v1.2.0

Weekly downloads
1
License
ISC
Repository
github
Last release
1 year ago

Not Maintained

xml-obj-transform-stream is no longer being actively developed and is being abandoned. At the time I last worked on it I was having trouble getting it to operate correctly. At the same time I was working on async-saxophone and having more success. That project also appears to enjoy more download activity.

Description

The purpose of this module is to convert XML into a either a stream or an asynchronous iterator of the nodes (events) encountered upon parsing the XML. The nodes that are output include start and end tags, text nodes, and other SAX parser events.

This module wraps around Saxophone. Saxophone implements a readable stream and then emits events as XML nodes are parsed. This module implements a transform stream where input is sent to Saxophone for parsing and the resultant events are then pushed out as objects to the writable side of the stream. This module alternatively implements an asynchronous generator function that uses Saxophone to parse XML from an iterator, producing an iterable sequence of nodes. In either case a succession of nodes is output, where each node is represented by small tuple-like array containing the node type and associated data.

Please refer to the Saxophone documentation for more details about how it parses XML, its limits, and for benchmarks.

The transform stream expects XML to be piped into it from a readable stream. The asynchronous iterator expects input from an iterator, which could be a readable stream.

Installation

This library works in Node.JS ≥10. To install with npm:

$ npm install --save xml-obj-transform-stream

Tests

To run tests, use the following commands:

$ git clone https://github.com/randymized/xml-obj-transform-stream.git
$ cd xml-obj-transform-stream
$ npm install
$ npm test

Transform Stream Example

const {XMLTransform}= require('xml-obj-transform-stream');
const {pipeline,Readable,Writable}= require('stream')

class ConsoleDirWritable extends Writable {
    constructor(options) {
        super(Object.assign({},options, {objectMode: true}))
    }
    _write(chunk,encoding,callback) {
        console.dir(chunk)
        callback()
    }
}

const options= {noEmptyText:true}
pipeline(
    Readable.from('<root><subpart>some text</subpart></root>'),
    new XMLTransform(options),
    new ConsoleDirWritable(),
    (err) => {
        if(err) {
            console.dir(err);
        }
    }
)

Output:

[ 'tagopen', 'root', '' ]
[ 'tagopen', 'subpart', '' ]
[ 'text', 'some text' ]
[ 'tagclose', 'subpart' ]
[ 'tagclose', 'root' ]

Asynchronous Iterator Example

const {xmlNodeGenerator}= require('xml-obj-transform-stream');
const {Readable}= require('stream')

async function consoledirWriter(iterator) {
    for await (tuple of iterator) {
        console.dir(tuple)
    }
}

const options= {noEmptyText:true}

consoledirWriter(
    xmlNodeGenerator(options,
        Readable.from('<root><subpart>some text</subpart></root>')
    )
).catch(console.error)

Output (same as from transform stream):

[ 'tagopen', 'root', '' ]
[ 'tagopen', 'subpart', '' ]
[ 'text', 'some text' ]
[ 'tagclose', 'subpart' ]
[ 'tagclose', 'root' ]

Documentation

Exports:

const {XMLTransform,xmlNodeGenerator,parseAttrs,parseEntities,AvailableNodes,OpenTagAttributeParser}= require('xml-obj-transform-stream');

  • XMLTransform is a transform stream class. It is one of the two main alternative components of this module. The constructor is documented below.

  • xmlNodeGenerator is an async generator function. It is one of the two main alternative components of this module. The function is documented below.

  • parseAttrs As a convenience, Saxophone.parseAttrs is exported by this package. It parses a string of XML attributes, such as would be output as a result of parsing an opening tag.

  • parseEntities As a convenience, Saxophone.parseEntities is exported by this package. It expands all XML entities in a string to the characters represented by the entities.

  • OpenTagAttributeParser is a simple transform stream that can follow the main XMLTransform stream in a pipeline. It will apply parseAttrs to any non-blank attribute strings in opening tags.

  • AvailableNodes a list of all the types of nodes output from this function in the form of an array of strings. The types of nodes are: tagopen, tagclose, text, cdata, comment,and processinginstruction. See the include option below for one possible use.

new XMLTransform(options)

Creates a new XML transform stream. The stream takes a stream of XML and outputs a stream of objects that represent each node encountered in the input XML.

(options):

  • include a list of node types to be output. See AvailableNodes above for a complete list. If option = {include:['tagopen','tagclose']}, for example, only opening and closing tags will be output. If include is not specified, all nodes will be output.
  • noEmptyText If truish, empty text nodes, or text that is all whitespace will not be output.
  • reportSelfClosing If truish, tagopen objects will include a fourth, true, element if the tag is self-closing (<tag />).
  • any additional options will be passed through to the Tranform constructor.

Output:

an XMLTransform object is a Transform stream with object mode output. It outputs objects representing each XML node encountered to whatever it is piped to. Each object is an array representing the node, where the first element of the array is the type of node and the other elements contain additional data about that node. The types of nodes and their output are:

  • tagopen: ['tagopen', tag-name, attr-string, is-self-closing]. -- attr-string may be parsed with parseAttrs to convert it into a key/value object -- is-self-closing is only present if the reportSelfClosing option is truish.
  • tagclose: ['tagclose', tag-name]
  • text: ['text',content]. Entities in the text may be parsed with the parseEntities function. If the noEmptyText option is truish, empty text and text consisting only of whitespace will not be output.
  • cdata: ['cdata',content]
  • commment: ['comment',content]
  • processinginstruction: ['processinginstruction',content]. Content of the processing instruction is not parsed.

xmlNodeGenerator(options,iterator)

An async generator function that takes an iterator of XML as an argument and produces an asynchronously iterable succession of nodes.

  • The options are the same as XMLTransform, except that any additional options will not be passed through to a Tranform constructor, since this is a function and not a subclass of Transform.
  • The output is similar to that from XMLTransform, except that they are in the form of an asynchronous iterator over the nodes.

Licence:

Released under the ISC license.

1.2.0

1 year ago

1.1.0

4 years ago

1.0.5

4 years ago

1.0.4

4 years ago

1.0.3

4 years ago

1.0.2

4 years ago

1.0.1

4 years ago