@giancosta86/wiki-transform v1.3.0
wiki-transform
Stream transforming raw XML into wiki pages

wiki-transform provides a WikiTransform hybrid stream for NodeJS: it takes XML chunks and outputs WikiPage objects.
It is an extremely fast stream, because it internally uses a SAX parser combined with a hyper-minimalist algorithm.
Last but not least, WikiTransform is a standard stream, so you can use it in pipelines, or you can manually control it via the usual stream methods.
Installation
npm install @giancosta86/wiki-transform
or
yarn add @giancosta86/wiki-transform
The public API entirely resides in the root package index, so you shouldn't reference specific modules.
Usage
Just create a new instance of WikiTransform - maybe passing options. You will then be able to:
add it to a pipeline - via a chain of .pipe() method calls, or via the pipeline() function provided by NodeJS
call its standard methods - like .write(), .end(), .on() and .once()
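The second option, driving the stream by hand, can be sketched as follows. This is only an illustration, not code from the package: feedManually is a hypothetical helper, it accepts any Transform so that the snippet runs even without the package installed, and it ignores backpressure for brevity.

```typescript
import { Transform } from "node:stream";

// Hypothetical helper (not part of wiki-transform): writes every chunk to the
// given transform by hand and resolves with whatever the stream emitted.
export function feedManually<T>(
  transform: Transform,
  chunks: readonly string[]
): Promise<T[]> {
  return new Promise<T[]>((resolve, reject) => {
    const emitted: T[] = [];

    // "data" fires once per item produced on the readable side
    transform.on("data", (item: T) => {
      emitted.push(item);
    });
    transform.once("end", () => resolve(emitted));
    transform.once("error", reject);

    // Backpressure is deliberately ignored in this sketch: with large inputs,
    // the boolean returned by write() should be honored
    for (const chunk of chunks) {
      transform.write(chunk);
    }
    transform.end();
  });
}
```

Assuming the package is installed, a call such as feedManually(new WikiTransform(), ["<page><title>A</title>", "<text>B</text></page>"]) should resolve with the WikiPage objects found in the input.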
Supported format
WikiTransform will create a WikiPage object whenever it encounters the following XML pattern:
<page>
  <title>The title</title>
  <text>The text</text>
</page>
with the following rules:
The order of the subfields is ignored
Additional subfields are ignored
Ancestor nodes are ignored
Whitespace is ignored
XML entities like &gt; are substituted with their actual characters
CDATA blocks within significant fields are correctly parsed, and can be freely mixed with non-CDATA text
In lieu of <page>, the root tag can be something else - just pass the related opening tag (without angle brackets) to the pageTag constructor option
Please note: this library does NOT support nested tags within the <text> element! To handle them, you should instead rely on dedicated SAX parsing.
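As an illustration of the rules above, here is a made-up input that combines several of them, together with a hypothetical splitIntoChunks helper that cuts it at arbitrary offsets. The sample content, the <wiki> ancestor and the helper are assumptions for demonstration purposes. The sketch also assumes that chunk boundaries may fall anywhere, as is usual with SAX-based streaming. No output of the library is claimed here.

```typescript
// Sample input exercising the rules: an ancestor node, an additional
// subfield, reversed subfield order, an XML entity and a CDATA block mixed
// with plain text. The content itself is invented for illustration.
export const sampleXml = `
<wiki>
  <page>
    <text>Plain text &gt; <![CDATA[a < b]]> more text</text>
    <id>90</id>
    <title>First page</title>
  </page>
</wiki>
`;

// Hypothetical helper: cuts the XML at fixed offsets, so that tags may be
// split across chunks, mimicking data arriving from a file or a socket.
export function splitIntoChunks(xml: string, chunkSize: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }

  const chunks: string[] = [];
  for (let offset = 0; offset < xml.length; offset += chunkSize) {
    chunks.push(xml.slice(offset, offset + chunkSize));
  }
  return chunks;
}
```

The resulting chunks can then be written to a WikiTransform instance one at a time, or wrapped in a source stream via Readable.from().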
Example
This basic but fairly general-purpose function:
extracts wiki pages from any source stream actually generating XML chunks - for example, an HTTP connection, or a file
outputs such WikiPage objects to the given target stream
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { WikiTransform } from "@giancosta86/wiki-transform";
export async function extractWikiPages(
  source: Readable,
  target: Writable
): Promise<void> {
  const wikiTransform = new WikiTransform();
  return pipeline(source, wikiTransform, target);
}
Constructor parameters
pageTag: if present, defines the tag opening each page, without angle brackets. Default: "page"
logger: a Logger interface, as exported by unified-logging. Default: no logger
highWaterMark: if present, passed to the base constructor
signal: if present, passed to the base constructor
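Since signal is forwarded to the base stream constructor, it can presumably be used to cancel processing, as with any NodeJS stream. The following sketch aborts a pipeline after a timeout. extractWithTimeout and createTransform are hypothetical names, and the transform is obtained from a factory so that the snippet does not depend on the package being installed.

```typescript
import { Readable, Transform, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: runs source -> transform -> target, aborting the
// transform via its AbortSignal if the pipeline takes longer than timeoutMs.
export async function extractWithTimeout(
  source: Readable,
  createTransform: (signal: AbortSignal) => Transform,
  target: Writable,
  timeoutMs: number
): Promise<void> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    // Aborting destroys the transform, which makes pipeline() reject
    await pipeline(source, createTransform(controller.signal), target);
  } finally {
    // Always release the timer, whether the pipeline succeeded or failed
    clearTimeout(timer);
  }
}
```

With the package installed, the factory could be something like (signal) => new WikiTransform({ signal, pageTag: "page" }). The target should be a Writable in object mode, since it receives WikiPage objects.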
Additional notes
As a convenience utility, especially for testing, the package also provides a wikiPageToXml() function, which converts a WikiPage to XML - using a CDATA block in every field.
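To give an idea of what "a CDATA block in every field" means, here is a rough stand-in written for illustration. It is NOT the package's wikiPageToXml(), whose exact output may differ. PageLike, toCdata and sketchPageToXml are hypothetical names, and the title/text shape is inferred from the supported XML pattern.

```typescript
// Assumed shape of a page, inferred from the <title> and <text> fields
// of the supported format; the real WikiPage type may differ.
export interface PageLike {
  title: string;
  text: string;
}

// Wraps a string in a CDATA block. A literal "]]>" cannot appear inside
// CDATA, so it is split across two adjacent blocks.
export function toCdata(value: string): string {
  const escaped = value.split("]]>").join("]]]]><![CDATA[>");
  return `<![CDATA[${escaped}]]>`;
}

// Hypothetical stand-in, for illustration only: emits one <page> element
// whose fields are both wrapped in CDATA.
export function sketchPageToXml(page: PageLike): string {
  return (
    "<page>" +
    `<title>${toCdata(page.title)}</title>` +
    `<text>${toCdata(page.text)}</text>` +
    "</page>"
  );
}
```

Because CDATA content is taken verbatim, characters such as & and < need no entity escaping in such output, which is convenient when building test fixtures.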
Further reference
For additional examples, please consult the unit tests in the source code repository.