@hgargg-0710/xml v0.1.1
xml
'xml' is a JavaScript library for parsing XML, generating XML text, and constructing XML trees.
Installation
npm install @hgargg-0710/xml
Documentation
The following is the outline of the package's exports.
NOTE: much of the terminology and many of the types come from parsers.js, on which this library is based.
Exports (list)
The top-level exports are the following:
generator (module)
char (module)
element (module)
tag (module)
comment (module)
entity (module)
generate (function)
parse (function)
Functions
function parse(xml: string): (XMLProlog | _XMLElement)[]
Parses the given XML text and returns it as an AST.
function generate(XMLAST: (XMLProlog | _XMLElement)[]): string
Takes an XML AST (the result of the parser) and converts it to the corresponding XML text.
NOTE: this is also the default export of the generator module.
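To make the AST-to-text direction concrete, here is a minimal, self-contained sketch of a recursive serializer. It is not the library's generate function (which, as described in the generator module, is built on a SourceGenerator and a TreeStream). The toXML and attrsToXML helpers, the { type, value } token shape, and the simplification of attribute values to plain strings are all assumptions made for illustration.

```javascript
// Hypothetical helper: serializes an attribute object into ' k="v"' pairs.
// Assumption: attribute values are plain strings here, which is simpler
// than the (XMLSubstring | XMLEntity)[] shape documented for the library.
function attrsToXML(attrs) {
  return Object.entries(attrs)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
}

// Hypothetical serializer: walks an array of { type, value } nodes and
// returns the corresponding XML text.
function toXML(nodes) {
  return nodes
    .map((node) => {
      switch (node.type) {
        case "XMLText":
          return node.value;
        case "XMLEntity":
          return `&${node.value};`;
        case "XMLComment":
          return `<!--${node.value}-->`;
        case "XMLProlog":
          return `<?${node.value.name}${attrsToXML(node.value.attrs)}?>`;
        case "_XMLElement": {
          const { name, attrs, value } = node.value;
          const open = `<${name}${attrsToXML(attrs)}`;
          // Elements without children are written in the single-tag form.
          return value.length === 0
            ? `${open}/>`
            : `${open}>${toXML(value)}</${name}>`;
        }
        default:
          throw new Error(`Unknown node type: ${node.type}`);
      }
    })
    .join("");
}

const sampleAST = [
  { type: "XMLProlog", value: { name: "xml", attrs: { version: "1.0" } } },
  {
    type: "_XMLElement",
    value: {
      name: "a",
      attrs: { id: "x" },
      value: [
        { type: "XMLText", value: "1 " },
        { type: "XMLEntity", value: "lt" },
        { type: "XMLText", value: " 2" },
        { type: "_XMLElement", value: { name: "b", attrs: {}, value: [] } }
      ]
    }
  }
];

console.log(toXML(sampleAST));
// <?xml version="1.0"?><a id="x">1 &lt; 2<b/></a>
```

For the real AST shape and the exact output of generate, the test directory of the package remains the authoritative source.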
Modules
The following are the modules of the package. They provide the means for parsing the intermediate representations of XML, as well as for constructing them.
generator
Various intermediate abstractions related to XML-generation.
const xmlGenerator: PredicateMap
The IndexMap that defines the XMLGenerator SourceGenerator, used as the primary component in the definition of the generate function.
const XMLGenerator: SourceGenerator
A SourceGenerator that takes an XMLStream and a Source as input and produces a Source containing the generated code in its .value property.
const XMLStream: Stream
A TreeStream function intended as input for XMLGenerator.
function XMLTree(XMLAST: (_XMLElement | XMLProlog)[]): Tree
Converts an XML AST (the result of the parser) into a Tree interface, so that it is TreeStream-convertible. It is a component of XMLStream.
char
This module contains abstractions related to the lowest level of parsing (the character level, initial tokenization).
Tokens
const QOpBrack: TokenType
A TokenType corresponding to the sequence <? inside an XML file.
const QClBrack: TokenType
A TokenType corresponding to the sequence ?> inside an XML file.
const OpSlBrack: TokenType
Corresponds to </ in an XML file.
const ClSlBrack: TokenType
Corresponds to /> in an XML file.
const CommentBeginning: TokenType
Corresponds to <!-- in an XML file.
const CommentEnding: TokenType
Corresponds to --> in an XML file.
const OpBrack: TokenType
Corresponds to < in an XML file (tokenized after all the other tokens that include <).
const ClBrack: TokenType
Corresponds to > in an XML file (tokenized after all the other tokens that include >).
const EqualitySign: TokenType
Corresponds to = in an XML file.
const Ampersand: TokenType
Corresponds to & in an XML file.
const Quote: TokenType
Corresponds to either ' or " in an XML file.
const Space: TokenType
Corresponds to a match of the /\s/ regular expression (a whitespace character) in the textual representation of an XML file.
const XMLSymbol: TokenType
Corresponds to an arbitrary solitary symbol in an XML file (note: only when the symbol cannot be categorized otherwise).
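The ordering noted for OpBrack and ClBrack (they are tried only after the longer tokens that contain < or >) can be illustrated with a minimal, self-contained sketch. It does not use the library or parsers.js: the tokenize function, the pattern table, and the { type, value } token shape are assumptions made for illustration only.

```javascript
// Hypothetical pattern table. Longer sequences come first, so that
// "<!--" is never split into "<" followed by "!--".
const patterns = [
  ["CommentBeginning", /^<!--/],
  ["CommentEnding", /^-->/],
  ["QOpBrack", /^<\?/],
  ["QClBrack", /^\?>/],
  ["OpSlBrack", /^<\//],
  ["ClSlBrack", /^\/>/],
  ["OpBrack", /^</],
  ["ClBrack", /^>/],
  ["EqualitySign", /^=/],
  ["Ampersand", /^&/],
  ["Quote", /^['"]/],
  ["Space", /^\s/],
  ["XMLSymbol", /^[\s\S]/] // fallback: any single remaining character
];

// Repeatedly takes the first pattern that matches at the start of the
// remaining text and emits one token for it.
function tokenize(xml) {
  const tokens = [];
  let rest = xml;
  while (rest.length > 0) {
    for (const [type, pattern] of patterns) {
      const match = pattern.exec(rest);
      if (match) {
        tokens.push({ type, value: match[0] });
        rest = rest.slice(match[0].length);
        break;
      }
    }
  }
  return tokens;
}

console.log(tokenize("<a/>").map((token) => token.type));
// [ 'OpBrack', 'XMLSymbol', 'ClSlBrack' ]
```

If OpBrack were listed before OpSlBrack or CommentBeginning in such a table, the longer tokens could never be produced, which is why the single-character brackets come last.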
Parsing
const xmlCharTokens: RegExpMap
An IndexMap used for defining XMLCharTokenizer, which is a component of the parse function.
const charTokenizer: PatternTokenizer
A PatternTokenizer based on xmlCharTokens.
function XMLCharTokenizer(xml: string): Token[]
Takes in a string of XML and returns an array of Tokens from the char module.
comment
This module contains abstractions related to the second XML parsing layer. Its only job is to separate comments from non-comment content.
Tokens
const XMLComment: TokenType
Represents an XML comment.
Parsing
function CommentParser(input: Stream): [XMLComment]
A function used for parsing XML comments. The XMLCommentParser StreamParser is based on it.
const XMLCommentParser: StreamParser
A parser that takes out all comments from the given Stream of Tokens (using CommentBeginning and CommentEnding). The tokens between each CommentBeginning and the following CommentEnding are replaced with a single XMLComment token.
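The replacement of a CommentBeginning ... CommentEnding run with a single token can be sketched as follows. This is an illustration over a plain array rather than a parsers.js Stream: the collapseComments function, the { type, value } token shape, and the choice to store the joined comment text as the value of the XMLComment token are assumptions, not the library's implementation.

```javascript
// Hypothetical token shape: { type: string, value: string }.
function collapseComments(tokens) {
  const result = [];
  let i = 0;
  while (i < tokens.length) {
    if (tokens[i].type !== "CommentBeginning") {
      result.push(tokens[i++]); // non-comment content passes through
      continue;
    }
    i++; // skip the CommentBeginning token itself
    let text = "";
    while (i < tokens.length && tokens[i].type !== "CommentEnding")
      text += tokens[i++].value;
    i++; // skip the CommentEnding token (if present)
    result.push({ type: "XMLComment", value: text });
  }
  return result;
}

const collapsed = collapseComments([
  { type: "CommentBeginning", value: "<!--" },
  { type: "XMLSymbol", value: "h" },
  { type: "XMLSymbol", value: "i" },
  { type: "CommentEnding", value: "-->" },
  { type: "OpBrack", value: "<" }
]);

console.log(collapsed);
// [ { type: 'XMLComment', value: 'hi' }, { type: 'OpBrack', value: '<' } ]
```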
entity
This is the third layer of parsing, which combines all the tokens between Ampersand and XMLSymbol(';').
Tokens
const XMLEntity: TokenType
Represents an XML entity (e.g. &amp; is the ampersand).
Parsing
function EntityParser(input: Stream): [XMLEntity]
A function used for parsing XML entities. The XMLEntityParser is based on it.
const XMLEntityParser: StreamParser
A StreamParser, expecting a Stream of tokens, that replaces all the tokens between Ampersand and XMLSymbol(';') with an XMLEntity. Entities survive all the other parsing stages and do not get evaluated further.
NOTE: This stage comes after the comment stage, because ampersands within comments are not considered to be entities, but ordinary character sequences.
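The following sketch illustrates why the stage order matters: because comments have already been collapsed into XMLComment tokens, an ampersand inside a comment is never seen as an Ampersand token. The collapseEntities function, the { type, value } token shape, and the choice to store the entity name as the token's value are assumptions for illustration, not the library's code.

```javascript
// Hypothetical token shape: { type: string, value: string }.
function collapseEntities(tokens) {
  const isSemicolon = (token) =>
    token.type === "XMLSymbol" && token.value === ";";
  const result = [];
  let i = 0;
  while (i < tokens.length) {
    if (tokens[i].type !== "Ampersand") {
      result.push(tokens[i++]); // includes already-collapsed comments
      continue;
    }
    i++; // skip the Ampersand
    let name = "";
    while (i < tokens.length && !isSemicolon(tokens[i]))
      name += tokens[i++].value;
    i++; // skip the closing ';' (if present)
    result.push({ type: "XMLEntity", value: name });
  }
  return result;
}

const merged = collapseEntities([
  { type: "XMLComment", value: " a & b " }, // the '&' here is plain text
  { type: "Ampersand", value: "&" },
  { type: "XMLSymbol", value: "l" },
  { type: "XMLSymbol", value: "t" },
  { type: "XMLSymbol", value: ";" }
]);

console.log(merged);
// [ { type: 'XMLComment', value: ' a & b ' }, { type: 'XMLEntity', value: 'lt' } ]
```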
tag
This is the fourth parsing step, and it produces tokens used for XML tag-representation.
Tokens
const XMLName: TokenType
An internal TokenType used for the Tag's name. It does not appear in the final AST, but it is used for the intermediate construction of Tag tokens.
const XMLText: TokenType
A sequence of non-entity and non-tag symbols that is a child of some element. Represents text data and survives all the parsing steps.
const XMLProlog: TokenType
A "prolog" token (<?xml ...?>, or one with some other name). Survives the other parsing layers. Its value has the shape { name: string, attrs: {[attrName: string]: [attrVal: (XMLSubstring | XMLEntity)[]]} }.
const XMLTag: TokenType
A basic opening tag token (e.g. <A ...>). Does not survive the later parsing procedures.
const XMLSingleTag: TokenType
A single XML tag (e.g. <A .../>). Does not survive the later parsing process.
const XMLClosingTag: TokenType
A token representing a closing XML tag (e.g. </A>). Does not survive further parsing.
const XMLAttribute: TokenType
An internal TokenType used for representing attributes (does not appear in the final AST).
Parsing
function TagArrayParser(input: Stream): [string, ...XMLAttribute[]]
Parses the next opening tag <X attr1="..." ...> and returns it as an array whose first element is the name and whose remaining elements are the attributes. Alters input.
function SingleTagArrayParser(input: Stream): [string, ...XMLAttribute[]]
Same as TagArrayParser, but for a single tag: <X attr1="..." ... />.
function PrologArrayParser(input: Stream): [string, ...XMLAttribute[]]
Same as TagArrayParser, but for a prolog tag: <?xml attr1="..." ... ?>.
function ClosingTagArrayParser(input: Stream): [string]
Same as TagArrayParser, but for a closing tag: </X>. As closing tags carry no attributes, the only element of the returned array is the name of the element.
function tagExtract(tagArray: [string, ...XMLAttribute[]]): {
	name: string
	attrs: { [k: string]: [v: (XMLSubstring | XMLEntity)[]] }
}
Converts the [string, ...XMLAttribute[]] array into the final { name, attrs } form, which survives all the other parsing layers.
function ClosingTagParser(input: Stream): [XMLClosingTag]
Parses a closing tag from beginning to end. Alters input.
function TextParser(input: Stream, parser: (input: Stream) => any[]): [XMLText, ...any[]]
Parses an XMLText from beginning to end, and also the next token from input (if present).
function SpaceParser(input: Stream, parser: (input: Stream) => any[]): any[]
Skips all the Spaces and returns the result of parsing the next token.
function TagParser(input: Stream, parser: (input: Stream) => any[]): [XMLTag | XMLSingleTag]
Parses either an XMLSingleTag or an XMLTag from beginning to end. Alters input.
function PrologParser(input: Stream, parser: (input: Stream) => any[]): [XMLProlog]
Parses an XMLProlog from beginning to end. Alters input.
const tagParser: PredicateMap
A PredicateMap on which the XMLTagParser is based.
const XMLTagParser: StreamParser
A StreamParser that transforms the previous layer of parsing into one containing tags.
Submodules
The module contains a single submodule, string, which handles the parsing of the strings used for attribute values.
string
Tokens
const XMLSubstring: TokenType
A TokenType representing an entity-free section of a given XML string "...".
Parsing
function StringParser(input: Stream): [XMLSubstring, XMLEntity?]
Parses the next XMLSubstring fragment together with, possibly, the entity that follows it. Alters input.
const xmlStringParser: PredicateMap
A PredicateMap on which the XMLStringParser is based.
const XMLStringParser: StreamParser
A StreamParser which, given a Stream that includes only XMLEntity tokens and tokens of any other type with a .value property, returns an (XMLSubstring | XMLEntity)[].
element
This is the fifth and final layer of parsing of an XML document. It converts tags into elements and structures their contents into a tree.
Tokens
const _XMLElement: TokenType
A TokenType representing final-level XML elements.
function XMLElement(
	name: string,
	attrs: { [attrName: string]: [attrVal: (XMLSubstring | XMLEntity)[]] },
	value: (_XMLElement | XMLText | XMLComment | XMLEntity)[]
): _XMLElement
A function that passes the three given values to _XMLElement as a single object ({ name, attrs, value }).
Parsing
function ElementParser(input: Stream): [_XMLElement]
Converts the current tag from the input Stream into an _XMLElement. Alters input.
const xmlParser: PredicateMap
A PredicateMap on which the XMLElementParser is based.
const XMLElementParser: StreamParser
A StreamParser representing the last parsing layer of XML. It takes in the Stream produced by the previous parsing layer.
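The conversion of a flat sequence of tag tokens into nested elements can be sketched with an explicit stack. This is one common way of doing it and is not claimed to be how ElementParser works internally. The buildTree function, the { type, value } token shape, and the { name } value assumed for XMLClosingTag are hypothetical; only the { name, attrs, value } element layout is taken from the description above.

```javascript
// Hypothetical input: a flat array of { type, value } tokens from the
// tag layer. Output: an array of top-level nodes with nested children.
function buildTree(tokens) {
  const root = { name: null, attrs: {}, value: [] }; // synthetic container
  const stack = [root];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type === "XMLTag" || token.type === "XMLSingleTag") {
      const body = {
        name: token.value.name,
        attrs: token.value.attrs,
        value: []
      };
      parent.value.push({ type: "_XMLElement", value: body });
      // Only an opening tag expects children and a closing tag.
      if (token.type === "XMLTag") stack.push(body);
    } else if (token.type === "XMLClosingTag") {
      if (stack.length === 1 || parent.name !== token.value.name)
        throw new Error(`Unexpected closing tag </${token.value.name}>`);
      stack.pop();
    } else {
      parent.value.push(token); // text, comments, entities, prologs
    }
  }
  if (stack.length > 1)
    throw new Error(`Unclosed tag <${stack[stack.length - 1].name}>`);
  return root.value;
}

const tree = buildTree([
  { type: "XMLTag", value: { name: "a", attrs: {} } },
  { type: "XMLText", value: "hi" },
  { type: "XMLSingleTag", value: { name: "b", attrs: {} } },
  { type: "XMLClosingTag", value: { name: "a" } }
]);

console.log(JSON.stringify(tree, null, 2));
```

In this sketch the XMLTag, XMLSingleTag and XMLClosingTag tokens are consumed and only element, text, comment, entity and prolog nodes remain, which mirrors the statement that the tag tokens do not survive the later parsing.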
Usage
For usage examples and the precise structure of the input and output of the final parse and generate functions, see the test directory.