@ziord/robin

Robin is an XML parser and processing library that supports a sane version of HTML. It features a set of DOM utilities, including XPath 1.0 support, for interacting with and manipulating XML/HTML documents. Typical use cases include processing XML or HTML files and web scraping. Note that Robin is a non-validating parser: DTD structures are not used to validate the markup document.

Quick Start

All samples below are for the Node.js runtime.

Parsing a Document

JavaScript

const { Robin } = require("@ziord/robin");

// "XML" is the default mode; pass "HTML" for HTML documents
const robin = new Robin("<tag id='1'>some value<data id='2'>123456</data></tag>", "XML");

// pretty-printing the document
console.log(robin.prettify());

// alternatively
// const root = new Robin().parse("...some markup...");
// console.log(root.prettify());

TypeScript

import { Robin } from "@ziord/robin";

const robin = new Robin("<div id='1'>some value<span id='2'>123456</span></div>", "HTML"); // mode "HTML" for HTML documents
console.log(robin.prettify());

Finding an Element Using the DOM API

By Name

JavaScript

// find "data" element
const element = robin.dom(robin.getRoot()).find("data");

// pretty-print the element
console.log(element.prettify());

TypeScript

// find "span" element
import { ElementNode } from "@ziord/robin";

const element = robin.dom(robin.getRoot()).find<ElementNode>("span")!;

// pretty-print the element
console.log(element.prettify());

By Filters

JavaScript

const { DOMFilter } = require("@ziord/robin");

const root = robin.getRoot();
// find the first "data" element
robin.dom(root).find({filter: DOMFilter.ElementFilter("data")});

// find the first element having attribute "id"
robin.dom(root).find({filter: DOMFilter.AttributeFilter("id")});

// find the first element having attributes "id", "foo"
robin.dom(root).find({filter: DOMFilter.AttributeFilter(["id", "foo"])});

// find the first element having attribute "id"="2"
robin.dom(root).find({filter: DOMFilter.AttributeFilter({ id: "2" })});

// find the first "data" element having attribute "id"="2"
robin.dom(root).find({filter: DOMFilter.ElementFilter("data", { id: "2" })});

The TypeScript variant follows the same logic. The API also provides many other utility functions.
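The three argument shapes passed to AttributeFilter in the sample above (a single name, a list of names, and a name/value map) can be pictured as one predicate over an element's attributes. The following is only a minimal sketch of that idea: `Attributes`, `AttributeQuery`, and `hasAttributes` are hypothetical names that are not part of Robin, and the assumption that a list means "all names present" is inferred from the comments in the sample rather than from the library's source.

```typescript
// Hypothetical model of the AttributeFilter argument shapes; NOT Robin's implementation.
type Attributes = Record<string, string>;
type AttributeQuery = string | string[] | Record<string, string>;

// True when `attrs` owns an attribute called `name` (ignores inherited properties).
function owns(attrs: Attributes, name: string): boolean {
  return Object.prototype.hasOwnProperty.call(attrs, name);
}

function hasAttributes(attrs: Attributes, query: AttributeQuery): boolean {
  if (typeof query === "string") {
    // Single name: the attribute only has to be present.
    return owns(attrs, query);
  }
  if (Array.isArray(query)) {
    // List of names: assumed to mean every listed attribute is present.
    return query.every((name) => owns(attrs, name));
  }
  // Name/value map: every listed attribute must be present with that exact value.
  return Object.keys(query).every(
    (name) => owns(attrs, name) && attrs[name] === query[name]
  );
}

// Attributes of the <data id='2'> element from the parsing sample.
const dataAttrs: Attributes = { id: "2" };
console.log(hasAttributes(dataAttrs, "id"));          // true
console.log(hasAttributes(dataAttrs, ["id", "foo"])); // false
console.log(hasAttributes(dataAttrs, { id: "2" }));   // true
```

Under this reading, the `["id", "foo"]` filter would skip both elements of the sample document, since neither carries a "foo" attribute.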

Finding an Element Using XPath

By Queries

JavaScript

// find "data" element
const element = robin.path(robin.getRoot()).queryOne("/tag/data");

// pretty-print the element
console.log(element.prettify());

TypeScript

// find "span" element
import { ElementNode } from "@ziord/robin";

const element = robin.path(robin.getRoot()).queryOne<ElementNode>("//span")!;

// pretty-print the element
console.log(element.prettify());

The XPath API also provides other utilities, such as query and queryAll.

Finding an Attribute

From an element

JavaScript

// find "attributeKey" attribute
const attribute = element.getAttributeNode("attributeKey");
console.log(attribute.prettify());

From the DOM using the DOM API

JavaScript

// find "attributeKey" attribute from any "foo" element
const attribute = robin.dom(robin.getRoot()).findAttribute("foo", "attributeKey");
console.log(attribute.prettify());
console.log("key:", attribute.name.qname, "value:", attribute.value);

From the DOM using the XPath API

TypeScript

import { AttributeNode } from "@ziord/robin";
// find "attributeKey" attribute from any "foo" element
const attribute = robin.path(robin.getRoot()).queryOne<AttributeNode>("//foo[@attributeKey]/@attributeKey")!;
console.log("key:", attribute.name.qname, "value:", attribute.value);

Finding a Text

From the DOM using the DOM API

TypeScript

import { TextNode } from "@ziord/robin";
// find a text node containing the given value, ignoring case
const text = robin.dom(robin.getRoot()).find<TextNode>({text: { value: "some part of the text", match: "partial-ignoreCase" }})!; // match: "partial" | "exact" | "partial-ignoreCase" | "exact-ignoreCase"
console.log(text.stringValue());

From the DOM using the XPath API

TypeScript

import { TextNode } from "@ziord/robin";
// find the first text node in the document
const text = robin.path(robin.getRoot()).queryOne<TextNode>("(//text())[1]")!;
console.log(text.stringValue());
console.log(text.prettify());

Finding a Comment

TypeScript

import { CommentNode } from "@ziord/robin";
// find a comment
const comment = robin.dom(robin.getRoot()).find<CommentNode>({comment: { value: "some part of the comment", match: "partial" }})!; // match: "partial" | "exact" | "partial-ignoreCase" | "exact-ignoreCase"
console.log(comment.stringValue());
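The text and comment searches above both take a `match` option with four modes. As a rough illustration of how those modes plausibly differ, here is a minimal sketch; `MatchMode` and `matchesText` are hypothetical names, and the semantics (substring versus whole-string comparison, with or without case folding) are an assumption based on the mode names, not a copy of Robin's implementation.

```typescript
// Hypothetical model of the four `match` modes; NOT Robin's implementation.
type MatchMode = "partial" | "exact" | "partial-ignoreCase" | "exact-ignoreCase";

function matchesText(nodeText: string, value: string, mode: MatchMode): boolean {
  // The "-ignoreCase" modes compare lower-cased copies of both strings.
  const ignoreCase = mode.indexOf("ignoreCase") !== -1;
  const haystack = ignoreCase ? nodeText.toLowerCase() : nodeText;
  const needle = ignoreCase ? value.toLowerCase() : value;

  // "exact" modes require the whole string to be equal;
  // "partial" modes accept the value anywhere inside the node's text.
  if (mode.indexOf("exact") === 0) {
    return haystack === needle;
  }
  return haystack.indexOf(needle) !== -1;
}

console.log(matchesText("Some Value", "some", "partial"));                // false
console.log(matchesText("Some Value", "some", "partial-ignoreCase"));     // true
console.log(matchesText("Some Value", "Some Value", "exact"));            // true
console.log(matchesText("Some Value", "some value", "exact-ignoreCase")); // true
```

If the library behaves as assumed here, "partial-ignoreCase" is the most forgiving choice for scraping, while "exact" is the strictest.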

Extracting Text From an Element

JavaScript

// get the element's textual content
let text = robin.dom(element).text(); // string
console.log(text);

// alternatively
text = element.stringValue();
console.log(text);

See the web scraper example for more usage.