2.0.0 • Published 8 years ago
html-soup v2.0.0
A Node.js package for basic HTML parsing and CSS selector matching.
Usage
Parsing
htmlSoup.parse(htmlString, trimText = true) -> DOM
- htmlString: The HTML to parse (string or Buffer). If an & is used followed by an alphanumeric character or #, it will be assumed to start an HTML escape sequence. If a tag that is supposed to have a closing tag does not have one, it will be assumed to continue until a closing tag that doesn't close an inner element or the end of the document is reached. Closing tags will close the innermost open tag preceding them regardless of whether the types match.
- trimText: Whether to trim all text (removing leading or trailing whitespace) between HTML tags. If the trimmed text is empty, no text node will be created.
- DOM format: Either a single TextNode or HtmlTag, or an array of instances of either class. TextNode has a single field, text, containing the text inside. HtmlTag has the following fields:
  - type: The HTML tag type, e.g. div. If the document uses an uppercase tag, this field's value will be uppercased as well.
  - attributes: An Object mapping attribute names to string values if provided, or true if no value is provided. For example, <input type = "checkbox" checked /> gives an attributes value of {type: 'checkbox', checked: true}. Attributes are automatically lower-cased.
  - children: An Array of child nodes. Each is either a TextNode or HtmlTag.
  - parent: The parent HtmlTag. On the root node, this field has the value null.
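To make the DOM format concrete, here is a minimal sketch that walks a tree of the documented shape. It uses plain-object stand-ins rather than the library's own TextNode and HtmlTag classes, and the helpers isTextNode, collectText, and rootOf are hypothetical names introduced for illustration. The hand-built tree mirrors the output shown under Examples for '<div id="one">Hi</div>', including the wrapper tag whose type is null.

```javascript
// Plain-object stand-ins for the documented node shapes (not the library's classes).
const root = { type: null, attributes: {}, parent: null, children: [] };
const div = { type: 'div', attributes: { id: 'one' }, parent: root, children: [] };
div.children.push({ text: 'Hi' });
root.children.push(div);

// Assumption: a node with a string `text` field is a TextNode; anything else is an HtmlTag.
function isTextNode(node) {
  return typeof node.text === 'string';
}

// Concatenate the text of all TextNode descendants in document order.
function collectText(node) {
  if (isTextNode(node)) return node.text;
  return node.children.map(collectText).join('');
}

// Follow `parent` links upward until reaching the node whose parent is null.
function rootOf(tag) {
  let current = tag;
  while (current.parent !== null) current = current.parent;
  return current;
}

console.log(collectText(root)); // prints "Hi"
console.log(rootOf(div) === root); // prints true
```

With the real package, the same functions should work on the result of htmlSoup.parse() as long as the nodes expose the fields listed above. If parse returns an array, apply them to each element.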
When navigating the DOM tree, you can use htmlTag.child to get the first child of a tag. htmlTag.classes gives a Set of the tag's classes.
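The two accessors can be approximated with plain functions, which may help when reasoning about what they return. This is only a sketch: firstChild and classSet are hypothetical helpers, not part of html-soup, and the handling of a tag with no children or no class attribute is an assumption rather than documented behavior.

```javascript
// Hypothetical stand-in for htmlTag.child.
function firstChild(tag) {
  // Assumption: undefined when the tag has no children.
  return tag.children[0];
}

// Hypothetical stand-in for htmlTag.classes.
function classSet(tag) {
  const value = tag.attributes.class;
  // Assumption: a missing or valueless class attribute yields an empty Set.
  if (typeof value !== 'string') return new Set();
  // Split on whitespace and drop empty pieces left by leading or trailing spaces.
  return new Set(value.split(/\s+/).filter(Boolean));
}

// Plain object with the documented HtmlTag fields.
const tag = {
  type: 'div',
  attributes: { class: 'one two three' },
  parent: null,
  children: [{ text: 'Hi' }],
};

console.log(firstChild(tag)); // { text: 'Hi' }
console.log([...classSet(tag)]); // [ 'one', 'two', 'three' ]
```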
Selecting
htmlSoup.select(dom, selectorString) -> Set<HtmlTag>
- dom: DOM tree to search through (presumably an output of htmlSoup.parse())
- selectorString: A CSS selector string specifying which elements to select. Allowed parts of the selector (can be combined):
  - *: select elements of any type
  - tag: select elements of type tag (case-insensitive)
  - .class: select elements of class class
  - #id: select elements of id id
  - selector1 selector2: select elements matching selector2 that are descendants of elements matching selector1
  - selector1 > selector2: select elements matching selector2 that are children of elements matching selector1
  - selector1 + selector2: select elements matching selector2 that are siblings of and directly follow elements matching selector1
  - selector1 ~ selector2: select elements matching selector2 that are siblings of and follow elements matching selector1
  - selector1, selector2: select elements matching either selector1 or selector2
  - [attr]: select elements with attribute attr present
  - [attr=val] or [attr="val"]: select elements with attribute attr having the value val
  - [attr~=val] or [attr~="val"]: select elements with attribute attr's value containing val, with val preceded by a hyphen, a space, or the start of the value, and followed by a hyphen, a space, or the end of the value
  - [attr|=val] or [attr|="val"]: select elements with attribute attr's value starting with val, followed by a hyphen, a space, or the end of the value
  - [attr^=val] or [attr^="val"]: select elements with attribute attr's value starting with val
  - [attr$=val] or [attr$="val"]: select elements with attribute attr's value ending with val
  - [attr*=val] or [attr*="val"]: select elements with attribute attr's value containing val
  - These CSS pseudo-classes are also supported: :checked, :disabled, :empty, :first-child, :first-of-type, :indeterminate, :last-child, :last-of-type, :only-child, :only-of-type, :optional, :required, :root
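The attribute operators are easy to mix up, so the sketch below restates them as a single function, following the wording of the list above. It is not html-soup's implementation: matchesAttr and escapeRegExp are hypothetical helpers, and treating a valueless attribute (stored as true) as a non-match is an assumption.

```javascript
// Escape regex metacharacters so `val` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: does an attribute value satisfy `[attr<op>val]`?
function matchesAttr(value, op, val) {
  // Assumption: a missing attribute, or one present without a value (true), never matches.
  if (typeof value !== 'string') return false;
  const v = escapeRegExp(val);
  switch (op) {
    case '=':
      return value === val;
    case '~=':
      // val bounded on both sides by a hyphen, a space, or an end of the value
      return new RegExp(`(^|[- ])${v}([- ]|$)`).test(value);
    case '|=':
      // val at the start, followed by a hyphen, a space, or the end of the value
      return new RegExp(`^${v}([- ]|$)`).test(value);
    case '^=':
      return value.startsWith(val);
    case '$=':
      return value.endsWith(val);
    case '*=':
      return value.includes(val);
    default:
      throw new Error(`Unknown operator: ${op}`);
  }
}

console.log(matchesAttr('red yellow', '~=', 'yellow')); // prints true
console.log(matchesAttr('en-US', '|=', 'en')); // prints true
console.log(matchesAttr('english', '|=', 'en')); // prints false
```

Note that, as described above, both ~= and |= accept a hyphen or a space as a boundary, which is broader than the standard CSS definitions of those operators.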
Examples
const htmlSoup = require('html-soup');

let dom = htmlSoup.parse('<div id="one">Hi</div>');
/*
HtmlTag {
  type: 'div',
  attributes: { id: 'one' },
  parent:
   HtmlTag {
     type: null,
     attributes: {},
     parent: null,
     children: [ [Circular] ] },
  children: [ TextNode { text: 'Hi' } ] }
*/
let text = dom.child; // TextNode { text: 'Hi' }

let firstYellow = htmlSoup.select(
  htmlSoup.parse('<p>One</p><p class="red yellow">Two</p><p class="yellow">Three</p>'),
  'p.yellow:first-of-type'
);
/*
[ HtmlTag {
    type: 'p',
    attributes: { class: 'red yellow' },
    parent: HtmlTag { type: null, attributes: {}, parent: null, children: [Object] },
    children: [ TextNode { text: 'Two' } ] } ]
*/

let classes = htmlSoup.parse('<div class="one two three"></div>').classes;
// Set { 'one', 'two', 'three' }