0.2.10 • Published 2 years ago

@candlelib/html v0.2.10

License
MIT
Repository
github

CandleLibrary HTML is an HTML parser that builds a node graph of HTML elements. It provides methods for hooking into the parsing process to generate custom HTML node graphs.

Install

NPM

npm install --save @candlelib/html

Usage

Note: This script uses ES2015 module syntax and has the extension .mjs. To include it in a project, you may need to run Node.js with the --experimental-modules flag, or use a bundler that supports ES modules, such as rollup.

import html from "@candlelib/html"

html(`<div><a>hello world!</a></div>`).then(root=>{
	root.tag //=> div
	root.get //
})

Notes

CandleLibrary HTML makes use of a non-standard attribute to provide asynchronous HTML building. The url attribute can be used to fetch arbitrary data and insert it into the inner HTML of the element that has the attribute.

e.g.

<!--file src.html -->
	<h1>
		<button style="background-color:red">Don't Touch</button>
	</h1>

In JavaScript

//javascript file in same folder
html(`<div url="./src.html"></div>`).then( root=>{

	const button = root.getTag("button", true)[0];

	button.toString() //=> "<button style="background-color:red">Don't Touch</button>"
})

Members

HTMLNode

mixin @candlelib/ll - tree

import {HTMLNode} from "@candlelib/html"

Constructor

new HTMLNode ( )

Properties

  • class - String The class attribute value.

  • classList - Array Array of all class values.

  • DTD - Boolean True if the HTMLNode is a DTD element, such as a comment or <!DOCTYPE>.

  • id - String The id attribute value.

  • nextElementSibling - HTMLNode Returns the next sibling HTMLNode or null

  • parentElement - HTMLNode Returns the parent HTMLNode or null.

  • previousElementSibling - HTMLNode Returns the previous sibling HTMLNode or null.

  • single - Boolean True if the element is a single tag element, such as <input>.

  • tag - String The tag name of the object.

  • tagName - String Same as tag.

  • type (Read-Only) - Number 0 (HTML).

  • url - CandleLibrary URL If the element tag in the original HTML string contained an attribute named url, then the value of that attribute is applied to url.

Methods

  • HTMLElement - build ( parent ) Builds an HTMLElement tree from parsed nodes. If an HTMLNode is passed as parent, the HTMLElements will be appended to parent.
  • Object - getAttrib ( prop ) Returns the value of the attribute whose name matches prop, or null if no attribute matches.
  • Array - getClass ( class_name [ , INCLUDE_DESCENDANTS , array ] ) Returns an array of HTMLNodes whose class attribute contains a value that matches class_name. If INCLUDE_DESCENDANTS is set to true, all descendants of the node will be searched; otherwise only the immediate children of the node will be searched. An optional Array can be passed as array to store the results in.
  • Array - getID ( id [ , INCLUDE_DESCENDANTS , array ] ) Returns an array of HTMLNodes whose id property matches id. If INCLUDE_DESCENDANTS is set to true, all descendants of the node will be searched; otherwise only the immediate children of the node will be searched. An optional Array can be passed as array to store the results in.
  • Array - getTag ( tag [ , INCLUDE_DESCENDANTS , array ] ) Returns an array of HTMLNodes whose tag property matches tag. If INCLUDE_DESCENDANTS is set to true, all descendants of the node will be searched; otherwise only the immediate children of the node will be searched. An optional Array can be passed as array to store the results in.
  • Promise - parse ( lex , url ) Parses an HTML string. Accepts a Whind Lexer or a string as the value for lex.
  • String - toString ( offset ) Returns a string representation of the HTMLNode. This rebuilds the original HTML string starting at the calling node. A number can be passed as offset to indent the string by offset spaces.
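
The effect of INCLUDE_DESCENDANTS and the optional array argument can be illustrated with a small stand-in model. This is not the library's code: collectByTag is a hypothetical function, and the plain objects with a children array are an assumption made for the sketch (the real nodes use the linked-list mixin from @candlelib/ll).

```javascript
// Stand-in model of the documented search behaviour; NOT the library's code.
// Nodes are plain objects with a `tag` string and a `children` array
// (an assumption made for this sketch only).
function collectByTag(node, tag, includeDescendants = false, results = []) {
	for (const child of node.children) {
		if (child.tag === tag) results.push(child);
		// Recurse only when the caller asked for all descendants.
		if (includeDescendants) collectByTag(child, tag, true, results);
	}
	return results;
}

// Same shape as the src.html example: div > h1 > button.
const button = { tag: "button", children: [] };
const h1 = { tag: "h1", children: [button] };
const root = { tag: "div", children: [h1] };

// Only the immediate children of root are checked, so nothing is found.
const direct = collectByTag(root, "button");

// The whole subtree is checked, so the nested button is found.
const deep = collectByTag(root, "button", true);

// Results are appended to a caller-supplied array when one is passed.
const store = [];
collectByTag(root, "h1", false, store);
```

This is why the earlier example calls root.getTag("button", true): the button is a grandchild of the root div, so a search limited to immediate children would not reach it.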

Private

  • TextNode - createTextNode ( lex , start , end ) Called by parseRunner to create a new TextNode.
  • parseOpenTag ( lex , DTD , old_wurl ) Called by parseRunner to parse an open HTML tag.
  • parseRunner ( lex , OPENED , IGNORE_TEXT_TILL_CLOSE_TAG , parent , last_url) Called by various methods to continue parsing an HTML input string.

Hooks - Methods that can be overridden in derived objects

  • HTMLNode - createHTMLNodeHook ( tag , start ) Override this method to create a different node type for the given value of tag. The start value is the character position offset at the start of the element open tag.

    If overridden, returned object should support:

    • Linked List methods and properties provided by @candlelib/ll mixins.
    • All properties and methods in HTMLNode
  • Boolean - endOfElementHook ( lex , parent ) Override this method to hook into the last stage of element parsing. lex will be set to just after the close tag of the element within the input string. The value of lex.off combined with the start value passed in createHTMLNodeHook defines the bounds of the element in the input string, starting at the beginning of the open tag (start) and running through the > character of the close tag (lex.off). parent is the parent HTMLNode.
  • Boolean - ignoreTillHook ( tag ) Override this method and return true to tell the parser not to parse the inner HTML data of a tag and to simply skip over it.
  • Object - processAttributeHook ( name , lex ) Override this method to parse attribute data. The object returned by this function should contain name and value properties so that it works with the getAttrib function, e.g. return {name:"id", value:"mango"}. If null is returned instead, nothing will be inserted into the attributes array.
    • name is a string value with the name of the attribute in the original HTML.
    • lex is a fenced Whind Lexer that contains the string value of the attribute.
  • Promise or null - processFetchHook ( lexer , OPENED , IGNORE_TEXT_TILL_CLOSE_TAG , parent , url ) Override this method to process how a url based resource is fetched.
    	> If overridden:  
    	>
    	> - This function should return either **null** or a **Promise**. If a **Promise** is returned, the parser will wait until the promise is resolved. This enables external content to be fetched and parsed.
    	>	
    	> - If you want to continue processing the returned data with the HTMLNode parse mechanism, call `this.parseRunner`, and pass the string value of the fetched data wrapped in a [Whind Lexer](https://github.com/galactrax/cfw-whind), **OPENED** , **IGNORE_TEXT_TILL_CLOSE_TAG**, **parent**, and **url** to the function. Passing these values will preserve the state of the parser.
    	>
    	> e.g:
    	> ```javascript
    	> import whind from "@candlelib/whind"
    	> /*...
    	>   ...
    	>   ...*/
    	> DerivedNode.prototype.processFetchHook = function(lexer, OPENED, IGNORE_TEXT_TILL_CLOSE_TAG, parent, url){
    	> 	// Return the whole promise chain so the parser waits for the fetched content.
    	> 	return fetch(url)
    	> 		.then(res => res.text())
    	> 		.then(txt => this.parseRunner(whind(txt), OPENED, IGNORE_TEXT_TILL_CLOSE_TAG, parent, url))
    	> }
    	> ```
    	> **Warning**: It is up to the implementer to follow best practices when dealing with external data with regard to client and server safety. Additional issues can occur if URL recursion is not taken into account, which can lead to an infinite fetching loop within the parser! Check that the URL has not already been fetched by an ancestor HTMLNode before attempting to fetch a resource.
  • TextNode - processTextNodeHook ( lex , IS_INNER_HTML ) Override this to process inner HTML text before creating and returning a TextNode. If null is returned, then the text data will be omitted from the resulting HTMLNode tree.
    • lex is a fenced Whind Lexer that contains the raw text data that is to be inserted into the TextNode.
    • IS_INNER_HTML is a Boolean value set to true if the lex data contains the entirety of the element's inner HTML. If false, then the data is the text between sibling HTMLNodes.
  • Boolean - selfClosingTagHook ( tag ) Override this method and return true to tell the parser that the HTML tag named tag is self-closing and that it should not look for a matching close tag. e.g. return (tag === "input") ? true : false;
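
A minimal sketch of how some of the hooks above might be written. The functions are shown as standalone functions so they can be read in isolation; in practice each body would be assigned to the prototype of a derived node type. VOID_TAGS and urlFetchedByAncestor are hypothetical names introduced for this sketch. The guard also assumes that a node's url value converts to the URL text through String(), which is not confirmed by this document.

```javascript
// Tags treated as self-closing in this sketch (an illustrative subset).
const VOID_TAGS = new Set(["br", "hr", "img", "input", "link", "meta"]);

// Candidate body for selfClosingTagHook ( tag ).
function selfClosingTagHook(tag) {
	return VOID_TAGS.has(String(tag).toLowerCase());
}

// Candidate body for ignoreTillHook ( tag ): skip the inner data of these tags.
function ignoreTillHook(tag) {
	const name = String(tag).toLowerCase();
	return name === "script" || name === "style";
}

// Hypothetical helper for processFetchHook: walks up the tree through
// parentElement and reports whether the given node or any of its ancestors
// already carries the same url. Assumes String(node.url) yields the URL text.
function urlFetchedByAncestor(parent, url) {
	for (let node = parent; node; node = node.parentElement) {
		if (node.url && String(node.url) === String(url)) return true;
	}
	return false;
}

// Plain-object stand-ins for HTMLNodes, used only to exercise the guard.
const page = { url: "./src.html", parentElement: null };
const inner = { url: null, parentElement: page };

const wouldLoop = urlFetchedByAncestor(inner, "./src.html");   // true
const isNew = urlFetchedByAncestor(inner, "./other.html");     // false
```

Inside an overridden processFetchHook, a check of this kind could return null before calling fetch when the URL was already loaded higher up the tree, which avoids the infinite fetching loop described in the warning.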

TextNode

mixin @candlelib/ll - tree

import {TextNode} from "@candlelib/html"

Constructor

new TextNode ( str )

Properties

  • txt - String The string contents of the node.
  • type (Read-Only) - Number 1 (TEXT)

Methods

  • HTMLTextNode - build ( ) Builds and returns an HTMLTextNode.
  • String - toString ( offset ) Returns a string representation of the TextNode. A number can be passed as offset to indent the string by offset spaces.