12.3.0 • Published 1 year ago

mdast-util-to-hast-cooek v12.3.0

mdast-util-to-hast

mdast utility to transform to hast.

What is this?

This package is a utility that takes an mdast (markdown) syntax tree as input and turns it into a hast (HTML) syntax tree.

When should I use this?

This project is useful when you want to deal with ASTs and turn markdown to HTML.

The hast utility hast-util-to-mdast does the inverse of this utility. It turns HTML into markdown.

The remark plugin remark-rehype wraps this utility to also turn markdown to HTML at a higher-level (easier) abstraction.

Install

This package is ESM only. In Node.js (version 14.14+ and 16.0+), install with npm:

npm install mdast-util-to-hast

In Deno with esm.sh:

import {toHast} from 'https://esm.sh/mdast-util-to-hast@12'

In browsers with esm.sh:

<script type="module">
  import {toHast} from 'https://esm.sh/mdast-util-to-hast@12?bundle'
</script>

Use

Say we have the following example.md:

## Hello **World**!

…and next to it a module example.js:

import fs from 'node:fs/promises'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'

const markdown = String(await fs.readFile('example.md'))
const mdast = fromMarkdown(markdown)
const hast = toHast(mdast)
const html = toHtml(hast)

console.log(html)

…now running node example.js yields:

<h2>Hello <strong>World</strong>!</h2>

API

This package exports the identifiers defaultHandlers and toHast. There is no default export.

toHast(tree[, options])

Transform mdast to hast.

Parameters
  • tree (MdastNode): mdast tree
  • options (Options, optional): configuration

Returns

hast tree (HastNode | null | undefined).

Notes
HTML

Raw HTML is available in mdast as html nodes and can be embedded in hast as semistandard raw nodes. Most utilities ignore raw nodes but two notable ones don’t:

  • hast-util-to-html also has an option allowDangerousHtml which will output the raw HTML. This is typically discouraged as noted by the option name but is useful if you completely trust authors
  • hast-util-raw can handle the raw embedded HTML strings by parsing them into standard hast nodes (element, text, etc). This is a heavy task as it needs a full HTML parser, but it is the only way to support untrusted content

Footnotes

Many options supported here relate to footnotes. Footnotes are not specified by CommonMark, which we follow by default. They are supported by GitHub, so footnotes can be enabled in markdown with mdast-util-gfm.

The options footnoteBackLabel and footnoteLabel define natural language that explains footnotes, which is hidden for sighted users but shown to assistive technology. When your page is not in English, you must define translated values.

Back references use ARIA attributes, but the section label itself uses a heading that is hidden with an sr-only class. To show it to sighted users, define different attributes in footnoteLabelProperties.

Clobbering

Footnotes introduce a problem: footnote calls are linked to footnote definitions on the page through id attributes generated from user content, which results in DOM clobbering.

DOM clobbering is this:

<p id=x></p>
<script>alert(x) // `x` now refers to the DOM `p#x` element</script>

Elements by their ID are made available by browsers on the window object, which is a security risk. Using a prefix solves this problem.
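
As a rough illustration of how a prefix keeps user-derived ids apart from names that scripts rely on, the sketch below builds footnote ids from user-supplied identifiers. The helpers `footnoteId` and `footnoteRefId` are hypothetical and not exported by this package; only the default prefix 'user-content-' and the `fn-` / `fnref-` id shapes come from the output shown in this document. Any further normalization the package applies to identifiers is not shown.

```javascript
// Hypothetical helpers (assumptions for illustration, not this package's API).
// They build ids from a user-supplied footnote identifier.
function footnoteId(identifier, clobberPrefix = 'user-content-') {
  return clobberPrefix + 'fn-' + identifier.toLowerCase()
}

function footnoteRefId(identifier, clobberPrefix = 'user-content-') {
  return clobberPrefix + 'fnref-' + identifier.toLowerCase()
}

// With the default prefix, user content can only ever produce ids that start
// with `user-content-`, so it cannot collide with ids your own scripts use.
console.log(footnoteId('1')) // user-content-fn-1
console.log(footnoteRefId('1')) // user-content-fnref-1

// With an empty prefix, the id is derived from user content alone.
console.log(footnoteId('1', '')) // fn-1
```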

More information on how to handle clobbering and the prefix is explained in Example: headings (DOM clobbering) in rehype-sanitize.

Unknown nodes

Unknown nodes are nodes with a type that isn’t in handlers or passThrough. The default behavior for unknown nodes is:

  • when the node has a value (and doesn’t have data.hName, data.hProperties, or data.hChildren, see later), create a hast text node
  • otherwise, create a <div> element (which could be changed with data.hName), with its children mapped from mdast to hast as well

This behavior can be changed by passing an unknownHandler.
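
The two rules above can be sketched in plain JavaScript. This is a simplified illustration written from the description, not the package's source: `sketchUnknown` is a hypothetical name, and `transformChildren` stands in for the real mdast-to-hast transformation of children.

```javascript
// Simplified sketch of the default rules for unknown nodes (assumption:
// written from the description above, not copied from the package).
function sketchUnknown(node, transformChildren) {
  const data = node.data || {}
  const hasHastFields =
    'hName' in data || 'hProperties' in data || 'hChildren' in data

  // A node with a `value` and no embedded hast fields becomes a text node.
  if ('value' in node && !hasHastFields) {
    return {type: 'text', value: node.value}
  }

  // Anything else becomes an element: `div` unless `data.hName` overrides it.
  return {
    type: 'element',
    tagName: typeof data.hName === 'string' ? data.hName : 'div',
    properties: {},
    children: transformChildren(node.children || [])
  }
}

// `identity` is a placeholder for the real child transformation.
const identity = (nodes) => nodes
const asText = sketchUnknown({type: 'custom', value: 'x'}, identity)
const asDiv = sketchUnknown({type: 'custom', children: []}, identity)

console.log(asText.type, asDiv.tagName) // text div
```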

defaultHandlers

Default handlers for nodes (Handlers).

Handler

Handle a node (TypeScript).

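Parameters
  • state (State): info passed around
  • node (MdastNode): mdast node to handle
  • parent (MdastNode | undefined): parent of node

Returns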

Result (HastNode | Array<HastNode> | null | undefined).

Handlers

Handle nodes (TypeScript).

Type
type Handlers = Record<string, Handler>

Options

Configuration (TypeScript).

Fields
  • allowDangerousHtml (boolean, default: false) — whether to persist raw HTML in markdown in the hast tree
  • clobberPrefix (string, default: 'user-content-') — prefix to use before the id attribute on footnotes to prevent it from clobbering
  • footnoteBackLabel (string, default: 'Back to content') — label to use from backreferences back to their footnote call (affects screen readers)
  • footnoteLabel (string, default: 'Footnotes') — label to use for the footnotes section (affects screen readers)
  • footnoteLabelProperties (Properties, default: {className: ['sr-only']}) — properties to use on the footnote label (note that id: 'footnote-label' is always added as footnote calls use it with aria-describedby to provide an accessible label)
  • footnoteLabelTagName (string, default: 'h2') — tag name to use for the footnote label
  • handlers (Handlers, optional) — extra handlers for nodes
  • passThrough (Array<string>, optional) — list of custom mdast node types to pass through (keep) in hast (note that the node itself is passed, but eventual children are transformed)
  • unknownHandler (Handler, optional) — handle all unknown nodes

Raw

Raw string of HTML embedded into HTML AST (TypeScript).

Type
import type {Literal} from 'hast'

interface Raw extends Literal {
  type: 'raw'
}

State

Info passed around about the current state (TypeScript type).

Fields
  • patch ((from: MdastNode, to: HastNode) => void) — copy a node’s positional info
  • applyData (<Type extends HastNode>(from: MdastNode, to: Type) => Type | HastElement) — honor the data of from and maybe generate an element instead of to
  • one ((node: MdastNode, parent: MdastNode | undefined) => HastNode | Array<HastNode> | undefined) — transform an mdast node to hast
  • all ((node: MdastNode) => Array<HastNode>) — transform the children of an mdast parent to hast
  • wrap (<Type extends HastNode>(nodes: Array<Type>, loose?: boolean) => Array<Type | HastText>) — wrap nodes with line endings between each node, adds initial/final line endings when loose
  • handlers (Handlers) — applied node handlers
  • footnoteById (Record<string, MdastFootnoteDefinition>) — footnote definitions by their uppercased identifier
  • footnoteOrder (Array<string>) — footnote identifiers in the order in which their calls first appear in the tree
  • footnoteCounts (Record<string, number>) — counts for how often the same footnote was called
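
To make the `wrap` field more concrete, here is a minimal sketch of the behavior it describes: line endings between nodes, plus initial and final line endings when `loose` is set. `wrapSketch` is a hypothetical stand-in written from the description above; inside a handler you would call `state.wrap` instead.

```javascript
// Sketch of the documented `wrap` behavior (assumption: written from the
// field description, not taken from the package's source).
function wrapSketch(nodes, loose = false) {
  const lineEnding = () => ({type: 'text', value: '\n'})
  const result = []

  // Loose mode starts with a line ending.
  if (loose) result.push(lineEnding())

  nodes.forEach((node, index) => {
    // Separate each node from the previous one with a line ending.
    if (index > 0) result.push(lineEnding())
    result.push(node)
  })

  // Loose mode also ends with a line ending (when there is any content).
  if (loose && nodes.length > 0) result.push(lineEnding())

  return result
}

const items = [
  {type: 'element', tagName: 'li', properties: {}, children: []},
  {type: 'element', tagName: 'li', properties: {}, children: []}
]

console.log(wrapSketch(items).length) // 3
console.log(wrapSketch(items, true).length) // 5
```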

Examples

Example: supporting HTML in markdown naïvely

If you completely trust authors (or plugins) and want to allow them to use HTML in markdown, and the last utility has an allowDangerousHtml option as well (such as hast-util-to-html), you can pass allowDangerousHtml to this utility (mdast-util-to-hast):

import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'

const markdown = 'It <i>works</i>! <img onerror="alert(1)">'
const mdast = fromMarkdown(markdown)
const hast = toHast(mdast, {allowDangerousHtml: true})
const html = toHtml(hast, {allowDangerousHtml: true})

console.log(html)

…now running node example.js yields:

<p>It <i>works</i>! <img onerror="alert(1)"></p>

⚠️ Danger: observe that the XSS attack through the onerror attribute is still present.

Example: supporting HTML in markdown properly

If you do not trust the authors of the input markdown, or if you want to make sure that further utilities can see HTML embedded in markdown, use hast-util-raw. The following example passes allowDangerousHtml to this utility (mdast-util-to-hast), then turns the raw embedded HTML into proper HTML nodes (hast-util-raw), and finally sanitizes the HTML by only allowing safe things (hast-util-sanitize):

import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {raw} from 'hast-util-raw'
import {sanitize} from 'hast-util-sanitize'
import {toHtml} from 'hast-util-to-html'

const markdown = 'It <i>works</i>! <img onerror="alert(1)">'
const mdast = fromMarkdown(markdown)
const hast = raw(toHast(mdast, {allowDangerousHtml: true}))
const safeHast = sanitize(hast)
const html = toHtml(safeHast)

console.log(html)

…now running node example.js yields:

<p>It <i>works</i>! <img></p>

👉 Note: observe that the XSS attack through the onerror attribute is no longer present.

Example: footnotes in languages other than English

If you know that the markdown is authored in a language other than English, and you’re using micromark-extension-gfm and mdast-util-gfm to match how GitHub renders markdown, and you know that footnotes are (or can be) used, you should translate the labels associated with them.

Let’s first set the stage:

import {fromMarkdown} from 'mdast-util-from-markdown'
import {gfm} from 'micromark-extension-gfm'
import {gfmFromMarkdown} from 'mdast-util-gfm'
import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'

const markdown = 'Bonjour[^1]\n\n[^1]: Monde!'
const mdast = fromMarkdown(markdown, {
  extensions: [gfm()],
  mdastExtensions: [gfmFromMarkdown()]
})
const hast = toHast(mdast)
const html = toHtml(hast)

console.log(html)

…now running node example.js yields:

<p>Bonjour<sup><a href="#user-content-fn-1" id="user-content-fnref-1" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
<ol>
<li id="user-content-fn-1">
<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref class="data-footnote-backref" aria-label="Back to content">↩</a></p>
</li>
</ol>
</section>

This is a mix of English and French that screen readers can’t handle nicely. Let’s say our program does know that the markdown is in French. In that case, it’s important to translate and define the labels relating to footnotes so that screen readers can pronounce the page properly:

@@ -9,7 +9,10 @@ const mdast = fromMarkdown(markdown, {
   extensions: [gfm()],
   mdastExtensions: [gfmFromMarkdown()]
 })
-const hast = toHast(mdast)
+const hast = toHast(mdast, {
+  footnoteLabel: 'Notes de bas de page',
+  footnoteBackLabel: 'Arrière'
+})
 const html = toHtml(hast)

 console.log(html)

…now running node example.js with the above patch applied yields:

@@ -1,8 +1,8 @@
 <p>Bonjour<sup><a href="#user-content-fn-1" id="user-content-fnref-1" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
-<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
+<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Notes de bas de page</h2>
 <ol>
 <li id="user-content-fn-1">
-<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref class="data-footnote-backref" aria-label="Back to content">↩</a></p>
+<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref class="data-footnote-backref" aria-label="Arrière">↩</a></p>
 </li>
 </ol>
 </section>

Example: supporting custom nodes

This project supports CommonMark and the GFM constructs (footnotes, strikethrough, tables) and the frontmatter constructs YAML and TOML. Support can be extended to other constructs in two ways: a) with handlers, b) through fields on nodes.

For example, when we represent a mark element in markdown and want to turn it into a <mark> element in HTML, we can use a handler:

import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'

const mdast = {
  type: 'paragraph',
  children: [{type: 'mark', children: [{type: 'text', value: 'x'}]}]
}

const hast = toHast(mdast, {
  handlers: {
    mark(state, node) {
      return {
        type: 'element',
        tagName: 'mark',
        properties: {},
        children: state.all(node)
      }
    }
  }
})

console.log(toHtml(hast))

We can do the same through certain fields on nodes:

import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'

const mdast = {
  type: 'paragraph',
  children: [
    {
      type: 'mark',
      children: [{type: 'text', value: 'x'}],
      data: {hName: 'mark'}
    }
  ]
}

console.log(toHtml(toHast(mdast)))

Algorithm

This project by default handles CommonMark, GFM (footnotes, strikethrough, tables) and common frontmatter (YAML, TOML).

Existing handlers can be overwritten and handlers for more nodes can be added. It’s also possible to define how mdast is turned into hast through fields on nodes.

Default handling

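The following table gives insight into what input turns into what output. Each entry lists, in order, the mdast node, a markdown example, the resulting hast node, and the resulting HTML: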

blockquote

> A greater than…

element (blockquote)

<blockquote>
<p>A greater than…</p>
</blockquote>

break

A backslash\
before a line break…

element (br)

<p>A backslash<br>
before a line break…</p>

code

```js
backtick.fences('for blocks')
```

element (pre and code)

<pre><code class="language-js">backtick.fences('for blocks')
</code></pre>

delete (GFM)

Two ~~tildes~~ for delete.

element (del)

<p>Two <del>tildes</del> for delete.</p>

emphasis

Some *asterisks* for emphasis.

element (em)

<p>Some <em>asterisks</em> for emphasis.</p>

footnoteReference, footnoteDefinition (GFM)

With a [^caret].

[^caret]: Stuff

element (section, sup, a)

<p>With a <sup><a href="#fn-caret" …>1</a></sup>.</p>…

heading

# One number sign…
###### Six number signs…

element (h1 to h6)

<h1>One number sign…</h1>
<h6>Six number signs…</h6>

html

<kbd>CMD+S</kbd>

Nothing (default), raw (when allowDangerousHtml: true)

n/a

image

![Alt text](/logo.png "title")

element (img)

<p><img src="/logo.png" alt="Alt text" title="title"></p>

imageReference, definition

![Alt text][logo]

[logo]: /logo.png "title"

element (img)

<p><img src="/logo.png" alt="Alt text" title="title"></p>

inlineCode

Some `backticks` for inline code.

element (code)

<p>Some <code>backticks</code> for inline code.</p>

link

[Example](https://example.com "title")

element (a)

<p><a href="https://example.com" title="title">Example</a></p>

linkReference, definition

[Example][]

[example]: https://example.com "title"

element (a)

<p><a href="https://example.com" title="title">Example</a></p>

list, listItem

* asterisks for unordered items

1. decimals and a dot for ordered items

element (li and ol or ul)

<ul>
<li>asterisks for unordered items</li>
</ul>
<ol>
<li>decimals and a dot for ordered items</li>
</ol>

paragraph

Just some text…

element (p)

<p>Just some text…</p>

root

Anything!

root

<p>Anything!</p>

strong

Two **asterisks** for strong.

element (strong)

<p>Two <strong>asterisks</strong> for strong.</p>

text

Anything!

text

<p>Anything!</p>

table, tableRow, tableCell

| Pipes |
| ----- |

element (table, thead, tbody, tr, td, th)

<table>
<thead>
<tr>
<th>Pipes</th>
</tr>
</thead>
</table>

thematicBreak

Three asterisks for a thematic break:

***

element (hr)

<p>Three asterisks for a thematic break:</p>
<hr>

toml (frontmatter)

+++
fenced = true
+++

Nothing

n/a

yaml (frontmatter)

---
fenced: yes
---

Nothing

n/a

👉 Note: GFM prescribes that the obsolete align attribute on td and th elements is used. To use style attributes instead of obsolete features, combine this utility with @mapbox/hast-util-table-cell-style.

🧑‍🏫 Info: this project is concerned with turning one syntax tree into another. It does not deal with markdown syntax or HTML syntax. The preceding examples are illustrative rather than authoritative or exhaustive.

Fields on nodes

A frequent problem arises when having to turn one syntax tree into another. As the original tree (in this case, mdast for markdown) is in some cases limited compared to the destination (in this case, hast for HTML) tree, is it possible to provide more info in the original to define what the result will be in the destination? This is possible by defining data on mdast nodes, which this utility will read as instructions on what hast nodes to create.

An example is math, which is a nonstandard markdown extension that this utility doesn’t understand. To solve this, mdast-util-math defines instructions on mdast nodes that this utility does understand, because they define a certain hast structure.

The following fields can be used:

  • node.data.hName — define the element’s tag name
  • node.data.hProperties — define extra properties to use
  • node.data.hChildren — define hast children to use

hName

node.data.hName sets the tag name of an element. The following mdast:

{
  type: 'strong',
  data: {hName: 'b'},
  children: [{type: 'text', value: 'Alpha'}]
}

…yields (hast):

{
  type: 'element',
  tagName: 'b',
  properties: {},
  children: [{type: 'text', value: 'Alpha'}]
}

hProperties

node.data.hProperties sets the properties of an element. The following mdast:

{
  type: 'image',
  url: 'circle.svg',
  alt: 'Big red circle on a black background',
  data: {hProperties: {className: ['responsive']}}
}

…yields (hast):

{
  type: 'element',
  tagName: 'img',
  properties: {
    src: 'circle.svg',
    alt: 'Big red circle on a black background',
    className: ['responsive']
  },
  children: []
}

hChildren

node.data.hChildren sets the children of an element. The following mdast:

{
  type: 'code',
  lang: 'js',
  data: {
    hChildren: [
      {
        type: 'element',
        tagName: 'span',
        properties: {className: ['hljs-meta']},
        children: [{type: 'text', value: '"use strict"'}]
      },
      {type: 'text', value: ';'}
    ]
  },
  value: '"use strict";'
}

…yields (hast):

{
  type: 'element',
  tagName: 'pre',
  properties: {},
  children: [{
    type: 'element',
    tagName: 'code',
    properties: {className: ['language-js']},
    children: [
      {
        type: 'element',
        tagName: 'span',
        properties: {className: ['hljs-meta']},
        children: [{type: 'text', value: '"use strict"'}]
      },
      {type: 'text', value: ';'}
    ]
  }]
}

👉 Note: the pre and language-js class are normal mdast-util-to-hast functionality.
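
Taken together, the three fields can be pictured as a small post-processing step on the element a handler produced. The sketch below is a simplified illustration under that assumption: `applyFieldsSketch` is a hypothetical helper, not this package's API, and it ignores per-node details such as the code example above, where the children end up inside the code element rather than pre.

```javascript
// Hypothetical helper (assumption): apply `hName`, `hProperties`, and
// `hChildren` from an mdast node to the hast element created for it.
function applyFieldsSketch(mdastNode, hastElement) {
  const data = mdastNode.data || {}
  const result = {...hastElement}

  // `hName` replaces the tag name.
  if (typeof data.hName === 'string') result.tagName = data.hName

  // `hProperties` are merged over the element's own properties.
  if (data.hProperties) {
    result.properties = {...result.properties, ...data.hProperties}
  }

  // `hChildren` replace the transformed children.
  if (data.hChildren) result.children = data.hChildren

  return result
}

const strong = {
  type: 'strong',
  data: {hName: 'b', hProperties: {className: ['loud']}},
  children: [{type: 'text', value: 'Alpha'}]
}

const element = applyFieldsSketch(strong, {
  type: 'element',
  tagName: 'strong',
  properties: {},
  children: [{type: 'text', value: 'Alpha'}]
})

console.log(element.tagName, element.properties.className[0]) // b loud
```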

CSS

Assuming you know how to use (semantic) HTML and CSS, it should generally be straightforward to style the HTML produced by this utility. With CSS, you can get creative and style the results as you please.

Some semistandard features, notably GFM’s task lists and footnotes, generate HTML that can be unintuitive, as it matches exactly what GitHub produces for their website. There is a project, sindresorhus/github-markdown-css, that exposes the stylesheet that GitHub uses for rendered markdown, which might either be inspirational for more complex features, or can be used as-is to exactly match how GitHub styles rendered markdown.

The following CSS is needed to make footnotes look a bit like GitHub:

/* Style the footnotes section. */
.footnotes {
  font-size: smaller;
  color: #8b949e;
  border-top: 1px solid #30363d;
}

/* Hide the section label for visual users. */
.sr-only {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  word-wrap: normal;
  border: 0;
}

/* Place `[` and `]` around footnote calls. */
[data-footnote-ref]::before {
  content: '[';
}

[data-footnote-ref]::after {
  content: ']';
}

Syntax tree

The following interfaces are added to hast by this utility.

Nodes

Raw

interface Raw <: Literal {
  type: 'raw'
}

Raw (Literal) represents a string of raw HTML inside hast. Raw nodes are typically ignored but are handled by hast-util-to-html and hast-util-raw.

Types

This package is fully typed with TypeScript. It also exports Handler, Handlers, Options, Raw, and State types.

It also registers the Raw node type with @types/hast. If you’re working with the syntax tree (and you pass allowDangerousHtml: true), make sure to import this utility somewhere in your types, as that registers the new node type in the tree.

/**
 * @typedef {import('mdast-util-to-hast')}
 */

import {visit} from 'unist-util-visit'

/** @type {import('hast').Root} */
const tree = { /* … */ }

visit(tree, (node) => {
  // `node` can now be `raw`.
})

Compatibility

Projects maintained by the unified collective are compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. Our projects sometimes work with older versions, but this is not guaranteed.

Security

Use of mdast-util-to-hast can open you up to a cross-site scripting (XSS) attack. Embedded hast properties (hName, hProperties, hChildren), custom handlers, and the allowDangerousHtml option all provide openings.

The following example shows how a script is injected where a benign code block is expected with embedded hast properties:

const code = {type: 'code', value: 'alert(1)'}

code.data = {hName: 'script'}

Yields:

<script>alert(1)</script>

The following example shows how an image is changed to fail loading and therefore run code in a browser.

const image = {type: 'image', url: 'existing.png'}

image.data = {hProperties: {src: 'missing', onError: 'alert(2)'}}

Yields:

<img src="missing" onerror="alert(2)">

The following example shows the default handling of embedded HTML:

# Hello

<script>alert(3)</script>

Yields:

<h1>Hello</h1>

Passing allowDangerousHtml: true to mdast-util-to-hast is typically still not enough to run unsafe code:

<h1>Hello</h1>
&#x3C;script>alert(3)&#x3C;/script>

If allowDangerousHtml: true is also given to hast-util-to-html (or rehype-stringify), the unsafe code runs:

<h1>Hello</h1>
<script>alert(3)</script>

Use hast-util-sanitize to make the hast tree safe.
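
If the mdast tree itself comes from an untrusted source, one additional precaution is to drop the embedded hast fields before transforming. The sketch below is only an illustration of that idea: `stripHastFields` is a hypothetical helper, not part of this package, and it does not replace sanitizing the resulting hast tree.

```javascript
// Hypothetical helper (assumption): recursively remove `hName`,
// `hProperties`, and `hChildren` from the `data` of every node in a tree.
// Returns a new tree and leaves the input untouched.
function stripHastFields(node) {
  const {data, children, ...rest} = node
  const result = {...rest}

  if (data) {
    // Keep any other data, drop only the embedded hast fields.
    const {hName, hProperties, hChildren, ...safeData} = data
    if (Object.keys(safeData).length > 0) result.data = safeData
  }

  if (Array.isArray(children)) {
    result.children = children.map((child) => stripHastFields(child))
  }

  return result
}

const code = {type: 'code', value: 'alert(1)', data: {hName: 'script'}}

console.log(JSON.stringify(stripHastFields(code)))
// {"type":"code","value":"alert(1)"}
```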

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer