@radically-straightforward/html v1.0.0

License: MIT · Repository: GitHub · Last release: 5 months ago

Radically Straightforward · HTML

📄 HTML in Tagged Templates

Installation

$ npm install @radically-straightforward/html

Note: We recommend the following tools:

Prettier: A code formatter that supports HTML in tagged templates.

Prettier - Code formatter: A Visual Studio Code extension to use Prettier more ergonomically.

es6-string-html: A Visual Studio Code extension to syntax highlight HTML in tagged templates.

Note: This tool is primarily designed for rendering HTML on the server with Node.js, but it also works in the browser.

Usage

import html, { HTML } from "@radically-straightforward/html";
import * as htmlHelpers from "@radically-straightforward/html";

HTML

export type HTML = string;

A type alias to make your type annotations more specific.

html()

export default function html(
  templateStrings: TemplateStringsArray,
  ...substitutions: (string | string[])[]
): HTML;

A tagged template for HTML:

html`<p>Leandro Facchinetti</p>`;

Sanitizes interpolations to prevent injection attacks:

html`<p>${"Leandro Facchinetti"}</p>`;
// => `<p>Leandro Facchinetti</p>`
html`<p>${`<script>alert(1);</script>`}</p>`;
// => `<p>&lt;script&gt;alert(1);&lt;/script&gt;</p>`

Note: Sanitization is only part of the defense against injection attacks. Also deploy the following measures:

  • Serve your pages with UTF-8 encoding.
  • Have your server send the header Content-Type: text/html; charset=utf-8.
  • If you want to be extra sure that the encoding will be picked up by the browser, include a <meta charset="utf-8" /> meta tag. (But HTML 5 documents must be encoded in UTF-8, so it should be sufficient to declare your document as HTML 5 by starting it with <!DOCTYPE html>.)
  • Always use quotes around HTML attributes (for example, href="https://leafac.com" instead of href=https://leafac.com).
  • See https://wonko.com/post/html-escaping/.
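Here is a rough sketch of the header and doctype measures above, using Node.js’s built-in node:http module. The sendHTML() helper is hypothetical and not part of this library. In a real server, you would call it from the callback passed to http.createServer().

```typescript
import * as http from "node:http";

// Hypothetical helper (not part of this library): sends `body` as a complete
// HTML 5 document. `body` is assumed to be already-rendered, already-sanitized
// HTML, for example the output of html`___`.
function sendHTML(response: http.ServerResponse, body: string): void {
  // Declare the encoding in the HTTP header...
  response.setHeader("Content-Type", "text/html; charset=utf-8");
  // ...and in the document itself, which starts with the HTML 5 doctype.
  response.end(
    `<!DOCTYPE html><html><head><meta charset="utf-8" /></head><body>${body}</body></html>`,
  );
}
```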

Note: This library works by concatenating strings. It doesn’t prettify the output (if you need that, you may, for example, call Prettier programmatically on the output of html`___`), and it doesn’t generate any kind of virtual DOM. The virtues of this approach are that the library is conceptually simple and that it is an order of magnitude faster than ReactDOMServer.renderToStaticMarkup(). Performance matters because rendering may be one of the most time-consuming tasks in responding to a request.

Opt out of sanitization with $${___} instead of ${___}:

html`<div>$${`<p>Leandro Facchinetti</p>`}</div>`;
// => `<div><p>Leandro Facchinetti</p></div>`

Note: Only opt out of sanitization if you are sure that the interpolated string is safe; in particular, it must not contain user input. Otherwise, you’d be open to injection attacks:

html`<div>$${`<script>alert(1);</script>`}</div>`;
// => `<div><script>alert(1);</script></div>`

Note: You must opt out of sanitization when the interpolated string is itself the result of html`___`; otherwise, the escaping would be doubled:

html`
  <div>
    Good (escape once): $${html`<p>${`<script>alert(1);</script>`}</p>`}
  </div>
`;
// =>
// `
//   <div>
//     Good (escape once): <p>&lt;script&gt;alert(1);&lt;/script&gt;</p>
//   </div>
// `

html`
  <div>
    Bad (double escaping): ${html`<p>${`<script>alert(1);</script>`}</p>`}
  </div>
`;
// =>
// `
//   <div>
//     Bad (double escaping): &lt;p&gt;&amp;lt;script&amp;gt;alert(1);&amp;lt;/script&amp;gt;&lt;/p&gt;
//   </div>
// `

Note: As an edge case, if you need a literal $ before an interpolation, interpolate the $ itself:

html`<p>${"$"}${"Leandro Facchinetti"}</p>`;
// => `<p>$Leandro Facchinetti</p>`

Interpolated lists are joined:

html`<p>${["Leandro", " ", "Facchinetti"]}</p>`;
// => `<p>Leandro Facchinetti</p>`

Note: Interpolated lists are sanitized:

html`
  <p>${["Leandro", " ", "<script>alert(1);</script>", " ", "Facchinetti"]}</p>
`;
// =>
// `
//   <p>Leandro &lt;script&gt;alert(1);&lt;/script&gt; Facchinetti</p>
// `

You may opt out of the sanitization of interpolated lists by using $${___} instead of ${___}:

html`
  <ul>
    $${[html`<li>Leandro</li>`, html`<li>Facchinetti</li>`]}
  </ul>
`;
// =>
// `
//   <ul>
//     <li>Leandro</li><li>Facchinetti</li>
//   </ul>
// `
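To make the ${___} / $${___} convention concrete, here is a minimal sketch of how a tagged template with these semantics could be written. It is not this library’s source: htmlSketch() and escapeSketch() are hypothetical names, and escapeSketch() only escapes the five HTML-significant characters (it doesn’t handle invalid XML characters the way sanitize() does).

```typescript
type HTML = string;

// Hypothetical stand-in for sanitize(): escapes only the five HTML-significant
// characters (no handling of invalid XML characters).
function escapeSketch(text: string): string {
  return text
    .replaceAll("&", "&amp;") // First, so the "&" of later entities isn't re-escaped.
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&apos;");
}

// Hypothetical tagged template with the semantics described above:
// ${___} is escaped, $${___} is inserted verbatim, and lists are joined first.
function htmlSketch(
  templateStrings: TemplateStringsArray,
  ...substitutions: (string | string[])[]
): HTML {
  let output = "";
  substitutions.forEach((substitution, index) => {
    const text = Array.isArray(substitution) ? substitution.join("") : substitution;
    const templateString = templateStrings[index];
    // In $${___}, the first "$" is plain text at the end of the preceding
    // template string, which is how the tag recognizes trusted HTML.
    output += templateString.endsWith("$")
      ? templateString.slice(0, -1) + text
      : templateString + escapeSketch(text);
  });
  return output + templateStrings[templateStrings.length - 1];
}
```

Checking the end of the preceding template string also explains the edge case of a literal $ before an interpolation: writing ${"$"}${___} keeps the $ out of the template strings, so the tag never mistakes it for the opt-out marker.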

sanitize()

export function sanitize(
  text: string,
  replacement: string = sanitize.replacement,
): string;

Sanitize text for safe insertion in HTML.

sanitize() escapes characters that are meaningful in HTML syntax and replaces invalid XML characters with a string of your choosing—by default, an empty string (""). You may provide the replacement as a parameter or set a new default by overwriting sanitize.replacement. For example, to use the Unicode replacement character:

sanitize.replacement = "�";
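The following minimal sketch illustrates the behavior just described: the replacement parameter, the overridable default, and what happens when text is sanitized twice. The names sanitizeSketch(), escapeSketch(), and invalidCharactersSketch are hypothetical. The pattern is a deliberately simplified subset (NUL and lone surrogates only), and the order of the two steps is an assumption of the sketch.

```typescript
// Hypothetical stand-in for escape(): escapes the five HTML-significant characters.
function escapeSketch(text: string): string {
  return text
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&apos;");
}

// Simplified, hypothetical subset of invalid XML characters: NUL and lone
// surrogates only. The library's invalidXMLCharacters covers many more.
const invalidCharactersSketch = /[\u{0}\u{D800}-\u{DFFF}]/gu;

// Escape, then replace invalid characters (this order is an assumption).
// The default is read from sanitizeSketch.replacement at call time, so
// overwriting that property changes the default for later calls.
function sanitizeSketch(
  text: string,
  replacement: string = sanitizeSketch.replacement,
): string {
  // A replacer function avoids special "$" patterns in `replacement`.
  return escapeSketch(text).replace(invalidCharactersSketch, () => replacement);
}
sanitizeSketch.replacement = "";

sanitizeSketch("<b>\u0000</b>"); // "&lt;b&gt;&lt;/b&gt;"
sanitizeSketch(sanitizeSketch("<")); // "&amp;lt;" (sanitized twice)
```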

Note: The html`___` tagged template already calls sanitize() on ${___} interpolations, so don’t call sanitize() on them yourself; otherwise, they’d be sanitized twice.

Note: The sanitization that we refer to here is at the character level, not cleaning up certain tags while preserving others. For that, we recommend rehype-sanitize.

Note: Even this sanitization isn’t enough in certain contexts. For example, unquoted HTML attributes such as <a href=${sanitize(___)}> could still lead to XSS attacks.
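To see why quoting matters, consider an attacker-controlled value that contains no HTML-significant characters at all. In this sketch, escapeSketch() is a hypothetical stand-in for the character-level escaping described above, the payload is made up for illustration, and plain template literals are used so that the example doesn’t depend on the library.

```typescript
// Hypothetical stand-in for the character-level escaping described above.
function escapeSketch(text: string): string {
  return text
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&apos;");
}

// Made-up attacker-controlled value containing none of & < > " ', so
// escaping leaves it unchanged.
const userInput = "https://example.com onmouseover=alert(1)";

// Unquoted: the space ends the href value, so onmouseover becomes a separate attribute.
const unquoted = `<a href=${escapeSketch(userInput)}>Link</a>`;

// Quoted: the space stays inside the value, and any " in the input would become &quot;.
const quoted = `<a href="${escapeSketch(userInput)}">Link</a>`;

console.log(unquoted); // <a href=https://example.com onmouseover=alert(1)>Link</a>
console.log(quoted); // <a href="https://example.com onmouseover=alert(1)">Link</a>
```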

escape()

export function escape(text: string): string;

Escape characters that are meaningful in HTML syntax.

What sets this implementation apart from existing ones are the following:

  • Performance.

    The performance of the escape() function matters because it’s used a lot to escape user input when rendering HTML with the html`___` tagged template.

    The following are some details on how this implementation is made faster:

    • The relatively new string method .replaceAll(), when used with a string parameter, is faster than .replace() with a global regular expression.

    • Perhaps surprisingly, calling .replaceAll() multiple times is faster than using a single regular expression of the kind /[&<>"']/g.

    • And even if we were to use a single regular expression, using switch/case would be faster than the lookup tables that most other implementations use.

    • And also if we were to use regular expressions, using the flag v incurs a very small but consistent performance penalty.

    • And also if we were to use regular expressions, .replace() is marginally but consistently faster than .replaceAll().

    • Measurements performed in Node.js 21.2.0.

  • Supports modern browsers only.

    Most other implementations also escape characters such as `, which could cause problems in Internet Explorer 8 and older.

    Some other implementations avoid transforming ' into the entity &apos;, because that entity isn’t understood by some versions of Internet Explorer.
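For reference, here is a sketch of the two strategies compared above: chained .replaceAll() calls with string arguments versus a single global regular expression with a switch/case callback. Both function names are hypothetical, the actual code of escape() may differ, and the timings printed by the small harness at the end depend on the engine and version (the measurements above were performed in Node.js 21.2.0).

```typescript
import { performance } from "node:perf_hooks";

// Strategy 1: chained .replaceAll() with string patterns. "&" goes first so
// that the "&" in entities produced by later steps isn't escaped again.
function escapeWithReplaceAll(text: string): string {
  return text
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&apos;");
}

// Strategy 2: one pass with a single regular expression, using switch/case
// instead of a lookup table.
function escapeWithRegExp(text: string): string {
  return text.replace(/[&<>"']/g, (character) => {
    switch (character) {
      case "&":
        return "&amp;";
      case "<":
        return "&lt;";
      case ">":
        return "&gt;";
      case '"':
        return "&quot;";
      default: // The only remaining possibility is "'".
        return "&apos;";
    }
  });
}

// Rough timing harness; a serious comparison needs warm-up runs and repeated trials.
function time(name: string, escapeFunction: (text: string) => string): void {
  const input = `<p class="x">Tom & Jerry's</p>`.repeat(100);
  const start = performance.now();
  for (let iteration = 0; iteration < 1_000; iteration++) escapeFunction(input);
  console.log(`${name}: ${(performance.now() - start).toFixed(1)}ms`);
}

time("replaceAll", escapeWithReplaceAll);
time("RegExp + switch", escapeWithRegExp);
```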


invalidXMLCharacters

export const invalidXMLCharacters: RegExp;

A regular expression that matches invalid XML characters.

Use this to remove or replace invalid XML characters, or simply to detect that a string doesn’t contain them. This is particularly useful when generating XML based on user input.

This list is based on Extensible Markup Language (XML) 1.1 (Second Edition), § 2.2 Characters (https://www.w3.org/TR/xml11/#charsets). In particular, it includes:

  1. \u{0}, which is always invalid in XML.
  2. The gaps between the allowed ranges in the production rule for [2] Char from the “Character Range” grammar.
  3. The discouraged characters from the Note in that section of the document.

Notably, it does not include the “compatibility characters, as defined in Unicode” mentioned in that section of the document, because that list was difficult to find and doesn’t seem to be very important.

Example

someUserInput.replace(invalidXMLCharacters, ""); // Remove invalid XML characters.
someUserInput.replace(invalidXMLCharacters, "�"); // Replace invalid XML characters with the Unicode replacement character.
someUserInput.match(invalidXMLCharacters); // Detect whether there are invalid XML characters.
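The list above can be turned into a regular expression mechanically. This is a sketch assuming the code-point ranges from § 2.2 of the XML 1.1 specification as summarized above; buildInvalidXMLCharacters() and invalidXMLCharactersSketch are hypothetical names, and the library’s actual invalidXMLCharacters may be written differently.

```typescript
// Inclusive [start, end] code-point ranges that this sketch treats as invalid.
const invalidRanges: Array<[number, number]> = [
  [0x0, 0x0], // 1. NUL is always invalid.
  [0xd800, 0xdfff], // 2. Gaps in Char: the surrogate block...
  [0xfffe, 0xffff], //    ...and U+FFFE, U+FFFF.
  [0x1, 0x8], // 3. Discouraged control characters...
  [0xb, 0xc],
  [0xe, 0x1f],
  [0x7f, 0x84],
  [0x86, 0x9f],
  [0xfdd0, 0xfddf], //    ...and permanently undefined code points.
];
// Also discouraged: the last two code points of each supplementary plane
// (U+1FFFE to U+1FFFF, and so on, up to U+10FFFE to U+10FFFF).
for (let plane = 0x1; plane <= 0x10; plane++)
  invalidRanges.push([plane * 0x10000 + 0xfffe, plane * 0x10000 + 0xffff]);

function buildInvalidXMLCharacters(ranges: Array<[number, number]>): RegExp {
  const toEscape = (codePoint: number): string => `\\u{${codePoint.toString(16)}}`;
  const characterClass = ranges
    .map(([start, end]) => `${toEscape(start)}-${toEscape(end)}`)
    .join("");
  // The "u" flag makes the class match code points: lone surrogates match,
  // but valid surrogate pairs (for example, emoji) don't.
  return new RegExp(`[${characterClass}]`, "gu");
}

const invalidXMLCharactersSketch = buildInvalidXMLCharacters(invalidRanges);
```

Because this expression has the g flag, .replace() and .match() as in the Example are the safer calls: calling .test() repeatedly on a global regular expression advances its lastIndex between calls.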


Related Work

html-template-tag

  • Was a major inspiration for this. Its design is simple and great. In particular, I love (and stole) the idea of using $${...} to mark unsafe interpolation of trusted HTML.
  • Doesn’t encode arrays by default.

common-tags

  • Doesn’t encode interpolated values by default.
  • Uses the safeHtml tag, which isn’t recognized by Prettier or by the Visual Studio Code extension es6-string-html.

escape-html-template-tag

  • Less ergonomic API with escapeHtml.safe() and escapeHtml.join() instead of the $${} trick.

lit-html, nanohtml, htm, and viperhtml

  • Have the notion of virtual DOM instead of simple string concatenation.