1.0.0 • Published 5 years ago

normalize-html-whitespace v1.0.0

License
MIT

normalize-html-whitespace

Safely remove repeating whitespace from HTML text.

Using \s to normalize HTML whitespace strips out characters that a web browser actually renders. That makes the change lossy, because the page can look different afterwards. This package collapses runs of whitespace characters down to a single space, while leaving the following characters untouched:

  • \u00a0 or &nbsp; (non-breaking space)
  • \ufeff or &#xfeff; (zero-width non-breaking space)

…as well as these lesser-known ones:

  • \u1680 or &#x1680; (Ogham space mark)
  • \u180e or &#x180e; (Mongolian vowel separator)
  • \u2000 or &#x2000; (en quad)
  • \u2001 or &#x2001; (em quad)
  • \u2002 or &#x2002; (en space)
  • \u2003 or &#x2003; (em space)
  • \u2004 or &#x2004; (three-per-em space)
  • \u2005 or &#x2005; (four-per-em space)
  • \u2006 or &#x2006; (six-per-em space)
  • \u2007 or &#x2007; (figure space)
  • \u2008 or &#x2008; (punctuation space)
  • \u2009 or &#x2009; (thin space)
  • \u200a or &#x200a; (hair space)
  • \u2028 or &#x2028; (line separator)
  • \u2029 or &#x2029; (paragraph separator)
  • \u202f or &#x202f; (narrow non-breaking space)
  • \u205f or &#x205f; (medium mathematical space)
  • \u3000 or &#x3000; (ideographic space)

For the sake of completeness, the following character, which is not part of \s, is also left unaffected:

  • \u200b or &#x200b; (zero-width space)
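The difference between \s and a narrower character class can be seen in a minimal sketch. Both helper functions below are hypothetical and written only for illustration. They are not the package's actual source, and the narrow class is assumed here to be the five characters the HTML Standard calls ASCII whitespace.

```javascript
// Hypothetical helper, not the package's source: collapses only
// ASCII whitespace (tab, LF, FF, CR, space).
function collapseAsciiWhitespace(text) {
  return text.replace(/[\t\n\f\r ]+/g, ' ');
}

// Naive approach: \s also matches U+00A0, U+2003 and other rendered spaces.
function collapseWithBackslashS(text) {
  return text.replace(/\s+/g, ' ');
}

// Two non-breaking spaces followed by a run of ordinary whitespace.
const input = 'total:\u00a0\u00a042 \n\t items';

const lossy = collapseWithBackslashS(input);
const safe = collapseAsciiWhitespace(input);

console.log(lossy.includes('\u00a0')); // false: the non-breaking spaces were replaced
console.log(safe.includes('\u00a0'));  // true: the non-breaking spaces survive
console.log(safe === 'total:\u00a0\u00a042 items'); // true
```

The naive version silently turns two non-breaking spaces into one ordinary space, which is the kind of visual change this package is meant to avoid.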

Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
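Because there is no parser, the caller decides which text nodes to normalize. The sketch below walks a made-up node tree and skips elements where whitespace is significant. The node shape, the PRESERVE set, and normalizeTree are all assumptions made for this example, and normalizeWhitespace is a local stand-in so that the snippet runs without the package installed.

```javascript
// Stand-in for require('normalize-html-whitespace') so this sketch is
// self-contained. Assumption: collapsing ASCII whitespace approximates it.
const normalizeWhitespace = (text) => text.replace(/[\t\n\f\r ]+/g, ' ');

// Hypothetical node shape:
//   { type: 'text', value } or { type: 'element', tag, children }
// Elements whose text content should be left exactly as written.
const PRESERVE = new Set(['pre', 'textarea', 'script', 'style']);

function normalizeTree(node) {
  if (node.type === 'text') {
    return { ...node, value: normalizeWhitespace(node.value) };
  }
  if (PRESERVE.has(node.tag)) {
    return node; // whitespace is significant here, so do not descend
  }
  return { ...node, children: node.children.map(normalizeTree) };
}

const tree = {
  type: 'element',
  tag: 'div',
  children: [
    { type: 'text', value: 'Hello   \n  world' },
    { type: 'element', tag: 'pre', children: [{ type: 'text', value: 'a    b' }] },
  ],
};

const result = normalizeTree(tree);
console.log(result.children[0].value);             // Hello world
console.log(result.children[1].children[0].value); // a    b
```

In real code the tree would come from whichever HTML parser is already in use, and its node types would replace the hypothetical shape above.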

Installation

Node.js >= 8 is required. Type this at the command line:

npm install normalize-html-whitespace

Usage

const normalizeWhitespace = require('normalize-html-whitespace');

normalizeWhitespace('  foo bar     baz ');
//-> ' foo bar baz '