1.0.0 • Published 7 years ago
normalize-html-whitespace

Safely remove repeating whitespace from HTML text.
Using \s to normalize HTML whitespace strips out characters that a web browser actually renders. That is a lossy change and produces a different visual result. This package collapses runs of whitespace characters down to a single space, while ignoring the following characters:
\u00a0 or &nbsp; (non-breaking space)
\ufeff or &#xfeff; (zero-width non-breaking space)
…as well as these lesser-known ones:
\u1680 or &#x1680; (Ogham space mark)
\u180e or &#x180e; (Mongolian vowel separator)
\u2000 or &#x2000; (en quad)
\u2001 or &#x2001; (em quad)
\u2002 or &#x2002; (en space)
\u2003 or &#x2003; (em space)
\u2004 or &#x2004; (three-per-em space)
\u2005 or &#x2005; (four-per-em space)
\u2006 or &#x2006; (six-per-em space)
\u2007 or &#x2007; (figure space)
\u2008 or &#x2008; (punctuation space)
\u2009 or &#x2009; (thin space)
\u200a or &#x200a; (hair space)
\u2028 or &#x2028; (line separator)
\u2029 or &#x2029; (paragraph separator)
\u202f or &#x202f; (narrow non-breaking space)
\u205f or &#x205f; (medium mathematical space)
\u3000 or &#x3000; (ideographic space)
For the sake of completeness, the following character, which is not part of \s, is also left unaffected:
\u200b or &#x200b; (zero-width space)
Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
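The difference between \s and the whitespace that is safe to collapse can be illustrated with a small sketch. The helper collapseAsciiWhitespace below is hypothetical and is not taken from this package's source; it assumes that only tab, line feed, form feed, carriage return and the plain space should be collapsed.

```javascript
// Hypothetical helper (an assumption, not the package's actual implementation):
// collapse runs of tab, line feed, form feed, carriage return and plain space only.
const collapseAsciiWhitespace = (text) => text.replace(/[\t\n\f\r ]+/g, ' ');

// Two non-breaking spaces sit between "bar" and "baz".
const sample = 'foo \n\t bar\u00a0\u00a0baz';

// \s also matches \u00a0, so both non-breaking spaces are replaced by one plain space.
const lossy = sample.replace(/\s+/g, ' ');

// The explicit character class leaves the non-breaking spaces in place.
const safe = collapseAsciiWhitespace(sample);

console.log(JSON.stringify(lossy));
console.log(JSON.stringify(safe));
```

With the \s version the two non-breaking spaces are lost, which changes how a browser would render the text. The explicit class keeps them.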
Installation
Node.js >= 8 is required. Type this at the command line:
npm install normalize-html-whitespace
Usage
const normalizeWhitespace = require('normalize-html-whitespace');

// Runs of repeated whitespace are collapsed to a single space.
normalizeWhitespace('  foo   bar   baz  ');
//-> ' foo bar baz '
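Because the package contains no HTML parser, the caller has to locate the text nodes. The following is a minimal sketch of that step, under stated assumptions: the tree shape ({ type, tag, children } and { type, value }) is hypothetical, normalizeTextNodes is an illustrative name, and normalizeStandIn only approximates the package so the snippet runs without installing anything. Skipping pre is an illustrative choice, not documented package behavior.

```javascript
// Stand-in for require('normalize-html-whitespace'); an assumption that only
// approximates the package by collapsing runs of ASCII whitespace.
const normalizeStandIn = (text) => text.replace(/[\t\n\f\r ]+/g, ' ');

// Hypothetical tree shape: { type: 'element', tag, children } or { type: 'text', value }.
const normalizeTextNodes = (node, normalize) => {
  if (node.type === 'text') {
    // Only text nodes are passed to the normalizer.
    return { ...node, value: normalize(node.value) };
  }
  if (node.tag === 'pre') {
    // Illustrative choice: leave whitespace-sensitive elements untouched.
    return node;
  }
  return {
    ...node,
    children: node.children.map((child) => normalizeTextNodes(child, normalize)),
  };
};

const tree = {
  type: 'element',
  tag: 'p',
  children: [
    { type: 'text', value: 'foo   \n bar' },
    { type: 'element', tag: 'pre', children: [{ type: 'text', value: 'a   b' }] },
  ],
};

const result = normalizeTextNodes(tree, normalizeStandIn);
console.log(JSON.stringify(result, null, 2));
```

In real use the second argument would be the function returned by require('normalize-html-whitespace'), and the tree would come from whatever HTML parser the project already uses.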