supernormalize v1.0.2

supernormalize

supernormalize is a JavaScript library that aggressively normalizes text to a standard form. Use cases include:

  • Mitigating homoglyph attacks
  • Normalizing text for comparison
  • Preparation for indexing text in a search engine
  • Preparation for blacklisting text

Steps

The library performs the following steps:

  1. Remove all marks (i.e. diacritics) and perform compatibility normalization
  2. Convert the text to lowercase
  3. Normalize homoglyphs using a mapping based on a list published by the Unicode Consortium, version 15.1.0 (the mapping omits homoglyphs that steps 1 and 2 already normalize)
  4. Replace all whitespace characters with a single space and trim the text
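
The four steps above can be approximated with standard JavaScript string methods. The following is a minimal sketch and not the library's actual source: the function names are hypothetical, SAMPLE_HOMOGLYPHS is a made-up six-entry table standing in for the much larger Unicode-based mapping, and collapsing runs of whitespace into one space is an assumption about step 4.

```javascript
// Hypothetical sketch of the four steps; NOT supernormalize's implementation.

// Step 1: compatibility-decompose (NFKD), then drop all combining marks.
function stripMarks(text) {
  return text.normalize("NFKD").replace(/\p{M}/gu, "");
}

// Step 2: lowercase.
function lowerCase(text) {
  return text.toLowerCase();
}

// Step 3: illustrative table only (assumption); the real mapping is far larger.
const SAMPLE_HOMOGLYPHS = new Map([
  ["rn", "m"], // multiletter homoglyph
  ["l", "1"],
  ["i", "1"],
  ["o", "0"],
  ["\u0430", "a"], // Cyrillic small a
  ["\u03B1", "a"], // Greek small alpha
]);

function mapHomoglyphs(text) {
  let out = text;
  for (const [from, to] of SAMPLE_HOMOGLYPHS) {
    out = out.split(from).join(to); // replace every occurrence
  }
  return out;
}

// Step 4: collapse whitespace runs to one space (assumption) and trim.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

function sketchNormalize(text) {
  return collapseWhitespace(mapHomoglyphs(lowerCase(stripMarks(text))));
}

console.log(sketchNormalize("\tHELLO WORLD \n")); // 'he110 w0r1d'
console.log(sketchNormalize("rñ")); // 'm'
```

The order matters in this sketch: marks are stripped and case is folded first, so the homoglyph table only needs lowercase, unaccented keys, which matches the note in step 3 about the mapping omitting what steps 1 and 2 already handle.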

Installation

npm install supernormalize

Usage

import { supernormalize } from "supernormalize";

const text = "⋿╳⍺rñ⍴lé";
const normalizedText = supernormalize(text);
console.log(normalizedText); // 'examp1e'
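
For the comparison and blacklisting use cases, one possible pattern is to normalize both the stored terms and the incoming text before matching. The sketch below is illustrative only: buildBlocklistMatcher is a hypothetical helper, and standIn is a simplified stand-in normalizer (marks, case, and trimming only) so the snippet runs without the package installed. In practice the supernormalize function would be passed in its place.

```javascript
// Hypothetical helper; `normalize` is any string -> string function,
// for example the supernormalize export.
function buildBlocklistMatcher(blockedTerms, normalize) {
  // Normalize the blocked terms once, up front.
  const normalizedTerms = new Set(blockedTerms.map((term) => normalize(term)));
  // Normalize each candidate the same way before the lookup.
  return (candidate) => normalizedTerms.has(normalize(candidate));
}

// Stand-in normalizer (assumption): strips marks, lowercases, trims.
const standIn = (s) =>
  s.normalize("NFKD").replace(/\p{M}/gu, "").toLowerCase().trim();

const isBlocked = buildBlocklistMatcher(["Admin"], standIn);
console.log(isBlocked(" ÁDMIN ")); // true
console.log(isBlocked("guest")); // false
```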

Examples

Input | Output | Note
⋿╳⍺rñ⍴lé | examp1e | The rules below can be combined
𝕋𝕙𝕚𝕤 𝕚𝕤 𝕒 𝕥𝕖𝕤𝕥! | th1s 1s a test! | Homoglyphs are normalized to a common form
D̴̝̼̅i̴̱̐͊́a̵̢͎͒͝ĉ̵͓̈́̽r̶͂͝ͅi̷͔͜͝ṭ̴͋͆͘i̵͔̅c̷̛͉̪͂͊s̵̞̝̲͊ | d1acr1t1cs | Diacritics are removed
AАΑ | aaa | Latin, Cyrillic, and Greek characters are normalized to the same form
rn | m | Multiletter homoglyphs are normalized
ffi… | ff1... | Ligatures are normalized to letters
\tHELLO WORLD \n | he110 w0r1d | Whitespace and casing are normalized

Functions

supernormalize(text: string): string

Normalizes the given text by performing the steps described above.

supernormalize.normalizeCase(text: string): string

Converts the given text to lowercase.

supernormalize.normalizeMarks(text: string): string

Removes all marks (i.e. diacritics) and performs compatibility normalization on the given text.

supernormalize.normalizeHomoglyphs(text: string): string

Normalizes homoglyphs using a mapping based on a list published by the Unicode Consortium.

supernormalize.normalizeWhitespace(text: string): string

Replaces all whitespace characters with a single space and trims the text.
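
Because each step is exposed as its own function, it may be useful to apply only some of them, for instance skipping homoglyph normalization so that letters such as l and o are left intact. The sketch below shows one way to chain steps. The pipeline helper is hypothetical, and lower and squeeze are stand-ins written to resemble the documented behavior of normalizeCase and normalizeWhitespace (collapsing whitespace runs is an assumption); with the package installed, supernormalize.normalizeCase and supernormalize.normalizeWhitespace could be passed instead.

```javascript
// Hypothetical helper: applies string -> string steps from left to right.
function pipeline(...steps) {
  return (text) => steps.reduce((acc, step) => step(acc), text);
}

// Stand-ins (assumption) for the library's step functions.
const lower = (s) => s.toLowerCase();
const squeeze = (s) => s.replace(/\s+/g, " ").trim();

// Case and whitespace only; marks and homoglyphs are left untouched.
const caseAndWhitespaceOnly = pipeline(lower, squeeze);
console.log(caseAndWhitespaceOnly("\tHello   World \n")); // 'hello world'
```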

License

This project is licensed under the MIT License - see the LICENSE file for details.
