supernormalize v1.0.2
supernormalize
supernormalize is a JavaScript library that aggressively normalizes text to a standard form. Use cases include:
- Mitigating homoglyph attacks
- Normalizing text for comparison
- Preparation for indexing text in a search engine
- Preparation for blacklisting text
Steps
The library performs the following steps:
1. Remove all marks (i.e. diacritics) and perform compatibility normalization
2. Convert the text to lowercase
3. Normalize homoglyphs using a mapping based on a list published by the Unicode Consortium (version 15.1.0); the mapping omits homoglyphs that steps 1 and 2 already normalize
4. Replace all whitespace characters with a single space and trim the text
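The order of the steps matters: "rñ" only becomes "m" because the tilde is stripped before the homoglyph step sees "rn". The following is a minimal sketch of such a pipeline using built-in JavaScript string methods. It is not the library's source code; `sketchNormalize` and the tiny `HOMOGLYPHS` table are hypothetical stand-ins, with entries taken from the examples in this README.

```javascript
// Illustrative sketch only; not the library's implementation.
// HOMOGLYPHS is a hypothetical mini-table; the real mapping is far larger.
const HOMOGLYPHS = [
  ["rn", "m"], // multi-letter homoglyph
  ["l", "1"],
  ["i", "1"],
  ["o", "0"],
  ["\u0430", "a"], // Cyrillic small a
  ["\u03b1", "a"], // Greek small alpha
];

function sketchNormalize(text) {
  // Step 1: compatibility decomposition (NFKD), then drop combining marks.
  let out = text.normalize("NFKD").replace(/\p{M}/gu, "");
  // Step 2: lowercase.
  out = out.toLowerCase();
  // Step 3: replace each homoglyph entry in table order.
  for (const [from, to] of HOMOGLYPHS) {
    out = out.split(from).join(to);
  }
  // Step 4: collapse whitespace runs to one space and trim.
  return out.replace(/\s+/g, " ").trim();
}

console.log(sketchNormalize("r\u00f1")); // "m"
console.log(sketchNormalize("\tHELLO WORLD \n")); // "he110 w0r1d"
console.log(sketchNormalize("A\u0410\u0391")); // "aaa"
```

Replacing entries one after another is only safe here because no replacement output is itself a key in the table; a larger table would need more care.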
Installation
```sh
npm install supernormalize
```
Usage
```javascript
import { supernormalize } from "supernormalize";

const text = "⋿╳⍺rñ⍴lé";
const normalizedText = supernormalize(text);
console.log(normalizedText); // 'examp1e'
```
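For the comparison use case, both strings should pass through the same normalizer before they are compared. The sketch below takes the normalizer as a parameter so that it runs without the package installed; `looksAlike` and `standIn` are hypothetical names, not part of the library.

```javascript
// Hypothetical helper: compares two strings after normalizing both.
// `normalize` is any string -> string function, e.g. supernormalize.
function looksAlike(a, b, normalize) {
  return normalize(a) === normalize(b);
}

// Stand-in normalizer so the sketch is self-contained.
// It only lowercases and trims; supernormalize does much more.
const standIn = (text) => text.toLowerCase().trim();

console.log(looksAlike("  Admin", "admin ", standIn)); // true
console.log(looksAlike("admin", "root", standIn)); // false
```

With the package installed, `supernormalize` can be passed as the third argument in place of `standIn`.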
Examples
Input | Output | Note |
---|---|---|
⋿╳⍺rñ⍴lé | examp1e | The rules below can be combined |
𝕋𝕙𝕚𝕤 𝕚𝕤 𝕒 𝕥𝕖𝕤𝕥! | th1s 1s a test! | Homoglyphs are normalized to a common form |
D̴̝̼̅i̴̱̐͊́a̵̢͎͒͝ĉ̵͓̈́̽r̶͂͝ͅi̷͔͜͝ṭ̴͋͆͘i̵͔̅c̷̛͉̪͂͊s̵̞̝̲͊ | d1acr1t1cs | Diacritics are removed |
AАΑ | aaa | Latin, Cyrillic, and Greek characters are normalized to the same form |
rn | m | Multiletter homoglyphs are normalized |
ffi… | ff1... | Ligatures are normalized to letters |
\tHELLO WORLD \n | he110 w0r1d | Whitespace and casing are normalized |
Functions
supernormalize(text: string): string
Normalizes the given text performing the steps described above.
supernormalize.normalizeCase(text: string): string
Converts the given text to lowercase.
supernormalize.normalizeMarks(text: string): string
Removes all marks (i.e. diacritics) and performs compatibility normalization on the given text.
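One common way to get this behavior with built-in JavaScript is to decompose the text with `String.prototype.normalize("NFKD")` and then delete everything in the Unicode mark category. The sketch below shows that approach; `stripMarksSketch` is a hypothetical name, and the library may implement the step differently.

```javascript
// Illustrative sketch; not necessarily how the library does it.
function stripMarksSketch(text) {
  // NFKD splits a precomposed letter into base letter + combining mark
  // and expands compatibility characters (ligatures, the ellipsis, ...).
  const decomposed = text.normalize("NFKD");
  // \p{M} matches every combining mark (Unicode general category M).
  return decomposed.replace(/\p{M}/gu, "");
}

console.log(stripMarksSketch("\u00e9")); // "e"   (e with acute accent)
console.log(stripMarksSketch("\ufb03")); // "ffi" (ffi ligature, U+FB03)
console.log(stripMarksSketch("\u2026")); // "..." (horizontal ellipsis)
```

Letters with a built-in stroke, such as ø (U+00F8), have no Unicode decomposition, so this approach leaves them unchanged.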
supernormalize.normalizeHomoglyphs(text: string): string
Normalizes homoglyphs using a mapping based on a list published by the Unicode Consortium.
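A table-driven replacement like this can be sketched with a single regular expression built from the table's keys. The example below is an assumption about one reasonable approach, not the library's code; `TABLE` and `makeReplacer` are hypothetical, and the entries are taken from the examples in this README.

```javascript
// Hypothetical mini-table for illustration; the library's mapping
// is much larger.
const TABLE = {
  rn: "m", // multi-letter homoglyph
  l: "1",
  i: "1",
  o: "0",
  "\u0430": "a", // Cyrillic small a
  "\u03b1": "a", // Greek small alpha
};

function makeReplacer(table) {
  // Longest keys first, so a multi-letter key is tried before any
  // shorter key that starts with the same character.
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  const escaped = keys.map((k) => k.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const pattern = new RegExp(escaped.join("|"), "gu");
  // One pass over the input: replaced text is never scanned again.
  return (text) => text.replace(pattern, (match) => table[match]);
}

const replaceHomoglyphs = makeReplacer(TABLE);
console.log(replaceHomoglyphs("rn")); // "m"
console.log(replaceHomoglyphs("hello")); // "he110"
console.log(replaceHomoglyphs("burn")); // "bum"
```

The last line shows a side effect of multi-letter entries in this sketch: ordinary words containing "rn" are rewritten too, so the output suits matching and indexing rather than display.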
supernormalize.normalizeWhitespace(text: string): string
Replaces all whitespace characters with a single space and trims the text.
License
This project is licensed under the MIT License - see the LICENSE file for details.