unzalgo v3.0.0
unzalgo
Transforms ť͈̓̆h̏̔̐̑ì̭ͯ͞s̈́̄̑͋ into this without breaking internationalization.
Installation
$ npm install unzalgo
About
You can use unzalgo to both detect Zalgo text and transform it back into normal text without breaking internationalization. For example, you could transform:
T͘H͈̩̬̺̩̭͇I͏̼̪͚̪͚S͇̬̺ ́E̬̬͈̮̻̕V҉̙I̧͖̜̹̩̞̱L͇͍̝ ̺̮̟̙̘͎U͝S̞̫̞͝E͚̘͝R IṊ͍̬͞P̫Ù̹̳̝͓̙̙T̜͕̺̺̳̘͝
into
THIS EVIL USER INPUT
while also keeping
thiŝ te̅xt unchanged, since some lângûaĝes aĉtuallŷ uŝe thêse sŷmbo̅ls,
and, at the same time, keep all diacritics in
Z nich ovšem pouze předposlední sdílí s výše uvedenou větou příliš žluťoučký kůň úpěl […]
which remains unchanged after a transformation.
Is there a demo?
Yes! You can check it out here. You can edit the text at the top; the lower part shows the result of running clean on it with the default threshold.
How does it work?
In Unicode, every character is assigned to a character category. Zalgo text uses characters that belong to the categories Mn (Mark, Nonspacing) or Me (Mark, Enclosing).
First, the text is divided into words. Each word is then assigned a score that reflects how heavily it uses characters from the categories above, combined with some light statistics. If a word's score exceeds a threshold, the word is classified as Zalgo text, and all of its characters from the above categories are stripped away.
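To make the steps above concrete, here is a simplified sketch of the idea. It is not unzalgo's actual source: the helper names `sketchScore` and `sketchClean` are hypothetical, the score is assumed to be a plain ratio of combining marks to code points after NFD normalization, and the statistical part mentioned above is left out.

```javascript
// Illustrative sketch only; not unzalgo's actual implementation.
// Matches code points in the Unicode categories Mn and Me.
const COMBINING_MARK = /[\p{Mn}\p{Me}]/u;

// Hypothetical helper: share of a word's code points that are combining marks.
function sketchScore(word) {
	// NFD splits precomposed letters such as "ç" into a base letter plus a mark.
	const codepoints = [...word.normalize("NFD")];
	if (codepoints.length === 0) {
		return 0;
	}
	const marks = codepoints.filter((c) => COMBINING_MARK.test(c)).length;
	return marks / codepoints.length;
}

// Hypothetical helper: strip marks from every word whose score reaches the threshold.
function sketchClean(text, threshold = 0.55) {
	return text
		.split(/(\s+)/) // the capture group keeps the whitespace separators
		.map((word) => {
			if (sketchScore(word) < threshold) {
				return word; // leave ordinary (including accented) words untouched
			}
			return [...word.normalize("NFD")]
				.filter((c) => !COMBINING_MARK.test(c))
				.join("");
		})
		.join("");
}
```

Scoring each word separately is what lets a lightly accented word survive next to a heavily decorated one: only the words whose own ratio is high get stripped.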
Getting started
Regular cleaning
import assert from "assert";
import { clean } from "unzalgo";
assert("this" === clean("ť͈̓̆h̏̔̐̑ì̭ͯ͞s̈́̄̑͋"));
Configuring detection
import assert from "assert";
import { clean } from "unzalgo";
/* Clean only if there are no "normal" characters in the word (t, h, i and s are "normal") */
assert("ť͈̓̆h̏̔̐̑ì̭ͯ͞s̈́̄̑͋" === clean("ť͈̓̆h̏̔̐̑ì̭ͯ͞s̈́̄̑͋", {
	detectionThreshold: 1
}));

import assert from "assert";
import { clean } from "unzalgo";
/* Clean only if there is at least one combining character */
assert("francais" === clean("français", {
	detectionThreshold: 0
}));

import assert from "assert";
import { clean } from "unzalgo";
/* `français` remains intact by default */
assert("français" === clean("français"));
Internationalization
import assert from "assert";
import { isZalgo } from "unzalgo";
/* "français" is not a Zalgo text, of course */
assert(isZalgo("français") === false);

import assert from "assert";
import { isZalgo } from "unzalgo";
/* Unless you define the Zalgo property as containing combining characters */
assert(isZalgo("français", 0) === true);

import assert from "assert";
import { isZalgo } from "unzalgo";
/* You can also define the Zalgo property as consisting of nothing but combining characters */
assert(isZalgo("français", 1) === false);
Detection threshold
Some of this library's functions accept a detectionThreshold option that lets you configure how sensitively unzalgo behaves. detectionThreshold is a number from 0 to 1 and defaults to 0.55.
A detection threshold of 0 indicates that a string should be classified as Zalgo text if at least 0 % of its codepoints have the Unicode category Mn or Me.
A detection threshold of 1 indicates that a string should be classified as Zalgo text if at least 100 % of its codepoints have the Unicode category Mn or Me.
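The effect of the two extremes can be illustrated with a small stand-alone sketch. It does not call unzalgo; `markRatio` and `reachesThreshold` are hypothetical helpers, and the sketch assumes the ratio is computed on the NFD-normalized string and compared with "at least", as described above.

```javascript
// Illustrative sketch only; not unzalgo's actual implementation.
const MARK = /[\p{Mn}\p{Me}]/u;

// Hypothetical helper: fraction of code points in categories Mn or Me.
function markRatio(text) {
	const codepoints = [...text.normalize("NFD")];
	if (codepoints.length === 0) {
		return 0;
	}
	return codepoints.filter((c) => MARK.test(c)).length / codepoints.length;
}

// Hypothetical helper: "at least detectionThreshold of the code points are marks".
function reachesThreshold(text, detectionThreshold = 0.55) {
	return markRatio(text) >= detectionThreshold;
}

const word = "fran\u00e7ais"; // "français": 1 mark out of 9 code points after NFD
console.log(markRatio(word).toFixed(2));  // 0.11
console.log(reachesThreshold(word, 0));   // true
console.log(reachesThreshold(word));      // false (default 0.55)
console.log(reachesThreshold(word, 1));   // false
```

Under these assumptions the results line up with the isZalgo examples in the Internationalization section: only a threshold of 0 treats "français" as Zalgo text.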
Exports
clean(string, options): string default export
Removes all combining characters for every word in a string if the word is classified as Zalgo text.
If targetDensity is specified, not all the Zalgo characters will be removed. Instead, they will be thinned out uniformly.
Returns a cleaned, more readable string.
Arguments:
string: string
A string for which combining characters are removed for every word whose Zalgo property is met.
options: object
An object of options.
options.detectionThreshold: number = 0.55
A threshold ∈ [0, 1]. The higher the threshold, the more combining characters are needed for a word to be detected as Zalgo text.
options.targetDensity: number = 0
A density ∈ [0, 1]. The higher the density, the more Zalgo characters will be part of the resulting string. The result is guaranteed to have a Zalgo-character density that is less than or equal to the one provided. A target density of 0 indicates that none of the combining characters should be part of the resulting string. A target density of 1 indicates that all combining characters should be part of the resulting string.
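As a rough illustration of what thinning towards a target density could look like, consider the sketch below. It is one possible interpretation, not unzalgo's algorithm: the helper `thinMarks` is hypothetical, density is assumed to mean marks divided by total code points of a word, and "uniformly" is approximated by keeping evenly spaced marks.

```javascript
// Illustrative sketch only; not unzalgo's actual implementation.
const MARK = /[\p{Mn}\p{Me}]/u;

// Hypothetical helper: drop evenly spaced marks until the density is within the target.
function thinMarks(word, targetDensity = 0) {
	const codepoints = [...word.normalize("NFD")];
	const markCount = codepoints.filter((c) => MARK.test(c)).length;
	const baseCount = codepoints.length - markCount;

	// Largest number of marks whose density does not exceed the target.
	let maxKept = markCount;
	while (maxKept > 0 && maxKept / (maxKept + baseCount) > targetDensity) {
		maxKept -= 1;
	}

	let seen = 0;
	let kept = 0;
	return codepoints
		.filter((c) => {
			if (!MARK.test(c)) {
				return true; // base characters are always kept
			}
			seen += 1;
			// Keep a mark each time the running quota increases.
			const quota = Math.floor((seen * maxKept) / markCount);
			if (quota > kept) {
				kept += 1;
				return true;
			}
			return false;
		})
		.join("");
}
```

With a target density of 0 the sketch removes every mark, and with 1 it keeps all of them, matching the two boundary cases described above. For "a" followed by four marks and a target of 0.5, it keeps a single mark, giving a density of exactly 0.5.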
computeScores(string): number[]
Computes a score ∈ [0, 1] for every word in the input string. Each score represents the ratio of Zalgo characters to total characters in a word.
Returns an array of scores where each score describes the Zalgo ratio of a word.
Arguments:
string: string
The input string for which to compute scores.
isZalgo(string, detectionThreshold = 0.55): boolean
Determines if the string consists of Zalgo text. Note that the occurrence of a combining character is not enough to trigger the detection. Instead, it computes a ratio for the input string and checks if it exceeds a given threshold. Thus, internationalized strings aren't automatically classified as Zalgo text.
Returns whether the string is a Zalgo text string.
Arguments:
string: string
A string for which a Zalgo text check is run.
detectionThreshold: number = 0.55
A threshold ∈ [0, 1]. The higher the threshold, the more combining characters are needed for it to be detected as Zalgo text.