1.0.8-3 • Published 2 months ago

string-to-unicode-variant2 v1.0.8-3

License: MIT
Repository: github

𝗎҉ toUnicodeVariant

JavaScript function that converts a string into different kinds of ⓤⓝⓘⓒⓞⓓⓔ variants.

toUnicodeVariant is an attempt to utilize Unicode in a structured, organized and logical manner.

browser

<script src="path/to/toUnicodeVariant.js"></script>

nodejs

const toUnicodeVariant = require('path/to/toUnicodeVariant.js') 

Usage

Pass a string and the name of a variant (or its alias), and you get the unicoded string in return:

toUnicodeVariant(string, variant, combinings)
...
toUnicodeVariant('monospace', 'm') //like first row below 

| Variant | Alias | Description | Example |
| --- | --- | --- | --- |
| monospace | m | Monospace | 𝚖𝚘𝚗𝚘𝚜𝚙𝚊𝚌𝚎 |
| bold | b | Bold text | 𝐛𝐨𝐥𝐝 |
| italic | i | Italic text | 𝑖𝑡𝑎𝑙𝑖𝑐 |
| bold italic | bi | Bold italic text | 𝒃𝒐𝒍𝒅 𝒊𝒕𝒂𝒍𝒊𝒄 |
| script | c | Handwriting style | 𝓈𝒸𝓇𝒾𝓅𝓉 |
| bold script | bc | Bolder handwriting | 𝓫𝓸𝓵𝓭 𝓼𝓬𝓻𝓲𝓹𝓽 |
| gothic | g | Gothic (fraktur) | 𝔤𝔬𝔱𝔥𝔦𝔠 |
| gothic bold | bg | Gothic in bold | 𝖌𝖔𝖙𝖍𝖎𝖈 𝖇𝖔𝖑𝖉 |
| doublestruck | d | Outlined text | 𝕕𝕠𝕦𝕓𝕝𝕖𝕤𝕥𝕣𝕦𝕔𝕜 |
| 𝗌𝖺𝗇𝗌 | s | Sans-serif style | 𝗌𝖺𝗇𝗌 |
| bold 𝗌𝖺𝗇𝗌 | bs | Bold sans-serif | 𝗯𝗼𝗹𝗱 𝘀𝗮𝗻𝘀 |
| italic 𝗌𝖺𝗇𝗌 | is | Italic sans-serif | 𝘪𝘵𝘢𝘭𝘪𝘤 𝘴𝘢𝘯𝘴 |
| bold italic sans | bis | Bold italic sans-serif | 𝙗𝙤𝙡𝙙 𝙞𝙩𝙖𝙡𝙞𝙘 𝙨𝙖𝙣𝙨 |
| circled | o | Letters within circles | ⓒⓘⓡⓒⓛⓔⓓ |
| circled negative | on | -- negative | 🅒🅘🅡🅒🅛🅔🅓 |
| squared | q | Letters within squares | 🅂🅀🅄🄰🅁🄴🄳 |
| squared negative | qn | -- negative | 🆂🆀🆄🅰🆁🅴🅳 |
| paranthesis | p | Letters within parentheses | ⒫⒜⒭⒠⒩⒯⒣⒠⒮⒤⒮ |
| fullwidth | w | Wider monospace font | ｆｕｌｌｗｉｄｔｈ |
| flags | f | Regional codes | 🇩🇰 🇺 🇳 🇮 🇨 🇴 🇩 🇪 |
| numbers dot | nd | Numbers with trailing dot | ⒈⒉⒊⒋ |
| numbers comma | nc | Numbers with trailing comma | 🄂🄃🄄🄅 |
| number double circled | ndc | Numbers within double circle | ⓵⓶⓷⓸ |
| roman | r | Roman numerals | Ⅰ, Ⅱ, ⅯⅯⅩⅩⅢ |
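
Several of the alphabetic variants correspond to contiguous ranges in Unicode's Mathematical Alphanumeric Symbols block (see U1D400.pdf in the references), so a conversion can be approximated by adding an offset to each code point. The following is a minimal sketch of that idea for the monospace variant only. `toMonospaceSketch` is a hypothetical helper written for illustration and is not taken from the library's source.

```javascript
// Illustrative sketch only: maps A-Z, a-z and 0-9 to the Unicode
// "mathematical monospace" range by code point offset.
// The function name and structure are assumptions, not the library's code.
function toMonospaceSketch(str) {
  const UPPER = 0x1D670; // MATHEMATICAL MONOSPACE CAPITAL A
  const LOWER = 0x1D68A; // MATHEMATICAL MONOSPACE SMALL A
  const DIGIT = 0x1D7F6; // MATHEMATICAL MONOSPACE DIGIT ZERO

  // Array.from iterates by code point rather than by UTF-16 code unit.
  return Array.from(str, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp >= 0x41 && cp <= 0x5A) return String.fromCodePoint(UPPER + cp - 0x41);
    if (cp >= 0x61 && cp <= 0x7A) return String.fromCodePoint(LOWER + cp - 0x61);
    if (cp >= 0x30 && cp <= 0x39) return String.fromCodePoint(DIGIT + cp - 0x30);
    return ch; // anything else is passed through unchanged
  }).join('');
}

console.log(toMonospaceSketch('monospace 123'));
```

Variants such as italic and script have gaps in their Unicode ranges (some letters live in other blocks), so a single offset is not enough for them and a lookup table is the more likely approach.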

Combining with underline, strike and other diacritical marks

The unicoded text can be combined with a broad range of diacritical marks:

toUnicodeVariant('underlined', 'bold italic', 'underline-double')//𝒖̳𝒏̳𝒅̳𝒆̳𝒓̳𝒍̳𝒊̳𝒏̳𝒆̳𝒅̳

You can control the spacing between characters by using the space- combinings. In the table above, the halo- and enclose- samples are rendered together with space-en to make them look nicer.
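
Combining marks are separate code points that attach to the preceding character, so applying one amounts to appending it after every character in the string. A minimal sketch of that mechanism follows. `withCombiningSketch` is a hypothetical helper, and the choice of U+0333 (COMBINING DOUBLE LOW LINE) for a double underline is an assumption about what underline-double maps to.

```javascript
// Illustrative sketch only: append one combining mark after each character.
// Helper name and mark choice are assumptions, not the library's code.
function withCombiningSketch(str, mark) {
  // Array.from iterates by code point, so characters outside the
  // Basic Multilingual Plane (such as the bold italic letters) stay intact.
  return Array.from(str, (ch) => ch + mark).join('');
}

const DOUBLE_UNDERLINE = '\u0333'; // COMBINING DOUBLE LOW LINE

console.log(withCombiningSketch('underlined', DOUBLE_UNDERLINE));
```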

Combinings can be combined

You can use two, three or more combinings, either by passing a comma-separated string or by passing an array of strings:

toUnicodeVariant('The quick brown fox jumps ...', 'sans', 'underline, overline, strike')
toUnicodeVariant('The quick brown fox jumps ...', 'sans', ['underline', 'overline', 'strike'])

𝖳̶̲̅𝗁̶̲̅𝖾̶̲̅ ̶̲̅𝗊̶̲̅𝗎̶̲̅𝗂̶̲̅𝖼̶̲̅𝗄̶̲̅ ̶̲̅𝖻̶̲̅𝗋̶̲̅𝗈̶̲̅𝗐̶̲̅𝗇̶̲̅ ̶̲̅𝖿̶̲̅𝗈̶̲̅𝗑̶̲̅ ̶̲̅𝗃̶̲̅𝗎̶̲̅𝗆̶̲̅𝗉̶̲̅𝗌̶̲̅ ̶̲̅𝗈̶̲̅𝗏̶̲̅𝖾̶̲̅𝗋̶̲̅ ̶̲̅𝗍̶̲̅𝗁̶̲̅𝖾̶̲̅ ̶̲̅𝗅̶̲̅𝖺̶̲̅𝗓̶̲̅𝗒̶̲̅ ̶̲̅𝖽̶̲̅𝗈̶̲̅𝗀̶̲̅

You can also use shorthand aliases, or a mix of aliases and full names: 'u,o,s', ['u','o','strike'], etc.
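
Accepting both a comma-separated string and an array means the argument has to be normalized into one list of names before use. The sketch below shows one way that could be done. `parseCombiningsSketch` and the alias map are hypothetical and cover only the three names used in the examples above.

```javascript
// Illustrative sketch only: normalize the "combinings" argument to a list
// of full names. The alias map is limited to the three examples in this
// README and is an assumption, not the library's actual table.
const COMBINING_ALIASES = new Map([
  ['u', 'underline'],
  ['o', 'overline'],
  ['s', 'strike'],
]);

function parseCombiningsSketch(combinings) {
  const list = Array.isArray(combinings)
    ? combinings
    : String(combinings).split(',');

  return list
    .map((name) => name.trim())           // tolerate 'underline, overline'
    .filter((name) => name.length > 0)    // drop empty entries
    .map((name) => COMBINING_ALIASES.get(name) ?? name); // expand aliases
}

console.log(parseCombiningsSketch('u, o, strike'));
console.log(parseCombiningsSketch(['u', 'o', 'strike']));
```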

Special chars

Language-specific special characters like ç, ò or ø are not supported by any Unicode "variant", and almost certainly never will be. The script and gothic fonts are in fact just various kinds of mathematical symbols (see references below). For many of the variants, converting a special character like ø will at best look odd and may ruin the entire string (this varies by reader / browser).

However, by using the base Latin character as a fallback and injecting the matching diacritical marks, we can experimentally try to mimic some language-specific characters. Adding diacritics fails with the figurative variants, but it works okay with most of the rest.

toUnicodeVariant('üničode', 'bold italic') //𝒖̈𝒏𝒊𝒄̌𝒐𝒅𝒆
toUnicodeVariant('ÜNIĈODE', 'bold italic') //𝑼𝑵𝑰𝑪𝑶𝑫𝑬
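
The fallback idea described above can be sketched with canonical decomposition: `normalize('NFD')` splits an accented letter into its base Latin letter plus combining marks, the base letter is converted, and the marks are carried over. `boldItalicWithDiacriticsSketch` is a hypothetical helper limited to the bold italic variant, and whether the library works this way internally is an assumption.

```javascript
// Illustrative sketch only: convert base letters to mathematical bold italic
// and keep the combining marks produced by NFD decomposition.
// Helper name and approach are assumptions, not the library's code.
function boldItalicWithDiacriticsSketch(str) {
  const UPPER = 0x1D468; // MATHEMATICAL BOLD ITALIC CAPITAL A
  const LOWER = 0x1D482; // MATHEMATICAL BOLD ITALIC SMALL A

  // NFD turns 'ü' into 'u' followed by U+0308 (COMBINING DIAERESIS).
  return Array.from(str.normalize('NFD'), (ch) => {
    const cp = ch.codePointAt(0);
    if (cp >= 0x41 && cp <= 0x5A) return String.fromCodePoint(UPPER + cp - 0x41);
    if (cp >= 0x61 && cp <= 0x7A) return String.fromCodePoint(LOWER + cp - 0x61);
    return ch; // combining marks and all other characters pass through
  }).join('');
}

console.log(boldItalicWithDiacriticsSketch('üničode'));
```

Characters without a canonical decomposition, such as ø, come through this sketch unchanged, so they would need a hand-written mapping.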

Additions, limitations

Besides the limitations you can see in the various compatibility tables above, some variants offer extra features, while other variants are reduced to a single feature.

Ⅻ roman, continued

If you pass a number (integer) instead of a string, that number is romanized automatically before being converted to Unicode:

 toUnicodeVariant(2023, 'roman') //ⅯⅯⅩⅩⅢ
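
Romanizing an integer can be sketched as a greedy subtraction over descending values, emitting Unicode Number Forms characters (see U2150.pdf in the references) instead of ASCII letters. `toRomanSketch` is a hypothetical helper. It uses the composed glyphs for 1 to 9 so that 2023 ends in Ⅲ as in the example above, but the library's exact glyph choices (for instance for 11 and 12) are not known here.

```javascript
// Illustrative sketch only: integer to Unicode roman numerals (1..3999).
// Helper name, range limit and glyph choices are assumptions.
function toRomanSketch(n) {
  if (!Number.isInteger(n) || n < 1 || n > 3999) {
    throw new RangeError('expected an integer between 1 and 3999');
  }
  const steps = [
    [1000, '\u216F'],       // M
    [900, '\u216D\u216F'],  // CM
    [500, '\u216E'],        // D
    [400, '\u216D\u216E'],  // CD
    [100, '\u216D'],        // C
    [90, '\u2169\u216D'],   // XC
    [50, '\u216C'],         // L
    [40, '\u2169\u216C'],   // XL
    [10, '\u2169'],         // X
  ];
  let out = '';
  for (const [value, glyph] of steps) {
    while (n >= value) {
      out += glyph;
      n -= value;
    }
  }
  // The remainder is 0..9; U+2160..U+2168 are the composed numerals 1..9.
  return n === 0 ? out : out + String.fromCharCode(0x2160 + n - 1);
}

console.log(toRomanSketch(2023));
```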

flags, f

Supports a-z and A-Z only. Based on the highly specialized regional indicator symbols (see references below, U1F100.pdf). To use it, pass a string with whitespace between each character (otherwise expect weird output, since there is no fallback to monospace):

toUnicodeVariant('U N I C O D E', 'f') //🇺 🇳 🇮 🇨 🇴 🇩 🇪

However, if you pass a string that contains a country code, or even the abbreviation of some international organization, many readers will render the corresponding flag instead:

toUnicodeVariant('DK EU UN', 'flags') //🇩🇰 🇪🇺 🇺🇳
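
The flag behaviour follows from how regional indicator symbols work: each letter maps to one symbol, and a renderer that recognizes two adjacent symbols as a region code may draw a flag. A minimal sketch of the letter mapping follows. `toRegionalIndicatorsSketch` is a hypothetical helper, not the library's code.

```javascript
// Illustrative sketch only: map a-z / A-Z to regional indicator symbols.
// Helper name is an assumption. Whether a pair renders as a flag is
// decided by the reader / browser, not by this code.
function toRegionalIndicatorsSketch(str) {
  const BASE = 0x1F1E6; // REGIONAL INDICATOR SYMBOL LETTER A

  return Array.from(str.toUpperCase(), (ch) => {
    const cp = ch.codePointAt(0);
    return cp >= 0x41 && cp <= 0x5A
      ? String.fromCodePoint(BASE + cp - 0x41)
      : ch; // whitespace and other characters pass through
  }).join('');
}

console.log(toRegionalIndicatorsSketch('U N I C O D E')); // separate letters
console.log(toRegionalIndicatorsSketch('DK EU UN'));      // adjacent pairs
```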

Reset a unicoded string

Use String.normalize()

See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize

'𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟'.normalize('NFKC') //or NFKD

returns abcdefghijklmnopqrstuvwxyz
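
Compatibility normalization maps the styled letters back to plain ones, but it does not remove combining marks such as underline or strike. A minimal sketch that handles both, assuming a hypothetical helper named `resetSketch`:

```javascript
// Illustrative sketch only: undo a variant and strip combining marks.
// Helper name is an assumption, not part of the library.
function resetSketch(str) {
  return str
    .normalize('NFKD')        // styled letters back to plain, accents split off
    .replace(/\p{M}/gu, '');  // drop every combining mark
}

// Bold italic "u" followed by a double underline (U+0333).
console.log(resetSketch(String.fromCodePoint(0x1D496) + '\u0333'));
```

Note that this also strips accents that were part of the original text (ü becomes u), so it is only appropriate when plain ASCII-like output is acceptable.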

Test

Browser: test/browser.html
Node: test$ node node.js

These tests show all variants and their coverage of a-z, A-Z and 0-9, along with flag combinations. The reference rendering is from Chrome (Ubuntu 20.04, 112.x).

You can also review a sample output, test/result-sample.html.txt. Try it out in different browsers, as there are significant differences in coverage.

References

https://www.unicode.org/charts/PDF/UFF00.pdf
https://www.unicode.org/charts/PDF/U1F100.pdf
https://www.unicode.org/charts/PDF/U1D400.pdf
https://www.unicode.org/charts/PDF/U2150.pdf
https://www.unicode.org/charts/PDF/U2460.pdf
https://www.unicode.org/charts//PDF/Unicode-3.2/U32-2000.pdf
https://www.unicode.org/charts//PDF/Unicode-4.0/U40-0300.pdf

Playground

For now, visit https://detfrieord.dk/tekst-til-unicode (in Danish, sorry)