utf32char v1.4.1
UTF32Char
A minimalist, dependency-free implementation of immutable 4-byte-width (UTF-32) characters for easy manipulation of characters and glyphs, including simple emoji.
Also includes an immutable unsigned 4-byte-width integer data type, UInt32, with easy conversions to and from UTF32Char.
Motivation
If you want to allow a single "character" of input, but consider emoji to be single characters, you'll have some difficulty using basic JavaScript strings, which use UTF-16 encoding by default. While ASCII characters all have length 1...
console.log("?".length) // 1...many emoji have length > 1
console.log("ðĐ".length) // 2...and with modifiers and accents, that number can get much larger
console.log("!ĖŋĖÍĨÍĨĖÍĢĖĖĖÍÍÍĖŽĖ°ĖĖ".length) // 17As all Unicode characters can be expressed with a fixed-length UTF-32 encoding, this package mitigates the problem a bit, though it doesn't completely solve it. Note that I do not claim to have solved this issue, and this package accepts any group of one to four bytes as a "single UTF-32 character", whether or not they are rendered as a single grapheme. See this package if you want to split text into graphemes, regardless of the number of bytes required to render each grapheme.
If you just want a simple, dependency-free API to deal with 4-byte strings, then this package is for you.
This package provides an implementation of 4-byte, UTF-32 "characters", UTF32Char, and corresponding unsigned integers, UInt32. The unsigned integers have the added benefit of being usable as safe array indices.
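For example, because a UInt32 can only ever hold an in-range, non-negative integer, its toNumber() value can safely index an array. A minimal sketch, assuming the package's named exports and the fromNumber() / toNumber() methods shown later in this README:
import { UInt32 } from "utf32char"

const letters: Array<string> = [ "a", "b", "c", "d" ]

// UInt32.fromNumber() rejects negative or too-large values (see Edge Cases below),
// so by the time we index the array we know we have a valid non-negative integer
const idx: UInt32 = UInt32.fromNumber(2)
console.log(letters[idx.toNumber()]) // c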
Installation
Install from npm with
$ npm i utf32char
Or try it online at npm.runkit.com
var lib = require("utf32char")
let char = new lib.UTF32Char("😮")
Use
Create new UTF32Chars and UInt32s like so
let index: UInt32 = new UInt32(42)
let char: UTF32Char = new UTF32Char("😮")
You can convert to basic JavaScript types
console.log(index.toNumber()) // 42
console.log(char.toString()) // 😮
Easily convert between characters and integers
let indexAsChar: UTF32Char = index.toUTF32Char()
let charAsUInt: UInt32 = char.toUInt32()
console.log(indexAsChar.toString()) // *
console.log(charAsUInt.toNumber()) // 3627933230
...or skip the middleman and convert integers directly to strings, or strings directly to integers:
console.log(index.toString()) // *
console.log(char.toNumber()) // 3627933230
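Where does a number like 3627933230 come from? From the examples in this README, it looks like the character's two UTF-16 code units are packed into the high and low 16 bits of the integer. The quick check below is just an observation in plain TypeScript, not part of the package's API:
// "😮" is a single code point (U+1F62E) encoded as two UTF-16 code units
const high: number = "😮".charCodeAt(0) // 0xD83D (high surrogate)
const low: number = "😮".charCodeAt(1)  // 0xDE2E (low surrogate)

// placing them in the upper and lower halves of a 32-bit value
// reproduces the number printed above
console.log(high * 0x10000 + low) // 3627933230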
Edge Cases
UInt32 and UTF32Char ranges are enforced upon object creation, so you never have to worry about bounds checking:
let tooLow: UInt32 = UInt32.fromNumber(-1)
// range error: UInt32 has MIN_VALUE 0, received -1
let tooHigh: UInt32 = UInt32.fromNumber(2**32)
// range error: UInt32 has MAX_VALUE 4294967295 (2^32 - 1), received 4294967296
let tooShort: UTF32Char = UTF32Char.fromString("")
// invalid argument: cannot convert empty string to UTF32Char
let tooLong: UTF32Char = UTF32Char.fromString("hey!")
// invalid argument: lossy compression of length-3+ string to UTF32Char
Because the implementation accepts any 4-byte string as a "character", the following are allowed:
let char: UTF32Char = UTF32Char.fromString("hi")
let num: number = char.toNumber()
console.log(num) // 6815849
console.log(char.toString()) // hi
console.log(UTF32Char.fromNumber(num).toString()) // hi
Floating-point values are truncated to integers when creating UInt32s, like in many other languages:
let pi: UInt32 = UInt32.fromNumber(3.141592654)
console.log(pi.toNumber()) // 3
let squeeze: UInt32 = UInt32.fromNumber(UInt32.MAX_VALUE + 0.9)
console.log(squeeze.toNumber()) // 4294967295
Compound emoji -- created using variation selectors and joiners -- are often larger than 4 bytes wide and will therefore throw errors when used to construct UTF32Chars:
let smooch: UTF32Char = UTF32Char.fromString("👩‍❤️‍💋‍👩")
// invalid argument: lossy compression of length-3+ string to UTF32Char
console.log("ðĐââĪïļâðâðĐ".length) // 11...but many basic emoji are fine:
// emojiTest.ts
let emoji: Array<string> = [ "😂", "😭", "🥺", "🤣", "❤️", "✨", "🙏", "😍", "😘", "🥰", "😊", "😉", "🤗", "👩‍❤️‍💋‍👩" ]
for (const e of emoji) {
  try {
    UTF32Char.fromString(e)
    console.log(`✅: ${e}`)
  } catch (_) {
    console.log(`❌: ${e}`)
  }
}
$ npx ts-node emojiTest.ts
✅: 😂
✅: 😭
✅: 🥺
✅: 🤣
✅: ❤️
✅: ✨
✅: 🙏
✅: 😍
✅: 😘
✅: 🥰
✅: 😊
✅: 😉
✅: 🤗
❌: 👩‍❤️‍💋‍👩
Arithmetic, Comparison, and Immutability
UInt32 provides basic arithmetic and comparison operators
let increased: UInt32 = index.plus(19)
console.log(increased.toNumber()) // 61
let comp: boolean = increased.greaterThan(index)
console.log(comp) // true
Verbose versions and shortened aliases of comparison functions are available:
- lt and lessThan
- gt and greaterThan
- le and lessThanOrEqualTo
- ge and greaterThanOrEqualTo
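For example, assuming the short aliases take the same arguments as their verbose counterparts, the two spellings are interchangeable (reusing index and increased from above):
console.log(index.lt(increased)) // true -- same as index.lessThan(increased)
console.log(increased.ge(index)) // true -- same as increased.greaterThanOrEqualTo(index)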
Since UInt32s are immutable, plus() and minus() return new objects, which are of course bounds-checked upon creation:
let whoops: UInt32 = increased.minus(100)
// range error: UInt32 has MIN_VALUE 0, received -39
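Note that the failed minus() call doesn't modify increased: every operation returns a brand-new UInt32, so the value you started with is always intact. A small illustration, reusing increased from above:
console.log(increased.toNumber())         // still 61, unaffected by the failed minus()
console.log(increased.plus(5).toNumber()) // 66, a new object
console.log(increased.toNumber())         // still 61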
Contact
Feel free to open an issue with any bug fixes or a PR with any performance improvements.
Support me @ Ko-fi!
Check out my DEV.to blog!