out-of-character v1.2.2
Unicode has a few dozen characters that do not render anything, on purpose.
This is cool for cultural idiosyncrasies in historical languages. More often, though, their use is unintentional (or nefarious!), and these characters end up causing problems when parsing text formats.
• these are sometimes called 'zero-width', 'ignorable', or 'tag-characters' •
This library helps spot and remove these funboys, before they cause some trouble.
Please remember that some text is meant to have Khmer-vowels, or Kaithi-alphabet characters.
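To make the idea concrete, here is a minimal sketch of how spotting such characters can work in plain JavaScript. It is not this library's implementation: the `INVISIBLES` table is a small hand-picked subset of code points, and `findInvisible` and `stripInvisible` are hypothetical helper names used only for illustration.

```javascript
// Illustrative only: a tiny, hand-picked subset of invisible code points.
// This is an assumption for the sketch, not the library's actual list.
const INVISIBLES = new Map([
  [0x00ad, 'SOFT HYPHEN'],
  [0x034f, 'COMBINING GRAPHEME JOINER'],
  [0x17b4, 'KHMER VOWEL INHERENT AQ'],
  [0x17b5, 'KHMER VOWEL INHERENT AA'],
  [0x180e, 'MONGOLIAN VOWEL SEPARATOR'],
  [0x200b, 'ZERO WIDTH SPACE'],
  [0x200c, 'ZERO WIDTH NON-JOINER'],
  [0x200d, 'ZERO WIDTH JOINER'],
  [0x2060, 'WORD JOINER'],
  [0xfeff, 'ZERO WIDTH NO-BREAK SPACE'],
])

// Hypothetical helper: report each listed character with its UTF-16 offset.
// Every code point in the table is in the BMP, so charCodeAt is sufficient.
function findInvisible(text) {
  const found = []
  for (let i = 0; i < text.length; i++) {
    const cp = text.charCodeAt(i)
    if (INVISIBLES.has(cp)) {
      found.push({
        name: INVISIBLES.get(cp),
        code: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
        offset: i,
      })
    }
  }
  return found
}

// Hypothetical helper: drop every listed character, keep everything else.
// Array.from iterates by code point, so surrogate pairs stay intact.
function stripInvisible(text) {
  return Array.from(text)
    .filter((ch) => !INVISIBLES.has(ch.codePointAt(0)))
    .join('')
}

// Escapes are used here so the hidden characters are visible in the source.
console.log(findInvisible('zero\u200Bwidth\u00ADtext'))
console.log(stripInvisible('zero\u200Bwidth\u00ADtext'))
```

Blindly stripping is the risky part: as noted above, some text legitimately needs these characters, so a real tool would likely want a fuller table and some context awareness.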
CLI
detect invisible characters in all files in a directory
out-of-character ./path/to/dir
remove them from all files in a directory
out-of-character ./path/to/dir --replace
detect invisible characters in a file
out-of-character ./path/to/file.txt
remove invisible characters from a file
out-of-character ./path/to/file.txt --replace
JavaScript API
import {detect, replace} from 'out-of-character'
let str='nothing s͏neak឵y here' //actually, there is.
console.log(detect(str))
/* 😮 😮 😮
[
{
name: 'KHMER VOWEL INHERENT AA',
code: 'U+17B5',
offset: 15,
replacement: ''
},
{
name: 'MONGOLIAN VOWEL SEPARATOR',
code: 'U+180E',
offset: 19,
replacement: ''
}
]*/
// get rid of them!
let after = replace(str)
console.log(str !== after)
// true
fixing/detecting in files can be done like:
const fs = require('fs')
const {detect, replace} = require('out-of-character')
let text = fs.readFileSync('./some-file.txt').toString()
console.log(detect(text))
// yikes.
// ok, fix it
fs.writeFileSync('./some-file.txt', replace(text))
// ok, double-check it.
let goodNow = fs.readFileSync('./some-file.txt').toString()
console.log(detect(goodNow))
// phew.
Thank you to character.construction/blanks by Jan Lelis
and a tale of characters in Unicode by Stefan Judis
See also
- printable-characters - by Vit Gordon
- unzalgo - by kdex
MIT