1.1.0 • Published 5 months ago

DrUnicode

DrUnicode is a heuristic utility for detecting and diagnosing common string corruption and encoding issues. It helps developers validate string integrity so that text is encoded and displayed as intended, without unintended alterations. A set of integrity checkers detects common encoding errors and anomalies across various languages.

It is designed for situations where string-related issues are hurting user engagement but the cause is hard to pinpoint. It is intended to run in production environments, providing real-time diagnostics when a problem is suspected so that the application can log the issue or respond to it.

Features

  • Detection of double UTF-8 encoding anomalies across multiple languages, including Spanish, French, Russian, Hebrew, Arabic, Japanese, Korean, and Chinese.
  • Detection of unexpected invisible characters that should not be present in the text.
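
To illustrate the kind of check the second feature describes, here is a minimal sketch of invisible-character detection. It is not DrUnicode's implementation: the function name `findInvisibleChars` and the short list of code points are assumptions chosen for the example.

```javascript
// Illustrative only: this is not DrUnicode's implementation.
// A small, hypothetical set of code points that render as nothing.
const INVISIBLE_CHARS = new Map([
  [0x00ad, 'SOFT HYPHEN'],
  [0x200b, 'ZERO WIDTH SPACE'],
  [0x200c, 'ZERO WIDTH NON-JOINER'],
  [0x200d, 'ZERO WIDTH JOINER'],
  [0x2060, 'WORD JOINER'],
  [0xfeff, 'ZERO WIDTH NO-BREAK SPACE (BOM)'],
]);

// Returns the position and name of every listed invisible character in `text`.
function findInvisibleChars(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const name = INVISIBLE_CHARS.get(text.charCodeAt(i));
    if (name) hits.push({ index: i, name });
  }
  return hits;
}

// One hit: index 4, ZERO WIDTH SPACE.
console.log(findInvisibleChars('pass\u200Bword'));
```

Whether a hit is "unexpected" depends on context: the zero width joiner and non-joiner are legitimate in some scripts and in emoji sequences, so a real checker would need more than a fixed list.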

Future Features

  • Detection of invalid use of bidirectional (Bidi) control characters.
  • Identification of common confusable characters.

Installation

You can install DrUnicode via npm or yarn.

npm install drunicode

or

yarn add drunicode

Usage

Once installed, you can use DrUnicode to analyze strings for encoding issues.

Basic Example

import { DrUnicode } from 'drunicode';

const drUnicode = new DrUnicode();

const result = drUnicode.analyze("Let's go now!");
console.log(result); // Outputs: 'valid'

Detect Double UTF-8 Anomalies Example

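import { DrUnicode } from 'drunicode';

const drUnicode = new DrUnicode();

// The escaped string is the double-encoded form of "¡Vámonos ahora mismo!".
const result = drUnicode.analyze("\u00C2\u00A1V\u00C3\u00A1monos ahora mismo!");
console.log(result); // Outputs: 'invalid'

// Pass a callback to receive a description of the problem.
drUnicode.analyze("Давай прџмо сейчас!", (invalidString, message) => {
  console.log('Message:', message); // Outputs: 'Double UTF-8 encoding corruption detected of Russian'
});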
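
For background on what this check looks for, the sketch below reproduces double UTF-8 encoding and shows one possible heuristic for spotting it. It is not how DrUnicode works internally. The names `doubleEncode` and `repairDoubleUtf8` are made up for the example, and it assumes Node.js (for `Buffer`) and a Latin-1 misreading of the bytes.

```javascript
// Illustrative only: this is not DrUnicode's implementation.

// Simulate the corruption: UTF-8 bytes are misread as Latin-1, one character per byte.
function doubleEncode(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

// Heuristic: if every character fits in one byte and those bytes form valid,
// non-ASCII UTF-8, the string was probably encoded twice. Returns the repaired
// string, or null when the pattern does not apply.
function repairDoubleUtf8(text) {
  if (!/^[\u0000-\u00ff]*$/.test(text)) return null; // has characters beyond Latin-1
  if (/^[\u0000-\u007f]*$/.test(text)) return null;  // pure ASCII, nothing to repair
  try {
    const bytes = Buffer.from(text, 'latin1');
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return null; // bytes are not valid UTF-8, so this is ordinary Latin-1 text
  }
}

const corrupted = doubleEncode('¡Vámonos!');
console.log(repairDoubleUtf8(corrupted));   // '¡Vámonos!'
console.log(repairDoubleUtf8('¡Vámonos!')); // null
```

Real-world corruption often passes through Windows-1252 rather than Latin-1, which maps bytes 0x80 to 0x9F to different characters. This sketch does not handle that case, which is one reason a per-language heuristic is useful.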

Analyzing DOM for Invalid Strings

DrUnicode can also analyze the full content of a webpage by checking for invalid strings within the DOM.

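import { DrUnicode } from 'drunicode';

const drUnicode = new DrUnicode();

// The callback receives each invalid string, where it was found, and a diagnostic message.
drUnicode.analyzeDom((invalidString, nodeLocation, message) => {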
  console.log('Invalid String:', invalidString);
  console.log('Node Location:', nodeLocation);
  console.log('Message:', message);
});
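
In production you may prefer to gather reports and send them to a logger in one batch instead of printing each one. The sketch below shows one way to do that. `createIssueCollector` is a hypothetical helper, not part of DrUnicode, and the sample arguments (including the `'body > p'` location) are placeholders, since the actual shape of `nodeLocation` is not documented here.

```javascript
// Hypothetical helper, not part of DrUnicode.
// Builds a callback with the (invalidString, nodeLocation, message) shape used
// above and stores each report so it can be logged or sent elsewhere in one batch.
function createIssueCollector() {
  const issues = [];
  const callback = (invalidString, nodeLocation, message) => {
    issues.push({ invalidString, nodeLocation, message, detectedAt: Date.now() });
  };
  return { issues, callback };
}

const collector = createIssueCollector();
// In a page, this would be: drUnicode.analyzeDom(collector.callback);
// Here the callback is invoked by hand with placeholder values.
collector.callback('sample text', 'body > p', 'example message');
console.log(collector.issues.length); // 1
```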

Tests

The project includes a suite of tests to ensure correctness. You can run the tests with:

npm run test