1.0.1 • Published 2 years ago

js-romanian-diacritics v1.0.1

License
MIT

js-romanian-diacritics

Functions for converting Romanian diacritics in text: standard to traditional, traditional to standard, or removing the diacritical marks entirely.

Usage

Standard/official diacritics

The standard function takes a string as an argument and replaces all of its traditional/incorrect Romanian diacritics with their standard/official counterparts.

const diacritics = require("js-romanian-diacritics");
diacritics.standard("Aştept să vinã colegii de muncã.");
  // Output: Aștept să vină colegii de muncă.
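The conversion described above can be sketched as a lookup table plus a single regular-expression replace. This is a minimal illustration, not the package's actual source: the names LEGACY_TO_STANDARD and toStandard are hypothetical, and the mapping is assumed from the character list in the "Character set" section.

```javascript
// Hypothetical sketch, NOT the package's real implementation.
// Maps legacy (cedilla/tilde) code points to the standard Romanian ones.
const LEGACY_TO_STANDARD = {
  "\u015E": "\u0218", // S with cedilla (capital) -> S with comma below
  "\u015F": "\u0219", // s with cedilla (small)   -> s with comma below
  "\u0162": "\u021A", // T with cedilla (capital) -> T with comma below
  "\u0163": "\u021B", // t with cedilla (small)   -> t with comma below
  "\u00C3": "\u0102", // A with tilde (capital)   -> A with breve
  "\u00E3": "\u0103", // a with tilde (small)     -> a with breve
};

// Replace every legacy character with its standard counterpart;
// all other characters are left untouched.
function toStandard(text) {
  return text.replace(
    /[\u015E\u015F\u0162\u0163\u00C3\u00E3]/g,
    (ch) => LEGACY_TO_STANDARD[ch]
  );
}

console.log(toStandard("Aştept să vinã colegii de muncã."));
// Aștept să vină colegii de muncă.
```

Note that this sketch assumes precomposed input; text stored in decomposed form (a base letter followed by a combining mark) would need `text.normalize("NFC")` first.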

Traditional/incorrect diacritics

The traditional function takes a string as an argument and replaces all of its standard/official Romanian diacritics with their traditional/incorrect counterparts.

const diacritics = require("js-romanian-diacritics");
diacritics.traditional("Aștept să vină colegii de muncă.");
  // Output: Aştept sã vinã colegii de muncã.
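One way to obtain the reverse conversion is to invert the legacy-to-standard table rather than maintain a second one by hand. The sketch below is an assumption about how this could be done, not the package's code; STANDARD_TO_LEGACY and toTraditional are hypothetical names.

```javascript
// Hypothetical sketch, NOT the package's real implementation.
const LEGACY_TO_STANDARD = {
  "\u015E": "\u0218", // S with cedilla (capital) -> S with comma below
  "\u015F": "\u0219", // s with cedilla (small)   -> s with comma below
  "\u0162": "\u021A", // T with cedilla (capital) -> T with comma below
  "\u0163": "\u021B", // t with cedilla (small)   -> t with comma below
  "\u00C3": "\u0102", // A with tilde (capital)   -> A with breve
  "\u00E3": "\u0103", // a with tilde (small)     -> a with breve
};

// Swap keys and values so each standard character points at its legacy form.
const STANDARD_TO_LEGACY = Object.fromEntries(
  Object.entries(LEGACY_TO_STANDARD).map(([legacy, standard]) => [standard, legacy])
);

// Walk the string character by character; unmapped characters pass through.
function toTraditional(text) {
  return Array.from(text, (ch) => STANDARD_TO_LEGACY[ch] ?? ch).join("");
}

console.log(toTraditional("Știință"));
// Ştiinţã
```

Because the table is one-to-one, converting to the traditional form and back to the standard form returns the original text, provided the input contained no legacy characters to begin with.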

Without diacritics

The without function takes a string as an argument and replaces all Romanian diacritics (both the official character set and the incorrect representations) with their unaccented counterparts.

const diacritics = require("js-romanian-diacritics");
diacritics.without("Aştept să vină colegii de muncă.");
  // Output: Astept sa vina colegii de munca.
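Stripping the marks can be sketched with a single table that covers both the official characters and their legacy representations. This is an illustrative assumption, not the package's source: TO_PLAIN and stripDiacritics are hypothetical names, and including Â/â and Î/î is inferred from the "Character set" table.

```javascript
// Hypothetical sketch, NOT the package's real implementation.
const TO_PLAIN = {
  // Official Romanian characters
  "\u0102": "A", "\u0103": "a", // A/a with breve
  "\u00C2": "A", "\u00E2": "a", // A/a with circumflex
  "\u00CE": "I", "\u00EE": "i", // I/i with circumflex
  "\u0218": "S", "\u0219": "s", // S/s with comma below
  "\u021A": "T", "\u021B": "t", // T/t with comma below
  // Legacy (incorrect) representations
  "\u015E": "S", "\u015F": "s", // S/s with cedilla
  "\u0162": "T", "\u0163": "t", // T/t with cedilla
  "\u00C3": "A", "\u00E3": "a", // A/a with tilde
};

// Replace each mapped character with its base letter; keep everything else.
function stripDiacritics(text) {
  return Array.from(text, (ch) => TO_PLAIN[ch] ?? ch).join("");
}

console.log(stripDiacritics("Aştept să vină colegii de muncă."));
// Astept sa vina colegii de munca.
```

An explicit table keeps the function limited to Romanian characters, so accented letters from other languages are left as they are.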

Character set

There are six Romanian-specific characters that are incorrectly implemented in all Microsoft Windows versions before Vista:

U+0218 Ș LATIN CAPITAL LETTER S WITH COMMA BELOW, incorrectly implemented as U+015E Ş LATIN CAPITAL LETTER S WITH CEDILLA
U+0219 ș LATIN SMALL LETTER S WITH COMMA BELOW, incorrectly implemented as U+015F ş LATIN SMALL LETTER S WITH CEDILLA
U+021A Ț LATIN CAPITAL LETTER T WITH COMMA BELOW, incorrectly implemented as U+0162 Ţ LATIN CAPITAL LETTER T WITH CEDILLA
U+021B ț LATIN SMALL LETTER T WITH COMMA BELOW, incorrectly implemented as U+0163 ţ LATIN SMALL LETTER T WITH CEDILLA
U+0102 Ă LATIN CAPITAL LETTER A WITH BREVE, incorrectly implemented as U+00C3 Ã LATIN CAPITAL LETTER A WITH TILDE
U+0103 ă LATIN SMALL LETTER A WITH BREVE, incorrectly implemented as U+00E3 ã LATIN SMALL LETTER A WITH TILDE
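Before converting, it can be useful to check whether a text contains any of these legacy characters at all. The following is a small sketch under that assumption; findLegacyDiacritics is a hypothetical helper and is not part of the package's API.

```javascript
// Hypothetical helper, NOT part of the package's API.
// Matches the six legacy code points: S/s and T/t with cedilla, A/a with tilde.
const LEGACY_CHARS = /[\u015E\u015F\u0162\u0163\u00C3\u00E3]/g;

// Return the position and character of every legacy diacritic in the text.
function findLegacyDiacritics(text) {
  return Array.from(text.matchAll(LEGACY_CHARS), (match) => ({
    index: match.index,
    char: match[0],
  }));
}

console.log(findLegacyDiacritics("Aştept să vinã"));
// [ { index: 1, char: 'ş' }, { index: 13, char: 'ã' } ]
```

Keep in mind that Ã and ã are legitimate letters in other languages (Portuguese, for example), so a match is only a hint in mixed-language text.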

These Unicode characters are the correct representations of Romanian diacritics:

Character | Unicode value | Unicode name
Ă | U+0102 | LATIN CAPITAL LETTER A WITH BREVE
ă | U+0103 | LATIN SMALL LETTER A WITH BREVE
Â | U+00C2 | LATIN CAPITAL LETTER A WITH CIRCUMFLEX
â | U+00E2 | LATIN SMALL LETTER A WITH CIRCUMFLEX
Î | U+00CE | LATIN CAPITAL LETTER I WITH CIRCUMFLEX
î | U+00EE | LATIN SMALL LETTER I WITH CIRCUMFLEX
Ș | U+0218 | LATIN CAPITAL LETTER S WITH COMMA BELOW
ș | U+0219 | LATIN SMALL LETTER S WITH COMMA BELOW
Ț | U+021A | LATIN CAPITAL LETTER T WITH COMMA BELOW
ț | U+021B | LATIN SMALL LETTER T WITH COMMA BELOW
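Since the comma-below and cedilla forms look almost identical in many fonts, printing the code points is a quick way to see which variant a string actually contains. A minimal sketch, with describeCodePoints as a hypothetical helper name:

```javascript
// Hypothetical helper for inspecting which code points a string contains.
function describeCodePoints(text) {
  return Array.from(text, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return `${ch} U+${hex}`;
  });
}

// The first character is the standard form, the second the legacy cedilla form.
console.log(describeCodePoints("șş"));
// [ 'ș U+0219', 'ş U+015F' ]
```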

Inspired by:

This project is inspired by, and is a refinement of, js-ro-diacritics:

https://github.com/esevo-tech/js-ro-diacritics
