0.3.1 • Published 12 months ago

t13n v0.3.1

License
MIT

T13N: Transliteration Made Right

Welcome to T13N, a transliteration library for Cyrillic languages, written for JavaScript applications.

What is it for?

Transliteration serves different purposes, such as:

  • Friendly URLs for Content Management Systems that publish content in Cyrillic languages;
  • Longer SMS;
  • Passport names;
  • Text rendering for languages that have a dedicated alternative Latin alphabet, etc.

Status

IN DEVELOPMENT / PHASE 0 PREVIEW

Implementation Checklist

Implementation is split into so-called "phases" for better prioritization.

Phase Zero: Belarusian-To-Latin (BGN/PCGN 1979)

  • Define a basic set of rules for each letter;
  • Define a set of flags calculated for each letter for better context;
  • Define alternative variations for some letters that require it (like 'г');
  • Support the most basic word separators (dash, underscore) for URL creation, and resolve "similar" symbols ("’" into "'");
  • Ignore Latin symbols and digits that are already present;
  • Extend configurations via settings;
  • Pack everything as v0.1
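
The Phase Zero steps above can be pictured with a minimal sketch. This is not the library's actual code or API: the names `LETTERS`, `SIMILAR`, `analyze` and `transliterate` are hypothetical, and the letter table is a small illustrative subset that should be checked against the BGN/PCGN 1979 tables before any real use.

```javascript
// Hypothetical, partial letter table (illustrative subset, not the library's data).
const LETTERS = new Map([
  ['а', 'a'], ['в', 'v'], ['г', 'h'], ['д', 'd'],
  ['м', 'm'], ['о', 'o'], ['р', 'r'],
]);

// "Similar" symbols that are normalized to one canonical form.
const SIMILAR = new Map([['’', "'"]]);

// Pre-processing: compute context flags for every character.
function analyze(input) {
  const chars = Array.from(input);
  return chars.map((ch, i) => {
    const lower = ch.toLowerCase();
    const prev = i > 0 ? chars[i - 1].toLowerCase() : null;
    return {
      ch,
      lower,
      isLetter: LETTERS.has(lower),
      isUpper: ch !== lower,
      // A word starts when the previous character is not a known letter.
      isWordStart: prev === null || !LETTERS.has(prev),
    };
  });
}

function transliterate(input) {
  return analyze(input).map((f) => {
    if (f.isLetter) {
      const latin = LETTERS.get(f.lower);
      // Carry the capitalization of the source letter over to the result.
      return f.isUpper ? latin[0].toUpperCase() + latin.slice(1) : latin;
    }
    // Normalize "similar" symbols; Latin letters, digits, dashes,
    // underscores and unknown characters pass through unchanged.
    return SIMILAR.get(f.ch) ?? f.ch;
  }).join('');
}

console.log(transliterate('горад'));          // horad
console.log(transliterate('Мова-2024_abc'));  // Mova-2024_abc
```

The `isWordStart` flag is computed but not consulted here; it only illustrates the kind of context a position-dependent rule or an alternative letter variant could rely on.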

Phase One: Other Belarusian-To-Latin variations ("Łacinka", ICAO, ISO 9)

  • Switch to TypeScript;
  • Schematize a language JSON;
  • Reorganize code to support other variations of one language;
  • Add Belarusian Latin alphabet ("Łacinka");
  • Add ICAO standard;
  • Add ISO 9 standard.

Phase Two: Ukrainian-To-Latin

  • Reorganize code to support multiple languages;
  • Add Ukrainian alphabet and transliteration rules.

Phase Three: Russian-To-Latin

  • Add Russian alphabet and transliteration rules.

(Other languages to be supported later on)

Ruleset & Dictionary

Every transformation rule is explicit and described in a so-called Ruleset: a compilation of rules that explains the transliteration behavior of the script. A Ruleset may be compact and descriptive at the same time, depending on needs.

Compiling a Ruleset produces a Dictionary, which is used for pre-processing analysis and, later, for transliteration.
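
As a rough sketch of how a Ruleset might be both compact and descriptive, and how it could compile into a Dictionary, consider the following. The rule shape (`from`, `to`, `note`) and the `compileRuleset` function are assumptions made for illustration; the library's real schema may differ.

```javascript
// Hypothetical ruleset format: not the library's actual schema.
const ruleset = [
  // Compact form: parallel strings, one source letter per target letter.
  { from: 'авдмор', to: 'avdmor' },
  // Descriptive form: a single letter with an explanatory note.
  { from: 'г', to: 'h', note: 'rendered as h in this sketch' },
  { from: 'ж', to: 'zh', note: 'single letter mapped to a digraph' },
];

function compileRuleset(rules) {
  const dictionary = new Map();
  for (const rule of rules) {
    const sources = Array.from(rule.from);
    // A multi-letter source is treated as compact: its target is split
    // letter by letter. A single-letter source keeps its target whole.
    const targets = sources.length > 1 ? Array.from(rule.to) : [rule.to];
    if (sources.length !== targets.length) {
      throw new Error(`Rule "${rule.from}" has mismatched lengths`);
    }
    sources.forEach((letter, i) => {
      dictionary.set(letter, { latin: targets[i], note: rule.note ?? null });
    });
  }
  return dictionary;
}

const dictionary = compileRuleset(ruleset);
console.log(dictionary.get('ж').latin); // zh
```

Compiling once up front means the analysis and transliteration passes only need constant-time lookups per character.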

Three types of Rules can be used:

Rule Type   Description
L           Describes a rule for a letter that should be converted to its Latin form
S           Any special symbol that should be kept as-is or transformed / corrected
R           A common set of characters (such as Latin letters or digits) that are listed one after another and should all be labeled in the same way
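
To make the three rule types more concrete, here is a small sketch of how they might be represented as data and expanded into lookup entries. The field names (`type`, `from`, `to`, `first`, `last`, `label`) and the `buildDictionary` function are hypothetical, and writing an R rule as a first/last character pair is an assumption, not something the library is known to do.

```javascript
// Hypothetical rule objects; field names are assumptions for illustration.
const rules = [
  { type: 'L', from: 'г', to: 'h' },                     // letter converted to Latin
  { type: 'S', from: '’', to: "'" },                     // special symbol, corrected
  { type: 'S', from: '-', to: '-' },                     // special symbol, kept as-is
  { type: 'R', first: 'a', last: 'z', label: 'latin' },  // consecutive characters
  { type: 'R', first: '0', last: '9', label: 'digit' },
];

function buildDictionary(ruleList) {
  const dict = new Map();
  for (const rule of ruleList) {
    if (rule.type === 'R') {
      // Expand the set into one entry per character, all sharing one label.
      const start = rule.first.codePointAt(0);
      const end = rule.last.codePointAt(0);
      for (let cp = start; cp <= end; cp++) {
        const ch = String.fromCodePoint(cp);
        dict.set(ch, { type: 'R', label: rule.label, to: ch });
      }
    } else {
      const label = rule.type === 'L' ? 'letter' : 'symbol';
      dict.set(rule.from, { type: rule.type, label, to: rule.to });
    }
  }
  return dict;
}

const dict = buildDictionary(rules);
console.log(dict.get('q').label); // latin
console.log(dict.get('’').to);    // '
```

Labeling whole character sets through a single R rule keeps the rule list short while still giving every Latin letter and digit an explicit entry.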