french-to-sms v1.1.8
An experimental JavaScript project that converts French sentences into French SMS-style sentences.
The output should be shorter yet still readable, even though some of the vocabulary may be familiar only to younger people 😄
Installation
npm install french-to-sms
Usage
const frenchToSms = require('french-to-sms');
frenchToSms("coucou");
// => "cc"
frenchToSms("Bonjour tout le monde ! J'espère que vous allez bien ! Moi la patate !");
// => "bjr tt lmond ! jspr k vs allé b1 ! mwa la patate !"
frenchToSms("S'il vous plaît, pouvez-vous faire moins de bruit ? Merci.");
// => "svp, pouvé vs fR - 2 brui ? marci."
Demo
You can test the algorithm out on this demo page.
Algorithm
The algorithm behind this project is based upon a custom-made glossary.
It applies the character replacements defined in the glossary one after another.
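As a rough illustration of the idea, here is a minimal sketch of sequential replacement. The sampleGlossary entries and the applyGlossary helper are hypothetical and are not taken from the package; lowercasing the input is also an assumption, based only on the lowercase outputs shown in Usage.

```javascript
// Hypothetical mini-glossary: an ordered list of [search, replacement] pairs.
// These entries are illustrative only and do not come from the real glossary.
const sampleGlossary = [
  ['coucou', 'cc'],
  ['bonjour', 'bjr'],
  ['tout', 'tt'],
];

// Apply each replacement in turn; every step works on the previous step's output.
function applyGlossary(text, glossary) {
  return glossary.reduce(
    (current, [search, replacement]) => current.split(search).join(replacement),
    text.toLowerCase() // assumption: conversion is case-insensitive
  );
}

console.log(applyGlossary('Bonjour tout le monde', sampleGlossary));
// => "bjr tt le monde"
```

Because each step sees the result of the previous one, the order of glossary entries matters in a design like this.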
Glossary
In its current state, the glossary should allow a good number of French words and sentences to be shortened fairly accurately. It was built from scratch by more or less reverse-engineering French SMS language and how it is constructed.
Replacements
The glossary is divided into five distinct replacement categories:
- anywhere: replacements in this category are performed anywhere within the input text. Useful for general rules, e.g. double consonants are often unnecessary: apprends => aprends.
- endOfWords: replacements in this category are performed only at the end of words. Useful for general rules at the end of words, e.g. the e in words ending with e is often silent, so we can drop it: pomme => pomm.
- startOfWords: replacements in this category are performed only at the start of words. Useful for general rules at the start of words, e.g. the h is often silent, so we can drop it: haricot => aricot.
- wholeWords: replacements in this category are performed only if they exactly match a whole word. Useful for words that need a specific conversion that does not follow general rules, e.g. monsieur => mr.
- endOfWordsFollowedByASpace: replacements in this category are performed only at the end of words that are followed by a space. Useful for replacing the space as well, e.g. je can often be contracted with what follows it: je suis => jsuis.
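One way to picture the five categories is as five ways of building a regular expression from a glossary key. The sketch below is an assumption about how such a mapping could look, not the package's actual implementation; categoryPatterns, escapeRegExp and replaceInCategory are hypothetical names.

```javascript
// Escape regex metacharacters so glossary keys are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical mapping from category name to a regex builder.
// Note: \b is ASCII-only in JavaScript, so accented letters would need extra care.
const categoryPatterns = {
  anywhere: (s) => new RegExp(escapeRegExp(s), 'g'),
  endOfWords: (s) => new RegExp(escapeRegExp(s) + '\\b', 'g'),
  startOfWords: (s) => new RegExp('\\b' + escapeRegExp(s), 'g'),
  wholeWords: (s) => new RegExp('\\b' + escapeRegExp(s) + '\\b', 'g'),
  // The trailing space is part of the match, so it is replaced as well.
  endOfWordsFollowedByASpace: (s) => new RegExp(escapeRegExp(s) + ' ', 'g'),
};

function replaceInCategory(text, category, search, replacement) {
  return text.replace(categoryPatterns[category](search), replacement);
}

console.log(replaceInCategory('apprends', 'anywhere', 'pp', 'p'));                  // => "aprends"
console.log(replaceInCategory('pomme', 'endOfWords', 'e', ''));                     // => "pomm"
console.log(replaceInCategory('haricot', 'startOfWords', 'h', ''));                 // => "aricot"
console.log(replaceInCategory('monsieur', 'wholeWords', 'monsieur', 'mr'));         // => "mr"
console.log(replaceInCategory('je suis', 'endOfWordsFollowedByASpace', 'je', 'j')); // => "jsuis"
```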
Actions
The glossary supports three types of actions:
- replace: replaces some characters with other characters
- disable_modification: prevents some characters from being replaced
- enable_modification: allows some characters to be replaced again
Disable/enable modification
By default, the whole input text is subject to replacements. However, some characters can be temporarily protected from replacement.
For instance, we may want to replace every occurrence of si with 6, as it is a good SMS equivalent (sinon would become 6non, aussi would become au6).
But sin often sounds like zin, in which case replacing si with 6 would produce something that reads wrongly (usine would become u6ne).
So we may want to disable replacements on sin while we replace all si occurrences with 6, then re-enable further replacements on sin.
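The three actions can be sketched by tracking which parts of the text are currently locked. This is only one possible approach under stated assumptions: the rule shape ({ action, search, replacement }), the siRules list and the applyActions helper are all hypothetical, and the real glossary format may differ.

```javascript
// Hypothetical rule list; the ss => s entry stands in for the double-consonant rule.
const siRules = [
  { action: 'replace', search: 'ss', replacement: 's' },
  { action: 'disable_modification', search: 'sin' },
  { action: 'replace', search: 'si', replacement: '6' },
  { action: 'enable_modification', search: 'sin' },
];

// Merge neighbouring unlocked segments so later rules can match across them.
function mergeUnlocked(segments) {
  return segments.reduce((acc, seg) => {
    const last = acc[acc.length - 1];
    if (last && !last.locked && !seg.locked) last.text += seg.text;
    else acc.push({ ...seg });
    return acc;
  }, []);
}

function applyActions(text, rules) {
  // The text is kept as a list of segments, each either locked or open to replacement.
  let segments = [{ text, locked: false }];
  for (const rule of rules) {
    if (rule.action === 'disable_modification') {
      // Cut every occurrence of the search string out into its own locked segment.
      segments = segments.flatMap((seg) =>
        seg.locked
          ? [seg]
          : seg.text.split(rule.search).flatMap((part, i) =>
              i === 0
                ? [{ text: part, locked: false }]
                : [{ text: rule.search, locked: true }, { text: part, locked: false }]
            )
      );
    } else if (rule.action === 'enable_modification') {
      segments = mergeUnlocked(
        segments.map((seg) =>
          seg.locked && seg.text === rule.search ? { ...seg, locked: false } : seg
        )
      );
    } else if (rule.action === 'replace') {
      // Only unlocked segments are rewritten.
      segments = segments.map((seg) =>
        seg.locked
          ? seg
          : { ...seg, text: seg.text.split(rule.search).join(rule.replacement) }
      );
    }
  }
  return segments.map((seg) => seg.text).join('');
}

console.log(applyActions('aussi usine', siRules));
// => "au6 usine"
```

Note that locking sin everywhere, as this sketch does, would also shield sinon from the si rule, so a real glossary entry would presumably need a narrower pattern.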
Contributing
If you would like to improve the glossary, feel free to open a pull request containing your changes to the glossary, along with test fixtures covering what you improved.