1.1.8 • Published 3 years ago
@ocelotbot/tinyld v1.1.8
TinyLD

Tiny Language Detector: detects the language of Unicode (UTF-8) text.
- Pure JavaScript, no API calls, and no dependencies (Node and browser compatible)
- Alternative to libraries like CLD
- Fast, with a low memory footprint (unlike ML-based methods)
- Supports 62 languages (30 for the web version)
- Returns language codes in ISO 639-1 format
Extra
Getting Started
Install
yarn add tinyld # or npm install --save tinyld
API
import { detect, detectAll } from 'tinyld'
// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en
// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]
TinyLD CLI
tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]
Benchmark
Benchmark run on the Tatoeba dataset (~9M sentences), covering 16 of the most common languages.
| Library | Script | Properly Identified | Improperly identified | Not identified | Avg Execution Time | Disk Size |
|---|---|---|---|---|---|---|
| TinyLD | yarn bench:tinyld | **96.1747%** | **2.6938%** | **1.1315%** | 0.1315ms | 778KB |
| TinyLD Web | yarn bench:tinyld-light | **92.1169%** | 3.9536% | **3.9295%** | **0.0616ms** | **89KB** |
| node-cld | yarn bench:cld | **88.9148%** | **1.7489%** | 9.3363% | **0.0612ms** | > 10MB |
| node-lingua | yarn bench:lingua | 82.3157% | **0.2158%** | 17.4685% | 0.7085ms | ~100MB |
| franc | yarn bench:franc | 68.7783% | 26.3432% | **4.8785%** | 0.1381ms | 267KB |
| franc-min | yarn bench:franc-min | 65.5163% | 23.5794% | 10.9044% | **0.0614ms** | **119KB** |
| languagedetect | yarn bench:languagedetect | 61.6068% | 12.295% | 26.0982% | 0.1585ms | **240KB** |
Remark
- For each category, the top 3 results are in bold
- Languages evaluated in this benchmark:
  - Asia: jpn, cmn, kor, hin
  - Europe: fra, spa, por, ita, nld, eng, deu, fin, rus
  - Middle East: tur, heb, ara
- This kind of benchmark is not perfect and the percentages can vary over time, but it gives a good idea of overall performance
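The three percentage columns in the table can be read as a simple tally over labelled sentences. The sketch below is not the actual benchmark script. It is a minimal illustration under a few assumptions: `scoreDetector` and `ISO3_TO_ISO1` are hypothetical names, the dataset labels are the ISO 639-3 codes listed above, the detector returns ISO 639-1 codes, and an empty return value means "not identified".

```javascript
// Assumed mapping from the ISO 639-3 labels listed above to ISO 639-1 codes.
// `cmn` (Mandarin) has no ISO 639-1 code of its own, so it is mapped to `zh` here.
const ISO3_TO_ISO1 = {
  jpn: 'ja', cmn: 'zh', kor: 'ko', hin: 'hi',
  fra: 'fr', spa: 'es', por: 'pt', ita: 'it', nld: 'nl',
  eng: 'en', deu: 'de', fin: 'fi', rus: 'ru',
  tur: 'tr', heb: 'he', ara: 'ar',
}

// Hypothetical helper: tally one detector's results over labelled samples.
// `samples` is [{ text, lang }] with ISO 639-3 labels; `detectFn` returns an
// ISO 639-1 code, or an empty value when it cannot decide (an assumption).
function scoreDetector(samples, detectFn) {
  const counts = { proper: 0, improper: 0, unidentified: 0 }
  for (const { text, lang } of samples) {
    const expected = ISO3_TO_ISO1[lang]
    const predicted = detectFn(text)
    if (!predicted) counts.unidentified += 1
    else if (predicted === expected) counts.proper += 1
    else counts.improper += 1
  }
  // Guard against division by zero on an empty sample list.
  const total = samples.length || 1
  const pct = (n) => Number(((100 * n) / total).toFixed(4))
  return {
    properlyIdentified: pct(counts.proper),
    improperlyIdentified: pct(counts.improper),
    notIdentified: pct(counts.unidentified),
  }
}
```

In practice `detectFn` could be the `detect` function imported from `tinyld` in the API section, or a thin wrapper around any of the other libraries in the table.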
Conclusion
Recommended
- For NodeJS: TinyLD or node-cld (fast and accurate)
- For Browser: TinyLD Light or franc-min (small, decent accuracy; franc is less accurate but supports more languages)
Not recommended
- node-lingua is just too big and slow
- languagedetect is light but just not accurate enough, and really focused on Indo-European languages (supports Kazakh but not Chinese, Korean or Japanese)