1.1.8 • Published 2 years ago

@ocelotbot/tinyld v1.1.8

License
MIT
Repository
github
Last release
2 years ago

TinyLD

Tiny Language Detector: detect the language of a Unicode (UTF-8) text.

  • pure JavaScript, no API calls and no dependencies (Node.js and browser compatible)
  • an alternative to libraries like CLD
  • fast, with a low memory footprint (unlike ML methods)
  • supports 62 languages (30 in the web version)
  • returns language codes in ISO 639-1 format

Getting Started

Install

yarn add tinyld # or npm install --save tinyld

API

import { detect, detectAll } from 'tinyld'

// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en

// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]
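
Because detectAll returns an array of { lang, accuracy } candidates, a caller can apply its own confidence cutoff. The following is a minimal sketch, not part of TinyLD: pickLanguage and its minAccuracy threshold are hypothetical names, and the sample array copies the output shown above so the snippet runs without the library installed.

```javascript
// Hypothetical helper (not part of TinyLD): pick the most likely language
// from a detectAll-style result, or return null when no candidate is
// confident enough.
function pickLanguage(results, minAccuracy = 0.5) {
  // Do not rely on the input order: scan for the highest accuracy.
  let best = null
  for (const candidate of results) {
    if (best === null || candidate.accuracy > best.accuracy) best = candidate
  }
  return best !== null && best.accuracy >= minAccuracy ? best.lang : null
}

// Sample data copied from the detectAll output above (first two entries only).
const sample = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 }
]

const confident = pickLanguage(sample)      // 'fr'
const strict = pickLanguage(sample, 0.9)    // null, no candidate reaches 0.9
const empty = pickLanguage([])              // null, nothing to choose from
console.log(confident, strict, empty)
```

In real use the sample array would be replaced by the return value of detectAll(text). The 0.5 default is an arbitrary illustration, not a value recommended by the library.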


TinyLD CLI

tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]


Benchmark

Benchmark run on the Tatoeba dataset (~9M sentences), covering 16 of the most common languages.

Library        | Script                    | Properly identified | Improperly identified | Not identified | Avg execution time | Disk size
---------------|---------------------------|---------------------|-----------------------|----------------|--------------------|----------
TinyLD         | yarn bench:tinyld         | 96.1747%            | 2.6938%               | 1.1315%        | 0.1315ms           | 778KB
TinyLD Web     | yarn bench:tinyld-light   | 92.1169%            | 3.9536%               | 3.9295%        | 0.0616ms           | 89KB
node-cld       | yarn bench:cld            | 88.9148%            | 1.7489%               | 9.3363%        | 0.0612ms           | > 10MB
node-lingua    | yarn bench:lingua         | 82.3157%            | 0.2158%               | 17.4685%       | 0.7085ms           | ~100MB
franc          | yarn bench:franc          | 68.7783%            | 26.3432%              | 4.8785%        | 0.1381ms           | 267KB
franc-min      | yarn bench:franc-min      | 65.5163%            | 23.5794%              | 10.9044%       | 0.0614ms           | 119KB
languagedetect | yarn bench:languagedetect | 61.6068%            | 12.295%               | 26.0982%       | 0.1585ms           | 240KB

Remark

  • For each category, the top 3 results are in bold
  • Language evaluated in this benchmark:
    • Asia: jpn, cmn, kor, hin
    • Europe: fra, spa, por, ita, nld, eng, deu, fin, rus
    • Middle East: tur, heb, ara
  • This kind of benchmark is not perfect and the percentages can vary over time, but it gives a good idea of overall performance
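
The three accuracy columns can be read as a tally over labeled sentences: a result counts as properly identified when the detected language matches the label, improperly identified when a different language is returned, and not identified when the detector returns nothing. That reading is an assumption based on the column names. The sketch below illustrates such a tally; runBenchmark, toyDetect and the four sample sentences are hypothetical stand-ins, not the project's benchmark script or the Tatoeba data.

```javascript
// Hypothetical benchmark tally (not the project's actual bench script).
// `detect` is any function mapping a text to a language code,
// or to an empty string when it cannot decide.
function runBenchmark(detect, samples) {
  if (samples.length === 0) throw new Error('no samples to evaluate')
  let proper = 0
  let improper = 0
  let unidentified = 0
  const start = performance.now()
  for (const { text, lang } of samples) {
    const guess = detect(text)
    if (!guess) unidentified++
    else if (guess === lang) proper++
    else improper++
  }
  const elapsedMs = performance.now() - start
  const pct = (n) => (100 * n) / samples.length
  return {
    properlyIdentified: pct(proper),
    improperlyIdentified: pct(improper),
    notIdentified: pct(unidentified),
    avgExecutionTimeMs: elapsedMs / samples.length
  }
}

// Toy detector used only to make the sketch runnable; it is NOT TinyLD.
function toyDetect(text) {
  if (/[\u3040-\u30ff]/.test(text)) return 'ja' // hiragana or katakana present
  if (/\b(the|this)\b/i.test(text)) return 'en'
  return ''
}

// Made-up labeled samples (the real benchmark uses ~9M Tatoeba sentences).
const samples = [
  { text: 'これは日本語です.', lang: 'ja' },             // matches the label
  { text: 'and this is english.', lang: 'en' },         // matches the label
  { text: 'this is not french', lang: 'fr' },           // detected as 'en': wrong
  { text: 'ceci est un text en francais.', lang: 'fr' } // no rule fires: not identified
]

const report = runBenchmark(toyDetect, samples)
console.log(report)
```

With these four samples the toy detector scores 50% properly identified, 25% improperly identified and 25% not identified; the three percentages always sum to 100, as they do in each row of the table.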

Conclusion

Recommended

  • For Node.js: TinyLD or node-cld (fast and accurate)
  • For the browser: TinyLD Light or franc-min (both are small with decent accuracy; franc is less accurate but supports more languages)

Not recommended

  • node-lingua is too large (~100MB) and too slow (0.7085ms average execution time)
  • languagedetect is light but not accurate enough (61.6% properly identified) and is really focused on Indo-European languages (it supports Kazakh but not Chinese, Korean or Japanese)