lingthing v0.1.1

License: MIT
Repository: github
Last release: 4 years ago

⚡lingthing⚡

A library for n-gram-based character-level language modeling in JavaScript, intended for use in the browser.

A JSON file containing n-gram counts for a training corpus can be created with the script scripts/count_grams.py (or you can use my example, based on the Lancaster-Oslo/Bergen corpus, in scripts/LOB_ngrams.json).
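To make "counts of n-grams" concrete, here is a rough sketch of character n-gram counting. It is not a port of scripts/count_grams.py, and the exact layout of the JSON that script produces is not described here. The sketch assumes a flat object mapping each n-gram string to its count, and the name countGrams is hypothetical.

```javascript
// Hypothetical helper, not part of lingthing: counts every character
// n-gram of length 1..maxN in a string.
// Assumption: counts are stored as a flat { gram: count } object.
function countGrams(text, maxN) {
  // A null-prototype object avoids clashes with inherited keys
  // such as "constructor".
  const counts = Object.create(null);
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= text.length; i++) {
      const gram = text.slice(i, i + n);
      counts[gram] = (counts[gram] || 0) + 1;
    }
  }
  return counts;
}

// Unigrams and bigrams of a tiny toy corpus.
const exampleCounts = countGrams("abab", 2);
console.log(JSON.stringify(exampleCounts));
```

For the toy corpus "abab" this gives a count of 2 for "a", "b" and "ab", and 1 for "ba".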

The resulting JSON data can then be passed to the lingthing.log_prob function to estimate the (log) probability of a string (with Laplace smoothing applied, and maybe other smoothing options in the future if we're lucky).
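For readers unfamiliar with Laplace (add-one) smoothing, the sketch below shows one common way to score a string with a character n-gram model. It is an illustration only, not lingthing's actual implementation. The function name laplaceLogProb, the flat { gram: count } layout, and the reading of n as the model order, d as the number of distinct characters, and N as the total number of characters in the corpus are all assumptions.

```javascript
// Hypothetical sketch, not lingthing's implementation.
// Assumptions: counts maps character n-grams (lengths 1..n) to counts,
// n is the model order, d the number of distinct characters,
// N the total number of characters in the training corpus.
function laplaceLogProb(text, counts, n, d, N) {
  const get = (gram) =>
    Object.prototype.hasOwnProperty.call(counts, gram) ? counts[gram] : 0;
  let logProb = 0;
  for (let i = 0; i < text.length; i++) {
    // Condition on up to n-1 preceding characters.
    const start = Math.max(0, i - (n - 1));
    const history = text.slice(start, i);
    const gram = text.slice(start, i + 1);
    // With an empty history the denominator is the corpus size N.
    const historyCount = history.length === 0 ? N : get(history);
    // Add-one smoothing: (count + 1) / (history count + alphabet size).
    logProb += Math.log((get(gram) + 1) / (historyCount + d));
  }
  return logProb;
}

// Toy bigram model over the corpus "abab" (2 distinct characters, 4 total).
const toyCounts = { a: 2, b: 2, ab: 2, ba: 1 };
const lp = laplaceLogProb("ab", toyCounts, 2, 2, 4);
console.log(Math.exp(lp));
```

In this toy case the first character contributes (2 + 1) / (4 + 2) and the second (2 + 1) / (2 + 2), so the printed probability is approximately 0.375. Unseen n-grams still get a non-zero probability, which is the point of the smoothing.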

Installation:

npm install lingthing

Usage Examples:

In Node:

const lt = require('lingthing');
const fs = require('fs');

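// Load the n-gram counts (here, the bundled LOB example).
const counts = JSON.parse(fs.readFileSync('scripts/LOB_ngrams.json', 'utf8'));

const test_sentence = "Test sentence.";
// corpus_info returns the values that log_prob expects as its last arguments.
const info = lt.corpus_info(counts);
const log_probability = lt.log_prob(test_sentence, counts, "laplace",
    info.n, info.d, info.N);

console.log("Probability of sentence '" + test_sentence + "' is "
    + Math.exp(log_probability));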

In the browser:

<script src="lingthing-browser-0.0.1.js"></script>
<script src="ngrams.js"></script> <!-- ngrams.js defines: var counts = {
        ... data generated by scripts/count_grams.py ... };
        Alternatively, load the JSON data at runtime, e.g. with XMLHttpRequest. -->
<script type="text/javascript">
    // Note: importing the browser script is equivalent to:
    // var lingthing = require('lingthing');

    var test_sentence = "Test sentence.";
    var info = lingthing.corpus_info(counts);
    var log_probability = lingthing.log_prob(test_sentence, counts, "laplace",
        info.n, info.d, info.N);

    console.log("Probability of sentence '" + test_sentence + "' is "
        + Math.exp(log_probability));
</script>

Build:

To build the browser-friendly distribution, run npm install to install dev-dependencies, and then run npm run-script browser.

The bundled file will appear in the dist directory.