1.0.0-beta.1 • Published 5 years ago

nlpeezy v1.0.0-beta.1

License
MIT

nlpeezy

A natural language processing package written in JavaScript.

by Aaron Caffrey

Contributors

Thanks go to the package contributors listed on the GitHub Contributors page.

Special thanks also go to Michal Měchura, whose Lemmatization Lists repository the lemmas feature of this package relies on. Go raibh míle!

Dependencies

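The following must be installed before nlpeezy can be used: Node.js with npm (used for the install step below) and Redis (the datastore used by data-dependent features).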

Getting Started

Install

As of the 1.0 beta, installing nlpeezy globally is recommended:

npm install -g nlpeezy

Build

The build step is required to use data-dependent features (e.g. lemmatization).

nlpeezy-build

As of the 1.0 beta, the script's only function is to clone Michal Měchura's Lemmatization Lists repository. As new features are introduced (e.g. the POS tagger), the build script will likely expand to gather other open source NLP data.

Load Datastore

In addition to the build step, data-dependent features require the datastore (Redis) to be prepared for use. As of the 1.0 beta, lemmatization is the only data-dependent feature, and is managed by the lemma-manager script.

$ lemma-manager --help
usage: lemma-manager [-h] [-v] [-a] [--datastore {redis}] [-l {en,es,fr,ga}]

Lemma manager CLI

Optional arguments:
  -h, --help            Show this help message and exit.
  -v, --version         Show program's version number and exit.
  -a, --all             Load all lemmatization list files into data store.
  --datastore {redis}   Supported data store to use. Default: "redis".
  -l {en,es,fr,ga}, --language {en,es,fr,ga}
                        Supported language code for which to load
                        lemmatization list file into data store.

Here is a simple example that loads the Irish lemmas into the datastore:

lemma-manager -l ga

Note that this script takes a moment to complete.
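
To load a chosen subset of languages rather than everything (`-a`), the script can be invoked once per language code. The following is a minimal sketch, not part of nlpeezy: the helper names `lemmaManagerArgs` and `loadLanguages` are hypothetical, and it assumes `lemma-manager` is on your PATH (as it should be after a global install) and that Redis is already running. Only the flags shown in the help output above are used.

```javascript
const { spawnSync } = require('child_process');

// Language codes listed in the `lemma-manager --help` output.
const SUPPORTED_LANGUAGES = ['en', 'es', 'fr', 'ga'];

// Build the argument list for one `lemma-manager` invocation (hypothetical helper).
function lemmaManagerArgs(language, datastore = 'redis') {
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error(`Unsupported language code: ${language}`);
  }
  return ['--datastore', datastore, '-l', language];
}

// Run `lemma-manager` once per language and stop at the first failure.
// `run` defaults to spawnSync; it is a parameter only so it can be stubbed.
function loadLanguages(languages, run = spawnSync) {
  for (const language of languages) {
    const result = run('lemma-manager', lemmaManagerArgs(language), {
      stdio: 'inherit', // show the script's own output in the terminal
    });
    if (result.error || result.status !== 0) {
      throw new Error(`lemma-manager failed for "${language}"`);
    }
  }
}
```

Calling `loadLanguages(['ga', 'en'])` would then load the Irish and English lists in turn. On Windows, where npm installs command shims, passing `shell: true` in the options may be necessary.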

First Run

This example demonstrates how to print an array of tokens parsed from a sample value in Irish.

const nlp = require('nlpeezy');

let value = 'Maraithe le tae is maraithe gan é.';

nlp.analyze(value, {language: 'ga'}, (err, tokenGroups) => {
  if (err) {
    return console.error(err);
  }

  // `tokenGroups` is an Array where each element is a `TokenGroup` instance
  // that represents a line of text. The `children` of that element is an Array
  // of the tokens themselves.

  console.log(tokenGroups[0].children);
});

This logs something like:

[
  BeginLineToken {
    index: 0,
    info: {},
    value: undefined,
    lemma: null
  },
  LexicalToken {
    index: 0,
    info: { hasEclipsis: false, hasLenition: false, hasMutation: false },
    value: 'Maraithe',
    lemma: Lemma { value: 'marú' }
  },
  SpaceToken {
    index: 8,
    info: {},
    value: ' ',
    lemma: null
  },
  LexicalToken {
    index: 9,
    info: { hasEclipsis: false, hasLenition: false, hasMutation: false },
    value: 'le',
    lemma: Lemma { value: 'le' }
  },
  SpaceToken {
    index: 11,
    info: {},
    value: ' ',
    lemma: null
  },
  LexicalToken {
    index: 12,
    info: { hasEclipsis: false, hasLenition: false, hasMutation: false },
    value: 'tae',
    lemma: Lemma { value: 'tae' }
  },
  SpaceToken {
    index: 15,
    info: {},
    value: ' ',
    lemma: null
  },
  LexicalToken {
    index: 16,
    info: { hasEclipsis: false, hasLenition: false, hasMutation: false },
    value: 'is',
    lemma: Lemma { value: 'is' }
  },
  SpaceToken {
    index: 18,
    info: {},
    value: ' ',
    lemma: null
  },
  LexicalToken {
    index: 19,
    info: { hasEclipsis: false, hasLenition: false, hasMutation: false },
    value: 'maraithe',
    lemma: Lemma { value: 'marú' }
  },
  SpaceToken {
    index: 27,
    info: {},
    value: ' ',
    lemma: null
  },
  LexicalToken {
    index: 28,
    info: { hasEclipsis: false, hasLenition: false, hasMutation: false },
    value: 'gan',
    lemma: Lemma { value: 'gan' }
  },
  SpaceToken {
    index: 31,
    info: {},
    value: ' ',
    lemma: null
  },
  LexicalToken {
    index: 32,
    info: { hasEclipsis: false, hasLenition: false, hasMutation: false },
    value: 'é',
    lemma: Lemma { value: 'é' }
  },
  OrdinaryPunctuationToken {
    index: 33,
    info: {},
    value: '.',
    lemma: null
  }
]
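
As a rough illustration of working with this structure, the sketch below pulls the lemmas out of a token array and rebuilds the original text from the token values. It is not part of the nlpeezy API: `lemmasOf` and `textOf` are hypothetical helpers, and `sampleTokens` uses plain objects that copy only the `index`, `value`, and `lemma` fields from the first few entries of the log above, standing in for the real token instances.

```javascript
// Plain-object stand-ins for the first tokens of the logged output.
// Real tokens are class instances (BeginLineToken, LexicalToken, ...);
// only the fields visible in the log are reproduced here.
const sampleTokens = [
  { index: 0, value: undefined, lemma: null }, // begin-of-line marker
  { index: 0, value: 'Maraithe', lemma: { value: 'marú' } },
  { index: 8, value: ' ', lemma: null },
  { index: 9, value: 'le', lemma: { value: 'le' } },
  { index: 11, value: ' ', lemma: null },
  { index: 12, value: 'tae', lemma: { value: 'tae' } },
];

// Collect lemma strings, skipping tokens whose `lemma` is null
// (spaces, punctuation, line markers).
function lemmasOf(tokens) {
  return tokens.filter((token) => token.lemma).map((token) => token.lemma.value);
}

// Rebuild the source text by concatenating token values; tokens without
// a value (such as the begin-of-line marker) contribute nothing.
function textOf(tokens) {
  return tokens.map((token) => token.value ?? '').join('');
}

console.log(lemmasOf(sampleTokens)); // [ 'marú', 'le', 'tae' ]
console.log(textOf(sampleTokens)); // Maraithe le tae
```

Inside the `analyze` callback, the same helpers could presumably be applied to `tokenGroups[0].children`, assuming the real tokens expose `value` and `lemma` as the log suggests.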