2.1.2 • Published 4 years ago

tdm-teeft v2.1.2

License: OpenBSD

tdm-teeft

tdm-teeft is a tdm module for term extraction from unstructured text. It can be used to get the keywords of a document.

Installation

Using npm:

$ npm i -g tdm-teeft
$ npm i --save tdm-teeft

Using Node:

/* require of Teeft module */
const Teeft = require('tdm-teeft');

/* Build new Instance of Tagger */
let tagger = new Teeft.Tagger();

/* Build new Instance of Filter */
let filter = new Teeft.Filter();

/* Build new Instance of Indexator */
let indexator = new Teeft.Indexator();

/* Build new Instance of TermExtraction */
let termextraction = new Teeft.TermExtraction();

Launch tests

$ npm run test

Build documentation

$ npm run docs

API Documentation

Classes

Filter

Kind: global class

new Filter(options)

Returns: Filter - An instance of Filter

| Param | Type | Description |
| --- | --- | --- |
| options | Object | Options of constructor |
| options.minOccur | Number | Minimum number of occurrences |
| options.noLimitStrength | Number | Strength limit |
| options.lengthSteps | Object | Steps length |

Example (Example usage of 'constructor' (with parameters))

let options = {
  // Assigns a 'value' depending on the length of the indexed text (number of tokens)
  'lengthSteps': {
    'values': [ // store intermediate steps here
      { // here: value 4 will be used for text length > 1000 tokens && text length <= 3000 tokens
        'lim': 3000, // this property must be > 'lengthSteps.min.lim' && must be < 'lengthSteps.max.lim'
        'value': 4
      },
      { // here: value 5 will be used for text length > 3000 tokens && text length <= 4000 tokens
        'lim': 4000, // this property must be > 'lengthSteps.min.lim' && must be < 'lengthSteps.max.lim'
        'value': 5
      }
    ],
    'min': { // 'value' used up to the minimum 'lim' text length (here: value 1 will be used for text length <= 1000 tokens)
      'lim': 1000,
      'value': 1
    },
    'max': { // 'value' used beyond the maximum 'lim' text length (here: value 7 will be used for text length > 6000 tokens)
      'lim': 6000,
      'value': 7
    }
  },
  'minOccur': 3, // Default minimum number of occurrences (of tokens): here 3. This value is updated according to the length of the indexed text when the 'configure' function is called
  'noLimitStrength': 2
  },
  defaultFilter = new Filter(options);
// returns an instance of Filter with properties:
// - minOccur : 3
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}, {'lim': 4000, 'value': 5}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

Example (Example usage of 'constructor' (with default values))

let defaultFilter = new Filter();
// returns an instance of Filter with properties:
// - minOccur : 7
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

filter.call(occur, strength) ⇒ Boolean

Check values against the filter conditions

Kind: instance method of Filter
Returns: Boolean - Returns true if the conditions are met

| Param | Type | Description |
| --- | --- | --- |
| occur | Number | Occurrence value |
| strength | Number | Strength value |

Example (Example usage of 'call' function)

let defaultFilter = new Filter();
defaultFilter.configure(500);
defaultFilter.call(1, 1); // returns true
defaultFilter.configure(5000);
defaultFilter.call(1, 1); // returns false
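
The two calls above differ only in the minOccur value set by configure. The sketch below shows one filter condition that would reproduce that behaviour. The rule itself (single-word terms must reach minOccur, terms whose strength reaches noLimitStrength always pass) is an assumption made for illustration and is not taken from the tdm-teeft source; `passesFilter` is a hypothetical name.

```javascript
// Hypothetical stand-in for Filter#call; the exact rule is an assumption.
function passesFilter(occur, strength, minOccur, noLimitStrength) {
  // Single-word terms must occur often enough...
  const frequentSingleWord = strength === 1 && occur >= minOccur;
  // ...while sufficiently "strong" terms pass whatever their frequency.
  const strongTerm = strength >= noLimitStrength;
  return frequentSingleWord || strongTerm;
}

console.log(passesFilter(1, 1, 1, 2)); // true  (minOccur 1, as after configure(500))
console.log(passesFilter(1, 1, 7, 2)); // false (minOccur 7, as after configure(5000))
console.log(passesFilter(1, 2, 7, 2)); // true  (strength reaches noLimitStrength)
```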

filter.configure(length) ⇒ Number

Configure the filter according to lengthSteps

Kind: instance method of Filter
Returns: Number - Returns the configured minOccur value

| Param | Type | Description |
| --- | --- | --- |
| length | Number | Text length |

Example (Example usage of 'configure' function)

let defaultFilter = new Filter();
defaultFilter.configure(500); // returns 1
defaultFilter.configure(5000); // returns 7
defaultFilter.configure('test'); // returns null
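
To make the step lookup concrete, here is a minimal sketch that reproduces the three results above with the documented default lengthSteps. It is a stand-in written for illustration, not the module's implementation: `pickMinOccur` is a hypothetical name, and the handling of lengths beyond the last intermediate step is an assumption inferred from the configure(5000) result.

```javascript
// Documented default lengthSteps of Filter.
const lengthSteps = {
  values: [{ lim: 3000, value: 4 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};

// Hypothetical stand-in for Filter#configure.
function pickMinOccur(length, steps) {
  // Non-numeric lengths are rejected, mirroring the documented null result.
  if (typeof length !== 'number' || Number.isNaN(length)) return null;
  if (length <= steps.min.lim) return steps.min.value;
  // Intermediate steps are assumed to be sorted by ascending 'lim'.
  for (const step of steps.values) {
    if (length <= step.lim) return step.value;
  }
  // Anything beyond the last intermediate step gets the 'max' value (assumption).
  return steps.max.value;
}

console.log(pickMinOccur(500, lengthSteps));    // 1
console.log(pickMinOccur(5000, lengthSteps));   // 7
console.log(pickMinOccur('test', lengthSteps)); // null
```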

Indexator

Kind: global class

new Indexator(options)

Returns: Indexator - An instance of Indexator

| Param | Type | Description |
| --- | --- | --- |
| options | Object | Options of constructor |
| options.filter | Filter | Options given to extractor of this instance of Indexator |
| options.lexicon | Object | Lexicon used by tagger of this instance of Indexator |
| options.stopwords | Object | Stopwords used by this instance of Indexator |
| options.lemmatizer | Object | Lemmatizer used by tagger of this instance of Indexator |
| options.stemmer | Object | Stemmer used by this instance of Indexator |
| options.dictionary | Object | Dictionary used by this instance of Indexator |

Example (Example usage of 'constructor' (with parameters))

let options = {
    'filter': customFilter // Assuming customFilter contains your custom settings
  },
  indexator = new Indexator(options);
// returns an instance of Indexator with custom Filter

Example (Example usage of 'constructor' (with default values))

let indexator = new Indexator();
// returns an instance of Indexator with default options

indexator.tokenize(text) ⇒ Array

Extract tokens from a text

Kind: instance method of Indexator
Returns: Array - Array of tokens

| Param | Type | Description |
| --- | --- | --- |
| text | String | Full text |

Example (Example usage of 'tokenize' function)

let indexator = new Indexator();
indexator.tokenize('my sample sentence'); // return ['my', 'sample', 'sentence']

indexator.translateTag(tag) ⇒ String

Translate a Tagger tag into a Lemmatizer tag

Kind: instance method of Indexator
Returns: String - Tag that matches a Lemmatizer tag (or false)

| Param | Type | Description |
| --- | --- | --- |
| tag | String | Tag given by Tagger |

Example (Example usage of 'translateTag' function)

let indexator = new Indexator();
// Tags are strings, so they must be quoted
indexator.translateTag('RB'); // return 'adv';
indexator.translateTag('JJ'); // return 'adj';
indexator.translateTag('NN'); // return 'noun';
indexator.translateTag('NNP'); // return 'noun';
indexator.translateTag('VBG'); // return 'verb';
indexator.translateTag('VBN'); // return 'verb';
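
The mapping documented above groups several Tagger tags under one Lemmatizer tag (NN and NNP both give 'noun', VBG and VBN both give 'verb'). One way to get that result is to match on the tag prefix, as in the sketch below. Prefix matching, the `TAG_PREFIXES` table and the `toLemmatizerTag` name are assumptions made for illustration; the module may well use an explicit tag-by-tag table instead.

```javascript
// Hypothetical prefix table: Tagger tag prefix -> Lemmatizer tag.
const TAG_PREFIXES = [
  ['RB', 'adv'],
  ['JJ', 'adj'],
  ['NN', 'noun'],
  ['VB', 'verb']
];

// Hypothetical stand-in for Indexator#translateTag.
function toLemmatizerTag(tag) {
  for (const [prefix, lemmatizerTag] of TAG_PREFIXES) {
    if (tag.startsWith(prefix)) return lemmatizerTag;
  }
  // No matching Lemmatizer tag, as in the documented "(or false)" case.
  return false;
}

console.log(toLemmatizerTag('NNP')); // 'noun'
console.log(toLemmatizerTag('VBG')); // 'verb'
console.log(toLemmatizerTag('DT'));  // false
```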

indexator.sanitize(terms) ⇒ Array

Sanitize a list of terms (with some filters)

Kind: instance method of Indexator
Returns: Array - List of sanitized terms

| Param | Type | Description |
| --- | --- | --- |
| terms | Array | List of terms |

Example (Example usage of 'sanitize' function)

let indexator = new Indexator();
indexator.sanitize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.lemmatize(terms) ⇒ Array

Lemmatize a list of tagged terms (adds lemma and stem properties)

Kind: instance method of Indexator
Returns: Array - List of tagged terms with a lemma

| Param | Type | Description |
| --- | --- | --- |
| terms | Array | List of tagged terms |

Example (Example usage of 'lemmatize' function)

let indexator = new Indexator();
indexator.lemmatize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.index(data) ⇒ Object

Index a full text

Kind: instance method of Indexator
Returns: Object - Returns a representation of the full text (indexation and additional information/statistics about tokens/terms)

| Param | Type | Description |
| --- | --- | --- |
| data | String | Full text to be indexed |

Example (Example usage of 'index' function)

let indexator = new Indexator();
indexator.index('This is a sample sentence'); // return an object representation of indexation

Indexator.compare(a, b) ⇒ Number

Compare the specificity of two objects

Kind: static method of Indexator
Returns: Number - -1, 1, or 0

| Param | Type | Description |
| --- | --- | --- |
| a | Object | First object |
| b | Object | Second object |

Example (Example usage of 'compare' function)

Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 2 }); // return 1
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 1 }); // return 0
Indexator.compare({ 'term': 'a', 'specificity': 2 }, { 'term': 'b', 'specificity': 1 }); // return -1
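
The return values above (1 when the first object is less specific, -1 when it is more specific) are those of a comparator that orders by descending specificity when passed to Array.prototype.sort. The sketch below uses a stand-in comparator to show the resulting order. `bySpecificityDesc` is a hypothetical name, and whether Indexator sorts its results this way internally is not stated here.

```javascript
// Stand-in comparator with the same documented return values as Indexator.compare.
function bySpecificityDesc(a, b) {
  if (a.specificity > b.specificity) return -1;
  if (a.specificity < b.specificity) return 1;
  return 0;
}

const terms = [
  { term: 'a', specificity: 1 },
  { term: 'b', specificity: 3 },
  { term: 'c', specificity: 2 }
];

// Copy before sorting so the original array is left untouched.
const sorted = [...terms].sort(bySpecificityDesc);
console.log(sorted.map((t) => t.term)); // [ 'b', 'c', 'a' ]
```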

Tagger

Kind: global class

new Tagger(options)

Returns: Tagger - An instance of Tagger

| Param | Type | Description |
| --- | --- | --- |
| options | Object | Options of constructor |

Example (Example usage of 'constructor' (with parameters))

let lexicon = { ... },
  tagger = new Tagger(options);
// returns an instance of Tagger with custom lexicon

Example (Example usage of 'constructor' (with default values))

let tagger = new Tagger();
// returns an instance of Tagger with default lexicon

tagger.tag(terms) ⇒ Array

Tag terms

Kind: instance method of Tagger
Returns: Array - List of tagged terms

| Param | Type | Description |
| --- | --- | --- |
| terms | Array | List of terms |

Example (Example usage of 'tag' function)

let tagger = new Tagger();
tagger.tag(['this', 'is', 'a', 'test']); // return [{ 'term': 'this', 'tag': 'DT' }, { 'term': 'is', 'tag': 'VBZ' }, { 'term': 'a', 'tag': 'DT' }, { 'term': 'test', 'tag': 'NN' }]
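
Since the Tagger works from a lexicon, the output shape above can be illustrated with a plain lexicon lookup. This is only a sketch under assumptions: `toyLexicon` is a made-up four-word lexicon, `tagWithLexicon` is a hypothetical name, and falling back to 'NN' for unknown words is an assumption rather than documented behaviour. The real Tagger may apply further rules.

```javascript
// Hypothetical toy lexicon: term -> part-of-speech tag.
const toyLexicon = { 'this': 'DT', 'is': 'VBZ', 'a': 'DT', 'test': 'NN' };

// Hypothetical stand-in for Tagger#tag based on a plain lookup.
function tagWithLexicon(terms, lexicon, fallbackTag = 'NN') {
  return terms.map((term) => ({
    term,
    // Own-property check avoids picking up inherited keys such as 'constructor'.
    tag: Object.prototype.hasOwnProperty.call(lexicon, term) ? lexicon[term] : fallbackTag
  }));
}

console.log(tagWithLexicon(['this', 'is', 'a', 'test'], toyLexicon));
```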

TermExtractor

Kind: global class

new TermExtractor(options)

Returns: TermExtractor - An instance of TermExtractor

| Param | Type | Description |
| --- | --- | --- |
| options | Object | Options of constructor |
| options.tagger | Tagger | An instance of Tagger |
| options.filter | Filter | An instance of Filter |

Example (Example usage of 'constructor' (with parameters))

let myTagger = new Tagger(), // Assuming myTagger contains your custom settings
  myFilter = new Filter(), // Assuming myFilter contains your custom settings
  termExtractor = new TermExtractor({ 'tagger': myTagger, 'filter': myFilter });
// returns an instance of TermExtractor with custom options

Example (Example usage of 'constructor' (with default values))

let termExtractor = new TermExtractor();
// returns an instance of TermExtractor with default options

termExtractor.extract(taggedTerms) ⇒ Object

Extract terms

Kind: instance method of TermExtractor
Returns: Object - Returns all extracted terms

| Param | Type | Description |
| --- | --- | --- |
| taggedTerms | Array | List of tagged terms |

Example (Example usage of 'extract' function)

let termExtractor = new TermExtractor(),
  myDefaultTagger = new Tagger(),
  taggedTerms = myDefaultTagger.tag('This is a sample test for this module. It index any fulltext. It is a sample test.');
termExtractor.extract(taggedTerms);
// return
// { 'sample': { 'frequency': 2, 'strength': 1 }, 'test': { 'frequency': 2, 'strength': 1 },
// 'sample test': { 'frequency': 2, 'strength': 2 },
// 'module': { 'frequency': 1, 'strength': 1 },
// 'index': { 'frequency': 1, 'strength': 1 },
// 'fulltext': { 'frequency': 1, 'strength': 1 }
// };
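
In the result above, frequency counts how often a term occurs, and strength appears to be the number of words in the term ('sample test' has strength 2). The sketch below computes both values for a list of candidate terms that have already been extracted. It does not show how candidates are detected, and reading strength as a word count is an assumption drawn from this example only. `summarizeCandidates` is a hypothetical name.

```javascript
// Hypothetical helper: builds { term: { frequency, strength } } from candidate terms.
function summarizeCandidates(candidates) {
  const summary = new Map();
  for (const candidate of candidates) {
    if (!summary.has(candidate)) {
      // Strength is taken to be the number of space-separated words (assumption).
      summary.set(candidate, { frequency: 0, strength: candidate.split(' ').length });
    }
    summary.get(candidate).frequency += 1;
  }
  return Object.fromEntries(summary);
}

console.log(summarizeCandidates(['sample test', 'module', 'fulltext', 'sample test']));
```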

termExtractor._startsWith(str, prefix) ⇒ Boolean

Check whether the given string starts with the given prefix

Kind: instance method of TermExtractor
Returns: Boolean - Returns true if the string starts with the prefix, else false

| Param | Type | Description |
| --- | --- | --- |
| str | String | String in which the prefix is searched |
| prefix | String | Prefix to search for |
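
No example is given for this method. As a rough illustration, a prefix check of this kind can be written as below. `startsWithPrefix` is a hypothetical stand-in, and the assumption is that the method behaves like the standard String.prototype.startsWith; the leading underscore suggests it is meant for internal use.

```javascript
// Hypothetical stand-in for TermExtractor#_startsWith.
function startsWithPrefix(str, prefix) {
  // Compare the first prefix.length characters of str with the prefix.
  return str.slice(0, prefix.length) === prefix;
}

console.log(startsWithPrefix('NNP', 'NN')); // true
console.log(startsWithPrefix('VBG', 'NN')); // false
```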