
tdm-teeft

tdm-teeft is a tdm module for term extraction from unstructured text. It can be used to extract the keywords of a document.

Installation

Using npm :

$ npm i -g tdm-teeft
$ npm i --save tdm-teeft

Using Node :

/* require of Teeft module */
const Teeft = require('tdm-teeft');

/* Build new Instance of Tagger */
let tagger = new Teeft.Tagger();

/* Build new Instance of Filter */
let filter = new Teeft.Filter();

/* Build new Instance of Indexator */
let indexator = new Teeft.Indexator();

/* Build new Instance of TermExtraction */
let termextraction = new Teeft.TermExtraction();

Launch tests

$ npm run test

Build documentation

$ npm run docs

API Documentation

Classes

Filter

Kind: global class

Filter
- new Filter([options])
- .call(occur, strength) ⇒ Boolean
- .configure(length) ⇒ Number

new Filter(options)

Returns: Filter - An instance of Filter

Param	Type	Description
options	Object	Options of constructor
options.minOccur	Number	Minimum number of occurrences
options.noLimitStrength	Number	Strength limit
options.lengthSteps	Object	Length steps (thresholds on the text length, in tokens)

Example (Example usage of 'constructor' (with parameters))

let options = {
  // Assigns a 'value' depending on the length of the indexed text (number of tokens)
  'lengthSteps': {
    'values': [ // intermediate steps are stored here
      { // value 4 is used when 1000 < text length <= 3000 tokens
        'lim': 3000, // must be > 'lengthSteps.min.lim' and < 'lengthSteps.max.lim'
        'value': 4
      },
      { // value 5 is used when 3000 < text length <= 4000 tokens
        'lim': 4000, // must be > 'lengthSteps.min.lim' and < 'lengthSteps.max.lim'
        'value': 5
      }
    ],
    'min': { // 'value' for the shortest texts (here: value 1 is used when text length <= 1000 tokens)
      'lim': 1000,
      'value': 1
    },
    'max': { // 'value' for the longest texts (here: value 7 is used when text length > 6000 tokens)
      'lim': 6000,
      'value': 7
    }
  },
  'minOccur': 3, // Default minimum number of occurrences of a token (here 3). Updated according to the length of the indexed text when 'configure' is called
  'noLimitStrength': 2 // Strength limit
  },
  defaultFilter = new Filter(options);
// returns an instance of Filter with properties:
// - minOccur: 3
// - noLimitStrength: 2
// - lengthSteps: {'values': [{'lim': 3000, 'value': 4}, {'lim': 4000, 'value': 5}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

Example (Example usage of 'constructor' (with default values))

let defaultFilter = new Filter();
// returns an instance of Filter with properties :
// - minOccur : 7
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

filter.call(occur, strength) ⇒ Boolean

Check the given values against the filter conditions

Kind: instance method of Filter
Returns: Boolean - Returns true if the conditions are met

Param	Type	Description
occur	Number	Occurrence value
strength	Number	Strength value

Example (Example usage of 'call' function)

let defaultFilter = new Filter();
defaultFilter.configure(500);
defaultFilter.call(1, 1); // returns true
defaultFilter.configure(5000);
defaultFilter.call(1, 1); // returns false
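
The example above is consistent with a simple decision rule: a term passes either when it is a single word (strength 1) that occurs at least minOccur times, or when its strength reaches noLimitStrength. The sketch below illustrates that rule only. The function name passesFilter and the rule itself are assumptions, not the module's actual code. It uses minOccur 1 for a 500-token text and 7 for a 5000-token text, as the 'configure' example reports.

```javascript
// Hypothetical stand-in for the filter decision (an assumption, not tdm-teeft code).
function passesFilter(occur, strength, minOccur, noLimitStrength) {
  // A single-word term must be frequent enough.
  const frequentSingleWord = strength === 1 && occur >= minOccur;
  // A term whose strength reaches the limit passes regardless of frequency.
  const strongEnough = strength >= noLimitStrength;
  return frequentSingleWord || strongEnough;
}

console.log(passesFilter(1, 1, 1, 2)); // true  (minOccur configured to 1)
console.log(passesFilter(1, 1, 7, 2)); // false (minOccur configured to 7)
console.log(passesFilter(1, 2, 7, 2)); // true  (strength reaches noLimitStrength)
```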

filter.configure(length) ⇒ Number

Configure the filter according to lengthSteps

Kind: instance method of Filter
Returns: Number - Returns the configured minOccur value

Param	Type	Description
length	Number	Text length

Example (Example usage of 'configure' function)

let defaultFilter = new Filter();
defaultFilter.configure(500); // returns 1
defaultFilter.configure(5000); // returns 7
defaultFilter.configure('test'); // returns null
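
To make the lookup concrete, here is a minimal sketch of how a token count could be mapped to a minOccur value from a lengthSteps object. The helper pickStepValue is hypothetical and not part of tdm-teeft. The fallback to 'max' for any length beyond the last intermediate step is an assumption chosen so that the default settings reproduce the results above (500 gives 1, 5000 gives 7, a non-number gives null).

```javascript
// Hypothetical helper (not part of tdm-teeft): maps a token count to a
// threshold using a lengthSteps object shaped like the documented default.
function pickStepValue(length, lengthSteps) {
  // Non-numeric input yields null, mirroring configure('test') above.
  if (typeof length !== 'number' || Number.isNaN(length)) return null;
  // Short texts use the 'min' value.
  if (length <= lengthSteps.min.lim) return lengthSteps.min.value;
  // Otherwise take the first intermediate step whose limit covers the length.
  for (const step of lengthSteps.values) {
    if (length <= step.lim) return step.value;
  }
  // Assumption: anything past the last intermediate step falls back to 'max'.
  return lengthSteps.max.value;
}

const defaultSteps = {
  values: [{ lim: 3000, value: 4 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};

console.log(pickStepValue(500, defaultSteps));    // 1
console.log(pickStepValue(2000, defaultSteps));   // 4
console.log(pickStepValue(5000, defaultSteps));   // 7
console.log(pickStepValue('test', defaultSteps)); // null
```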

Indexator

Kind: global class

Indexator
- new Indexator([options])
- instance
  - .tokenize(text) ⇒ Array
  - .translateTag(tag) ⇒ String
  - .sanitize(terms) ⇒ Array
  - .lemmatize(terms) ⇒ Array
  - .index(data) ⇒ Object
- static
  - .compare(a, b) ⇒ Number

new Indexator(options)

Returns: Indexator - An instance of Indexator

Param	Type	Description
options	Object	Options of constructor
options.filter	Filter	Options given to extractor of this instance of Indexator
options.lexicon	Object	Lexicon used by tagger of this instance of Indexator
options.stopwords	Object	Stopwords used by this instance of Indexator
options.lemmatizer	Object	Lemmatizer used by tagger of this instance of Indexator
options.stemmer	Object	Stemmer used by this instance of Indexator
options.dictionary	Object	Dictionary used by this instance of Indexator

Example (Example usage of 'constructor' (with parameters))

let options = {
    'filter': customFilter // Assuming customFilter contains your custom settings
  },
  indexator = new Indexator(options);
// returns an instance of Indexator with custom Filter

Example (Example usage of 'constructor' (with default values))

let indexator = new Indexator();
// returns an instance of Indexator with default options

indexator.tokenize(text) ⇒ Array

Extract tokens from a text

Kind: instance method of Indexator
Returns: Array - Array of tokens

Param	Type	Description
text	String	Fulltext

Example (Example usage of 'tokenize' function)

let indexator = new Indexator();
indexator.tokenize('my sample sentence'); // return ['my', 'sample', 'sentence']
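
For readers who want to see the input and output shapes without installing the module, a whitespace split reproduces the result above. This is only an illustrative stand-in. The function naiveTokenize is hypothetical, and the module's real tokenizer may handle punctuation and other cases differently.

```javascript
// Hypothetical whitespace tokenizer (an assumption, not the Indexator's own logic).
function naiveTokenize(text) {
  // Split on runs of whitespace and drop empty strings left by leading/trailing spaces.
  return text.split(/\s+/).filter((token) => token.length > 0);
}

console.log(naiveTokenize('my sample sentence')); // [ 'my', 'sample', 'sentence' ]
```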

indexator.translateTag(tag) ⇒ String

Translate the tag of Tagger to Lemmatizer

Kind: instance method of Indexator
Returns: String - Tag that matches a Lemmatizer tag (or false)

Param	Type	Description
tag	String	Tag given by Tagger

Example (Example usage of 'translateTag' function)

let indexator = new Indexator();
indexator.translateTag('RB'); // return 'adv';
indexator.translateTag('JJ'); // return 'adj';
indexator.translateTag('NN'); // return 'noun';
indexator.translateTag('NNP'); // return 'noun';
indexator.translateTag('VBG'); // return 'verb';
indexator.translateTag('VBN'); // return 'verb';
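
The mapping shown above can be pictured as a prefix lookup on the tagger's tags. The following sketch is an assumption about how such a translation might work. The names TAG_PREFIXES and translatePennTag are hypothetical, and only the six documented pairs are taken from the example.

```javascript
// Hypothetical prefix table: tagger tag prefix -> lemmatizer category.
const TAG_PREFIXES = [
  ['RB', 'adv'],
  ['JJ', 'adj'],
  ['NN', 'noun'],
  ['VB', 'verb']
];

// Returns the lemmatizer category for a tag, or false when nothing matches.
function translatePennTag(tag) {
  for (const [prefix, category] of TAG_PREFIXES) {
    if (tag.startsWith(prefix)) return category;
  }
  return false;
}

console.log(translatePennTag('NNP')); // 'noun'
console.log(translatePennTag('VBG')); // 'verb'
console.log(translatePennTag('DT'));  // false
```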

indexator.sanitize(terms) ⇒ Array

Sanitize list of terms (with some filter)

Kind: instance method of Indexator
Returns: Array - List of sanitized terms

Param	Type	Description
terms	Array	List of terms

Example (Example usage of 'sanitize' function)

let indexator = new Indexator();
indexator.sanitize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.lemmatize(terms) ⇒ Array

Lemmatize a list of tagged terms (add a property lemma & stem)

Kind: instance method of Indexator
Returns: Array - List of tagged terms with a lemma

Param	Type	Description
terms	Array	List of tagged terms

Example (Example usage of 'lemmatize' function)

let indexator = new Indexator();
indexator.lemmatize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.index(data) ⇒ Object

Index a fulltext

Kind: instance method of Indexator
Returns: Object - Returns a representation of the full text (the index plus further information and statistics about tokens and terms)

Param	Type	Description
data	String	Full text to index

Example (Example usage of 'index' function)

let indexator = new Indexator();
indexator.index('This is a sample sentence'); // return an object representation of indexation

Indexator.compare(a, b) ⇒ Number

Compare two objects by their specificity

Kind: static method of Indexator
Returns: Number - -1, 1, or 0

Param	Type	Description
a	Object	First object
b	Object	Second object

Example (Example usage of 'compare' function)

Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 2 }); // return 1
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 1 }); // return 0
Indexator.compare({ 'term': 'a', 'specificity': 2 }, { 'term': 'b', 'specificity': 1 }); // return -1
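
The return values above follow the contract of an Array.prototype.sort comparator that orders terms by decreasing specificity. The sketch below re-implements that behaviour in a standalone function. The name compareBySpecificity is hypothetical, and the body is inferred from the three documented results rather than taken from the module.

```javascript
// Hypothetical comparator inferred from the documented results:
// the more specific object sorts first.
function compareBySpecificity(a, b) {
  if (a.specificity > b.specificity) return -1;
  if (a.specificity < b.specificity) return 1;
  return 0;
}

const terms = [
  { term: 'a', specificity: 1 },
  { term: 'b', specificity: 3 },
  { term: 'c', specificity: 2 }
];

// slice() keeps the original array untouched; sort() then ranks the copy.
const ranked = terms.slice().sort(compareBySpecificity);
console.log(ranked.map((t) => t.term)); // [ 'b', 'c', 'a' ]
```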

Tagger

Kind: global class

Tagger
- new Tagger([options])
- .tag(terms) ⇒ Array

new Tagger(options)

Returns: Tagger - An instance of Tagger

Param	Type	Description
options	Object	Options of constructor

Example (Example usage of 'constructor' (with parameters))

let lexicon = { ... },
  tagger = new Tagger(lexicon);
// returns an instance of Tagger with custom lexicon

Example (Example usage of 'constructor' (with default values))

let tagger = new Tagger();
// returns an instance of Tagger with default lexicon

tagger.tag(terms) ⇒ Array

Tag terms

Kind: instance method of Tagger
Returns: Array - List of tagged terms

Param	Type	Description
terms	Array	List of terms

Example (Example usage of 'tag' function)

let tagger = new Tagger();
tagger.tag(['this', 'is', 'a', 'test']); // return [{ 'term': 'this', 'tag': 'DT' }, { 'term': 'is', 'tag': 'VBZ' }, { 'term': 'a', 'tag': 'DT' }, { 'term': 'test', 'tag': 'NN' }]
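
The tagged output is a plain array of { term, tag } objects, so it can be post-processed with ordinary array methods. As a small illustration, the sketch below keeps only the noun-tagged terms. The helper keepNouns is hypothetical, and the input is copied from the result shown above rather than produced by the module.

```javascript
// Output shape copied from the 'tag' example above.
const taggedTerms = [
  { term: 'this', tag: 'DT' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'test', tag: 'NN' }
];

// Hypothetical helper: keep the terms whose tag starts with 'NN' (noun tags).
function keepNouns(tagged) {
  return tagged
    .filter(({ tag }) => tag.startsWith('NN'))
    .map(({ term }) => term);
}

console.log(keepNouns(taggedTerms)); // [ 'test' ]
```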

TermExtractor

Kind: global class

TermExtractor
- new TermExtractor([options])
- .extract(taggedTerms) ⇒ Object
- ._startsWith(str, prefix) ⇒ Boolean

new TermExtractor(options)

Returns: TermExtractor - An instance of TermExtractor

Param	Type	Description
options	Object	Options of constructor
options.tagger	Tagger	An instance of Tagger
options.filter	Filter	An instance of Filter

Example (Example usage of 'constructor' (with parameters))

let myTagger = new Tagger(), // Assuming myTagger contains your custom settings
  myFilter = new Filter(), // Assuming myFilter contains your custom settings
  termExtractor = new TermExtractor({ 'tagger': myTagger, 'filter': myFilter });
// returns an instance of TermExtractor with custom options

Example (Example usage of 'constructor' (with default values))

let termExtractor = new TermExtractor();
// returns an instance of TermExtractor with default options

termExtractor.extract(taggedTerms) ⇒ Object

Extract terms

Kind: instance method of TermExtractor
Returns: Object - Returns all extracted terms

Param	Type	Description
taggedTerms	Array	List of tagged terms

Example (Example usage of 'extract' function)

let termExtractor = new TermExtractor(),
  myDefaultTagger = new Tagger(),
  taggedTerms = myDefaultTagger.tag('This is a sample test for this module. It index any fulltext. It is a sample test.');
termExtractor.extract(taggedTerms);
// return
// { 'sample': { 'frequency': 2, 'strength': 1 }, 'test': { 'frequency': 2, 'strength': 1 },
// 'sample test': { 'frequency': 2, 'strength': 2 },
// 'module': { 'frequency': 1, 'strength': 1 },
// 'index': { 'frequency': 1, 'strength': 1 },
// 'fulltext': { 'frequency': 1, 'strength': 1 }
// };
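
Because the result is an object keyed by term, a common next step is to turn it into a ranked list. The sketch below does this for the output shown above, ordering by strength and then by frequency. The helper rankTerms and the ordering criteria are illustrative assumptions, not something the module prescribes.

```javascript
// Result copied from the 'extract' example above.
const extracted = {
  'sample': { frequency: 2, strength: 1 },
  'test': { frequency: 2, strength: 1 },
  'sample test': { frequency: 2, strength: 2 },
  'module': { frequency: 1, strength: 1 },
  'index': { frequency: 1, strength: 1 },
  'fulltext': { frequency: 1, strength: 1 }
};

// Hypothetical helper: flatten the object and sort by strength, then frequency.
// Ties keep their original order because Array.prototype.sort is stable.
function rankTerms(terms) {
  return Object.entries(terms)
    .map(([term, stats]) => ({ term, ...stats }))
    .sort((x, y) => y.strength - x.strength || y.frequency - x.frequency);
}

console.log(rankTerms(extracted).map((t) => t.term));
// [ 'sample test', 'sample', 'test', 'module', 'index', 'fulltext' ]
```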

termExtractor._startsWith(str, prefix) ⇒ Boolean

Check whether the given string starts with the given prefix

Kind: instance method of TermExtractor
Returns: Boolean - Returns true if the string starts with the prefix, false otherwise

Param	Type	Description
str	String	String in which the prefix is searched
prefix	String	Prefix to search for
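
The check described here behaves like the standard String.prototype.startsWith. A minimal equivalent, written as a standalone function with the hypothetical name startsWithPrefix, could look like the sketch below. It is an assumption about the behaviour, not the module's private implementation.

```javascript
// Hypothetical equivalent of the prefix check (not the module's private method).
function startsWithPrefix(str, prefix) {
  // Compare the leading characters of str with the prefix.
  return str.slice(0, prefix.length) === prefix;
}

console.log(startsWithPrefix('NNP', 'NN')); // true
console.log(startsWithPrefix('VBZ', 'NN')); // false
```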
