Word-ngrams NPM

#### Getting Started

Install the package with:

    npm install word-ngrams

#### Features:

- buildNGrams
- listAllNGrams
- getNGramsByFrequency
- getMostCommonNGrams
- listNGramsByCount

#### Documentation

buildNGrams: function(text, unit, options)
- Maps all nGrams in the input text for the given unit length (1 = unigram, 2 = bigram, 3 = trigram, ...).
- When the nGrams are built, terminal sentence punctuation (periods, question marks, and exclamation marks) and semicolons are treated as words, since they also carry meaning. Apostrophes and compound-word hyphens are not treated as punctuation; they stay part of the word. The end of a paragraph or body of text is represented by null.
- Options are caseSensitive and includePunctuation.
  - If includePunctuation is false, terminal sentence punctuation and the end-of-text marker are left out of the nGrams.
  - caseSensitive and includePunctuation both default to false.
- Example:
```
  buildNGrams("Hello, World!  How's the world weather today? Hello, World!", 2, {caseSensitive: true, includePunctuation: true})
  // returns { Hello: { ",": 2 },
  //           ",": { World: 2 },
  //           World: { "!": 2 },
  //           "!": { "How's": 1, null: 1 },
  //           "How's": { the: 1 },
  //           the: { world: 1 },
  //           world: { weather: 1 },
  //           weather: { today: 1 },
  //           today: { "?": 1 },
  //           "?": { Hello: 1 }
  //         }
```
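
To make the nested output format concrete, here is a minimal sketch of how such a map could be built. It is not the package's source: the function name sketchBuildNGrams, the tokenizing regular expression, and the treatment of commas as tokens (inferred from the example above) are all assumptions.

```javascript
// Illustrative sketch only; NOT the word-ngrams implementation.
function sketchBuildNGrams(text, unit, options = {}) {
  const { caseSensitive = false, includePunctuation = false } = options;
  const source = caseSensitive ? text : text.toLowerCase();

  // Assumption: a word is a run of letters, digits, apostrophes, or hyphens;
  // each of . , ; ? ! is its own token.
  let tokens = source.match(/[\p{L}\p{N}'’-]+|[.,;?!]/gu) || [];
  if (includePunctuation) {
    tokens.push(null); // end-of-text marker; stored under the key "null"
  } else {
    tokens = tokens.filter((t) => !/^[.,;?!]$/.test(t));
  }

  const root = {};
  for (let i = 0; i + unit <= tokens.length; i++) {
    let node = root;
    // One nesting level per token, except the last token of the nGram.
    for (let j = 0; j < unit - 1; j++) {
      const key = String(tokens[i + j]);
      if (!node[key]) node[key] = {};
      node = node[key];
    }
    // The last token maps to the occurrence count.
    const last = String(tokens[i + unit - 1]);
    node[last] = (node[last] || 0) + 1;
  }
  return root;
}
```

With unit set to 1 the sketch returns a flat map of word counts; with larger units it adds one nesting level per extra word. Note that a literal word "null" in the text would share a key with the end-of-text marker in this sketch.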

listAllNGrams: function(nGrams)

Given an input set of nGrams (of the same format as the buildNGrams output), listAllNGrams will return a list of unique nGrams found in the text.
Example:

    // Example input nGrams for "Hello World.  Goodbye World!", without punctuation
    listAllNGrams({ hello: { world: 1 }, goodbye: { world: 1 } })
    // returns ["hello world", "goodbye world"]
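
The listing step amounts to flattening the nested map into strings. The sketch below shows one way to do that; the name sketchListAllNGrams and the choice to join words with a single space are assumptions, not the package's code.

```javascript
// Illustrative sketch only; NOT the word-ngrams implementation.
function sketchListAllNGrams(nGrams, prefix = []) {
  const result = [];
  for (const [word, value] of Object.entries(nGrams)) {
    const path = [...prefix, word];
    if (typeof value === "number") {
      // A number is a count, so `path` is a complete nGram.
      result.push(path.join(" "));
    } else {
      // Otherwise descend one level and keep extending the path.
      result.push(...sketchListAllNGrams(value, path));
    }
  }
  return result;
}
```

Each path through the nested map appears once, so the returned list contains no duplicates.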

getNGramsByFrequency: function(nGrams, frequency)
- Given an input set of nGrams (of the same format as the buildNGrams output), getNGramsByFrequency will return a list of all nGrams that occur exactly frequency times.
- Example:
```
  // Example input nGrams for "Hello World", without punctuation
  getNGramsByFrequency({ hello: { world: 1 } }, 1)
  // returns ["hello world"]
```
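
Filtering by frequency can be sketched as a walk over the nested map followed by a filter on the counts. The generator walkNGrams and the function sketchGetNGramsByFrequency are hypothetical names used for illustration; the package may work differently.

```javascript
// Illustrative sketch only; NOT the word-ngrams implementation.
// Hypothetical helper: yields [nGramString, count] for every leaf of the map.
function* walkNGrams(nGrams, prefix = []) {
  for (const [word, value] of Object.entries(nGrams)) {
    if (typeof value === "number") {
      yield [[...prefix, word].join(" "), value];
    } else {
      yield* walkNGrams(value, [...prefix, word]);
    }
  }
}

function sketchGetNGramsByFrequency(nGrams, frequency) {
  return [...walkNGrams(nGrams)]
    .filter(([, count]) => count === frequency) // keep exact matches only
    .map(([nGram]) => nGram);
}
```

A frequency that no nGram has yields an empty list in this sketch.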

getMostCommonNGrams: function(nGrams)

Given an input set of nGrams (of the same format as the buildNGrams output), getMostCommonNGrams will return a list of the most common nGrams.
Example:

    // Example input nGrams for "Hello World!  Goodbye World!", with punctuation
    getMostCommonNGrams({ Hello: { World: 1 }, World: { "!": 2 }, "!": { Goodbye: 1, null: 1 }, Goodbye: { World: 1 } })
    // returns ["World!"]
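
Finding the most common nGrams comes down to tracking the highest count seen while walking the map. The sketch below is a guess at that logic, not the package's code. The names sketchGetMostCommonNGrams and joinTokens are hypothetical, and attaching punctuation to the preceding word without a space is an assumption inferred from the "World!" result above.

```javascript
// Illustrative sketch only; NOT the word-ngrams implementation.
// Assumption: punctuation tokens attach to the preceding word ("World!").
function joinTokens(tokens) {
  return tokens.reduce(
    (acc, t) => (acc && !/^[.,;?!]$/.test(t) ? acc + " " + t : acc + t),
    ""
  );
}

function sketchGetMostCommonNGrams(nGrams) {
  let best = 0;
  let winners = [];
  const stack = [[nGrams, []]]; // pairs of [node, words leading to it]
  while (stack.length > 0) {
    const [node, prefix] = stack.pop();
    for (const [word, value] of Object.entries(node)) {
      const path = [...prefix, word];
      if (typeof value !== "number") {
        stack.push([value, path]); // not a leaf yet; visit later
      } else if (value > best) {
        best = value; // new maximum: restart the winners list
        winners = [joinTokens(path)];
      } else if (value === best) {
        winners.push(joinTokens(path)); // tie with the current maximum
      }
    }
  }
  return winners;
}
```

When several nGrams tie for the highest count, the sketch returns all of them, in traversal order rather than text order.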

listNGramsByCount: function(nGrams)

Given an input set of nGrams (of the same format as the buildNGrams output), listNGramsByCount will return all nGrams sorted into buckets by count.
Example:

    // Example input for "Hello, World!  How's the weather?  Goodbye, World!"
    listNGramsByCount({ hello: 1, world: 2, "how's": 1, the: 1, weather: 1, goodbye: 1 })
    // returns { 1: ["hello", "how's", "the", "weather", "goodbye"], 2: ["world"] }
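
Bucketing by count can be sketched as grouping every nGram under its count. The function name sketchListNGramsByCount is hypothetical, and joining multi-word nGrams with a space is an assumption; the example above only shows the unigram case.

```javascript
// Illustrative sketch only; NOT the word-ngrams implementation.
function sketchListNGramsByCount(nGrams) {
  const buckets = {};
  const visit = (node, prefix) => {
    for (const [word, value] of Object.entries(node)) {
      const path = [...prefix, word];
      if (typeof value === "number") {
        // Create the bucket for this count on first use, then append.
        if (!buckets[value]) buckets[value] = [];
        buckets[value].push(path.join(" "));
      } else {
        visit(value, path); // nested level: keep extending the nGram
      }
    }
  };
  visit(nGrams, []);
  return buckets;
}
```

Because the buckets live in a plain object, the counts become string keys ("1", "2", ...).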

View the full specs and check out more text analysis in my Text Analysis Suite.
