0.2.0 • Published 10 years ago

word-ngrams v0.2.0

Weekly downloads: 12
License: ISC
Repository: github
Last release: 10 years ago

#### Getting Started

Install the package with:

  npm install word-ngrams

#### Features:

  • buildNGrams
  • listAllNGrams
  • getNGramsByFrequency
  • getMostCommonNGrams
  • listNGramsByCount

#### Documentation

  • buildNGrams: function(text, unit, options)
    • Maps every nGram in the input text for the given unit length (1 = unigram, 2 = bigram, 3 = trigram, ...).
    • When constructing nGrams, terminal sentence punctuation (periods, question marks, and exclamation marks) and semicolons are treated as words, since they also carry meaning. Apostrophes and compound-word hyphens are not treated as separate tokens, so a word such as How's stays intact. The end of a paragraph or body of text is represented by null.
    • Options are caseSensitive and includePunctuation.
      • If includePunctuation is false, terminal sentence punctuation and the end-of-text marker are left out of the nGrams.
      • caseSensitive and includePunctuation both default to false.
    • Example:
      buildNGrams("Hello, World!  How's the world weather today? Hello, World!", 2, {caseSensitive: true, includePunctuation: true})
      // returns { Hello: { ',': 2 },
                   ',': { World: 2 },
                   World: { '!': 2 },
                   '!': { "How's": 1, null: 1 },
                   "How's": { the: 1 },
                   the: { world: 1 },
                   world: { weather: 1 },
                   weather: { today: 1 },
                   today: { '?': 1 },
                   '?': { Hello: 1 }
                 }
  • listAllNGrams: function(nGrams)
    • Given an input set of nGrams (of the same format as the buildNGrams output), listAllNGrams will return a list of unique nGrams found in the text.
    • Example:
      // Example input nGrams for "Hello World.  Goodbye World!", without punctuation
      listAllNGrams({ hello: { world: 1 }, goodbye: { world: 1 } })
      // returns ["hello world", "goodbye world"]
  • getNGramsByFrequency: function(nGrams, frequency)
    • Given an input set of nGrams (of the same format as the buildNGrams output), getNGramsByFrequency will return a list of all nGrams that occur exactly frequency times.
    • Example:
      // Example input nGrams for "Hello World", without punctuation
      getNGramsByFrequency({ hello: { world: 1 } }, 1)
      // returns ["hello world"]
  • getMostCommonNGrams: function(nGrams)
    • Given an input set of nGrams (of the same format as the buildNGrams output), getMostCommonNGrams will return a list of the most common nGrams.
    • Example:
      // Example input nGrams for "Hello World!  Goodbye World!", with punctuation
      getMostCommonNGrams({ Hello: { World: 1 }, World: { '!': 2 }, '!': { Goodbye: 1, null: 1 }, Goodbye: { World: 1 } })
      // returns ["World!"]
  • listNGramsByCount: function(nGrams)
    • Given an input set of nGrams (of the same format as the buildNGrams output), listNGramsByCount will return all nGrams sorted into buckets by count.
    • Example:
      // Example input for "Hello, World!  How's the weather?  Goodbye, World!"
      listNGramsByCount({ hello: 1, world: 2, "how's": 1, the: 1, weather: 1, goodbye: 1 })
      // returns { 1: ["hello", "how's", "the", "weather", "goodbye"], 2: ["world"] }
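
As a rough illustration of the nested nGram format used throughout, the sketch below rebuilds a bigram map like the one in the buildNGrams example and then walks it to pick out bigrams by count. It is not the package's source code: tokenize, buildBigramsSketch, and bigramsWithCount are hypothetical names, only bigrams (unit = 2) are handled, and treating commas as tokens is an assumption inferred from the example output rather than from the prose.

```javascript
// Illustrative sketch only; NOT the word-ngrams implementation.
// All function names here are hypothetical.

// Split text into word and punctuation tokens. Apostrophes (straight or
// curly) and hyphens stay inside words. Assumption: commas are tokens too.
function tokenize(text, { caseSensitive = false, includePunctuation = false } = {}) {
  const source = caseSensitive ? text : text.toLowerCase();
  const tokens = source.match(/[\p{L}\p{N}'’-]+|[.?!;,]/gu) ?? [];
  if (!includePunctuation) {
    // Drop punctuation tokens and add no end-of-text marker.
    return tokens.filter((token) => !/^[.?!;,]$/.test(token));
  }
  // Mark the end of the text with null, as the documentation describes.
  return [...tokens, null];
}

// Count each adjacent token pair as { first: { second: count } }.
function buildBigramsSketch(text, options = {}) {
  const tokens = tokenize(text, options);
  const counts = new Map();
  for (let i = 0; i + 1 < tokens.length; i++) {
    const first = tokens[i];
    const second = tokens[i + 1];
    if (!counts.has(first)) counts.set(first, new Map());
    const followers = counts.get(first);
    followers.set(second, (followers.get(second) ?? 0) + 1);
  }
  // Convert to plain nested objects; a null follower becomes the key "null".
  return Object.fromEntries(
    [...counts].map(([first, followers]) => [first, Object.fromEntries(followers)])
  );
}

// Walk the nested shape and collect bigrams whose count equals `frequency`,
// joining the two tokens with a space.
function bigramsWithCount(nGrams, frequency) {
  const result = [];
  for (const [first, followers] of Object.entries(nGrams)) {
    for (const [second, count] of Object.entries(followers)) {
      if (count === frequency) result.push(`${first} ${second}`);
    }
  }
  return result;
}

const sample = buildBigramsSketch("Hello, World! Hello, World!", {
  caseSensitive: true,
  includePunctuation: true,
});
console.log(sample);
console.log(bigramsWithCount(sample, 2));
```

The sketch counts with a Map internally so that words such as "constructor" cannot collide with object prototype properties, and only converts to plain objects at the end to match the documented output shape.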

View the full specs and check out more text analysis in my Text Analysis Suite.