0.0.8 • Published 11 years ago
word-freq
Calculates the word frequency of a text document by tokenising the string and, optionally, stemming the resulting terms.
Version
0.0.7: Converts all text to lowercase.
0.0.6: Messed up npm versioning.
0.0.5: Moved stemmer into its own module. Removed direct dependency on tokeniser.
0.0.4: Moved tokeniser into its own module.
0.0.3: Added stop words removal feature.
0.0.2: Improved, added testing.
0.0.1: Release.
Usage
Frequency (wf.freq(text, noStopWords, shouldStem))
Returns an object containing the frequency of terms in the text provided.
text is the string (text document) on which the calculations are performed.
noStopWords defaults to true. Set it to false if you want to include stop words, e.g. words such as "I" and "the".
shouldStem defaults to true. Set it to false if you do not want words to be stemmed.
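Since the return value is described as a plain object mapping terms to counts, it can be post-processed with ordinary object code. The following is a minimal sketch of ranking such an object by count. `rankTerms` is a hypothetical helper and the sample object is written by hand; neither is part of word-freq.

```javascript
// Hypothetical helper, not part of word-freq: ranks a term -> count object.
function rankTerms(frequency) {
  return Object.keys(frequency)
    .map(function (term) {
      return { term: term, count: frequency[term] };
    })
    .sort(function (a, b) {
      // Higher counts first; ties are broken alphabetically by term.
      if (b.count !== a.count) { return b.count - a.count; }
      return a.term < b.term ? -1 : 1;
    });
}

// Hand-written sample with the shape of a term -> count object.
var sample = { waltercfilho: 1, tweet: 1, hous: 2, expens: 1 };
var ranked = rankTerms(sample);
console.log(ranked[0].term); // hous
```

Breaking ties alphabetically keeps the ordering deterministic, which is convenient when comparing results in tests.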
var wf = require('word-freq');
var str = "@waltercfilho tweeted about houses: housing is the most expensive thing ever f#!*";
var frequency = wf.freq(str); // noStopWords and shouldStem both default to `true`
>> {
  "waltercfilho" : 1,
  "tweet" : 1,
  "hous" : 2,
  "expens" : 1
}
Tokenising (wf.tokenise(text, noStopWords))
Simply returns an array of terms, without punctuation.
text is the string (text document) on which the calculations are performed.
noStopWords defaults to true. Set it to false if you want to include stop words, e.g. words such as "I" and "the".
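To illustrate what tokenising with stop-word removal involves, here is a rough sketch. It is not the library's implementation: `simpleTokenise`, its regular expression, and the four-word `STOP_WORDS` list are all assumptions made for this example, and the real tokeniser and stop-word list may behave differently.

```javascript
// Tiny illustrative stop-word list (an assumption, not the library's list).
var STOP_WORDS = ['a', 'i', 'the', "you're"];

// Hypothetical stand-in for a tokeniser, shown only to illustrate the idea.
function simpleTokenise(text, noStopWords) {
  // Treat an omitted argument as true, mirroring the documented default.
  if (noStopWords === undefined) { noStopWords = true; }

  // Lowercase, then keep runs of letters and apostrophes as terms.
  var terms = text.toLowerCase().match(/[a-z']+/g) || [];

  if (!noStopWords) { return terms; }
  return terms.filter(function (term) {
    return STOP_WORDS.indexOf(term) === -1;
  });
}

var withoutStops = simpleTokenise("you're simply a test, a mere test");
// -> ['simply', 'test', 'mere', 'test']
var withStops = simpleTokenise("you're simply a test, a mere test", false);
// Stop words such as "a" are kept when noStopWords is false.
console.log(withoutStops, withStops);
```

Matching runs of characters, rather than splitting on whitespace, drops punctuation such as the comma in a single step.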
var wf = require('word-freq');
var str = "you're simply a test, a mere test";
var tokenised = wf.tokenise(str);
>> ['simply', 'test', 'mere', 'test']
Stemming (wf.stem(text, noStopWords))
Returns an array of terms, stemmed and without punctuation.
text is the string (text document) on which the calculations are performed.
noStopWords defaults to true. Set it to false if you want to include stop words, e.g. words such as "I" and "the".
Note: This is basically a wrapper around the stem-porter library by kastor.
var wf = require('word-freq');
var str = "you're simply a simplistic house, made for housing";
var stemmed = wf.stem(str);
>> ["simpli", "simplist", "hous", "hous"]
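Because stemming maps related words such as "house" and "housing" to the same term, counting the stemmed array merges their occurrences. The sketch below tallies the array from the example above, written out by hand. `countTerms` is a hypothetical helper, and whether the library counts terms this way internally is an assumption.

```javascript
// Hypothetical helper, not part of word-freq: counts occurrences per term.
function countTerms(terms) {
  var counts = {};
  terms.forEach(function (term) {
    // hasOwnProperty guards against inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(counts, term)) {
      counts[term] += 1;
    } else {
      counts[term] = 1;
    }
  });
  return counts;
}

// The stemmed terms from the example above, written out by hand.
var stems = ['simpli', 'simplist', 'hous', 'hous'];
var counts = countTerms(stems);
console.log(counts.hous); // 2
```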