text-analyzer
Analyze text: count characters, words, sentences, paragraphs, and reading time.
Installation
# npm
npm install text-analyzer
# pnpm
pnpm add text-analyzer
# bun
bun add text-analyzer
Usage
import {
countCharacters,
countLines,
countParagraphs,
countSentences,
countSequenceOccurrences,
countWords,
getAverageWordLength,
getReadingTime,
getWordFrequency,
} from "text-analyzer"
API
countCharacters(text, options?)
Count the number of characters in a text.
options.unit: "grapheme" (default) counts user-perceived characters (e.g. the emoji "👨‍👩‍👧" counts as 1). "code-unit" counts UTF-16 code units, matching String.prototype.length.
options.locale: BCP 47 locale tag passed to Intl.Segmenter. Only used when unit is "grapheme".
options.normalize: when true, normalize the text to NFC before counting. Defaults to false.
countCharacters("text") // 4
countCharacters("👨‍👩‍👧") // 1
countCharacters("👨‍👩‍👧", { unit: "code-unit" }) // 8
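To make the difference between the two units concrete, here is a minimal sketch of how grapheme and code-unit counting could be done with standard APIs. The function name sketchCountCharacters is hypothetical, and this is an illustration under the option semantics described above, not the library's actual source.

```javascript
// Hypothetical re-implementation for illustration only.
function sketchCountCharacters(text, { unit = "grapheme", locale, normalize = false } = {}) {
  // Optionally compose characters (e.g. "e" + combining accent) into NFC form.
  const input = normalize ? text.normalize("NFC") : text

  // UTF-16 code units are exactly what String.prototype.length reports.
  if (unit === "code-unit") return input.length

  // Grapheme segmentation groups code points into user-perceived characters.
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" })
  let count = 0
  for (const _ of segmenter.segment(input)) count++
  return count
}

// Family emoji: three emoji code points joined by two zero-width joiners.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"
console.log(sketchCountCharacters(family)) // 1
console.log(sketchCountCharacters(family, { unit: "code-unit" })) // 8
```

Normalization only changes the result when counting code units: "e" followed by U+0301 is already a single grapheme, but it shrinks from 2 code units to 1 after NFC.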
countWords(text)
Count the number of words in a text. Words are separated by any whitespace.
Note: punctuation stays attached, so "hello, world" counts as 2 words
("hello," and "world"). For a linguistic word count, use
getWordFrequency.
countWords("one two three") // 3
countWords(" one\ttwo\r\nthree ") // 3
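The whitespace rule can be sketched in a few lines. The name sketchCountWords is hypothetical and the code only mirrors the behavior described above (any run of whitespace separates words, punctuation stays attached); returning 0 for empty input is an assumption of the sketch.

```javascript
// Hypothetical illustration of whitespace-based word counting.
function sketchCountWords(text) {
  const trimmed = text.trim()
  // Assumption: empty or whitespace-only input contains no words.
  if (trimmed === "") return 0
  // \s+ treats spaces, tabs, and line terminators as one separator.
  return trimmed.split(/\s+/).length
}

console.log(sketchCountWords(" one\ttwo\r\nthree ")) // 3
console.log(sketchCountWords("hello, world")) // 2
```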
countLines(text)
Count the number of lines in a text. Handles \n, \r\n, and \r. A
trailing line terminator does not add an extra empty line.
countLines("one\ntwo\nthree") // 3
countLines("one\n") // 1
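The handling of the three line terminators and the trailing-terminator rule could look roughly like this. sketchCountLines is a hypothetical name, and treating the empty string as 0 lines is an assumption of the sketch rather than documented behavior.

```javascript
// Hypothetical illustration of line counting.
function sketchCountLines(text) {
  // Assumption: an empty string has no lines.
  if (text === "") return 0
  // Match \r\n first so a Windows line ending is not counted twice.
  const lines = text.split(/\r\n|\r|\n/)
  // A trailing terminator leaves one empty piece at the end; drop it.
  if (lines[lines.length - 1] === "") lines.pop()
  return lines.length
}

console.log(sketchCountLines("one\ntwo\nthree")) // 3
console.log(sketchCountLines("one\n")) // 1
```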
countSentences(text, options?)
Count the number of sentences using Intl.Segmenter, so decimals and
abbreviations don't accidentally split a sentence.
options.locale: BCP 47 locale tag passed to Intl.Segmenter.
countSentences("Hello. World!") // 2
countSentences("The value is 3.14. Done.") // 2
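For readers unfamiliar with Intl.Segmenter, sentence counting with it can be sketched as follows. The name sketchCountSentences is hypothetical, skipping whitespace-only segments is an assumption of the sketch, and exact segmentation depends on the ICU data shipped with the runtime.

```javascript
// Hypothetical illustration of sentence counting with Intl.Segmenter.
function sketchCountSentences(text, { locale } = {}) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" })
  let count = 0
  for (const { segment } of segmenter.segment(text)) {
    // Assumption: segments containing only whitespace are not sentences.
    if (segment.trim() !== "") count++
  }
  return count
}

console.log(sketchCountSentences("Hello. World!")) // 2
```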
countParagraphs(text)
Count the number of paragraphs. Paragraphs are separated by one or more blank lines.
countParagraphs("one\n\ntwo\n\n\nthree") // 3
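One way to express "separated by one or more blank lines" is a split on a newline, optional whitespace, and another newline. sketchCountParagraphs is a hypothetical name, and treating whitespace-only lines as blank is an assumption of this sketch.

```javascript
// Hypothetical illustration of paragraph counting.
function sketchCountParagraphs(text) {
  // Normalize \r\n and \r to \n so a single pattern covers all terminators.
  const normalized = text.replace(/\r\n?/g, "\n")
  return normalized
    .split(/\n\s*\n/) // any run of blank lines acts as one separator
    .filter((block) => block.trim() !== "").length
}

console.log(sketchCountParagraphs("one\n\ntwo\n\n\nthree")) // 3
```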
countSequenceOccurrences(text, sequence, options?)
Count the number of times a sequence occurs in a text.
options.caseSensitive: defaults to true.
options.overlapping: when true, overlapping matches are counted (e.g. "aa" matches 3 times in "aaaa"). Defaults to false.
options.locale: BCP 47 locale tag used for case folding (only relevant when caseSensitive is false).
options.normalize: when true, normalize both text and sequence to NFC before searching. Defaults to false.
countSequenceOccurrences("dolor Dolor dolor", "dolor") // 2
countSequenceOccurrences("dolor Dolor dolor", "dolor", { caseSensitive: false }) // 3
countSequenceOccurrences("aaaa", "aa") // 2
countSequenceOccurrences("aaaa", "aa", { overlapping: true }) // 3
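The difference between overlapping and non-overlapping matching comes down to how far the search advances after each hit. The sketch below illustrates that idea with indexOf. The name sketchCountOccurrences is hypothetical, using toLocaleLowerCase for case folding and returning 0 for an empty sequence are assumptions, and this is not the library's source.

```javascript
// Hypothetical illustration of substring counting.
function sketchCountOccurrences(text, sequence, options = {}) {
  const { caseSensitive = true, overlapping = false, locale, normalize = false } = options
  let haystack = normalize ? text.normalize("NFC") : text
  let needle = normalize ? sequence.normalize("NFC") : sequence
  if (!caseSensitive) {
    haystack = haystack.toLocaleLowerCase(locale)
    needle = needle.toLocaleLowerCase(locale)
  }
  // Assumption: an empty sequence never matches (avoids an infinite loop).
  if (needle === "") return 0

  // Overlapping: resume one position later. Otherwise skip the whole match.
  const step = overlapping ? 1 : needle.length
  let count = 0
  let index = haystack.indexOf(needle)
  while (index !== -1) {
    count++
    index = haystack.indexOf(needle, index + step)
  }
  return count
}

console.log(sketchCountOccurrences("aaaa", "aa")) // 2
console.log(sketchCountOccurrences("aaaa", "aa", { overlapping: true })) // 3
```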
getWordFrequency(text, options?)
Count how many times each word occurs in a text. Words are detected with
Intl.Segmenter, so punctuation is excluded and contractions are kept as one
word. Returns a Map<string, number> sorted by count in descending order.
options.caseSensitive: defaults to true. Pass false for typical natural-language frequency analysis where "The" and "the" should be treated as the same word.
options.locale: BCP 47 locale tag passed to Intl.Segmenter and used for case folding.
getWordFrequency("The cat sat on the mat.")
// Map { "The" => 1, "cat" => 1, "sat" => 1, "on" => 1, "the" => 1, "mat" => 1 }
getWordFrequency("The cat sat on the mat.", { caseSensitive: false })
// Map { "the" => 2, "cat" => 1, "sat" => 1, "on" => 1, "mat" => 1 }
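As a rough picture of how word segmentation and sorting can fit together, here is a sketch built on Intl.Segmenter with word granularity. The name sketchWordFrequency is hypothetical, and relying on isWordLike to drop punctuation and on toLocaleLowerCase for case folding are assumptions about one possible approach, not a description of the library's internals.

```javascript
// Hypothetical illustration of word-frequency counting.
function sketchWordFrequency(text, { caseSensitive = true, locale } = {}) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" })
  const counts = new Map()
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // Punctuation and whitespace segments are not word-like; skip them.
    if (!isWordLike) continue
    const word = caseSensitive ? segment : segment.toLocaleLowerCase(locale)
    counts.set(word, (counts.get(word) ?? 0) + 1)
  }
  // Array sort is stable, so ties keep their first-seen order.
  const sorted = [...counts].sort((a, b) => b[1] - a[1])
  return new Map(sorted)
}

const frequency = sketchWordFrequency("The cat sat on the mat.", { caseSensitive: false })
console.log([...frequency.keys()][0], frequency.get("the")) // the 2
```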
getAverageWordLength(text, options?)
Compute the average length of words in a text. Returns 0 when the text
contains no words. Word splitting is whitespace-based, matching countWords.
options.unit: passed to countCharacters ("grapheme" by default).
options.locale: passed to countCharacters.
getAverageWordLength("aa bbb cccc") // 3
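Because word splitting is whitespace-based, attached punctuation counts toward a word's length. The sketch below shows the arithmetic for the default grapheme unit only. sketchAverageWordLength is a hypothetical name and the code is an illustration, not the library's implementation.

```javascript
// Hypothetical illustration of average word length (grapheme unit only).
function sketchAverageWordLength(text) {
  const words = text.trim().split(/\s+/).filter((word) => word !== "")
  if (words.length === 0) return 0
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
  // Sum the grapheme count of every word, then divide by the word count.
  const total = words.reduce((sum, word) => sum + [...segmenter.segment(word)].length, 0)
  return total / words.length
}

console.log(sketchAverageWordLength("aa bbb cccc")) // 3
console.log(sketchAverageWordLength("hello, world")) // 5.5 ("hello," has 6 characters)
```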
getReadingTime(text, options?)
Estimate the reading time for a text.
options.wordsPerMinute: reading speed. Must be greater than 0. Defaults to 200.
getReadingTime("one two three")
// { words: 3, minutes: 0.015, milliseconds: 900 }
getReadingTime("one two three", { wordsPerMinute: 100 })
// { words: 3, minutes: 0.03, milliseconds: 1800 }
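The numbers in these results follow from minutes = words / wordsPerMinute and milliseconds = minutes × 60,000. A minimal sketch of that arithmetic is shown below. The name sketchReadingTime is hypothetical, and throwing a RangeError for a non-positive speed is an assumption, since the text above only states that the value must be greater than 0.

```javascript
// Hypothetical illustration of the reading-time arithmetic.
function sketchReadingTime(text, { wordsPerMinute = 200 } = {}) {
  // Assumption: invalid speeds are rejected with a RangeError.
  if (!(wordsPerMinute > 0)) {
    throw new RangeError("wordsPerMinute must be greater than 0")
  }
  const trimmed = text.trim()
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length
  return {
    words,
    minutes: words / wordsPerMinute,
    // Multiply before dividing to keep the result free of rounding noise.
    milliseconds: (words * 60_000) / wordsPerMinute,
  }
}

console.log(sketchReadingTime("one two three"))
// { words: 3, minutes: 0.015, milliseconds: 900 }
```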
License
MIT