segmenter v2.0.1
segmenter
Work with graphemes, words, and sentences through a small, simple, and fast API built on Intl.Segmenter.
Install
npm install segmenter
Why
- Intl.Segmenter is supported in all major browsers, and 94% of users have it available. It's time for adoption.
- If your use case is anything other than iterating over all graphemes, words, or sentences in a text, Intl.Segmenter can be a little hard to work with.
- In many cases, working with graphemes is preferable to working with characters, because graphemes are what the end user sees. For example, the emoji 👨‍🔧️ is a single grapheme but consists of 6 UTF-16 code units: an index-based for loop over it makes 6 iterations, and a for...of loop makes 4 (one per code point). This is confusing, so just use graphemes.
- Before Intl.Segmenter, working with graphemes required a library such as graphemer, which is 94 KB in size.
Usage
import { graphemeAt, graphemeRangeAt, wordAt, wordRangeAt } from "segmenter";
graphemeAt("👨‍🔧️ the fixer", 3); // "👨‍🔧️"
graphemeRangeAt("👨‍🔧️ the fixer", 3); // { start: 0, end: 6 }
wordAt("hello-world", 0); // "hello"
wordRangeAt("hello-world", 0); // { start: 0, end: 5 }
API
Graphemes
graphemeAt(string: string, position: number): string | undefined
Get the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
graphemeRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
graphemes(string: string): string[]
Get all graphemes in the string as an array.
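To illustrate the contract described above, here is a rough sketch of how a position lookup could be built on the built-in Segments.prototype.containing(). The names graphemeRangeAtSketch and graphemeAtSketch are hypothetical, and this is not the package's actual source; it only mirrors the documented behavior (an end-exclusive range, and undefined for out-of-bounds positions or empty strings).

```javascript
// Hypothetical helpers for illustration only; not taken from the package.
const graphemeSegmenterSketch = new Intl.Segmenter(undefined, {
  granularity: "grapheme",
});

function graphemeRangeAtSketch(string, position) {
  // containing() returns undefined when position is outside the string,
  // which also covers the empty-string case.
  const data = graphemeSegmenterSketch.segment(string).containing(position);
  if (data === undefined) return undefined;
  // index is the segment's starting code unit; end is exclusive.
  return { start: data.index, end: data.index + data.segment.length };
}

function graphemeAtSketch(string, position) {
  const range = graphemeRangeAtSketch(string, position);
  return range === undefined ? undefined : string.slice(range.start, range.end);
}

// Assumption: the emoji is U+1F468 U+200D U+1F527 U+FE0F (6 code units).
const fixerSample = "\u{1F468}\u200D\u{1F527}\uFE0F the fixer";
console.log(graphemeRangeAtSketch(fixerSample, 3)); // { start: 0, end: 6 }
console.log(graphemeRangeAtSketch(fixerSample, 100)); // undefined
```

Any position from 0 through 5 falls inside the emoji, so all of them resolve to the same range.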
Words
wordAt(string: string, position: number): string | undefined
Get the word at position in string. Returns undefined if position is out of bounds or string is empty.
wordRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the word at position in string. Returns undefined if position is out of bounds or string is empty.
words(string: string): string[]
Get all words in the string as an array.
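For context, the built-in word segmenter returns punctuation and whitespace as segments too, and marks real words with isWordLike. The sketch below uses Intl.Segmenter directly; whether this package's words() filters on isWordLike is an assumption, since the README does not say.

```javascript
const wordSegmenterSketch = new Intl.Segmenter(undefined, { granularity: "word" });
const hyphenated = "hello-world";

// Every segment the built-in segmenter produces, punctuation included.
const allSegments = [...wordSegmenterSketch.segment(hyphenated)].map(
  (s) => s.segment
);

// Only the segments flagged as word-like.
// Assumption: a words() helper would likely apply a filter of this kind.
const wordLikeSegments = [...wordSegmenterSketch.segment(hyphenated)]
  .filter((s) => s.isWordLike)
  .map((s) => s.segment);

console.log(allSegments); // [ 'hello', '-', 'world' ]
console.log(wordLikeSegments); // [ 'hello', 'world' ]
```

The hyphen is a segment boundary, which is why a lookup at position 0 of "hello-world" covers only "hello".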
Sentences
Note: Intl.Segmenter doesn't do a perfect job of detecting sentences. For example, "I went to Dr. Smith's office" will be split into two sentences.
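The caveat in the note can be reproduced with the built-in segmenter alone. This is a small sketch, assuming an "en" locale; the exact split depends on the Unicode data shipped with the runtime, so no output is shown here.

```javascript
// Assumption: "en" locale; results may vary with the runtime's ICU data.
const sentenceSegmenterSketch = new Intl.Segmenter("en", {
  granularity: "sentence",
});
const doctorText = "I went to Dr. Smith's office";

// Collect each detected sentence as a plain string.
const sentenceParts = [...sentenceSegmenterSketch.segment(doctorText)].map(
  (s) => s.segment
);

// Print the detected sentences to see where the abbreviation is split.
console.log(sentenceParts);
```

Segments always cover the whole input, so joining sentenceParts gives back the original text regardless of where the breaks fall.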
sentenceAt(string: string, position: number): string | undefined
Get the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
sentenceRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
sentences(string: string): string[]
Get all sentences in the string as an array.