segmenter v2.0.1
segmenter
Work with graphemes, words, and sentences through a small, simple, and fast API built on Intl.Segmenter.
Install
npm install segmenter
Why
- Intl.Segmenter is supported in all major browsers and is available to 94% of users, so it's time for adoption.
- If your use case is anything other than iterating over all graphemes, words, or sentences in a text, Intl.Segmenter can be a little hard to work with.
- In many cases, working with graphemes is preferable to working with characters, because graphemes are what the end user sees. For example, the emoji 👨‍🔧️ is a single grapheme but consists of 6 UTF-16 code units. An index-based for loop over it makes 6 iterations, and a for...of loop makes 4 (one per code point). That is confusing, so just use graphemes.
- Before Intl.Segmenter, working with graphemes required a library such as graphemer, which is 94KB in size.
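The three counts from the emoji example can be checked with nothing but built-in APIs. The snippet below is a small illustration, not part of this package; the emoji is spelled with escape sequences so the invisible joiner and variation selector are explicit, and the variable names are made up for the example.

```typescript
// Man mechanic: U+1F468 MAN, U+200D ZERO WIDTH JOINER,
// U+1F527 WRENCH, U+FE0F VARIATION SELECTOR-16.
const mechanic = "\u{1F468}\u200D\u{1F527}\uFE0F";

// An index-based loop walks UTF-16 code units, which is what .length counts.
const codeUnits = mechanic.length;

// Array.from uses the same string iterator as for...of: one step per code point.
const codePoints = Array.from(mechanic).length;

// Intl.Segmenter with grapheme granularity walks user-perceived characters.
const counter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemeCount = Array.from(counter.segment(mechanic)).length;

console.log({ codeUnits, codePoints, graphemeCount });
// { codeUnits: 6, codePoints: 4, graphemeCount: 1 }
```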
Usage
import { graphemeAt, graphemeRangeAt, wordAt, wordRangeAt } from "segmenter";
graphemeAt("👨‍🔧️ the fixer", 3); // "👨‍🔧️"
graphemeRangeAt("👨‍🔧️ the fixer", 3); // { start: 0, end: 6 }
wordAt("hello-world", 2); // "hello"
wordRangeAt("hello-world", 2); // { start: 0, end: 5 }
API
Graphemes
graphemeAt(string: string, position: number): string | undefined
Get the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
graphemeRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
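To make the contract of these two functions concrete, here is a minimal sketch of how the same behavior could be obtained directly from Intl.Segmenter. It is not this package's source code: sketchGraphemeRangeAt and sketchGraphemeAt are hypothetical names, and the sketch assumes that positions are UTF-16 code unit indexes and that end is exclusive, which is what the usage example above suggests.

```typescript
type SegmentRange = { start: number; end: number };

const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical stand-in for graphemeRangeAt, built on Segments.containing().
function sketchGraphemeRangeAt(text: string, position: number): SegmentRange | undefined {
  // containing() yields undefined at runtime when position is negative or
  // >= text.length, which also covers the empty string.
  const data: Intl.SegmentData | undefined = graphemeSegmenter.segment(text).containing(position);
  if (data === undefined) return undefined;
  return { start: data.index, end: data.index + data.segment.length };
}

// Hypothetical stand-in for graphemeAt, expressed through the range helper.
function sketchGraphemeAt(text: string, position: number): string | undefined {
  const range = sketchGraphemeRangeAt(text, position);
  return range === undefined ? undefined : text.slice(range.start, range.end);
}

const fixer = "\u{1F468}\u200D\u{1F527}\uFE0F the fixer";
console.log(sketchGraphemeRangeAt(fixer, 3)); // { start: 0, end: 6 }
console.log(sketchGraphemeAt(fixer, 7)); // "t"
console.log(sketchGraphemeAt(fixer, 100)); // undefined
console.log(sketchGraphemeAt("", 0)); // undefined
```

Position 3 falls in the middle of the emoji, yet the whole grapheme is returned, which is the main convenience over indexing the string directly.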
graphemes(string: string): string[]
Get all graphemes in the string as Array.
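One common reason to want an array of graphemes is truncating text without cutting an emoji in half. The sketch below shows the idea with built-in APIs only; sketchGraphemes and truncateGraphemes are hypothetical helpers written for this example, with sketchGraphemes standing in for the package's graphemes function.

```typescript
const graphemeSplitter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical stand-in for graphemes(): collect every grapheme into an array.
function sketchGraphemes(text: string): string[] {
  return Array.from(graphemeSplitter.segment(text), (data) => data.segment);
}

// Keep at most `max` user-perceived characters.
function truncateGraphemes(text: string, max: number): string {
  return sketchGraphemes(text).slice(0, max).join("");
}

const label = "\u{1F468}\u200D\u{1F527}\uFE0F fixes it";
console.log(sketchGraphemes(label).length); // 10
console.log(truncateGraphemes(label, 3)); // the emoji, a space, and "f"
console.log(label.slice(0, 3)); // cuts inside the emoji sequence
```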
Words
wordAt(string: string, position: number): string | undefined
Get the word at position in string. Returns undefined if position is out of bounds or string is empty.
wordRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the word at position in string. Returns undefined if position is out of bounds or string is empty.
words(string: string): string[]
Get all words in the string as Array.
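With word granularity, Intl.Segmenter also emits segments for punctuation and whitespace and marks real words with isWordLike. The sketch below shows that raw behavior using hypothetical helpers (sketchWords, sketchWordAt). Filtering on isWordLike is an assumption made for this illustration; how this package treats non-word segments is not stated above.

```typescript
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });

// Hypothetical stand-in for words(): keep only word-like segments (assumption).
function sketchWords(text: string): string[] {
  const result: string[] = [];
  for (const data of wordSegmenter.segment(text)) {
    if (data.isWordLike) result.push(data.segment);
  }
  return result;
}

// Hypothetical stand-in for wordAt(): whatever segment covers the position.
function sketchWordAt(text: string, position: number): string | undefined {
  const data: Intl.SegmentData | undefined = wordSegmenter.segment(text).containing(position);
  return data?.segment;
}

console.log(sketchWords("hello-world")); // ["hello", "world"]
console.log(sketchWordAt("hello-world", 2)); // "hello"
console.log(sketchWordAt("hello-world", 5)); // "-" (the raw segment at the hyphen)
console.log(sketchWordAt("hello-world", 11)); // undefined
```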
Sentences
Note: Intl.Segmenter doesn't do a perfect job of detecting sentences. For example, "I went to Dr. Smith's office" will be split into two sentences.
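Sentence boundaries come from the runtime's Unicode data, so the exact split can differ between engines and versions. A quick way to see what your environment does with the example above is to print the raw segments. This snippet uses only built-in APIs, and its variable names are made up for the illustration.

```typescript
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

const sample = "I went to Dr. Smith's office";

// Each segment keeps its trailing whitespace, so the pieces always
// concatenate back to the original string.
const pieces = Array.from(sentenceSegmenter.segment(sample), (data) => data.segment);

pieces.forEach((piece, i) => console.log(i, JSON.stringify(piece)));
```

If the abbreviation is treated as the end of a sentence, the text is printed as two pieces, with the break right after "Dr. ".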
sentenceAt(string: string, position: number): string | undefined
Get the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
sentenceRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
sentences(string: string): string[]
Get all sentences in the string as Array.