1.0.1 • Published 11 months ago

tiptap-word-count-multilingual v1.0.1

Weekly downloads
-
License
MIT
Repository
-
Last release
11 months ago

tiptap-word-count-multilingual

Background

When counting words, the official Tiptap CharacterCount extension uses a simple method: it splits the text using text.split(' ') and counts the resulting array's length. This approach only works for languages where words are separated by spaces.

However, this method fails to accurately count words in languages like those using CJK characters, where words are not separated by spaces.

What this extension does

This extension addresses this issue by leveraging Alfaaz. As described on its page:

Alfaaz is the fastest multilingual word counter that can count millions of words per second (up to 0.9 GB/s 100x faster than RegExp based solutions). It has built-in support for CJK texts & words in many different languages such as Urdu & Arabic.

Compared to the original extension, this extension introduces one single change:

// Original extension
this.storage.words = options => {
    const node = options?.node || this.editor.state.doc
    const text = node.textBetween(0, node.content.size, ' ', ' ')
    const words = text.split(' ').filter(word => word !== '')

    return words.length
}

// This extension
import { countWords } from 'alfaaz'

this.storage.words = options => {
    const node = options?.node || this.editor.state.doc
    const text = node.textBetween(0, node.content.size, ' ', ' ')

    return countWords(text)
}

All other aspects remain identical, allowing for a seamless replacement of the original extension without significant modifications.

Installation & Usage

Install

npm install tiptap-word-count-multilingual

Usage

import WordCount from "tiptap-word-count-multilingual";

const editor = new Editor ({
    extensions: [
        Document,
        Paragraph,
        Text,
        WordCount
    ]
})

// Get the number of words for the current document.
editor.storage.characterCount.words()

// Get the number of words for a specific node.
editor.storage.characterCount.words({ node: someCustomNode })

Since this extension only modifies the word counting mechanism, all settings and storage functionalities from the official extension should still work the same way.

If you are migrating from the official extension, simply update the import statement.

1.0.1

11 months ago

1.0.0

11 months ago