@isdk/nlp-jieba v2.2.1
@isdk/nlp-jieba
Introduction
@isdk/nlp-jieba runs the Chinese segmentation tool jieba_rs as a WebAssembly (WASM) module. It integrates with JavaScript so that you can perform Chinese text segmentation, part-of-speech tagging, and other NLP tasks in both Node.js and browser environments.
Features
- Segmentation Modes: Supports default (precise), full, and search modes.
- Dictionary Management: Can load default or custom dictionaries and supports dynamic addition and removal of words.
- Frequency Adjustment: Provides suggestions for word frequency and allows manual setting of word frequencies.
- Part-of-Speech Tagging: Supports part-of-speech tagging based on the Hidden Markov Model (HMM).
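To see how the segmentation modes differ on a given sentence, it can help to run the same text through each of them. The following is a minimal sketch, not part of the package: compareModes, MODES, and fakeSplit are hypothetical names, and the mode strings are taken from the options example later in this README. The segmenter is passed in as a parameter, so the helper can be tried with a stand-in before wiring in the real split.

```javascript
// Mode names as they appear in this README's options example.
const MODES = ["Default", "All", "Search"];

/**
 * Run one sentence through every mode and collect the results.
 * `splitFn` is expected to behave like this package's split(text, options);
 * it is a parameter so the helper can be exercised with a stand-in.
 */
function compareModes(splitFn, text, hmm = false) {
  const results = {};
  for (const mode of MODES) {
    results[mode] = splitFn(text, { mode, hmm });
  }
  return results;
}

// Stand-in segmenter for demonstration only: it splits per character and
// ignores the options. Swap in the real split after importing it.
const fakeSplit = (text, _options) => Array.from(text);

const comparison = compareModes(fakeSplit, "我喜欢编程");
console.log(Object.keys(comparison)); // the three mode names, in order
```

With the real segmenter, the call would presumably be compareModes(split, "我喜欢编程", true) after importing split (and, in the browser, awaiting init()).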
Installation
Using npm
npm install @isdk/nlp-jieba
Examples
Basic Usage in the Browser
import init, {addDict, split} from '@isdk/nlp-jieba';
async function main() {
  await init(); // Initialize the WASM module; only required in the browser
  // Load a custom dictionary (optional)
  const dictContent = `word1 100\nword2 200`;
  await addDict(dictContent);
  // Segmentation example
  const result = split("我爱北京天安门");
  console.log(result); // Output the segmentation result
}
main();
Basic Usage in Node.js
import {addDict, split} from '@isdk/nlp-jieba';
// Load custom dictionary (optional)
const dictContent = `word1 100\nword2 200`;
await addDict(dictContent);
// Segmentation example
const result = split("我爱北京天安门");
console.log(result); // Output the segmentation result
Advanced Usage
Custom Segmentation Options
import {split} from '@isdk/nlp-jieba';
const options = {
  mode: "Search", // Options: "Default", "All", "Search"
  hmm: true // Enable HMM model, default is false
};
const words = split("我喜欢编程", options);
console.log(words);
Adding New Words
import {addWord, hasWord} from '@isdk/nlp-jieba';
addWord("新词汇", 100, "n"); // Add a noun with frequency 100
console.log(hasWord("新词汇")); // Check whether the word exists
Part-of-Speech Tagging
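import {tag} from '@isdk/nlp-jieba';
const tags = tag("我喜欢吃苹果", true); // true: enable HMM
// The callback parameter is named `item` so it does not shadow the imported tag()
tags.forEach(item => {
  console.log(`${item.word}: ${item.tag}`);
});
Tokenize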
The tokenize function returns segmentation results along with position information.
import {tokenize} from '@isdk/nlp-jieba';
const options = {
  mode: "Search", // Options: "Default", "Search"
  hmm: true // Enable HMM model, default is false
};
const tokens = tokenize("我喜欢吃苹果", options);
tokens.forEach(token => {
  console.log(`Word: ${token.word}, Start: ${token.start}, End: ${token.end}`);
});
API Documentation
- split(text: string, options?: JiebaSplitOptions): Segment the text.
- tokenize(text: string, options?: JiebaSplitOptions): Segment the text and return position information.
- addDict(dict_content: string | Uint8Array, clear?: boolean): Load a custom dictionary. The optional parameter clear indicates whether to clear the existing dictionary.
- addDefaultDict(clear?: boolean): Load the default dictionary. The optional parameter clear indicates whether to clear the existing dictionary.
- clear(): Clear all loaded words.
- suggestFreq(segment: string): Get suggested word frequency.
- addWord(word: string, freq?: number, tag?: string): Add a new word.
- removeWord(word: string): Remove a word.
- hasWord(word: string): Check if a word exists.
- tag(sentence: string, hmm?: boolean): Perform part-of-speech tagging on a sentence.
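The position information returned by tokenize can be put to work once the tokens are in hand. The sketch below is illustrative only: tokensMatchText, highlight, and sampleTokens are hypothetical names, the sample data is hand-written in the { word, start, end } shape that the Tokenize example reads (it is not real library output), and it assumes that start and end count characters rather than bytes.

```javascript
/**
 * Check that each token's [start, end) range reproduces its word.
 * Assumption: offsets count characters (code points), not bytes.
 */
function tokensMatchText(text, tokens) {
  const chars = Array.from(text);
  return tokens.every(
    ({ word, start, end }) => chars.slice(start, end).join("") === word
  );
}

/** Wrap every token that satisfies `predicate` in the given markers. */
function highlight(tokens, predicate, open = "[", close = "]") {
  return tokens
    .map((t) => (predicate(t) ? `${open}${t.word}${close}` : t.word))
    .join("");
}

// Hand-written sample in the token shape used by the Tokenize example;
// not actual output from the library.
const sampleTokens = [
  { word: "我", start: 0, end: 1 },
  { word: "喜欢", start: 1, end: 3 },
  { word: "吃", start: 3, end: 4 },
  { word: "苹果", start: 4, end: 6 },
];

console.log(tokensMatchText("我喜欢吃苹果", sampleTokens)); // true
console.log(highlight(sampleTokens, (t) => t.word.length > 1)); // 我[喜欢]吃[苹果]
```

If the offsets turn out to be measured differently in your environment, tokensMatchText returns false for real tokenize output, which makes it a quick check before relying on the positions.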
Contribution
Contributions in any form are welcome, including bug reports, code improvements, and documentation enhancements.
License
@isdk/nlp-jieba is licensed under the MIT License.