3.0.0-alpha.6 • Published 1 year ago

@dwscdv3/pinyin v3.0.0-alpha.6

Weekly downloads
-
License
MIT
Repository
github
Last release
1 year ago

pīnyīn (v3)

pinyin, The convert tool of chinese pinyin.


NPM version Build Status Coverage Status Language Grade: JavaScript NPM downloads

Web Site: 简体中文 | English

README: 简体中文 | English

Convert Han to pinyin. useful for phonetic notation, sorting, and searching.

Note: This module both support Node and Web browser.

Python version see mozillazg/python-pinyin


Feature

  • Segmentation for heteronym words.
  • Support Traditional and Simplified Chinese.
  • Support multiple pinyin style.

Install

via npm:

npm install pinyin@alpha --save

Usage

for developer:

import pinyin from "pinyin";

console.log(pinyin("中心"));    // [ [ 'zhōng' ], [ 'xīn' ] ]

console.log(pinyin("中心", {
  heteronym: true                // Enable heteronym mode.
}));                            // [ [ 'zhōng', 'zhòng' ], [ 'xīn' ] ]

console.log(pinyin("中心", {
  heteronym: true,              // Enable heteronym mode.
  segment: true                 // Enable Chinese words segmentation, fix most heteronym problem.
}));                            // [ [ 'zhōng' ], [ 'xīn' ] ]

console.log(pinyin("我喜欢你", {
  segment: true,                // Enable segmentation. Needed for grouping.
  group: true                   // Group pinyin segments
}));                            // [ [ 'wǒ' ], [ 'xǐhuān' ], [ 'nǐ' ] ]

console.log(pinyin("中心", {
  style: pinyin.STYLE_INITIALS, // Setting pinyin style.
  heteronym: true
}));                            // [ [ 'zh' ], [ 'x' ] ]

console.log(pinyin("华夫人", {
  mode: "surname",              // 姓名模式。
}));                            // [ ['huà'], ['fū'], ['rén'] ]

for cli:

$ pinyin 中心
zhōng xīn
$ pinyin -h

Types

IPinyinOptions

The types for the second argument of pinyin method.

export interface IPinyinOptions {
  style?: IPinyinStyle; // output style of pinyin.
  mode?: IPinyinMode, // mode of pinyin.
  segment?: IPinyinSegment | boolean;
  heteronym?: boolean;
  group?: boolean;
  compact?: boolean;
}

IPinyinStyle

The output style of pinyin.

export type IPinyinStyle =
  "normal" | "tone" | "tone2" | "to3ne" | "initials" | "first_letter" | // Suggest.
  "NORMAL" | "TONE" | "TONE2" | "TO3NE" | "INITIALS" | "FIRST_LETTER" |
  0        | 1      | 2       | 5       | 3          | 4;               // compatibility.

IPinyinMode

The mode of pinyin.

// - NORMAL: Default mode is normal mode.
// - SURNAME: surname mode, for chinese surname.
export type IPinyinMode =
  "normal" | "surname" |
  "NORMAL" | "SURNAME";

IPinyinSegment

The segment method.

  • Default is disable segment: false
  • If set true, use "segmentit" module for segment in Web, use "@node-rs/jieba" for segment in Node.
  • Also specify follow string for segment (bug just "segmentit" in web):
export type IPinyinSegment = "segmentit" | "@node-rs/jieba";

API

<Array> pinyin(words[, options])

Convert Han (汉字) to pinyin.

options argument is optional, for sepcify heteronym mode and pinyin styles.

Return a Array<Array<String>>. If one of Han is heteronym word, it would be have multiple pinyin.

Number pinyin.compare(a, b)

Default compare implementation for pinyin.

Options

<Boolean> options.segment

Enable Chinese word segmentation. Segmentation is helpful for fix heteronym problem, but performance will be more slow, and need more CPU and memory.

Default is false.

<Boolean> options.heteronym

Enable or disable heteronym mode. default is disabled, false.

<Boolean> options.group

Group pinyin by phrases. for example:

我喜欢你
wǒ xǐhuān nǐ

<Object> options.style

Specify pinyin style. please use static properties like STYLE_*. default is .STYLE_TONE. see Static Property for more.

options.mode

pinyin mode, default is pinyin.MODE_NORMAL. If you cleared in surname scene, use pinyin.MODE_SURNAME maybe better.

Static Property

.STYLE_NORMAL

Normal mode.

Example: pin yin

.STYLE_TONE

Tone style, this is default.

Example: pīn yīn

.STYLE_TONE2

tone style by postfix number 0-4.

Example: pin1 yin1

.STYLE_TO3NE

tone style by number 0-4 after phonetic notation character.

Example: pin1 yin1

.STYLE_INITIALS

Initial consonant (of a Chinese syllable).

Example: pinyin of 中国 is zh g

Note: when a Han (汉字) without initial consonant, will convert to empty string.

.STYLE_FIRST_LETTER

First letter style.

Example: p y

pinyin.MODE_NORMAL

Normal mode. This is the default mode.

pinyin.MODE_SURNAME

Surname mode. If chinese word is surname, The pinyin of surname is prioritized.

Test

npm test

Q&A

What's the different Node version and Web version?

pinyin support Node and Web browser now, the API and usage is complete same.

But the Web version is simple than Node version. Just frequently-used dict, without segmentation, and the dict is compress for web.

Because of Traditional and Segmentation, the convert result will be not complete same. and the test case have some different too.

FeatureWeb versionNode version
DictFrequently-used Dict, Compress.Complete Dict, without Compress.
SegmentationNOSegmentation options.
TraditionalNOFull Traditional support.

How to sort by pinyin?

This module provide default compare implementation:

const pinyin = require('pinyin');

const data = '我要排序'.split('');
const sortedData = data.sort(pinyin.compare);

But if you need different implementation, do it like:

const pinyin = require('pinyin');

const data = '我要排序'.split('');

// Suggest you to store pinyin result by data persistence.
const pinyinData = data.map(han => ({
  han: han,
  pinyin: pinyin(han)[0][0], // Choose you options and styles.
}));
const sortedData = pinyinData.sort((a, b) => {
  return a.pinyin.localeCompare(b.pinyin);
}).map(d => d.han);

Donate

If this module is helpful for you, please Star this repository.

And you have chioce donate to me via Aliapy or WeChat:

License

MIT