1.3.0 • Published 4 years ago

codepoints v1.3.0


codepoints

A parser for the files in the Unicode Character Database (UCD). It produces a very large array containing one codepoint object for every character represented by Unicode, with many properties derived from the files in the database.

BUILD SCRIPTS ONLY: Use in production is not recommended, because the parsers are not optimized for speed, the text files are huge, and the resulting array uses a large amount of memory. To access this data in real-world applications, use modules that have precompiled the data into a compressed form.

Installation

Install using npm:

npm install codepoints

Usage

Basic usage:

const codepoints = require('codepoints');

The parser generates its data by reading the text files contained in the Unicode Character Database (UCD). By default, it uses the database bundled with this package. To use a custom version of the UCD, require codepoints/parser instead; it accepts an optional path to a directory containing the uncompressed UCD data:

const parser = require('codepoints/parser');
const codepoints = parser('/path/to/UCD');
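
Because the package is intended for build scripts, one plausible pattern is to reduce the array ahead of time to just the fields an application needs. The sketch below assumes only the array shape described in this README. The helper `buildCategoryRanges` and the tiny `sample` array are hypothetical stand-ins so that the example runs without loading the real data, and the category strings are assumed values. In an actual build script, `sample` would be replaced by the array returned from `require('codepoints')`.

```javascript
// Hypothetical helper: collapse a codepoints-style array into
// [start, end, category] runs so the result is small enough to ship.
function buildCategoryRanges(codepoints) {
  const ranges = [];
  let current = null;
  for (let i = 0; i < codepoints.length; i++) {
    const cp = codepoints[i];
    // Unassigned code points are undefined in the array; they close any open run.
    if (cp === undefined) {
      current = null;
      continue;
    }
    if (current && current[2] === cp.category) {
      current[1] = i; // extend the open run
    } else {
      current = [i, i, cp.category];
      ranges.push(current);
    }
  }
  return ranges;
}

// Stand-in data with the documented shape (not the real database).
const sample = [];
sample[0x41] = { code: 0x41, name: 'LATIN CAPITAL LETTER A', category: 'Lu' };
sample[0x42] = { code: 0x42, name: 'LATIN CAPITAL LETTER B', category: 'Lu' };
sample[0x61] = { code: 0x61, name: 'LATIN SMALL LETTER A', category: 'Ll' };

const ranges = buildCategoryRanges(sample);
console.log(JSON.stringify(ranges)); // [[65,66,"Lu"],[97,97,"Ll"]]
```

The resulting ranges could then be written to a JSON file during the build, so that the application itself never has to load the full array.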

Codepoint data

Each element in the generated array is either undefined (for unassigned code points), or an object containing the following properties:

  • code - the code point index
  • name - character name
  • unicode1Name - legacy name used by Unicode 1
  • category - Unicode category
  • block - the block name this character is a part of
  • script - the script this character belongs to
  • eastAsianWidth - the East Asian width for this character
  • combiningClass - numeric combining class value
  • combiningClassName - a string name for the combining class
  • bidiClass - class for the Unicode bidirectional algorithm
  • bidiMirrored - whether the character is mirrored in the bidi algorithm
  • numeric - the numeric value for this character
  • uppercase - an array of code points mapping this character to upper case, if any
  • lowercase - an array of code points mapping this character to lower case, if any
  • titlecase - an array of code points mapping this character to title case, if any
  • folded - an array of code points mapping this character to a folded equivalent, if any
  • caseConditions - conditions used during case mapping for this character
  • decomposition - an array of code points that this character decomposes into. Used by the Unicode normalization algorithm.
  • compositions - a dictionary mapping of compositions for this character
  • isCompat - whether the decomposition is a compatibility one
  • isExcluded - whether the character is excluded from composition
  • NFC_QC - quickcheck value for NFC (0 = YES, 1 = NO, 2 = MAYBE)
  • NFKC_QC - quickcheck value for NFKC (0 = YES, 1 = NO, 2 = MAYBE)
  • NFD_QC - quickcheck value for NFD (0 = YES, 1 = NO)
  • NFKD_QC - quickcheck value for NFKD (0 = YES, 1 = NO)
  • joiningType - Arabic joining type
  • joiningGroup - Arabic joining group
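
As a rough illustration of how these properties might be consumed, the sketch below upper-cases a string via the `uppercase` arrays and decodes the quick-check values listed above. The `demo` array and the helper names `toUpper` and `quickCheckName` are hypothetical; the demo entries only mimic the documented shape and are not loaded from the package. Falling back to the original character when `uppercase` is absent is an assumption based on the "if any" wording.

```javascript
// Names for the documented quick-check values (0 = YES, 1 = NO, 2 = MAYBE).
const QUICK_CHECK = ['YES', 'NO', 'MAYBE'];

function quickCheckName(value) {
  return QUICK_CHECK[value];
}

// Hypothetical helper: upper-case a string using a codepoints-style array.
function toUpper(codepoints, text) {
  let out = '';
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    const cp = codepoints[ch.codePointAt(0)];
    // Unassigned entries are undefined; entries without a mapping are
    // assumed to have no usable `uppercase` array.
    if (cp && Array.isArray(cp.uppercase) && cp.uppercase.length > 0) {
      out += String.fromCodePoint(...cp.uppercase);
    } else {
      out += ch;
    }
  }
  return out;
}

// Stand-in data with the documented shape (not the real database).
const demo = [];
demo[0x61] = { code: 0x61, name: 'LATIN SMALL LETTER A', uppercase: [0x41], NFC_QC: 0 };
demo[0xdf] = { code: 0xdf, name: 'LATIN SMALL LETTER SHARP S', uppercase: [0x53, 0x53], NFC_QC: 0 };
demo[0x301] = { code: 0x301, name: 'COMBINING ACUTE ACCENT', NFC_QC: 2 };

console.log(toUpper(demo, 'ßa'));                 // SSA
console.log(quickCheckName(demo[0x301].NFC_QC));  // MAYBE
```

Because a mapping is an array of code points, a single character can expand to several, which is why the helper spreads the whole array into `String.fromCodePoint`.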

License

MIT

  • 1.3.0 - 4 years ago
  • 1.2.1 - 8 years ago
  • 1.2.0 - 9 years ago
  • 1.1.0 - 9 years ago
  • 1.0.0 - 9 years ago