1.1.0 • Published 4 months ago

@shelf/text-normalizer v1.1.0

text-normalizer

Originally taken from openai/whisper and rewritten in TypeScript.

A TypeScript library for normalizing English text. It provides a utility class, EnglishTextNormalizer, with methods for normalizing various kinds of text, such as contractions, abbreviations, and spacing. EnglishTextNormalizer is composed of other classes that you can reuse independently:

  • EnglishSpellingNormalizer - uses a dictionary that maps English words to their American spelling. The dictionary is stored in a JSON file named english.json.
  • EnglishNumberNormalizer - normalizes numbers written out in English words into actual numbers.
  • BasicTextNormalizer - provides methods for removing special characters and diacritics from text, as well as splitting words into separate letters.
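
To make the BasicTextNormalizer description more concrete, here is a minimal sketch of how diacritics and special characters can be removed from a string. It is an illustration of the general technique only, not the library's implementation: the function name stripDiacriticsAndSymbols is hypothetical, and the real class may handle more character ranges and edge cases.

```typescript
// Hypothetical helper for illustration; not part of @shelf/text-normalizer.
function stripDiacriticsAndSymbols(text: string): string {
  return text
    // Decompose accented letters, e.g. "é" becomes "e" plus a combining accent.
    .normalize('NFKD')
    // Drop the combining diacritical marks left behind by the decomposition.
    .replace(/[\u0300-\u036f]/g, '')
    // Replace anything that is not an ASCII letter, digit, or whitespace with a space.
    .replace(/[^a-z0-9\s]/gi, ' ')
    // Collapse runs of whitespace and trim the ends.
    .replace(/\s+/g, ' ')
    .trim()
    .toLowerCase();
}

console.log(stripDiacriticsAndSymbols('Café, déjà-vu!')); // cafe deja vu
```

This sketch keeps only ASCII letters and digits, which is a simplifying assumption; text in other scripts would need a wider character class.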

Install

$ yarn add @shelf/text-normalizer

Usage

import {EnglishTextNormalizer} from '@shelf/text-normalizer'

const normalizer = new EnglishTextNormalizer()

console.log(normalizer.normalize("Let's")); // Output: let us
console.log(normalizer.normalize("he's like")); // Output: he is like
console.log(normalizer.normalize("she's been like")); // Output: she has been like
console.log(normalizer.normalize('10km')); // Output: 10 km
console.log(normalizer.normalize('10mm')); // Output: 10 mm
console.log(normalizer.normalize('RC232')); // Output: rc 232
console.log(
  normalizer.normalize('Mr. Park visited Assoc. Prof. Kim Jr.')
); // Output: mister park visited associate professor kim junior
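
The outputs above are consistent with a pipeline of ordered, rule-based replacements. The sketch below shows that general idea for two of the cases (abbreviation expansion and spacing between digits and letters). It is an assumption-laden illustration, not the library's code: the ABBREVIATIONS table, the function names, and the regular expressions are all hypothetical, and the real rule set is more extensive.

```typescript
// Hypothetical rule table for illustration; the library's real rules are more extensive.
const ABBREVIATIONS = new Map<string, string>([
  ['mr', 'mister'],
  ['assoc', 'associate'],
  ['prof', 'professor'],
  ['jr', 'junior'],
]);

// Lowercase the text and expand "word." when the word is in the table.
function expandAbbreviations(text: string): string {
  return text.toLowerCase().replace(/\b([a-z]+)\./g, (match: string, word: string) => {
    const expansion = ABBREVIATIONS.get(word);
    return expansion === undefined ? match : expansion;
  });
}

// Insert a space wherever a digit touches a letter, in either order.
function spaceNumbersAndLetters(text: string): string {
  return text
    .replace(/(\d)([a-z])/gi, '$1 $2')
    .replace(/([a-z])(\d)/gi, '$1 $2');
}

// Apply the rules in a fixed order.
function sketchNormalize(text: string): string {
  return spaceNumbersAndLetters(expandAbbreviations(text));
}

console.log(sketchNormalize('10km')); // 10 km
console.log(sketchNormalize('RC232')); // rc 232
console.log(sketchNormalize('Mr. Park visited Assoc. Prof. Kim Jr.'));
// mister park visited associate professor kim junior
```

Contractions such as "he's" are deliberately left out of the sketch, since choosing between "he is" and "he has" depends on the following words.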

Publish

$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags

License

MIT © Shelf

Release history

  • 1.1.0 - 4 months ago
  • 1.0.3 - 1 year ago
  • 1.0.2 - 1 year ago
  • 1.0.1 - 1 year ago
  • 1.0.0 - 1 year ago