bmpm v0.1.1
Beider-Morse Phonetic Matching
A JavaScript port of the original PHP library, heavily based on the Go port from F1monkey Labs. BMPM is a phonetic algorithm used for indexing and matching names in multiple languages. It contains a large number of rules for transforming a word into its phonetic representation.
This implementation is missing the Ashkenazi and Sephardic rulesets. They'll be added in a later release!
What is the Beider-Morse Phonetic Matching algorithm?
Phonetic matching is the process of comparing two words based on their phonetic encoding rather than their spelling. A phonetic encoding is a sound-based representation of a word. Popular phonetic encoding algorithms, which take a written word and produce such an encoding, include Soundex and Metaphone. These algorithms are often used to search for names in census databases, where names may be misspelled or where spellings may have diverged over time.
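To make the idea of a phonetic encoding concrete, here is a minimal sketch of classic American Soundex, one of the algorithms mentioned above. It is not part of bmpm and is far simpler than Beider-Morse; the `soundex` function and `SOUNDEX_CODES` table are names made up for this illustration.

```javascript
// Illustrative only: a simplified American Soundex, not bmpm's algorithm.
// Consonants that sound alike share a digit; vowels, h, w and y get no digit.
const SOUNDEX_CODES = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6",
}

function soundex(word) {
  // Keep ASCII letters only; anything else is ignored in this sketch.
  const letters = word.toLowerCase().replace(/[^a-z]/g, "")
  if (letters.length === 0) return ""

  // The first letter is kept as-is; its code still suppresses a repeat.
  let result = letters[0].toUpperCase()
  let previousCode = SOUNDEX_CODES[letters[0]] ?? ""

  for (const letter of letters.slice(1)) {
    if (result.length === 4) break
    const code = SOUNDEX_CODES[letter] ?? ""
    // Emit a digit only when it differs from the previous letter's code.
    if (code !== "" && code !== previousCode) result += code
    // h and w are transparent: they do not separate two equal codes.
    if (letter !== "h" && letter !== "w") previousCode = code
  }

  // Pad short results with zeros to the fixed length of four.
  return result.padEnd(4, "0")
}

console.log(soundex("Robert"), soundex("Rupert")) // R163 R163
console.log(soundex("Smith"), soundex("Smyth"))   // S530 S530
```

Two spellings that produce the same code are treated as a phonetic match, which is how "Smith" and "Smyth" end up together despite being spelled differently.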
In 2008, Alexander Beider and Stephen Morse developed a new phonetic encoding algorithm that supports 19 languages, with additional rules for Ashkenazi and Sephardic names in those languages. The primary purpose of this algorithm was to search for Jewish names in large census and immigration databases. More information about the algorithm can be found in these articles from Beider and Morse: https://stevemorse.org/phonetics/bmpm.htm, https://stevemorse.org/phonetics/bmpm2.htm
Using bmpm
Install bmpm:
npm install bmpm # or yarn or pnpm or bun or what-have-you
Construct an encoder with the desired language and accuracy:
import { Encoder, Accuracy } from "bmpm"
const encoder = new Encoder(
  // Uses the approximate encoding, which will produce more
  // encodings that less precisely match the input word's
  // pronunciation
  Accuracy.APPROX,
  "english",
)
const encodings = encoder.encode("orange")
console.log(encodings)
// => [
// 'orink', 'ornk', 'oringY', 'oringi',
// 'orngY', 'orngi', 'orinS', 'ornS',
// 'orinzY', 'orinzi', 'ornzY', 'ornzi',
// 'oronk', 'orongY', 'orongi', 'oronS',
// 'oronzY', 'oronzi', 'orank', 'orangY',
// 'orangi', 'oranS', 'oranzY', 'oranzi',
// 'arink', 'arnk', 'aringY', 'aringi',
// 'arngY', 'arngi', 'arinS', 'arnS',
// 'arinzY', 'arinzi', 'arnzY', 'arnzi',
// 'aronk', 'arongY', 'arongi', 'aronS',
// 'aronzY', 'aronzi', 'arank', 'arangY',
// 'arangi', 'aranS', 'aranzY', 'aranzi'
// ]
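Because a single word produces many candidate encodings, one common way to compare two names is to check whether their encoding lists overlap. The sketch below assumes that approach and assumes `encode` returns an array of strings, as in the output above. `sharesEncoding` is a hypothetical helper, not part of bmpm, and the second list in the example is made-up sample data rather than real encoder output.

```javascript
// Hypothetical helper: true when the two encoding lists have at least one
// encoding in common. Not part of the bmpm API.
function sharesEncoding(encodingsA, encodingsB) {
  const seen = new Set(encodingsA)
  return encodingsB.some((encoding) => seen.has(encoding))
}

// A few of the encodings of "orange" from the output above.
const orange = ["orink", "ornk", "oringY", "oringi"]
// Made-up sample data for illustration, not produced by the encoder.
const other = ["urink", "ornk"]

console.log(sharesEncoding(orange, other))     // true ("ornk" is in both)
console.log(sharesEncoding(orange, ["banan"])) // false
```

With an encoder constructed as above, the same check would presumably look like `sharesEncoding(encoder.encode(a), encoder.encode(b))`.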