string-comparisons v0.0.20
String Comparisons
This library provides functions for measuring how similar two pieces of text are, using well-established similarity and distance metrics. It currently supports the following algorithms:
- Cosine Similarity
- Jaccard Similarity
- Jaro Similarity
- Jaro-Winkler Similarity
- Damerau-Levenshtein Distance
- Hamming Distance
- Levenshtein Distance
- Smith-Waterman Alignment
- Sørensen-Dice Coefficient
- Jaccard Similarity based on Trigrams
- Szymkiewicz Simpson Overlap
- N-Gram
- Q-Gram
- Optimal String Alignment
Installation
Assuming you have Node.js and npm/yarn/pnpm installed, install the library using:
# Install the 'string-comparisons' package using npm
npm install string-comparisons
# Alternatively, install the 'string-comparisons' package using yarn
yarn add string-comparisons
# Or, install the 'string-comparisons' package using pnpm
pnpm add string-comparisons
Docs
Find more information on the algorithms by accessing the class documentation of each implemented algorithm.
String Similarity Algorithm Comparison
Algorithm | Normalized | Metric | Similarity | Distance | Space Complexity |
---|---|---|---|---|---|
cosine.js | Yes | Vector Space Model | ✓ | | O(n) |
jaro.js | No | Edit Distance | ✓ | | O(min(n, m)) |
jaccard.js | No | Set Theory | ✓ | | O(min(n, m)) |
damerauLevenshtein.js | No | Edit Distance | ✓ | | O(max(n, m)²) |
hammingDistance.js | No | Bitwise Operations | ✓ | | O(1) |
jaroWinkler.js | No | Edit Distance | ✓ | | O(min(n, m)) |
levenshtein.js | No | Edit Distance | ✓ | | O(max(n, m)²) |
smithWaterman.js | No | Dynamic Programming (Local Alignment) | ✓ | | O(n * m) |
sorensenDice.js | No | Set Theory | ✓ | | O(min(n, m)) |
trigram.js | No | N-gram Overlap | ✓ | | O(n²) |
szymkiewiczSimpsonOverlap.js | Yes | Overlap Coefficient | ✓ | | O(min(m, n)) |
nGram.js | Yes | Jaccard Similarity Coefficient | ✓ | | O(m * n) |
qGram.js | Yes | Jaccard Similarity Coefficient | ✓ | | O(n + m) |
optimalStringAlignment.js | No | Edit Distance | ✓ | | O(max(n, m)²) |
Explanation of Columns:
- Normalized: Indicates whether the algorithm produces a score between 0 and 1 (normalized).
- Metric: The underlying mathematical concept used for comparison.
- Similarity: Whether the algorithm outputs a higher score for more similar strings.
- Distance: Whether the algorithm outputs a lower score for more similar strings. Similarity and distance scores run in opposite directions, so check which one an algorithm reports before comparing results.
- Space Complexity: The amount of extra memory the algorithm needs to run the comparison.
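To make the Space Complexity column more concrete, here is a minimal sketch of Levenshtein distance that keeps only two rows of the dynamic-programming table, so its extra memory grows with the shorter string. This is an illustration only and is not the library's `levenshtein.js` (the table lists O(max(n, m)²) for that file); the function name `levenshteinDistance` is a hypothetical one chosen for this example, and it compares UTF-16 code units.

```javascript
// Hypothetical sketch, not the library's implementation.
function levenshteinDistance(a, b) {
  // Keep the shorter string in `b` so each row holds min(n, m) + 1 cells.
  if (a.length < b.length) [a, b] = [b, a];

  // previous[j] = distance between the processed prefix of `a` and the first j chars of `b`.
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    // Turning the first i characters of `a` into an empty string costs i deletions.
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + substitutionCost // substitution or match
      );
    }
    previous = current;
  }
  return previous[b.length];
}

console.log(levenshteinDistance('programming', 'programmer')); // 3
```

Only `previous` and `current` are alive at any time, which is where the memory saving over a full (n + 1) × (m + 1) matrix comes from.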
Notes:
- ✓ indicates the algorithm applies to that category.
- Some algorithms can be used for both similarity and distance calculations depending on the interpretation of the score.
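As a rough illustration of that last note, a distance can be turned into a normalized similarity by dividing it by the largest distance possible for the inputs and subtracting the result from 1. The sketch below uses a hand-written Hamming distance so that it stays self-contained; `hammingDistance` and `distanceToSimilarity` are hypothetical helper names for this example, not functions exported by the library.

```javascript
// Hypothetical helpers for illustration; not part of the library's API.

// Count the positions at which two equal-length strings differ.
function hammingDistance(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('Hamming distance needs strings of equal length');
  }
  let mismatches = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) mismatches++;
  }
  return mismatches;
}

// Map a distance in [0, maxDistance] to a similarity in [0, 1].
// Two empty inputs (maxDistance === 0) are treated as identical.
function distanceToSimilarity(distance, maxDistance) {
  return maxDistance === 0 ? 1 : 1 - distance / maxDistance;
}

const distance = hammingDistance('karolin', 'kathrin');
console.log(distance); // 3
console.log(distanceToSimilarity(distance, 'karolin'.length)); // roughly 0.571
```

For Hamming distance the largest possible value is the string length; for Levenshtein-style distances the length of the longer string plays the same role.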
Example Usage
import StringComparisons from 'string-comparisons';
const { Cosine, Jaccard, Jaro, DamerauLevenshtein, HammingDistance, JaroWrinker, Levenshtein, SmithWaterman, SorensenDice, Trigram } = StringComparisons;
const string1 = 'programming';
const string2 = 'programmer';
console.log('Jaro-Winkler similarity:', JaroWrinker.similarity(string1, string2)); // Output: ~0.9054545454545454
console.log('Levenshtein distance:', Levenshtein.similarity(string1, string2)); // Output: 3
console.log('Smith-Waterman similarity:', SmithWaterman.similarity(string1, string2)); // Output: 16
const set1 = new Set([1, 2, 3]);
const set2 = new Set([2, 3, 4]);
console.log('Sørensen-Dice similarity:', SorensenDice.similarity(set1, set2)); // Output: 0.6666666666666667
const trigram1 = 'hello';
const trigram2 = 'world';
console.log('Trigram Jaccard similarity:', Trigram.similarity(trigram1, trigram2)); // Output: 0 (no shared trigrams)
// ...and so on for the remaining algorithms
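To see why 'hello' and 'world' score 0 while the two sets score about 0.67, the set-overlap measures can be worked through by hand. The following is a minimal sketch under stated assumptions: the helpers `trigrams`, `jaccard`, and `sorensenDice` are hypothetical names written for this example, trigrams are taken without padding, and the library's own classes may tokenize or round differently.

```javascript
// Hypothetical helpers for illustration; not the library's internals.

// Collect the distinct 3-character substrings of a string (no padding).
function trigrams(text) {
  const grams = new Set();
  for (let i = 0; i + 3 <= text.length; i++) {
    grams.add(text.slice(i, i + 3));
  }
  return grams;
}

// Count the elements the two sets have in common.
function intersectionSize(a, b) {
  let count = 0;
  for (const item of a) {
    if (b.has(item)) count++;
  }
  return count;
}

// Jaccard: |A ∩ B| / |A ∪ B|. Two empty sets are treated as identical (an assumption).
function jaccard(a, b) {
  const shared = intersectionSize(a, b);
  const unionSize = a.size + b.size - shared;
  return unionSize === 0 ? 1 : shared / unionSize;
}

// Sørensen-Dice: 2 * |A ∩ B| / (|A| + |B|).
function sorensenDice(a, b) {
  const total = a.size + b.size;
  return total === 0 ? 1 : (2 * intersectionSize(a, b)) / total;
}

// 'hello' yields hel, ell, llo; 'world' yields wor, orl, rld; nothing overlaps.
console.log(jaccard(trigrams('hello'), trigrams('world'))); // 0

// The sets share 2 of their 3 + 3 elements: 2 * 2 / 6.
console.log(sorensenDice(new Set([1, 2, 3]), new Set([2, 3, 4]))); // roughly 0.667
```

Both measures depend only on set sizes and the size of the intersection, so the same helpers work for character n-grams, word tokens, or any other values that can be stored in a `Set`.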
Contributing
We encourage contributions to this library! Feel free to fork the repository, make your changes, and submit pull requests.
Support the Project ⭐
If you feel awesome and want to support us in a small way, please consider starring and sharing the repo! This helps us gain visibility and allows the community to grow. 🙏
Contact Us
If you have any questions or feedback, please don't hesitate to contact us at sumn2u@gmail.com, or reach out to Suman directly. We hope you find this resource helpful 💜.
License Information
This project is licensed under the MIT License, which means that you are free to use, modify, and distribute the code as long as you comply with the terms of the license.
Resources
- String Similarity Comparison in JS with Examples
- Cosine similarity between two sentences
- The complete guide to string similarity algorithms
- N-Gram Similarity and Distance
- Approximate string-matching with q-grams and maximal matches
- Research on string similarity algorithm based on Levenshtein Distance
- String similarity search and join: a survey