1.0.4 • Published 11 months ago

group-similar v1.0.4

Weekly downloads
-
License
MIT
Repository
github
Last release
11 months ago

Group Similar

Discord Twitter Follow

Group similar items together.

Runtime complexity is O(N^2 * (M + α(N))), where N is the number of elements in items, M is the runtime complexity of the similarityFunction, and α(N) is the inverse Ackermann function (amortized constant time for all practical purposes).

Space complexity is O(N).

Getting started

npm i group-similar

Examples

import { groupSimilar } from "group-similar";
import { distance } from "fastest-levenshtein";

function levenshteinSimilarityFunction(a: string, b: string): number {
  return a.length === 0 && b.length === 0
    ? 1
    : 1 - distance(a, b) / Math.max(a.length, b.length);
}

groupSimilar({
  items: ["cat", "bat", "kitten", "dog", "sitting"],
  mapper: (i) => i,
  similarityFunction: levenshteinSimilarityFunction,
  similarityThreshold: 0.5,
});

// [ [ 'cat', 'bat' ], [ 'kitten', 'sitting' ], [ 'dog' ] ]
import { groupSimilar } from "group-similar";

function evenOddSimilarityFunction(a: number, b: number): number {
  return Number(a % 2 === b % 2);
}

groupSimilar({
  items: [1, 5, 10, 0, 2, 123],
  mapper: (i) => i,
  similarityFunction: evenOddSimilarityFunction,
  similarityThreshold: 1,
});

// [ [ 1, 5, 123 ], [ 10, 0, 2 ] ]
import { groupSimilar } from "group-similar";
import { distance } from "fastest-levenshtein";

function nestedMapper(object: { a: { b: { value: string } } }): string {
  return object.a.b.value;
}

function levenshteinSimilarityFunction(a: string, b: string): number {
  return a.length === 0 && b.length === 0
    ? 1
    : 1 - distance(a, b) / Math.max(a.length, b.length);
}

groupSimilar({
  items: [
    { a: { b: { value: "sitting" } } },
    { a: { b: { value: "dog" } } },
    { a: { b: { value: "kitten" } } },
    { a: { b: { value: "bat" } } },
    { a: { b: { value: "cat" } } },
  ],
  mapper: nestedMapper,
  similarityFunction: levenshteinSimilarityFunction,
  similarityThreshold: 0.5,
});

// [
//   [{ a: { b: { value: "sitting" } } }, { a: { b: { value: "kitten" } } }],
//   [{ a: { b: { value: "dog" } } }],
//   [{ a: { b: { value: "bat" } } }, { a: { b: { value: "cat" } } }],
// ]

Syntax

groupSimilar(options);
groupSimilar({ items, mapper, similarityFunction, similarityThreshold });

Parameters

ParameterTypeRequiredDefaultDescription
optionsObjectYesnoneArguments to pass to the function

Options

PropertyTypeRequiredDefaultDescription
itemsT[]YesnoneArray of items to group
mapper(t: T) => KYesnoneFunction to apply to each element in items prior to measuring similarity
similarityFunction(a: K, b: K) => numberYesnoneFunction to measure similarity between mapped items
similarityThresholdnumberYesnoneThreshold at which items whose similarity value is greater than or equal to it are grouped together

Return value

The return value is a new nested array of type T[][] containing elements of items grouped by similarity. If there are no elements in items, an empty array will be returned.

Benchmark

Benchmark test results where N is the number of items being grouped, higher ops/sec is better.

LibraryN=16N=32N=64N=128N=256N=512N=1024N=2048
group-similar8686717538606715944441717527
set-clustering28506625818314551213061

Benchmark configuration details can be found here.

1.0.4

11 months ago

1.0.3

2 years ago

1.0.2

2 years ago

1.0.1

2 years ago

1.0.0

3 years ago

0.0.10

3 years ago

0.0.11

3 years ago

0.0.9

3 years ago

0.0.8

3 years ago

0.0.7

3 years ago

0.0.3

3 years ago

0.0.5

3 years ago

0.0.4

3 years ago

0.0.6

3 years ago

0.0.2

3 years ago

0.0.1

3 years ago