@jbeuckm/k-means-js v0.6.1
K-Means Clustering
A basic JavaScript implementation of the k-means cluster analysis algorithm.
Install
```shell
npm i @jbeuckm/k-means-js --save
```
Usage
- Optionally, normalize the data.
The normalizer scales numerical fields to the range [0, 1] and expands each discrete field (e.g. `category`) into n outputs of either zero or one, where n is the number of possible values of that field.
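To make the two transformations concrete, here is a minimal standalone sketch of min-max scaling and one-hot encoding, plus their inverse. It is not the library's code: `fieldRanges`, `normalizeField`, `denormalizeField` and the `"linear"` kind label are hypothetical names chosen for illustration, and field weights are not modeled.

```javascript
// Hypothetical sketch, NOT the @jbeuckm/k-means-js implementation.

// Record min/max for a numeric field, or the distinct values of a discrete field.
function fieldRanges(rows, field, kind) {
  const column = rows.map((row) => row[field]);
  if (kind === "discrete") {
    return { kind, values: [...new Set(column)] };
  }
  return { kind, min: Math.min(...column), max: Math.max(...column) };
}

// Numeric field: one output scaled into [0, 1].
// Discrete field: one output per distinct value, 1 for the matching value, else 0.
function normalizeField(value, range) {
  if (range.kind === "discrete") {
    return range.values.map((v) => (v === value ? 1 : 0));
  }
  const span = range.max - range.min;
  return [span === 0 ? 0 : (value - range.min) / span];
}

// Inverse: rescale a numeric output, or pick the value with the largest output.
function denormalizeField(outputs, range) {
  if (range.kind === "discrete") {
    let best = 0;
    for (let i = 1; i < outputs.length; i++) {
      if (outputs[i] > outputs[best]) best = i;
    }
    return range.values[best];
  }
  return range.min + outputs[0] * (range.max - range.min);
}

const rows = [
  { category: "a", value: 25 },
  { category: "b", value: 7.6 },
  { category: "a", value: 28 },
];
const catRange = fieldRanges(rows, "category", "discrete");
const valRange = fieldRanges(rows, "value", "linear");

// Each row becomes [isA, isB, scaledValue].
const encoded = rows.map((row) => [
  ...normalizeField(row.category, catRange),
  ...normalizeField(row.value, valRange),
]);

// Average the two "a" rows, then map the result back to original units.
const mean = encoded[0].map((x, i) => (x + encoded[2][i]) / 2);
console.log(
  denormalizeField(mean.slice(0, 2), catRange),
  denormalizeField(mean.slice(2), valRange)
); // "a" and approximately 26.5
```

Decoding a discrete field by picking its largest output is one reasonable choice for averaged one-hot values such as cluster means; the library may handle this differently.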
```javascript
import { dataset } from "@jbeuckm/k-means-js";

// Tell the normalizer about the category field.
const params = {
  category: "discrete",
};

// Category is a discrete field with two possible values.
// Value is a linear field with continuous possible values.
const data = [
  { category: "a", value: 25 },
  { category: "b", value: 7.6 },
  { category: "a", value: 28 },
];

// Get ranges for normalizing and denormalizing the data.
const ranges = dataset.findRanges(params, data);

// Optionally, set the relative importance of one or more fields.
// The default weight for any field is one.
const weights = { category: 2 };

const normalized = dataset.normalize(data, ranges, weights);
```

- Run the algorithm.
```javascript
import kmeans from "@jbeuckm/k-means-js";

// This sample data is not normalized and has as many points as clusters
// (n = k), so it is a pretty poor example.
const points = [
  [0.1, 0.2, 0.3],
  [0.4, 0.5, 0.6],
  [0.7, 0.8, 0.9],
];
const k = 3;

const means = kmeans.cluster(points, k, console.log);
```

The call to `cluster()` finds the range of the data in each dimension, generates k = 3 random points within those ranges, and iterates until the means stop changing.
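The procedure described above (per-dimension ranges, random starting means, iterate until the means stop moving) is the classic Lloyd iteration for k-means. The following is a minimal sketch of it, assuming squared Euclidean distance. `kmeansSketch`, its `random` and `maxIterations` parameters, and the empty-cluster policy are hypothetical choices, not taken from the library.

```javascript
// Hypothetical sketch of Lloyd's k-means loop, NOT the library's implementation.

function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Index of the mean closest to the point.
function nearestMean(point, means) {
  let best = 0;
  for (let j = 1; j < means.length; j++) {
    if (squaredDistance(point, means[j]) < squaredDistance(point, means[best])) best = j;
  }
  return best;
}

function kmeansSketch(points, k, random = Math.random, maxIterations = 100) {
  const dims = points[0].length;

  // 1. Find the data's range in each dimension.
  const min = [];
  const max = [];
  for (let d = 0; d < dims; d++) {
    const column = points.map((p) => p[d]);
    min.push(Math.min(...column));
    max.push(Math.max(...column));
  }

  // 2. Generate k random starting means inside those ranges.
  let means = Array.from({ length: k }, () =>
    min.map((lo, d) => lo + random() * (max[d] - lo))
  );

  // 3. Assign each point to its nearest mean, recompute means, repeat until static.
  //    maxIterations is only a safety cap (an assumption of this sketch).
  for (let iter = 0; iter < maxIterations; iter++) {
    const sums = means.map(() => new Array(dims).fill(0));
    const counts = new Array(k).fill(0);
    for (const p of points) {
      const j = nearestMean(p, means);
      counts[j]++;
      for (let d = 0; d < dims; d++) sums[j][d] += p[d];
    }
    // An empty cluster keeps its previous mean (one of several possible policies).
    const next = means.map((m, j) =>
      counts[j] === 0 ? m : sums[j].map((s) => s / counts[j])
    );
    const moved = next.some((m, j) => squaredDistance(m, means[j]) > 0);
    means = next;
    if (!moved) break;
  }
  return means;
}

console.log(kmeansSketch([[0, 0], [0, 1], [9, 9], [9, 10]], 2));
```

Because the starting means are random, separate runs can converge to different local optima; passing a deterministic `random` function makes a run repeatable.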
- Find the best K
This package implements the method described by Pham et al. for choosing K: it runs k-means repeatedly for different values of K and returns its best guess for K, along with the set of means found during the evaluation.
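The score at the heart of Pham et al.'s method, f(K), compares the distortion S_K (total squared distance from points to their means) with S_{K-1}, scaled by a weight alpha_K that depends on the number of dimensions. Below is a minimal sketch of that score, transcribed from the paper as an assumption rather than from this package. `distortion`, `phamScores`, `bestK` and the sample distortion values are hypothetical.

```javascript
// Hypothetical sketch of the Pham et al. f(K) score, NOT the library's phamBestK code.

// S_K: sum of squared distances from each point to its nearest mean.
function distortion(points, means) {
  let total = 0;
  for (const p of points) {
    let best = Infinity;
    for (const m of means) {
      let d = 0;
      for (let i = 0; i < p.length; i++) d += (p[i] - m[i]) ** 2;
      best = Math.min(best, d);
    }
    total += best;
  }
  return total;
}

// distortions[K - 1] holds S_K for K = 1, 2, ...
// nDims is the number of dimensions; the paper defines alpha_K for nDims > 1.
function phamScores(distortions, nDims) {
  const scores = [];
  let alpha = 0;
  for (let K = 1; K <= distortions.length; K++) {
    // alpha_2 = 1 - 3 / (4 * nDims); alpha_K = alpha_{K-1} + (1 - alpha_{K-1}) / 6 for K > 2.
    if (K === 2) alpha = 1 - 3 / (4 * nDims);
    else if (K > 2) alpha += (1 - alpha) / 6;
    const previous = distortions[K - 2];
    // f(1) = 1, and f(K) = 1 whenever S_{K-1} is zero.
    scores.push(K === 1 || previous === 0 ? 1 : distortions[K - 1] / (alpha * previous));
  }
  return scores;
}

// The suggested K is the one with the smallest f(K).
function bestK(scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) if (scores[i] < scores[best]) best = i;
  return best + 1;
}

// Made-up distortions for K = 1..4 in two dimensions, for illustration only.
const scores = phamScores([100, 40, 30, 28], 2);
console.log(scores, bestK(scores)); // best K is 2
```

In practice each S_K would come from running k-means with that K and passing its means to `distortion`.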
```javascript
import { phamBestK } from "@jbeuckm/k-means-js";

const maxKToTest = 10;
const result = phamBestK.findBestK(points, maxKToTest);

console.log("this data has " + result.K + " clusters");
console.log("cluster centroids =", result.means);
```

- Denormalize data
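Denormalization maps the discovered means back to the original units of the data, using the same `ranges` that were used for normalization:

```javascript
// Only meaningful when the means were found on data normalized with these same ranges.
for (const mean of result.means) {
  console.log(dataset.denormalizeDatum(mean, ranges));
}
```

Todo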
- ~denormalize data~
- provide the ability to label data points, dimensions and means
- build an asynchronous version of the algorithm
- TypeScript
