1.5.4 • Published 6 months ago

clusternova v1.5.4

Weekly downloads
-
License
MIT
Repository
github
Last release
6 months ago

Clusternova 🚀

Simple data clustering library in TS

Discover Hidden Patterns in Your Data

Drop millions of recent social media posts and instantly surface trending topics. Feed in your support tickets and instantly have related issues be clustered together.

This TypeScript implementation easily brings clustering to your JavaScript ecosystem, whether you're running in Node.js (or another backend JS runtime) or directly in the browser. Built for real-world applications:

  • 📱 Social Media Intelligence: Surface trending topics from millions of posts in real-time
  • 🎯 Customer Insights: Transform raw feedback into actionable patterns
  • 🤖 AI/ML Pipelines: Cluster high-dimensional embeddings
  • 📊 Real-time Analytics: Process streaming data to detect emerging patterns
  • 🔍 Anomaly Detection: Automatically identify outliers and unusual behaviors

This TypeScript implementation of HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) is based on the paper "Density-Based Clustering Based on Hierarchical Density Estimates" by Ricardo J.G.B. Campello, Davoud Moulavi, and Joerg Sander.

This implementation draws inspiration from:

Why HDBSCAN?

HDBSCAN offers several advantages over traditional clustering algorithms:

  • No need to specify number of clusters: Unlike k-means, HDBSCAN automatically determines the optimal number of clusters
  • Handles variable density clusters: Can find clusters of different shapes and densities
  • Noise handling: Automatically identifies and handles noise points
  • Hierarchical clustering: Provides insights into the hierarchical structure of your data

Performance

Built from scratch in TypeScript in a single file with zero dependencies, this TypeScript implementation includes several optimizations, but not all the optimizations used in the HDBSCAN Python library. In my personal testing, it's shown comparable or better performance than the Python implementation.

  • ✨ Zero dependencies
  • 🚀 Pure TypeScript implementation
  • ⚡️ Optimized for JS runtimes
  • 🔒 Type-safe API
  • 📦 Small bundle size

Setup

npm install clusternova

Usage

import HDBSCAN, { findCentralElements, cosine, VectorPoint } from "clusternova";
// manhattan, euclidean are also available imports

// Extend VectorPoint with your additional fields
interface MyDataPoint extends VectorPoint {
  title: string;        // Optional fields
  timestamp: Date;      // that you might need
}

const data: MyDataPoint[] = [
  {
    id: "1",
    vector: [1, 2, 3],
    title: "First point",
    timestamp: new Date()
  },
  {
    id: "2",
    vector: [4, 5, 6],
    title: "Second point",
    timestamp: new Date()
  }
  // ... more points
];

const hdbscan = new HDBSCAN(data, 3, cosine); // minimum points = 3, cosine distance metric
const { clusters, outliers } = hdbscan.run();
// Types of returned data:
// clusters: MyDataPoint[][] - Array of clusters, each containing array of your data points
// outliers: MyDataPoint[] - Array of outlier points

// Example output:
{
    clusters: [
    [
        { id: "1", vector: [1, 2, 3], title: "First point", timestamp: "2024-01-01..." },
        { id: "2", vector: [4, 5, 6], title: "Second point", timestamp: "2024-01-01..." }
    ],
    // ... more clusters
    ],
    outliers: [
    { id: "7", vector: [7, 8, 9], title: "Outlier point", timestamp: "2024-01-01..." }
    // ... more outliers
    ]
}

// Find central elements of each cluster - returns array of elements with distance field (distance from the center of the cluster)
clusters.forEach(cluster => {
  const centralElements = findCentralElements(cluster, 3, cosine);
  // Returns: (MyDataPoint & { distance: number })[]
  // Example:
  // [
  //   { id: "1", vector: [1,2,3], title: "First point", distance: 0.2 },
  //   { id: "2", vector: [4,5,6], title: "Second point", distance: 0.3 }
  // ]
});

Example Projects

We have included an examples directory with web apps that demonstrate how to use Clusternova in real-world scenarios. One of them, social-media-clustering, is deployed and available for you to try (bring your own API key):

Social Media Clustering Demo: https://astonishing-donut-b3686d.netlify.app

This demo showcases clustering of the text embeddings of media posts (tweets) and summarizing each cluster using GPT3.5.

You can also clone the repo and run it yourself.

Example Use Cases

HDBSCAN is particularly valuable in:

Text Analysis

Despite theoretical limitations of clustering in high dimensions, I've found HDBSCAN works exceptionally well with text embeddings using cosine distance, making it particularly useful for NLP applications.

Examples:

  • Clustering social media posts to identify trending topics
  • Grouping customer support tickets to identify common issues
  • Organizing document collections by theme
  • Identifying similar product reviews or feedback

Image Processing

  • Grouping similar images in large collections
  • Identifying distinct objects in computer vision applications
  • Clustering visual features for object recognition

Scientific Applications

  • Gene expression clustering

Business Intelligence

  • Market segmentation
  • Anomaly detection in transactions
  • Customer behavior pattern identification

License information

This project is licensed under the MIT License.

Copyright (c) 2025 Kevin Ma

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

1.5.4

6 months ago

1.5.3

6 months ago

1.5.2

6 months ago

1.5.1

6 months ago

1.5.0

6 months ago

1.4.0

6 months ago

1.3.0

6 months ago

1.2.0

6 months ago

1.1.0

6 months ago

1.0.0

6 months ago