1.0.0 • Published 9 years ago

hyperloglog32

HyperLogLog distinct value estimator for node and the browser using a 32-bit murmurhash3. Fork of hyperloglog (MIT © Optimizely, Inc). From Wikipedia: HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset (the cardinality).

Jump to: api / install / license

example

Insert two distinct values (one of them twice) into an HLL structure with 12-bit indices. Hashing is done for you:

var HyperLogLog = require('hyperloglog32')
var h = HyperLogLog(12)

h.add('value 1')
h.add('value 2')
h.add('value 1')

h.count() === 2;

api

h = HyperLogLog(n)

Construct an HLL data structure with n-bit indices. This implies 2^n buckets, and the same number of octets of state. Typical values for n are around 12, which uses 4096 buckets and gives a standard error of about 1.625% (1.04 / sqrt(4096)); individual estimates can deviate by more. Higher values use more memory but provide greater precision. Here's a nice table.
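To get a feel for the trade-off, here is a minimal sketch that computes bucket count, state size and expected error for a few values of n. It assumes one octet per bucket and the textbook HyperLogLog standard error of 1.04 / sqrt(2^n); `hllParams` is a hypothetical helper, not part of this module's API.

```javascript
// Hypothetical helper, not part of hyperloglog32.
// Assumes one octet per bucket and the textbook error 1.04 / sqrt(buckets).
function hllParams (n) {
  var buckets = Math.pow(2, n)
  return {
    buckets: buckets,
    octets: buckets,
    standardError: 1.04 / Math.sqrt(buckets)
  }
}

// Print one line per index size, with the error as a percentage.
var lines = [8, 10, 12, 14, 16].map(function (n) {
  var p = hllParams(n)
  var pct = (p.standardError * 100).toFixed(3)
  return n + ' bits: ' + p.buckets + ' buckets, ~' + pct + '% error'
})

console.log(lines.join('\n'))
```

Each extra pair of index bits quadruples the memory and halves the expected error.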

h.add(string)

Add a value.

h.count()

Get the current estimate of the number of distinct values.
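For the curious, this is roughly what adding and counting do in a textbook HyperLogLog with 32-bit hashes. It is a sketch under assumptions, not this module's source: the function names are made up, it takes ready-made 32-bit hash values instead of running murmurhash3 on strings, and it leaves out the large-range correction.

```javascript
// Illustrative only: textbook HyperLogLog on 32-bit hashes.
// createRegisters, addHash and estimate are hypothetical names.

function createRegisters (n) {
  return new Uint8Array(Math.pow(2, n)) // one octet per bucket
}

function addHash (registers, n, hash) {
  // The top n bits select the bucket.
  var index = hash >>> (32 - n)
  // The remaining 32 - n bits, shifted to the top of a 32-bit word.
  var rest = (hash << n) >>> 0
  // Rank: 1-based position of the first set bit in those remaining bits,
  // or (32 - n) + 1 when they are all zero.
  var rank = rest === 0 ? (32 - n) + 1 : Math.clz32(rest) + 1
  if (rank > registers[index]) registers[index] = rank
}

function estimate (registers) {
  var m = registers.length
  var sum = 0
  var zeros = 0
  for (var i = 0; i < m; i++) {
    sum += Math.pow(2, -registers[i])
    if (registers[i] === 0) zeros++
  }
  var alpha = 0.7213 / (1 + 1.079 / m) // bias constant, valid for m >= 128
  var raw = alpha * m * m / sum
  // Small-range correction: fall back to linear counting.
  if (raw <= 2.5 * m && zeros > 0) return m * Math.log(m / zeros)
  return raw
}

var n = 12
var registers = createRegisters(n)

addHash(registers, n, 0x80000000) // bucket 2048, rank 21
addHash(registers, n, 0x00180000) // bucket 1, rank 1
addHash(registers, n, 0x80000000) // duplicate, changes nothing

console.log(Math.round(estimate(registers))) // 2
```

Because a register only ever keeps its maximum rank, adding the same value again leaves the state untouched, which is why duplicates do not inflate the count.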

h.state()

Get the internal HLL state as a Buffer.

h.merge(h2 || Buffer)

Merge another HLL's state into this HLL. If the incoming data has fewer buckets than this HLL, this one will be folded down to be the same size as the incoming data, with a corresponding loss of precision. If the incoming data has more buckets, it will be folded down as it is merged. The result is that this HLL will be updated as though it had processed all values that were previously processed by either HLL.

var h1 = HyperLogLog(12)
var h2 = HyperLogLog(12)

h1.add('value 1')
h1.add('value 2')
h2.add('value 2')
h2.add('value 3')

h1.merge(h2)
h1.count() === 3;
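Why merging behaves like a union: in the usual HyperLogLog formulation, two states with the same number of buckets are merged by taking the larger value of each pair of buckets. The sketch below shows only that equal-size case on plain typed arrays; `mergeRegisters` is a hypothetical helper, and the folding used for differently sized states is not covered.

```javascript
// Hypothetical helper, not part of hyperloglog32.
// Merges two equal-size register arrays by element-wise maximum.
function mergeRegisters (a, b) {
  if (a.length !== b.length) {
    throw new Error('this sketch only handles equal sizes')
  }
  var out = new Uint8Array(a.length)
  for (var i = 0; i < a.length; i++) {
    out[i] = Math.max(a[i], b[i])
  }
  return out
}

var a = Uint8Array.from([3, 0, 1, 5])
var b = Uint8Array.from([1, 2, 1, 7])
var merged = mergeRegisters(a, b) // registers: 3, 2, 1, 7
```

Since each bucket stores a maximum, the merged state is the same as if one structure had seen every value from both inputs, in any order.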

h.error()

Estimate the relative error for this HLL.

install

With npm do:

npm i hyperloglog32

and browserify for the browser.

license

MIT © Vincent Weevers