2.0.0 • Published 2 years ago

multiscale-timeseries v2.0.0

License
MIT

multiscale-timeseries

Small utility for maintaining multi-scale timeseries records.

Why? The goal is to record analytics for events such as page views, user logins, and feature usage in a Web-based software product. This library grew out of the need for a lightweight solution that can:

  • Represent multiple levels of detail including hours, days, weeks, months, quarters, and years.
  • Age out data to maintain an upper bound on storage for each timeseries record.
  • Persist the timeseries as JSON.
  • Access the analytics data for presentation within the product.

Example Usage

const assert = require('assert');
const { increment } = require('multiscale-timeseries');

const date = new Date('2020-10-05T14:32:40.441Z');
const maxEntries = 1000; // Max entries governing age-out for each interval.
let record = {}; // Can be an empty object or an existing record.

// This is the main call: increment a timeseries record based on a date.
record = increment(record, date, maxEntries);

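// The result looks like this. The minute and hour keys appear to use the local
// time zone of the machine that produced this output (UTC-4), so yours may differ.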
assert.deepEqual(record, {
  minutes: { '2020-10-05T10:32': 1 },
  hours: { '2020-10-05T10': 1 },
  days: { '2020-10-05': 1 },
  weeks: { '2020-W41': 1 },
  months: { '2020-10': 1 },
  quarters: { '2020-Q4': 1 },
  years: { 2020: 1 },
  all: { all: 1 },
});
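
To make the shape of these keys concrete, here is a minimal sketch of how one key per interval could be derived from a date. The function intervalKeys is hypothetical and is not the library's implementation. It uses UTC so that the result is the same on every machine, whereas the sample output above appears to use local time, and it leaves out ISO week keys for brevity.

```javascript
// Hypothetical helper, not part of multiscale-timeseries.
// Uses UTC for reproducible output; ISO week keys are omitted.
function intervalKeys(date) {
  const iso = date.toISOString(); // e.g. '2020-10-05T14:32:40.441Z'
  const year = iso.slice(0, 4);
  // getUTCMonth() is zero-based: months 0-2 are Q1, 3-5 are Q2, and so on.
  const quarter = Math.floor(date.getUTCMonth() / 3) + 1;
  return {
    minutes: iso.slice(0, 16),
    hours: iso.slice(0, 13),
    days: iso.slice(0, 10),
    months: iso.slice(0, 7),
    quarters: `${year}-Q${quarter}`,
    years: year,
  };
}

const keys = intervalKeys(new Date('2020-10-05T14:32:40.441Z'));
console.log(keys);
```

Each key is a prefix of the ISO timestamp, so coarser intervals share a prefix with the finer ones that fall inside them.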

Once a record exists, you can increment it again and again. After an interval accumulates maxEntries entries, its oldest entries are deleted. The finest-grained intervals fill up fastest, so minutes entries, followed by hours entries, would typically be the first to age out.
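
The age-out rule can be sketched as follows. This is an illustration under assumptions, not the library's code: ageOut is a hypothetical helper, and it relies on the interval keys sorting chronologically as plain strings, which holds for the zero-padded key formats shown in this README.

```javascript
// Hypothetical helper, not part of multiscale-timeseries.
// Keeps at most maxEntries keys for one interval, dropping the oldest.
function ageOut(entries, maxEntries) {
  // Keys such as '2020-12-05T09' sort chronologically as strings.
  const sortedKeys = Object.keys(entries).sort();
  const keptKeys = sortedKeys.slice(Math.max(0, sortedKeys.length - maxEntries));
  const result = {};
  for (const key of keptKeys) {
    result[key] = entries[key];
  }
  return result;
}

const hours = { '2020-10-05T10': 2, '2020-12-05T09': 2, '2020-12-06T09': 1 };
const trimmed = ageOut(hours, 2);
console.log(trimmed); // { '2020-12-05T09': 2, '2020-12-06T09': 1 }
```

Because every interval is trimmed to the same limit, the total number of keys in a record is bounded by maxEntries times the number of intervals.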

After incrementing many times, you end up with a record that looks like this:

// Earlier increments are not shown; this is the latest one in the sequence.
const laterDate = new Date('2020-12-06T14:32:40.441Z');
const lowMaxEntries = 2; // Low value to demonstrate age-out.

record = increment(record, laterDate, lowMaxEntries);

assert.deepEqual(record, {
  days: { '2020-12-05': 2, '2020-12-06': 1 },
  hours: { '2020-12-05T09': 2, '2020-12-06T09': 1 },
  minutes: { '2020-12-05T09:32': 2, '2020-12-06T09:32': 1 },
  months: { '2020-10': 2, '2020-12': 3 },
  quarters: { '2020-Q4': 5 },
  weeks: { '2020-W41': 2, '2020-W49': 3 },
  years: { 2020: 5 },
  all: { all: 5 },
});
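
Since a record is a plain object, it can be stored as JSON and read back for presentation. The sketch below assumes nothing about the library beyond the record shape shown above. The helper toSeries is hypothetical; it turns one interval into a sorted array that a chart or table could consume.

```javascript
// Hypothetical helper, not part of multiscale-timeseries.
// Converts one interval of a record into a chronologically sorted array.
function toSeries(record, interval) {
  const entries = record[interval] || {};
  return Object.keys(entries)
    .sort()
    .map((key) => ({ key, count: entries[key] }));
}

const stored = JSON.stringify({
  weeks: { '2020-W41': 2, '2020-W49': 3 },
  years: { 2020: 5 },
  all: { all: 5 },
});

// Round-trip through JSON, as a data store would.
const loaded = JSON.parse(stored);
const weekly = toSeries(loaded, 'weeks');
console.log(weekly);
```

Asking for an interval that is missing from the record yields an empty array rather than an error, which keeps presentation code simple.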

What should maxEntries be?

The point of this library is to keep an upper bound on the size of a multiscale timeseries record, regardless of how many times it is incremented. This raises the question: how does maxEntries relate to the size in kilobytes of the record stored on disk?

A script (sizeEstimator.js in this repo) was developed that simulates updating a timeseries record every hour for one year, using various values for maxEntries, and estimating the size of the output as stringified JSON. Here is the output of that script:
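
The size measurement itself can be approximated in a few lines. This is not the sizeEstimator.js script from the repository, only a sketch of the idea it describes. The function estimateKilobytes is hypothetical, and treating one kilobyte as 1024 bytes is an assumption.

```javascript
// Hypothetical helper, not the repository's sizeEstimator.js.
// Estimates the stored size of a record as stringified JSON.
function estimateKilobytes(record) {
  const json = JSON.stringify(record);
  // Assumes 1 KB = 1024 bytes and UTF-8 encoding on disk.
  return Buffer.byteLength(json, 'utf8') / 1024;
}

const sample = { hours: { '2020-10-05T10': 1 }, all: { all: 1 } };
const kilobytes = estimateKilobytes(sample);
console.log(`${kilobytes.toFixed(3)} KB`);
```

Running such a function on records produced with different maxEntries values is one way to reproduce a table like the one below for your own event rate.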

maxEntries    Kilobytes
10            2 KB
20            2 KB
30            3 KB
40            4 KB
50            4 KB
60            5 KB
70            5 KB
80            6 KB
90            6 KB
100           7 KB
150           10 KB
200           12 KB
250           15 KB
300           18 KB
350           20 KB
400           23 KB
450           25 KB
500           26 KB
600           30 KB
700           34 KB
800           38 KB
900           42 KB
1000          45 KB
1500          65 KB
2000          84 KB

FAQ

  • Why not just use Google Analytics? With multiscale-timeseries and any lightweight data store, you can roll your own analytics and have full access to the data, so you can present it in your product directly. Other than that, Google Analytics is a perfectly good solution.

Versions

  • 2.0.0 (2 years ago)
  • 1.4.0 (4 years ago)
  • 1.3.0 (4 years ago)
  • 1.2.0 (4 years ago)
  • 1.1.0 (4 years ago)
  • 1.0.0 (4 years ago)