Wills-data-utils NPM

Will's Data Utilities

This library provides several useful functions for manipulating data in javascript.

Setup

In you project directory, type this into the console npm install wills-data-utils Then you can add the node module as usual:

var d8ta = require('wills-data-utils');

All the examples in this readme call the library d8ta, but you could name it anything else you would like.

This library does not depend on, but is much more useful in conjunction with d3.js. It uses but does not provide functions for generating random values, so it is best to use this library in conjunction with a library that provides sampling methods from various probability distributions. (Here is one such library.)

Methods

d8ta.data([data]): either sets or returns the current data array, depending on whether there is an argument supplied. Data set with this can subsequently be chained with other methods.

var x = d8ta.data([1,2,3,4]);
x.data() // -> [1,2,3,4]

d8ta.generate(n, name, func): generates a sample of size n using the supplied function, and saves the data in the object under the variable name supplied, which can be chained with other methods or returned with .data().

var x = d8ta.generate(4, 'count', function() { return 28; });
x.data() // -> [{'count': 28}, {'count': 28}, ...]

d8ta.bin(data, by, name): takes a separate data set, the name of a variable in the current data to bin using, and the name of the variable to save the counts as. NOTE: the binning data must be numeric, sorting in ascending or descending order, and equally spaced. The bins are defined such that the supplied numbers represent the beginning of the interval, and the separation between values defines the width. So, supplying [1920, 1922, 1924] would create the bins [1920, 1922), [1922, 1924), [1924, 1926).

var observedYears = [1921, 2010, 1918, 1922, 1925, 1926, 1924.5];
d8ta.data([{'year': 1920},
           {'year': 1922},
		   {'year': 1924}])
    .bin(observedYears, 'year', 'count')
	.data();
// -> [{'year': 1920, 'count': 1},
//     {'year': 1922, 'count': 1},
//	   {'year': 1924, 'count': 2}]

d8ta.fill(name, defaults, [start, end]): takes the name of a variable and fills in the missing integer values. (Hopefully later values will do that same with dates.) There are optional start and end values if you want to fill in below and/or above the min and max values. If only supplied two arguments, it will figure out whether you meant start or end based on data.

d8ta.data([{'year': 1999, 'count': 1},
           {'year': 2001, 'count': 23},
		   {'year': 2002, 'count': 4}])
    .fill(year, {'count': 0})
	.data();
// -> [{'year': 1999, 'count': 1},
//     {'year': 2000, 'count': 0},
//     {'year': 2001, 'count': 23},
//     {'year': 2002, 'count': 4}]

d8ta.cumulative(input, output, baseline=0): adds a variable called output to data set that gives a cumulative count of the variable input. The optional variable baseline gives the number to be added to the first observation.

d8ta.data({'count': 20},
          {'count': 2},
		  {'count': 22}])
    .cumulative('count', 'totalCount')
	.data();
// -> [{'count': 20, 'totalCount': 20},
//     {'count': 2, 'totalCount': 22},
//     {'count': 22, 'totalCount': 44}]

d8ta.percentGrowth(input, output, trim=true): given an input variable to look at this function will output a variable giving percent growth since last observation. By default this will remove the first observation, because there is no way to calculate growth for that observation. If trim is set to false, it will output 0 as the growth for the first observation.

d8ta.data([{'count': 20},
           {'count': 22},
		   {'count': 44}])
    .percentGrowth('count', 'growth')
	.data();
// -> [{'count': 22, 'growth': 10},
//     {'count': 44, 'growth': 100}]

d8ta.select(name): Outputs an array, giving the data from the variable called name.

d8ta.data([{'x': 1, 'y': 2},
           {'x': 4, 'y': 3},
		   {'x': 2, 'y': 5}])
    .select('x');
// -> [1, 4, 2]

d8ta.collect: Converts an object of arrays into an array of objects. For example, given the template {'x': [1,2,3], 'y': [4,5,6]} it will output an array whose first element is {'x': 1, 'y': 4}.

var x = [1, 2, 3];
var y = [4, 5, 6];
d8ta.collect({'low': x, 'high': y});
// -> [{'low': 1, 'high': 4}, {'low': 2, 'high': 5}, ...]

d8ta.thePast(n, time): given either 'days', 'weeks', or 'years', this function will return an array of datetime strings for the past n days (or months or years).

d8ta.thePast(12, 'months');
// -> ["2014-06-12T16:53:59-07:00", "2014-07-12T16:53:59-07:00", ... ]

##Examples Say you had a data set of years and the number of users that joined your website in that year, but you have missing observations from years that had zero users joining. You want to fill in the missing dates and get the cumulative count and percentage growth of users.

Here is what you could do with this library:

var data = [{'year': 2010, 'count': 12},
            {'year': 2011, 'count': 2},
			{'year': 2013, 'count': 1},
			{'year': 2014, 'count': 6}];
data = d8ta.data(data) // Load the data
           .fill('year', {'count': 0}) // Fill in missing years
		   .cumulative('count', 'totalCount', 2000) // Get the cumulative growth
		   .percentGrowth('totalCount', 'percentGrowth', false) // Get % growth
		   .data() // Return the data set

First, it loads the data into the d8ta object we are working with, so we can chain methods. Next, it fills in the missing years, giving the missing years a default value of count of 0. Next it adds a new variable to the data set called totalCount, which gives the total count of users. The 2000 gives the number of users that were there before we started counting new users. Next the percentage growth is calculated for each observation, and saved in the percentageGrowth property. Finally, the data set is returned as a normal object with the .data() method.

data

moment

@everything-registry/sub-chunk-3148 @infinitebrahmanuniverse/nolb-wil

11 years ago

11 years ago

11 years ago

11 years ago