raptor-compare v0.2.2
Compare sets of Raptor results and test for their statistical significance (t-test with 0.05 alpha).
$ raptor-compare my_test.ldjson
music.gaiamobile.org   base: mean  1: mean  1: delta  1: p-value
---------------------  ----------  -------  --------  ----------
navigationLoaded              711      726        14        0.06
navigationInteractive         737      748        12        0.10
visuallyLoaded               1322     1217      -105      * 0.00
contentInteractive           1323     1217      -105      * 0.00
fullyLoaded                  1462     1442       -20        0.14
uss                        19.881   20.370     0.489      * 0.00
pss                        23.468   23.981     0.513      * 0.00
rss                        39.640   40.152     0.512      * 0.00
In the example above, the Music app reached the visuallyLoaded and
contentInteractive events about 105 ms faster than in the base run, and the
asterisks next to the p-values indicate that these differences are
statistically significant. At the same time, we can see that the memory
footprint has regressed: the mean uss usage is higher than the base measurement
and the difference is statistically significant as well.
For all measurements marked with the asterisk (*) it is valid to assume that
the means are indeed significantly different between the base and the try runs.
The remaining results, e.g. the 20 ms fullyLoaded speed-up, are not
significant and might be caused by a random instability of the data. Try
increasing the sample size (via Raptor's --runs option; see below) and run
Raptor again.
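Why more runs help can be sketched with the standard error of the mean, which
shrinks with the square root of the number of runs. The snippet below is only
an illustration: standardError is a hypothetical helper and the 40 ms standard
deviation is an assumed value, not a Raptor measurement.

```javascript
// Standard error of the mean for a sample with standard deviation `sd`
// collected over `runs` runs. Hypothetical helper, not part of raptor-compare.
function standardError(sd, runs) {
  return sd / Math.sqrt(runs);
}

// Assumed spread of 40 ms between individual runs (illustrative only).
const sd = 40;

console.log(standardError(sd, 30).toFixed(2));  // roughly 7.3 ms
console.log(standardError(sd, 120).toFixed(2)); // roughly 3.65 ms
```

Quadrupling the number of runs halves the standard error, which makes a real
20 ms difference easier to tell apart from noise.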
What is a p-value?
The p-value is a concept used in statistical testing. It is the probability of observing a difference between the means at least as large as the one measured if the base and the compared runs actually had the same true mean. A low p-value means that the observed difference is unlikely to be caused by poor sampling and randomness alone, so there's only a small risk of making a mistake by concluding that the means are truly different.
For the data above, a p-value of 0.14 for fullyLoaded means that, if the code
change had no real effect, a difference of 20 ms or more would still show up in
about 14% of such experiments because of randomness alone.
Differences with p-values below 0.05 are treated as significant and marked with
an asterisk.
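The p-value comes from a t statistic computed on the two sets of runs. Below is
a minimal sketch of that statistic, assuming Welch's unequal-variance form of
the t-test; this README only says that a t-test is used, so the variant inside
raptor-compare may differ. The helper names (mean, variance, welchT) and the
sample arrays are hypothetical. Turning t and the degrees of freedom into a
p-value additionally requires the t-distribution, which is left out here.

```javascript
// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function variance(xs) {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) * (x - m), 0) / (xs.length - 1);
}

// Welch's t statistic and Welch-Satterthwaite degrees of freedom
// for two independent samples (assumed variant, see above).
function welchT(base, other) {
  const vBase = variance(base) / base.length;
  const vOther = variance(other) / other.length;
  const delta = mean(other) - mean(base);
  const t = delta / Math.sqrt(vBase + vOther);
  const df = Math.pow(vBase + vOther, 2) /
    (vBase * vBase / (base.length - 1) + vOther * vOther / (other.length - 1));
  return { delta: delta, t: t, df: df };
}

// Made-up samples, not Raptor data.
const result = welchT([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]);
console.log(result); // delta = 1, t = 1, df = 8
```

The larger the absolute value of t for a given number of degrees of freedom,
the smaller the resulting p-value.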
Installation
npm install -g raptor-compare
Running Raptor tests
(For best results, follow the Raptor guide on MDN.)
Install Raptor with:
$ sudo npm install -g @mozilla/raptor
Connect your device to the computer, go into your Gaia directory and build Gaia:
$ make raptor
Then, run the desired perf test:
$ raptor test coldlaunch --runs 30 --app music --metrics my_test.ldjson
Raptor will print the output to stdout. The raw data will be saved in the
ldjson file specified in the --metrics option. The data is appended so you
can run multiple tests for different revisions and apps and raptor-compare
will figure out how to handle it. All testing is conducted relative to the
first result set for the given app.
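Appending works because LDJSON holds one JSON document per line, so two files
concatenated together are still valid LDJSON. The sketch below shows the idea
with a hypothetical parseLdjson helper; the record fields (run, value) are made
up for illustration and do not reflect the real shape of Raptor's entries.

```javascript
// Parse LDJSON text: one JSON document per non-empty line.
// Hypothetical helper, not the reader used by raptor-compare.
function parseLdjson(text) {
  return text
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => JSON.parse(line));
}

// Made-up records standing in for two separate test runs.
const firstRun = '{"run": 1, "value": 711}\n';
const secondRun = '{"run": 2, "value": 726}\n';

// Appending the second run to the first keeps the data parseable.
const records = parseLdjson(firstRun + secondRun);
console.log(records.length); // 2
```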
API
You can also use raptor-compare programmatically. It exposes three functions
for working with Raptor data: read reads in an LDJSON stream with the raw
metrics data, parse aggregates the data into a Map, and build creates the
comparison tables with p-values for significance testing.
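// Needed for Node.js 0.10 and 0.12.
require('babel/polyfill');

const fs = require('fs');
const compare = require('raptor-compare');

// Path to the file written via Raptor's --metrics option.
const filename = 'my_test.ldjson';

// Read the raw metrics, aggregate them, build the tables and print each one.
compare.read(fs.createReadStream(filename))
  .then(compare.parse)
  .then(compare.build)
  .then(tables => tables.forEach(
    table => console.log(table.toString())))
  .catch(console.error);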