0.0.6 • Published 2 years ago

@ashnazg/squirrelnado-difftable v0.0.6

Weekly downloads
-
License
MIT
Repository
-
Last release
2 years ago

Usage

const sqrl = require('@ashnazg/squirrelnado');
require('@ashnazg/squirrelnado-psql')(sqrl);
require('@ashnazg/squirrelnado-difftable')(sqrl);

const diff = sqrl.diff({
	control:    'postgres://known_good_server/db1',
	experiment: 'postgres://audit_this_server/db2',
	cache:      'postgres://localhost/tmpdb/some_schema',
	write:      'postgres://final_server/db3/reports/single_anomaly_table',

	tmp: process.env.HOME + '/sqrl-diff-cache' // all control/experiment tables looked at will be cached as $tmp/{ctl|exp}/schema/table/page-N.jsonl.gz
	// !! handle your own expiration of tables out of tmp !!
	// to avoid burning DB I/O, sqrl.diff assumes that its existing cache is still usable as long as a folder named $tmp/$mode/schema/table exists.
});


await diff('schema1.table1'); // write table1's anomalies to 'single_anomaly_table'
await diff('schema1'); // scan tables in control. for each table, do the above.

// you can also break the steps out and sequence all the reads at once. the above oneliner breaks out as:
await diff.snapshot('schema1.table1'); // cache both copies of the target(s) into $tmp
await diff.compare('schema1.table1'); // compare each pair, building up and then slicing down a copy of the data in $cache, leaving only the anomalies behind
await diff.write('schema1.table1'); // export that group of anomalies from local cahe to $write's 'single_anomaly_table'
await diff.close();

broken out, the only step that needs access to control/experiment is snapshot, and only write needs access to $write db.
compare always reads from $tmp, and nukes and replaces the state in cache. So you could run snapshots on Monday; delete "$tmp/exp" on Tuesday to force snapshot to re-pull only that
side, and run compare() on Wednesday.

variations

  • If experiment does not specify a table, scan schema3 for all tables/views, and look them up in control's schema2 (in this mode, a table name on control is ignored.)
  • if write does not specify a table, pick a report name name based on "control" * (The above "control" URL would put all final records in "db3/reports/schema2_table" if we're in table:table mode, or reports.schema2 if we're in schema:schema mode.)
  • you can compare two tables in one DB as well: just use the same URL for control and experiment, and then use 2 params on diff('experiment.table', 'reference.table') * when the URLs are the same, the reference copy will load from $tmp/ctl if it exist, else $tmp/exp

index columns

By default, diff only looks for a column named 'id'. If not usable as a unique index, it falls back on the whole raw row as identity.

You can pass in any array of strings as the last param to indicate valid candidaates:

  • diff('table', ['id', 'src_id'])
  • diff('table', 'ref_table', ['id', 'src_id'])
  • diff.compare('table', ['id', 'src_id'])
  • diff.compare('table', 'ref_table', ['id', 'src_id'])

And an empty list indicates not to use the default of 'id'.

0.0.5

2 years ago

0.0.6

2 years ago

0.0.4

3 years ago

0.0.3

3 years ago

0.0.2

3 years ago