DataBridge v1.5.7
A framework for automated and programmatic data transfer. Separate source and destination modules allow for a high degree of customization and deployment-specific data handling.
Installation
See INSTALL.
Usage
Command-Line
Run a bridge or batch. In the project directory at the command line:

node app --help
npm start -- --help

Re-run setup:

node setup

Clean up any extraneous output files (make appropriate backups first). Takes a -d flag for the number of previous days' files to keep:

node clean
node clean -d 3
node clean -d 0

Manage bind variables. Follow the prompts:

node bind

Node Module

npm install databridge --save

var databridge = require('databridge');

The databridge module exposes a number of functions/objects. See the documentation at the beginning of each module for specific function reference.
{
setupConfig: [Function], // bin/config-setup
config: { // config.json (if exists)
dirs: {
batches: '{...}/batches/',
creds: '{...}/creds/',
destinations: '{...}/destinations/',
input: '{...}/input/',
logs: '{...}/logs/',
output: '{...}/output/',
sources: '{...}/sources/'
},
logto: 'console',
defaultBindVars: {},
schedule: '{...}/schedule.json'
},
bridge: [Function], // bin/bridge
batchParse: [Function], // bin/batch-parse
bridgeRunner: [Function], // bin/bridge-runner
list: {
src: [Function], // bin/list-src
dest: [Function], // bin/list-dest
tables: [Function], // bin/list-tables
batches: [Function] // bin/list-batches
}
}

Data Types
Data type detection is handled by typeof() and matching against source column name strings:
- GPA anywhere in the column name, or _DEC at its end, is parsed as DECIMAL(8,2). (_DEC is removed from the destination column name.)
- DATE or TIMESTAMP in the column name is parsed as DATE.
- For all other columns, the first row of data is used:
  - If typeof(value) == 'number', it is parsed as INT.
  - Otherwise it is parsed as VARCHAR(255).
This behavior is handled by bin/col-parser and should be customized for particular situations, especially those involving large amounts of data.
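The detection rules above can be sketched as a single function. This is an illustrative sketch, not databridge's actual bin/col-parser implementation; the function name is hypothetical.

```javascript
// Sketch of the column-type detection rules described above.
// Function and variable names are illustrative only.
function detectType(columnName, firstRowValue) {
  // GPA anywhere in the name, or _DEC at the end => DECIMAL(8,2)
  if (columnName.includes('GPA') || columnName.endsWith('_DEC')) {
    return 'DECIMAL(8,2)';
  }
  // DATE or TIMESTAMP anywhere in the name => DATE
  if (columnName.includes('DATE') || columnName.includes('TIMESTAMP')) {
    return 'DATE';
  }
  // Otherwise fall back to the first row of data
  return typeof firstRowValue === 'number' ? 'INT' : 'VARCHAR(255)';
}

console.log(detectType('FINAL_GPA', 3.5));       // DECIMAL(8,2)
console.log(detectType('HIRE_DATE', '2016-01')); // DATE
console.log(detectType('AGE', 42));              // INT
console.log(detectType('NAME', 'Ada'));          // VARCHAR(255)
```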
Indexes
For SQL-based destinations, indexes are created for column names with the
trailing string _IND. This trailing string is removed from the destination
column name.
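The _IND convention can be sketched the same way. This is an illustrative helper, not databridge's internal code; the function name is hypothetical.

```javascript
// Sketch: detect the trailing _IND marker and strip it from the
// destination column name. Illustrative only.
function parseIndexColumn(columnName) {
  const indexed = columnName.endsWith('_IND');
  return {
    destName: indexed ? columnName.slice(0, -'_IND'.length) : columnName,
    createIndex: indexed
  };
}

console.log(parseIndexColumn('EMPLOYEE_ID_IND'));
// { destName: 'EMPLOYEE_ID', createIndex: true }
```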
Running as a Service / PM2
Uses the config.schedule file and can be set up as a service. Requires pm2 installed globally: npm install -g pm2.
# start pm2 service
npm run service-start
# restart pm2
npm run service-restart
# stop pm2
npm run service-stop

It is possible to run pm2 at startup.
Schedule / Batch Configuration
NOTE: When running as the service, databridge cannot prompt for bind variables. Any source that requires bind variables will therefore throw an error if they are not defined. In that case, each job object MUST have a binds: true or a defined binds: {...} attribute.
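The schedule examples later in this document only show binds: true; a job can also define the bind values inline so the service never needs to prompt. The bind variable name and value below are hypothetical, chosen only to illustrate the shape:

```json
{
  "type": "bridge",
  "name": "oracle employees.ferpa_certified => mssql",
  "cron": "0 2 * * *",
  "binds": { "term_code": "202410" },
  "source": "oracle",
  "destination": "mssql",
  "table": "employees.ferpa_certified"
}
```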
The truncate option truncates the table without completely dropping and recreating it (SQL databases only). Use "truncate": true or -n.
The update option appends values to the table rather than completely deleting and recreating it (databases only). Use "update": true or -u.
Run a custom script with "type": "script" and, under "name", the name of the file (no extension) inside local/input/.
Each schedule object requires cron attribute in the following format.
* * * * * *
┬ ┬ ┬ ┬ ┬ ┬
│ │ │ │ │ │
│ │ │ │ │ └ day of week (0 - 7) (0 or 7 is Sun)
│ │ │ │ └───── month (1 - 12)
│ │ │ └────────── day of month (1 - 31)
│ │ └─────────────── hour (0 - 23)
│ └──────────────────── minute (0 - 59)
└───────────────────────── second (0 - 59, optional)

Example schedule.json file:
[{
"name": "test",
"type": "batch",
"cron": "*/30 * * * *"
}, {
"type": "bridge",
"name": "oracle employees.ferpa_certified => mssql",
"cron": "*/10 * * * *",
"binds": true,
"source": "oracle",
"destination": "mssql",
"table": "employees.ferpa_certified",
"truncate": true
}, {
"type": "bridge",
"name": "mssql surveys.population_open => csv",
"cron": "*/20 * * * *",
"binds": true,
"source": "mssql",
"table": "surveys.population_open",
"destination": "csv"
}, {
"type": "bridge",
"name": "xlsx employees.ferpa_certified => mssql",
"cron": "*/25 * * * *",
"binds": true,
"source": "xlsx",
"destination": "mssql",
"table": "employees.ferpa_certified",
"update": true
}, {
"type": "script",
"name": "name of script inside input directory, no file extension",
"cron": "0 1 * * *"
}]

Testing
Install Mocha globally:

npm install -g mocha

All tests:

npm test

All destinations or sources:

mocha spec/destinations
mocha spec/sources

Just one destination/source:

mocha spec/destinations --one=mssql
mocha spec/sources --one=mysql

Customizing sources / destinations
Note about require()
If you plan on using a separate input/output/source dir (i.e., when you ran npm install you told databridge to create the "local" folder outside the main databridge folder), you'll need to either:
- Install the necessary packages (like database connection packages) inside that local directory, or
- Use something like require.main.require to include the database source or scripts from the main databridge directory.

Databridge requires modules from the source directory, but it resolves them from a different path than the main databridge directory.
See this GitHub Gist article by branneman for more information and options.
Multiple similar sources (oracle => oracle)
One way to accomplish transfer from two separate databases of the same type is
to create a source or destination from one to the other that changes the
source name (and therefore the credentials/connection file). For example, a
custom oracle source added as local/sources/newsource.js would make
databridge look for creds/newsource.js. Here's how to do that:
//include oracle source from bin/src/
module.exports = (opt, moduleCallback) => {
const oracle = require.main.require('./bin/src/oracle')
oracle(opt, (e, r, c) => {
moduleCallback(e, r, c)
})
}

Source modules
Source modules are passed the config/opt object from the bridge.
Source modules MUST return:
- null or an error.
- The number of rows pulled from the source.
- An array of column names from the source.
Destination modules
Destination modules are passed config/opt object from the bridge
and the columns definitions generated by bin/col-parser.
Destination modules MUST return:
- null or an error.
- The number of rows written to the destination.
- An array of column names at the destination.