0.3.31-alpha.1 • Published 2 years ago

@nqminds/nqm-databot-host v0.3.31-alpha.1

License
ISC

nqm-databot-host

install

You can use local or global installation. If you intend to daemonize the host (e.g. using pm2 or forever) then a local install is advisable. Local installs also provide a more reliable update mechanism, and allow multiple versions of the host to be run on a given machine.

For local install, create a TDX platform folder if you don't already have one:

mkdir /tdx-platform
cd /tdx-platform
npm init    # press ENTER to accept all defaults

Then install the latest version of the host:

npm install --save @nqminds/nqm-databot-host

Or install a given version:

npm install --save @nqminds/nqm-databot-host@0.3.2

Global install:

npm install -g @nqminds/nqm-databot-host

configure

Edit (or copy) the config.json file, found at e.g. /tdx-platform/node_modules/@nqminds/nqm-databot-host.

Verify that the tdxServer property points to your TDX. If your TDX uses a non-standard naming convention, you can also specify the individual service endpoints via commandServer, databotServer, queryServer and sshServer.

Enter your databot host id and secret (as created in the toolbox) into the credentials property of the config file.

Other optional configuration properties:

  • autoStart - see the offline databots section below.
  • databotStorePath - instructs the databot host where to store the databot packages it runs. This (along with fileStorePath) is useful for sandboxing databots within the local file system. If this property is not specified in the config file, it defaults to the node_modules/@nqminds/nqm-databot-store path relative to the installed root of the host. Note that this path must be an instance of nqm-databot-store.
  • fileStorePath - instructs the databot host where to store temporary files created by databots. If not specified it defaults to the nqm-databot-file-store path relative to the installed root of the host.
  • sshTunnelPort - optionally configure the host to establish an ssh tunnel to the TDX proxy service. Set this to true to use the default ssh server port, or override it with an explicit port value, e.g. 4199. Contact your TDX administrator for full details.
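
As a rough illustration of how the properties above fit together, the sketch below sanity-checks a parsed config object before it is passed to the host. The helper name checkHostConfig is hypothetical, the id:secret shape of credentials is inferred from the autoStart example later in this document, and the rule that tdxServer may be replaced by the four individual endpoints is an assumption. None of this is part of the host itself.

```javascript
// Hypothetical helper: sanity-check a parsed databot host config object.
// Property names come from this README; the checks themselves are assumptions.
function checkHostConfig(config) {
  const problems = [];

  // Assumption: either tdxServer or all individual service endpoints are set.
  const endpoints = ["commandServer", "databotServer", "queryServer", "sshServer"];
  const hasEndpoints = endpoints.every((name) => typeof config[name] === "string");
  if (typeof config.tdxServer !== "string" && !hasEndpoints) {
    problems.push("set tdxServer, or all of " + endpoints.join(", "));
  }

  // Assumed format: "<host id>:<secret>".
  if (typeof config.credentials !== "string" || !/^[^:]+:.+$/.test(config.credentials)) {
    problems.push("credentials should look like <id>:<secret>");
  }

  // sshTunnelPort is optional: a boolean, or an explicit port number.
  const port = config.sshTunnelPort;
  if (port !== undefined && typeof port !== "boolean" && !Number.isInteger(port)) {
    problems.push("sshTunnelPort should be true or a port number");
  }

  return problems;
}

// An empty array means no problems were found.
console.log(checkHostConfig({ tdxServer: "https://tdx.nq-m.com", credentials: "dkjfdJDK:letmein" }));
```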

run

single instance mode

To run a single instance:

/tdx-platform/node_modules/.bin/nqm-databot-host --config ./config.json

master mode

You can also run the host in 'master' mode, specifying the number of worker instances to run using the poolSize argument.

/tdx-platform/node_modules/.bin/nqm-databot-host --config ./config.json --master --poolSize 5

daemonize

The recommended process manager is pm2.

pm2 start /tdx-platform/node_modules/.bin/nqm-databot-host -- --config ./config.json

You can name the daemonized instance using e.g.

pm2 start /tdx-platform/node_modules/.bin/nqm-databot-host --name worker-databot -- --config ./config.json

debug

The databot host supports two modes of debugging your databots.

The first involves running the databot on your databot host with "debugMode": true set in the config.json file. The databot is scheduled as usual through the toolbox (or via the API), and the databot host will display a message and pause before running the databot. At this point you can attach your debugger to the databot process and begin debugging.

The second mode involves running your databot in a local mode, i.e. not from a databot host, without making any modifications to your databot code.

debugging through databot host

This mode of debugging works out-of-the-box for Visual Studio Code, but should be trivial to support in other debuggers/IDEs.

Run the host using the --debugBreak option.

/tdx-platform/node_modules/.bin/nqm-databot-host --config ./my-config.json --debugBreak

The databot host will start and enter the idle state, waiting for a databot instance to be assigned. Using the toolbox, create a databot or select an existing one, and make sure you grant permission for your host to execute it. Then run the databot using the toolbox.

After a short delay you should see the host receive the databot instance run request, and it will then proceed to install the databot. Once installed, the host will print a message to the console similar to the following, and then pause:

nqm-databot-host:DatabotHost piping input to child: ... +6ms
nqm-databot-host:CHILD-DEBUG ****************************** +40ms
nqm-databot-host:CHILD-DEBUG *                            * +0ms
nqm-databot-host:CHILD-DEBUG * Debugger listening on 5858 * +0ms
nqm-databot-host:CHILD-DEBUG *                            * +0ms
nqm-databot-host:CHILD-DEBUG ****************************** +0ms
nqm-databot-host:CHILD-DEBUG Sat, 22 Oct 2016 18:49:16 GMT nqm-databot reading input +195ms
nqm-databot-host:CHILD-DEBUG Sat, 22 Oct 2016 18:49:16 GMT nqm-databot received input 

The host will now wait until a debugger attaches on port 5858.

Visual Studio Code

Run an instance of Visual Studio Code and choose the debug tab. Create a new 'Attach to process' configuration using the launch configuration drop-down. Click the run button, or hit F5. The debugger should start and immediately break at a debugger statement right before the databot entry point.

You can now step into your databot code.

debugging in local mode

To do this set the environment variable DATABOT_DEBUG=1.

The context for your databot instance will be read from a file named debug-input.js in the current working directory. If this file does not exist, an empty context will be used.
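
To make the lookup rule concrete, the following sketch mimics the behaviour described above: load debug-input.js from the working directory if it exists, otherwise fall back to an empty context. The function name loadDebugContext is hypothetical and this is not the library's actual implementation.

```javascript
const fs = require("fs");
const path = require("path");

// Illustrative only: mimic the documented lookup of debug-input.js.
// loadDebugContext is a hypothetical name, not part of the databot API.
function loadDebugContext(workingDir) {
  const contextFile = path.resolve(workingDir, "debug-input.js");
  if (!fs.existsSync(contextFile)) {
    // No context file: fall back to an empty context.
    return {};
  }
  // The file is an ordinary CommonJS module exporting the context object.
  return require(contextFile);
}

// Only consult the context file when local debug mode is enabled.
if (process.env.DATABOT_DEBUG === "1") {
  const context = loadDebugContext(process.cwd());
  console.log("debug inputs:", context.inputs || {});
}
```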

examples

The following examples assume your databot source is in a folder /path/to/databot with index.js as the main script file.

The examples show running the databot using node on the command line, but of course you would probably run it through your IDE.

Running with no context

This scenario is probably not very useful as there is no way to send input to the databot.

/path/to/databot>DATABOT_DEBUG=1 node index.js

Running with context

Create a file in your databot folder called debug-input.js and enter some context data. The primary type of context data is inputs, which is the dictionary of inputs that will be sent to the databot entry point.

/path/to/databot>nano debug-input.js
module.exports = {
  definitionVersion: 1,
  inputs: {
    mode: "update-v1-4",
    projectsDataset: "Z1bmaGQ-prj",
    projectId: "SyIgZNDhM",
  },
  packageParams: {},
  tdxServer: "http://tdx.nqm-1.com",
  queryServer: "http://q.nqm-1.com",
  commandServer: "http://cmd.nqm-1.com",
  shareKeyId: "HJeo92d3hf",
  shareKeySecret: "letmein",
};

Then run the databot main script:

/path/to/databot>DATABOT_DEBUG=1 node index.js

Other useful context properties are:

module.exports = {
  definitionVersion: 1,                // omit if you want to use legacy nqm-api-tdx
  inputs: {},                          // simulate inputs
  packageParams: {},                   // simulate package parameters
  fileStorePath: "",                   // set the file store path
  tdxHost: "https://tdx.nqminds.com",  // used to initialise tdxApi
  shareKeyId: "",                      // share key id for authentication
  shareKeySecret: "",                  // share key secret
}

The fileStorePath specifies the folder where databot output created via output.getFileStorePath or output.generateFileStorePath will be placed. By default this will be a folder named debug-output in the working directory.

offline databots

It is possible to configure the host to start a databot on boot, even if the host is offline.

n.b. use of this feature is not recommended. It is intended for hosts that need to start a databot while offline. Do not use it for general purpose hosts as it breaks the intended databot architecture pattern. All standard hosts will continue to run a databot if the network connection is interrupted, and they will sync successfully on re-connection. If you have a databot instance that needs to always be running, simply set the always running flag when starting the instance via the toolbox and the TDX will schedule it accordingly and make sure it is always running on any eligible host.

To set up an offline databot requires:

1 - the databot package needs to be installed in the databot library of the host. The best way to do this is to schedule the databot to run on the host via the TDX when the host is online. The databot package will then be cached in the databot library. Alternatively it's possible to manually copy the package to the databot library, or copy the package from a databot library of another host that has already cached it.

2 - the instance definition needs to be specified in the databot host config file, under the autoStart section. Again, the best way to do this is to run the instance via the TDX and then copy the instance definition from the toolbox input modal (n.b. the instance definition must have a unique id property).

Configuring autoStart databots

Databots that should start on boot are configured in the autoStart section of the host config file. This section is an array of definitions of the instances that should be started when the host starts. Note that one worker (slave) instance of the host is required for each instance listed in the autoStart section. For example, if there are 3 instances defined, the host should be started in master mode with a poolSize of at least 3.

Below is an example of a config file containing a single auto-start databot instance:

{
  "tdxServer": "https://tdx.nq-m.com",
  "credentials": "dkjfdJDK:letmein",
  "debugMode": false,
  "autoStart": [
    {
      "databotId": "rklWGtwsib",
      "databotVersion": "5",
      "id": "ryg47oR2hb",
      "inputs": {
        "message": "foobar!"
      },
      "name": "auto start example",
      "shareKeyId": "LOjkdjiD",
      "shareKeySecret": "letmein",
      "schedule": {
        "always": true,
        "cron": ""
      }
    }
  ]
}

The example above shows a single databot configured to auto-start on the host. The host will start this databot when it boots, and the schedule.always property indicates that the databot should always be running. This means that if the databot instance were to finish (without error), the host would start it again immediately. If you just need the databot to run once on boot, set schedule.always to false.

The shareKeyId and shareKeySecret must be valid credentials for a TDX share key. These credentials are used to sync the instance status with the TDX once a network connection is made.
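
The two constraints mentioned in this section (each instance definition needs a unique id, and the pool must be at least as large as the number of autoStart instances) can be checked before starting the host. The sketch below is a hypothetical pre-flight check, not something the host provides; checkAutoStart is an invented name.

```javascript
// Hypothetical pre-flight check for the autoStart section of a host config.
// The rules (unique ids, one worker per instance) are taken from this README.
function checkAutoStart(config, poolSize) {
  const instances = config.autoStart || [];
  const errors = [];

  const seen = new Set();
  for (const instance of instances) {
    if (!instance.id) {
      errors.push("autoStart instance is missing an id");
    } else if (seen.has(instance.id)) {
      errors.push("duplicate autoStart id: " + instance.id);
    } else {
      seen.add(instance.id);
    }
  }

  // One worker is needed for every auto-started instance.
  if (instances.length > poolSize) {
    errors.push(
      "poolSize " + poolSize + " is too small for " + instances.length + " autoStart instances"
    );
  }
  return errors;
}

// A single instance with a pool of one worker passes the check.
console.log(checkAutoStart({ autoStart: [{ id: "ryg47oR2hb" }] }, 1));
```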

Databot library structure

The databot library is stored under the folder specified by the databotStorePath property in the host config (see above). For example, if your databotStorePath is specified as /path/to/databotStore then the databot library folder will be created at /path/to/databotStore/databots. Each databot that the host runs will be cached in a sub-folder with a name taken from the databot id. Within each databot folder, a series of sub-folders will be created matching the version number of the databot. For example, if a databot with id rklWGtwsib and version number 4 is run, the following illustrates the folder structure of the library:

/path/to/databotStore/databots
    |
    -- rklWGtwsib
        |
        -- 4

To manually install a databot package in the library, create the folder structure matching the databot id and version, and then install the package in that folder. You may need to create or tweak the nqm.lib.json file to reflect the library path.
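
The folder layout described above can be expressed as a small path helper. This is a sketch for illustration; databotLibraryPath is a hypothetical name and the host may build the path differently internally.

```javascript
const path = require("path");

// Sketch of the documented layout: <databotStorePath>/databots/<id>/<version>.
// databotLibraryPath is a hypothetical helper name.
function databotLibraryPath(databotStorePath, databotId, databotVersion) {
  return path.join(databotStorePath, "databots", databotId, String(databotVersion));
}

// Uses the id and version from the example above.
console.log(databotLibraryPath("/path/to/databotStore", "rklWGtwsib", 4));
```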

server databots

You can expose a web service from your databot and the TDX will set up a unique URL and proxy requests to your databot server. To accomplish this, you should notify the host of the port your server is listening on using the output.setProxyPort method.

The following example demonstrates how to set up a basic nodejs server.

function databot(input, output, context) {
  const http = require("http");

  // Create the server.
  const server = http.createServer((req, res) => {
    // TODO - place your routing and responses here.
    res.statusCode = 200;
    res.setHeader("Content-Type", "text/html");
    // Use end() rather than write() so the response is actually completed.
    res.end("<html><body style=\"background-color: lime\"><div>hello world</div></body></html>");
  });

  // Use an input-supplied value with a fallback default.
  let port = input.serverPort || 2323;

  // Get notification that the server is listening successfully.
  server.on("listening", () => {
    output.debug("setting proxy port to %d", server.address().port);
    output.setProxyPort(server.address().port);
  });

  // Intercept server errors and try a different port if it is already in use.
  server.on("error", (err) => {
    if (err.code === "EADDRINUSE") {
      server.close();
      // Increment the port number and try again.
      setTimeout(() => {
        port++;
        server.listen(port);
      }, 0);
    } else {
      output.abort("failed to start server [%s]", err.message);
    }
  });

  // Start listening
  server.listen(port);
}

host to TDX protocol

The databot host communicates with the TDX via the standard client API. The API must be authenticated using a TDX databot host ID and secret, which are usually specified in the credentials property of the configuration file.

In the current implementation, the databot host effectively pulls commands from the TDX rather than the TDX pushing commands to the host. There are several reasons for this approach, one of which is the plan to implement a browser version of the host. The command routing is implemented by the TDX response to the status update command (see below).

On startup a databot host must register with the TDX. This notifies the TDX where the host is running and that it is eligible to receive commands.

Once registered, the databot host periodically sends the TDX status information via the updateDatabotHostStatus API. The TDX will respond to this status update with any commands that are pending for the host. This is how the TDX to databot host communication is achieved.

The status update interval is configurable via the idleTickInterval and runningTickInterval configuration options. This allows the host to send updates more frequently when it is running a databot, and revert to a less frequent update when idle (or vice versa). The default idle tick interval is 15 seconds, the default running tick interval is 5 seconds.
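
The pull model described above can be sketched as a single polling step: send a status update, process whatever commands come back, then pick the delay until the next update. Everything below is a hypothetical illustration. sendStatus stands in for the updateDatabotHostStatus call, the response is assumed to be an array of commands, and the intervals are expressed in seconds to match the defaults quoted above (the real options may use a different unit).

```javascript
// Defaults from the text above, in seconds. Whether the real configuration
// options use seconds or milliseconds is an assumption to verify.
const DEFAULT_IDLE_TICK = 15;
const DEFAULT_RUNNING_TICK = 5;

// Hypothetical helper: pick the delay before the next status update.
function nextTickDelay(config, isRunningDatabot) {
  if (isRunningDatabot) {
    return config.runningTickInterval || DEFAULT_RUNNING_TICK;
  }
  return config.idleTickInterval || DEFAULT_IDLE_TICK;
}

// Hypothetical polling step. sendStatus stands in for the status update call
// and is assumed to resolve to an array of pending commands from the TDX.
async function pollOnce(config, state, sendStatus, handleCommand) {
  const commands = await sendStatus(state);
  for (const command of commands || []) {
    handleCommand(command);
  }
  return nextTickDelay(config, state.running);
}

console.log(nextTickDelay({}, false), nextTickDelay({}, true));
```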

host registration

Enables a host to register with the TDX, making it eligible to receive requests to run a databot instance.

This is available via the registerDatabotHost API.

The raw HTTP endpoint is a POST method to:

https://databot.acmeTDX.com/host/register

host status update

Used by a host to notify the TDX of status. This also serves as the host command router, in that the response from the TDX is passed to the command processor which will action any commands accordingly (see TDX command format below).

This is available via the updateDatabotHostStatus API.

The raw HTTP endpoint is a POST method to:

https://databot.acmeTDX.com/host/status

write instance output

This enables databot hosts to notify the TDX of databot output. When a databot writes output it is cached by the host and sent to the TDX when the databot completes.

This is available via the writeDatabotHostInstanceOutput API.

The raw HTTP endpoint is a POST method to:

https://databot.acmeTDX.com/host/output

TDX command format

The databot host implements a simple command processor. It can support any transport; currently it is invoked via the response to a host status update.

The generic format of the command object is shown below.

{
  commandId: {string} - a unique id for this command
  command: {string} - the command name
  payload: {object} - the command payload
}

There are currently 4 supported commands: runInstance, stopInstance, stopHost and updateHost.
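
A command processor for this format can be as simple as a lookup table keyed on the command name. The sketch below shows one way to do it; createCommandProcessor and the placeholder handler bodies are hypothetical, and only the four command names and the commandId/command/payload fields come from this document.

```javascript
// Hypothetical dispatcher for the command object format shown above.
// Handler bodies are placeholders; only the command names come from this README.
function createCommandProcessor(handlers) {
  return function processCommand(command) {
    // Only accept handlers that were explicitly registered.
    const known = Object.prototype.hasOwnProperty.call(handlers, command.command);
    if (!known || typeof handlers[command.command] !== "function") {
      throw new Error("unsupported command: " + command.command);
    }
    return handlers[command.command](command.payload || {}, command.commandId);
  };
}

const processCommand = createCommandProcessor({
  runInstance: (payload) => "run " + payload.databotInstance.id,
  stopInstance: (payload) => "stop instance (" + payload.mode + ")",
  stopHost: () => "stop host",
  updateHost: () => "update host",
});

console.log(
  processCommand({ commandId: "RIkd34pz", command: "stopInstance", payload: { mode: "pause" } })
);
```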

run instance command

This command is sent by the TDX to a databot host as a response to an idle status update. The format of the message is shown below.

{
  commandId: "KD9dk-dZ",          // Unique id of the command
  command: "runInstance",         // The command name.
  payload: {                      // The command payload.
    databotInstance: {            // Details about the instance to run.
      id: "iOF98d-",              // The id of this databot instance.
      inputs: {                   // Any inputs specified when the instance was started.
        someInputParameter: 343,
        anotherInputParameter: {foo: "bar"}
      },
      chunks: 4,                  // The number of 'chunks' to run for this instance.
      name: "my-app",             // The name given to the instance when it was started.
      shareKeyId: "accessGEO",    // The id of a share key that the instance can use.
      shareKeySecret: "letmein",  // The password for the share key.
      authTokenTTL: 3600,         // The TTL for the generated share key token.
      databotId: "IdkE83-",       // The id of the databot definition.
      databotVersion: "0.3.1",    // The version of the databot definition.
      debugMode: false,           // Flag indicating debug mode.
    },
    instanceProcess: {            // Details about the chunk (process) to run.
      id: "KLKidII",              // The unique process id.
      chunk: 1                    // The chunk number to run.
    }
  }
}

When a databot instance is started by the end-user they may indicate that it should be distributed across databot hosts, i.e. the processing should be split into 'chunks'. In this case the command.payload.databotInstance.chunks property will contain the total number of chunks specified and the command.payload.instanceProcess.chunk will indicate the chunk number that this host should run. How this distribution information is interpreted is down to the databot itself.
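
Since the interpretation of chunks is left to the databot, one simple option is a round-robin split of the work items. The sketch below assumes chunk numbers start at 1, which this document does not confirm, and takes the chunk values as plain arguments because the way a databot reads them from its context is not described here. itemsForChunk is a hypothetical helper.

```javascript
// Hypothetical helper: select the items a given chunk should process.
// Assumes chunk numbers start at 1, which this README does not confirm.
function itemsForChunk(items, totalChunks, chunkNumber) {
  // Round-robin: the item at index i belongs to chunk (i % totalChunks) + 1.
  return items.filter((_, index) => (index % totalChunks) + 1 === chunkNumber);
}

const work = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];
console.log(itemsForChunk(work, 4, 1)); // [ 'a', 'e', 'i' ]
```

Every item lands in exactly one chunk, so the hosts running chunks 1 to 4 together cover the whole list without overlap.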

stop instance command

This command can be sent by the TDX to a databot host in response to a busy status update, informing the host that it should pause, resume or stop the instance it is currently running.

{
  commandId: "RIkd34pz",
  command: "stopInstance",
  payload: {
    mode: "pause" | "resume" | "stop"
  }
}

stop host command

This command can be sent by the TDX to a databot host as a response to any status update, instructing the host to exit.

{
  commandId: "RIkd34pz",
  command: "stopHost",
  payload: {
    mode: "stop"
  }
}

update host command

This command is sent by the TDX in response to any status update if the host software version is out of date with respect to that expected by the TDX.

{
  commandId: "RIkd34pz",
  command: "updateHost",
}