@edirect/trace v9.0.46 • Published 3 months ago • License: ISC

OpenTelemetry Trace

Background

The current solution, based on application logs and a sales checker service that triggers alerts on a Slack channel, offers some insight into the application but falls short when it comes to understanding exceptions and the overall business flow in depth.

Due to the sheer volume of logs and the complex integrations between the application's microservices, the development team can easily spend hours tracing a problem back to its origin.

  • Non-standard logs
    • Log format and content depend on each implementation, so keeping logs uniform is a challenge in itself.
  • Logs might be outdated or absent
    • As services shift and evolve, so should their logs. This is a source of inconsistency, since logs may not be updated, or may be wrongly removed, during feature implementation.
  • Logs are difficult to query and trace
    • Even with tools such as Elasticsearch, querying and tracing logs can be difficult because of the complex interactions among the application's services.
    • For example, when quote-engine returns an error response, we may only have the policy number to find the root cause. That data is not enough for a complete investigation, which can lead to lengthy research.
  • Not able to trace all exception occurrences
    • In the case described in the previous item, the error was not logged, or at least was not easily accessible.
  • Sales triggers are not flexible
    • Triggers are configured as code in the sales checker service. Any change to the logic must be implemented, tested and deployed.
  • Sales alerts do not provide insight into root causes
    • An alert contains only the basic information needed to start an investigation: the lead, partner, vertical and a brief description of the problem.
  • Limited business process visibility
    • Because the alerts are based on sale statuses, we have limited insight into the process. With such a limited view, we cannot tell which steps are bottlenecks or more prone to errors, information that would be useful for maintaining and improving the system.
  • No concise view of the business flow
    • Currently there are no dashboards or easily accessible views that provide correlated information about the many steps that make up the Bolttech business flow.

Detailed Design

To fill the gaps left by the current approach, the proposed solution must fulfill the following requirements:

  • Must require minimal code and service changes
  • Must be flexible and configurable
  • Must be capable of querying multiple data sources
  • Must be able to evolve and serve multiple types of users
  • Must have a dashboard that shows useful information about the business flow

Taking these requirements into consideration, the proposed solution employs an application monitoring tool, OpenTelemetry, together with a microservice responsible for correlating the data the tool captures and a comprehensive dashboard for end users.
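This document does not describe how the correlation microservice works. As a hypothetical sketch of its core idea, the TypeScript below groups captured spans by trace ID and sorts each group by start time, which is what lets one request be followed across services. The SpanRecord shape and correlateByTrace are assumptions for illustration and are not part of @edirect/trace.

```ts
// Hypothetical, simplified span record; real OpenTelemetry spans carry many more fields.
interface SpanRecord {
  traceId: string;
  spanId: string;
  service: string;
  name: string;
  startTimeMs: number;
}

// Groups spans that share a trace ID and sorts each group by start time,
// so a single business flow can be followed across microservices.
function correlateByTrace(spans: SpanRecord[]): Map<string, SpanRecord[]> {
  const byTrace = new Map<string, SpanRecord[]>();
  for (const span of spans) {
    const group = byTrace.get(span.traceId);
    if (group) {
      group.push(span);
    } else {
      byTrace.set(span.traceId, [span]);
    }
  }
  byTrace.forEach((group) => group.sort((a, b) => a.startTimeMs - b.startTimeMs));
  return byTrace;
}

// Usage: spans from two traces (illustrative data), in the order a collector might receive them.
const flows = correlateByTrace([
  { traceId: "t1", spanId: "b", service: "quote-engine", name: "POST /quote", startTimeMs: 20 },
  { traceId: "t2", spanId: "c", service: "auth", name: "POST /login", startTimeMs: 5 },
  { traceId: "t1", spanId: "a", service: "frontend-v2", name: "GET /checkout", startTimeMs: 10 },
]);
console.log(flows.get("t1")!.map((s) => s.service)); // [ 'frontend-v2', 'quote-engine' ]
```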


Usage

Sample application

NestJS Applications

NestJS applications should be upgraded to version 9.

  • Simply upgrading nest-app to version 9 enables the service to ship all of its logs for service visibility.

    Because @edirect/trace is already applied inside nest-app, applications using nest-app version 9 do not need any trace code of their own.

    If upgrading is not possible, follow the "Other" instructions and use @edirect/trace directly.

    For example, Integration Service could not be upgraded because it uses react-ssr, which is not compatible with NestJS 9. So even though it is a NestJS application, it follows the "Other" approach.

  • @edirect/nest-app:9

    1. Upgrade nest-app, all @edirect dependencies and all NestJS dependencies.

    2. Initialize the application with otel by preloading it through the --require flag of the node startup command:

          node --require ./node_modules/@edirect/trace/dist/otel.js dist/main

      (replace dist/main with your application entrypoint)

    3. Remove the old client initialization, since it now happens through the node --require preload.

      If the service was using an old trace version, remove:

      const config = require('config');
      const { services: { APM }, middlewares: traceMiddlewares } = require('@edirect/trace');
      const apmConfig = config.get('apm');
      
      if (apmConfig.enabled) {
          const apm = new APM(apmConfig.service, apmConfig.url, apmConfig.traceServer);
          apm.startTrace();
      }
    4. Remove the middleware usage

      If the service was using an old trace version, remove the following completely, as it now lives inside @edirect/trace:

      if (apmConfig.enabled) {
          app.use(new traceMiddlewares.Body().body);
      }
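With this setup, tracing depends entirely on the --require preload, so a startup command that drops the flag will run the service without instrumentation, which is easy to overlook. The sketch below is a hypothetical guard, not part of @edirect/trace or nest-app: hasOtelPreload scans a Node argument list such as process.execArgv for a preload ending in @edirect/trace/dist/otel.js and accepts the -r <path>, --require <path> and --require=<path> forms.

```ts
// Path suffix of the preload script used in the startup commands of this guide.
const OTEL_PRELOAD = "@edirect/trace/dist/otel.js";

// Hypothetical helper: returns true when the argument list preloads otel.js via -r / --require.
function hasOtelPreload(execArgv: string[], preload: string = OTEL_PRELOAD): boolean {
  for (let i = 0; i < execArgv.length; i++) {
    const arg = execArgv[i];
    let target: string | undefined;
    if (arg === "-r" || arg === "--require") {
      target = execArgv[i + 1]; // path passed as the next argument
    } else if (arg.startsWith("--require=")) {
      target = arg.slice("--require=".length); // path passed inline
    }
    // Normalize Windows separators before comparing the suffix.
    if (target !== undefined && target.replace(/\\/g, "/").endsWith(preload)) {
      return true;
    }
  }
  return false;
}

console.log(hasOtelPreload(["--require", "./node_modules/@edirect/trace/dist/otel.js"])); // true
console.log(hasOtelPreload(["--inspect"])); // false
```

In main.ts you could call hasOtelPreload(process.execArgv) and log a warning when it returns false; whether a warning or a hard failure is appropriate is a team decision.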

NextJS Applications

NextJS applications should be upgraded to version 12 or above.

  1. Install the trace package:

    npm install --save @edirect/trace
  2. Create a middleware

    • Upgrading the NextJS application to version 12 or above lets you use middleware.

      Create a file named `middleware.ts` (or `.js`) inside the `src` folder:
      
      ```ts
      import { NextResponse } from 'next/server';
      import type { NextRequest } from 'next/server';
      import NextJsTracer from '@edirect/trace/dist/middlewares/nextjs';
      
      // This function can be marked `async` if using `await` inside
      export function middleware(request: NextRequest) {
        // @edirect/trace: trace the incoming request
        new NextJsTracer().body(request as unknown as Request & Record<string, unknown>);
        // Let the request continue to its route
        return NextResponse.next();
      }
      ```
  3. Install dotenv and create a dotenv.config.js file:

    npm install --save dotenv

    dotenv.config.js

    const dotenv = require('dotenv');
    const nodeEnv = process.env.NODE_ENV || 'development';
    
    dotenv.config({ path: `.${nodeEnv}.env` });
  4. Add the environment variables to your NextJS app. Example .development.env:

    ## OPEN TELEMETRY
    TRACE_TAG_OWNER=OWNER
    TRACE_TAG_SCOPE=ie
    TRACE_TAG_SERVICENAME=frontend-v2
    TRACE_TAG_TENANT={STAGE}-{INSTANCE}
    TRACE_SERVER_URL={URL}
    TRACE_TAG_SERVICE=frontend-v2-{STAGE}-{INSTANCE}
    TRACE_TAG_VERSION=1.0.0
    TRACE_TAG_CLUSTER=stag-ie
    TRACE_TAG_ENV=ie-stag-broker
  5. Serve your production build in standalone mode by adding this setting to your next.config.js file:

    module.exports = {
        // ...myOtherSettings
        output: 'standalone'
    }
  6. Change the dev and start scripts in package.json:

    {
        "scripts": {
            "dev": "cross-env NODE_ENV=development node --require ./dotenv.config.js  --require ./node_modules/@edirect/trace/dist/otel.js ./node_modules/next/dist/bin/next -p 3100",
            "start": "cross-env NODE_ENV=production node --require ./dotenv.config.js --require ./node_modules/@edirect/trace/dist/otel.js ./build/client/standalone/server.js -p 3100"
        }
    }
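The order of the two preloads in these scripts presumably matters: dotenv.config.js must run first so the TRACE_* variables from .<NODE_ENV>.env are in process.env before otel.js reads them, and otel.js must load before Next.js, because OpenTelemetry's Node instrumentation can only patch modules that are required after it is registered. Under those assumptions, the hypothetical helpers below mirror the env-file naming in dotenv.config.js and assemble the node arguments in that order.

```ts
// Mirrors dotenv.config.js: NODE_ENV selects the file, defaulting to development.
function envFileFor(nodeEnv: string | undefined): string {
  return `.${nodeEnv || "development"}.env`;
}

// Hypothetical helper: builds node arguments with the preloads in the order the scripts use.
function buildNodeArgs(entrypoint: string, appArgs: string[] = []): string[] {
  const preloads = [
    "./dotenv.config.js", // 1. load TRACE_* variables into process.env
    "./node_modules/@edirect/trace/dist/otel.js", // 2. start OpenTelemetry before the app loads
  ];
  const args: string[] = [];
  for (const p of preloads) {
    args.push("--require", p);
  }
  return [...args, entrypoint, ...appArgs];
}

console.log(envFileFor("production")); // .production.env
console.log(envFileFor(undefined)); // .development.env
console.log(buildNodeArgs("./build/client/standalone/server.js", ["-p", "3100"]).join(" "));
// --require ./dotenv.config.js --require ./node_modules/@edirect/trace/dist/otel.js ./build/client/standalone/server.js -p 3100
```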

Other (Not nest-app:9)

Use this approach if you are unable to upgrade to nest-app 9, or if your service is not a NestJS application (for example, plain Express).

  • @edirect/trace

    Using @edirect/trace directly requires you to inject its middleware yourself so that all requests to the app are logged.

    1. Install the trace package

      npm install --save @edirect/trace
    2. Import the trace package

      import { middlewares as traceMiddlewares } from '@edirect/trace';
    3. Add the trace middleware to your application routes

      Express:

      app.use(new traceMiddlewares.Body().body);

      NestJS with nest-app (inside your module's configure method):

      consumer
          .apply(new traceMiddlewares.Body().body)
          .forRoutes({path: '/**', method: RequestMethod.ALL});
    4. Initialize the application with otel by preloading it through the --require flag of the node startup command:

      node --require ./node_modules/@edirect/trace/dist/otel.js dist/main

      (replace dist/main with your application entrypoint)

    5. Remove the old client initialization, since it now happens through the node --require preload.

      If the service was using an old trace version, remove:

      const config = require('config');
      const { services: { APM }, middlewares: traceMiddlewares } = require('@edirect/trace');
      const apmConfig = config.get('apm');
      
      if (apmConfig.enabled) {
          const apm = new APM(apmConfig.service, apmConfig.url, apmConfig.traceServer);
          apm.startTrace();
      }
    6. Clean up the middleware usage

      If the service was using an old trace version, replace:

      if (apmConfig.enabled) {
          app.use(new traceMiddlewares.Body().body);
      }

      with:

      app.use(new traceMiddlewares.Body().body);
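The guide does not describe what traceMiddlewares.Body does internally. To show the general shape of an Express-style (req, res, next) middleware that records request data, here is a dependency-free sketch with minimal local types and a hypothetical recordAttributes callback standing in for a real span API. It is not the @edirect/trace implementation, and the attribute names are illustrative only.

```ts
// Minimal local stand-ins for Express request/response types (assumption, to stay dependency-free).
interface RequestLike {
  method: string;
  url: string;
  body?: unknown;
}
type NextFn = (err?: unknown) => void;
type Middleware = (req: RequestLike, res: unknown, next: NextFn) => void;

// Hypothetical factory: builds a middleware that hands request data to a recorder
// (for example, something that sets span attributes) and then passes control on.
function createBodyTracer(recordAttributes: (attrs: Record<string, string>) => void): Middleware {
  return (req, _res, next) => {
    try {
      recordAttributes({
        "http.method": req.method,
        "http.url": req.url,
        "http.request.body": req.body === undefined ? "" : JSON.stringify(req.body),
      });
    } catch (_err) {
      // Tracing must never break the request, so recorder errors are ignored here.
    }
    next(); // always continue the middleware chain
  };
}

// Usage with a fake request and an in-memory recorder.
const recorded: Record<string, string>[] = [];
const traceBody = createBodyTracer((attrs) => recorded.push(attrs));
let nextCalled = false;
traceBody({ method: "POST", url: "/quote", body: { policy: "P-1" } }, {}, () => {
  nextCalled = true;
});
console.log(recorded[0]["http.request.body"], nextCalled); // {"policy":"P-1"} true
```

Two design choices carry over to any such middleware: a failure while recording must not break the request, and next() must always be called, otherwise the request never reaches its route handler.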

Applying k8s configurations (Required)

We created a standard way to deploy application configurations that simplifies managing them:

  • In some cases another team may have already added the configuration; if so, you only need to generate and apply it.
  • bolttech-broker-asia\staging\config-map.yaml (or your environment config-map)

    auth: #or your service
        trace:
            tags:
                scope: ie #your tech center
                env: ie-stag-broker #current k8s cluster environment
                cluster: stag-ie #current k8s cluster
                tenant: ${namespace}
                service: ${name}-${namespace}
                servicename: ${name}
                version: "1.0.0"
                owner: OWNER
                server: opentelemetry-collector.stag.bolttechbroker.net #cluster opentelemetry server
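The tag names in the config-map line up with the TRACE_* variables used elsewhere in this guide (for example, servicename and TRACE_TAG_SERVICENAME). Assuming the k8s templates perform that mapping and substitute ${name} and ${namespace} with the service and namespace names (the templates themselves are not shown here), the sketch below reproduces the expected result, which can help when comparing a deployed pod's environment with local settings. toTraceEnv is a hypothetical helper.

```ts
// Trace settings as written in config-map.yaml; values may contain ${name}/${namespace}.
type TraceTags = Record<string, string>;

// Assumed mapping: "server" -> TRACE_SERVER_URL, every other key -> TRACE_TAG_<KEY>.
function toTraceEnv(tags: TraceTags, vars: Record<string, string>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of Object.keys(tags)) {
    // Replace ${var} placeholders; unknown placeholders are left untouched.
    const value = tags[key].replace(/\$\{(\w+)\}/g, (match: string, v: string) =>
      Object.prototype.hasOwnProperty.call(vars, v) ? vars[v] : match,
    );
    const envName = key === "server" ? "TRACE_SERVER_URL" : `TRACE_TAG_${key.toUpperCase()}`;
    env[envName] = value;
  }
  return env;
}

const env = toTraceEnv(
  { tenant: "${namespace}", service: "${name}-${namespace}", servicename: "${name}", scope: "ie" },
  { name: "auth", namespace: "stage1-vnbroker" },
);
console.log(env.TRACE_TAG_SERVICE); // auth-stage1-vnbroker
```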

Then generate and apply the new configurations:

  1. Generate

    This command will generate the new configuration files for the services.

    # Staging Clusters
    edi infra bolttech-broker-asia k8s generate <ENVIRONMENT> <SERVICE>
    edi infra bolttech-broker-asia k8s generate stage1-vnbroker rules-engine # VN Example
    
    # Production Clusters
    edi infra <PRODUCTION_BROKER> k8s generate <ENVIRONMENT> <SERVICE> --prod
    edi infra bolttech-broker-asia-hk k8s generate live-hkbroker-a policy-issuing-service --prod # HK Example
  2. Apply

    This command will apply the new configurations and redeploy the service on the cluster.

    # Staging Clusters
    edi infra bolttech-broker-asia k8s apply staging <STAGE/RC> <SERVICE> --redeploy
    edi infra bolttech-broker-asia k8s apply staging rc-vnbroker frontend-service --redeploy # VN Example
    
    # Production Clusters
    edi infra <PRODUCTION_BROKER> k8s apply <CLUSTER> <ENVIRONMENT> <SERVICE> --prod --redeploy
    edi infra bolttech-broker-asia-hk k8s apply cluster-a live-hkbroker-a plan-service --prod --redeploy # HK Example
  • Key notes when using the infrastructure repositories:
    • Before running any command make sure you have completed the setup of the edi-cli tool and that you have all the related projects (infrastructure and k8s-templates) inside the same root folder.
    • Make sure you have the latest version of the master branch before running any commands.
    • Remember to ALWAYS COMMIT AND PUSH YOUR CHANGES TO THE REPOSITORIES, OTHERWISE THE NEW CONFIGURATIONS WILL NOT BE APPLIED TO SERVICES WHEN THEY ARE DEPLOYED USING THE JENKINS PIPELINES.
    • To make things easier, check the helper script in the edi-cli project, located in the edi-cli-cli folder.
    • Consider hiding the folders of the brokers you will not need to work with. This will greatly improve your navigation inside the project.

Debugging

  • How to debug in VS Code:

    Because the application must be started with extra --require preloads rather than a plain node command, we suggest adding this configuration to your .vscode/launch.json:

    {
        "version": "0.2.0",
        "configurations": [
            {
                "type": "node",
                "request": "launch",
                "name": "Debug with OTEL",
                "args": ["${workspaceFolder}/src/main.ts"],
                "runtimeArgs": [
                    "--inspect",
                    "-r",
                    "./node_modules/@edirect/trace/dist/otel.js",
                    "-r",
                    "tsconfig-paths/register",
                    "-r",
                    "ts-node/register"
                ],
                "console": "integratedTerminal",
                "envFile": "${workspaceFolder}/.development.env"
            }
        ]
    }

    Then append the OTEL environment variables to your local environment file (the envFile set in launch.json):

    TRACE_TAG_OWNER=OWNER
    TRACE_TAG_SCOPE=pgw
    TRACE_TAG_SERVICENAME=payment-gateway
    TRACE_TAG_TENANT=local-paythbroker
    TRACE_SERVER_URL=opentelemetry-collector.stag.bolttechpay.net
    TRACE_TAG_SERVICE=payment-gateway-local-paythbroker
    TRACE_TAG_VERSION=1.0.0
    TRACE_TAG_CLUSTER=localhost
    TRACE_TAG_ENV=development
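A missing or misspelled variable is an easy mistake when copying this list between environment files. As a sketch, the hypothetical checker below reports which of the nine variables shown above are missing or blank in a given environment object; at startup you could pass it process.env. It does not reflect any validation that @edirect/trace itself may perform.

```ts
// The trace variables used throughout this guide.
const REQUIRED_TRACE_VARS = [
  "TRACE_TAG_OWNER",
  "TRACE_TAG_SCOPE",
  "TRACE_TAG_SERVICENAME",
  "TRACE_TAG_TENANT",
  "TRACE_SERVER_URL",
  "TRACE_TAG_SERVICE",
  "TRACE_TAG_VERSION",
  "TRACE_TAG_CLUSTER",
  "TRACE_TAG_ENV",
] as const;

// Hypothetical helper: returns the names of required variables that are unset or blank.
function missingTraceVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_TRACE_VARS.filter((name) => (env[name] || "").trim() === "");
}

// Usage: the local values above, with TRACE_TAG_ENV deliberately left out.
const localEnv = {
  TRACE_TAG_OWNER: "OWNER",
  TRACE_TAG_SCOPE: "pgw",
  TRACE_TAG_SERVICENAME: "payment-gateway",
  TRACE_TAG_TENANT: "local-paythbroker",
  TRACE_SERVER_URL: "opentelemetry-collector.stag.bolttechpay.net",
  TRACE_TAG_SERVICE: "payment-gateway-local-paythbroker",
  TRACE_TAG_VERSION: "1.0.0",
  TRACE_TAG_CLUSTER: "localhost",
};
console.log(missingTraceVars(localEnv)); // [ 'TRACE_TAG_ENV' ]
```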