insta-integration - Integration Testing

Automated integration tests for any application/job.

  • Spin up any external services
  • Generate production-like data
  • Run data validations to ensure application/job works as expected

Problems it can help with:

  • Unreliable test environments
  • Dependencies on other teams
  • Complex data flows that are hard to simulate

Usage

Get Token

To use this GitHub Action, you need to get a username and token from data-catering.

CLI

  1. Install via npm install -g insta-integration
  2. Create YAML file insta-integration.yaml to define your integration tests (a minimal skeleton is sketched after this list)

    1. Examples can be found here.
    2. Use the JSON schema to guide you on the available options
  3. Run insta-integration
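
A minimal insta-integration.yaml needs only the services to spin up and at least one command to run; the Simple Example further below builds on this skeleton with data generation and validation (the script path is a placeholder):

services: [] #no external services required
run:
  - command: ./my-app/run-app.sh #placeholder script that runs your application/job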

GitHub Action

  1. Create YAML file .github/workflows/integration-test.yaml

    name: Integration Test
    on:
      push:
        branches:
          - '*'
    jobs:
      integration-test:
        name: Integration Test
        runs-on: ubuntu-latest
        steps:
          - name: Run integration tests
            uses: data-catering/insta-integration@v2
            with:
              data_caterer_user: ${{ secrets.DATA_CATERER_USER }}
              data_caterer_token: ${{ secrets.DATA_CATERER_TOKEN }}
  2. Create YAML file insta-integration.yaml to define your integration tests

    1. Examples can be found here.
    2. Use the JSON schema to guide you on the available options
  3. Push your code and the GitHub Action will run

Services

The following services are available to run alongside your application/job.

| Service Type | Service |
| --- | --- |
| Change Data Capture | debezium |
| Database | cassandra |
| Database | cockroachdb |
| Database | elasticsearch |
| Database | mariadb |
| Database | mongodb |
| Database | mssql |
| Database | mysql |
| Database | neo4j |
| Database | postgres |
| Database | spanner |
| Database | sqlite |
| Database | opensearch |
| Data Catalog | marquez |
| Data Catalog | unitycatalog |
| Data Catalog | amundsen |
| Data Catalog | datahub |
| Data Catalog | openmetadata |
| Distributed Coordination | zookeeper |
| Distributed Data Processing | flink |
| HTTP | httpbin |
| Identity Management | keycloak |
| Job Orchestrator | airflow |
| Job Orchestrator | dagster |
| Job Orchestrator | mage-ai |
| Job Orchestrator | prefect |
| Messaging | activemq |
| Messaging | kafka |
| Messaging | rabbitmq |
| Messaging | solace |
| Notebook | jupyter |
| Object Storage | minio |
| Query Engine | duckdb |
| Query Engine | flight-sql |
| Query Engine | presto |
| Query Engine | trino |
| Real-time OLAP | clickhouse |
| Real-time OLAP | doris |
| Real-time OLAP | druid |
| Real-time OLAP | pinot |
| Test Data Management | data-caterer |
| Workflow | temporal |
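
Services are declared by name in insta-integration.yaml, optionally with a data folder for initial setup such as schemas or topics (see the Full Example below). A minimal sketch, assuming your setup scripts live in my-data/sql, that runs postgres and kafka side by side:

services:
  - name: postgres
    data: my-data/sql #initial schema/table setup scripts
  - name: kafka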

Generation and Validation

insta-integration uses data-caterer behind the scenes for data generation and validation, so check the following pages to discover what options are available.

Data Sources

The following data sources are available to generate/validate data.

| Data Source Type | Data Source |
| --- | --- |
| Cloud Storage | AWS S3 |
| Cloud Storage | Azure Blob Storage |
| Cloud Storage | GCP Cloud Storage |
| Database | Cassandra |
| Database | MySQL |
| Database | Postgres |
| Database | Elasticsearch |
| Database | MongoDB |
| Database | Opensearch |
| File | CSV |
| File | Delta Lake |
| File | JSON |
| File | Iceberg |
| File | ORC |
| File | Parquet |
| File | Hudi |
| HTTP | REST API |
| Messaging | Kafka |
| Messaging | Solace |
| Messaging | ActiveMQ |
| Messaging | Pulsar |
| Messaging | RabbitMQ |
| Metadata | Data Contract CLI |
| Metadata | Great Expectations |
| Metadata | Marquez |
| Metadata | OpenAPI/Swagger |
| Metadata | OpenMetadata |
| Metadata | Open Data Contract Standard (ODCS) |
| Metadata | Amundsen |
| Metadata | Datahub |
| Metadata | Solace Event Portal |
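
Each data source maps to a key under generation or validation in insta-integration.yaml, as parquet, csv and postgres do in the examples below. As a hedged sketch, assuming file-based sources are keyed by their format name, a JSON file source might look like:

generation:
  json:
    - options:
        path: /tmp/json/accounts #hypothetical output path
      fields:
        - name: account_id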

Examples

Simple Example
services: []
run:
  - command: ./my-app/run-app.sh
    test:
      generation:
        parquet:
          - options:
              path: /tmp/parquet/accounts
            fields:
              - name: account_id
      validation:
        parquet:
          - options:
              path: /tmp/parquet/accounts
            validations:
              - expr: ISNOTNULL(account_id)
              - aggType: count
                aggExpr: count == 1000
Full Example
services:
  - name: postgres #define external services
    data: my-data/sql #initial service setup (i.e. schema/tables, topics, queues)
run:
  - command: ./my-app/run-postgres-extract-app.sh #how to run your application/job
    env: #environment variables for your application/job
      POSTGRES_URL: jdbc:postgresql://postgres:5432/docker
    test:
      env: #environment variables for data generation/validation
        POSTGRES_URL: jdbc:postgresql://postgres:5432/docker
      mount: #volume mount for data validation
        - ${PWD}/example/my-app/shared/generated:/opt/app/shared/generated
      relationship: #generate data with same values used across different data sources
        postgres_balance.account_number: #ensure account_number in balance table exists when transaction created
          - postgres_transaction.account_number
      generation: #define data sources for data generation
        postgres:
          - name: postgres_transaction #give it a name to use in relationship definition
            options: #configuration on specific data source
              dbtable: account.transactions
            count: #how many records to generate (1,000 by default)
              perField: #generate 5 records per account_number
                fieldNames: [account_number]
                count: 5
            fields: #fields of the data source
              - name: account_number #default data type is string
              - name: create_time
                type: timestamp
              - name: transaction_id
              - name: amount
                type: double
          - name: postgres_balance
            options:
              dbtable: account.balances
            fields:
              - name: account_number
                options: #additional metadata for data generation
                  isUnique: true
                  regex: ACC[0-9]{10}
              - name: create_time
                type: timestamp
              - name: account_status
                options:
                  oneOf: [open, closed]
              - name: balance
                type: double
      validation:
        csv: #define data source for data validations
          - options:
              path: /opt/app/shared/generated/balances.csv
              header: true
            validations: #list of validations to run, can be basic SQL, aggregations, upstream data source or column name validations
              - expr: ISNOTNULL(account_number)
              - aggType: count
                aggExpr: count == 1000
          - options:
              path: /opt/app/shared/generated/transactions.csv
              header: true
            validations:
              - expr: ISNOTNULL(account_number)
              - aggType: count
                aggExpr: count == 5000
              - groupByCols: [account_number]
                aggType: count
                aggExpr: count == 5

GitHub Action Options

Input

Optional configurations to alter the files and folders used by the GitHub Action can be found below.

| Name | Description | Default |
| --- | --- | --- |
| configuration_file | File path to configuration file | insta-integration.yaml |
| insta_infra_folder | Folder path to insta-infra (this repository) | ${HOME}/.insta-integration/insta-infra |
| base_folder | Folder path to use for execution files | ${HOME}/.insta-integration |
| data_caterer_version | Version of data-caterer Docker image | 0.15.2 |
| data_caterer_user | User for data-caterer. If you don't have one yet, create one here | |
| data_caterer_token | Token for data-caterer. If you don't have one yet, create one here | |

To use these configurations, alter your .github/workflows/integration-test.yaml.

name: Integration Test
on:
  push:
    branches:
      - '*'
jobs:
  integration-test:
    name: Integration Test
    runs-on: ubuntu-latest
    steps:
      - name: Run integration tests
        uses: data-catering/insta-integration@v2
        with:
          configuration_file: my/custom/folder/insta-integration.yaml
          insta_infra_folder: insta-infra/folder
          base_folder: execution/folder
          data_caterer_version: 0.15.2
          data_caterer_user: ${{ secrets.DATA_CATERER_USER }}
          data_caterer_token: ${{ secrets.DATA_CATERER_TOKEN }}

Output

If you want to use the output of the GitHub Action, the following attributes are available:

| Name | Description |
| --- | --- |
| num_records_generated | Total number of records generated. |
| num_success_validations | Total number of successful validations. |
| num_failed_validations | Total number of failed validations. |
| num_validations | Total number of validations. |
| validation_success_rate | Success rate of validations (e.g. 0.75 = 75% success rate). |
| full_result | All result details as JSON (data generation and validation). |

For example, you can print out the results as shown below:

- name: Run integration tests
  id: test-action
  uses: data-catering/insta-integration@v2
- name: Print Output
  id: output
  run: |
    echo "Records generated:         ${{ steps.test-action.outputs.num_records_generated }}"
    echo "Successful validations:    ${{ steps.test-action.outputs.num_success_validations }}"
    echo "Failed validations:        ${{ steps.test-action.outputs.num_failed_validations }}"
    echo "Number of validations:     ${{ steps.test-action.outputs.num_validations }}"
    echo "Validation success rate:   ${{ steps.test-action.outputs.validation_success_rate }}"

JSON Schema for insta-integration.yaml

A JSON Schema has been created to guide users on what is possible in insta-integration.yaml. The links below show how you can import the schema in your favourite IDE:
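
For example, editors backed by the YAML Language Server (such as VS Code with the Red Hat YAML extension) can pick the schema up from a modeline at the top of insta-integration.yaml. This is a sketch: the raw URL assumes the schema file lives on the repository's main branch:

# yaml-language-server: $schema=https://raw.githubusercontent.com/data-catering/insta-integration/main/schema/insta-integration-config-latest.json
services: []
run:
  - command: ./my-app/run-app.sh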

Validate JSON Schema

The following commands use the ajv CLI tool.

Validate the JSON Schema:

ajv compile --spec=draft2019 -s schema/insta-integration-config-latest.json

Validate an insta-integration.yaml file:

ajv validate --spec=draft2019 -s schema/insta-integration-config-latest.json -d example/postgres-to-csv.yaml

Example Flows

Examples can be found here.
