@aws-solutions-constructs/aws-kinesisstreams-gluejob v2.74.0

Weekly downloads: 119
License: Apache-2.0
Repository: github
Last release: 9 months ago

aws-kinesisstreams-gluejob module


Stability: Experimental

All classes are under active development and subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.


Reference Documentation: https://docs.aws.amazon.com/solutions/latest/constructs/

| Language | Package |
|---|---|
| Python | aws_solutions_constructs.aws_kinesis_streams_gluejob |
| TypeScript | @aws-solutions-constructs/aws-kinesisstreams-gluejob |
| Java | software.amazon.awsconstructs.services.kinesisstreamsgluejob |

Overview

This AWS Solutions Construct deploys a Kinesis Stream and configures an AWS Glue Job to perform custom ETL transformation, with the appropriate resources and properties for interaction and security. It also creates an S3 bucket where the Python script for the AWS Glue Job can be uploaded.

Here is a minimal deployable pattern definition:

TypeScript

// Version 2.x of this package builds on CDK v2, which ships its modules inside aws-cdk-lib.
import * as glue from "aws-cdk-lib/aws-glue";
import * as s3assets from "aws-cdk-lib/aws-s3-assets";
import { KinesisstreamsToGluejob } from "@aws-solutions-constructs/aws-kinesisstreams-gluejob";

const fieldSchema: glue.CfnTable.ColumnProperty[] = [
  {
    name: "id",
    type: "int",
    comment: "Identifier for the record",
  },
  {
    name: "name",
    type: "string",
    comment: "Name for the record",
  },
  {
    name: "address",
    type: "string",
    comment: "Address for the record",
  },
  {
    name: "value",
    type: "int",
    comment: "Value for the record",
  },
];

// `this` is the enclosing Stack or Construct in which the pattern is instantiated.
const customEtlJob = new KinesisstreamsToGluejob(this, "CustomETL", {
  glueJobProps: {
    command: {
      name: "gluestreaming",
      pythonVersion: "3",
    },
  },
  fieldSchema: fieldSchema,
  etlCodeAsset: new s3assets.Asset(this, "ScriptLocation", {
    path: `${__dirname}/../etl/transform.py`,
  }),
});

Pattern Construct Props

| Name | Type | Description |
|---|---|---|
| existingStreamObj? | kinesis.Stream | Existing instance of a Kinesis Stream. Providing both this and kinesisStreamProps will cause an error. |
| kinesisStreamProps? | kinesis.StreamProps | Optional user-provided props to override the default props for the Kinesis Stream. |
| glueJobProps? | cfnJob.CfnJobProps | User-provided props to override the default props for the AWS Glue Job. |
| existingGlueJob? | cfnJob.CfnJob | Existing instance of an AWS Glue Job. Providing both this and glueJobProps will cause an error. |
| fieldSchema? | CfnTable.ColumnProperty[] | User-provided schema structure to create an AWS Glue Table. |
| existingTable? | CfnTable | Existing instance of an AWS Glue Table. If this is set, tableProps and fieldSchema are ignored. |
| tableProps? | CfnTableProps | User-provided AWS Glue Table props to override the default props used to create a Glue Table. |
| existingDatabase? | CfnDatabase | Existing instance of an AWS Glue Database. If this is set, databaseProps is ignored. |
| databaseProps? | CfnDatabaseProps | User-provided Glue Database props to override the default props used to create the Glue Database. |
| outputDataStore? | SinkDataStoreProps | User-provided properties for the S3 bucket that stores the Glue Job output. S3 is currently the only supported datastore type. |
| createCloudWatchAlarms? | boolean | Whether to create the recommended CloudWatch alarms for the Kinesis Data Stream. Defaults to true. |
| etlCodeAsset? | s3assets.Asset | User-provided instance of the Asset class that represents the ETL code on the local filesystem. |
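
The table above describes two kinds of interaction between props: pairs that cause an error when both are supplied, and props that are ignored once an existing resource is passed in. The following is a minimal sketch of those rules only. It is not the construct's own validation code; `PropsSketch`, `PropsCheck`, and `checkProps` are hypothetical names, and the real CDK types are replaced by `unknown` so the sketch runs without the CDK installed.

```typescript
// Hypothetical stand-in for a subset of the construct props. The real
// property types (kinesis.Stream, CfnTable, ...) are replaced by `unknown`.
interface PropsSketch {
  existingStreamObj?: unknown;
  kinesisStreamProps?: unknown;
  existingGlueJob?: unknown;
  glueJobProps?: unknown;
  existingTable?: unknown;
  tableProps?: unknown;
  fieldSchema?: unknown;
  existingDatabase?: unknown;
  databaseProps?: unknown;
}

interface PropsCheck {
  errors: string[];  // combinations the table describes as errors
  ignored: string[]; // props the table describes as ignored
}

function checkProps(props: PropsSketch): PropsCheck {
  const isSet = (key: keyof PropsSketch): boolean => props[key] !== undefined;
  const result: PropsCheck = { errors: [], ignored: [] };

  // Mutually exclusive pairs: supplying both is an error.
  const exclusivePairs: [keyof PropsSketch, keyof PropsSketch][] = [
    ["existingStreamObj", "kinesisStreamProps"],
    ["existingGlueJob", "glueJobProps"],
  ];
  for (const [existing, overrides] of exclusivePairs) {
    if (isSet(existing) && isSet(overrides)) {
      result.errors.push(`${existing} and ${overrides} cannot both be provided`);
    }
  }

  // An existing resource takes precedence; the related props are ignored.
  if (isSet("existingTable")) {
    for (const key of ["tableProps", "fieldSchema"] as const) {
      if (isSet(key)) {
        result.ignored.push(key);
      }
    }
  }
  if (isSet("existingDatabase") && isSet("databaseProps")) {
    result.ignored.push("databaseProps");
  }
  return result;
}
```

Calling `checkProps` with both `existingStreamObj` and `kinesisStreamProps` set reports one error, while passing `existingTable` together with `fieldSchema` reports no error and lists `fieldSchema` as ignored.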

SinkDataStoreProps

| Name | Type | Description |
|---|---|---|
| existingS3OutputBucket? | Bucket | Existing instance of the S3 bucket where the data should be written. Providing both this and outputBucketProps will cause an error. |
| outputBucketProps | BucketProps | User-provided bucket properties to create the S3 bucket that stores the output from the AWS Glue Job. |
| datastoreType | SinkStoreType | Sink data store type. |

SinkStoreType

Enumeration of data store types that could include S3, DynamoDB, DocumentDB, RDS, or Redshift. The current construct implementation supports only S3, but other output types may be added in the future.

| Name | Type | Description |
|---|---|---|
| S3 | string | S3 storage type |

Pattern Properties

| Name | Type | Description |
|---|---|---|
| kinesisStream | kinesis.Stream | Returns an instance of the Kinesis Stream created or used by the pattern. |
| glueJob | CfnJob | Returns an instance of the AWS Glue Job created by the construct. |
| glueJobRole | iam.Role | Returns an instance of the IAM Role created by the construct for the Glue Job. |
| database | CfnDatabase | Returns an instance of the AWS Glue Database created by the construct. |
| table | CfnTable | Returns an instance of the AWS Glue Table created by the construct. |
| outputBucket? | s3.Bucket | Returns an instance of the output bucket created by the construct for the AWS Glue Job. |
| cloudwatchAlarms? | cloudwatch.Alarm[] | Returns an array of the recommended CloudWatch Alarms created by the construct for the Kinesis Data Stream. |

Default settings

Without any overrides, the out-of-the-box implementation of the construct sets the following defaults:

Amazon Kinesis Stream

  • Configure least privilege access IAM role for Kinesis Stream
  • Enable server-side encryption for Kinesis Stream using AWS Managed KMS Key
  • Deploy best practices CloudWatch Alarms for the Kinesis Stream

Glue Job

  • Create a Glue Security Config that configures encryption for CloudWatch, Job Bookmarks, and S3. CloudWatch and Job Bookmarks are encrypted using AWS Managed KMS Key created for AWS Glue Service. The S3 bucket is configured with SSE-S3 encryption mode
  • Configure service role policies that allow AWS Glue to read from Kinesis Data Streams

Glue Database

  • Create an AWS Glue database. An AWS Glue Table will be added to the database. This table defines the schema for the records buffered in the Amazon Kinesis Data Streams

Glue Table

  • Create an AWS Glue table. The table schema definition is based on the JSON structure of the records buffered in the Amazon Kinesis Data Streams

IAM Role

  • A job execution role that has privileges to 1) read the ETL script from the S3 bucket location, 2) read records from the Kinesis Stream, and 3) execute the Glue Job

Output S3 Bucket

  • An S3 bucket to store the output of the ETL transformation. This bucket will be passed as an argument to the created glue job so that it can be used in the ETL script to write data into it
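
As an illustration of how a bucket location can reach an ETL script, the sketch below builds a Glue-style argument map. Glue job arguments are key/value strings whose keys start with `--`. The key name `--output_path`, the default `output` prefix, and the function name `buildOutputArguments` are assumptions made for this example; the source does not state which argument name the construct uses, so check the synthesized template for the real one.

```typescript
// Builds a Glue-style default-arguments map for an output location.
// ASSUMPTION: the key "--output_path" and the "s3://<bucket>/<prefix>/"
// layout are hypothetical and chosen only for illustration.
function buildOutputArguments(
  bucketName: string,
  prefix: string = "output",
): Record<string, string> {
  // Trim leading and trailing slashes so the path has exactly one separator.
  const cleanPrefix = prefix.replace(/^\/+|\/+$/g, "");
  const path =
    cleanPrefix.length > 0
      ? `s3://${bucketName}/${cleanPrefix}/`
      : `s3://${bucketName}/`;
  return { "--output_path": path };
}
```

The ETL script would then read the value of that argument at run time and write its transformed records under the returned path.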

CloudWatch Alarms

  • A CloudWatch Alarm to report when the consumer application is reading data slower than expected
  • A CloudWatch Alarm to report when consumer record processing is falling behind (to avoid the risk of data loss due to record expiration)
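
The reasoning behind the second alarm can be sketched as a simple comparison: a Kinesis record that is not read within the stream's retention period (24 hours by default) expires, so a consumer whose oldest unread record approaches that age is at risk of losing data. The function below is a hedged illustration of that check, not the alarm definition the construct deploys; `isFallingBehind` and the 50% threshold are assumptions made for this example.

```typescript
// Kinesis Data Streams retains records for 24 hours by default.
const DEFAULT_RETENTION_HOURS = 24;

// Returns true when the age of the oldest unread record has reached the
// given fraction of the retention period.
// ASSUMPTION: alarming at 50% of the retention period is an illustrative
// choice, not the threshold used by the construct.
function isFallingBehind(
  iteratorAgeMs: number,
  retentionHours: number = DEFAULT_RETENTION_HOURS,
  alarmFraction: number = 0.5,
): boolean {
  const retentionMs = retentionHours * 60 * 60 * 1000;
  return iteratorAgeMs >= retentionMs * alarmFraction;
}
```

With the defaults, a consumer that is one hour behind is not flagged, while one that is thirteen hours behind is, because it has passed half of the 24-hour retention window.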

Architecture

Architecture Diagram

Reference Implementation

A sample use case which uses this pattern is available under use_cases/aws-custom-glue-etl.

© Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
