
Serverless Glue

Serverless-glue is an open source, MIT-licensed project that has been able to grow thanks to the community. It is the result of an idea that refused to be forgotten, and of many hours of after-hours work.

Install

  1. Run npm install --save-dev @lcylwik/serverless-glue
  2. Add @lcylwik/serverless-glue to the plugins section of serverless.yml:
    plugins:
        - "@lcylwik/serverless-glue"

How it works

The plugin builds CloudFormation resources from your configuration and adds them to the serverless template before the deploy runs.

So any Glue job deployed with this plugin is part of your stack too.
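
For orientation, a minimal serverless.yml wired up with the plugin could look roughly like the sketch below. The service name, bucket, script path, and role ARN are placeholders, and the full set of options is described in the following sections.

  service: my-glue-service                  # placeholder service name

  provider:
    name: aws
    region: us-east-1

  plugins:
    - "@lcylwik/serverless-glue"

  Glue:
    bucketDeploy: my-deploy-bucket          # placeholder: an existing S3 bucket you control
    jobs:
      - name: example-job                   # placeholder job name
        scriptPath: src/example_script.py   # placeholder local script path
        type: spark
        glueVersion: python3-2.0
        role: arn:aws:iam::123456789012:role/example-glue-role # placeholder role ARN

On serverless deploy, the Glue block above is turned into Glue resources in the generated CloudFormation template, alongside the rest of your stack.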

How to configure your GlueJob(s)

Configure your Glue jobs in the root of serverless.yml like this:

Glue:
  bucketDeploy: someBucket # Required
  createBucket: true # Optional, default = false
  createBucketConfig: # Optional 
    ACL: private # Optional, private | public-read | public-read-write | authenticated-read
    LocationConstraint: af-south-1
    GrantFullControl: 'STRING_VALUE' # Optional
    GrantRead: 'STRING_VALUE' # Optional
    GrantReadACP: 'STRING_VALUE' # Optional
    GrantWrite: 'STRING_VALUE' # Optional
    GrantWriteACP: 'STRING_VALUE' # Optional
    ObjectLockEnabledForBucket: true # Optional
    ObjectOwnership: BucketOwnerPreferred # Optional
  s3Prefix: some/s3/key/location/ # optional, default = 'glueJobs/'
  tempDirBucket: someBucket # optional, default = '{serverless.serviceName}-{provider.stage}-gluejobstemp'
  tempDirS3Prefix: some/s3/key/location/ # optional, default = ''. The job name will be appended to the prefix name
  jobs:
    - name: super-glue-job # Required
      scriptPath: src/script.py # Required. The script keeps the file name after the last '/' and is uploaded to the s3Prefix location
      Description: # Optional, string
      tempDir: true # Optional true | false
      type: spark # spark / pythonshell # Required
      glueVersion: python3-2.0 # Required python3-1.0 | python3-2.0 | python2-1.0 | python2-0.9 | scala2-1.0 | scala2-0.9 | scala2-2.0
      role: arn:aws:iam::000000000:role/someRole # Required
      MaxConcurrentRuns: 3 # Optional
      WorkerType: Standard # Optional, G.1X | G.2X
      NumberOfWorkers: 1 # Optional
      Connections: # Optional
        - some-connection-string
        - other-connection-string
      Timeout: # Optional, number
      MaxRetries: # Optional, number
      DefaultArguments: # Optional
        class: string # Optional
        scriptLocation: string # Optional
        extraPyFiles: string # Optional
        extraJars: string # Optional
        userJarsFirst: string # Optional
        usePostgresDriver: string # Optional
        extraFiles: string # Optional
        disableProxy: string # Optional
        jobBookmarkOption: string # Optional
        enableAutoScaling: string # Optional
        enableS3ParquetOptimizedCommitter: string # Optional
        enableRenameAlgorithmV2: string # Optional
        enableGlueDatacatalog: string # Optional
        enableMetrics: string # Optional
        enableContinuousCloudwatchLog: string # Optional
        enableContinuousLogFilter: string # Optional
        continuousLogLogGroup: string # Optional
        continuousLogLogStreamPrefix: string # Optional
        continuousLogConversionPattern: string # Optional
        enableSparkUi: string # Optional
        sparkEventLogsPath: string # Optional
        customArguments: # Optional; user-specified custom default arguments, passed into CloudFormation with a leading -- (required by Glue)
          custom_arg_1: custom_value
          custom_arg_2: other_custom_value
      SupportFiles: # Optional
        - local_path: path/to/file/or/folder/ # Required if SupportFiles is given, you can pass a folder path or a file path
          s3_bucket: bucket-name-where-to-upload-files # Required if SupportFiles is given
          s3_prefix: some/s3/key/location/ # Required if SupportFiles is given
          execute_upload: True # Boolean, True to execute upload, False to not upload. Required if SupportFiles is given
      Tags:
        job_tag_example_1: example1
        job_tag_example_2: example2
  triggers:
    - name: some-trigger-name # Required
      Description: # Optional, string
      StartOnCreation: True # Optional, True or False
      schedule: 30 12 * * ? * # Optional, CRON expression. The trigger will be created with On-Demand type if the schedule is not provided.
      Tags:
        trigger_tag_example_1: example1     
      actions: # Required. One or more jobs to trigger
        - name: super-glue-job # Required
          args: # Optional
            custom_arg_1: custom_value
            custom_arg_2: other_custom_value
          timeout: 30 # Optional. If set, it overrides the job's own timeout when the job is started via the trigger

You can define a lot of jobs...

  Glue:
    bucketDeploy: someBucket
    jobs:
      - name: jobA
        scriptPath: scriptA
        ...
      - name: jobB
        scriptPath: scriptB
        ...

And a lot of triggers...

  Glue:
    triggers:
        - name:
            ...
        - name:
            ...
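
For example, a scheduled trigger that starts both of the jobs above might look like this (the trigger name and CRON expression are illustrative):

  Glue:
    triggers:
      - name: nightly-run          # illustrative trigger name
        schedule: 0 2 * * ? *      # illustrative CRON expression: every day at 02:00
        actions:
          - name: jobA
          - name: jobB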

Glue configuration parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| bucketDeploy | String | S3 bucket name | true |
| createBucket | Boolean | If true, a bucket with the name given in bucketDeploy is created before the deploy. Helpful if you have not created the bucket yet | false |
| createBucketConfig | createBucketConfig | Bucket configuration used when creating the bucket on S3 | false |
| s3Prefix | String | S3 prefix name | false |
| tempDirBucket | String | S3 bucket name for the Glue temporary directory. If omitted, the bucket name is generated with the pattern {serverless.serviceName}-{provider.stage}-gluejobstemp | false |
| tempDirS3Prefix | String | S3 prefix name for the Glue temporary directory | false |
| jobs | Array | Array of Glue jobs to deploy | true |

CreateBucket configuration parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| ACL | String | The canned ACL to apply to the bucket. Possible values: private, public-read, public-read-write, authenticated-read | false |
| LocationConstraint | String | Specifies the Region where the bucket will be created. If you don't specify a Region, the bucket is created in the US East (N. Virginia) Region (us-east-1). Possible values: af-south-1, ap-east-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, cn-north-1, cn-northwest-1, EU, eu-central-1, eu-north-1, eu-south-1, eu-west-1, eu-west-2, eu-west-3, me-south-1, sa-east-1, us-east-2, us-gov-east-1, us-gov-west-1, us-west-1, us-west-2 | false |
| GrantFullControl | String | Allows grantee the read, write, read ACP, and write ACP permissions on the bucket | false |
| GrantRead | String | Allows grantee to list the objects in the bucket | false |
| GrantReadACP | String | Allows grantee to read the bucket ACL | false |
| GrantWrite | String | Allows grantee to create new objects in the bucket. For the bucket and object owners of existing objects, also allows deletions and overwrites of those objects | false |
| GrantWriteACP | String | Allows grantee to write the ACL for the applicable bucket | false |
| ObjectLockEnabledForBucket | Boolean | Specifies whether you want S3 Object Lock to be enabled for the new bucket | false |
| ObjectOwnership | String | The container element for object ownership for a bucket's ownership controls. Possible values: BucketOwnerPreferred, ObjectWriter, BucketOwnerEnforced | false |

Jobs configuration parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| name | String | Name of the job | true |
| Description | String | Description of the job | false |
| scriptPath | String | Script path within the project | true |
| tempDir | Boolean | Flag indicating whether the job requires a temp folder; if true, the plugin creates a bucket for temporary files | false |
| type | String | The type of the job. Allowed values: spark or pythonshell | true |
| glueVersion | String | The language and Glue version to use ([language][version]-[glue version]). Allowed values: python3-1.0, python3-2.0, python2-1.0, python2-0.9, scala2-1.0, scala2-0.9, scala2-2.0 | true |
| role | String | ARN of the role used to execute the job | true |
| MaxConcurrentRuns | Double | Maximum concurrent runs of the job | false |
| WorkerType | String | The type of predefined worker that is allocated when a job runs. Accepts a value of Standard, G.1X, or G.2X | false |
| NumberOfWorkers | Integer | Number of workers | false |
| Connections | List | Database connections used by the job (one list entry per connection) | false |
| DefaultArguments | Object | Special parameters used by AWS Glue; for more information, see the AWS documentation | false |
| SupportFiles | List | List of supporting files for the Glue job that need to be uploaded to S3 | false |
| Tags | JSON | The tags to use with this job. You may use tags to limit access to the job. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide | false |
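
Putting together only the parameters marked as required above, a minimal job entry could look like the sketch below. All values are placeholders, and the type/glueVersion pairing shown is just one of the allowed combinations.

  Glue:
    bucketDeploy: someBucket
    jobs:
      - name: minimal-job                   # placeholder job name
        scriptPath: src/minimal_job.py      # placeholder local script path
        type: pythonshell
        glueVersion: python3-1.0
        role: arn:aws:iam::123456789012:role/example-glue-role # placeholder role ARN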

Triggers configuration parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| name | String | Name of the trigger | true |
| schedule | String | CRON expression | false |
| actions | Array | An array of jobs to trigger | true |
| Description | String | Description of the trigger | false |
| StartOnCreation | Boolean | Whether the trigger starts when created. Not supported for ON_DEMAND triggers | false |

Only On-Demand and Scheduled triggers are supported.
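
Since a trigger without a schedule is created as On-Demand (see the YAML comments above), an on-demand trigger for the example job can be sketched like this (the trigger name is illustrative):

  Glue:
    triggers:
      - name: run-super-glue-job-on-demand  # illustrative name; no schedule, so the trigger is On-Demand
        actions:
          - name: super-glue-job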

Trigger job configuration parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| name | String | The name of the Glue job to trigger | true |
| timeout | Integer | Job execution timeout. If set, it overrides the job's own timeout when the job is started by the trigger | false |
| args | Map | Job arguments | false |
| Tags | JSON | The tags to use with this trigger. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide | false |

And now?...

Just run serverless deploy and the Glue jobs and triggers you configured are deployed as part of your stack.
