1.0.6 • Published 3 years ago

glc-data-glue-crawlers-serverless-plugin v1.0.6

Weekly downloads
1
License
MIT
Repository
github

Serverless Plugin for GLC glue crawlers


Requirements

Tested with:

  • Node.js >= v10
  • Serverless Framework >= v1.51

Installation

Install the dependency

Using npm:

npm i -D glc-data-glue-crawlers-serverless-plugin

Using yarn:

yarn add --dev glc-data-glue-crawlers-serverless-plugin

Use the plugin

Add the plugin to your serverless.yml file:

plugins:
  - glc-data-glue-crawlers-serverless-plugin

Usage

serverless deploy --stage <yourStage>

Example

Configure the crawler under the custom.glcGlueCrawler key of your serverless.yml:

custom:
  glcGlueCrawler:
    name: <crawlerName> # name of the crawler to create
    source:
      path: <s3Path> # S3 path to crawl (e.g. "s3://stats.datalake.${opt:stage}/classified")
      classifier: <classifierName> # optional mapper ("DynamoDbStreamNewField" to crawl only $.new)
      exclusions: [<exclusionPattern1>, <exclusionPattern2>, ...] # optional exclusions (e.g. ["2018/**", "2019/0[1-2]/**"])
    destination:
      database: <databaseName> # database that receives the crawled table (e.g. "datalake_${opt:stage}")
      tablePrefix: <tablePrefix> # optional prefix (e.g. "lc_") added to the table name, which is the last S3 folder name ("classified" if path is "s3://stats.datalake.${opt:stage}/classified")
    tags:
      Env: "${opt:stage}"
      Bloc: "data"
      App: "datalakehouse"
      Comp: <crawlerName>
      Team: <teamTag> # may be already defined in stackTags
      IsInfraAsCode: "serverless" # may be already defined in stackTags
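
For a concrete picture, here is a sketch of the same configuration with the placeholders filled in. The source, classifier, exclusions, database and prefix values are the sample values from the comments above. The crawler name "classified-crawler" and the team tag "my-team" are made up for illustration.

```yaml
# Illustrative sketch only: "classified-crawler" and "my-team" are hypothetical.
custom:
  glcGlueCrawler:
    name: classified-crawler-${opt:stage}
    source:
      path: "s3://stats.datalake.${opt:stage}/classified"
      classifier: "DynamoDbStreamNewField" # crawl only $.new
      exclusions: ["2018/**", "2019/0[1-2]/**"]
    destination:
      database: "datalake_${opt:stage}"
      tablePrefix: "lc_"
    tags:
      Env: "${opt:stage}"
      Bloc: "data"
      App: "datalakehouse"
      Comp: "classified-crawler"
      Team: "my-team"
      IsInfraAsCode: "serverless"
```

Going by the comment on tablePrefix, the crawled table would presumably be named "lc_classified" in the "datalake_<yourStage>" database. Because every value uses ${opt:stage}, the --stage option has to be passed on the command line for these variables to resolve.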

You can also define several Glue crawlers at once by giving glcGlueCrawler a list:

custom:
  glcGlueCrawler:
    - name: ${self:service}-${opt:stage}
      source:
        path: "s3://stats.datalake.${opt:stage}/my-lake"
      destination:
        database: "datalake_${opt:stage}"
      tags:
        Env: "${opt:stage}"
        Bloc: "data"
        App: "datalakehouse"
        Comp: "${self:service}"
        Team: "my-team" # may be already defined in stackTags
        IsInfraAsCode: "serverless" # may be already defined in stackTags
    - name: another-crawler-${opt:stage}
      source:
        path: "s3://stats.datalake.${opt:stage}/another-lake"
      destination:
        database: "datalake_${opt:stage}"
      tags:
        Env: "${opt:stage}"
        Bloc: "data"
        App: "datalakehouse"
        Comp: "another-crawler"
        Team: "my-team" # may be already defined in stackTags
        IsInfraAsCode: "serverless" # may be already defined in stackTags
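
The "may be already defined in stackTags" comments suggest that shared tags can be declared once at the stack level instead of on every crawler. A minimal sketch follows, assuming stack-level tags do reach the crawlers. That is not confirmed here, so check the tags on the deployed crawler. provider.stackTags is a standard Serverless Framework setting, and "my-team" is a placeholder.

```yaml
# Sketch under an assumption: tags set in provider.stackTags are also
# applied to the crawlers, so Team and IsInfraAsCode are not repeated.
provider:
  name: aws
  stackTags:
    Team: "my-team" # placeholder value
    IsInfraAsCode: "serverless"

custom:
  glcGlueCrawler:
    - name: ${self:service}-${opt:stage}
      source:
        path: "s3://stats.datalake.${opt:stage}/my-lake"
      destination:
        database: "datalake_${opt:stage}"
      tags:
        Env: "${opt:stage}"
        Bloc: "data"
        App: "datalakehouse"
        Comp: "${self:service}"
```

If the stack-level tags do not appear on the crawler after deployment, keep them in each crawler's tags block as in the examples above.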