1.0.0 • Published 21 days ago

dps-extractor v1.0.0

License: ISC

DPS-EXTRACTOR

A containerized application that executes extraction tasks. This repository is scoped to collecting third-party data and storing it in S3.

Requirements

Internal Dependencies

Setup

  1. Clone this repository: git clone git@github.com:cafemedia/dps-extractor.git
  2. Enter the directory: cd dps-extractor
  3. Install dependencies: npm install

Infrastructure

Running Terraform

A wrapper script is provided for your convenience. Run ./terraform.sh -h for more information.

# -e environment
# -p aws credentials profile name
# -v pass terraform variables, can be invoked multiple times
./terraform.sh \
  -e development \
  -p aws-profile \
  .tf

Building

This project is designed to be imported as a library and must first be compiled into JavaScript.
NOTE: the memory requirements for building are increasing...

npm run build

Linting

This project uses TSLint to keep code style consistent.

npm run lint

Formatting

Please be sure to format your code before committing!

npm run format

Testing

A full test suite has been integrated into the project using:

Git Commit Hooks

To avoid pushing code that would fail the lint or test phases in Drone, we use husky (https://github.com/typicode/husky), which automatically builds, lints, and tests the code whenever we attempt to commit.

Working with Private GitHub Packages

This project depends on dps-utilities-typescript, which is installed via npm but requires authentication with GitHub Packages.

Building Locally with Docker

This project is configured to automatically build an image and deploy it to ECR on the Adthrive AWS account, in a repository of the same name. To test that the build works locally:

docker build -t dps-extractor --build-arg GITHUB_TOKEN=<YOUR GITHUB PAT> .
docker run -t dps-extractor Hello

jowens@JOWENS-MAC dps-extractor % docker run -t dps-extractor Hello                                                                
2021-02-11T18:05:22.912Z - info: [Hello] Starting - 512c0d70-a72d-4cdc-8013-4673527dd0b9 - {}
2021-02-11T18:05:22.923Z - info: [Hello] Done duration=3ms
jowens@JOWENS-MAC dps-extractor %
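
If the local run is used as a smoke test, the output lines can be checked programmatically rather than by eye. The following is a minimal sketch under the assumption that every line follows the "<timestamp> - <level>: [<task>] <message>" shape seen in the sample output; parse_log_line is a hypothetical helper and not part of this repository. It also strips ANSI color codes, since the same lines appear colorized when the container runs under Airflow.

```python
import re
from typing import Dict, Optional

# Matches ANSI color sequences such as "\x1b[32m" that wrap the log level.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

# Assumed line shape: "<timestamp> - <level>: [<task>] <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+) - (?P<level>\w+): \[(?P<task>[^\]]+)\] (?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Split one extractor log line into its parts, or return None if it does not match."""
    clean = ANSI_RE.sub("", line).strip()
    match = LINE_RE.match(clean)
    return match.groupdict() if match else None


parsed = parse_log_line("2021-02-11T18:05:22.923Z - info: [Hello] Done duration=3ms")
print(parsed)
```

A line that does not fit the assumed shape, such as a shell prompt, yields None and can simply be skipped.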

TODO: This could use some optimization.

Running in Airflow

Example airflow task to be incorporated into a DAG:

# Import path for Airflow 1.10.x, which the example logs suggest is in use;
# Airflow 2+ ships this operator in the cncf.kubernetes provider package.
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Assumes `dag` is the DAG object defined elsewhere in the same file.
hello_extractor = KubernetesPodOperator(
    namespace="dps",
    image="312505582686.dkr.ecr.us-east-1.amazonaws.com/dps-extractor:<IMAGE TAG>",
    arguments=[
        "Hello",
        "-s", "{{ task_instance.xcom_pull(task_ids='get_state', key='return_value').date }}",
        "-e", "{{ task_instance.xcom_pull(task_ids='get_state', key='return_value').date }}",
        "-x",  # required because do_xcom_push=True
    ],
    name="hello-extractor",
    task_id="hello-extractor",
    get_logs=True,
    dag=dag,
    is_delete_operator_pod=True,
    in_cluster=True,
    log_events_on_failure=True,
    run_as_user="airflow",
    annotations={"datadog-service": "sample-k8s-dag", "datadog-source": "airflow"},
    do_xcom_push=True,
)

Note that if do_xcom_push is set to True, we must also pass the -x argument to the container.
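
One way to keep the -x flag and do_xcom_push from drifting apart is to derive both from the same boolean. The following is a minimal sketch, not part of this repository: build_extractor_args is a hypothetical helper, and treating -s and -e as start and end dates is an assumption inferred from the "start" and "end" fields in the example log output, not from documented flags.

```python
from typing import List, Optional


def build_extractor_args(
    task: str,
    start: Optional[str] = None,
    end: Optional[str] = None,
    do_xcom_push: bool = False,
) -> List[str]:
    """Build the container arguments for an extractor task (hypothetical helper)."""
    args = [task]
    if start is not None:
        args += ["-s", start]  # assumed: start date
    if end is not None:
        args += ["-e", end]  # assumed: end date
    if do_xcom_push:
        # The container must be told to write its XCom output.
        args.append("-x")
    return args


DO_XCOM_PUSH = True
STATE_DATE = "{{ task_instance.xcom_pull(task_ids='get_state', key='return_value').date }}"

hello_args = build_extractor_args(
    "Hello", start=STATE_DATE, end=STATE_DATE, do_xcom_push=DO_XCOM_PUSH
)
```

hello_args can then be passed as arguments=hello_args, with do_xcom_push=DO_XCOM_PUSH on the same operator, so the two settings always change together.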

Example Log Output:

[2021-02-11 18:14:17,749] {taskinstance.py:901} INFO - Executing <Task(KubernetesPodOperator): hello-extractor> on 2021-02-11T18:13:52.710304+00:00
[2021-02-11 18:14:17,750] {base_task_runner.py:131} INFO - Running on host: gamearningshelloextractor-f82ef6ed61744772b86229379fd9ba1b
[2021-02-11 18:14:17,750] {base_task_runner.py:132} INFO - Running: ['airflow', 'run', 'gam_earnings', 'hello-extractor', '2021-02-11T18:13:52.710304+00:00', '--job_id', '188', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/dags/gam_earnings.py', '--cfg_path', '/tmp/tmp77e3vqz_']
[2021-02-11 18:14:19,072] {base_task_runner.py:111} INFO - Job 188: Subtask hello-extractor [2021-02-11 18:14:19,072] {__init__.py:50} INFO - Using executor LocalExecutor
[2021-02-11 18:14:19,072] {base_task_runner.py:111} INFO - Job 188: Subtask hello-extractor [2021-02-11 18:14:19,072] {dagbag.py:417} INFO - Filling up the DagBag from /opt/airflow/dags/dags/gam_earnings.py
[2021-02-11 18:14:19,396] {base_task_runner.py:111} INFO - Job 188: Subtask hello-extractor Running <TaskInstance: gam_earnings.hello-extractor 2021-02-11T18:13:52.710304+00:00 [running]> on host gamearningshelloextractor-f82ef6ed61744772b86229379fd9ba1b
[2021-02-11 18:14:19,548] {logging_mixin.py:112} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py:309: DeprecationWarning: Using `airflow.contrib.kubernetes.pod.Pod` is deprecated. Please use `k8s.V1Pod`.
  dummy_pod = Pod(
[2021-02-11 18:14:19,548] {logging_mixin.py:112} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/kubernetes/pod_launcher.py:77: DeprecationWarning: Using `airflow.contrib.kubernetes.pod.Pod` is deprecated. Please use `k8s.V1Pod` instead.
  pod = self._mutate_pod_backcompat(pod)
[2021-02-11 18:14:19,606] {pod_launcher.py:171} INFO - Event: hello-extractor-0d3841b79ce44594bd421def1f168461 had an event of type Pending
[2021-02-11 18:14:19,606] {pod_launcher.py:139} WARNING - Pod not yet started: hello-extractor-0d3841b79ce44594bd421def1f168461
[2021-02-11 18:14:20,614] {pod_launcher.py:171} INFO - Event: hello-extractor-0d3841b79ce44594bd421def1f168461 had an event of type Pending
[2021-02-11 18:14:20,614] {pod_launcher.py:139} WARNING - Pod not yet started: hello-extractor-0d3841b79ce44594bd421def1f168461
[2021-02-11 18:14:21,625] {pod_launcher.py:171} INFO - Event: hello-extractor-0d3841b79ce44594bd421def1f168461 had an event of type Running
[2021-02-11 18:14:21,660] {pod_launcher.py:156} INFO - b'2021-02-11T18:14:20.820Z - \x1b[32minfo\x1b[39m: [Hello] Starting - 9e4e5a0a-44c0-444a-9940-80f2966f4366 - {"start":"2021-02-10T00:00:00.000+00:00","end":"2021-02-10T00:00:00.000+00:00","writeXcom":true}\n'
[2021-02-11 18:14:21,660] {pod_launcher.py:156} INFO - b'2021-02-11T18:14:20.822Z - \x1b[32minfo\x1b[39m: [Hello] Done duration=1ms\n'
[2021-02-11 18:14:21,718] {pod_launcher.py:267} INFO - Running command... cat /airflow/xcom/return.json

[2021-02-11 18:14:21,761] {pod_launcher.py:267} INFO - Running command... kill -s SIGINT 1

foo
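
The log above shows the operator reading /airflow/xcom/return.json from the pod. With do_xcom_push enabled, KubernetesPodOperator pushes the contents of that file as the task's return_value XCom, so a downstream task can pull it. The following is a minimal sketch, assuming a downstream PythonOperator in the same DAG; report_extractor_result is a hypothetical callable and not part of this repository.

```python
def report_extractor_result(ti, extractor_task_id="hello-extractor", **_):
    """Pull the extractor's XCom value and fail loudly if it is missing.

    `ti` is the Airflow TaskInstance supplied through the task context.
    """
    result = ti.xcom_pull(task_ids=extractor_task_id, key="return_value")
    if result is None:
        raise ValueError(f"No XCom returned by task {extractor_task_id!r}")
    return result
```

On Airflow 1.10.x the PythonOperator needs provide_context=True for ti to be passed to the callable.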