

Table of Contents

Data Pipeline Primer (Edits Welcome!)

This is the Data Pipeline repository

Requirements for local development

Install / Configure Git

For Mac

Using Homebrew
  • Open your terminal and install Git using Homebrew:
    • $ brew install git
  • Verify the installation was successful
    • $ git --version
Using the Mac Installer
  • Download the latest Git for Mac installer
  • Launch the installer and follow the prompts to install Git
  • Verify the installation was successful
    • $ git --version

For Windows

  • Download the latest Git for Windows installer
  • Launch the installer and follow the prompts to install Git
  • Verify the installation was successful
    • $ git --version

For Linux / Debian/Ubuntu (apt-get)

  • $ sudo apt-get update
  • $ sudo apt-get install git
  • Verify the installation was successful
    • $ git --version

Configure Git

  • Configure your Git username and email using the following commands, replacing the example name and email with your own. These details will be associated with any commits that you create
    • $ git config --global user.name "Kermit Frog"
    • $ git config --global user.email "kermit-d-frog@muppet.org"
  • Save some typing by setting up Git aliases
    • $ git config --global alias.co checkout
    • $ git config --global alias.br branch
    • $ git config --global alias.ci commit
    • $ git config --global alias.st status
    • $ git config --global alias.df diff
    • $ git config --global alias.dfc "diff --cached"
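  • With these aliases in place, $ git st is shorthand for $ git status, and $ git co -b my-branch for $ git checkout -b my-branch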

GitHub / SSH Keys

  • Follow these instructions to configure SSH keys for your GitHub account
  • To summarize:
    • $ ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
    • $ ssh-add -K ~/.ssh/{ssh_file} (on newer versions of macOS, use ssh-add --apple-use-keychain instead of -K)
    • Add the key to your GitHub account
    • The key must be "SSO Authorized" to be used with Meredith GitHub repositories

Local Development Setup

Install JQ
  • JQ is an amazing CLI JSON processor. It's used by some of our scripts and, if you have the time, it's well worth learning.
Mac

$ brew install jq

Linux

$ sudo apt-get install jq
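
Once jq is installed, you can try it out on this repo's package.json, for example:

$ jq '.scripts' package.json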

Install Node Version Manager (nvm)
Mac/Linux

$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash

Windows
  • nvm does not support Windows; use nvm-windows (https://github.com/coreybutler/nvm-windows) instead
Install Node 12.x
  • $ nvm install 12
Other handy nvm commands
  • Show Installed versions
    • $ nvm ls
  • Use a specific version of Node
    • $ nvm use XX.XX.XX
  • Use the version of Node specified in the project's .nvmrc
    • $ nvm use
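    • For example, a .nvmrc file at the repo root containing the single line 12 lets everyone run $ nvm use without arguments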
Install Visual Studio Code
Install saml2aws
Clone the data-pipeline repo

$ git clone git@github.com:MeredithCorp/data-pipeline.git

Dependencies Setup

First of all, you need to be at the root of the cloned repo:

$ cd data-pipeline

NPM Registry Login

Some dependencies are hosted on a private registry, so you need to be logged in to install them. To set up, follow these steps:

1) Create a .npmrc file (with a dot at the beginning) following the pattern below:

@meredithcorp:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=[auth token will go here, generated from the below steps]
always-auth = true

2) Go to your GitHub user profile developer settings and generate a new access token
3) Set a descriptive note to remind you why you created this token (e.g. data-pipeline-project)
4) Set an expiration date (it's possible to create a token that never expires, but that's not recommended for security reasons; one year should be the max)
5) Check the write:packages option
6) Click the generate token button
7) Copy the token and paste it into your .npmrc file after authToken= (consider also saving the token in a backup file)
8) Next, authorize this token to work with the Meredith private registry, as follows
9) Go to your GitHub user profile personal access token list
10) Click the configure SSO option next to your token name (e.g. data-pipeline-project)
11) Click authorize for Meredith Corp and follow the wizard to the end (next, next, next...)
12) Back on your local machine, run npm login. npm will request your username, password and email (or create a new account if you don't have one)
13) npm will then deliver a one-time password (OTP) via email for authentication

NPM Install

With your user properly logged in, do the following steps:

1) $ npm install

Right after this command, you can run the following to check whether the installation succeeded:

2) echo $? (we expect a zero as output. Anything else is a failure)

Visual Studio Custom Integrations

We've made a couple of small enhancements to help with local development...

Setting up vscode settings

If you already have a .vscode directory at the root of the project, back it up before running the following command, because it will overwrite the .vscode directory:
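
For example, a quick backup first:

$ cp -r .vscode .vscode.bak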

sh scripts/sync-settings.sh

Environment settings

  • Added a custom ./local/.env file for development environment variables. These should typically stay "as is" unless you have a custom saml2aws profile. In that case, AWS_PROFILE should be updated. Note that this file is in the .gitignore and custom changes should NOT be committed to the repository.

./local/.env

Recommended Extensions

  • During Visual Studio Code startup, you will be prompted to install Recommended Extensions. If you choose not to install these extensions, some functionality such as the custom saml2aws login task will not work correctly.
  • For a consistent environment, and to successfully add "saml2aws" as a VS Code custom button, select the Extensions icon, then type "@recommended". You must install all the extensions listed in the left panel (IntelliCode is optional).


saml2aws Integration

  • Added a custom task and script to launch an interactive terminal for saml2aws login.
    • click the saml2aws button at the bottom right of the VS Code window
    • handle login through the terminal window
      • Note: This will use the saml2aws -p {profile} option with the AWS_PROFILE environment variable from ./local/.env discussed above.
  • If the custom "saml2aws" button does not show up at the bottom right of the VS Code window, follow step 2 of "Recommended Extensions"


Development Workflow

Working a ticket - Process Overview

  • Find a ticket to work
  • Go to your data-pipeline repository directory
    • example: $ cd ~/projects/data-pipeline/ && git pull
  • Check out the dotdash-v5 branch - This should be the default branch for dotdash migration development
    • example: $ git co dotdash-v5
  • Create a branch for your development work. Branch names should include the ticket number you're working
    • example: $ git co -b DS-111_some_new_feature
  • Push the new branch to origin: $ git push -u origin HEAD

  • Run npm install to make sure all dependencies are available $ npm i

  • Make your code updates...
  • Write some unit tests for your changes...
  • Commit & Push your changes to the branch
    • Make sure all tests pass! $ npm run test
    • Review your code for ESlint warnings/errors $ npm run lint (Also available in VS Code)
    • Make sure your code is up-to-date with the base branch
      • $ git merge origin/dotdash-v5 and resolve merge conflicts as necessary
    • Add any new files to git $ git add <files>
    • Commit and push your branch updates:
      • $ git commit -m 'some meaningful message'
      • $ git push

Local Debugging

How to debug


Local Pipeline Stage Execution

Pipeline stages have a function available for local execution. These functions are easily identified in data-pipeline/package.json. For example, the run-local-selene-qa-regression script points to the DeveloperExperienceEnhancerLocalDebuggingEntryPoint function in ./functions/SeleneQARegressionReports.js.


Let's take a closer look at that function:

const DeveloperExperienceEnhancerLocalDebuggingEntryPoint = async () => {
  const {ExtractMigrationItemId} = require('../lib/localdevHelper.js')

  // Specify the ID or full SourceData URL of the migration item you want to run against locally.
  // If you leave RunId blank, the RunId will be extracted from the MigrationItemId.
  const SourceIdOrUrl = 'https://dotdash-data-pipeline-reports.commerce.meredithcorp.io/migrations/item/payload/9a6ba483-919e-43fa-a40e-fd450ca8aa25-cms_onecms_posts_health_193392/SeleneQARegressionResult'
}

Note: some of these functions also require an access token for Selene. You can find instructions for acquiring an access_token value in Keeper under "data-services-developer". If you don't have this available in Keeper, please contact your team lead for access. In addition to the curl command provided in Keeper notes, you can also update your local/.env to include the client_id/client_secret from Keeper.

# for developer access token fetching uncomment the following and provide values
#keycloakUrl=https://keycloak-dotdash-stable.a-ue1.dotdash.com/auth/realms/greenhouse/protocol/openid-connect/token
#client_id={put client_id from Keeper here}
#client_secret={put client_secret from Keeper here}
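
As an alternative to the curl command in Keeper, the token can also be fetched programmatically. The following is a minimal hedged sketch (the fetchAccessToken helper is hypothetical, not part of the repo), assuming the standard OAuth2 client_credentials grant and Node 18+ for the global fetch:

// Hypothetical helper (not part of the repo): fetch a developer access token
// from Keycloak using the standard OAuth2 client_credentials grant.
// Assumes the keycloakUrl/client_id/client_secret values from ./local/.env
// are loaded into process.env.
const fetchAccessToken = async () => {
  const {keycloakUrl, client_id, client_secret} = process.env
  const body = new URLSearchParams({
    grant_type: 'client_credentials',
    client_id,
    client_secret,
  })
  const response = await fetch(keycloakUrl, {method: 'POST', body})
  if (!response.ok) throw new Error(`token request failed: ${response.status}`)
  const {access_token} = await response.json()
  return access_token
}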

To start debugging with a specific "Run" from the dashboard, choose a specific launch configuration as shown in Local Pipeline Stage Execution above and find the applicable script for its execution (example: ./functions/SeleneQARegressionReports.js). Once that has been located, go to the DeveloperExperienceEnhancerLocalDebuggingEntryPoint function in that script and you should see a function very similar to the one above.

Go to the data-pipeline dashboard, locate a migration item in the migration run of your choice, and copy the SourceData payload URL. You can then paste this URL into the DeveloperExperienceEnhancerLocalDebuggingEntryPoint function:

const SourceIdOrUrl = 'https://dotdash-data-pipeline-reports.commerce.meredithcorp.io/migrations/item/payload/0b747bb0-fa53-4255-bcff-68e5e32f9ae0-cms_onecms_posts_health_195594/SourceData'

Once this has been completed, you can execute the applicable launch configuration and it will use the data from the SourceData link (See "How to Debug").


Common problems/troubleshooting while debugging locally

  • If your debugger is not working as described above:
    • Update VS Code to the latest version.
    • Make sure VS Code is installed in, and running from, Applications (for macOS users).
    • Ensure that the data-pipeline repo is the root folder in your workspace.

Creating Regression Tests

Regression Test Configuration

The configuration for regression tests is located in brand-environments/common-pipelines.js. The selene-qa-regression-reports StageName can be found under each of the Development, Test and Production environments. There are two applicable objects here, named RegressionTests and RegressionTestsOptions:

RegressionTests: {
  // Enabled/Disabled both utilize Glob patterns
  Enabled: ['**/*.js'],
  Disabled: ['component-type-invalid.js', 'image-dimensions-out-of-bounds.js'],
},
RegressionTestsOptions: {
  'image-dimensions-out-of-bounds': {
    height: {min: 600, },
    width: {min: 600, },
  },
  'component-type-invalid': {
    invalidImageTypes: ['image']
  },
  // following are examples
  'bylines': {
    testBylinesOpt: 'example',
  },
  'tagged-images': {
    testTaggedImagesOpt: 'example'
  },
},
  • RegressionTests

    • This block identifies Enabled and Disabled tests using Glob patterns. For example, the pattern **/*.js in the example above will match all .js files in any directory. This provides a great amount of flexibility in organizing tests per run configuration by environment, brand or any other criteria. When tests are executed, any items matching the Disabled glob patterns will not be executed.
  • RegressionTestsOptions

    • This block contains properties for each test by test name (without the .js). This must match the test filename. Options can be defined at your own discretion based on the regression test you are creating.
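
For example, the pipeline resolves a test's options by stripping the file extension and indexing into RegressionTestsOptions by name; this mirrors the lookup shown later in regressionTestJobs:

const testName = 'image-dimensions-out-of-bounds.js'.split('.')[0]
const {
  [testName]: TestOptions = {},
} = RegressionTestsOptions
// TestOptions -> {height: {min: 600}, width: {min: 600}}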

Creating A New Regression Test

Creating a new regression test is as simple as:

  • adding a new .js file to the /data-pipeline/RegressionTests folder, or any subdirectory you'd like for test organization
  • adding the test "boilerplate" code
  • implementing your test functionality

The first step is pretty self-explanatory, but let's cover the other two in more detail.

Test boilerplate code

After you've created the regression test .js file, copy and paste the example boilerplate code below into your script.

const {
  RegressionHandler,
  GetError,
  GetPropertyByPath,
} = require("../lib/regressionUtil.js");

async function handler(handlerOptions) {
  let errors = []

  return errors
}

module.exports = {
  handler: RegressionHandler(handler),
}

These few lines of code are necessary in every test. The only other modifications you'll need to make are inside the handler function.

Implementing Your Test

Before you start writing your regression test, you need to determine what values you want to work with and where they come from. Here is what data you have available in the handlerOptions parameter:

  • TestOptions
    • These are the test options you defined in brand-environments/common-pipelines.js under RegressionTestsOptions
  • SourceData
    • The SourceData for the run, including udf
  • DestinationPayload
    • The migrated Selene payload
  • DestinationResponse
    • The Selene response payload

So the first thing you'll want to do is pull in the test options for your test, if you have any. This can be accomplished by destructuring the handlerOptions parameter. Here's an example that creates MinWidth, MaxWidth, MinHeight and MaxHeight variables based on the configuration values. Note that default Object (={}) values are provided for each object wrapper. This will ensure there are no errors attempting to destructure a property from an undefined object.

  const {
    RegressionTest: {
      TestOptions: {
        imageTypes = [],
        width: {
          min: MinWidth,
          max: MaxWidth,
        } = {},
        height: {
          min: MinHeight,
          max: MaxHeight,
        } = {},
      } = {},
    } = {},
  } = handlerOptions

The udf, DestinationPayload, DestinationResponse and TestOptions values can be destructured in the same fashion.

  const {
    RegressionTest: {
      SourceData: {
        udf: {
          title: sourceTitle,
          someProperty,
          someOtherObj: {
            withSomeOtherProperty,
          } = {},
        } = {},
      } = {},
      DestinationPayload: {
        title: destinationTitle,
        templateType,
        anotherProperty,
      } = {},
      DestinationResponse: {
        data: {
          someProperty: someResponseProperty,
        } = {},
      } = {},
      TestOptions: {
        someOption,
      } = {},
    } = {},
  } = handlerOptions

Adding Regression Test Errors

All that's left is to implement the functionality specific to your regression test and return errors to be displayed in the dashboard. Notice that the boilerplate code includes an errors array declaration and return. Any errors you wish to add in your test fall between these two and can be created with the GetError function, which is also included in the boilerplate code. Here's an example:

async function handler(handlerOptions) {
  let errors = []

  // Get what you need from handlerOptions.RegressionTest
  const {
    RegressionTest: {
      SourceData: {
        udf: {
          tout_image,
        } = {},
      } = {},
    } = {},
  } = handlerOptions

  // make this something meaningful and create as many errors as necessary!
  if (!tout_image) {
    errors.push(GetError({
      message: `missing tout_image`,
      name: 'some-meaningful-name',
    }))
  }

  // and return the errors
  return errors
}

Running Your Regression Test On Local

Running your regression test should be a straightforward process after reviewing Local Pipeline Stage Execution above.

Regression tests are run via another script in package.json:

    "run-local-selene-qa-regression": "node --inspect -e 'require(\"./functions/SeleneQARegressionReports.js\").DeveloperExperienceEnhancerLocalDebuggingEntryPoint()'"

This script points to the file at functions/SeleneQARegressionReports.js and the function DeveloperExperienceEnhancerLocalDebuggingEntryPoint. Simply provide the SourceIdOrUrl variable with a valid SourceData URL.
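
To run it from the terminal:

$ npm run run-local-selene-qa-regression

Because the script launches node with --inspect, you can also attach the VS Code debugger to the running process.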

Available Tests Scope

In Pipeline Dashboard -> Pipeline Run Config, there is an enable/disable scope for the QA Regression Test. An example is as follows:

"StageName":"selene-qa-regression-reports"
"RegressionTests":{
  "Enabled":[
    0:"**/art-gal*.js"
    1:"**/common*.js"
  ]
  "Disabled":[
    0:"**/deprecated-*.js"
    1:"**/*.spec.js"
  ]
}

In functions/SeleneQARegressionReports.js, when Enabled does not name a specific file, availableTests will pick up the .js files under the RegressionTests folder based on the scope defined in the Pipeline Run Config. In the following example, all art-gal*.js and common*.js files will be included in the availableTests array.

  const availableTests = GetRegressionTests({
    Enabled,
    Disabled,
    Options: {}
  })
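
Given the Enabled and Disabled globs above, availableTests would contain the matching test file paths relative to the RegressionTests folder (hypothetically, something like ['art-gal-images.js', 'common-fields.js']).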

Debug Test Cases in a Batch

Add a debugger snippet to SeleneQARegressionReports.js, under the regressionTestJobs function. The whole function then looks like the following:

const regressionTestJobs = availableTests.map(testFile => () => {
  const {
    handler = ThrowRegressionError('regression test requires a handler function'),
  } = require(`../RegressionTests/${testFile}`)

  const testName = testFile.split('.')[0]
  const {
    [testName]: TestOptions = {},
  } = RegressionTestsOptions

  const returnValue = handler({
    SourceData,
    DestinationPayload,
    DestinationResponse,
    TaxeneResponse,
    TestOptions,
  })
  const tap = data => {
    if (!data || data.length) {
      debugger
    }
    return data
  }
  const tapError = data => {
    debugger
    throw data
  }
  return returnValue.then(tap).catch(tapError)
})
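
Note that each entry in regressionTestJobs is a thunk: a function that only starts the test (and returns its promise) when invoked. As a hedged sketch, one way to execute them all while debugging would be:

await Promise.all(regressionTestJobs.map(job => job()))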

An example of debugger usage is as follows:

1. In the Pipeline Dashboard StageQARegressionResult, filter the docs with "false" error
2. Pick one SourceData url from the "false" filter list and assign it to SourceIdOrUrl in SeleneQARegressionReports.js
3. Make sure that availableTests contains the art-gal*.js and common*.js files
4. Run the debugger; when undefined data is returned from one of the test cases, the test case file will be displayed in the Closure section
5. You can open the identified .js file and debug further

Finding Data For Your Regression Test

Refer to the Test Options section above for a description of the contents of handlerOptions (SourceData, DestinationPayload, DestinationResponse and TestOptions). After you determine what data is needed for your regression test and which JSON object to get it from, you can use destructuring to safely assign a JSON property to a variable. There are lots of good tutorials available on destructuring.

JSONPath

Another, less verbose, option is to retrieve JSON data using the provided GetPropertyByPath function. Here's an example:

  const pickedImages = await GetPropertyByPath({ udf, path: '$.[tout_image, primary_media, meta.social]'})

For each image found by the query, an object is returned with path and value properties. The path property is the JSONPath to the JSON in the value property. Here's an example response:

[
  {
    "path": "$.tout_image",
    "value": {
      "orientation": "default",
      "enable_auto_crop": false,
      "_type": "image-reference",
      "align": "default",
      "entity": {
        "$id": "cms/onecms_posts_parents_2008281",
        "$name": "cms/onecms_posts_parents_2008281",
        "udf": {
          "original": {
            "width": 557,
            "src": "https://static.onecms.io/wp-content/uploads/sites/38/2016/07/12214608/screen-shot-2016-07-22-at-7.54.52-am.png",
            "mime_type": "image/png",
            "file_size": 177359,
            "height": 298
          },
          "cms_id": "2008281",
          "rights": {
            "usage": "unknown",
            "source": "unknown"
          },
          "_type": "node-image",
          "alt": "What little girls think \"beach bodies\" means",
          "send_to_media_cloud": false,
          "credit": "Real Simple/YouTube",
          "title": "screen-shot-2016-07-22-at-7.54.52-am.png",
          "brand": "parents",
          "watermarked": false
        }
      }
    }
  },
  {
    "path": "$.tout_image.entity.udf",
    "value": {
      "original": {
        "width": 557,
        "src": "https://static.onecms.io/wp-content/uploads/sites/38/2016/07/12214608/screen-shot-2016-07-22-at-7.54.52-am.png",
        "mime_type": "image/png",
        "file_size": 177359,
        "height": 298
      },
      "cms_id": "2008281",
      "rights": {
        "usage": "unknown",
        "source": "unknown"
      },
      "_type": "node-image",
      "alt": "What little girls think \"beach bodies\" means",
      "send_to_media_cloud": false,
      "credit": "Real Simple/YouTube",
      "title": "screen-shot-2016-07-22-at-7.54.52-am.png",
      "brand": "parents",
      "watermarked": false
    }
  },
  {
    "path": "$.meta.social.image",
    "value": {
      "orientation": "default",
      "enable_auto_crop": false,
      "_type": "image-reference",
      "align": "default",
      "entity": {
        "$id": "cms/onecms_posts_parents_2008281",
        "$name": "cms/onecms_posts_parents_2008281",
        "udf": {
          "original": {
            "width": 557,
            "src": "https://static.onecms.io/wp-content/uploads/sites/38/2016/07/12214608/screen-shot-2016-07-22-at-7.54.52-am.png",
            "mime_type": "image/png",
            "file_size": 177359,
            "height": 298
          },
          "cms_id": "2008281",
          "rights": {
            "usage": "unknown",
            "source": "unknown"
          },
          "_type": "node-image",
          "alt": "What little girls think \"beach bodies\" means",
          "send_to_media_cloud": false,
          "credit": "Real Simple/YouTube",
          "title": "screen-shot-2016-07-22-at-7.54.52-am.png",
          "brand": "parents",
          "watermarked": false
        }
      }
    }
  },
  {
    "path": "$.meta.social.image.entity.udf",
    "value": {
      "original": {
        "width": 557,
        "src": "https://static.onecms.io/wp-content/uploads/sites/38/2016/07/12214608/screen-shot-2016-07-22-at-7.54.52-am.png",
        "mime_type": "image/png",
        "file_size": 177359,
        "height": 298
      },
      "cms_id": "2008281",
      "rights": {
        "usage": "unknown",
        "source": "unknown"
      },
      "_type": "node-image",
      "alt": "What little girls think \"beach bodies\" means",
      "send_to_media_cloud": false,
      "credit": "Real Simple/YouTube",
      "title": "screen-shot-2016-07-22-at-7.54.52-am.png",
      "brand": "parents",
      "watermarked": false
    }
  }
]

There are lots of good references available for JSONPath and even tools that let you evaluate JSONPath expressions online.
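
As a hedged illustration (the 600x600 minimum mirrors the image-dimensions-out-of-bounds options shown earlier), the path/value pairs returned above could drive a dimension check inside a test handler like this:

// Illustrative sketch only: walk the path/value pairs returned by
// GetPropertyByPath and flag any image whose original dimensions fall
// below the configured minimum.
const pickedImages = await GetPropertyByPath({ udf, path: '$.[tout_image, primary_media, meta.social]'})
for (const {path, value} of pickedImages) {
  // only the *.entity.udf entries carry an original object with dimensions
  const {original} = value || {}
  if (original && (original.width < 600 || original.height < 600)) {
    errors.push(GetError({
      message: `image at ${path} is ${original.width}x${original.height}, below the 600x600 minimum`,
      name: 'image-dimensions-out-of-bounds',
    }))
  }
}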

Pipeline Feature Flags

See FEATURES

AWS CodeBuild

See CodeBuild Readme

Finding documents in Graph

Setting up

Postman Examples

Other Tips & Tricks

VSCode extensions

Resources

Javascript/NodeJs Development

Serverless

ESLint

Code Style

Feature flags

This document describes the feature flags that can be set in your configurations.

Troubleshooting & Tips

Documentation

Please make updates to this documentation as needed. Either add new sections to this file, or create a new Markdown file and add a link to it in this document.

This repository uses the DocToc tool to auto-generate a table of contents (TOC) for Markdown files.

When you're done making documentation updates, run doctoc to update the Table of Contents in the Markdown files. If you add a new Markdown file and want to insert a Table of Contents in that file, you will have to update the file list in the doctoc script in package.json.

To run the doctoc tool, execute the following from the root of the data-pipeline directory:

npm run doctoc

NOTE: There is currently an issue with the npm run doctoc command when run with Node version 12. A workaround is to switch to a newer version of Node (I tested this with version 18), run the npm run doctoc command, then switch back to Node 12.

Data Pipeline

Important Links

Developer Onboarding

Lumigo Login Steps:

  1. Verify your Gmail account is logged in with your official email.
  2. Go to the Lumigo Web App.
  3. Click the 'Continue With Google' button to go to the Lumigo dashboard.
  4. Or fill in your official credentials to log in to Lumigo.
  5. If you do not have access to Lumigo, please reach out to Jeremy Chou or Oleg Hardt for access.

Project Team Leads

Readme completed by: