@apostrophecms-pro/assembly-cloud-tools v2.0.1

License: UNLICENSED

Purpose

Ops tools to stand up and maintain Assembly Clouds. For use exclusively by the Apostrophe Technologies internal hosting team and our self-hosted enterprise customers.

Assembly Cloud Architecture

30,000 feet: managing several clouds with Assembly Cloud Tools (act)

This diagram shows the overall architecture. Here there are two platforms, with two clouds each: one for staging sites, and one for production sites. Depending on your needs and your license you will most likely be managing just one platform, with two clouds.

Each platform would run the same multisite application on both its staging and its production cloud, as described below.

The tool provided in this repository is used to manage clouds at this level: creating, updating, repairing and deleting entire clouds, not individual sites. Adding and removing individual sites can be done entirely within the user interface provided by the dashboard of a single cloud.

This diagram shows the architecture of a single Apostrophe Assembly "cloud."

  • Platform balancers are AWS ec2 instances that receive traffic from the browser and proxy it to app servers.
  • App servers are AWS ec2 instances that run the actual Node.js ApostropheCMS multisite-based application.
  • Workers are AWS ec2 instances responsible for deployments and scheduled tasks.
  • Amazon S3 hosts persistent media uploads.
  • MongoDB Atlas hosts persistent databases. There should be one replica set cluster per cloud. That includes SEPARATE replica sets for "staging" and "production" clouds.

In addition:

  • assembly-boilerplate-project provides a starting point for multisite projects compatible with this architecture. Usually such a project will have two clouds associated with it, staging and production.
  • The act utility provided in this package is used to manage one or more clouds hosted on AWS. It should be run on a secured macOS or Linux workstation.
  • Your metadata database contains sensitive metadata about all of your clouds. This is also a MongoDB Atlas replica set cluster, and it should be separate from any replica sets used to host individual clouds.

Prerequisites on your workstation

  • aws (the CLI), version 2.x; see AWS documentation for installation. Version 1 is no longer supported. If you still have version 1 you should uninstall it and install version 2.
  • Node.js version 10 or later.
  • Homebrew, if running on macOS.
  • jq (the JSON command line utility); on a Mac, this can be installed through Homebrew with brew install jq
  • multitail (allows you to watch logs of several servers in a single merged view); on a Mac, this can be installed through Homebrew with brew install multitail.
  • The realpath command must be available, installed via brew install coreutils (macOS). On most Linux distributions this is standard, but if not you can use apt-get install coreutils (on Debian/Ubuntu).
  • A project powered by the apostrophe-multisite module. We recommend starting from our Assembly project boilerplate. For best results you should get your project working in a local dev environment first.
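
A quick way to confirm these tools are present is a short shell check. This is just a sketch using the tool names listed above; adjust for your platform:

```shell
# Sketch: confirm the workstation prerequisites are installed.
# Tool names are taken from the list above; adjust for your platform.
missing=""
for tool in aws node jq multitail realpath; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All prerequisites found"
fi
```

You should also run aws --version and confirm it reports 2.x, since version 1 is no longer supported.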

Prerequisites in the cloud

  • A MongoDB URI to a permanent MongoDB instance for storage of sensitive metadata about your clouds. This must be a secure and durable database. It should ONLY be used for this purpose, not for hosting websites in your clouds. We recommend using a small MongoDB Atlas instance for this purpose, with "encryption at rest" enabled. Connections must be permitted from your workstation's IP or all IPs.
  • A primary AWS account (your "root" AWS account, the one you get when you sign up). Note: these tools will require resources not included in the "free tier."
  • The key and secret for an AWS iam account within your AWS account, with the following privileges assigned: IAMUserChangePassword, AmazonEC2FullAccess, IAMFullAccess, AmazonEC2ContainerRegistryFullAccess, AmazonS3FullAccess.
  • The default VPC of your AWS account must have a default security group which permits access to ports 22 (SSH), 80 (HTTP) and 443 (HTTPS). Of course, SSH access is secured by private keys. Access "EC2" in the AWS console, then "Security Groups," and add "inbound rules" which are set to match "Anywhere" for these three ports. If you operate this software exclusively from a specific workstation with a fixed IP you may be able to restrict port 22 more tightly.
  • Registered domain names for both your staging cloud and your production cloud. If your company name is example, we suggest example.dev for staging and example.app for production, but the choice is yours. However these must be real domain names and you must be able to edit DNS settings for them freely (use a registrar such as GoDaddy or Google Domains that gives you full control of DNS). You may use a domain that is already in use for your main website, as long as you are comfortable with pointing a DNS wildcard "A" record at the cloud, which means that the cloud will serve all subdomains that do not have explicit DNS records.

If you don't want to create the IAM account, you may use the key and secret for your root AWS account, however we recommend creating the IAM account.
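
The security group step above can also be scripted with the aws CLI rather than clicked through in the console. A sketch, assuming the aws v2 CLI is configured with your IAM credentials; sg-0123456789abcdef0 is a placeholder group ID (look yours up with aws ec2 describe-security-groups --group-names default):

```shell
# Sketch: open inbound ports 22, 80 and 443 on the default security group.
# sg-0123456789abcdef0 is a placeholder; substitute your own group ID.
open_default_ports() {
  for port in 22 80 443; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$1" \
      --protocol tcp --port "$port" --cidr 0.0.0.0/0
  done
}
# Invoke it against your default security group:
# open_default_ports sg-0123456789abcdef0
```

As noted above, you may substitute a tighter CIDR than 0.0.0.0/0 for port 22 if you operate from a fixed workstation IP.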

Installation on your secure workstation

This software is released as an npm private package, and installed with the npm command line interface. Here are the steps to install it.

First, create an npm account if you do not have one.

Second, provide your npm account username to your Apostrophe Technologies representative, requesting access to install the cloud tools. You may receive a confirmation email from npm.

Third, log into npm with the npm login command if you have not already done so.

Finally, install the software via npm:

# Use your npm account credentials
npm login
# Install the software with npm
npm install -g @apostrophecms-pro/assembly-cloud-tools
# The `act` command should now be available on your workstation
which act

Configuration on your workstation

  • Use the newly installed act utility to set the URI to your MongoDB database for the metadata:
act set-metadata-uri 'mongodb://someuser:somepass@mongodbatlas/somedatabase'

This URI will be retained in ~/.act-uri. For durability, cross-team use and security, all other sensitive metadata will be retained in the specified database rather than on your local computer, except when necessary to the functioning of standard utilities such as ssh. Such information may be temporarily mirrored in the ~/.cloud-tmp subdirectory of your account.

Creating an Assembly Cloud

An "assembly cloud" is a platform designed to serve a constellation of websites powered by the apostrophe-multisite module. You will want one cloud for staging and another for production ("prod"), although you can economize with prod only if you wish. A dev environment is usually achieved locally without these tools.

  1. Choose permanent names for your clouds

If your company is named example, we recommend naming your repository example, and naming your clouds example-staging and example-prod. Your names should be short and involve only letters, digits and hyphens.

  2. Create a MongoDB replica set for your content, separate from your metadata

Create a MongoDB replica set cluster in MongoDB Atlas. Make sure you choose AWS, and the same region where you plan to host your Assembly Cloud. Make note of the full root read-write URI for access to the replica set. Connections must be permitted from all IPs owing to the wide range of IP addresses assigned to EC2 instances by AWS.

  3. Configure your cloud

You can set all of these environment variables in a single call, but we'll look at them one at a time to explain:

CLOUD=example-staging KEY=aws-access-key SECRET=aws-access-secret act configure-cloud
  • Every command begins with setting the CLOUD environment variable to the name of the cloud you want to act upon.
  • KEY and SECRET are the key and secret associated with the AWS iam account you created earlier. They will be stored only in the metadata database and will not be accessible to your websites or cloud servers. Separate IAM accounts will be created.

If you don't pass everything at once, configure-cloud will prompt for the remaining settings on successive runs until all of them have been supplied. That's fine; see below.

# OPTIONAL: manually set the S3 bucket name
CLOUD=example-staging BUCKET=apos-example-staging KEY=aws-access-key SECRET=aws-access-secret act configure-cloud

You may optionally set the AWS S3 bucket name for media storage yourself. If you do not, it defaults to the cloud name, which is fine. However since the bucket namespace is global, there is a chance that someone else has claimed it. If this is the case you will get an error message recommending that you set a custom value for BUCKET and try again. apos-YOUR-CLOUD-NAME-HERE is a good value to use if you must set this variable.

CLOUD=example-staging NPM_TOKEN="xyz" act configure-cloud

You may pass an NPM access token. This is required for Apostrophe 3.x projects based on a3-assembly-boilerplate, in which case your npm token must have access to our private packages. The account you used to install act should have access to these private packages, so generating a read-only token for it should be sufficient. Note: if you do not set the token, you will receive a 404 error for modules like @apostrophecms-pro/multisite.

CLOUD=example-staging URI="mongodb+srv://just-an-example-use-what-atlas-gives-you" act configure-cloud

Pass the connection URI for the MongoDB replica set you created for this individual cloud. DO NOT pass your metadata URI here. That should be in a separate cluster.

You will want to quote the URI as is done here. Currently the database name in the URI MUST be test. In actuality many logical databases will be created in the cluster.

The URI you are looking for will start with mongodb:// or mongodb+srv://. Ideally you should use the URI Atlas offers for use with newer MongoDB drivers. Be sure to change the database name part of the string to /test.
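
As an example, the database name can be swapped with a one-line rewrite. The URI below is a placeholder, not a real cluster; use the one Atlas gives you:

```shell
# Sketch: replace the database name in an Atlas URI with the required "test".
# The URI is a placeholder; use the one Atlas gives you.
uri='mongodb+srv://someuser:somepass@cluster0.abcde.mongodb.net/mydb?retryWrites=true'
fixed=$(echo "$uri" | sed 's#\.net/[^?]*#.net/test#')
echo "$fixed"
```

This prints the same URI with /mydb replaced by /test, leaving the query string intact.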

CLOUD=example-staging REGION=us-east-1 act configure-cloud

Configure the AWS region for your cloud. Should match your choice when creating your MongoDB Atlas cluster.

CLOUD=example-staging PLATFORM_BALANCER_INSTANCE_TYPE=t2.small act configure-cloud

The AWS ec2 instance type for your platform balancers. Defaults to t2.micro, which typically works well for this simple role, but you might go higher if you expect high traffic.

CLOUD=example-staging WORKER_INSTANCE_TYPE=t2.small act configure-cloud

The AWS ec2 instance type for your workers. Defaults to t2.small, the least expensive type that is suitable for Apostrophe. There is usually limited benefit in a larger instance for the worker role.

CLOUD=example-staging APP_SERVER_INSTANCE_TYPE=t2.medium act configure-cloud

The AWS ec2 instance type for your application servers, which run the Node.js application, receive web requests via the platform balancer, and do most of the work. Defaults to t2.small, the smallest that is suitable for Apostrophe.

You may wish to go with a larger size as an alternative to running more total instances, in which case you should set PORTS (below) to make good use of each CPU included with the instance. As a general rule AWS pricing does not penalize you for this "fewer, larger instances" approach until the instances are quite large.

CLOUD=example-staging PORTS="3000 3001" act configure-cloud

Each application server will run one instance of Apostrophe listening on each port you configure here. Defaults to just 3000, which is suitable for the default t2.small instance type, because it has only one CPU. For t2.medium, you should run Apostrophe on two consecutive ports (3000 3001). The cloud will load-balance traffic across all available servers and ports. Must be a space-separated, quoted list.
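
The rule above amounts to one consecutive port per vCPU, starting at 3000. A sketch of deriving the PORTS value; the CPU count here is just an example for t2.medium:

```shell
# Sketch: build a PORTS value with one consecutive port per CPU,
# starting at 3000 (e.g. "3000 3001" for a 2-CPU t2.medium).
cpus=2   # substitute the vCPU count of your instance type
ports=""
i=0
while [ "$i" -lt "$cpus" ]; do
  ports="$ports $((3000 + i))"
  i=$((i + 1))
done
ports=${ports# }
echo "PORTS=\"$ports\""
```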

CLOUD=example-staging APP_SERVERS_TOTAL=3 act configure-cloud

The number of application servers ("app servers") running your Apostrophe site. Defaults to 2, which is the minimum number for high availability.

CLOUD=example-staging DASH=example-dashboard act configure-cloud

This will be the Apostrophe shortName configured for your cloud's "dashboard site."

If your cloud is named example-staging, then example-dashboard is a good choice. Note that it can be the same in separate staging and production clouds, which simplifies later content migration if needed.

The prefix of this shortname, i.e. example-, MUST match the shortNamePrefix setting in app.js of the project. Otherwise newly created sites will not work.

MongoDB Atlas note: if you are self-hosting and you plan to use a low-end MongoDB Atlas cluster (below M10), you must use a name with fewer than 12 characters before the -.

CLOUD=example-staging DASH_HOSTNAME=dashboard.example.dev act configure-cloud

This will be the actual Internet hostname of the dashboard site for your cloud. DO NOT specify a protocol, this is just the hostname. This must be dashboard. followed by the domain name you registered for this cloud.

The shortnames of all other sites in this cloud will be based on it. Every site will have a subdomain, similar to dashboard in this hostname. Of course, in your production cloud, a separate domain name for each site is also supported.

CLOUD=example-staging REPO=git@github.com:apostrophecms/assembly-project-boilerplate.git act configure-cloud

Specify your github repo URL. This repo must exist and contain your project. You will be prompted to configure a github "deploy key" in a later step. That key is provided to you and will need to be pasted into github.

CLOUD=example-staging BRANCH=staging act configure-cloud

The github repo branch associated with this cloud. All git pushes to this branch will result in a deployment to the cloud.

CLOUD=example-staging ENV=staging act configure-cloud

Should be set to staging or prod according to the purpose of this particular cloud.

CLOUD=example-staging DEPLOYMENT_SLACK_WEBHOOK=https://api.slack.com/webhooks/xyz act configure-cloud

You must supply a Slack incoming webhook URI in order to receive notifications as deployments proceed and view deployment logs. See the Slack API documentation for details on how to create a Slack webhook.

Immediately after you set your Slack webhook, configure-cloud will provide you with a URL to be set as a github webhook. There is no command to set this in act because it is generated for you.

In github, create a webhook that fires on "push" operations. You do not need to enable the webhook for any other operations. Configure it to point to the URL you are given by configure-cloud, which will look like:

https://dashboard.example.app/ci-server/deploy/passphrase

Your domain name and passphrase will be different.

Make sure you set the content type to JSON, not the default.
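
Summarizing the settings in github's webhook form (the URL is the placeholder from above; yours will differ):

```
Payload URL:  https://dashboard.example.app/ci-server/deploy/passphrase
Content type: application/json
Events:       Just the push event
```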

This allows your Assembly Cloud to deploy automatically when you push to the appropriate github branch.

CLOUD=example-staging NODE_VERSION=14 act configure-cloud

You must configure the major version of Node.js to be installed on the servers. There is no default. You should not specify minor or patchlevel versions, just the major version. It will automatically be kept up to date.

Legacy clouds configured prior to the 1.2 release of the cloud tools will prompt for this the next time you reconfigure them. They are running major version 10. You may set NODE_VERSION to 10, however bear in mind that long term support for Node.js 10.x ends in April 2021.

To upgrade to a newer major version, first test locally, then run:

CLOUD=example-staging NODE_VERSION=14 act refresh-cloud

Downgrading is not currently straightforward.

CLOUD=example-staging IGNORE_PACKAGE_LOCK=1 act configure-cloud

Ignore package-lock.json files when deploying this cloud. Useful with staging clouds when the goal is to install the latest main branch of several dependencies for bleeding-edge testing. Should never be used on production clouds.

If issues occur during configuration

Make any necessary changes, passing only the environment variables that need to change, if any, on subsequent runs of CLOUD=example-staging act configure-cloud. The system is designed to reuse any work already successfully completed. Please reach out to us with any concerns.

Configuring DNS for cloud sites

A wildcard DNS "A" record for the domain associated with the cloud should be pointed to the elastic IP address associated with the lead platform balancer. You can find that IP address with this command:

CLOUD=example-staging act read-metadata lead-ip

Note that the platform will not be accessible until your DNS changes have propagated. Never release this "elastic IP address" in AWS, unless you are deleting your cloud permanently.
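
In BIND zone-file terms, the record looks like this sketch, where 203.0.113.10 stands in for the elastic IP reported above:

```
; Wildcard A record for the cloud's domain; 203.0.113.10 is a placeholder
; for the elastic IP printed by `act read-metadata lead-ip`.
*.example.dev.   300   IN   A   203.0.113.10
```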

Deploying to your Assembly Cloud

After you have completed all of the steps above and the resulting servers and resources have been allocated and configured by act, you should be ready to deploy by pushing to an appropriate github branch, staging or production as chosen above.

At the start of deployment you will receive a notification in the Slack channel associated with the Slack webhook that you configured. You can follow the link in that notification to view deployment logs; you may wish to click refresh. At the end of deployment you will receive another notification.

Creating your initial admin account on the dashboard

To create your initial admin account, see executing an Apostrophe command line task below.

Creating sites in a cloud

Once you have deployed successfully, you can visit your dashboard site (dashboard.example.dev/login in the examples above) and add sites via the "Sites" option in the admin bar. Be sure to populate the URLs tab, particularly the shortname.

When you save a site in the dashboard, a minute or so may pass while letsencrypt generates any necessary secure certificates. Once the site is successfully saved, you can access it as shortname.example.dev, where shortname is the value you chose for the shortname field.

Known issue: if letsencrypt is experiencing delays, you may receive an error notification due to a proxy timeout. Wait an additional minute or two before trying the hostname; do not add the site again.

Viewing server logs

You can watch the console logs of the Apostrophe processes on your application servers in real time with this command:

CLOUD=example-staging act tail-node-logs

For more information, see the multitail documentation.

You can also output the entirety of the current server logs with this command:

CLOUD=example-staging act cat-all-pm2-logs

Executing an Apostrophe command line task

To get started, you will want to add an admin user to your dashboard site:

CLOUD=cloud-name act task apostrophe-users:add admin admin --site=dashboard

You may also specify the hostname of any site you have added to your cloud for --site, or use --all-sites to run a task for every non-dashboard site in your cloud.

Making an ssh connection to an application server

NOTE: application servers are not permanent and should not be individually adjusted. Configuration changes should NOT be made on the fly. But, it is sometimes useful to open a shell as a debugging technique.

CLOUD=cloud-name CATEGORY=app-servers act instance-ids
CLOUD=cloud-name act ssh-instance ONE-OF-THOSE-IDS

This opens an interactive shell on the first app server, where the actual site is running. You can add a command after the ID to run it directly and exit, just like you can with ssh.

Other categories are platform-balancers and workers.

Making an ssh connection to the lead platform balancer in one step

CLOUD=cloud-name act ssh-platform-balancer

Making an ssh connection to the worker in one step

CLOUD=cloud-name act ssh-worker

Accessing the AWS CLI conveniently on behalf of the cloud

Rather than just typing aws and setting env vars for it manually, you can use:

CLOUD=cloud-name act aws-cli [appropriate command here]

This utility runs the aws CLI tool with the configured key and secret for this cloud.

Deploying without pushing new code

If the branch hasn't changed but you wish to redeploy, you can type:

CLOUD=example-staging act deploy

This will generate Slack notifications as usual.

Note that this won't work until the DNS records for the domain have been set up.

Adding application server capacity

You can use configure-cloud to change the number of application servers at a later time:

CLOUD=example-staging APP_SERVERS_TOTAL=8 act configure-cloud

The platform balancer will begin using the new servers within one minute after this command completes. If you reduce the number, surplus servers are deleted; no content is lost, because all content resides in MongoDB and AWS S3.

Restarting the node processes with pm2

Ideally, your application won't need to be restarted, but if you do encounter a wedged state in the Node.js application code, you can quickly restart it on all application servers:

CLOUD=example-staging act restart-node-apps

Replacing an application server

Don't start here. Start with restart-node-apps, which resolves most problems quickly. This tool takes time and is mainly appropriate in situations where AWS has notified you that an instance must be stopped or removed.

If an individual application server ID no longer responds to ssh connections, it may be necessary to replace it:

CLOUD=example-staging CATEGORY=app-servers act instance-ids
CLOUD=example-staging CATEGORY=app-servers act ssh-instance ONE-OF-THOSE-IDS
# No response or otherwise dysfunctional, let's replace it
CLOUD=example-staging act replace-app-server ONE-OF-THOSE-IDS

This will replace the impacted application server after configuring a new one and making the platform balancer aware of the replacement. Some time is required.

How do I know which application server to replace?

In addition to checking for accessibility via ssh as shown above, it is possible to check the nginx proxy server logs on the platform balancer for evidence of an unhealthy application server:

CLOUD=example-staging act ssh-platform-balancer
# connects to shell on the lead platform balancer
sudo bash
# now we are in that shell
cd /var/log/nginx
tail website-in-question.error-log
# notice 504 timeout errors that are all when connecting to one backend
# application server IP. This is a private IP

# Log out of the platform balancer
exit

# Let's turn that private IP into an instance id
CLOUD=example-staging CATEGORY=app-servers PRIVATE_IP=ip-you-saw-above act instance-ids
# Let's tell it to reboot
CLOUD=example-staging CATEGORY=app-servers act ssh-instance id-goes-here sudo reboot
# Is there still a problem? If not, we saved some time here
# OK, if it's still unhappy, let's replace it
CLOUD=example-staging act replace-app-server id-goes-here
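
To pull the offending backend IP out of such a log line, here is a small sketch; the log line and IP below are made up for illustration, not real nginx output:

```shell
# Sketch: extract the backend private IP from an nginx 504 error line.
# The sample line and IP are illustrative, not real log output.
line='2023/01/01 12:00:00 [error] upstream timed out, upstream: "http://172.31.5.12:3000/page"'
ip=$(echo "$line" | sed 's#.*upstream: "http://\([0-9.]*\):.*#\1#')
echo "$ip"
```

The printed private IP is what you would then pass to instance-ids via PRIVATE_IP as shown above.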

Replacing the worker

If AWS reports that the worker instance needs to be replaced, deployments are timing out entirely with zero indication of progress (not failing due to build problems), or the worker is refusing ssh connections, you may need to replace it:

# If this instance id matches one that AWS indicates requires replacement...
CLOUD=example-staging CATEGORY=workers act instance-ids 
# Or if this command times out...
CLOUD=example-staging act ssh-worker
# You may wish to replace the worker
CLOUD=example-staging CATEGORY=workers act instance-ids 
CLOUD=example-staging act replace-worker id-goes-here

This will replace the worker after configuring a new one and making the platform balancer aware of the replacement. Some time is required.

Promoting the backup platform balancer

If AWS reports that the lead platform balancer instance needs to be replaced, it is refusing HTTP/HTTPS connections entirely (as opposed to a "502 Bad Gateway" error), or it is refusing ssh connections, you may wish to replace it with the backup. This will automatically create a new backup as well.

# If this instance id matches one that AWS indicates requires replacement...
CLOUD=example-staging act lead-platform-balancer-id
# Or if this command is timing out...
CLOUD=example-staging act ssh-platform-balancer
# Or if requests to the platform are timing out entirely, switch to the
# backup platform balancer:
CLOUD=example-staging act promote-backup-platform-balancer

This will terminate the lead platform balancer, make the backup the lead platform balancer, and then create a new backup platform balancer. Some time is required.

Replacing the backup platform balancer

If AWS reports that the backup platform balancer instance needs to be replaced, or it can no longer be accessed via ssh, you will want to replace it so that a healthy backup is available:

# If this instance id matches one that AWS indicates requires replacement...
CLOUD=example-staging act backup-platform-balancer-id
# Or if this command is timing out...
CLOUD=example-staging act ssh-backup-platform-balancer
# Replace the backup platform balancer:
CLOUD=example-staging act replace-backup-platform-balancer

This will replace the backup platform balancer after configuring a new one. Some time is required.

Replacing both platform balancers from scratch (LAST RESORT)

If the lead platform balancer must be replaced and promoting the backup has failed, it is possible to replace both from scratch:

# For use only if promoting the backup fails!
CLOUD=example-staging act replace-all-platform-balancers

Although no site content is lost, this will require that certbot issue all certificates from scratch. This will take time, and if the letsencrypt rate limits are reached, there may be additional delays before sites can be accessed. However, thanks to the backup platform balancer, which receives frequent updates of the certbot data from the lead, this command should be needed only in very rare situations. It is included for completeness.

This will replace both platform balancers. Some time is required and due to the loss of letsencrypt certificates (see above), there may be an extended delay before all HTTPS certificates are reissued to fully enable access to the sites.

Re-configuring cloud servers

It may occasionally be necessary to re-configure the existing platform balancers, worker and app servers. Usually this will not be necessary except when a new release of assembly-cloud-tools has been provided to you, typically to fix a bug or improve performance.

Running act configure-cloud will only configure resources that don't appear to already be configured. To force a re-configuration, run:

CLOUD=example-staging act refresh-cloud

It is also possible to refresh all clouds at once with refresh-clouds, however we recommend doing so one at a time so you can evaluate the outcome.

This can take considerable time. If you are only interested in re-configuring one category of instance, you may use one of these commands:

CLOUD=example-staging CONFIGURE_WORKERS=1 act refresh-cloud
CLOUD=example-staging CONFIGURE_APP_SERVERS=1 act refresh-cloud
CLOUD=example-staging CONFIGURE_PLATFORM_BALANCERS=1 act refresh-cloud

Deleting a cloud

DELETING A CLOUD WILL COMPLETELY DELETE ALL CONTENT, except, currently, for MongoDB Atlas clusters. DO NOT DO THIS UNLESS YOU INTEND TO DELETE ALL CONTENT AND ALL WEBSITES IN THAT CLOUD FOREVER.

# Yikes! Why are you doing this? Think carefully!
CLOUD=example-staging act delete-cloud

This DOES remove the Amazon S3 bucket with all the uploaded media, the ec2 instances, the roles and policies, etc. It DOES NOT remove other clouds.

Currently this does NOT remove the MongoDB Atlas cluster as you must do that within your own MongoDB Atlas account.

Listing your clouds

To list all of your clouds:

act list-clouds

This will list all clouds, even false starts with almost no metadata. To make it easier to track down such issues, you can use the --has-database option. This takes more time:

act list-clouds --has-database

A connection will be made to the actual MongoDB instance dedicated to that cloud. If a mongodb URI is present in the metadata but not actually reachable, the connection error is printed to help distinguish between actual connection problems with real clouds and clouds that were never fully configured.

By default, clouds marked as deleted by act 1.4.0 or newer are not included in the list. To list only the clouds marked as deleted, use:

act list-clouds --deleted

Caveats

  • If configure-cloud should fail for any reason, you should be able to resume progress by running it again. Be aware that billable resources may have already been allocated in AWS. To avoid unexpected billing, if you decide to abandon a cloud rollout, you should use delete-cloud to clean it up, and also review the AWS console.

  • Any workstation on which this software is run should be kept secure. Although the main AWS key and secret are not kept on disk, the ~/.cloud-tmp subdirectory of your account may temporarily contain access keys for S3 media and ssh private keys for EC2 instances, which are removed after commands complete. Also, the ~/.act-uri file contains a MongoDB URI providing access to all of the metadata, including the main AWS key and secret.
