This project depends on:
- Python 3.10
- virtualenv
- AWS CLI
- Terraform
- kubectl
- Helm CLI
- CMake
- 1Password CLI
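A quick way to confirm that the tools above are on your PATH is a small shell check. This is a sketch. check_deps is a hypothetical helper and is not part of this repo. The executable names in the usage line (python3.10, virtualenv, aws, terraform, kubectl, helm, cmake, op) are assumptions about how each tool is typically installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# The return status is the number of missing commands (0 means all were found).
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" > /dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Usage, with the assumed executable names: check_deps python3.10 virtualenv aws terraform kubectl helm cmake op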
It creates a VPC and an EKS cluster. On that cluster it then uses Helm to set up GitHub's first-party Actions Runner Controller (ARC) solution for GitHub Actions (GHA) runners.
To deploy, you'll need to set up the AWS CLI and the 1Password CLI:
- Get an AWS account. You may need to contact someone with admin access to send you an invite.
- Ensure 2FA is set up on your AWS account.
- Install the AWS CLI.
- To authenticate the AWS CLI, create a new AWS Access Key ID and Secret Access Key. In the AWS console, go to IAM->Users->Your user->Security credentials->Create access key.
- In your terminal, run aws configure --profile {account} to set up your login (currently {account} is always 391835788720). It'll ask you for the AWS access key ID and secret access key from the previous step. For the default region name, enter us-east-1. For the default output format, enter json.
  - This will set up your .aws folder and create config and credentials files in there.
  - You now have the AWS CLI set up with a profile named after the account where the target will be deployed (matching the path of each account module under aws/<acc-id>/<region>/), with all the permissions and keys stored locally.
- To use the above config as your default setup, you can run aws configure a second time, but without the --profile param.
- Run aws ec2 describe-instances to verify that you're properly authenticated.
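After the steps above, a small check can confirm two things: that the profile points at us-east-1, and that its keys are accepted. This is a sketch. verify_aws_profile is a hypothetical helper. It uses aws sts get-caller-identity, a read-only call, instead of describe-instances.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that an AWS CLI profile targets us-east-1
# and that its access keys are accepted by AWS.
verify_aws_profile() {
  local profile="$1" region
  # "aws configure get" exits non-zero when the key is not set for the profile
  if ! region=$(aws configure get region --profile "$profile"); then
    echo "profile $profile has no region configured" >&2
    return 1
  fi
  if [[ "$region" != "us-east-1" ]]; then
    echo "profile $profile uses region $region, expected us-east-1" >&2
    return 1
  fi
  # Read-only call that fails if the access key ID / secret are wrong
  aws sts get-caller-identity --profile "$profile" > /dev/null
}
```

Usage: verify_aws_profile 391835788720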
You need 1Password to fetch environment secrets and pass them to make.
- Create a 1Password account. The 1Password organization is owned by the Linux Foundation; ask a team member there to invite you to create a 1Password account.
- Install and set up the 1Password CLI as per their docs.
The root folder's make.env contains paths to various secrets defined in 1Password. To actually use those secrets, you'll want to prefix any command you run with op run --env-file make.env -- [YOUR_COMMAND]. This is particularly important for the make commands.
You can see what your combined environment contains by running op run --env-file make.env -- env
You can add the following function to your .bashrc or .zshrc file to simplify adding the op prefix. It walks up the directory tree to find the nearest file named make.env and passes its path to op.
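# Alias the 1Password CLI. This is for the ci-tools repo. See https://support.1password.com/command-line-getting-started/
# It makes calling "op make" equivalent to "op run --env-file PATH_TO_make.env -- make"
# Note: this shadows the real op binary, so use "command op ..." for other subcommands (e.g. "command op signin").
op() {
  # Walk up from the current directory to / looking for make.env; $(...) is a subshell, so the cd calls don't move your shell
  local env_file
  env_file=$(while [[ "$PWD" != "/" && ! -e make.env ]]; do cd .. || break; done; [[ -e make.env ]] && echo "$PWD/make.env")
  [[ -n "$env_file" ]] || { echo "op: no make.env found in this or any parent directory" >&2; return 1; }
  command op run --env-file "$env_file" -- "$@"
}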
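If the wrapper ever picks up an unexpected make.env, you can run the lookup on its own to see which file it finds. This sketch pulls the same walk-up-the-tree logic into find_make_env, a hypothetical helper that is not part of the repo. It uses dirname instead of cd, so it never changes your shell's working directory.

```shell
#!/usr/bin/env bash
# Hypothetical standalone version of the lookup done by the op() wrapper:
# print the path of the nearest make.env in the current directory or any parent.
find_make_env() {
  local dir="$PWD"
  while [[ "$dir" != "/" && ! -e "$dir/make.env" ]]; do
    dir=$(dirname "$dir")
  done
  if [[ -e "$dir/make.env" ]]; then
    # At the filesystem root, avoid printing "//make.env"
    echo "${dir%/}/make.env"
  else
    return 1
  fi
}
```

Run from aws/<acc-id>/<region>, it should print the path of the make.env in the repository root.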
Ensure you have Python 3.10 installed.
Optional, but it lets you run the Terraform linter (tflint) locally via make tflint:
- Install Terraform: https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli
- (Optional) Enable tab completion on bash/zsh: terraform -install-autocomplete
Install tflint by following the instructions at https://github.com/terraform-linters/tflint
Run tflint using the op prefix: op run --env-file make.env -- make tflint
- Or, if you set up the shortcut function, you can run op make tflint
Install eksctl by following the instructions at https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-eksctl.html
To authenticate to the Kubernetes clusters, you need to be added to the list of authorized users (the EKS_USERS list). It's a somewhat involved process:
- Get the current EKS_USERS list from 1Password. It's stored as a base64 string.
- Use base64 to decode it (e.g. echo "the_string" | base64 --decode > users.txt).
- Add a line for yourself to the resulting users list.
- Encode the new list back to base64: base64 -i users.txt on macOS, or base64 -w 0 users.txt on Linux (GNU base64 otherwise wraps its output every 76 characters).
- Replace the old EKS_USERS value in 1Password with this new value.
- Go to the GitHub secrets for this repo and replace the EKS_USERS secret with the same base64-encoded value.
- Get someone who already has access to the clusters to run CLUSTER_TARGET=[cluster-you-want] make clean apply-arc-canary to actually propagate your name to the relevant clusters.
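The decode, append, and re-encode steps above can be scripted so the result always comes out as a single unwrapped line, on both macOS and Linux. This is a sketch. add_eks_user is a hypothetical helper. The format of each EKS_USERS entry isn't documented here, so copy the format of the existing lines. Updating 1Password and the GitHub secret is still done by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one entry to a base64-encoded EKS_USERS value
# and print the re-encoded result as a single line.
#   $1 = current base64 value (as copied from 1Password)
#   $2 = the line to add for yourself (match the format of the existing entries)
add_eks_user() {
  local current="$1" new_line="$2" decoded
  # --decode is accepted by both GNU and macOS base64
  decoded=$(printf '%s' "$current" | base64 --decode) || return 1
  if [[ -n "$decoded" ]]; then
    printf '%s\n%s\n' "$decoded" "$new_line"
  else
    printf '%s\n' "$new_line"
  fi | base64 | tr -d '\n'   # drop the line wrapping GNU base64 adds every 76 chars
  echo
}
```

Usage: new_value=$(add_eks_user "<current EKS_USERS value>" "<your entry>")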
Now you can deploy bits to the cluster.
Once the above setup steps are complete you can run make as follows:
$ op run --env-file make.env -- make
Or if invoking make from a different folder, pass a path to the make.env file:
# If invoking from aws/<acc-id>/<region>
$ op run --env-file ../../../make.env -- make
If you're testing changes in packages and want to force make to install newer dependencies, run make clean first; it should remove all dependencies and packages installed locally in the project.
In some situations kubectl/helm may fail to detect changes. Besides fixing the tool, submitting a PR upstream, and waiting for a newer version, you can run make delete to remove the affected K8s setup and force it to be re-created.
There are canary environments to help with development. To update Terraform in all canary environments:
$ cd aws/<acc-id>/<region>
$ op run --env-file ../../../make.env -- make apply-arc-canary
There are 3 canary environments, and they can be deployed in steps. The CLUSTER_TARGET variable is optional and selects one of the environments:
# install/update the docker registry and mirrors
$ cd aws/<acc-id>/<region>
$ CLUSTER_TARGET="ghci-arc-c-runners-eks-I" op run --env-file ../../../make.env -- make install-docker-registry-canary
# install/update karpenter and node config
$ cd aws/<acc-id>/<region>
$ CLUSTER_TARGET="ghci-arc-c-runners-eks-I" op run --env-file ../../../make.env -- make karpenter-autoscaler-canary
# install/update ARC and runner config
$ cd aws/<acc-id>/<region>
$ CLUSTER_TARGET="ghci-arc-c-runners-eks-I" op run --env-file ../../../make.env -- make k8s-runner-scaler-canary
# do it all inside K8s
$ cd aws/<acc-id>/<region>
$ CLUSTER_TARGET="ghci-arc-c-runners-eks-I" op run --env-file ../../../make.env -- make arc-canary
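If you want to run the three staged steps for one cluster but stop as soon as one fails, you can chain them in a small function. This is a sketch. deploy_canary_in_steps and its DRY_RUN switch are hypothetical. The step order follows the order listed above, and make arc-canary remains the way to do everything in one go. It assumes you run it from aws/<acc-id>/<region>, so the relative ../../../make.env path resolves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the three canary steps for one cluster in order,
# stopping at the first failure. Run it from aws/<acc-id>/<region>.
# Set DRY_RUN=1 to print the commands instead of executing them.
deploy_canary_in_steps() {
  local cluster="$1" target
  for target in install-docker-registry-canary karpenter-autoscaler-canary k8s-runner-scaler-canary; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "CLUSTER_TARGET=$cluster op run --env-file ../../../make.env -- make $target"
    else
      # "command op" bypasses the op() shell wrapper, if you defined it
      CLUSTER_TARGET="$cluster" command op run --env-file ../../../make.env -- make "$target" || return 1
    fi
  done
}
```

Usage: DRY_RUN=1 deploy_canary_in_steps ghci-arc-c-runners-eks-I, then drop DRY_RUN=1 once the printed commands look right.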
To save resources, the canary clusters set the minimum number of runners to 0 for all runner types by default. If you need a different value for testing, set the CANARY_MIN_RUNNERS variable:
$ CANARY_MIN_RUNNERS=1 CLUSTER_TARGET="ghci-arc-c-runners-eks-I" op run --env-file ../../../make.env -- make k8s-runner-scaler-canary
To upgrade EKS clusters to a new version:
- Go to the AWS Console (https://us-east-1.console.aws.amazon.com/eks/home?region=us-east-1#/clusters)
- For the cluster(s) you wish to upgrade, delete their associated node groups
- Delete the cluster
- Run make apply (more specifically apply-canary, apply-vanguard, apply-prod)
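Before deleting anything in the console, it can help to list exactly what will be removed. This sketch assumes the cluster lives in us-east-1. It uses only standard AWS CLI eks subcommands: list-nodegroups, delete-nodegroup, the nodegroup-deleted waiter, and delete-cluster. print_eks_teardown itself is hypothetical, and it only prints the commands so you can review them before running them by hand. Add --profile 391835788720 if that profile isn't your default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (but do NOT run) the AWS CLI commands that delete
# a cluster's node groups and then the cluster itself, so they can be reviewed first.
print_eks_teardown() {
  local cluster="$1" region="${2:-us-east-1}" nodegroups
  nodegroups=$(aws eks list-nodegroups --cluster-name "$cluster" --region "$region" \
    --query 'nodegroups[]' --output text) || return 1
  # Text output is tab-separated; put one node group per line
  printf '%s\n' "$nodegroups" | tr '\t' '\n' | while read -r ng; do
    [[ -n "$ng" ]] || continue
    echo "aws eks delete-nodegroup --cluster-name $cluster --nodegroup-name $ng --region $region"
    echo "aws eks wait nodegroup-deleted --cluster-name $cluster --nodegroup-name $ng --region $region"
  done
  echo "aws eks delete-cluster --name $cluster --region $region"
}
```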
To release the latest main branch to prod, do the following:
- Trigger the "Runners Open Release PR" workflow to create a release PR. This PR will be used to manage the deployment.
- Once that PR is ready, get it approved by teammates.
- On that PR, comment PROCEED_TO_VANGUARD to deploy the bits to Vanguard (our staging environment).
- Once Vanguard has been successfully deployed, comment PROCEED_TO_PRODUCTION to deploy to prod.
- Once that's done, comment CLEANUP_DEPLOYMENT to finish up the deployment.
Everything considered critical and secret should be placed under the aws/ path. The idea is that all other paths could be open-sourced, while any config that is specific to the cluster being deployed, or to the account managed by the responsible team, lives under aws/. Eventually those configs should be split into separate repositories, enabling collaborators to reuse the project in a modular way.
The monitoring infrastructure (except the scrapers) is deployed in a separate cluster and is neither integrated with the current Makefile targets nor part of the rollout procedure. The assumption is that it won't need frequent updates. Deploying both simultaneously could also leave us blind at exactly the moment monitoring matters most for infra. So, to update it, first let everyone know on Slack, then get a pair programming session with another person on the team and run:
$ cd aws/391835788720/us-east-1 && make clean && op run --env-file ../../../make.env -- make apply-arc-canary-monitoring arc-canary-monitoring apply-arc-prod-monitoring arc-prod-monitoring