Better AWS Account Governance With Infrastructure as Code

There are numerous ways to run Amazon Web Services workloads: the applications, services, and processes you run in Amazon’s cloud. AWS recommends organizing workloads into multiple separate accounts for security, governance, cost management, and operational efficiency. Here, I’ll demonstrate how we manage AWS workloads in a straightforward and efficient way at Brightly. Compared with AWS Control Tower, our method has several benefits: it reduces costs, speeds things up, and allows for more customization, to name a few.

A multi-account setup is typically built with AWS Organizations, which lets AWS customers organize their accounts under a common root. You can also use Organizational Units to group specific accounts into logical groups. AWS has also created a service for governing multi-account environments, called AWS Control Tower, which sets up a so-called landing zone. It also manages some other services, like AWS Service Catalog and AWS IAM Identity Center.

AWS Control Tower is not a bad tool, but for us here at Brightly it is too heavy and complex. AWS Control Tower leverages several other AWS services like Service Catalog, Organizations, AWS IAM Identity Center, AWS Step Functions, and more. Some of the underlying resources incur costs. That’s why we decided to build our own tool to manage our AWS accounts.

Our tool uses Terraform, Terragrunt, and GitHub Actions. This way we do not use AWS Step Functions or Service Catalog, which keeps the solution simpler. We keep the account details in a GitHub repository. Any change to the repository must go through a reviewed and accepted pull request, after which GitHub Actions applies the change to our AWS organization.

Benefits

The benefits of a multi-account approach can be seen even with a relatively small number of accounts (e.g., three). The main benefits relate to security, prototyping, and cost management, and there are other, smaller benefits as well. This means that one should adopt a multi-account approach regardless of the tools used for it (AWS Control Tower, Terraform, or something totally different).

Each AWS account provides security and access boundaries that form a contained entity. This allows you to restrict access to sensitive data, e.g., in production. You can define coarse-grained access control per account, which is a valuable complement to more fine-grained permission policies.

To make prototyping easier, we create short-lived sandbox accounts per project. For example, if I want to test a new service that was launched recently, I create a new AWS account for it. Because there is one account for each project, it is easy to delete all associated resources when the prototype is finished and the resources are no longer needed. Because of the account boundaries, people working on one prototype do not have to worry about adverse effects on other projects. The same is true for AWS service quotas and API request rate limits.

AWS accounts also create a billing boundary, which makes managing costs easier. Accounts can be organized into Organizational Units (OUs). Here at Brightly we have defined account-level ownership, which provides cost and usage awareness. The use of sandbox accounts has also been useful in this regard.

At a high level, AWS’s approach with Control Tower could be described as elaborate and robust, while our solution is simpler and more customizable. Check the comparison table below for further details:

AWS Control Tower                          | Terragrunt and GitHub
+ Robust                                   | + Cheaper
+ Scalable                                 | + Faster (with a small number of accounts)
+ Extensive documentation                  | + Simple
                                           | + Customizable
- The resources used generate some costs   | - Does not scale well
- Complex to set up                        | - Custom built
- Can be confusing                         |

Technical details

We use Terragrunt for setting up the baseline configuration on all accounts. For those who are not familiar with it, Terragrunt is a wrapper for Terraform that allows you to reuse your Terraform code more efficiently. Ultimately, Terragrunt relies on Terraform modules. Terragrunt is a much more opinionated tool than Terraform, which means that it steers users toward a specific pattern. All the resources are still defined in Terraform, and Terragrunt is used to refer to the Terraform modules with the correct variables.
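To give an idea of how that reuse works, here is a minimal sketch of a root terragrunt.hcl that child configurations could include. It is an illustration rather than our actual configuration: the bucket name, region, and tag values are placeholders, and it assumes that the account being configured is passed in through the CURRENT_AWS_ACC environment variable used by the loop script in the Examples section.

```hcl
# Root terragrunt.hcl (sketch). Child configurations pull it in with
#   include "root" { path = find_in_parent_folders() }

locals {
  # Account currently being configured. Assumed to come from the
  # CURRENT_AWS_ACC environment variable; the default is a placeholder.
  account_id = get_env("CURRENT_AWS_ACC", "000000000000")
}

# Terragrunt generates the backend block for every module, so the modules
# themselves stay free of backend configuration.
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket  = "example-terraform-state" # hypothetical bucket name
    region  = "eu-west-1"               # hypothetical region
    encrypt = true

    # One state file per account and per module directory.
    key = "${local.account_id}/${path_relative_to_include()}/terraform.tfstate"
  }
}

# Inputs shared by all child configurations. A child that includes this
# file with expose = true can read them as include.root.inputs.
inputs = {
  tags = {
    ManagedBy = "terragrunt" # hypothetical tag
  }
}
```

Keying the state file by account ID is one way to let the same module directory be applied to many accounts without the states colliding.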

At a high level, creating an AWS account with our tool will do the following things:

  1. Create a new account in the AWS organization
  2. Log into each account (by assuming an IAM role) and add the baseline configuration
  3. Assign users using AWS IAM Identity Center

Steps 1 and 3 are quite simple, and there are Terraform resources for them. The difficulty is that the third step expects the baseline configuration to be in place: it fails if there is no corresponding IAM role in the target account. To be precise, we define a permission set for users in AWS IAM Identity Center, and for the user assignment to work, there must be a matching IAM role in the target account. Our solution is to ask Terragrunt to output all accounts defined for each Organizational Unit (OU) and then loop through each account in the list. Because Terragrunt handles setting up the Terraform backend for us, we can apply the same Terragrunt code to each account in the OU. After that we can move on to the third step of the setup.
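As a rough illustration, the Terraform side of steps 1 and 3 could look something like the sketch below. It is not our actual module: the variable names, the account map, and the permission set name are assumptions made for the example, while the resource types come from the Terraform AWS provider.

```hcl
# Sketch only: all variable names and values are hypothetical.

variable "training_ou_id" {
  description = "ID of the Organizational Unit that holds the accounts"
  type        = string
}

variable "training_accounts" {
  description = "Account name => root e-mail address"
  type        = map(string)
}

# Step 1: create the member accounts under the OU.
resource "aws_organizations_account" "training" {
  for_each = var.training_accounts

  name      = each.key
  email     = each.value
  parent_id = var.training_ou_id

  # Role that is assumed in step 2 to apply the baseline configuration.
  role_name = "OrganizationAccountAccessRole"
}

# Account IDs for the loop that applies the baseline (step 2).
output "training_ou_accounts" {
  value = [for account in aws_organizations_account.training : account.id]
}

# Step 3: assumed to live in a separate configuration that is applied
# only after the baseline is in place in every target account.

variable "target_account_ids" {
  description = "Accounts that already have the baseline configuration"
  type        = set(string)
}

variable "principal_id" {
  description = "Identity Center group (or user) ID to grant access to"
  type        = string
}

data "aws_ssoadmin_instances" "this" {}

resource "aws_ssoadmin_permission_set" "developer" {
  name         = "ExampleDeveloper" # hypothetical name
  instance_arn = tolist(data.aws_ssoadmin_instances.this.arns)[0]
}

resource "aws_ssoadmin_account_assignment" "developer" {
  for_each = var.target_account_ids

  instance_arn       = aws_ssoadmin_permission_set.developer.instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
  principal_id       = var.principal_id
  principal_type     = "GROUP" # or "USER"
  target_id          = each.value
  target_type        = "AWS_ACCOUNT"
}
```

Keeping the assignment in its own configuration is what lets the baseline loop run between account creation and user assignment.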

AWS Control Tower handles this with an elaborate setup of AWS Step Functions. We chose a simpler approach, but you may want to use AWS Control Tower, especially if you have more than 20 AWS accounts.

Summary

AWS governance is a lot of work, and it makes sense to have tools for it. We studied AWS Control Tower and realized that we could build a tool ourselves that would suit our needs better. Now we can leverage several AWS accounts effectively.

Examples

Here is an example of Terragrunt in use. Not much to see since most of the configuration is in the module.

terraform {
  source = "git::https://example.com/example/example-modules//aws/example-module?ref=v0.1.4"
}

include "root" {
  path   = find_in_parent_folders()
  expose = true
}

# Indicate the input values to use for the variables of the module.

inputs = {
  sg_names = ["example-sg"]
  tags     = include.root.inputs.tags
}

With these commands one can validate and apply the changes.

terragrunt run-all validate 
terragrunt run-all hclfmt --terragrunt-check 
terragrunt apply --terragrunt-non-interactive

Then we just want to do the same for all accounts in the Organizational Unit:

# Read the account IDs of the OU from the Terraform outputs.
# `terragrunt output --json` wraps every output in an object, so the list is under `.value`.
account_list=$(terragrunt output --json | jq -r '.training_ou_accounts.value[]')

for acc in $account_list; do
  # Start each account from a clean Terragrunt cache.
  find . -type d -name .terragrunt-cache -prune -exec rm -rf {} +
  echo "$acc"

  # Check account status and skip all accounts that are not active.
  acc_status=$(aws organizations describe-account --account-id "$acc" | jq -r '.Account.Status')
  if [[ $acc_status != "ACTIVE" ]]; then
    echo "Account not active"
    continue
  fi

  # Both variables are exported so that the terragrunt processes can read them.
  export CURRENT_AWS_ACC="$acc"
  export TERRAGRUNT_IAM_ROLE="arn:aws:iam::$acc:role/OrganizationAccountAccessRole"

  terragrunt run-all validate --terragrunt-non-interactive
  terragrunt run-all hclfmt --terragrunt-check

  terragrunt run-all plan
  terragrunt run-all apply --terragrunt-non-interactive
done

About the author
Panu Simolin
Panu is a solution-oriented developer with over a decade of experience in software engineering. He specializes in DevOps as well as forward-thinking public cloud and network solutions.