
Definitive Guide to

Elastic Kubernetes Service (EKS) Security


Table of Contents
Introduction 3

Cluster Design 5
  Cloud Infrastructure Security 5
  VPC Layout 5
  Dedicated IAM Role for Cluster Creation 6
  Managed vs Self-managed Node Groups 7
  Cluster Resource Tagging 8
  Control SSH Access to Nodes 9
  EC2 Security Groups for Nodes 9

Cluster Add-ons 13
  Don't Install the Kubernetes Dashboard 13
  Fargate for Nodeless EKS 14
  Amazon EFS CSI Driver 14
  Protecting EC2 Instance Metadata Credentials and Securing Privileged Workloads 16
  IAM Policies and the Principle of Least Privilege 17
  Isolating Critical Cluster Workloads 18
  Manage IAM Credentials for Pods 19
  Step-by-Step Instructions for Isolating Critical Add-ons 21

Networking 27
  Install Calico for Cluster Network Controls 27
  Limit Network Access to the Kubernetes API Endpoint 28
  Enable the Private Endpoint for the Kubernetes API 28
  Block Access to Kubelet 29
  Securing Service Load Balancers 30

Container Images 31
  Build Secure Images 31
  Use a Vulnerability Scanner 32

Runtime Security for Workloads 33
  Namespaces 33
  Kubernetes RBAC 34
  Protect the Cluster's RBAC Authorization Configuration 34
  Limit Container Runtime Privileges 35
  Use Pod Security Policies 36
  Use an Admission Controller to Enforce Best Practices 37

Monitoring and Maintenance 39
  Collect Control Plane Logs 39
  Monitor Container and Cluster Performance for Anomalies 39
  Monitor Node (EC2 Instance) Health and Security 40
  Keep Clusters Up-to-date 41

Conclusion 42



Introduction
As the desire to leverage the power of Kubernetes continues to grow at a phenomenal pace, so
has the need for help managing this complex container platform. Major cloud and on-premises
providers are racing to provide Kubernetes solutions for their customers. In response to this
growing demand for managed Kubernetes, Amazon Web Services (AWS) launched their Elastic
Kubernetes Service (EKS) in June 2018.

While the Cloud Native Computing Foundation requires all certified Kubernetes offerings to use a standard set of core API groups, its conformance program still leaves a great deal of leeway for the exact feature set and management of the various Kubernetes managed services. Users of these platforms need to understand which features and tools their cloud provider makes available, as well as which pieces of the management role fall on the user. That share of the workload becomes even more critical with respect to securing the Kubernetes cluster, the workloads deployed to it, and its underlying infrastructure.

Customers share responsibility with AWS for the security and compliance of their use of its services. AWS takes responsibility for securing its infrastructure and addressing security issues in its software. The customer must ensure the security of their own applications and must also correctly use the controls offered to protect their data and workloads within the AWS infrastructure.

Kubernetes brings some specific security requirements to the table. For a managed Kubernetes service like EKS, users have three main layers that require action: the workloads
running on the cluster, the cluster and its components, and the underlying AWS services on which the
cluster depends, which include much of the Elastic Compute Cloud (EC2) ecosystem: instances, storage,
Virtual Private Cloud (VPC), and more. Lack of adequate security for any one of these areas puts them all at
risk. Unreliable container images, applications with sub-par security, or containers with excessive runtime
permissions increase the possibility that the EC2 instances which act as the EKS cluster worker nodes could
be infiltrated. Inadequate firewall protections for the EKS control plane put the integrity of the cluster and its
workloads in danger. Additionally, customers still need to address all the protections required for ensuring
the security of the cluster’s EC2 instances and other VPC tenants.

We will cover general Kubernetes cluster security, including the standard controls and best practices for
minimizing the risk around cluster workloads. We will go over the specific requirements for securing an EKS
cluster and its associated infrastructure. EKS as a service requires a great deal of care and consideration,
not to mention actual effort, before it can be used securely for production workloads. We hope this guide
proves valuable for achieving a secure cloud-native journey.



Cluster Design
The path to running secure EKS clusters starts with designing a secure cluster. By understanding the controls available for
Kubernetes and EKS, while also understanding where EKS clusters need additional reinforcement, it becomes easier to implement
and maintain cluster security.

For existing clusters, most of these protections can be applied at any time. However, creating replacement clusters and migrating
workloads would provide a fresh environment and an opportunity to apply workload protections, described in later sections, as
the deployments get migrated.

Cloud Infrastructure Security


Why: A secure EKS cluster needs to run in a secure AWS environment.

What to do: Follow recommended security best practices for your AWS account, particularly for AWS IAM
permissions, VPC network controls, and EC2 resources.

VPC Layout
Why: Creating a VPC with network security in mind will be key for protecting EKS nodes from external
threats.

What to do: Place your nodes on private subnets only. Public subnets should only be used for external-
facing load balancers or NAT gateways. For external access to services running on the EKS cluster, use load
balancers managed by Kubernetes Service resources or Ingress controllers, rather than allowing direct
access to node instances. If your workloads need Internet access or you are using EKS managed node
groups, with their requirement of having Internet access, use NAT gateways for VPC egress.
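
As a rough sketch of this layout using an eksctl cluster config (the cluster name, region, and instance type below are placeholders, and exact fields vary by eksctl version):

# Hypothetical eksctl ClusterConfig: worker nodes on private subnets only,
# with a single NAT gateway providing VPC egress
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: us-east-1
vpc:
  nat:
    gateway: Single
nodeGroups:
- name: workers
  instanceType: m5.large
  privateNetworking: true   # keep node ENIs on private subnets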



Dedicated IAM Role for Cluster Creation
Why: EKS gives the IAM user or role creating the cluster permanent authentication on the cluster’s
Kubernetes API service. AWS provides no ability to make this grant optional, to remove it, or to move it to a
different IAM user or role. Furthermore, by default, this cluster user has full admin privileges in the cluster’s
RBAC configuration. The official EKS documentation does not explicitly discuss this implementation or its
serious implications with respect to the user’s ability to manage cluster access effectively.

What to do: Your options for locking down this access depend on whether you are trying to secure existing
clusters or create new clusters. For reference, the cluster creator authenticates against the Kubernetes API
as user kubernetes-admin in the group system:masters.

For new clusters:


1. Use a dedicated IAM role, unique to each cluster, to create the cluster.
2. After creation, remove all IAM permissions from the role.
3. Update the aws-auth ConfigMap in the kube-system namespace to add more IAM users/roles
assigned to groups other than system:masters.
4. Add these groups as subjects of RoleBindings and ClusterRoleBindings in the cluster RBAC as
needed.
5. Test the changes using the credentials or assuming the role of those IAM entities.
6. Edit the cluster-admin ClusterRoleBinding to remove the system:masters group as a subject.
Important: Note that if you edit or remove the cluster-admin ClusterRoleBinding without first
adding alternative ClusterRoleBindings for admin access, you could lose admin access to the cluster
API.
7. Remove the right to assume this role from all IAM entities. Grant the right to assume this role only
when needed to repair cluster authorization issues due to misconfiguration of the cluster’s RBAC
integration with AWS IAM.



For existing clusters:
1. Update the aws-auth ConfigMap in the kube-system namespace to add more IAM users/roles
assigned to groups other than system:masters.
2. Add these groups as subjects of RoleBindings and ClusterRoleBindings in the cluster RBAC as
needed.
3. Test the changes using the credentials or assuming the role of those IAM entities.
4. Edit the cluster-admin ClusterRoleBinding to remove the system:masters group as a subject.
Important: Note that if you edit or remove the cluster-admin ClusterRoleBinding without first
adding alternative ClusterRoleBindings for admin access, you could lose admin access to the cluster
API.

You may also want to go through your AWS support channels and ask them to change this immutable
cluster admin authentication behavior.

Managed vs Self-managed Node Groups


Why: AWS introduced managed node groups at re:Invent 2019 to simplify the creation and management of
EKS node groups. The original method of creating EKS node groups, by creating an AWS Autoscaling Group
configured for EKS, can also still be used. Both types of node groups have advantages and disadvantages.

Managed Node Groups:


• Benefits
• Easier to create
• Some reduction of user management requirements during node version patching/upgrades by
draining nodes of pods and replacing them

• Drawbacks and limitations


• Every node gets a public IP address and must be able to send VPC egress traffic to join the cluster,
even if the node is on a private subnet and the EKS cluster has a private Kubernetes API endpoint.
• Greatly reduced user options for instance and node configuration, such as not allowing custom
instance user data (often used for installing third-party monitoring agents or other system
daemons) or supporting the addition of automatic node taints. In particular, the inability to
modify instance user data has the side effect of leaving customization of the cluster networking
unsupported for managed node groups and makes it very difficult to install monitoring agents
directly on the nodes.
• Only Amazon Linux is supported.

Self-managed Node Groups:


• Benefits
• User has much more control over node and network configuration
• Supports Amazon Linux, Ubuntu, or even custom AMIs for node image
• Drawbacks
• Node group creation is not as automated as it is for managed node groups. Tools like eksctl can
automate much of this work, however.
• Requires manually replacing nodes or migrating to new node groups during node version
upgrades. eksctl or other automation can ease this workload.

What to do: Decide which type suits your needs best. The fact that every node in a managed group gets
a public IP address will be problematic for some security teams, while other teams may be more able or
willing to fill the automation gap between managed and self-managed node groups.

Cluster Resource Tagging


Why: Using unique AWS resource tags for each cluster will make it easier to limit IAM resource permissions
to specific clusters by using conditionals based on tag values. Not all resources support tags, but most do,
and AWS continues to add support for tagging existing resource types.

What to do: Tag your resources.
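
As one hedged example, eksctl can apply a per-cluster tag to the AWS resources it creates (the tag key and values here are illustrative only):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: us-east-1
  tags:
    eks-cluster-name: example-cluster   # unique per-cluster tag, usable in IAM policy Conditions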



Control SSH Access to Nodes
Why: Workloads running in pods on the cluster should never need to ssh to the nodes themselves.
Blocking the pods' access to the SSH port reduces the opportunities a malicious pod has for gaining direct access to the node.

What to do: Options for preventing access to the node’s SSH port:
• Do not enable ssh access for node instances. In rare cases, not being able to ssh to the node may
make troubleshooting more difficult, but system and EKS logs generally contain enough information for
diagnosing problems. Usually, terminating problematic nodes is preferable to diagnosing issues, unless
you see frequent node issues which may be symptomatic of chronic problems.
• Install the Calico CNI (Container Network Interface) and use Kubernetes Network Policies to block all pod egress traffic to port 22 (a sketch of such a policy appears after this list).
• You can use the AWS Systems Manager Session Manager instead of running sshd to connect to nodes.
Note that this option only works for self-managed node groups as it requires installing an agent on
each node. You need to add the agent’s installation to the user data field, which cannot be modified in
managed node groups.
• See the next section on EC2 security groups for EKS nodes for additional options.
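
A sketch of the Calico option above, modeled on the deny-kubelet-port policy shown later in this guide (the policy name is arbitrary):

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: deny-node-ssh
spec:
  types:
  - Egress
  egress:
  - action: Deny
    protocol: TCP
    destination:
      nets:
      - 0.0.0.0/0
      ports:
      - 22     # block pod egress to sshd everywhere
    source: {}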

EC2 Security Groups for Nodes


Why: The ability to isolate pods from the system services listening on the node provides a critical control for safeguarding the node itself from malicious or compromised pods that may be running. However, accomplishing this traffic segmentation effectively in EKS clusters requires considerable effort.



EC2 security groups assigned to an ENI (Elastic Network Interface) apply to all the IP addresses associated
with that ENI. Because the AWS VPC CNI (Container Network Interface) used for EKS cluster networking
defaults to putting pod IP addresses on the same ENI as the node’s primary IP address, the node shares EC2
security groups with the pods running on the node. No simple way exists under this scheme to limit traffic
to and from the pods without also affecting the nodes, and vice versa.

The issue gets more complicated. Starting with Kubernetes 1.14, EKS now adds a cluster security group that
applies to all nodes (and therefore pods) and control plane components. This cluster security group has one
rule for inbound traffic: allow all traffic on all ports to all members of the security group. This security group
ensures that all cluster-related traffic between nodes and the control plane components remains open.
However, because the pods also by extension share this security group with the nodes, their access to the
nodes and the control plane is also unrestricted on the VPC network.

Each node group also generally has its own security group. Users of self-managed node groups will need to
create the security group for the node group. Managed node groups each have an automatically-created
security group. Why not just add a rule to the node security groups limiting access to port 22 for ssh,
perhaps to the security group of a bastion host? Unfortunately, that won’t work, either. All security group
rules are of type “allow.” When an EC2 network interface has two or more security groups that apply, their
rules get merged additively. If multiple groups have rules that include the same port, the most permissive
rule for that port gets applied.

What to do: EKS users have a few options that can be used alone or in combination for mitigating the
default EKS security group issues.

• Users can customize the AWS VPC CNI networking configuration. The default behavior of the CNI, to
share the node’s primary ENI with pods, can be changed to reserving one ENI for only the node’s IP
address by setting the configuration variable AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG to true.
• Benefits
• The nodes can now have their own security groups which do not apply to pods.
• Also allows the pods to be placed on different subnets than the nodes, for greater control and to
manage limited IP space.



• Drawbacks
• This solution cannot be used for clusters with managed node groups, as it requires changing the
user data used to bootstrap EC2 instances, which managed node groups do not allow.
• Not supported with Windows nodes.
• Customizing the AWS VPC CNI behavior for a cluster requires a great deal of work, including tasks
required when new subnets for nodes are added to the VPC or when new node groups get added to
the cluster.
• Because one ENI becomes dedicated for the node’s IP address, the maximum number of pods that
can run on the node will decrease. The exact difference depends on the instance type.

• Create a subnet in the VPC dedicated to bastion hosts and which has its own network ACL separate from
the cluster’s subnets. Add a rule to the network ACL for the cluster’s subnets to allow access to sensitive
ports like ssh on the node subnets only from the bastion subnet’s CIDR block and follow that rule with an
explicit DENY rule from 0.0.0.0/0.
• Benefits
• Relatively simple to set up
• Generally requires no regular maintenance or updates
• Drawbacks
• Applies to all interfaces on the node subnets, which may have unintended consequences if the
subnets are shared with other non-EKS EC2 instances and resources.

• Close down the cluster security group, which is left completely open for all traffic within the security
group by default. Note that you will still need to ensure that the cluster security group allows access to
critical ports on the control plane, especially the cluster API endpoint.
• Benefits
• Addresses the security group issue at the security group level
• Drawbacks
• Requires time and research to craft the rules to restrict ingress traffic without breaking cluster
networking



• When using the default AWS VPC CNI configuration, these restrictions would apply to the nodes as
well as the pods.
• Cluster upgrades might overwrite these changes.

While some of the recommendations in this section require action before or at the time of cluster creation,
many can still be applied later. If necessary, prioritize the following protections, and then work on the
remaining tasks as time permits.

• Cloud Infrastructure Security - Creating and maintaining best practices for your AWS account and
the resources you use will always require ongoing vigilance. Always start here when tightening your cloud
security.

• Control SSH Access to Nodes - Isolating the nodes from the containers they host as much as possible is
a critical protection for your cluster. In addition, limit node SSH access from other sources.



Cluster Add-ons
EKS leaves the task of installing and managing most AWS service integrations and common Kubernetes extensions to the user.
These optional features are often called add-ons. They often require heightened privileges or present other challenges addressed
below.

Don’t Install the Kubernetes Dashboard


Why: While the venerable Kubernetes Dashboard is still a popular web UI alternative to kubectl, its lack of
security features makes it a potentially huge risk for cluster security breaches for several reasons.
• It requires cluster admin RBAC permissions to function properly. Even if it has only been granted read
permissions, the dashboard still shares information that most workloads in the cluster should not need
to access. Users who need to access the information should be using their own RBAC credentials and
permissions.
• The installation recommended in the EKS docs tells users to authenticate when connecting to the
dashboard by fetching the authentication token for the dashboard’s cluster service account, which, again,
may have cluster-admin privileges. That means a service account token with full cluster privileges and
whose use cannot be traced to a human is now floating around outside the cluster.
• Even if users set up another authentication method, like putting a password-protected reverse proxy in
front of the dashboard, the service has no method to customize authorization per user.
• Everyone who can access the dashboard can make any queries or changes permitted by the service’s
RBAC role. Even with a role binding that grants read-only permissions for all Kubernetes resources, users
could still read all the cluster secrets. Tesla’s AWS account credentials were stolen and exploited exactly
this way.
• The Kubernetes Dashboard has been the subject of a number of CVEs. Because of its access to the
cluster’s Kubernetes API and its lack of internal controls, vulnerabilities can be extremely dangerous.

What to do: Don’t.



Fargate for Nodeless EKS
Why: Many EKS users were excited when AWS introduced the ability to run EKS pods on the “serverless”
Fargate service. Using Fargate reduces the number of nodes that users need to manage, which, as we
have seen, has a fair amount of operational overhead for the user. It also handles on-demand, temporary
capacity for fluctuating workloads. However, there are some serious drawbacks to using Fargate with EKS,
both operational and for workload security.
• Kubernetes network policies silently have no effect on pods assigned to Fargate nodes. Daemon sets,
which put a pod for a service on each node, cannot place pods on the Fargate virtual nodes. Even if Calico
could run as a sidecar in a pod, it would not have permission to manage the pod’s routing, which requires
root privileges. Fargate only allows unprivileged containers.
• Active security monitoring of a container’s actions on Fargate becomes difficult or nearly impossible.
• Any metrics or log collectors that a user may normally run as a cluster daemon set will also have to be
converted to sidecars, if possible.
• EKS still requires clusters that use Fargate for all their pod scheduling to have at least one node.
• The exact security implications and vulnerabilities of running EKS pods on Fargate remain unknown for
now.

What to do: At this time, even AWS does not recommend running sensitive workloads on Fargate.

For users who have variable loads, the Kubernetes cluster autoscaler can manage creating and terminating
nodes as needed based on the cluster’s current capacity requirements.

Amazon EFS CSI Driver


Why: The EFS (Elastic File System) CSI (Container Storage Interface) allows users to create Kubernetes
persistent volumes from EFS file systems. The Amazon EFS CSI Driver enables the use of EFS file systems as
persistent volumes in EKS clusters.



EFS file systems are created with a top-level directory that is writable only by the root user. Only root can chown (change the ownership of) that directory, but allowing a container in EKS to use the file system by running as root creates extremely serious security risks. Running containers only as unprivileged users provides one of the most important protections for the safety of your cluster.

What to do: Currently the EFS CSI driver cannot actually create EFS file systems. It can only make existing file
systems available to an EKS cluster as persistent volumes. Make changing the permissions or ownership of
the top-level directory part of the standard procedure for creating EFS file systems for use by the driver.

As an alternative, you can add an init container which does have root privileges to chown the filesystem to
the runtime user. When a pod has an init container, the main containers will not run until the init container
has exited successfully. This method is less secure overall, although the risk of exploitation of the init
container running as root is somewhat reduced by the short lifespan of init containers. For example, add a block like the following to your PodSpec:

initContainers:
- name: chown-efs
  # Update image to current release without using latest tag
  image: busybox:1.31.1
  # Put your application container's UID and EFS mount point here
  command: ['sh', '-c', '/bin/chown 1337 /my/efs/mount']
  securityContext:
    # We're running as root, so add some protection
    readOnlyRootFilesystem: true
  volumeMounts:
  - name: efs-volume
    mountPath: /my/efs/mount

Then make sure to add runAsUser: 1337 (or whatever UID you chose) to the pod-level securityContext.
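
For instance, a minimal pod-level snippet (the UID simply mirrors the init container example above):

spec:
  securityContext:
    runAsUser: 1337   # the same UID the init container chowned the EFS mount to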



Protecting EC2 Instance Metadata Credentials and Securing Privileged
Workloads
Why: The EKS documentation provides instructions for installing a number of Kubernetes services and
controllers into an EKS cluster. Many of these add-ons are widely used in the Kubernetes community,
including the Kubernetes Metrics Server, which collects key resource usage numbers from running containers and
is a required dependency of the horizontal pod autoscaler (HPA); the cluster autoscaler, which creates and
terminates nodes based on cluster load; and Prometheus for collecting custom application metrics.

Some of these popular components need to interact with AWS service APIs, which requires AWS IAM
credentials. The EKS documentation for installing and configuring some of these services does not always
address or recommend secure methods for keeping these often-critical deployments and their IAM
credentials safely isolated from user workloads.

Add-ons covered in the EKS documentation that require IAM credentials include the cluster autoscaler, the
Amazon EBS CSI driver, the AWS App Mesh controller, and the ALB ingress controller. Other third-party
Kubernetes controllers, including ingress controllers, container storage interface (CSI) drivers, and secrets
management interfaces, may also need the ability to manage AWS resources.

Some of these add-ons which serve important functions for management and maintenance of the cluster,
like the cluster autoscaler, and which tend to be standard options on other managed Kubernetes offerings,
would normally run on master nodes, which generally allow only pods for critical system services. For
managed Kubernetes services, the cloud provider would generally install the service on the control plane
and the user would only need to configure the runtime behavior. Because EKS will not install or manage
these services, not only does that work fall on users who need them, the user also has to take extra
measures to lock down these services because they are not in their normally isolated environment.



In addition to issues around managing the method used to deliver the IAM credentials to the component
pods, all too often, rather than strictly following the principle of least privilege, the recommended IAM
policies for these services are excessive or not targeted to the specific cluster resources that the component
needs to manage.

What to do: Locking down these critical, privileged components requires addressing several different areas
of concern. Below, we describe some of the general considerations for managing IAM credentials for cluster
workloads. At the end of this section, we give step-by-step instructions for locking down privileged add-ons
and isolating their IAM credentials.

IAM Policies and the Principle of Least Privilege

Carefully evaluating the recommended IAM policies for opportunities to limit the scope to the resources
actually needed by the cluster, as well as protecting the IAM credentials tied to these policies, becomes
critical to reducing the risk of the exploitation of the access key or token and to limiting the potential for
damage if the credentials do get compromised.

Privileges for many AWS API actions can be scoped to a specific resource or set of resources by using the
IAM policy Condition element. In the section on Cluster Design, we recommended tagging all your cluster
resources with unique tags for each cluster, largely because tag values can help limit IAM permission grants.
Some AWS API actions do not support evaluating access by tags, but those actions tend to be read-only
calls like Describe* (list). Write actions, which can modify, create, or destroy resources, usually support
conditional scoping.

For example, compare the suggested policy for the cluster autoscaler on the left with the policy on the right,
which scopes write actions to resources with a specific cluster’s tag.



Isolating Critical Cluster Workloads

As mentioned above, controllers that play a role in managing the cluster itself normally would get deployed
to the master nodes or control plane of a cluster, out of reach of user workloads on the worker nodes. For
managed services like EKS, unless the provider offers the option to manage the add-on itself, the user can
only install these components on worker nodes.

We can still add a measure of security by mimicking to a degree the isolation of master nodes by creating a
separate node group dedicated to critical workloads. By using node taints, we can restrict which pods can
get scheduled on a node. (Note that any pod can still get scheduled on a node with a taint if they add the
correct toleration. An admission controller like Open Policy Agent Gatekeeper should be used to prevent
user pods from using unauthorized tolerations.)

Manage IAM Credentials for Pods

A common method to deliver AWS IAM credentials to single-tenant EC2 instances uses the EC2 metadata
endpoint and EC2 instance profiles, which are associated with an IAM role. One of the metadata endpoint’s
paths, /iam/security-credentials, returns temporary IAM credentials associated with the instance’s
role. Kubernetes nodes, however, are very much multi-tenant. Even if all the workloads on your nodes are
the applications you deployed, the mix of workloads provides more opportunities that an attacker can try to
exploit to get access. Even if you wanted the pods of one deployment to share the instance’s credentials and
IAM permissions, which still is not a good idea, you would have to prevent all the other pods on the node
from doing the same. You also do not want to create static IAM access keys for the pods and deliver them as
a Kubernetes secret as an alternative. Some options exist for managing pod IAM tokens and protecting the
credentials belonging to the instance.

1. For the cluster deployments that do need IAM credentials, a few options exist.
a. AWS offers a way to integrate IAM roles with cluster service accounts. This method requires configuring an external OpenID Connect (OIDC) provider to serve the authentication tokens (a service account sketch appears after the network policy examples below).
b. Two similar open-source projects exist to intercept requests from pods to the metadata endpoint
and return temporary IAM credentials with scoped permissions: kiam and kube2iam. kiam seems
to be a little more secure, offering the ability to limit the AWS IAM permissions to server pods, which
should therefore be strongly isolated like the controllers mentioned above.
c. Use a service like Hashicorp Vault, which can dynamically manage AWS IAM access and now
supports Kubernetes pod secret injection.

2. (Skip this step if you are using kiam or kube2iam.) Use Network Policies to block pod egress to the
metadata endpoint after installing the Calico CNI. You can use a Kubernetes NetworkPolicy resource
type, but note these resources would need to be added to every namespace in the cluster.



apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-instance-metadata
  namespace: one-of-many
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0 # Preferably whitelist something smaller
        except:
        - 169.254.169.254/32

However, Calico offers some additional resource types, including a GlobalNetworkPolicy which applies to
all the namespaces in the cluster.

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: deny-instance-metadata
spec:
  types:
  - Egress
  egress:
  - action: Deny
    destination:
      nets:
      - 169.254.169.254/32
    source: {}
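
Returning to option 1a above, once the OIDC provider and an IAM role are configured, the role is typically associated with a Kubernetes service account through an annotation, roughly like the sketch below (the account ID and role name are placeholders):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    # Placeholder ARN: pods using this service account receive credentials for this role
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/eks-cluster-autoscaler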



Step-by-Step Instructions for Isolating Critical Add-ons

This section will use the Kubernetes cluster autoscaler service as an example for how to deploy privileged
add-ons with some additional protections. The EKS documentation for installing the cluster autoscaler,
which can dynamically manage adding and removing nodes based on the cluster workload, does not always
follow best practices for either cluster or EC2 security. The alternative shown below will yield more security,
partially by mimicking the protected nature of master nodes in a standard Kubernetes cluster.

Note that the cluster autoscaler service can only manage scaling for the cluster in which it is deployed. If you
want to use node autoscaling in multiple EKS clusters, you will need to do the following for each cluster.

1. The recommended IAM policy for the cluster autoscaler service grants permission to modify all the
autoscaling groups in the AWS account, not just the target cluster’s node groups. Use the following policy
instead to limit the write operations to one cluster’s node groups, replacing the CLUSTER_NAME in the
Condition with your cluster’s name. (The Describe* operations cannot be restricted by resource tags.)

{
  "Version": "2012-10-17",
  "Id": "LimitedClusterAutoscalerPermissions",
  "Statement": [
    {
      "Sid": "RestrictByTag",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Condition": {
        "StringLike": {
          "autoscaling:ResourceTag/kubernetes.io/cluster/CLUSTER_NAME": "owned"
        }
      }
    },
    {
      "Sid": "TagConditionNotSupported",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}

2. Create a separate node group for the privileged controllers. Using nodes that are not shared with
unprivileged user workloads will help restrict access to these additional AWS IAM permissions and will
help safeguard critical system workloads like the autoscaler service.
a. Use a node taint not used for other node groups. (Managed node groups do not support automatic
tainting of nodes, so either use a self-managed node group or automate a way to taint this group’s
nodes when they join the cluster.) This example uses a taint with key=privileged, value=true, of type
NoSchedule.
b. Add a label for the nodes in the form node-group-name=<name of privileged node group>
c. Attach the above IAM policy to the IAM role used by the node group’s EC2 instance profile, or,
preferably, set up one of the alternative IAM credential delivery methods described above.

3. Make sure the autoscaling groups for the node groups that need to be managed by the cluster autoscaler
have the following tags (eksctl will add these automatically when it creates node groups):
a. k8s.io/cluster-autoscaler/<cluster-name>: owned
b. k8s.io/cluster-autoscaler/enabled: true



4. Find the latest patchlevel of the cluster autoscaler version that matches your EKS Kubernetes version
here. Note that the major and minor versions of the cluster autoscaler should match, so Kubernetes 1.14
will use a cluster autoscaler version starting with “1.14.” You will use this version number in the next step.

5. Save the following script to a file and set the variables at the top to the appropriate values for your
cluster. You may need to make further edits if you use a different node taint or node group label than the
examples in step 2.

#!/bin/bash

set -e

# Set CLUSTER_NAME to the name of your EKS cluster
CLUSTER_NAME=<cluster name>

# Set PRIVILEGED_NODE_GROUP to the name of your target node group
PRIVILEGED_NODE_GROUP=<node group name>

# Set AUTOSCALER_VERSION to the correct version for your EKS
# cluster Kubernetes version
AUTOSCALER_VERSION=<version string>

# Deploy
kubectl apply -f \
  https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

# Update the container command with the right cluster name.
# We don't use kubectl patch because there is no predictable way to patch
# positional array parameters if the command arguments list changes.
kubectl -n kube-system get deployment.apps/cluster-autoscaler -ojson \
  | sed 's/\\u003cYOUR CLUSTER NAME\\u003e/'"$CLUSTER_NAME"'/' \
  | kubectl replace -f -

# Annotate the deployment so the autoscaler won't evict its own pods
kubectl -n kube-system annotate deployment.apps/cluster-autoscaler \
  --overwrite cluster-autoscaler.kubernetes.io/safe-to-evict="false"

# Update the cluster autoscaler image version to match the Kubernetes
# cluster version and add flags for EKS compatibility to the cluster
# autoscaler command
kubectl -n kube-system patch deployment.apps/cluster-autoscaler --type=json \
  --patch="$(cat <<EOF
[
  {
    "op": "replace",
    "path": "/spec/template/spec/containers/0/image",
    "value": "k8s.gcr.io/cluster-autoscaler:v${AUTOSCALER_VERSION}"
  },
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/command/-",
    "value": "--skip-nodes-with-system-pods=false"
  },
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/command/-",
    "value": "--balance-similar-node-groups"
  }
]
EOF
)"

# Add the taint toleration and the node selector blocks to schedule
# the cluster autoscaler on the privileged node group
kubectl -n kube-system patch deployment.apps/cluster-autoscaler \
  --patch="$(cat <<EOF
spec:
  template:
    spec:
      tolerations:
      - key: "privileged"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      nodeSelector:
        node-group-name: ${PRIVILEGED_NODE_GROUP}
EOF
)"

6. Check to make sure the cluster-autoscaler pod in the kube-system namespace gets scheduled on a
privileged node and starts running successfully. Once it is running, check the pod’s logs to make sure
there are no errors and that it can successfully make the AWS API calls it requires.

7. Do not allow user workloads to add a toleration for the taint used by the privileged nodegroup. One
method to prevent unwanted pods from tolerating the taint and running on the privileged nodes would
be to use an admission controller like the Open Policy Agent’s Gatekeeper.

To complete protection of the IAM credentials on all your cluster's nodes, use kiam, kube2iam, or IAM roles for EKS service accounts (the last option requires configuring an OpenID Connect provider for your AWS account) to manage scoped credentials for the cluster autoscaler and other workloads that need to access
AWS APIs. This would allow you to detach the IAM policy above from the node group’s instance profile
and to use a Network Policy to block pod access to the EC2 metadata endpoint to protect the node’s IAM
credentials.



The recommendations in this section only apply if you use or are considering installing privileged cluster
add-ons. Even without the additional tasks required to deploy these safely to protect their AWS credentials,
managing add-ons in EKS can carry a good deal of operational overhead. Investing the time to perform
these additional recommended steps will protect your cluster and your AWS account from misconfigured
applications and malicious attacks.

Even if you do not use any add-ons, but you do have workloads which need to access AWS APIs, be sure to
apply a credential management option from Manage IAM Credentials for Pods.



Networking

Install Calico for Cluster Network Controls


Why: By default, network traffic in a Kubernetes cluster can flow freely between pods and also leave the
cluster network altogether. In an EKS cluster, by extension, because pods share their node’s EC2 security
groups, the pods can make any network connection that the nodes can, unless the user has customized the
VPC CNI, as discussed in the Cluster Design section.

Creating restrictions to allow only necessary service-to-service and cluster egress connections decreases the
number of potential targets for malicious or misconfigured pods and limits their ability to exploit the cluster
resources.

The Calico CNI (Container Network Interface) offers the ability to control network traffic to and from the
cluster’s pods by implementing the standard Kubernetes Network Policy API. Calico also offers some custom
extensions to the standard policy type. Network policies can control both ingress and egress traffic. Different
pod attributes, like labels or namespaces, can be used in addition to standard IP CIDR blocks to define the
rules.

What to do: Install Calico and create network policies to limit pod traffic only to what is required. See our
posts on guidelines for writing ingress and egress network policies. Test the policies to make sure they block
unwanted traffic while allowing required traffic. Note that the NetworkPolicy resource type can exist in a
cluster even though the cluster’s network setup does not actually support network policies. Therefore it
would be possible to successfully create the policies, but they would have no effect.

Note that open-source Calico does not support Windows at this point, nor do most of the other CNI Network
Policy providers. If you need Windows nodes, you may want to place them on dedicated VPC subnets and
use VPC network ACLs to limit traffic to and from the nodes. Limiting pod-to-pod traffic is not possible
without a cluster-aware solution, though.



Limit Network Access to the Kubernetes API Endpoint
Why: By default, EKS leaves the Kubernetes API endpoint, the management interface to the control plane,
fully open to the Internet. However, EKS runs the API server with the --anonymous-auth=true flag, which
allows unauthenticated connections and which EKS does not allow the user to disable. Even if anonymous
users are not granted any Kubernetes RBAC privileges, this option still poses a danger. We have already
seen at least one major security bug around allowing anonymous connections, last year’s Billion Laughs
Attack. Because of the potential for vulnerabilities like that bug and in general for the Kubernetes API server,
the endpoint should be protected by limiting network access to the API service only to trusted IP addresses.

What to do: EKS provides a couple options for protecting a cluster’s API endpoint.
• Disable the public endpoint and only use a private endpoint in the cluster’s VPC. This method provides
the best protection but access to the endpoint from outside the VPC, if needed, would require going
through a bastion host.
• Restrict the IP addresses that can connect to the public endpoint by using a whitelist of CIDR blocks.

In addition to protecting the API service from Internet traffic, use network policies to block traffic from pods
in the cluster to the API endpoint, only allowing workloads which require access.

Enable the Private Endpoint for the Kubernetes API


Why: As mentioned above, by default, all EKS clusters have only an endpoint publicly available on the
Internet. As a side effect, traffic between the API server and the nodes in the cluster’s VPC leaves the VPC
private network.

What to do: Enable the private endpoint for your cluster. EKS supports having both a public and private
endpoint enabled for a cluster, so even if you still require the public endpoint, you can still keep your cluster
traffic inside the VPC by using a private endpoint.
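
A sketch of what the two preceding recommendations can look like in an eksctl cluster config (the CIDR is a placeholder, and field support varies by eksctl version):

vpc:
  clusterEndpoints:
    privateAccess: true    # enable the private endpoint
    publicAccess: true     # set to false to disable the public endpoint entirely
  publicAccessCIDRs:       # whitelist for the public endpoint, if it stays enabled
  - 203.0.113.0/24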



Block Access to Kubelet
Why: The kubelet runs on every node to manage the lifecycle of the pod containers assigned to that node.
The kubelet needs strong protection because of its tight integration with the node’s container runtime. Even
though EKS runs the kubelet with anonymous authentication disabled and requires authorization from
the TokenReview API on the cluster API server, blocking access to its port from the pod network provides
additional safeguards for this critical service.

What to do: After installing the Calico CNI, create a GlobalNetworkPolicy to prevent all pods from connecting to the kubelet's port, 10250/TCP.

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: deny-kubelet-port
spec:
  types:
  - Egress
  egress:
  - action: Deny
    protocol: TCP
    destination:
      nets:
      - 0.0.0.0/0
      ports:
      - 10250
    source: {}



Securing Service Load Balancers
Why: By default, when creating a Kubernetes Service of type LoadBalancer in an EKS cluster, the cluster’s
AWS controller creates an Internet-facing ELB with no firewall restrictions other than those of the subnet’s
VPC network ACL.

What to do: If only sources inside the cluster's VPC need to access the service's endpoint, add the following annotation to the Kubernetes Service manifest to create an internal load balancer: service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0

If the load balancer needs to be Internet-facing but should not be open to all IP addresses, you can add the
field loadBalancerSourceRanges to the Service specification.
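
A hedged sketch showing both options in one manifest (the service name, selector, ports, and CIDR are placeholders; in practice you would use one approach or the other):

apiVersion: v1
kind: Service
metadata:
  name: example-service
  annotations:
    # Option 1: keep the load balancer internal to the VPC
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
  - port: 443
    targetPort: 8443
  # Option 2: for an Internet-facing load balancer, restrict the allowed sources
  loadBalancerSourceRanges:
  - 203.0.113.0/24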

Securing your cluster’s network traffic and access will be crucial for the entire cluster’s security.

All of the considerations in this section provide critical protection, and most of them require minimal effort.
We recommend prioritizing all of them.



Container Images
Kubernetes workloads are built around container images. Ensuring that those images minimize the possibilities for exploitation
and remain free of known security vulnerabilities serves as a cornerstone of a sound EKS security strategy.

Build Secure Images


Why: Following a few best practices for building secure container images will minimize the exploitability
of running containers and simplify both security updates and scanning. Images containing only the files
required for the application’s runtime make it much more difficult for malicious attacks to compromise or
exploit the containers in your cluster. Avoid using container images with frequently exploited or vulnerable
tools like Linux package managers, web or other network clients, or Unix shells installed.

What to do:
1. Use a minimal, secure base image. Google’s “Distroless” images are a good choice because they do
not install OS package managers or shells.
2. Only install tools needed for the container’s application. Debugging tools should be omitted from
production containers.
3. Instead of putting exploitable tools like curl in your image for long-running applications, if you only
need network tools at pod start-time, consider using separate init containers or delivering the data
using a more Kubernetes-native method, such as ConfigMaps.
4. Remove the package manager from the image as a Dockerfile build step.
5. Keep your images up-to-date. This practice includes watching for new versions of both the base
image and any third-party tools you install.



Use a Vulnerability Scanner
Why: Using containers free of known software security vulnerabilities requires ongoing vigilance. All the
images deployed to a cluster should be scanned regularly by a scanner that keeps an up-to-date database of
CVEs (Common Vulnerabilities and Exposure).

What to do: Use an image scanner. A number of open-source and proprietary options exist, but be sure
to choose a solution that scans for vulnerabilities in operating system packages and in third-party runtime
libraries for the programming languages your software uses. Amazon ECR (Elastic Container Registry) offers
a scanning service, but it can only detect vulnerabilities in OS packages. As an example of the importance of
software library scanning, the Apache Struts vulnerability exploited in the Equifax hack would not have been
detected by an OS package scan alone, because Struts is a set of Java libraries, and most Linux distributions
no longer manage software library dependencies that are not required by system services.

To address CVEs when they are found in your internally maintained images, your organization should have a policy for updating and replacing already-deployed images known to have serious, fixable vulnerabilities. Image scanning should be part of your CI/CD pipeline, and images with high-severity, fixable CVEs should generate an alert and fail the build.

If you also deploy third-party container images in your cluster, scan those as well. If those images have
serious fixable vulnerabilities that do not seem to get addressed by the maintainer, you should consider
building your own images for those tools.

Baking image scanning into your continuous integration pipeline for new container images and into cluster
monitoring to detect older images with issues will help drive improvements to building secure images. In
addition, ensuring the security of your build environment plays a crucial role in ensuring your images are
not compromised from the start.



Runtime Security for Workloads
Following best practices for running your workloads on EKS plays a crucial part of keeping the cluster and all its workloads safe.
Overly privileged pods pose a huge danger if they get infiltrated.

In addition to the workload protections below, also make sure you use Kubernetes Network Policies with the Calico CNI, described
in the Networking section.

Namespaces
Why: Kubernetes namespaces provide scoping for cluster objects, allowing fine-grained cluster object
management. Kubernetes RBAC rules for most resource types apply at the namespace level. Controls
like network policies and many add-on tools and frameworks like service meshes also often apply at the
namespace scope.

What to do: Plan out how you want to assign namespaces before you start deploying workloads to your
clusters. Having one namespace per application provides the best opportunity for control, although it does
bring extra management overhead when assigning RBAC role privileges and default network policies. If you
do decide to group more than one application into a namespace, the main criteria should be whether those
applications have common RBAC requirements and whether it would be safe to grant those privileges to the
service accounts and users which need Kubernetes API access in that namespace.
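
As an illustration of a per-namespace default (the namespace name is a placeholder), a default-deny ingress policy scoped to a single application's namespace might look like:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: example-app
spec:
  podSelector: {}    # selects every pod in the namespace
  policyTypes:
  - Ingress          # no ingress rules defined, so all ingress is denied by default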



Kubernetes RBAC
Why: Kubernetes Role-Based Access Control provides the standard method for managing authorization for
the Kubernetes API endpoints. The practice of creating and managing comprehensive RBAC roles that follow
the principle of least privilege, in addition to performing regular audits of how those roles are delegated
with role bindings, provides some of the most critical protections possible for your EKS clusters, both from
external bad actors and internal misconfigurations and accidents.

What to do: Configuring Kubernetes RBAC effectively and securely requires some understanding of the
Kubernetes API. You can start with the official documentation, read about some best practices, and you may
also want to work through some tutorials.

Once your team has solid working knowledge of RBAC, create some internal policies and guidelines.
Make sure you also regularly audit your Role permissions and RoleBindings. Pay special attention to
minimizing the use of ClusterRoles and ClusterRoleBindings, as these apply globally across all
namespaces and to resources that do not support namespaces. (You can use the output of kubectl api-
resources in your cluster to see which resources are not namespace-scoped.)
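
As a minimal sketch of a namespace-scoped grant following these ideas (the role, namespace, and group names are placeholders, and the group would be one mapped through the aws-auth ConfigMap):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: example-app
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer-binding
  namespace: example-app
subjects:
- kind: Group
  name: app-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-deployer
  apiGroup: rbac.authorization.k8s.io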

Protect the Cluster’s RBAC Authorization Configuration


Why: EKS uses a Kubernetes ConfigMap resource to grant Kubernetes RBAC privileges to AWS IAM users
and roles. The aws-auth ConfigMap in the kube-system namespace in the cluster assigns IAM entities
to groups for use in RBAC role bindings. Protecting this ConfigMap’s contents from unauthorized writes is
absolutely critical for preventing authenticated users from increasing their RBAC permissions or removing
access for other IAM users and roles.



What to do: Because the ConfigMap is a core Kubernetes resource type used by many, if not most,
Kubernetes workloads, many cluster API users and some service accounts often have permission to modify
ConfigMaps in at least one namespace. Cluster owners will need to perform careful, ongoing curation of
RBAC permission grants to ensure that no unintended entities end up with the ability to change the aws-
auth contents.

Do not grant write access to ConfigMaps in ClusterRoles, which apply globally across all namespaces. Use
RoleBindings to limit these permissions to specific namespaces.
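
For reference, the structure being protected looks roughly like the sketch below (the account ID, role names, and group name are placeholders):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/eks-node-instance-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
      - system:bootstrappers
      - system:nodes
    - rolearn: arn:aws:iam::111122223333:role/cluster-operators
      username: cluster-operator
      groups:
      - cluster-operators   # bind this group with Roles/RoleBindings rather than system:masters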

Limit Container Runtime Privileges


Why: Most containerized applications do not need any special host privileges on the node to function properly. By following the principle of least privilege and minimizing the capabilities of your cluster's running containers, you can greatly reduce the potential for exploitation by malicious containers and for accidental damage by misbehaving applications.

What to do: Use the PodSpec Security Context to define the exact runtime requirements for each workload.
Use Pod Security Policies and/or admission controllers like Open Policy Agent (OPA) Gatekeeper to enforce
those best practices by the Kubernetes API at object creation time.

Some guidelines:
1. Do not allow containers to run as root. Running as root creates by far the greatest risk, because root
in a container has root on the node.
2. Do not use the host network or process space. Again, these settings create the potential for
compromising the node and every container running on it.
3. Do not allow privilege escalation.
4. Use a read-only root filesystem in the container.
5. Use the default (masked) /proc filesystem mount.
6. Drop unused Linux capabilities and do not add optional capabilities that your application does not
absolutely require. (Available capabilities depend on the container runtime in use on the nodes. EKS
uses the Docker runtime, which supports these capabilities. The first table lists capabilities loaded by
default, while the second table shows optional capabilities that may be added.)
7. Use SELinux options for more fine-grained process controls.
8. Give each application its own Kubernetes Service Account rather than sharing or using the
namespace’s default service account.
9. Do not mount the service account token in a container if the container does not need to access the
Kubernetes API.
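
A pod-level sketch applying several of the guidelines above (the names, image, and UID are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  serviceAccountName: example-app        # dedicated service account (guideline 8)
  automountServiceAccountToken: false    # no Kubernetes API access needed (guideline 9)
  securityContext:
    runAsNonRoot: true                   # never run as root (guideline 1)
    runAsUser: 10001                     # placeholder non-root UID
  containers:
  - name: app
    image: registry.example.com/app:1.0.0
    securityContext:
      allowPrivilegeEscalation: false    # guideline 3
      readOnlyRootFilesystem: true       # guideline 4
      capabilities:
        drop: ["ALL"]                    # drop unused capabilities (guideline 6)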

Use Pod Security Policies


Why: Kubernetes Pod Security Policy provides a method to enforce best practices around minimizing
container runtime privileges, including not running as the root user, not sharing the host node's process
or network space, not being able to access the host filesystem, enforcing SELinux, and other options. Most
cluster workloads do not need special permissions, and forcing containers to run with the least privilege
they require minimizes their potential for malicious exploitation or accidental damage.

Pod Security Policies are enabled automatically for all EKS clusters running Kubernetes version 1.13 or
later. EKS ships with a completely permissive default policy named eks.privileged.

What to do: Create policies which enforce the recommendations under Limit Container Runtime Privileges,
shown above. Policies are best tested in a non-production environment running the same applications as
your production cluster, after which you can deploy them in production.
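
A restrictive policy along the lines of the upstream Kubernetes "restricted" example might look like the
following sketch; the field values are a starting point, not a drop-in policy for every workload:

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: restricted
    spec:
      privileged: false
      allowPrivilegeEscalation: false
      requiredDropCapabilities: ["ALL"]
      hostNetwork: false
      hostPID: false
      hostIPC: false
      runAsUser:
        rule: MustRunAsNonRoot
      seLinux:
        rule: RunAsAny
      supplementalGroups:
        rule: MustRunAs
        ranges: [{min: 1, max: 65535}]
      fsGroup:
        rule: MustRunAs
        ranges: [{min: 1, max: 65535}]
      readOnlyRootFilesystem: true
      volumes:
        - configMap
        - emptyDir
        - projected
        - secret
        - persistentVolumeClaim

Remember that a policy only takes effect for a workload when its service account is authorized to use it
through RBAC (a Role or ClusterRole granting the "use" verb on the policy).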

Once you have migrated all your workloads to stricter policies, remove the capability to
deploy user workloads using the permissive default policy by deleting the ClusterRoleBinding
eks:podsecuritypolicy:authenticated.
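
For example, once all workloads have been migrated to stricter policies:

    # Remove the binding that lets any authenticated user fall back to the permissive default policy
    kubectl delete clusterrolebinding eks:podsecuritypolicy:authenticated

    # Verify which policies remain available
    kubectl get psp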



Alternatively, because PSPs are slated for deprecation and only apply to a subset of controls, consider
deploying a configurable admission controller, described below.

Use an Admission Controller to Enforce Best Practices


Why: Kubernetes supports admission controllers, which can be configured to evaluate requests to the
Kubernetes API. Validating controllers can deny requests that fail to meet certain requirements, while
mutating controllers can modify a request, such as by injecting a sidecar container into a pod or adding
labels to an object, before the Kubernetes API persists it.

One increasingly popular option for a validating admission controller is Open Policy Agent (OPA)
Gatekeeper. The Gatekeeper admission controller uses custom Kubernetes resources (ConstraintTemplates
and Constraints) to define requirements for objects in the cluster. Users can create policies tailored to their
needs and applications to enforce a variety of best practices by preventing non-conforming objects from
being created. While Gatekeeper overlaps with some Pod Security Policy capabilities, it can restrict not just
pods but any cluster resource, using virtually any field.

What to do: You can write a custom admission controller to suit your specific needs, or install Gatekeeper
or a similar tool in your cluster. Note that while example resources for enforcing common requirements
with Gatekeeper exist, the policy configuration language and its management come with a rather steep
learning curve; a small sketch follows below. As OPA and Gatekeeper gain greater adoption, more
community resources should become available.
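
To give a feel for the model, here is a simplified sketch, modeled on the Gatekeeper policy library, that
rejects pods requesting privileged containers. The resource names are placeholders, and the Rego only
checks regular containers; a production policy would also cover init containers:

    apiVersion: templates.gatekeeper.sh/v1beta1
    kind: ConstraintTemplate
    metadata:
      name: k8sblockprivileged
    spec:
      crd:
        spec:
          names:
            kind: K8sBlockPrivileged
      targets:
        - target: admission.k8s.gatekeeper.sh
          rego: |
            package k8sblockprivileged

            violation[{"msg": msg}] {
              c := input.review.object.spec.containers[_]
              c.securityContext.privileged
              msg := sprintf("privileged container is not allowed: %v", [c.name])
            }
    ---
    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sBlockPrivileged
    metadata:
      name: block-privileged-containers
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Pod"]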

Note that Gatekeeper requires Kubernetes version 1.14 or higher.



Compromised or otherwise misbehaving workloads pose possibly the greatest threat to your cluster's
security, making enforcement of minimal privilege for container processes critical. Properly configuring and
monitoring Kubernetes RBAC, limiting runtime privileges, and enforcing those limits with Pod Security
Policies or a third-party admission controller should be among the highest priorities for security lockdown.



Monitoring and Maintenance
EKS leaves a large portion of the responsibility for applying security updates, upgrading Kubernetes versions, and detecting
and replacing failed nodes to the user. EKS users, especially those with multiple clusters, will want to set up some form of
automation to lighten the manual load and to ensure that critical security patches get applied to clusters quickly. They will also
need comprehensive monitoring to provide visibility into cluster health and to help detect possible unauthorized activity and
other security incidents.

Collect Control Plane Logs


Why: The control plane logs capture Kubernetes audit events, requests to the Kubernetes API server, and
output from other control plane components. Analysis of these logs will help detect some types of attacks
against the cluster, and security auditors will want to know that you collect and retain this data.

What to do: EKS clusters can be configured to send control plane logs to Amazon CloudWatch. At a
minimum, you will want to collect the following logs:
• api - the Kubernetes API server log
• audit - the Kubernetes audit log
• authenticator - the EKS component used to authenticate AWS IAM entities to the Kubernetes API
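
One way to enable these log types is with the AWS CLI; the cluster name and region below are placeholders:

    aws eks update-cluster-config \
      --region us-west-2 \
      --name my-cluster \
      --logging '{"clusterLogging":[{"types":["api","audit","authenticator"],"enabled":true}]}'

The same setting can be applied through eksctl, CloudFormation, Terraform, or the EKS console. Also
consider setting a CloudWatch Logs retention period that satisfies your audit requirements.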

Monitor Container and Cluster Performance for Anomalies


Why: Irregular spikes in application load or node usage can signal that an application needs
troubleshooting, but they can also signal unauthorized activity in the cluster. Monitoring key metrics
provides critical visibility into your workloads' functional health and can indicate when an application needs
performance tuning or further investigation.



What to do: Use one or more of the following approaches:
• Set up Amazon CloudWatch Container Insights for your cluster.
• Deploy Prometheus in your cluster to collect metrics.
• Deploy another third-party monitoring or metrics collection service.

Note that if you choose a solution like Prometheus running in your cluster, you will not only need to deploy
it securely to prevent data tampering if the cluster becomes compromised, but you will also want to forward
the most critical metrics, if not all of them, to an external collector to preserve their integrity and
availability.
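
If you go the Prometheus route, a common starting point is the community Helm chart; the namespace and
release name here are arbitrary choices for the example:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    kubectl create namespace prometheus
    helm install prometheus prometheus-community/prometheus --namespace prometheus

From there, configure remote_write (or a hosted collector) so that critical metrics also live outside the
cluster.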

Monitor Node (EC2 Instance) Health and Security


Why: EKS provides no automated detection of node issues. Node replacement happens automatically only
if the underlying instance fails, at which point the EC2 Auto Scaling group will terminate and replace it.
Changes in node CPU, memory, or network metrics that do not correlate with the cluster's workload activity
can be signs of security events or other issues.

What to do: Monitor your EKS nodes as you would any other EC2 instance. For managed node groups,
Amazon CloudWatch remains the only viable option, because you cannot modify the user data to install a
third-party collection agent at boot time, nor can you use your own AMI (Amazon Machine Image) with the
agent baked in. Self-managed node groups allow much more flexibility.

If you do use CloudWatch, you will want to enable detailed monitoring for the best observability.
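
Detailed monitoring can be turned on per instance, for example:

    # Switch an instance from 5-minute to 1-minute CloudWatch metrics (instance ID is a placeholder)
    aws ec2 monitor-instances --instance-ids i-0123456789abcdef0

For self-managed node groups, enabling detailed monitoring in the launch template or launch configuration
applies it to every node automatically.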



Keep EKS Clusters Up-to-date
Why: While EKS takes responsibility for making updated node images and control plane versions available to
its users, it will not patch your nodes or control plane for you. You will want to formulate a reliable process
for tracking these updates and applying them to your EKS cluster.

What to do: Plan how to get notifications and how to handle security patches for your cluster and its nodes.
As EKS provides little automation around any part of this process, you will probably want to create your own
automation based on your needs and resources.

• Watch for security updates from AWS:
  • EKS updates (control plane and nodes)
  • Linux AMI (nodes)
  • AWS does not provide a feed for Windows updates
• Follow the upgrade instructions for:
  • Kubernetes/EKS platform upgrades
  • Worker nodes:
    • Managed node groups
    • Self-managed node groups
• AWS doesn't manage "add-on" upgrades, or even upgrades for the mandatory AWS VPC CNI, so you
  should make upgrading these components a standard part of cluster upgrades and patching.
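
As a rough sketch of what a manual upgrade pass can look like with the AWS CLI and eksctl (the cluster and
node group names are placeholders):

    # Check the cluster's current Kubernetes and platform versions
    aws eks describe-cluster --name my-cluster \
      --query 'cluster.{kubernetes:version,platform:platformVersion}'

    # Upgrade the control plane one minor version at a time
    eksctl upgrade cluster --name my-cluster --approve

    # Roll a managed node group onto the AMI that matches the new control plane version
    eksctl upgrade nodegroup --cluster my-cluster --name my-nodegroup

Add-ons such as the AWS VPC CNI, CoreDNS, and kube-proxy then need their own version checks and
upgrades as part of the same pass.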

EKS leaves a large share of operational overhead to the user, and auditors will expect all of the practices
described here to be in place. Prioritizing the initial setup and maintaining the ongoing tasks will be
foundational to tracking the health and security of your clusters.



Conclusion
As you have probably noticed, particularly if you compare it against other managed Kubernetes services,
EKS puts a large share of the burden of cluster security and maintenance on the customer. While some of
the issues and remediations mentioned here are more critical than others, they all act as important steps
toward preventing successful attacks on an EKS cluster. In the event of a successful infiltration attempt,
these controls would also severely limit the potential blast radius of damage to the cluster and its
workloads.

Ready to see StackRox in action?


Get a personalized demo tailored for your business,
environment, and needs.
