The main point of the cloud and Kubernetes is elasticity: we should be able to add new nodes when the existing ones fill up, and remove them again when demand drops. Kubernetes solves this with autoscalers, components that scale resources up and down according to usage; this mechanism is called Kubernetes autoscaling. There are three different methods of Kubernetes autoscaling:
- Horizontal Pod Autoscaler (HPA)
- Vertical Pod Autoscaler (VPA)
- Cluster Autoscaler (CA)
Life before Kubernetes meant writing our code and pushing it onto physical servers in a data center, then managing the resources each server needed to run the application smoothly. The alternative was deploying code in virtual machines (VMs), but VMs bring their own problems: the hardware and software components they require are costly, and they carry security risks. This is where Kubernetes comes in. It is an open-source platform for deploying, managing, and maintaining groups of containers, much like a tool that manages multiple Docker environments together. The problems we faced with VMs can be overcome with Kubernetes (K8s).
1. Kubernetes Horizontal Pod Autoscaling (HPA)
The Horizontal Pod Autoscaler (HPA) is a controller that can scale most pod-based resources up and down based on your application workload. It does this by changing the number of replicas of your pod once certain preconfigured thresholds are met; for many of the applications we deploy, scaling depends on a single metric, typically CPU usage. To use HPA, we define the maximum and minimum number of pods we want for a particular application, along with a target utilization percentage for CPU or memory. Once HPA is enabled for an application, Kubernetes automatically monitors it and scales the pods up and down within the minimum and maximum limits we have defined.
For example, consider an application like Airbnb running in Kubernetes: it experiences high user traffic whenever there is an offer on hotel or flight bookings, and if the application is not optimized for this traffic, users may experience slow response times or even downtime. With HPA, you specify a target CPU usage percentage, a minimum and maximum number of running pods, and other parameters, and Kubernetes automatically increases the number of pods to manage the increased traffic when CPU utilization reaches the specified level.
YAML code for HPA
apiVersion: autoscaling/v2 # this specifies the Kubernetes API version
kind: HorizontalPodAutoscaler # this specifies the kind of Kubernetes object, like HPA or VPA
metadata:
  name: name-of-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1 # Deployments belong to the apps/v1 API group
    kind: Deployment
    name: name-of-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 40
The ‘averageUtilization’ field specifies the target utilization percentage that the HPA will aim for when scaling the deployment. Here it is set to 40%, meaning the HPA will attempt to keep the average CPU (and memory) utilization of the deployment's pods at or below 40%. This YAML will automatically scale the specified deployment based on CPU and memory utilization, with a minimum of 1 and a maximum of 10 replicas: if average utilization exceeds 40%, the HPA will automatically scale up the deployment to maintain optimal performance.
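The autoscaling/v2 API also accepts an optional behavior section for tuning how aggressively the HPA reacts. The manifest below is a minimal sketch (the window and policy values are illustrative, not part of the example above) that adds a five-minute scale-down stabilization window so brief dips in load do not remove pods prematurely:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: name-of-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: name-of-app
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # act only on 5 minutes of consistently lower metrics
      policies:
      - type: Percent
        value: 50 # remove at most half of the current replicas...
        periodSeconds: 60 # ...per 60-second window
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
With this in place the HPA can still scale up immediately, but scale-down is rate-limited, which smooths out flapping under bursty traffic.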
Working of Horizontal Pod Autoscaler
The working of HPA can be broken down into these key steps:
- Metrics Collection: The HorizontalPodAutoscaler continuously monitors the resource usage (e.g., CPU, memory) of the pods in your deployment. This is typically achieved by the Kubernetes Metrics Server, which collects data at regular intervals (default: every 15 seconds).
- Threshold Comparison: The collected resource metrics are compared against the desired threshold (e.g., CPU usage target of 60%). If the usage exceeds the target threshold, Kubernetes determines that the application requires more resources, and HPA triggers an action to add more pods.
- Scaling Logic: The HPA uses this logic to decide when and how much to scale (the exact replica-count formula is shown after this list):
- Scale Up: When resource utilization surpasses the defined threshold, HPA increases the number of pods. For instance, if the CPU utilization exceeds 70% across multiple pods, HPA might add more replicas to distribute the load evenly.
- Scale Down: If resource utilization falls below the threshold (e.g., CPU usage drops to 30%), HPA scales down by removing some of the pods, ensuring resource efficiency during low traffic periods.
- Feedback Loop: HPA operates in a feedback loop. As the traffic and resource demand changes, HPA will continuously adjust the pod count in response to real-time data. This ensures the system dynamically adapts to current workloads.
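Under the hood, the scaling logic above reduces to a single replica-count formula, as documented for the Kubernetes HPA:
desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )
For example, if 4 pods average 80% CPU against a 40% target, the HPA computes ceil(4 × 80 / 40) = 8 and scales up to 8 replicas; if utilization later drops to 20%, it computes ceil(8 × 20 / 40) = 4 and scales back down.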
Limitations of HPA
The HorizontalPodAutoscaler (HPA) is great for scaling applications automatically in Kubernetes, but it does have limitations that can impact its use in real-world scenarios:
- Limited Metric Support: HPA mainly uses CPU and memory for scaling which may not represent the true load. Applications often need to scale based on other factors like request rates or network traffic. Custom metrics can be added but this requires extra setup and complexity.
- Cold Starts and Delays: When HPA scales up there is a delay before new pods are ready. This can lead to performance drops when the current pods are overloaded. Pre-warming pods or planning for spikes can help but it requires more effort and resources.
- Reactive Scaling: HPA reacts after thresholds are breached rather than scaling proactively. This can leave your application under-provisioned during sudden traffic spikes causing poor performance. You can use predictive scaling models but that adds complexity to infrastructure.
- One Metric at a Time: HPA typically scales based on one metric like CPU or memory. Many applications need multiple factors like network or request rate considered together. To handle this you can use tools like KEDA (a minimal example appears after this list), but it increases operational overhead.
- Handling Burst Traffic: HPA struggles with burst traffic since it does not scale fast enough to handle sudden demand spikes. Using queue-based systems like RabbitMQ can help manage bursts but adds more complexity.
- Scaling Granularity: HPA scales pods as whole units which may be inefficient for applications that need finer control over resources like just increasing CPU. For more precise scaling the VerticalPodAutoscaler (VPA) can adjust resources for individual pods.
- Dependence on Metrics: HPA relies on the availability and accuracy of resource metrics. If the Metrics Server fails HPA cannot make scaling decisions which can lead to resource issues. Ensuring high availability for metrics is crucial.
- Fixed Scaling Intervals: HPA checks metrics at fixed intervals which can miss short traffic spikes. This can lead to delayed scaling or inefficient resource usage in dynamic environments. Adjusting the interval or combining HPA with event-driven scaling can help.
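As a sketch of the KEDA approach mentioned in the list above (this assumes KEDA is installed in the cluster; the Prometheus address, query, and threshold are illustrative placeholders, not values from this article), a ScaledObject can drive the replica count from a request-rate metric instead of CPU:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: name-of-app-scaler
spec:
  scaleTargetRef:
    name: name-of-app # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus # scale on a Prometheus query instead of CPU
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090 # assumed Prometheus endpoint
      query: sum(rate(http_requests_total[2m])) # hypothetical request-rate metric
      threshold: "100" # target value per replica
KEDA manages an HPA under the hood, so this addresses the one-metric and burst-traffic limitations without replacing the underlying mechanism.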
For a practical implementation guide on how to set up autoscaling on Amazon EKS, refer to - Implementing Autoscaling in Amazon EKS
Usage and Cost Reporting with HPA
The Horizontal Pod Autoscaler (HPA) in Kubernetes helps keep applications performing optimally by adjusting the number of pod replicas based on demand to avoid over-provisioning and reduce costs. This section explains how to monitor and report on HPA-driven usage to manage costs effectively.
Tracking HPA’s impact on costs helps avoid unnecessary expenses while capturing usage patterns to refine scaling decisions based on real data.
Setting Up Usage and Cost Reporting with HPA
- Define Metrics and Cost Allocation to track CPU, memory, and scaling events, with tags for accurate cost attribution
- Use Monitoring Tools like Prometheus and Grafana to visualize usage patterns and compare metrics to cost data (a sample recording rule follows this list)
- Add Custom Metrics to tailor HPA to specific application needs and keep scaling efficient
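As one way to wire this up (a hedged sketch assuming the Prometheus Operator and kube-state-metrics are installed; the metric name below is the one kube-state-metrics exposes for HPAs in recent versions, but names vary across versions), a PrometheusRule can record per-HPA replica counts for later comparison against billing data:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-usage-reporting
  namespace: monitoring
spec:
  groups:
  - name: hpa-cost
    rules:
    # average replica count per HPA over the last day, for cost attribution
    - record: hpa:current_replicas:avg_over_day
      expr: avg_over_time(kube_horizontalpodautoscaler_status_current_replicas[1d])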
Cost Optimization Tips
- Spot Cost Anomalies by identifying scaling events that drive up costs unexpectedly
- Refine HPA with Historical Data by adjusting thresholds and cooldowns to reduce unneeded scaling
- Automate Reporting to maintain insight into usage trends and make informed scaling choices that are cost-conscious
2. Kubernetes Vertical Pod Autoscaler (VPA)
The Vertical Pod Autoscaler (VPA) for Kubernetes is a tool that automatically adjusts CPU and memory requests and limits based on past resource utilization metrics. Used appropriately, it helps you allocate resources efficiently and automatically inside a Kubernetes cluster, down to the level of individual containers. Besides improving a pod's performance and efficiency by managing its resource requests and limits, VPA can lower the cost of running the application by reducing wasted resources.
The VPA deployment has three components namely:
- VPA Admission Controller
- VPA Recommender
- VPA Updater
2.1 VPA Admission Controller
It is the component that ensures any new or updated Pod spec complies with the VPA criteria before it is created or changed in the cluster. The VPA Admission Controller intercepts all pod creation and update requests and applies a set of rules, configured according to the active VPA policy, to the pod specifications. It also checks that the Kubernetes resources being created or altered conform to the VPA policy.
2.2 VPA Recommender
It is the component that suggests resource requests and limits for the individual containers in a pod, based on the resource utilization of those containers over time. The VPA Recommender gets its consumption data from the Kubernetes Metrics Server, which provides real-time resource usage for all containers running in the cluster. Based on this data, it generates recommendations for the resource requests and limits of each container in a pod, taking into account factors like past usage, current limits, and pod requirements.
2.3 VPA Updater
It is the component that applies the changes recommended by the VPA Recommender, modifying the resource requests and limits of each container in a pod. The VPA Updater continuously monitors the Recommender's suggestions and updates the Pod spec with the recommended resource requests and limits, applying the changes through the Kubernetes API server. It also makes sure that the updated requests and limits conform to the current VPA policy: if the new values do not satisfy the policy's requirements, the VPA Updater rejects the update and stops the pod from being updated.
YAML file for VPA:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: example-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: example-container
      minAllowed:
        cpu: 110m
        memory: 150Mi
      maxAllowed:
        cpu: 500m
        memory: 1Gi
      mode: "Auto"
Here the ‘resourcePolicy’ section specifies the resource policies that the VPA should use. In this case there is a single container policy, for the container named "example-container". The minAllowed and maxAllowed fields specify the minimum and maximum allowed resource requests and limits, respectively. The mode is set to "Auto", which means that the VPA will automatically adjust the resource requests and limits of the container within the specified range.
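The updatePolicy is worth tuning before trusting VPA in production. Besides "Auto" (used above), the VPA API also accepts "Off", which computes recommendations without ever applying them, and "Initial", which applies them only when pods are created. A minimal recommendation-only variant of the policy above:
updatePolicy:
  updateMode: "Off" # record recommendations in the VPA status, never evict running pods
Running in "Off" mode first lets you inspect the Recommender's suggestions (visible in the VPA object's status) before allowing it to restart pods.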
3. Kubernetes Cluster Autoscaler (CA)
The Cluster Autoscaler is a tool that dynamically changes the number of nodes in a node pool according to the requirements of your workloads, scaling back down to a minimum size that you choose when demand is low. This can increase the availability of your workloads when you need it. We don't need to add or remove nodes manually; instead we set a maximum and minimum size for the node pool, and the rest is taken care of by the Cluster Autoscaler.
For example, if your workload comprises a controller with a single replica, that replica's Pod could be rescheduled onto a new node when its current node is removed. Design your workloads to endure unexpected interruptions, or make sure that crucial Pods are not disrupted, before activating the Cluster Autoscaler. Scaling choices are not made by CA based on actual CPU or memory use: it only looks at a pod's requested and allotted amounts of CPU and memory. Because of this limitation, CA cannot identify unused compute resources that users have requested, which can leave the cluster inefficient and wasteful. The Cluster Autoscaler removes nodes, down to the minimum size of the node pool, when nodes are underutilized and all Pods can still be scheduled with fewer nodes. It will not try to scale down a node that hosts Pods which cannot be relocated to other nodes in the cluster. Nor does it address resource shortages on nodes when pods have requested insufficient amounts of resources (or have left the defaults in place, which may be insufficient). By explicitly requesting resources for each workload, you ensure that the Cluster Autoscaler operates as correctly as possible (see the sketch after the YAML below).
YAML for cluster autoscaling (note that, unlike HPA and VPA, the Cluster Autoscaler is not a Kubernetes API object of its own; it runs as an ordinary Deployment, typically in kube-system, configured through command-line flags. The manifest below is a trimmed sketch of the AWS variant):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler # needs RBAC permissions (not shown)
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0 # pick the release matching your cluster version
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --balance-similar-node-groups=true
        # discover node groups (here, AWS Auto Scaling groups) by their tags
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-kubernetes-cluster
The --node-group-auto-discovery flag tells the Cluster Autoscaler which node groups to manage: here, the AWS Auto Scaling groups tagged with "k8s.io/cluster-autoscaler/enabled" and with the name of the Kubernetes cluster it runs in, "my-kubernetes-cluster". The --balance-similar-node-groups flag specifies whether the Cluster Autoscaler should attempt to keep similar node groups balanced when scaling the cluster. The minimum and maximum node counts are not set in this manifest; they are configured on the node groups themselves (for example, the min and max size of each Auto Scaling group).
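Since the Cluster Autoscaler reasons only about requested resources, the single most effective thing you can do for it is to set explicit requests on every workload, as recommended above. A minimal sketch (the app name, image, and values are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: name-of-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: name-of-app
  template:
    metadata:
      labels:
        app: name-of-app
    spec:
      containers:
      - name: name-of-app
        image: nginx:1.27 # placeholder image
        resources:
          requests: # what the scheduler and Cluster Autoscaler use for placement and scaling decisions
            cpu: 250m
            memory: 256Mi
          limits: # hard caps enforced at runtime
            cpu: 500m
            memory: 512Mi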
Kubernetes HPA vs VPA
Feature | Horizontal Pod Autoscaler (HPA) | Vertical Pod Autoscaler (VPA) |
---|---|---
Purpose | Scales the number of pod replicas | Adjusts CPU and memory resources within individual pods |
Primary Metric | CPU and memory usage or custom metrics | CPU and memory usage |
Use Case | Handling fluctuating demand by adding/removing pods | Optimizing resource allocation for existing pods |
Scaling Direction | Horizontal (increases/decreases the number of pods) | Vertical (adjusts resources for existing pods) |
Ideal For | Applications needing more instances during high demand | Applications requiring optimized resources per pod |
Impact on Application Design | Minimal; scales out by adding more pods | May require adjustments if resources are constrained |
Common Usage Scenarios | Web applications, microservices | Resource-intensive applications, background processing |
Configuration Complexity | Typically straightforward | Requires tuning to avoid excessive scaling |
Conclusion
Kubernetes Autoscaling plays a crucial role in modern application deployment, enabling dynamic and efficient management of resources that respond to changing demands. By implementing Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler, organizations can ensure their applications remain resilient, responsive, and cost-effective. Autoscaling not only optimizes resource usage but also enhances application performance by allocating resources precisely when and where they’re needed. With Kubernetes Autoscaling, businesses can manage traffic surges seamlessly, reduce operational costs, and improve the overall user experience, making it an indispensable tool for any cloud-native environment.