Kubernetes provides a few mechanisms for scalability of workloads. Three primary mechanisms are Vertical Pod Autoscaler (VPA), Horizontal Pod Autoscaler (HPA), and Cluster Autoscaler (CA).
Cluster Autoscaler automatically adapts the number of Kubernetes nodes in your cluster to your requirements. When the number of pods that are pending or “unschedulable” increases, indicating there are insufficient resources in the cluster, CA adds new nodes to the cluster. It can also scale down nodes if they have been underutilized for a long period of time.
The Cluster Autoscaler is typically installed as a Deployment object in a cluster. It runs a single active replica at a time and uses leader election to ensure high availability.
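As a rough sketch (not a production configuration), a trimmed-down CA Deployment might look like the following; the image tag, cloud provider, and node group name are placeholders you would replace with your own values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1                       # a single active replica; leader election prevents overlapping decisions
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0   # match the version to your cluster
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws                # placeholder; set to your cloud provider
        - --nodes=1:10:my-node-group          # min:max:<node-group-name>, all placeholders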
This is part of an extensive series of guides about microservices.
Horizontal Pod Autoscaler (HPA) adjusts the number of pod replicas in a deployment or replica set based on observed CPU utilization or other custom metrics. This means HPA scales the workload horizontally by increasing or decreasing the number of pod instances. For example, if an application experiences a spike in CPU usage, HPA will add more pod replicas to handle the load, and it will reduce the number of replicas when the load decreases.
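As an illustrative sketch, a minimal HPA for a hypothetical Deployment named web, scaling on CPU utilization (the names and thresholds are examples, not recommendations):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                     # hypothetical workload to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # add replicas when average CPU use across pods exceeds 70% of requests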
Cluster Autoscaler manages the number of nodes in a cluster. When there are insufficient resources to schedule new pods (e.g., CPU or memory constraints), the CA will provision additional nodes to accommodate the new pods. If nodes are underutilized, the CA can scale down the number of nodes to save costs.
Related content: Read our guide to Horizontal Pod Autoscaler (coming soon)
Vertical Pod Autoscaler (VPA) automatically adjusts the resource limits and requests (CPU and memory) for running pods based on their usage. It helps ensure that each pod has the appropriate amount of resources, optimizing performance and efficiency. For example, if a pod consistently uses more memory than requested, VPA will increase its memory limits, and if it uses less, VPA will decrease them.
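For illustration only, and assuming the VPA components are installed in the cluster, a minimal VPA object targeting the same hypothetical Deployment could look like this:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # hypothetical workload whose pods VPA right-sizes
  updatePolicy:
    updateMode: Auto          # VPA may evict pods to apply updated requests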
Cluster Autoscaler focuses on the entire cluster’s resource capacity by adding or removing nodes. While VPA fine-tunes the resource allocation for individual pods, CA ensures that there are enough nodes to meet the overall demand of the cluster.
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better utilize the Kubernetes Cluster Autoscaler:
Define scaling policies that match your application’s load patterns.
Use monitoring tools to track and analyze scaling events.
Ensure pods have realistic resource requests to avoid over- or under-scaling (see the sketch after this list).
Use taints and tolerations to control pod placement during scaling.
Validate autoscaling configurations in a staging environment before applying to production.
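For example, the tip about realistic resource requests comes down to setting requests (and usually limits) on every container, since both the scheduler and CA's simulations rely on them; a sketch with placeholder values:

resources:
  requests:
    cpu: 250m          # what the scheduler reserves and what CA counts in its capacity calculations
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi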
For simplicity, we’ll explain the Cluster Autoscaler process in a scale-out scenario. When the number of pending (unschedulable) pods in the cluster increases, indicating a lack of resources, CA automatically starts new nodes.
This occurs in four steps:
1. CA scans the cluster for unschedulable pods at a fixed interval, controlled by the --scan-interval flag (10 seconds by default).
2. When pending pods are found, CA simulates scheduling them onto a template node from each configured node group to determine whether adding a node would allow them to run.
3. CA asks the cloud provider to provision a new node in the selected node group.
4. Once the new node registers with the Kubernetes control plane, the scheduler places the pending pods on it.
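If you need CA to notice pending pods faster or less often, the interval can be adjusted in the CA container's arguments; a hedged snippet, where the 30-second value is only an example:

command:
- ./cluster-autoscaler
- --scan-interval=30s    # check for unschedulable pods every 30 seconds instead of the 10-second default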
When using CA, it’s important to be aware of its limitations.
CA relies on the resource requests specified for each pod to determine whether additional nodes are required. If these resource requests are inaccurately configured, CA might over-provision or under-provision the cluster. Over-provisioning leads to unnecessary costs, while under-provisioning can cause pods to remain unschedulable.
Although CA can detect unschedulable pods quickly, the actual process of provisioning new nodes and making them ready for scheduling can take several minutes, depending on the underlying infrastructure and cloud provider. During this delay, applications might experience degraded performance or downtime. This delay can be particularly problematic for applications with sudden spikes in demand.
When CA decides to remove underutilized nodes, it must first evict all the pods running on those nodes. While Kubernetes attempts to reschedule these pods onto other available nodes, there is no guarantee that sufficient capacity exists elsewhere in the cluster. This can lead to temporary unavailability of certain pods, impacting application stability. The disruption is especially noticeable when the node being removed hosts stateful pods.
Here are some of the ways to ensure optimal use of CA.
This feature simplifies the management and scaling of Kubernetes clusters. It enables Cluster Autoscaler to automatically detect and manage node groups based on predefined tags or labels without manual intervention. This functionality is particularly beneficial in environments where node group configurations are frequently updated or when operating multi-zone or multi-region clusters.
To enable Node Group Auto-Discovery, configure the --node-group-auto-discovery flag with appropriate tags, like this:
--node-group-auto-discovery=k8s.io/cluster-autoscaler/<cluster-name>
--node-group-auto-discovery=k8s.io/cluster-autoscaler/node-template/label/<label-key>=<label-value>
By doing so, CA can dynamically scale different types of node groups as needed.
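As a cloud-specific illustration (assuming the AWS provider), the flag usually takes an asg:tag= prefix, and the matching tags must be present on the Auto Scaling groups; the cluster name here is a placeholder:

- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster   # discover any ASG carrying both tags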
To prevent Cluster Autoscaler from making rapid, consecutive scaling decisions that could lead to instability, it is advisable to configure a delay period after new nodes are added before any scale-down operations are permitted. This delay can be set using the --scale-down-delay-after-add flag.
A common configuration is to set this delay to around 10 minutes. This buffer period allows new nodes to stabilize and start accepting workloads, ensuring that the cluster has enough time to adjust to the new capacity before any nodes are considered for removal. This helps avoid nodes being added and then immediately removed, causing unnecessary resource churn.
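In the CA container's arguments, the ten-minute buffer described above would look like this:

- --scale-down-delay-after-add=10m    # wait 10 minutes after a scale-up before evaluating scale-down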
This threshold determines the level of resource usage below which nodes are eligible for scale-down. It can be configured using the --scale-down-utilization-threshold flag. For example, setting a utilization threshold of 50% means that CA will consider a node for removal when the sum of the CPU or memory requested by its pods falls below half of the node's allocatable capacity.
This configuration helps to balance cost savings with performance by ensuring that underutilized nodes are removed. It saves on infrastructure costs while maintaining enough capacity to handle changes in workload demand. Adjusting this threshold based on the needs and workload patterns of your applications can optimize both cost and performance.
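For example, the 50% threshold mentioned above is expressed as a fraction:

- --scale-down-utilization-threshold=0.5    # nodes whose requested CPU/memory is below 50% of capacity become scale-down candidates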
In cases where a scale-down operation fails, it is useful to introduce a delay before retrying another scale-down. This delay helps to prevent repeated, rapid attempts that could destabilize the cluster. The delay period can be configured using the --scale-down-delay-after-failure flag.
A typical delay setting is around 3 to 5 minutes, which provides enough time to diagnose and address the cause of the failure before attempting another scale-down. This helps maintain cluster stability and ensures that scaling operations do not affect application availability or performance by repeatedly triggering disruptions.
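A sketch of the corresponding argument, using a three-minute back-off as an example:

- --scale-down-delay-after-failure=3m    # pause scale-down attempts for 3 minutes after a failure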
The estimator type in the Cluster Autoscaler configuration determines how CA estimates the number of nodes required to accommodate pending pods. Different estimator types can be configured using the --estimator flag, such as binpacking and simple.
The binpacking estimator, for example, aims to optimize node usage by fitting as many pods as possible onto each node, which is useful in resource-constrained environments. The simple estimator provides a simpler calculation method suitable for less complex environments. Selecting the right estimator type based on the cluster’s workload and resource requirements can improve the accuracy of scaling decisions.
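For instance, selecting the binpacking estimator in the CA arguments looks like this:

- --estimator=binpacking    # simulate packing pending pods onto as few new nodes as possible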
Limiting the maximum number of nodes that can be added in a single scaling operation helps control infrastructure costs and prevent sudden spikes in resource usage. This limit can be set using the max-node-proportional-expansion flag. For example, setting this limit to 10 nodes ensures that scaling operations add a manageable number of nodes.
This setting is particularly useful in large-scale environments where rapid, large-scale expansions could lead to significant cost increases and potential resource wastage. By controlling the rate of node additions, cluster administrators can better manage budget and resource utilization.
Integrating spot instances into your cluster can reduce costs for non-critical workloads. Spot instances are typically available at a lower cost compared to on-demand instances, making them useful for certain tasks. To configure the CA to use spot instances, set up node groups with spot instance types and label them appropriately.
You can use tags such as k8s.io/cluster-autoscaler/node-template/spot-instance=true to identify these groups. CA then prioritizes adding spot instances for scaling operations, taking advantage of the discounts offered. However, it is important to account for the volatility of spot instances.
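As a hedged sketch for the AWS provider, the node-template label convention can be used to tag a spot-backed Auto Scaling group so CA's simulations see the label even before any node exists; the lifecycle label key and value are illustrative, not required names:

k8s.io/cluster-autoscaler/enabled = true
k8s.io/cluster-autoscaler/my-cluster = owned
k8s.io/cluster-autoscaler/node-template/label/lifecycle = spot    # advertises that nodes from this group carry the lifecycle=spot label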
Balancing similar node groups helps prevent imbalances in node usage across different node groups within the cluster. This can be achieved by using the --balance-similar-node-groups flag.
When enabled, this flag ensures that scaling operations are distributed evenly across node groups with similar configurations, improving resource utilization and preventing over-reliance on a single node group. This balance helps to maintain cluster performance and availability, especially in multi-zone or multi-region deployments where different node groups may have varying capacities and costs.
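Enabling it is a single argument on the CA container:

- --balance-similar-node-groups=true    # spread scale-ups evenly across node groups with matching instance types and labels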
Cluster Autoscaler is a useful mechanism, but it can sometimes work differently than expected. Here are the primary ways to diagnose an issue with CA:
Logs on control plane nodes: Kubernetes control plane nodes write Cluster Autoscaler activity to /var/log/cluster-autoscaler.log.
Events on control plane nodes: the kube-system/cluster-autoscaler-status ConfigMap emits ScaledUpGroup, ScaleDownEmpty, and ScaleDown events.
Events on nodes: ScaleDownFailed events.
Events on pods: TriggeredScaleUp and NotTriggerScaleUp events.
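A few kubectl commands that surface this information (the pending pod name is a placeholder):

kubectl -n kube-system describe configmap cluster-autoscaler-status    # overall CA status and per-node-group health
kubectl -n kube-system get events | grep -i autoscaler                 # ScaledUpGroup / ScaleDown events from CA
kubectl describe pod <pending-pod-name>                                # look for TriggeredScaleUp / NotTriggerScaleUp events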
Here are specific error scenarios that can occur with the Cluster Autoscaler and how to perform initial troubleshooting.
These instructions will allow you to debug simple error scenarios, but for more complex errors involving multiple moving parts in the cluster, you might need automated troubleshooting tools.
Here are reasons why CA might fail to scale down a node, and what you can do about them.
Here are reasons why CA might fail to scale up the cluster, and what you can do about them.
If CA appears to have stopped working, follow these steps to debug the problem:
Kubernetes troubleshooting is complex and involves multiple components; you might experience errors that are difficult to diagnose and fix. Without the right tools and expertise in place, the troubleshooting process can become stressful, ineffective and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong – simply because it can – especially across hybrid cloud environments.
This is where Komodor comes in – Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.
Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.
By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.