Cluster Autoscaler: How It Works and Solving Common Problems

What is Cluster Autoscaler?

Kubernetes provides several mechanisms for scaling workloads. The three primary ones are the Vertical Pod Autoscaler (VPA), the Horizontal Pod Autoscaler (HPA), and the Cluster Autoscaler (CA).

Cluster Autoscaler automatically adapts the number of Kubernetes nodes in your cluster to your requirements. When the number of pods that are pending or “unschedulable” increases, indicating there are insufficient resources in the cluster, CA adds new nodes to the cluster. It can also scale down nodes if they have been underutilized for a long period of time.

The Cluster Autoscaler is typically installed as a Deployment object in a cluster. Only one replica is active at a time, with leader election used to ensure high availability when multiple replicas are deployed.
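
To make this concrete, here is a minimal sketch of such a Deployment, assuming an AWS cluster with a node group named my-asg and an existing cluster-autoscaler service account with the required RBAC; the image tag, names, and flag values are placeholders to adapt, not a definitive installation:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0   # pick the tag matching your Kubernetes version
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --nodes=1:10:my-asg        # min:max:node-group-name (placeholder)
            - --scan-interval=10s        # how often CA checks for pending pods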

This is part of an extensive series of guides about microservices.

Cluster Autoscaler vs Other Autoscalers

Cluster Autoscaler vs Horizontal Pod Autoscaler (HPA)

Horizontal Pod Autoscaler (HPA) adjusts the number of pod replicas in a deployment or replica set based on observed CPU utilization or other custom metrics. This means HPA scales the workload horizontally by increasing or decreasing the number of pod instances. For example, if an application experiences a spike in CPU usage, HPA will add more pod replicas to handle the load, and it will reduce the number of replicas when the load decreases.
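
For illustration, a minimal HPA manifest that keeps average CPU utilization around 60% by adding or removing replicas; the Deployment name and target values are placeholders:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # target average CPU utilization across replicas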

Cluster Autoscaler manages the number of nodes in a cluster. When there are insufficient resources to schedule new pods (e.g., CPU or memory constraints), the CA will provision additional nodes to accommodate the new pods. If nodes are underutilized, the CA can scale down the number of nodes to save costs. 

Related content: Read our guide to Horizontal Pod Autoscaler (coming soon)

Cluster Autoscaler vs Vertical Pod Autoscaler (VPA)

Vertical Pod Autoscaler (VPA) automatically adjusts the resource limits and requests (CPU and memory) for running pods based on their usage. It helps ensure that each pod has the appropriate amount of resources, optimizing performance and efficiency. For example, if a pod consistently uses more memory than requested, VPA will increase its memory limits, and if it uses less, VPA will decrease them.
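
For comparison, a minimal VPA manifest that lets the autoscaler adjust requests automatically; this assumes the VPA CRDs and controller are installed, and the Deployment name is a placeholder:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # placeholder Deployment name
  updatePolicy:
    updateMode: "Auto"       # VPA evicts pods and recreates them with updated requests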

Cluster Autoscaler focuses on the entire cluster’s resource capacity by adding or removing nodes. While VPA fine-tunes the resource allocation for individual pods, CA ensures that there are enough nodes to meet the overall demand of the cluster.

How Cluster Autoscaler Works

For simplicity, we’ll explain the Cluster Autoscaler process in a scale-out scenario. When the number of pending (unschedulable) pods in the cluster increases, indicating a lack of resources, CA automatically starts new nodes.

This occurs in four steps:

  1. CA checks for pending pods, scanning at an interval of 10 seconds (configurable using the --scan-interval flag).
  2. If there are pending pods, CA spins up new nodes to scale out the cluster, within the constraints configured by the administrator. CA integrates with public cloud platforms such as AWS and Azure, using their autoscaling capabilities to add more virtual machines.
  3. Kubernetes registers the new virtual machines as nodes in the control plane, allowing the Kubernetes scheduler to run pods on them.
  4. The Kubernetes scheduler assigns the pending pods to the new nodes.
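
To observe this process in a live cluster, you can list pending pods and then inspect one of them for scale-up events; a quick sketch (the pod name is a placeholder):

kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl describe pod <pending-pod-name>     # look for TriggeredScaleUp or NotTriggerScaleUp events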

Cluster Autoscaler Limitations

When using CA, it’s important to be aware of its limitations.

Resource Estimation Accuracy

CA relies on the resource requests specified for each pod to determine whether additional nodes are required. If these resource requests are inaccurately configured, CA might over-provision or under-provision the cluster. Over-provisioning leads to unnecessary costs, while under-provisioning can cause pods to remain unschedulable.
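
Because CA bases its decisions on requests rather than actual usage, it helps to set requests that reflect real consumption. Here is a minimal example of a container with explicit requests and limits; the names, image, and values are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: api-server             # placeholder name
spec:
  containers:
    - name: api
      image: nginx:1.27        # placeholder image
      resources:
        requests:
          cpu: "250m"          # CA uses these requests to decide whether a new node is needed
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"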

Delay in Scaling

Although CA can detect unschedulable pods quickly, the actual process of provisioning new nodes and making them ready for scheduling can take several minutes, depending on the underlying infrastructure and cloud provider. During this delay, applications might experience degraded performance or downtime. This delay can be particularly problematic for applications with sudden spikes in demand.
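
A common way to soften this delay is to run low-priority placeholder pods that reserve spare capacity; when real workloads arrive, the placeholders are preempted and CA restores the headroom in the background. A sketch of this pattern, with replica counts and resource sizes as assumptions you would tune:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                       # lower than the default pod priority of 0
globalDefault: false
description: "Placeholder pods that reserve headroom for Cluster Autoscaler"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 2                    # how much headroom to keep
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"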

Pod Disruption

When CA decides to remove underutilized nodes, it must first evict all the pods running on those nodes. While Kubernetes attempts to reschedule these pods onto other available nodes, there is no guarantee that sufficient capacity exists elsewhere in the cluster. This can lead to temporary unavailability of certain pods, impacting application stability. The disruption is especially noticeable when the node being removed runs stateful pods.
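
To limit the impact of these evictions, you can define a PodDisruptionBudget so that node drains cannot take down too many replicas at once; a minimal sketch, with the label selector as a placeholder:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2           # keep at least 2 replicas running during voluntary disruptions
  selector:
    matchLabels:
      app: web              # placeholder label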

Best Practices for Optimizing Configurations in Cluster Autoscaler

Here are some of the ways to ensure optimal use of CA.

Node Group Auto-Discovery

This feature simplifies the management and scaling of Kubernetes clusters. It enables Cluster Autoscaler to automatically detect and manage node groups based on predefined tags or labels without manual intervention. This functionality is particularly beneficial in environments where node group configurations are frequently updated or when operating multi-zone or multi-region clusters. 

To enable Node Group Auto-Discovery, configure the --node-group-auto-discovery flag with the tags that identify your node groups. On AWS, for example, the flag takes a list of Auto Scaling group tags:

--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>

Node groups can also carry tags such as k8s.io/cluster-autoscaler/node-template/label/<label-key>=<label-value>, which tell CA which labels the nodes in that group will have.

By doing so, CA can dynamically scale different types of node groups as needed.

Scale-Down Delay After Add

To prevent Cluster Autoscaler from making rapid, consecutive scaling decisions that could lead to instability, it is advisable to configure a delay period after new nodes are added before any scale-down operations are permitted. This delay can be set using the --scale-down-delay-after-add flag. 

A common configuration is to set this delay to around 10 minutes. This buffer period allows new nodes to stabilize and start accepting workloads, ensuring that the cluster has enough time to adjust to the new capacity before any nodes are considered for removal. This helps avoid nodes being added and then immediately removed, causing unnecessary resource churn.

Scale-Down Utilization Threshold

This threshold determines the level of resource usage below which nodes are eligible for scale-down. It can be configured using the --scale-down-utilization-threshold flag. For example, setting the threshold to 0.5 (50%) means that CA will consider a node for removal when the CPU and memory requested by the pods running on it fall below half of the node’s allocatable capacity.

This configuration helps to balance cost savings with performance by ensuring that underutilized nodes are removed. It saves on infrastructure costs while maintaining enough capacity to handle changes in workload demand. Adjusting this threshold based on the needs and workload patterns of your applications can optimize both cost and performance.

Scale-Down Delay After Failure

In cases where a scale-down operation fails, it is useful to introduce a delay before retrying another scale-down. This delay helps to prevent repeated, rapid attempts that could destabilize the cluster. The delay period can be configured using the --scale-down-delay-after-failure flag. 

A typical delay setting is around 3 to 5 minutes, which provides enough time to diagnose and address the cause of the failure before attempting another scale-down. This helps maintain cluster stability and ensures that scaling operations do not affect application availability or performance by repeatedly triggering disruptions.
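
These scale-down settings are passed as flags on the Cluster Autoscaler container. A sketch of how they might appear together in the Deployment’s command, with values that are examples rather than recommendations for every cluster:

command:
  - ./cluster-autoscaler
  - --scale-down-delay-after-add=10m
  - --scale-down-utilization-threshold=0.5    # consider nodes whose requested resources fall below 50%
  - --scale-down-delay-after-failure=3m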

Estimator Type

The estimator type in the Cluster Autoscaler configuration determines how CA estimates the number of nodes required to accommodate pending pods. Different estimator types can be configured using the --estimator flag, such as binpacking and simple.

The binpacking estimator, for example, aims to optimize node usage by fitting as many pods as possible onto each node, which is useful in resource-constrained environments. The simple estimator provides a simpler calculation method suitable for less complex environments. Selecting the right estimator type based on the cluster’s workload and resource requirements can improve the accuracy of scaling decisions.

Max Node Proportional Expansion

Limiting the maximum number of nodes that can be added in a single scaling operation helps control infrastructure costs and prevent sudden spikes in resource usage. This limit can be set using the max-node-proportional-expansion flag. For example, setting this limit to 10 nodes ensures that scaling operations add a manageable number of nodes. 

This setting is particularly useful in large-scale environments where rapid, large-scale expansions could lead to significant cost increases and potential resource wastage. By controlling the rate of node additions, cluster administrators can better manage budget and resource utilization.

Spot Instances Integration

Integrating spot instances into your cluster can reduce costs for non-critical workloads. Spot instances are typically available at a lower cost compared to on-demand instances, making them useful for certain tasks. To configure the CA to use spot instances, set up node groups with spot instance types and label them appropriately. 

You can use tags such as k8s.io/cluster-autoscaler/node-template/spot-instance=true to identify these groups. CA then prioritizes adding spot instances for scaling operations, taking advantage of the discounts offered. However, it is important to account for the volatility of spot instances.
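
One way to have CA prefer spot-backed node groups is the priority expander, enabled by starting CA with --expander=priority and providing a ConfigMap named cluster-autoscaler-priority-expander in kube-system; the node group name patterns below are placeholders:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*spot.*           # prefer node groups whose names contain "spot"
    10:
      - .*on-demand.*      # fall back to on-demand node groups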

Balancing Similar Node Groups

Balancing similar node groups helps prevent imbalances in node usage across different node groups within the cluster. This can be achieved by using the --balance-similar-node-groups flag. 

When enabled, this flag ensures that scaling operations are distributed evenly across node groups with similar configurations, improving resource utilization and preventing over-reliance on a single node group. This balance helps to maintain cluster performance and availability, especially in multi-zone or multi-region deployments where different node groups may have varying capacities and costs. 

Diagnosing Issues with Cluster Autoscaler

Cluster Autoscaler is a useful mechanism, but it can sometimes work differently than expected. Here are the primary ways to diagnose an issue with CA:

Logs on control plane nodes

Kubernetes control plane nodes create logs of Cluster Autoscaler activity in the following path: /var/log/cluster-autoscaler.log

Events on control plane nodes

The kube-system/cluster-autoscaler-status ConfigMap emits the following events:

  • ScaledUpGroup—this event means CA increased the size of the node group (provides previous size and current size)
  • ScaleDownEmpty—this event means CA removed a node that did not have any user pods running on it (only system pods)
  • ScaleDown—this event means CA removed a node that had user pods running on it. The event will include the names of all pods that are rescheduled as a result.

Events on nodes

  • ScaleDown—this event means CA is scaling down the node. There can be multiple events, indicating different stages of the scale-down operation.
  • ScaleDownFailed—this event means CA tried to remove the node but did not succeed. It provides the resulting error message.

Events on pods

  • TriggeredScaleUp—this event means CA scaled up the cluster to enable this pod to schedule.
  • NotTriggerScaleUp—this event means CA was not able to scale up a node group to allow this pod to schedule.
  • ScaleDown—this event means CA tried to evict this pod from a node, in order to drain it and then scale it down.
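
Assuming CA runs as a Deployment named cluster-autoscaler in the kube-system namespace (adjust the names to your installation), the following commands are a starting point for viewing the status ConfigMap, logs, and events described above:

kubectl -n kube-system describe configmap cluster-autoscaler-status
kubectl -n kube-system logs deployment/cluster-autoscaler
kubectl describe node <node-name>     # look for ScaleDown / ScaleDownFailed events
kubectl describe pod <pod-name>       # look for TriggeredScaleUp / NotTriggerScaleUp events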

Cluster Autoscaler: Troubleshooting for Specific Error Scenarios

Here are specific error scenarios that can occur with the Cluster Autoscaler and how to perform initial troubleshooting.

These instructions will allow you to debug simple error scenarios, but for more complex errors involving multiple moving parts in the cluster, you might need automated troubleshooting tools.

Nodes with Low Utilization are Not Scaled Down

Here are reasons why CA might fail to scale down a node, and what you can do about them.

  • Reason: The pod spec indicates it should not be evicted from the node. What you can do: Review the pod spec and, if appropriate, allow eviction (for example, by setting the cluster-autoscaler.kubernetes.io/safe-to-evict annotation to "true").
  • Reason: The node group is already at its minimum size. What you can do: Reduce the minimum size in the CA configuration.
  • Reason: The node has a “scale-down disabled” annotation. What you can do: Remove the annotation from the node.
  • Reason: CA is waiting for the duration specified in one of these flags: --scale-down-unneeded-time, --scale-down-delay-after-add, --scale-down-delay-after-failure, --scale-down-delay-after-delete, or --scan-interval. What you can do: Reduce the time specified in the relevant flag, or wait the specified time after the relevant event.
  • Reason: A previous attempt to remove the node failed (CA waits another 5 minutes before trying again). What you can do: Wait 5 minutes and check whether the issue has been resolved.
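
For example, to remove the “scale-down disabled” annotation from a node, or to mark a pod as safe to evict, you can use commands like these (node and pod names are placeholders):

kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled-       # trailing "-" removes the annotation
kubectl annotate pod <pod-name> cluster-autoscaler.kubernetes.io/safe-to-evict="true"         # allow CA to evict this pod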

Pending Pods Exist But Cluster Does Not Scale Up

Here are reasons why CA might fail to scale up the cluster, and what you can do about them.

  • Reason: Existing pods have high resource requests, which won’t be satisfied by new nodes. What you can do: Configure CA to add larger nodes, or reduce the pods’ resource requests.
  • Reason: All suitable node groups are already at maximum size. What you can do: Increase the maximum size of the relevant node group.
  • Reason: Existing pods cannot be scheduled on new nodes due to selectors or other settings. What you can do: Modify the pod manifests so that some pods can schedule on the new nodes. Learn more in our guide to node affinity.
  • Reason: NoVolumeZoneConflict error, indicating that a StatefulSet needs to run in the same zone as a PersistentVolume (PV), but that zone has already reached its scaling limit. What you can do: From Kubernetes 1.13 onwards, you can run separate node groups per zone and use the --balance-similar-node-groups flag to keep them balanced across zones.

Cluster Autoscaler Stops Working

If CA appears to have stopped working, follow these steps to debug the problem:

  1. Check if CA is running—you can check the latest events emitted by the kube-system/cluster-autoscaler-status ConfigMap; the most recent update should be no more than about 3 minutes old.
  2. Check if cluster and node groups are in healthy state—this should be reported by the ConfigMap.
  3. Check if there are unready nodes (CA version 1.24 and later)—if some nodes appear unready, check the resourceUnready count. If any nodes are marked as resourceUnready, the problem is likely with a device driver failing to install a required hardware resource.
  4. If both cluster and CA are healthy, check:
    • Nodes with low utilization—if these nodes are not being scheduled, see the Nodes with Low Utilization section above.
    • Pending pods that do not trigger a scale up—see the Pending Pods Exist section above.
    • Control plane CA logs—could indicate what is preventing CA from scaling up or down, why it cannot remove a pod, or what the scale-up plan was.
    • CA events on the pod object—could provide clues why CA could not reschedule the pod.
    • Cloud provider resources quota—if there are failed attempts to add nodes, the problem could be resource quota with the public cloud provider.
    • Networking issues—if the cloud provider is managing to create nodes but they are not connecting to the cluster, this could indicate a networking issue.

Cluster Autoscaler Troubleshooting with Komodor

Kubernetes troubleshooting is complex and involves multiple components; you might experience errors that are difficult to diagnose and fix. Without the right tools and expertise in place, the troubleshooting process can become stressful, ineffective and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong – simply because it can – especially across hybrid cloud environments. 

This is where Komodor comes in – Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.

Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.

By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.

If you are interested in checking out Komodor, use this link to sign up for a Free Trial.

See Additional Guides on Key Microservices Topics

Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of microservices.

Application Mapping

Authored by CodeSee

Container Monitoring

Authored by Lumigo

Edge Computing        

Authored by Run.AI
