Kubernetes CPU Limits: What’s the Right Way to Assign CPU Resources?

What Is the Kubernetes CPU Limit Used For?

Kubernetes CPU limits define the maximum CPU resources a container is allowed to use on the host machine. When you create a template for a pod, you can optionally specify how much of each resource every container is allowed to use on a Kubernetes node. The most common resources are CPU and memory (RAM), but you can also specify others.

You can specify a resource request which indicates the minimal resources needed for containers in a pod—the kube-scheduler uses this information to decide which node to schedule the pod on and reserves at least the requested amount of the resource specifically for that container to use. When you specify a resource limit for a container, the kubelet enforces this limit, making sure that the running container does not use more than the resources specified.

However, setting CPU limits in Kubernetes is not always a good idea. It can lead to inefficiencies such as underutilized resources, as containers may be restricted even when additional CPU capacity is available on the node. This can result in throttling, where applications are slowed down unnecessarily, impacting performance. Limits also add complexity to resource management, requiring continuous tuning to avoid performance issues or resource contention. 

As an alternative, consider setting CPU requests to guarantee minimum resources for workloads, while avoiding strict limits unless absolutely necessary, thereby balancing resource utilization and performance.

This is part of a series of articles about Kubernetes Management.

How CPU Requests and Limits Work

Each node in a Kubernetes cluster is allocated memory (RAM) and compute power (CPU) that can be used to run containers. A Kubernetes cluster defines a logical grouping of one or more containers into pods. You can then deploy and manage pods on top of your nodes.

When you create a pod, you typically specify the storage and networking that containers share within that pod. The Kubernetes scheduler finds a node that has the required resources to run the pod.

You can provide more information for the scheduler using two parameters that specify RAM and CPU utilization:

  • Request: sets the minimum amount of RAM or CPU required for the container. Kubernetes aggregates all container requests into a single pod request. The scheduler uses this pod request to ensure that pods are deployed to nodes with sufficient resources.
  • Limit: sets the maximum amount of RAM or CPU the container is allowed to use. Kubernetes enforces the limit through the container runtime, such as containerd or Docker. When a container exceeds its memory limit, it is typically killed and then restarted by the kubelet. CPU limits are handled differently: a container that tries to use more CPU than its limit is throttled rather than killed.
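As a rough illustration of how container requests roll up into a single pod request, the sketch below parses Kubernetes CPU quantities (such as "500m" or "0.5") into millicores and sums them. The function names and sample values are invented for this example. It is not the scheduler's actual code, and it ignores details such as init containers and pod overhead.

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m", "0.5" or "2" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # "500m" already means 500 millicores.
        return int(quantity[:-1])
    # Plain numbers are whole or fractional CPUs; Decimal avoids float rounding.
    return int(Decimal(quantity) * 1000)


def pod_cpu_request(container_requests):
    """Sum the CPU requests of all containers in a pod (simplified model)."""
    return sum(cpu_to_millicores(q) for q in container_requests)


# Hypothetical pod with three containers.
example_pod = ["500m", "0.25", "1"]
print(pod_cpu_request(example_pod))  # 1750
```

In this simplified model, the scheduler would look for a node with at least 1750 millicores of unreserved CPU.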

What Is CPU Throttling?

CPU throttling occurs when a container exceeds its specified CPU limit, and Kubernetes restricts the container’s CPU access to prevent it from using more than the allocated resources. This mechanism uses cgroups, which enforce CPU usage boundaries by controlling how much CPU time a container can consume.

Instead of abruptly stopping a container, throttling slows down its execution by limiting the number of CPU cycles it receives. For instance, if a container’s limit is set to 500 millicores (0.5 CPUs), it can only use half of a CPU’s time. If the container tries to use more than this allocation, the Linux kernel will delay its tasks, causing a slowdown in processing speed.
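To make the arithmetic concrete: on Linux, a CPU limit is typically enforced through the CFS bandwidth controller, which grants the container a quota of CPU time per scheduling period (100 ms by default). The sketch below is a simplified model of that bookkeeping, not kernel or kubelet code, and the function names are illustrative.

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms, in microseconds


def cfs_quota_us(limit_millicores, period_us=CFS_PERIOD_US):
    """CPU time (in microseconds) the container may use in each period."""
    return limit_millicores * period_us // 1000


def unmet_demand_us(demand_millicores, limit_millicores, period_us=CFS_PERIOD_US):
    """CPU time per period the container wanted but was not granted."""
    wanted = demand_millicores * period_us // 1000
    return max(0, wanted - cfs_quota_us(limit_millicores, period_us))


# A 500m limit allows 50 ms of CPU time in every 100 ms period.
print(cfs_quota_us(500))  # 50000

# A container that wants 800m under a 500m limit falls 30 ms short each period.
print(unmet_demand_us(800, 500))  # 30000
```

Once the quota for a period is used up, the container's threads wait until the next period begins, which is the slowdown described above.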

Kubernetes CPU Limits vs. Kubernetes CPU Requests

CPU requests define the minimum amount of CPU resources a container needs to run. Kubernetes uses this information during scheduling to ensure that a pod is placed on a node with sufficient available CPU resources. Once scheduled, the CPU requested by the pod is reserved for it, guaranteeing that the container will always have at least this much CPU available. This prevents resource starvation for critical workloads and ensures predictable performance.

CPU limits specify the maximum CPU resources a container is allowed to use. Unlike requests, limits do not reserve resources upfront; instead, they act as a ceiling to prevent a container from consuming excessive CPU, which could impact other workloads. If a container exceeds its CPU limit, it will experience throttling, where its CPU usage is slowed to stay within the defined boundary.

There are two important differences between requests and limits:

  • Impact on scheduling: Requests influence pod placement on nodes since the scheduler uses them to calculate resource availability. Limits do not affect scheduling decisions but apply when the container is running.
  • Behavior during resource contention: If CPU usage exceeds requests but remains below limits, the container can use spare CPU capacity on the node. If usage exceeds the limit, the container is throttled, potentially reducing performance.
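One way to picture behavior under contention is that, when no limits apply and every container wants more CPU than is available, CPU time is divided roughly in proportion to the containers' requests. The sketch below models only that proportional split. The container names and numbers are hypothetical, and the real kernel scheduler is considerably more nuanced.

```python
def share_under_contention(requests_millicores, node_millicores):
    """Split node CPU in proportion to requests (assumes all containers are busy)."""
    total = sum(requests_millicores.values())
    if total <= 0:
        raise ValueError("at least one container must request CPU")
    return {
        name: node_millicores * request // total
        for name, request in requests_millicores.items()
    }


# Two hypothetical containers competing for a 4-CPU node.
print(share_under_contention({"api": 500, "batch": 1500}, 4000))
# {'api': 1000, 'batch': 3000}
```

Both containers receive more than they requested because spare capacity exists, but the container with the larger request gets the larger portion.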

What Can Go Wrong If You Don’t Specify the CPU Limit in Kubernetes?

If you do not specify a CPU limit, the container can use all the CPU resources available on the node. A container with high CPU utilization can then consume all available CPU and slow down other containers on the same node, and may even cause Kubernetes components such as the kubelet to become unresponsive. If that happens, the node enters a NotReady state and its pods are rescheduled on other nodes.

By setting limits on all containers, you can avoid most of the following problems:

  • Out of Memory (OOM) issues—can cause a node to go down, affecting the stability of the cluster. For example, applications with memory leaks can cause OOM problems. However, memory limits on containers can prevent memory leaks within a container from affecting the node.
  • CPU starvation—applications that are too CPU-intensive can affect all applications on the same node. Other applications can slow down or become unresponsive.
  • Pod eviction—when a node runs out of resources, the node initiates an eviction process that terminates pods. The first pods evicted are those that have no resource requests.
  • Financial waste—if there is no need for resource requests or limits, and there are no errors, this probably means you have over-provisioned the cluster and are overpaying for hardware resources.

The Disadvantages of Using CPU Limits

Applying limits on CPU also has several potential drawbacks. In fact, some have suggested that CPU limits are an antipattern. Here are a few reasons:

Resource Inefficiency

Implementing CPU limits within a Kubernetes environment can often lead to resource underutilization. This occurs when containers are restricted by CPU limits that are set lower than the potential peak usage they might achieve under optimal conditions. As a result, even if additional CPU cycles are available on the node, they remain unused.

Complexity in Resource Management

Setting and managing CPU limits introduces additional complexity into the resource management strategy of a Kubernetes cluster. Administrators must use meticulous planning to define appropriate CPU limits that reflect the needs of each container while avoiding resource contention. This balancing act requires continuous monitoring and adjustment of CPU limits.

Starvation

CPU starvation occurs when the limits set are too restrictive, causing processes to receive insufficient CPU time for their execution needs. This is particularly problematic for compute-intensive applications like AI, big data processing, or real-time applications.

Potential Service Disruption

Strict CPU limits can also lead to potential disruptions in service delivery. When containers reach their CPU capacity, they are throttled, meaning they are temporarily restricted from using CPU resources beyond their set limit. This throttling can increase the time it takes for the container to complete its tasks, leading to bottlenecks and decreasing application performance.

When to Set CPU Limits

CPU limits are sometimes useful, but it’s important to understand the context and determine if they are the best solution for your needs. Key considerations include:

Benchmarking

Benchmarking involves executing the application under various operational scenarios to measure the actual CPU usage across different states of application load. This data provides a baseline that helps in setting CPU limits that are neither excessively high, which would lead to wasted resources, nor too low, which might trigger CPU throttling. 

Multi-Tenant Environments

In environments where Kubernetes hosts multiple tenants — different teams or applications sharing the same cluster resources — CPU limits prevent any single tenant from consuming disproportionate CPU resources. Such limits ensure that all tenants have equitable access to CPU resources, preventing one application’s excessive consumption from degrading the others.

Predictability

CPU limits enhance the predictability of application performance by ensuring a stable allocation of CPU resources. This stability is crucial for applications that need to guarantee a specific level of performance or for those operating under stringent service level agreements (SLAs). By defining clear CPU boundaries, administrators can better manage the behavior of applications.


Tips from the expert

Itiel Shwartz

Co-Founder & CTO

Itiel is the CTO and co-founder of Komodor. He’s a big believer in dev empowerment and moving fast, has worked at eBay, Forter and Rookout (as the founding engineer). Itiel is a backend and infra developer turned “DevOps”, an avid public speaker that loves talking about things such as cloud infrastructure, Kubernetes, Python, observability, and R&D culture.

In my experience, here are tips that can help you better manage Kubernetes CPU limits and throttling:

Monitor CPU utilization

Use monitoring tools to keep track of CPU usage and identify throttling instances.

Set realistic CPU limits

Configure CPU limits based on the actual needs of your applications to avoid unnecessary throttling.

Use resource requests

Define resource requests to ensure your pods get the necessary CPU resources.

Analyze performance metrics

Regularly review performance metrics to adjust CPU limits and requests appropriately.

Optimize application code

Ensure your application code is efficient and not excessively consuming CPU.

Preventing Errors by Detecting Containers Without CPU Limits

If you use CPU limits, it’s important to identify containers without any limits set. Here is how to do it.

Finding containers without CPU limits by namespace

Use this query to count containers without CPU limits, grouped by namespace.

sum by (namespace)
(count by (namespace,pod,container)(kube_pod_container_info{container!=""})
unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))

Finding containers with tight CPU limits

This technique aims to avoid CPU throttling by identifying containers that have CPU limits close to their actual utilization.

Use this query to find containers with CPU utilization close to the limit:

(sum by
(namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) /
sum by(namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"})) > 0.8
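The same check can be expressed outside Prometheus. The sketch below applies the 0.8 threshold to per-container usage and limit values, both in CPU cores. The sample data and the function name are hypothetical. In practice, the values would come from your monitoring system.

```python
def near_limit(usage_cores, limit_cores, threshold=0.8):
    """Return containers whose usage-to-limit ratio exceeds the threshold."""
    flagged = {}
    for key, limit in limit_cores.items():
        usage = usage_cores.get(key)
        if usage is None or limit <= 0:
            continue  # no usage sample, or no meaningful limit
        ratio = usage / limit
        if ratio > threshold:
            flagged[key] = round(ratio, 3)
    return flagged


# Keys are (namespace, pod, container); values are made up for illustration.
usage = {("shop", "cart-0", "app"): 0.45, ("shop", "web-0", "nginx"): 0.10}
limits = {("shop", "cart-0", "app"): 0.5, ("shop", "web-0", "nginx"): 1.0}
print(near_limit(usage, limits))  # {('shop', 'cart-0', 'app'): 0.9}
```

Containers flagged this way are candidates for a higher limit or for a closer look at why their usage is so high.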

Checking if the cluster has enough capacity

Kubernetes makes sure that pods are only scheduled on a node if that node has enough resources for the aggregate requests of all the pod’s containers. This also means that the node commits to each container the CPU and memory resources specified in its resource request.

Consider a Kubernetes cluster where the sum of all resource limits is greater than the resources available in the cluster. This is known as “overcommitting”. When the cluster is overcommitted, pods might work well in normal circumstances, but under high load, containers can start using resources up to their limits. This can cause certain pods to be evicted, and in extreme cases, nodes can fail due to resource starvation in the cluster.

To check for CPU overcommits in the cluster, use the following query:

100 * sum(kube_pod_container_resource_limits{container!="",resource="cpu"}) /
sum(kube_node_status_capacity{resource="cpu"})
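The query's arithmetic can be sketched in a few lines. The example below computes the same percentage from a list of container CPU limits and a list of node capacities, both in cores. The numbers are invented for illustration, and the function name is not part of any Kubernetes tooling.

```python
def cpu_overcommit_percent(limit_cores, node_capacity_cores):
    """Total CPU limits as a percentage of total node CPU capacity."""
    capacity = sum(node_capacity_cores)
    if capacity <= 0:
        raise ValueError("cluster has no CPU capacity")
    return 100 * sum(limit_cores) / capacity


# Hypothetical cluster: two 4-core nodes, five containers with CPU limits.
limits = [1.0, 2.0, 0.5, 4.0, 2.5]
nodes = [4, 4]
print(cpu_overcommit_percent(limits, nodes))  # 125.0
```

A result above 100 means the limits add up to more CPU than the cluster has, so not every container can run at its limit at the same time.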

Quick Tutorial: How to Assign CPU Resources to Containers and Pods

This is based on an example from the official Kubernetes documentation.

Step 1: Create a separate namespace
First, we’ll create a separate Namespace so that resources created in the tutorial are isolated from the rest of your cluster.

kubectl create namespace cpu-example

Step 2: Create a pod with one container and a resource request
Here is a pod template with one container. The container has a resources:requests field that specifies a request of 0.5 CPU and a resources:limits field that specifies a limit of 1 CPU.

The args section in the template below provides arguments to the container when it starts. Here, the -cpus "2" argument tells the container to attempt to use 2 CPUs, which is more than its limit allows.

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: cpu-example
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: "1"
      requests:
        cpu: "0.5"
    args:
    - -cpus
    - "2"

Step 3: Create the pod
Create the pod in your namespace using this command:

kubectl apply -f https://k8s.io/examples/pods/resource/cpu-request-limit.yaml --namespace=cpu-example

Step 4: View pod requests and limits
Run this command:

kubectl get pod cpu-demo --output=yaml --namespace=cpu-example

The output shows that the pod running in the cluster has a request of 0.5 CPU and a limit of 1 CPU.

resources:
  limits:
    cpu: "1"
  requests:
    cpu: 500m

Run this command to get actual runtime metrics for the pod:

kubectl top pod cpu-demo --namespace=cpu-example

The output will look something like this. The example below shows that the pod is actually using 0.974 of the CPU, which is slightly less than the limit. In this example, the application on the container is throttled by Kubernetes because we configured it to use 2 CPUs, but its limit allows it to use only one.

NAME       CPU(cores)   MEMORY(bytes)
cpu-demo   974m         [something]

Solving Kubernetes Node Errors with Komodor

Troubleshooting Kubernetes CPU issues requires visibility into Kubernetes cluster nodes, and the ability to correlate node status with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production.

Komodor can help with its ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. With this view you can rapidly:

  • See service-to-node associations
  • Correlate service and node health issues
  • Gain visibility over node capacity allocations, restrictions, and limitations
  • Identify “noisy neighbors” that use up cluster resources
  • Keep track of changes in managed clusters
  • Get fast access to historical node-level event data

Beyond node error remediations, Komodor can help troubleshoot a variety of Kubernetes errors and issues. As the leading Continuous Kubernetes Reliability Platform, Komodor is designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.

Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.

By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.

Related content: Read our guide to Kubernetes RBAC

If you are interested in checking out Komodor, use this link to sign up for a Free Trial.