Pod in Pending State? Top 6 Causes and How to Resolve

What Is a Kubernetes Pod Pending State? 

In Kubernetes, a pod is the basic unit of deployment. A pod moves through several phases, and the “Pending” phase indicates that the pod has been accepted by the cluster but is not yet running on a node: it is still waiting to be scheduled or to have its containers set up. This situation typically arises when the scheduler has not yet found a node with the necessary resources, or no available node meets the pod’s requirements.

During this phase, the pod’s containers have not started, and any issues preventing scheduling need resolution. A pod often enters the pending state due to constraints such as insufficient compute resources, node selection criteria, or unmet storage demands.

This is part of a series of articles about Kubernetes troubleshooting.

Common Causes of Pods Remaining in Pending State 

1. Insufficient Node Resources

A common cause for pods remaining in the pending state is insufficient node resources. Kubernetes requires adequate CPU and memory on nodes to launch new pods. When these resources are depleted, the scheduler cannot place pods, leaving them in the Pending state. It’s critical to monitor resource utilization across the cluster to ensure that nodes can accommodate new workloads as needed.

Resource over-commitment is another factor that leads to this state. If too many pods request more resources than nodes can supply, competition for those resources intensifies. Operators need to manage the distribution of workloads carefully, possibly reconfiguring resources or scaling up cluster capacity to handle the increased demand.
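A quick way to see how much of a node’s capacity is already spoken for is to inspect its allocated resources, and to make sure workloads request only what they actually need. Below is a minimal sketch; the pod name, image, and values are illustrative:

kubectl describe node <node-name> | grep -A 8 "Allocated resources"

apiVersion: v1
kind: Pod
metadata:
  name: web-app                 # illustrative name
spec:
  containers:
  - name: web
    image: nginx:1.25
    resources:
      requests:                 # what the scheduler uses to place the pod
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"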

2. Node Not Ready or Unschedulable

Pods may remain Pending if nodes become unschedulable due to health issues or if administrators intentionally cordon nodes for maintenance. A node marked as “NotReady” cannot host additional workloads, typically because its kubelet is unhealthy, has stopped reporting to the control plane, or the node is under resource pressure. Identifying and resolving the underlying issues is crucial to restoring node functionality and freeing up resources for pod scheduling.

Network disruptions or component failures on nodes can also contribute to this scenario, rendering nodes temporarily unschedulable. Regular node health checks and automated recovery processes help mitigate these risks.
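The following commands can confirm whether a node is unhealthy or has been cordoned, and re-enable scheduling once maintenance is finished:

kubectl get nodes                     # look for NotReady or SchedulingDisabled in the STATUS column
kubectl describe node <node-name>     # review Conditions and recent events for the root cause
kubectl uncordon <node-name>          # allow scheduling again after maintenance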

3. Node Selectors and Affinity Constraints

Pods can specify node selectors or affinity rules to control which nodes they run on. If these criteria are too restrictive, the scheduler may struggle to find an eligible node, extending the time spent in the Pending state. Node selectors use labels to designate nodes, while affinity constraints express more complex relationships, such as co-locating or separating pods.

Misconfigured labels or overly stringent rules may limit the scheduler’s choices, exacerbating resource scarcity and prolonging delays. Evaluating and adjusting these constraints allows for a more flexible deployment strategy.
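As an illustration, the sketch below combines a hard nodeSelector with a soft (preferred) affinity rule; the disktype label and zone value are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: selector-demo
spec:
  nodeSelector:
    disktype: ssd                     # hard requirement: the pod stays Pending if no node has this label
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1                     # soft preference: cannot by itself block scheduling
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-east-1a
  containers:
  - name: app
    image: nginx:1.25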

4. Taints and Tolerations

Another reason pods might remain Pending is the use of taints and tolerations. Taints are applied to nodes to repel pods that should not run there, while compatible pods carry corresponding tolerations that allow them to schedule onto those nodes despite the taint. Mismatched or missing tolerations cause pods to remain Pending, unable to land on any node. Correctly configuring these attributes is crucial for harmonious scheduling.

If taints and tolerations are overused or not aligned with the intended workloads, they can inadvertently restrict node availability. Keeping the set of taints small, documented, and matched by well-defined tolerations balances node isolation with workload schedulability.
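For example, a node can be tainted to accept only batch workloads, and the batch pods then carry a matching toleration. The key and value here are illustrative:

kubectl taint nodes node-1 dedicated=batch:NoSchedule

The pod specification then needs a matching toleration:

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "batch"
  effect: "NoSchedule"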

5. PersistentVolumeClaim Issues

PersistentVolumeClaims (PVCs) are used by pods to request persistent storage. When the requested volumes cannot be provisioned or bound, pods tend to remain in the Pending state. This can arise from storage class misconfigurations, unbound PersistentVolumes (PVs), or temporarily unavailable storage backends, all of which directly impact the pod’s ability to schedule.

Addressing PVC issues requires checking the existence and status of associated PVs, ensuring they are bound to the PVCs correctly. Verifying storage class parameters can also identify discrepancies that prevent successful bindings.
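These commands help trace a binding problem from the claim down to the storage class:

kubectl get pvc                      # claims stuck in Pending indicate a binding problem
kubectl describe pvc <pvc-name>      # the Events section explains why binding failed
kubectl get pv                       # check whether a matching PersistentVolume is Available
kubectl get storageclass             # verify the class referenced by the claim exists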

6. Image Pull Errors

Image pull errors can keep a pod from transitioning out of the Pending phase, because its containers cannot start until their images are pulled successfully. Such errors often stem from incorrect image names or tags, or authentication issues with private registries. Additionally, network issues can disrupt access to external container registries.

Diagnosing these errors starts with verifying image details in the pod specification and credentials set up for accessing the registry. Network connectivity checks can identify infrastructure-induced delays in image downloads.
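For private registries, a common fix is to create a registry secret and reference it from the pod. In this sketch, the secret name and registry details are placeholders:

kubectl create secret docker-registry regcred \
  --docker-server=<registry> \
  --docker-username=<user> \
  --docker-password=<password>

The pod specification then references the secret:

spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: app
    image: <registry>/my-app:1.0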

Related content: Read our guide to pod status

Tips from the expert

Itiel Shwartz

Co-Founder & CTO

Itiel is the CTO and co-founder of Komodor. He’s a big believer in dev empowerment and moving fast, and has worked at eBay, Forter, and Rookout (as the founding engineer). Itiel is a backend and infra developer turned “DevOps”, and an avid public speaker who loves talking about cloud infrastructure, Kubernetes, Python, observability, and R&D culture.

In my experience, here are tips that can help you troubleshoot and prevent Kubernetes pods from remaining in the Pending state:

Enable resource reservations for system components:

Reserve CPU and memory for critical node components like the kubelet and the container runtime using the --system-reserved or --kube-reserved flags. This prevents resource starvation that might delay pod scheduling.
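A minimal KubeletConfiguration sketch with such reservations (the values are illustrative and should be sized to the node):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:        # resources set aside for OS daemons
  cpu: "500m"
  memory: "512Mi"
kubeReserved:          # resources set aside for Kubernetes node components
  cpu: "500m"
  memory: "1Gi"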

Use LimitRange to set resource defaults:

Define default resource requests and limits using LimitRange in namespaces to ensure that pods always specify appropriate resource allocations. This helps prevent overcommitment or under-requesting of resources.
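A minimal LimitRange sketch that gives containers default requests and limits in a namespace (names and values are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: dev
spec:
  limits:
  - type: Container
    defaultRequest:               # applied when a container omits requests
      cpu: "100m"
      memory: "128Mi"
    default:                      # applied when a container omits limits
      cpu: "500m"
      memory: "512Mi"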

Implement overprovisioning nodes:

Deploy a small number of low-priority pods simulating load on nodes to ensure there is always some buffer capacity for higher-priority pods when needed. This approach works well in clusters with variable workloads.
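One common pattern, shown here as a sketch, is a negative-priority class plus a small Deployment of pause pods that hold buffer capacity and are preempted as soon as real workloads need the room (names, replica count, and sizes are illustrative):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                         # lower than any real workload, so these pods are evicted first
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-buffer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: capacity-buffer
  template:
    metadata:
      labels:
        app: capacity-buffer
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "200m"            # size of the buffer each placeholder pod holds
            memory: "256Mi"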

Audit PVC binding modes:

Use the correct volumeBindingMode in the storage class (Immediate or WaitForFirstConsumer) to optimize how PersistentVolumeClaims are bound to PersistentVolumes, avoiding unnecessary delays in pod scheduling.
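A StorageClass sketch using WaitForFirstConsumer, which delays volume binding until a pod that uses the claim is scheduled (the provisioner here is only an example and depends on your platform):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-wffc
provisioner: kubernetes.io/aws-ebs      # example provisioner; use the one for your environment
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete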

Optimize cluster autoscaler configurations:

Tune the cluster autoscaler to handle specific workload patterns by configuring features like scale-down thresholds and custom resource limits, ensuring quick reactions to pending pods caused by resource shortages.
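As a rough sketch, these are the kinds of flags typically tuned on the cluster-autoscaler container; exact flag names and defaults can vary by version, so check the documentation for your release:

containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0   # illustrative version
  command:
  - ./cluster-autoscaler
  - --scale-down-unneeded-time=5m             # how long a node must be unneeded before scale-down
  - --scale-down-utilization-threshold=0.6    # utilization below which a node is a scale-down candidate
  - --max-nodes-total=50                      # hard cap on cluster size
  - --expander=least-waste                    # strategy for choosing which node group to scale up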

Troubleshooting Pods Stuck in Pending State 

When pods remain in the pending state, identifying and addressing the root cause is crucial. Kubernetes offers various tools and methods to diagnose and resolve issues. Below are common troubleshooting steps with explanations and code examples.

1. Describe the Pod

The kubectl describe pod command provides detailed information about the Pod, including its status, conditions, and recent events.

Command:

kubectl describe pod <pod-name>

Example output (Events section):

Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  30s (x2 over 1m)  default-scheduler  0/5 nodes are available: 2 Insufficient CPU, 3 Insufficient memory.

The “Events” section in the output often indicates why the pod is not being scheduled. For instance, “FailedScheduling” could highlight resource constraints or node-related issues. Use this information to adjust resource requests or evaluate node availability.

2. Check Events Related to the Pod

Developers can list all events in the cluster and filter them to find those associated with the pod. This can provide additional context, especially if certain events are not captured in kubectl describe pod.

Command:

kubectl get events | grep <pod-name>

This approach supplements the describe output, helping to uncover environment-wide factors affecting pod scheduling, such as delays or conflicts in resource allocation.
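A field selector avoids relying on grep and can be combined with sorting by timestamp:

kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'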

3. Analyze Scheduler Logs

The Kubernetes scheduler logs provide a detailed view of scheduling operations, offering insights into why pods remain pending. This is particularly helpful for debugging complex scheduling scenarios.

Command:

kubectl -n kube-system logs $(kubectl -n kube-system get pods | grep scheduler | awk '{print $1}')

By reviewing scheduler logs, teams can pinpoint scheduling errors or constraints, such as affinity conflicts or node taints. This is an advanced troubleshooting step when basic methods fail to identify the issue.

4. Use Pod Priority or Daemonsets

Pod priority and preemption

Kubernetes makes it possible to assign higher priorities to critical pods so that, when resources are scarce, the scheduler can preempt less important workloads to make room. This reduces the chance of critical pods remaining Pending.

Example:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
globalDefault: false
description: "This priority class is used for critical workloads."

Pods with this priority class can displace lower-priority pods if resources are scarce.
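A pod (or pod template) opts into the class by referencing it by name, for example:

spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: my-critical-service:latest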

DaemonSets

For system-critical pods, a DaemonSet ensures that a copy of the pod runs on every node (or every node matching its selector), and DaemonSet pods automatically receive tolerations for common node conditions.

Example:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: critical-service
spec:
  selector:
    matchLabels:
      app: critical-service
  template:
    metadata:
      labels:
        app: critical-service
    spec:
      containers:
      - name: critical-service-container
        image: my-critical-service:latest

5. Check Node Status and Capacity

Inspecting the status and capacity of nodes is essential for identifying resource constraints that could prevent pods from being scheduled. Use the following commands to analyze node conditions, resource availability, and allocations.

Command: Check Node Status

kubectl get nodes

Example output:

NAME     STATUS     ROLES    AGE   VERSION
node-1   Ready      worker   15d   v1.26.0
node-2   Ready      worker   10d   v1.26.0
node-3   NotReady   worker   12d   v1.26.0

The STATUS column shows the readiness of nodes. Nodes marked as NotReady are unavailable for scheduling. Investigate these nodes by examining their conditions.

Command: Describe a Node

kubectl describe node <node-name>

Example output (truncated):

Conditions:
  Type             Status   LastHeartbeatTime      Reason
  ----             ------   -----------------      ------
  MemoryPressure   False    2024-12-30T10:32:45Z   KubeletHasSufficientMemory
  DiskPressure     False    2024-12-30T10:32:45Z   KubeletHasNoDiskPressure
  PIDPressure      False    2024-12-30T10:32:45Z   KubeletHasSufficientPID
  Ready            True     2024-12-30T10:32:45Z   KubeletReady

Allocatable:
  cpu:     4
  memory:  16Gi
  pods:    110

The Conditions section provides insights into issues like memory, disk, or PID pressure. Allocatable shows the resources available for scheduling.

Command: View Node Resource Usage

To evaluate resource usage and detect overcommitment, check node metrics (requires Metrics Server):

kubectl top nodes

Example output:

NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1   1200m        30%    8Gi             50%
node-2   900m         22%    6Gi             37%
node-3   -            -      -               -

The percentage of resource usage indicates how heavily loaded a node is. Nodes nearing 100% utilization might not accommodate new pods.

Command: List Node Taints

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Example output:

NAME     TAINTS
node-1   <none>
node-2   key=value:NoSchedule
node-3   key=value:NoExecute

Taints applied to nodes can restrict scheduling. Pods require matching tolerations to be scheduled on such nodes.

Addressing node issues

  • Resolve NotReady status: Investigate the node’s health by checking system logs, verifying connectivity, or restarting the kubelet.
  • Scale resources: Add nodes to the cluster or resize existing nodes to provide additional capacity.
  • Modify taints and tolerations: Adjust taints or configure pod tolerations to align with workload requirements.
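Typical commands for these remediations are shown below; the taint key and value are illustrative, and the kubelet restart runs on the affected node itself:

kubectl uncordon <node-name>                              # re-enable scheduling on a cordoned node
kubectl taint nodes <node-name> key=value:NoSchedule-     # the trailing "-" removes the taint
sudo systemctl restart kubelet                            # run on the node if the kubelet is unhealthy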

Kubernetes Troubleshooting with Komodor

Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.

Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. 

By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.

If you are interested in checking out Komodor, use this link to sign up for a Free Trial.