In Kubernetes, a pod is the basic unit of deployment. A pod can have several states—the “Pending” state indicates that the pod is not yet running on a node, meaning it awaits assignment for execution. This situation typically arises when the scheduler has yet to allocate the necessary resources or lacks available nodes meeting the pod’s requirements.
During this phase, the pod’s containers have not started, and any issues preventing scheduling need resolution. A pod often enters the pending state due to constraints such as insufficient compute resources, node selection criteria, or unmet storage demands.
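A pending pod is easy to spot in the kubectl get pods listing; the pod name below is illustrative:

kubectl get pods

NAME                         READY   STATUS    RESTARTS   AGE
web-frontend-7d4b9c4-xkzqp   0/1     Pending   0          5m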
This is part of a series of articles about Kubernetes troubleshooting
A common cause for pods remaining in the pending state is insufficient node resources. Kubernetes requires adequate CPU and memory on nodes to launch new pods. When these resources are depleted, the scheduler cannot place pods, leaving them in pending status. It’s critical to monitor the resource utilization across the cluster to ensure that nodes can accommodate new workloads as needed.
Resource over-commitment can also lead to this state. If too many pods request more resources than nodes can supply, competition for these resources intensifies. Operators need to manage the distribution of workloads carefully, possibly reconfiguring resources or scaling up cluster capacity to handle the increased demand.
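As a minimal illustration, a pod whose resource requests exceed what any node can allocate stays pending; the values below are deliberately oversized and hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: resource-hungry-pod
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:
        cpu: "8"          # stays Pending if no node has 8 free CPUs
        memory: 32Gi      # stays Pending if no node has 32Gi free memory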
Pods may remain pending if nodes become unschedulable due to health issues or if administrators intentionally cordon nodes for maintenance. A node marked as “NotReady” cannot host additional workloads, typically because its kubelet is unhealthy or unreachable, or because the node is reporting conditions such as disk or memory pressure. Identifying and resolving the underlying issues is crucial to restoring node functionality and freeing up resources for pod scheduling.
Network disruptions or component failures on nodes can also contribute to this scenario, rendering nodes temporarily unschedulable. Regular node health checks and automated recovery processes help mitigate these risks.
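For example, a node that was cordoned for maintenance appears with a SchedulingDisabled status and can be returned to service with kubectl uncordon; the node name is illustrative:

kubectl get nodes

NAME     STATUS                     ROLES    AGE   VERSION
node-2   Ready,SchedulingDisabled   worker   10d   v1.26.0

kubectl uncordon node-2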
Pods can specify node selectors or affinity rules to control on which nodes they run. If these criteria are too restrictive, the scheduler may struggle to find an eligible node, extending the pending state duration. Node selectors use labels to designate nodes while affinity constraints dictate more complex relationships like co-locating or separating pods.
Misconfigured labels or overly stringent rules may limit the scheduler’s choices, exacerbating resource scarcity and prolonging delays. Evaluating and adjusting these constraints allows for a more flexible deployment strategy.
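As a sketch, a nodeSelector that references a label no node carries keeps the pod pending indefinitely; the label and image below are hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  nodeSelector:
    gpu: "true"           # Pending if no node is labeled gpu=true
  containers:
  - name: app
    image: my-gpu-app:1.0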
Another reason pods might remain pending is the use of taints and tolerations. Taints are applied to nodes to repel unsuitable pods, while compatible pods bear corresponding tolerations to override these effects. Mismatched or missing tolerations cause pods to remain pending, unable to land on any nodes. Correctly configuring these attributes is crucial for harmonious scheduling.
If taints and tolerations are overused or not properly aligned with the intended workloads, they can inadvertently restrict node availability. A consistent, well-documented approach to applying these configurations keeps node restrictions balanced against workload requirements.
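For example, after tainting a node as below, only pods that carry a matching toleration can be scheduled on it; the key and value are illustrative:

kubectl taint nodes node-1 dedicated=database:NoSchedule

# Pod spec fragment with the matching toleration
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "database"
  effect: "NoSchedule"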
PersistentVolumeClaims (PVCs) are used by pods to request persistent storage. When the requested volumes are unavailable or improperly configured, pods remain in the pending state. This can arise from storage class misconfigurations, unbound PersistentVolumes (PVs), or temporarily unavailable storage backends, all of which directly prevent the pod from being scheduled.
Addressing PVC issues requires checking the existence and status of associated PVs, ensuring they are bound to the PVCs correctly. Verifying storage class parameters can also identify discrepancies that prevent successful bindings.
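A quick check is to list the claim and the available volumes; a claim stuck in Pending with a missing or misnamed storage class is a common culprit (names below are illustrative):

kubectl get pvc

NAME       STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-pvc   Pending                                      fast-ssd       10m

kubectl get pv
kubectl describe pvc data-pvc   # events explain why binding failed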
Image pull errors can impede a pod from transitioning out of the pending state as they prevent container images from loading successfully. Such errors often stem from incorrect image names, tag specifications, or authentication issues with private registries. Additionally, network issues can disrupt access to external container registries.
Diagnosing these errors starts with verifying image details in the pod specification and credentials set up for accessing the registry. Network connectivity checks can identify infrastructure-induced delays in image downloads.
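For a private registry, a typical fix is to create an image pull secret and reference it from the pod; the registry, credentials, and secret name below are placeholders:

kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password>

# Pod spec fragment referencing the secret
imagePullSecrets:
- name: regcred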
Related content: Read our guide to pod status
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you troubleshoot and prevent Kubernetes pods from remaining in the Pending state:
Reserve CPU and memory for system daemons such as the kubelet and the container runtime using the --system-reserved or --kube-reserved kubelet flags. This prevents resource starvation that might delay pod scheduling.
Define default resource requests and limits using LimitRange in namespaces to ensure that pods always specify appropriate resource allocations. This helps prevent overcommitment or under-requesting of resources; a minimal example follows this list.
Deploy a small number of low-priority pods simulating load on nodes to ensure there is always some buffer capacity for higher-priority pods when needed. This approach works well in clusters with variable workloads.
Use the correct volumeBindingMode in the storage class (Immediate or WaitForFirstConsumer) to optimize how PersistentVolumeClaims are bound to PersistentVolumes, avoiding unnecessary delays in pod scheduling.
Tune the cluster autoscaler to handle specific workload patterns by configuring features like scale-down thresholds and custom resource limits, ensuring quick reactions to pending pods caused by resource shortages.
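As a minimal sketch of the LimitRange tip above, the following object applies default requests and limits to any container in the namespace that omits them; the values and namespace are illustrative:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: my-namespace
spec:
  limits:
  - type: Container
    defaultRequest:       # used when a container omits resource requests
      cpu: 100m
      memory: 128Mi
    default:              # used when a container omits resource limits
      cpu: 500m
      memory: 512Mi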
When pods remain in the pending state, identifying and addressing the root cause is crucial. Kubernetes offers various tools and methods to diagnose and resolve issues. Below are common troubleshooting steps with explanations and code examples.
The kubectl describe pod command provides detailed information about the Pod, including its status, conditions, and recent events.
Command:
kubectl describe pod <pod-name>
Example output (Events section):
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  30s (x2 over 1m)  default-scheduler  0/5 nodes are available: 2 Insufficient CPU, 3 Insufficient memory.
The “Events” section in the output often indicates why the pod is not being scheduled. For instance, “FailedScheduling” could highlight resource constraints or node-related issues. Use this information to adjust resource requests or evaluate node availability.
Developers can list all events in the cluster and filter them to find those associated with the pod. This can provide additional context, especially if certain events are not captured in kubectl describe pod.
kubectl get events | grep <pod-name>
This approach supplements the describe output, helping to uncover environment-wide factors affecting pod scheduling, such as delays or conflicts in resource allocation.
The Kubernetes scheduler logs provide a detailed view of scheduling operations, offering insights into why pods remain pending. This is particularly helpful for debugging complex scheduling scenarios.
kubectl -n kube-system logs $(kubectl -n kube-system get pods | grep scheduler | awk '{print $1}')
By reviewing scheduler logs, teams can pinpoint scheduling errors or constraints, such as affinity conflicts or node taints. This is an advanced troubleshooting step when basic methods fail to identify the issue.
Pod priority and preemption
Kubernetes makes it possible to assign higher priorities to critical pods to preempt less important workloads. This can prevent pending pod issues.
Example:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
globalDefault: false
description: "This priority class is used for critical workloads."
Pods with this priority class can displace lower-priority pods if resources are scarce.
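To take effect, the class must be referenced by name in the pod spec; the pod name and image below are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: critical-job
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: my-critical-app:latest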
DaemonSets
For system-critical pods, a DaemonSet ensures a copy runs on every eligible node, and DaemonSet pods tolerate several node conditions that would block ordinary pods.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: critical-service
spec:
  selector:
    matchLabels:
      app: critical-service
  template:
    metadata:
      labels:
        app: critical-service
    spec:
      containers:
      - name: critical-service-container
        image: my-critical-service:latest
Inspecting the status and capacity of nodes is essential for identifying resource constraints that could prevent pods from being scheduled. Use the following commands to analyze node conditions, resource availability, and allocations.
kubectl get nodes
Example output:
NAME     STATUS     ROLES    AGE   VERSION
node-1   Ready      worker   15d   v1.26.0
node-2   Ready      worker   10d   v1.26.0
node-3   NotReady   worker   12d   v1.26.0
The STATUS column shows the readiness of nodes. Nodes marked as NotReady are unavailable for scheduling. Investigate these nodes by examining their conditions.
kubectl describe node <node-name>
Example output (truncated):
Conditions:
  Type            Status  LastHeartbeatTime      Reason
  ----            ------  -----------------      ------
  MemoryPressure  False   2024-12-30T10:32:45Z   KubeletHasSufficientMemory
  DiskPressure    False   2024-12-30T10:32:45Z   KubeletHasNoDiskPressure
  PIDPressure     False   2024-12-30T10:32:45Z   KubeletHasSufficientPID
  Ready           True    2024-12-30T10:32:45Z   KubeletReady
Allocatable:
  cpu:     4
  memory:  16Gi
  pods:    110
The Conditions section provides insights into issues like memory, disk, or CPU pressure. Allocatable shows the resources available for scheduling.
To evaluate resource usage and detect overcommitment, check node metrics (requires Metrics Server):
kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1   1200m        30%    8Gi             50%
node-2   900m         22%    6Gi             37%
node-3   -            -      -               -
The percentage of resource usage indicates how heavily loaded a node is. Nodes nearing 100% utilization might not accommodate new pods.
Command: List Node Taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
NAME     TAINTS
node-1   <none>
node-2   key=value:NoSchedule
node-3   key=value:NoExecute
Taints applied to nodes can restrict scheduling. Pods require matching tolerations to be scheduled on such nodes.
Addressing node issues
Once the constraint is identified, remediation typically means uncordoning healthy nodes, resolving NotReady conditions, removing or tolerating taints, or adding capacity so that the scheduler can place the pending pods.
Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.
Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance.
By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.