With 67% of organizations delaying deployments and misconfigurations affecting 40% of Kubernetes environments, understanding and resolving CrashLoopBackOff errors is key to maintaining production reliability and avoiding the costly business impacts that affect 90% of organizations running Kubernetes workloads.
When a Kubernetes container repeatedly fails to start, it enters a ‘CrashLoopBackOff’ state, indicating a persistent restart loop within a pod. This error often occurs due to various issues preventing the container from launching properly.
Common causes include insufficient memory or resource overload, deployment errors, third-party service issues like DNS errors, missing dependencies, or container failures due to port conflicts.
To confirm this error, execute ‘kubectl get pods’ and verify if the pod status is ‘CrashLoopBackOff’. Addressing the root cause, whether by adjusting resources, resolving dependencies, or updating configurations, is essential for resolving the error and stabilizing the pod.
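As a minimal sketch of that check, the shell function below narrows kubectl get pods output down to the pods whose STATUS column reads CrashLoopBackOff. The function name crashloop_pods is hypothetical, and the sketch assumes the default single-namespace column layout (NAME, READY, STATUS, ...), so it would need adjusting for --all-namespaces output.

```shell
# Print the names of pods whose STATUS column is CrashLoopBackOff.
# Reads `kubectl get pods` output on stdin; assumes STATUS is the third
# column, which holds for the default (single-namespace) layout.
crashloop_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get pods | crashloop_pods
```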
By default, a pod’s restart policy is Always, meaning its containers are restarted whenever they exit (other options are Never and OnFailure). Depending on the restart policy defined in the pod template, Kubernetes might try to restart a failed container many times. When a pod’s status shows CrashLoopBackOff, Kubernetes is waiting out the current delay before restarting the container again.
Every time the container is restarted, Kubernetes waits longer before the next attempt; this wait is known as the “backoff delay”. The delay between restarts grows exponentially (10s, 20s, 40s, …) and is capped at five minutes. During this process, Kubernetes displays the CrashLoopBackOff status.
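The schedule described above can be sketched as a small shell function. This is an illustration of the arithmetic only (10 seconds, doubling per restart, capped at 300 seconds); the name backoff_delay is hypothetical and the kubelet's actual timing may differ slightly in practice.

```shell
# Approximate wait before restart attempt N (1-based): starts at 10s,
# doubles on every attempt, and is capped at 300s (five minutes).
backoff_delay() {
  attempt=$1
  delay=10
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

# Print the delay for the first seven restart attempts.
for n in 1 2 3 4 5 6 7; do
  echo "restart $n: wait $(backoff_delay "$n")s"
done
```

The loop prints 10s, 20s, 40s, 80s, and 160s for the first five attempts, then 300s for every attempt after that, which is why a pod that has been crash-looping for a while appears to restart only once every five minutes.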
This is part of an extensive series of guides about Kubernetes troubleshooting.
CrashLoopBackOff occurs when a container in a pod fails to start up properly and crashes repeatedly. Let’s review the common causes of repeated container crashes.
Start with kubectl describe pod (Events) and kubectl logs --previous (last crash), then match what you see to the causes below.
One of the common causes of the CrashLoopBackOff error is resource overload or insufficient memory. Kubernetes allows setting memory and CPU usage limits for each pod, which means your application might be crashing due to insufficient resources. This might happen due to memory leaks in your program, misconfigured resource requests and limits, or simply because your application requires more resources than are available on the node.
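When the node that the pod is running on runs short of resources, the pod can be evicted, and if a controller manages it, a replacement is scheduled on a different node. If none of the nodes have sufficient resources, the replacement stays Pending. A container that repeatedly exceeds its memory limit is OOM-killed and restarted each time, which shows up as a CrashLoopBackOff state.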
To resolve this issue, you need to understand the resource usage of your application and set the appropriate resource requests and limits. You can use the kubectl describe pod [pod_name] command to check if the pod was evicted due to insufficient memory.
You can also monitor the memory and CPU usage of your pods using the Kubernetes Metrics Server or other monitoring tools like Prometheus. If your application consistently uses more resources than allocated, you might need to optimize the application, allocate more resources, or change resources.limits in the container’s section of the manifest.
In Kubernetes v1.24+, the kubelet talks to a CRI runtime (most commonly containerd or CRI-O). If the node runtime is misconfigured, unhealthy, or mismatched with kubelet expectations, containers may fail to start and end up in restart loops.
If you see docker://… on a cluster running Kubernetes v1.24+, that’s a red flag. Plan a migration to a supported CRI runtime.
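One way to apply that check across a cluster is to read each node's reported runtime string and classify it. The sketch below is illustrative: runtime_status is a hypothetical helper, the version strings in the comments are sample values, and "flag" here only means "worth reviewing", in line with the guidance above.

```shell
# Classify a node's containerRuntimeVersion string,
# for example "containerd://1.7.2" (sample value).
runtime_status() {
  case "$1" in
    docker://*)               echo "flag: review runtime" ;;
    containerd://*|cri-o://*) echo "ok" ;;
    *)                        echo "unknown" ;;
  esac
}

# Usage against a live cluster (one "<node> <runtime>" pair per line):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' |
#     while read -r node runtime; do
#       echo "$node: $(runtime_status "$runtime")"
#     done
```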
Sometimes, the CrashLoopBackOff error is caused by an issue with one of the third-party services. If this is the case, upon starting the pod you’ll see the message:
send request failure caused by: Post
Check the syslog and other container logs to see if this was caused by any of the issues we mentioned as causes of CrashLoopBackOff (e.g., locked or missing files). If not, then the problem could be with one of the third-party services.
To verify this, you’ll need to use a debugging container: a container with a shell and troubleshooting tools that runs in the same environment as the failing container. Because both containers share that environment, network calls made from the debugging container behave the same way as calls made from the failing one. One such image you can use is ubuntu-network-troubleshooting.
Using the shell, reproduce the failing calls and begin debugging as you normally would. Start by checking the kube-dns configuration, since many third-party issues start with incorrect DNS settings.
The CrashLoopBackOff status can appear when Kubernetes cannot locate runtime dependencies (i.e., the service account files under /var/run/secrets/kubernetes.io/serviceaccount are missing). This might occur when some containers inside the pod attempt to interact with an API without the default access token.
This scenario is possible if you manually create the pods using a unique API token to access cluster services. The missing service account file is the declaration of tokens needed to pass authentication.
You can fix this error by allowing all new --mount creations to adhere to the default access level throughout the pod space. Ensure that new pods using custom tokens comply with this access level to prevent continuous startup failures.
If you frequently update your clusters with new variables that change resource requirements, your pods are likely to run into CrashLoopBackOff failures.
Suppose you have a shared master setup and run an update that restarts all the pod services. The result is several restart loops because Kubernetes must choose a master from the available options. You can fix this by changing the update procedure from a direct, all-encompassing one to a sequential one (i.e., applying changes separately in each pod). This approach makes it easier to troubleshoot the cause of the restart loop.
In some cases, CrashLoopBackOff can occur as a settling phase to the changes you make. The error resolves itself when the nodes eventually receive the right resources for a stable environment.
To identify the failure, start with Kubernetes-native logs:
kubectl logs <pod-name> -c <container-name> --previous --tail=50
kubectl describe pod <pod-name> | sed -n '/Events/,$p'
If you need to debug at the node runtime layer (containerd/CRI-O), use crictl (CRI tools).
For port conflicts specifically, prefer diagnosing inside the pod (what is trying to bind which port) rather than killing random host processes. If you truly suspect a host-level port conflict, confirm on the node (tooling varies by distro), then fix the underlying host service or daemonset using that port.
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better manage and resolve CrashLoopBackOff errors in Kubernetes:
Use `kubectl describe pod <pod-name>` to inspect events and error messages for root causes.
Ensure probes are correctly configured to avoid premature restarts.
Use resource requests and limits to prevent resource starvation and over-allocation.
Ensure init containers complete successfully, as their failure can cause the main container to crash.
For stateful applications, use StatefulSets for better pod management and recovery.
The best way to identify the root cause of the error is to start going through the list of potential causes and eliminate them one by one, starting with the most common ones first.
Note (Kubernetes v1.24+): Kubernetes no longer includes dockershim, so nodes typically run a CRI runtime like containerd or CRI-O. Use kubectl logs for application logs and crictl for node runtime debugging when needed.
Run kubectl describe pod [name].
If you see both Liveness probe failed and Back-off restarting failed container messages from the kubelet, as shown below, the container is not responding and is being restarted.
From     Message
----     -------
kubelet  Liveness probe failed: cat: can’t open ‘/tmp/healthy’: No such file or directory
kubelet  Back-off restarting failed container
If you see the Back-off restarting failed container message alongside failed liveness probes, you may be dealing with a temporary resource overload caused by an activity spike. In that case, adjust periodSeconds or timeoutSeconds to give the application a longer window of time to respond.
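As one hedged way to adjust periodSeconds and timeoutSeconds without editing the manifest by hand, the sketch below builds a strategic merge patch for kubectl patch. The helper name liveness_patch is hypothetical, the values 20 and 5 are illustrative rather than recommendations, and the sketch assumes the workload is a Deployment whose container already defines a liveness probe.

```shell
# Build a strategic merge patch that sets periodSeconds and timeoutSeconds
# on one container's liveness probe.
# Args: container name, periodSeconds, timeoutSeconds.
liveness_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","livenessProbe":{"periodSeconds":%d,"timeoutSeconds":%d}}]}}}}\n' "$1" "$2" "$3"
}

# Usage (deployment and container names are placeholders):
#   kubectl patch deployment <deployment-name> \
#     --patch "$(liveness_patch <container-name> 20 5)"
```

Because the patch matches the container by name, the other fields of the probe and of the container are left as they are.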
If this was not the issue, proceed to the next step.
If Kubernetes pod details didn’t provide any clues, your next step should be to pull information from the previous container instance.
You originally ran kubectl get pods to identify the Kubernetes pod that was exhibiting the CrashLoopBackOff error. Run the following command to get the last ten log lines from the previous instance of its container:
kubectl logs <pod-name> --previous --tail=10
Search the log for clues showing why the pod is repeatedly crashing. If you cannot resolve the issue, proceed to the next step.
Run the following command to retrieve the logs for the deployment:
kubectl logs -f deploy/<deployment-name> -n <namespace>
This may also provide clues about issues at the application level. For example, below you can see a log file that shows ./ibdata1 can’t be mounted, likely because it’s already in use and locked by a different container.
[ERROR] [MY-012574] [InnoDB] Unable to lock ./ibdata1 error:11
[ERROR] [MY-012574] [InnoDB] Unable to lock ./ibdata1 error:11
Failing all the above, the next step is to open a shell inside the crash-looping container to see exactly what happened.
In most cases, restarting the pod and deploying a new version will resolve the problem and keep the application online. However, it is important to identify the root cause of the CrashLoopBackOff error and prevent it in the first place. Here is a list of best practices that can help you prevent the CrashLoopBackOff error.
A misconfigured or missing configuration file can cause the CrashLoopBackOff error, preventing the container from starting correctly. Before deployment, make sure all files are in place and configured correctly.
Most CrashLoopBackOff “missing file” problems come from volume mounts (ConfigMaps, Secrets, PVCs) or an incorrect mountPath. First, confirm what Kubernetes intended to mount:
kubectl describe pod <pod-name> | sed -n '/Mounts:/,/Conditions:/p'
Then verify the file exists inside the container at the mount path:
kubectl exec -it <pod-name> -c <container-name> -- ls -la <mount-path>
kubectl exec -it <pod-name> -c <container-name> -- cat <mount-path>/<file>
If an application depends on a third-party service and calls to that service fail, the service itself may be the problem. Most such failures come from SSL certificate errors or network issues, so make sure both are working correctly. You can log into the container and reach the endpoints manually using curl to check.
Incorrect environment variables are a common cause of the CrashLoopBackOff error. A common occurrence is when containers require Java to run, but their environment variables are not set properly. Use env to inspect the environment variables and make sure they’re correct.
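To make that inspection less manual, a small filter can report which expected variables are absent from the env output. This is a sketch under assumptions: missing_env_vars is a hypothetical helper, JAVA_HOME and DB_URL are example names only, and kubectl exec works only while the container is running, so in a tight crash loop you may need to read the variables from the pod spec instead.

```shell
# Print each required variable name that is absent from env-style input.
# Reads NAME=value lines on stdin; required names are passed as arguments.
missing_env_vars() {
  env_dump=$(cat)
  for name in "$@"; do
    printf '%s\n' "$env_dump" | grep -q "^${name}=" || echo "$name"
  done
}

# Usage (variable names are examples):
#   kubectl exec <pod-name> -c <container-name> -- env |
#     missing_env_vars JAVA_HOME DB_URL
```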
The application may be trying to connect to an external service, but the kube-dns service is not running. Here, you just need to restart the kube-dns service so the container can connect to the external service.
As mentioned before, file locks and port conflicts are common reasons for the CrashLoopBackOff error. Inspect the ports and containers involved to make sure none are being occupied by the wrong service. If one is, stop or reconfigure the service occupying the required port.
Komodor is a Kubernetes troubleshooting platform that turns hours of guesswork into actionable answers in just a few clicks. Using Komodor, you can monitor, alert and troubleshoot CrashLoopBackOff events.
For each K8s resource, Komodor automatically constructs a coherent view, including the relevant deploys, config changes, dependencies, metrics, and past incidents. Komodor seamlessly integrates and utilizes data from cloud providers, source controls, CI/CD pipelines, monitoring tools, and incident response platforms.
These are the most common questions people ask while debugging CrashLoopBackOff. Each answer includes the fastest command(s) to confirm what’s happening.
CrashLoopBackOff means a container is repeatedly crashing and Kubernetes is applying an increasing delay (“backoff”) before trying to restart it again. It’s a symptom, not the root cause. Start by checking Events and the previous container logs.
kubectl describe pod <pod-name>
kubectl logs <pod-name> -c <container-name> --previous --tail=50
Use kubectl describe to read the Events (often the quickest root-cause hint), then use kubectl logs --previous to see logs from the container instance that just crashed.
kubectl describe pod <pod-name>
kubectl logs <pod-name> -c <container-name> --previous
The Back-off restarting failed container message is emitted when Kubernetes sees repeated container failures and starts delaying restarts using an exponential backoff. It typically appears alongside CrashLoopBackOff and often shows up in the Pod’s Events.
kubectl describe pod <pod-name>
If a liveness probe fails repeatedly, the kubelet restarts the container, which can create a crash loop. If your app is slow to start, add a startupProbe so liveness doesn’t kill it during warm-up, or relax probe timeouts/thresholds.
kubectl get pod <pod-name> -o yaml   # inspect probes
kubectl describe pod <pod-name>      # see probe failures in Events
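To make the startupProbe suggestion concrete, the sketch below prints a probe fragment for a container spec together with the startup budget it implies (failureThreshold multiplied by periodSeconds). The helper name startup_probe_yaml, the /healthz path, and port 8080 are all hypothetical; substitute whatever health endpoint your application actually exposes.

```shell
# Print a startupProbe fragment plus the startup window it allows.
# Args: failureThreshold, periodSeconds.
startup_probe_yaml() {
  cat <<EOF
startupProbe:
  httpGet:
    path: /healthz   # hypothetical endpoint
    port: 8080       # hypothetical port
  failureThreshold: $1
  periodSeconds: $2
# startup budget: $(($1 * $2))s before the kubelet restarts the container
EOF
}

# Example: 30 attempts, 10 seconds apart.
startup_probe_yaml 30 10
```

With these example values the application gets up to 300 seconds to start, and the liveness probe only takes over once the startup probe has succeeded.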
If an init container fails, the main container may never start. Check the init container status in describe, then read the init container logs directly.
kubectl describe pod <pod-name>
kubectl logs <pod-name> -c <init-container-name> --previous --tail=200
OOMKilled (often exit code 137) indicates the container was killed due to memory limits or node memory pressure. Look for “OOMKilled” in the container status and Events, then adjust memory requests/limits and investigate memory spikes or leaks.
kubectl describe pod <pod-name>   # look for Reason: OOMKilled / Exit Code: 137
kubectl logs <pod-name> -c <container-name> --previous
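Exit code 137 follows the common convention that codes above 128 mean "terminated by signal (code minus 128)", so 137 corresponds to signal 9, SIGKILL. The sketch below applies that convention to any exit code; explain_exit_code is a hypothetical helper, and its wording for 137 is a hint rather than proof of an OOM kill, so confirm with the Reason field.

```shell
# Describe a container exit code. Codes above 128 mean the process was
# terminated by signal (code - 128).
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$code" -eq 137 ]; then
    echo "killed by SIGKILL (signal 9), often OOMKilled"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "application error (exit code $code)"
  fi
}

# Usage: read the last exit code of the pod's first container, then explain it.
#   code=$(kubectl get pod <pod-name> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
#   explain_exit_code "$code"
```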
This usually means the Pod references a ConfigMap or Secret that doesn’t exist (wrong name, wrong namespace, or missing key). The exact missing object is typically listed in Events.
kubectl describe pod <pod-name>
kubectl get configmap,secret -n <namespace>
This points to a bad command/args, a missing binary inside the image, or a path issue. Confirm by checking the previous crash logs, then verify the container command/args in the Pod spec (or remove overrides).
kubectl logs <pod-name> -c <container-name> --previous --tail=100
kubectl get pod <pod-name> -o yaml
If the container crashes quickly, the current instance may not have logs yet. Use --previous to fetch logs from the last crashed instance, and include -c if the Pod has multiple containers.
kubectl logs <pod-name> -c <container-name> --previous
In practice: Events plus previous logs. Events tell you what Kubernetes observed (probe failures, missing secrets, image errors). Previous logs tell you what the app did right before it died.