What is Kubernetes CrashLoopBackOff?
CrashLoopBackOff is a common error that you may have encountered when running your first containers on Kubernetes. This error indicates that a pod failed to start, Kubernetes tried to restart it, and it continued to fail repeatedly.
To make sure you are experiencing this error, run kubectl get pods and check that the pod status is CrashLoopBackOff.
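If you want to run this check from a script, the Python sketch below parses the JSON that kubectl get pods -o json prints and lists containers whose current waiting reason is CrashLoopBackOff. It assumes kubectl is on your PATH and configured for the target cluster; the helper names crashlooping_containers and fetch_pods are hypothetical.

```python
import json
import subprocess


def crashlooping_containers(pods_json):
    """Return (pod, container, restart_count) for every container whose
    current state is 'waiting' with reason CrashLoopBackOff."""
    found = []
    for pod in pods_json.get("items", []):
        pod_name = pod["metadata"]["name"]
        status = pod.get("status", {})
        # Init containers can crash-loop too, so check both status lists.
        for key in ("initContainerStatuses", "containerStatuses"):
            for cs in status.get(key, []):
                waiting = cs.get("state", {}).get("waiting") or {}
                if waiting.get("reason") == "CrashLoopBackOff":
                    found.append((pod_name, cs["name"], cs.get("restartCount", 0)))
    return found


def fetch_pods(namespace="default"):
    """Run kubectl and return the parsed pod list (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


# Example usage against a live cluster:
#   for pod, container, restarts in crashlooping_containers(fetch_pods()):
#       print(f"{pod}/{container}: CrashLoopBackOff, {restarts} restarts")
```

Parsing the JSON rather than the table output avoids depending on column layout, which can change between kubectl versions.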
What does CrashLoopBackOff mean anyway?
By default, a pod’s restart policy is Always, meaning the kubelet restarts its containers whenever they exit (the other options are OnFailure and Never). Depending on the restart policy defined in the pod template, Kubernetes might try to restart the failing container multiple times.
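Every time the container is restarted, Kubernetes waits longer before the next attempt. This wait, known as a “backoff delay”, grows exponentially (10 seconds, 20 seconds, 40 seconds, and so on) up to a cap of five minutes, and it resets once the container has run for 10 minutes without problems. During this process, Kubernetes displays the CrashLoopBackOff error.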
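To make the growth of the backoff delay concrete, here is a simplified Python model of the schedule. It assumes the documented behavior of a 10-second starting delay that doubles after each restart and is capped at five minutes; it is an illustration, not the kubelet's actual implementation, and it does not model the 10-minute reset.

```python
def backoff_delays(restarts, initial=10, cap=300):
    """Simplified model of crash-loop backoff: the delay starts at `initial`
    seconds and doubles after every restart, but never exceeds `cap`
    seconds. The reset after 10 minutes of healthy running is not modeled."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(min(delay, cap))
        delay *= 2
    return delays


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

Under this model, a container that keeps crashing waits the full five minutes between attempts from the sixth restart onward, which is why a crash-looping pod can look "stuck" for long stretches.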
Common Causes of CrashLoopBackOff and How to Fix Them
Errors When Deploying Kubernetes
A common reason pods in your Kubernetes cluster display a CrashLoopBackOff message is that Kubernetes is running a deprecated version of Docker. You can check the Docker version by running docker -v.
A best practice for fixing this error is to make sure you have the latest Docker version and the most stable versions of other plugins. This prevents deprecated commands and inconsistencies from pushing your containers into start-fail loops.
When migrating a project into a Kubernetes cluster, you might need to roll back several Docker versions to match the version the incoming project uses.
Issue with Third-Party Services (DNS Error)
Sometimes the CrashLoopBackOff error is caused by an issue with one of the third-party services. If this is the case, when the pod starts you’ll see the message:
send request failure caused by: Post
Check the syslog and other container logs to see whether the error was caused by any of the other causes of CrashLoopBackOff described in this guide (e.g., locked or missing files). If not, the problem could be with one of the third-party services.
To verify this, you’ll need to use a debugging container. A debug container works as a shell that you can use to log into the failing container. This works because both containers share a similar environment, so they behave the same way. One such shell you can use is ubuntu-network-troubleshooting.
Using the shell, log into your failing container and begin debugging as you normally would. Start by checking the kube-dns configuration, since many third-party issues stem from incorrect DNS settings.
Missing Dependencies
The CrashLoopBackOff status can appear when Kubernetes cannot locate runtime dependencies, for example when the service account files under /var/run/secrets/kubernetes.io/serviceaccount are missing. This might occur when some containers inside the pod attempt to interact with the API without the default access token.
This scenario is possible if you create the pods manually and use a unique API token to access cluster services. The missing service account files hold the token the container needs to pass authentication.
You can fix this error by making sure every new mount follows the default access level throughout the pod space. Ensure that new pods using custom tokens comply with this access level to prevent continuous startup failures.
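To confirm whether the service account files are actually present, you can run a check like the one below from inside the container. It is a sketch that assumes the default mount path /var/run/secrets/kubernetes.io/serviceaccount, where Kubernetes normally places the token, ca.crt, and namespace files; if the pod spec sets automountServiceAccountToken: false, nothing is mounted there at all. The function name is hypothetical.

```python
from pathlib import Path

# Default location where Kubernetes mounts a pod's service account credentials.
DEFAULT_SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
EXPECTED_FILES = ("token", "ca.crt", "namespace")


def missing_service_account_files(sa_dir=DEFAULT_SA_DIR):
    """Return the expected credential files that are absent from sa_dir."""
    base = Path(sa_dir)
    # is_file() follows symlinks, which matters because the mounted files
    # are usually symlinks into a hidden data directory.
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


# Example usage inside the container:
#   missing = missing_service_account_files()
#   print("missing:", missing or "none")
```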
Changes Caused by Recent Updates
If you frequently update your clusters with new variables that change resource requirements, your pods are likely to encounter CrashLoopBackOff failures.
Suppose you have a shared master setup and run an update that restarts all the pod services. The result is several restart loops, because Kubernetes must choose a master from the available options. You can fix this by changing the update procedure from a direct, all-encompassing one to a sequential one (i.e., applying changes to each pod separately). This approach also makes it easier to find the cause of the restart loop.
In some cases, CrashLoopBackOff occurs as a settling phase after the changes you make. The error resolves itself once the nodes receive the right resources for a stable environment.
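The sequential approach can be sketched as a small driver loop. The apply and is_healthy callbacks are hypothetical stand-ins for your own deployment tooling; the point is only that each change is verified before the next one starts, so a restart loop is confined to one target.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rollout_sequentially(
    targets: Iterable[str],
    apply: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Apply an update to one target at a time and stop at the first one
    that does not come back healthy.

    Returns (updated_targets, failed_target); failed_target is None when
    every target was updated successfully."""
    updated: List[str] = []
    for target in targets:
        apply(target)
        if not is_healthy(target):
            # Stop here so the restart loop is isolated to a single target.
            return updated, target
        updated.append(target)
    return updated, None
```

Stopping at the first failure trades rollout speed for a much smaller blast radius and a clear answer to "which change broke it".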
Container Failure due to Port Conflict
Let’s take another example in which the container failed due to a port conflict. To identify the issue, pull the logs of the failed container by running docker logs [container id].
Doing this will let you identify the conflicting service. Using netstat, look for the container that corresponds to that service and kill it with the kill command. Then delete the kube-controller-manager pod and restart.
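As a scripted complement to netstat, the Python sketch below checks whether a TCP port is already taken by trying to bind a throwaway socket to it. It assumes you run it in the same network namespace as the service you suspect (for example, inside the pod), since it cannot see ports bound in other namespaces; the function name is hypothetical.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if another socket already holds (host, port) for TCP.

    The check tries to bind a throwaway socket; EADDRINUSE means the port
    is taken. It only sees sockets in the current network namespace."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES for privileged ports: not a conflict signal
    return False
```

Usage: port_in_use(8080) returns True while something is listening on port 8080, which tells you to look for the conflicting process before restarting the pod.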
How to Troubleshoot CrashLoopBackOff
The best way to identify the root cause of the error is to start going through the list of potential causes and eliminate them one by one, starting with the most common ones first.
1. Check for “Back Off Restarting Failed Container”
Run kubectl describe pod [name].
If you get Liveness probe failed and Back-off restarting failed container messages from the kubelet, as shown below, the container is not responding and is in the process of restarting.
From      Message
----      -------
kubelet   Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
kubelet   Back-off restarting failed container
If you get the back-off restarting failed container message, you may be dealing with a temporary resource overload caused by an activity spike. In that case, increase the probe’s timeoutSeconds to give the application a longer window of time to respond.
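To pull only the events that matter here, the Python sketch below filters a pod's events for the two kubelet reasons behind the messages shown above: Unhealthy (failed probes) and BackOff (back-off restarting failed container). It assumes kubectl is available and that the pod name you pass in is correct; the helper names are hypothetical.

```python
import json
import subprocess


def restart_related_events(events_json):
    """Return (reason, message) pairs for probe failures and back-off events.

    The kubelet reports failed probes with reason 'Unhealthy' and the
    'Back-off restarting failed container' message with reason 'BackOff'."""
    hits = []
    for event in events_json.get("items", []):
        reason = event.get("reason", "")
        if reason in ("Unhealthy", "BackOff"):
            hits.append((reason, event.get("message", "")))
    return hits


def fetch_pod_events(pod_name, namespace="default"):
    """Fetch events for one pod via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "events", "-n", namespace,
         "--field-selector", f"involvedObject.name={pod_name}", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

If the Unhealthy messages mention probe timeouts rather than a missing file or a failing command, raising timeoutSeconds is the more likely fix.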
If this was not the issue, proceed to the next step.
2. Check Logs From Previous Container Instance
If Kubernetes pod details didn’t provide any clues, your next step should be to pull information from the previous container instance.
You originally ran kubectl get pods to identify the Kubernetes pod that was exhibiting the CrashLoopBackOff error. Run the following command to get the last ten log lines from the pod’s previous container instance:
kubectl logs [pod name] --previous --tail 10
Search the log for clues showing why the pod is repeatedly crashing. If you cannot resolve the issue, proceed to the next step.
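When the previous instance produced more than a handful of lines, a keyword filter can speed up the search. The Python sketch below fetches the previous container's logs with kubectl and keeps lines matching common failure keywords; the keyword list is an assumption you should adapt to your application's log format, and for multi-container pods you would also need kubectl's -c flag.

```python
import re
import subprocess

# Keywords that often mark the line explaining a crash. Adjust for your app.
CLUE_PATTERN = re.compile(r"error|fatal|panic|exception|traceback", re.IGNORECASE)


def previous_logs(pod_name, tail=10, namespace="default"):
    """Return the last `tail` lines logged by the previous container instance."""
    result = subprocess.run(
        ["kubectl", "logs", pod_name, "-n", namespace,
         "--previous", "--tail", str(tail)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def clue_lines(log_text):
    """Keep only the lines that match one of the clue keywords."""
    return [line for line in log_text.splitlines() if CLUE_PATTERN.search(line)]
```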
3. Check Deployment Logs
Run the following command to retrieve the deployment logs:
kubectl logs -f deploy/[deployment name] -n [namespace]
This may also provide clues about issues at the application level. For example, the log below shows that ./ibdata1 can’t be locked, likely because it’s already in use and locked by a different container.
[ERROR] [MY-012574] [InnoDB] Unable to lock ./ibdata1 error:11
[ERROR] [MY-012574] [InnoDB] Unable to lock ./ibdata1 error:11
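Crash-looping databases tend to repeat the same error on every attempt, as in the two identical lines above. The Python sketch below collapses such repeats by counting identical error code, subsystem, and message combinations; the regular expression assumes the MySQL 8 style shown in this example and would need adjusting for other log formats.

```python
import re
from collections import Counter

# Matches MySQL 8 style error lines, e.g.
# "[ERROR] [MY-012574] [InnoDB] Unable to lock ./ibdata1 error:11"
MYSQL_ERROR = re.compile(r"\[ERROR\] \[(MY-\d+)\] \[(\w+)\] (.+)")


def summarize_mysql_errors(log_text):
    """Count identical (error code, subsystem, message) triples in a log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MYSQL_ERROR.search(line)
        if match:
            code, subsystem, message = match.groups()
            counts[(code, subsystem, message.strip())] += 1
    return counts
```

Using search rather than match lets the pattern skip the timestamp and thread prefix that real MySQL log lines usually carry.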
If all of the above fails, the next step is to open a bash shell in the CrashLoop container to see exactly what happened.
Why Fix CrashLoopBackOff When You Can Prevent It? 5 Tips for Prevention
An ounce of prevention is worth a pound of cure. Here is a list of best practices that can help you prevent the CrashLoopBackOff error from occurring in the first place.
1. Configure and Recheck Your Files
A misconfigured or missing configuration file can cause the CrashLoopBackOff error, preventing the container from starting correctly. Before deployment, make sure all files are in place and configured correctly.
In most cases, files are stored in /var/lib/docker. You can use commands like find to verify whether the target file exists. You can also use less to inspect files and make sure there are no misconfiguration issues.
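The same checks can run automatically before deployment. The Python sketch below reports files that are missing or empty and, for .json files only, whether they parse; the list of paths is yours to supply, and validating other formats (YAML, properties files) would need extra parsers that this sketch does not include.

```python
import json
from pathlib import Path


def check_config_files(paths):
    """Return {path: problem} for files that are missing, empty, or (for
    .json files) not valid JSON. An empty dict means no problems found."""
    problems = {}
    for path in map(Path, paths):
        if not path.is_file():
            problems[str(path)] = "missing"
        elif path.stat().st_size == 0:
            problems[str(path)] = "empty"
        elif path.suffix == ".json":
            try:
                json.loads(path.read_text())
            except json.JSONDecodeError as exc:
                problems[str(path)] = f"invalid JSON: {exc.msg} (line {exc.lineno})"
    return problems
```

Failing a CI step when this returns a non-empty dict catches the problem before the container ever starts, instead of after several restarts.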
2. Be Vigilant With Third-Party Services
If an application uses a third-party service and calls made to that service fail, the problem may lie with the service itself. Most such failures stem from SSL certificate errors or network issues, so make sure those are functioning correctly. You can log into the container and manually reach the endpoints using curl to check.
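As a scripted alternative to curl, the Python sketch below requests an endpoint with the standard library and reports whether a failure looks like a TLS problem, a network problem, or an HTTP error from a service that is up. The URL is whatever endpoint your application calls; nothing here is specific to a particular third-party service, and the function name is hypothetical.

```python
import socket
import ssl
import urllib.error
import urllib.request


def probe_endpoint(url, timeout=5.0):
    """Request `url` and return a one-line diagnosis of the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"ok: HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, so the network and TLS layers worked.
        return f"reachable, but HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Certificate and handshake failures surface as ssl.SSLError reasons.
        if isinstance(exc.reason, ssl.SSLError):
            return f"TLS problem: {exc.reason}"
        return f"network problem: {exc.reason}"
    except socket.timeout:
        return "timed out while reading the response"
```

HTTPError is caught before URLError because it is a subclass; reversing the order would misreport a reachable service as a network failure.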
3. Check Your Environment Variables
Incorrect environment variables are a common cause of the CrashLoopBackOff error. A typical case is a container that requires Java to run but whose environment variables are not set properly. Use env to inspect the environment variables and make sure they’re correct.
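Checking required variables at startup turns a silent crash into a clear message. The Python sketch below reports unset or empty variables and, for the Java case mentioned above, whether JAVA_HOME actually contains bin/java. JAVA_HOME is the conventional variable name, but treating it as required is an assumption about your image, and both helpers are hypothetical.

```python
import os


def missing_env(required, env=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def check_java_env(env=None):
    """Hypothetical check for a Java app: JAVA_HOME must be set and must
    contain bin/java."""
    env = os.environ if env is None else env
    problems = missing_env(["JAVA_HOME"], env)
    java_home = env.get("JAVA_HOME")
    if java_home and not os.path.isfile(os.path.join(java_home, "bin", "java")):
        problems.append(f"JAVA_HOME={java_home} has no bin/java")
    return problems
```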
4. Check Kube-DNS
The application may be trying to connect to an external service, but the kube-dns service is not running. Here, you just need to restart the kube-dns service so the container can connect to the external service.
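Before restarting anything, you can confirm from inside the container that name resolution is really the problem. The Python sketch below resolves a name and returns its addresses, or an empty list on failure. The example name kubernetes.default.svc.cluster.local assumes the default cluster.local cluster domain, which your cluster may not use.

```python
import socket


def resolve(name, port=443):
    """Return the sorted, de-duplicated IP addresses for `name`, or an empty
    list if DNS resolution fails."""
    try:
        infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example usage inside a pod:
#   print(resolve("kubernetes.default.svc.cluster.local"))
```

If an in-cluster name like this resolves but the external service's name does not, the issue is more likely upstream DNS than kube-dns itself.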
5. Check File Locks
As mentioned before, file locks are a common reason for the CrashLoopBackOff error. Inspect all ports and containers to make sure none are being occupied by the wrong service. If they are, kill the service occupying the required port.
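The MySQL example earlier reported error:11, which on Linux is EAGAIN, the error a non-blocking lock attempt returns when another process already holds the lock. The Python sketch below briefly tries to take a POSIX record lock with fcntl.lockf and releases it immediately, to see whether some other process holds one. Assumptions: it runs on a POSIX system, it can open the file for writing from a process that sees the same file (for example, a debug container mounting the same volume), and the locks involved are advisory fcntl-style locks; on Linux it generally will not detect flock-style locks. The function name is hypothetical.

```python
import errno
import fcntl


def held_by_another_process(path):
    """Return True if another process holds a POSIX (fcntl) lock on `path`.

    Opens the file for writing, tries a non-blocking exclusive lockf, and
    releases it straight away if it succeeded. Locks held by this same
    process are not reported, because POSIX record locks are per process."""
    with open(path, "r+b") as handle:
        try:
            fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        fcntl.lockf(handle, fcntl.LOCK_UN)
        return False
```

Because the check takes the lock for an instant when it is free, run it only while the application that normally uses the file is stopped or crash-looping, not against a healthy database.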
Troubleshoot CrashLoopBackoff – The Easy Way With Komodor
Komodor is a Kubernetes troubleshooting platform that turns hours of guesswork into actionable answers in just a few clicks. Using Komodor, you can monitor, alert and troubleshoot
For each K8s resource, Komodor automatically constructs a coherent view, including the relevant deploys, config changes, dependencies, metrics, and past incidents. Komodor seamlessly integrates and utilizes data from cloud providers, source controls, CI/CD pipelines, monitoring tools, and incident response platforms.
- Discover the root cause automatically with a timeline that tracks all changes in your application and infrastructure.
- Quickly tackle the issue, with easy-to-follow remediation instructions.
- Give your entire team a way to troubleshoot independently without escalating.