The 503 Service Unavailable error is an HTTP status code that indicates the server is temporarily unable to handle the client's request. On a web server, this usually means the server is overloaded or undergoing maintenance. In Kubernetes, it means a Service tried to route a request to a pod, but something went wrong along the way.
503 errors are a severe issue that can result in disruption of service for users. Below we’ll show a procedure for troubleshooting these errors, and some tips for avoiding 503 service errors in the first place.
Keep in mind that Service 503 errors in Kubernetes can be difficult to diagnose and resolve, because they can involve one or more moving parts in your cluster. Without proper tooling, the root cause can be hard to pin down.
This is part of a series of articles about 5xx Server Errors.
One possible cause of 503 errors is that a pod does not carry the label the Service expects, so the Service selector does not match it. If the Service finds no matching pods, requests return a 503 error.
Run the following command to see the current selector:
kubectl describe service [service-name] -n [namespace-name]
Example output:
Name:         [service-name]
Namespace:    [namespace-name]
Labels:       <none>
Annotations:  <none>
Selector:     [label]
...
The Selector field shows which label or labels the Service uses to match pods.
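Check whether any pods carry this label:
kubectl get pods -n [namespace-name] -l "[label]"
If the output is:
No resources found in [namespace-name] namespace.
then no pod matches the selector. Correct either the pod labels or the Service selector so that they match.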
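To make the matching rule concrete, here is a minimal Python sketch of the equality-based check a Service selector performs: a pod is selected only if it carries every key/value pair in the selector, and extra labels on the pod do not matter. The helper `selects` and the sample labels are hypothetical and operate on plain dictionaries, not on the Kubernetes API.

```python
from typing import Dict


def selects(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """Return True if the pod carries every key/value pair in the selector.

    Hypothetical helper mirroring equality-based Service selector matching.
    Extra labels on the pod do not prevent a match.
    """
    if not selector:
        # A Service without a selector does not get endpoints managed automatically.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical example: the Service expects app=web, but one pod is labeled app=webapp.
service_selector = {"app": "web"}
pods = {
    "my-pod-1": {"app": "webapp", "tier": "frontend"},
    "my-pod-2": {"app": "web", "tier": "frontend"},
}

matching = [name for name, labels in pods.items() if selects(service_selector, labels)]
print(matching)  # ['my-pod-2']
```

If the equivalent list for your real pods is empty, you are in the "No resources found" situation: a one-character typo in a label value is enough to leave the Service with no backends.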
In step 1 we checked which label the Service selector uses. Next, run the following command to confirm that the pods matched by the selector are in the Running state:
kubectl get pods -n [namespace-name] -l "[label]"
The output will look like this:
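NAME                      READY   STATUS             RESTARTS   AGE
my-pod-9ab66e7ee8-23978   0/1     ImagePullBackOff   0          5m10s
Here the pod is not Running: its container cannot start because the image cannot be pulled (ImagePullBackOff), and 0/1 containers are ready. A pod in this state cannot receive traffic from the Service.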
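If you prefer to script this check, the sketch below reads the JSON printed by `kubectl get pods -n [namespace-name] -l "[label]" -o json` and lists pods that are not both Running and fully ready, together with any waiting reason such as ImagePullBackOff. The helper `unready_pods` and the sample data are hypothetical; the fields it reads (`status.phase`, `status.containerStatuses[].ready`, `state.waiting.reason`) are standard Pod status fields.

```python
import json
from typing import List, Tuple


def unready_pods(kubectl_json: str) -> List[Tuple[str, str]]:
    """Return (pod name, reason) for each pod that is not Running with all containers ready.

    Expects the output of `kubectl get pods -o json` (a List object with "items").
    """
    problems = []
    for pod in json.loads(kubectl_json).get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        statuses = status.get("containerStatuses", [])
        # Collect waiting reasons such as ImagePullBackOff or CrashLoopBackOff.
        reasons = [
            cs["state"]["waiting"].get("reason", "Waiting")
            for cs in statuses
            if "waiting" in cs.get("state", {})
        ]
        all_ready = bool(statuses) and all(cs.get("ready", False) for cs in statuses)
        if phase != "Running" or not all_ready:
            # Fall back to the pod phase when no container reports a waiting reason.
            problems.append((name, ", ".join(reasons) or phase))
    return problems


# Hypothetical sample shaped like the pod in the output above.
sample = json.dumps({
    "items": [{
        "metadata": {"name": "my-pod-9ab66e7ee8-23978"},
        "status": {
            "phase": "Pending",
            "containerStatuses": [{
                "ready": False,
                "state": {"waiting": {"reason": "ImagePullBackOff"}},
            }],
        },
    }]
})
print(unready_pods(sample))  # [('my-pod-9ab66e7ee8-23978', 'ImagePullBackOff')]
```

A pod stuck in ImagePullBackOff stays in the Pending phase, which is why the sample uses that phase.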
Next, we’ll check if a readiness probe is configured for the pod:
kubectl describe pod [pod-name] -n [namespace-name] | grep -i readiness
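Readiness:  tcp-socket :8080 delay=10s timeout=1s period=2s #success=1 #failure=3
Warning  Unhealthy  2m13s (x298 over 12m)  kubelet  Readiness probe failed:
In this example the readiness probe has failed repeatedly, so Kubernetes does not mark the pod as Ready and the Service does not route traffic to it. Check that the application actually listens on port 8080, and that the probe's delay and timeout give it enough time to start and respond.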
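The probe in this example is a `tcp-socket` check on port 8080: it succeeds if a TCP connection to the container on that port can be opened within the timeout. A rough way to reproduce that test by hand is the Python sketch below. The helper `tcp_ready` is hypothetical, and in practice you would pass the pod IP and probe port instead of the local demo listener used here.

```python
import socket


def tcp_ready(host: str, port: int, timeout: float = 1.0) -> bool:
    """Approximate a tcpSocket readiness probe: True if a TCP connection opens in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: a tcpSocket probe would count this as a failure.
        return False


# Demo against a throwaway local listener instead of a real pod.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
print(tcp_ready("127.0.0.1", port))  # True while something is listening
listener.close()
print(tcp_ready("127.0.0.1", port))  # False once the port is closed
```

The one-second default timeout matches the `timeout=1s` in the probe output above, so a service that accepts connections only slowly fails both this sketch and the real probe.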
If none of the above steps reveals a problem, another common cause of 503 errors is that no instances are registered with the load balancer. Check that the load balancer in front of your Service has healthy, registered instances to send traffic to.
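Inside the cluster, a closely related check is the Service's Endpoints object: pods that pass their readiness probe appear under `addresses`, and pods that fail it appear under `notReadyAddresses`. The sketch below assumes you saved the output of `kubectl get endpoints [service-name] -n [namespace-name] -o json`, and counts both lists. The helper `count_endpoints` and the sample data are hypothetical.

```python
import json
from typing import Tuple


def count_endpoints(endpoints_json: str) -> Tuple[int, int]:
    """Return (ready, not_ready) address counts from a Service's Endpoints object."""
    obj = json.loads(endpoints_json)
    ready = not_ready = 0
    # "subsets" is omitted entirely when the Service has no endpoints at all.
    for subset in obj.get("subsets", []) or []:
        ready += len(subset.get("addresses", []))
        not_ready += len(subset.get("notReadyAddresses", []))
    return ready, not_ready


# Hypothetical sample: one backend exists, but it is failing its readiness probe.
sample = json.dumps({
    "kind": "Endpoints",
    "metadata": {"name": "my-service"},
    "subsets": [{
        "notReadyAddresses": [{"ip": "10.0.0.12"}],
        "ports": [{"port": 8080, "protocol": "TCP"}],
    }],
})
ready, not_ready = count_endpoints(sample)
if ready == 0:
    print(f"No ready endpoints ({not_ready} not ready): requests will likely fail")
```

Zero ready addresses with a non-zero not-ready count points back to the readiness probe; zero of both points back to the selector and pod status checks.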
This procedure will help you discover the most basic issues that can result in a Service 503 error. If you didn’t manage to quickly identify the root cause, you will need a more in-depth investigation across multiple components in the Kubernetes deployment.
To complicate matters, more than one component might be malfunctioning (for example, both the pod and the Service), making diagnosis and remediation more difficult.
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better handle Kubernetes 503 Service Unavailable errors:
Ensure backend services are up and running and can handle incoming requests.
Check that readiness probes are correctly configured and reflect the actual readiness state of pods.
Use monitoring tools to track CPU, memory, and network usage to detect resource bottlenecks.
Ensure Ingress and service configurations are correct and properly routing traffic.
Analyze logs for errors or issues that might cause services to be unavailable.
Another common cause of 503 errors is that when Kubernetes terminates a pod, its containers drop existing connections, and clients receive a 503 response. You can avoid this by implementing graceful shutdown.
To understand graceful shutdown, let’s quickly review how Kubernetes shuts down containers. When a user or a Kubernetes component requests deletion of a pod, the kubelet on the pod's node first has the container runtime send a SIGTERM signal to the main process of each container.
A container can register a handler for SIGTERM and perform cleanup before shutting down. If the container is still running after a configurable grace period (terminationGracePeriodSeconds, 30 seconds by default), Kubernetes sends a SIGKILL signal and the container is forcibly stopped.
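As a minimal sketch of such a handler, the Python program below registers a SIGTERM handler that only sets a flag, so the work loop can stop taking new work and exit cleanly before the grace period ends. It assumes a POSIX environment, as inside a Linux container. The names `shutting_down`, `handle_sigterm`, and `process_jobs` are hypothetical; a real server would also stop accepting new connections and finish in-flight requests at this point.

```python
import os
import signal
import threading

# Set when SIGTERM arrives; the work loop checks it instead of the process dying mid-request.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request to stop; cleanup runs in normal code, not in the handler."""
    shutting_down.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def process_jobs(jobs):
    """Hypothetical work loop: handle jobs until SIGTERM, then stop taking new ones."""
    done = []
    for job in jobs:
        if shutting_down.is_set():
            break  # termination started: finish up instead of starting new work
        done.append(job)
    # A real service would flush buffers and close connections here, then exit.
    return done


# Demo (POSIX only): simulate the kubelet's SIGTERM by signalling this process.
before_signal = process_jobs(["job-1"])
os.kill(os.getpid(), signal.SIGTERM)
shutting_down.wait(timeout=1.0)  # give the handler a moment to run
after_signal = process_jobs(["job-2", "job-3"])
print(before_signal, after_signal)  # ['job-1'] []
```

Keeping the handler to a single flag assignment is a deliberate choice: signal handlers interrupt arbitrary code, so the actual cleanup is safer in the main flow of the program.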
Here are two ways to implement graceful shutdown in order to avoid a 503 error:
Kubernetes troubleshooting relies on the ability to quickly contextualize the problem with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production. The major challenge is correlating service-level incidents with other events happening in the underlying infrastructure. Service 503 errors are a prime example of an error that can occur at the service level, but can also represent a problem with underlying pods or nodes.
Komodor can help with our new ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. With this view you can rapidly:
Beyond node error remediations, Komodor can help troubleshoot a variety of Kubernetes errors and issues, acting as a single source of truth (SSOT) for all of your K8s troubleshooting needs. Komodor provides:
Related content: Read our guide to Kubernetes 502 bad gateway.