The OOMKilled status in Kubernetes, flagged by exit code 137, signifies that the Linux Kernel has halted a container because it has surpassed its allocated memory limit. In Kubernetes, each container within a pod can define two key memory-related parameters: a memory limit and a memory request. The memory limit is the ceiling of RAM usage that a container can reach before it is forcefully terminated, whereas the memory request is the baseline amount of memory that Kubernetes will guarantee for the container’s operation.
When a container attempts to consume more memory than its limit, the Linux OOM Killer terminates the offending process, and Kubernetes reports the container's status as 'OOMKilled'. This mechanism prevents a single container from exhausting the node's memory, which could compromise other containers running on the same node. In scenarios where the combined memory consumption of all pods on a node exceeds available memory, Kubernetes may terminate one or more pods to relieve the node's memory pressure.
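The number 137 follows a common convention: a process killed by a signal is reported with exit code 128 plus the signal number, and the OOM Killer uses SIGKILL (signal 9). The arithmetic can be sketched in Python as below. The helper name `describe_exit_code` is hypothetical, and the sketch assumes a POSIX system where SIGKILL is defined.

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Translate a container exit code into a short description."""
    if exit_code > 128:
        # Codes above 128 conventionally mean "killed by signal (code - 128)".
        signal_number = exit_code - 128
        try:
            name = signal.Signals(signal_number).name
        except ValueError:
            name = f"signal {signal_number}"
        return f"killed by {name}"
    return f"exited with status {exit_code}"


print(describe_exit_code(137))
```

Note that exit code 137 only says the process received SIGKILL. The OOMKilled reason reported by Kubernetes is what ties the kill to memory.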
To detect an OOMKilled event, use the kubectl get pods command, which will display the pod’s status as OOMKilled. For instance:
NAME       READY   STATUS      RESTARTS   AGE
my-pod-1   0/1     OOMKilled   0          3m12s
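Once a container has been restarted, the STATUS column usually shows Running or CrashLoopBackOff, and the OOMKilled reason moves to the container's last state. One way to catch both cases is to inspect the JSON form of the pod list. The following is a minimal sketch, assuming the input has the shape produced by kubectl get pods -o json; the function name `find_oomkilled` and the sample data are hypothetical.

```python
import json


def find_oomkilled(pod_list: dict) -> list:
    """Return (pod name, container name) pairs whose current or last state is OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            # Check the current state first, then the previous one.
            for key in ("state", "lastState"):
                terminated = status.get(key, {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    hits.append((pod_name, status["name"]))
                    break
    return hits


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "my-pod-1"},
   "status": {"containerStatuses": [
     {"name": "app", "state": {"running": {}},
      "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}]}},
  {"metadata": {"name": "my-pod-2"},
   "status": {"containerStatuses": [
     {"name": "app", "state": {"running": {}}, "lastState": {}}]}}
]}
""")
print(find_oomkilled(sample))
```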
Resolving OOMKilled issues often starts with evaluating and adjusting the memory requests and limits of your containers and may also involve debugging memory spikes or leaks within the application. For in-depth cases that involve persistent or complex OOMKilled events, further detailed investigation and troubleshooting will be necessary.
OOMKilled is not native to Kubernetes. It comes from a Linux kernel feature known as the OOM Killer, which Kubernetes relies on to manage container lifecycles. The OOM Killer monitors memory and selects processes to kill when memory runs out. Note that the OOM Killer may kill a process even if there is free memory on the node, for example when a container exceeds its own memory limit.
The Linux kernel maintains an oom_score for each process running on the host. The higher this score, the greater the chance that the process will be killed. Another value, oom_score_adj, ranges from -1000 to 1000 and lets users bias the score, making a process more or less likely to be terminated.
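Kubernetes sets the oom_score_adj value of a container's processes according to the pod's Quality of Service (QoS) class. There are three QoS classes that may be assigned to a pod: Guaranteed, Burstable, and BestEffort.
Each QoS class has a matching value for oom_score_adj: -997 for Guaranteed, 1000 for BestEffort, and min(max(2, 1000 - (1000 × memoryRequestBytes) / machineMemoryCapacityBytes), 999) for Burstable.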
Because “Guaranteed” pods have a lower value, they are the last to be killed on a node that is running out of memory. “BestEffort” pods are the first to be killed.
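To make the relationship between resource settings, QoS class and oom_score_adj concrete, here is a minimal Python sketch. It assumes the classification rules and values given in the Kubernetes documentation (Guaranteed: -997, BestEffort: 1000, Burstable: a value between 2 and 999 derived from the memory request). The function names are hypothetical, quantities are compared as plain strings, and the kubelet's real implementation may differ in detail.

```python
def qos_class(containers: list) -> str:
    """Classify a pod from its containers' `resources` stanzas."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        # A request that is omitted defaults to the limit, if one is set.
        requests = {**limits, **resources.get("requests", {})}
        for name in ("cpu", "memory"):
            if name in limits or name in requests:
                any_set = True
            # Simplification: "1Gi" and "1024Mi" compare as different here.
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


def oom_score_adj(qos: str, memory_request_bytes: int, node_capacity_bytes: int) -> int:
    """Return the documented oom_score_adj for a container of the given QoS class."""
    if qos == "Guaranteed":
        return -997
    if qos == "BestEffort":
        return 1000
    adj = 1000 - (1000 * memory_request_bytes) // node_capacity_bytes
    return min(max(2, adj), 999)


pod = [{"resources": {"requests": {"memory": "1Gi"}, "limits": {"memory": "2Gi"}}}]
print(qos_class(pod), oom_score_adj(qos_class(pod), 2**30, 8 * 2**30))
```

In this sketch, a Burstable container that requests a larger share of the node's memory receives a lower adjustment, so it is less likely to be chosen than one that requests very little.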
A pod that is killed due to a memory issue is not necessarily evicted from a node: if the pod's restart policy is set to "Always", the kubelet will try to restart the killed container.
To see the QoS class of a pod, run the following command:
kubectl get pod [name] -o jsonpath='{.status.qosClass}'
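To see the oom_score of a process running in a pod, open a shell in the pod and read the values from /proc, where [pid] is the ID of the process (typically 1 for the container's main process):
kubectl exec -it [name] -- /bin/bash
cat /proc/[pid]/oom_score
cat /proc/[pid]/oom_score_adj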
The process with the highest oom_score is the first to be killed when the node runs out of memory.
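Since a higher oom_score means a greater chance of being killed, it can help to list processes by score. The sketch below reads oom_score for every numeric entry under a proc-style directory and sorts the results with the most likely victim first. The function name `rank_by_oom_score` is hypothetical, and the code assumes a Linux /proc layout.

```python
from pathlib import Path


def rank_by_oom_score(proc_root: str = "/proc") -> list:
    """Return (pid, oom_score) pairs, highest score (most likely victim) first."""
    scores = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            score = int((entry / "oom_score").read_text().strip())
        except (OSError, ValueError):
            continue  # the process exited or the file could not be read
        scores.append((int(entry.name), score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Only query the live system when a /proc directory actually exists.
if Path("/proc").is_dir():
    for pid, score in rank_by_oom_score()[:5]:
        print(pid, score)
```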
This is part of a series of articles about Exit Codes.
The following table shows the common causes of this error and how to resolve it. However, note there are many more causes of OOMKilled errors, and many cases are difficult to diagnose and troubleshoot.
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better manage and resolve OOMKilled (Exit Code 137) errors in Kubernetes:
Use tools like Prometheus and Grafana to monitor and analyze memory usage trends over time.
Regularly adjust memory requests and limits based on historical usage data to prevent over- or under-allocation.
Optimize your application to use libraries and algorithms that consume less memory.
Automatically adjust memory limits and requests based on real-time usage to handle load changes effectively.
Use profiling tools, such as the JVM's built-in tools, to detect and fix memory leaks in your application.
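As an illustration of adjusting requests and limits from historical usage, the sketch below derives suggested values from a list of memory samples. The policy is an assumption made for this example, not a recommendation from the article or from Kubernetes: the request is set to the nearest-rank 90th percentile and the limit to the observed peak plus 20% headroom. The function name and parameters are hypothetical.

```python
def suggest_memory_settings(samples_mib: list, headroom_percent: int = 120) -> dict:
    """Suggest a memory request and limit (in MiB) from integer usage samples."""
    if not samples_mib:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mib)
    # Nearest-rank 90th percentile, computed with integer arithmetic.
    rank = (90 * len(ordered) + 99) // 100
    p90 = ordered[rank - 1]
    peak = ordered[-1]
    # Round the limit up so the headroom is never smaller than requested.
    limit = (peak * headroom_percent + 99) // 100
    return {"request_mib": p90, "limit_mib": limit}


# Hypothetical usage samples in MiB, for example exported from a monitoring system.
history = [200, 210, 220, 230, 240, 250, 260, 270, 280, 400]
print(suggest_memory_settings(history))
```

The percentile and headroom should be tuned per workload: an application with sharp spikes may need more headroom than this example allows.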
Run kubectl describe pod [name] and save the content to a text file for future reference:
kubectl describe pod [name] > /tmp/troubleshooting_describe_pod.txt
In the saved describe pod text file, find the container's State and Last State fields, and look for the following:
State:          Running
  Started:      Thu, 10 Oct 2019 11:14:13 +0200
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
...
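If you save describe output regularly, this check can be scripted. The following minimal sketch searches the saved text for a terminated last state and extracts its reason and exit code. It assumes the field labels shown above and tolerates any whitespace between them; the function name `last_termination` is hypothetical.

```python
import re


def last_termination(describe_text: str):
    """Return (reason, exit_code) from the first terminated Last State, or None."""
    match = re.search(
        r"Last State:\s*Terminated\s+Reason:\s*(\S+)\s+Exit Code:\s*(\d+)",
        describe_text,
    )
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Sample text using the same fields as the describe output.
sample_text = """
State:          Running
  Started:      Thu, 10 Oct 2019 11:14:13 +0200
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
"""
print(last_termination(sample_text))
```

To run it against the saved file, read /tmp/troubleshooting_describe_pod.txt into a string and pass it to the function.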
Exit code 137 indicates that the container was terminated due to an out of memory issue. Now look through the events in the pod’s recent history, and try to determine what caused the OOMKilled error:
If the pod was terminated because container limit was reached:
If the pod was terminated because of overcommit on the node:
When adjusting memory requests and limits, keep in mind that when a node is overcommitted, Kubernetes kills pods in the following priority order: BestEffort pods first, then Burstable pods, and Guaranteed pods last.
To fully diagnose and resolve Kubernetes memory issues, you’ll need to monitor your environment, understand the memory behavior of pods and containers compared to the limits, and fine tune your settings. This can be a complex, unwieldy process without the right tooling.
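One simple check when comparing pod settings against node capacity is whether the memory limits of the pods on a node add up to more than the node can provide. The sketch below computes that ratio. It assumes you have already collected the limit strings and the node's allocatable memory (for example from kubectl describe node), handles only a common subset of Kubernetes quantity suffixes, and uses hypothetical function names.

```python
# Multipliers for a subset of Kubernetes quantity suffixes.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '1.5Gi' to bytes."""
    for suffix, multiplier in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * multiplier)
    return int(quantity)  # a plain number of bytes


def limit_overcommit_ratio(limits: list, allocatable: str) -> float:
    """Return the sum of memory limits divided by the node's allocatable memory."""
    total = sum(parse_memory(quantity) for quantity in limits)
    return total / parse_memory(allocatable)


# Hypothetical node with 4Gi allocatable memory and three containers.
ratio = limit_overcommit_ratio(["2Gi", "2Gi", "1Gi"], "4Gi")
print(f"limits are {ratio:.2f}x allocatable memory")
```

A ratio above 1.0 means the containers could collectively use more memory than the node has while each stays within its own limit, which is the overcommit situation described above.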
The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong—simply because it can.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Komodor acts as a single source of truth (SSOT) for all of your K8s troubleshooting needs.