As Kubernetes production deployments surge, understanding and preventing OOMKilled errors has become critical for the 93% of companies now using this orchestration platform.
The OOMKilled status in Kubernetes, flagged by exit code 137, signifies that the Linux Kernel has halted a container because it has surpassed its allocated memory limit.
In Kubernetes, each container within a pod can define two key memory-related parameters: a memory limit and a memory request. The memory limit is the ceiling of RAM usage that a container can reach before it is forcefully terminated, whereas the memory request is the baseline amount of memory that Kubernetes will guarantee for the container’s operation.
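As an illustration, here is a minimal pod spec that sets both values; the pod name, image, and numbers are placeholders to adapt to your workload, not recommended defaults:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo            # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx:1.25          # placeholder image
    resources:
      requests:
        memory: "256Mi"        # amount the scheduler reserves for this container
      limits:
        memory: "512Mi"        # ceiling; exceeding it triggers an OOM kill (exit code 137)
EOF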
When a container attempts to consume more memory than its set limit, the Linux OOM Killer terminates the container's process and Kubernetes marks the container status as 'OOMKilled'. This mechanism prevents a single container from exhausting the node's memory, which could compromise other containers running on the same node.
In scenarios where the combined memory consumption of all pods on a node exceeds available memory, Kubernetes may terminate one or more pods to stabilize the node’s memory pressure.
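To get a quick sense of whether a node is approaching this situation, kubectl top can show current memory usage per node and per pod. This is a sketch and assumes the metrics-server add-on is installed in the cluster:
kubectl top nodes
kubectl top pods -A --sort-by=memory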
To detect an OOMKilled event, use the kubectl get pods command, which will display the pod’s status as OOMKilled. For instance:
NAME       READY   STATUS      RESTARTS   AGE
my-pod-1   0/1     OOMKilled   0          3m12s
Resolving OOMKilled issues often starts with evaluating and adjusting the memory requests and limits of your containers and may also involve debugging memory spikes or leaks within the application. For in-depth cases that involve persistent or complex OOMKilled events, further detailed investigation and troubleshooting will be necessary.
OOMKilled is not actually native to Kubernetes: it relies on a feature of the Linux kernel, known as the OOM Killer, which Kubernetes uses to manage container lifecycles. The OOM Killer monitors node memory and selects processes that are taking up too much memory and should be killed. Note that the OOM Killer may kill a process even if there is free memory on the node.
The Linux kernel maintains an oom_score for each process running on the host. The higher this score, the greater the chance that the process will be killed. Another value, called oom_score_adj, allows users to customize the OOM process and define when processes should be terminated.
Kubernetes uses the oom_score_adj value when defining a Quality of Service (QoS) class for a pod. There are three QoS classes that may be assigned to a pod: Guaranteed, Burstable, and BestEffort.
Each QoS class has a matching value for oom_score_adj: Guaranteed pods get -997, BestEffort pods get 1000, and Burstable pods get a value between 2 and 999 that is higher the smaller their memory request is relative to the node's capacity.
Because "Guaranteed" pods have the lowest value, they are the last to be killed on a node that is running out of memory. "BestEffort" pods have the highest value and are the first to be killed.
A pod that is killed due to a memory issue is not necessarily evicted from a node: if the pod's restartPolicy is set to "Always", the kubelet will try to restart it.
To see the QoS class of a pod, run the following command:
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
To see the oom_score and oom_score_adj of a process inside a pod, open a shell in the container and read them from the proc filesystem (substitute the pod name and the process ID):
kubectl exec -it <pod-name> -- /bin/bash
cat /proc/<pid>/oom_score
cat /proc/<pid>/oom_score_adj
The process with the highest oom_score is the first to be killed when the node runs out of memory.
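If you want to review QoS classes across many pods at once, one option (a hedged example, not the only way) is to print them with custom columns:
kubectl get pods -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,QOS:.status.qosClass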
This is part of a series of articles about Exit Codes.
The following are the most common indicators of this error in kubectl describe pod output and what they mean. Note there are many more causes of OOMKilled errors, and many cases are difficult to diagnose and troubleshoot.
Last State: Terminated with Reason: OOMKilled and Exit Code: 137: the container exceeded its memory limit. Check for memory leaks or spikes in the application, or raise the limit if the application genuinely needs more memory. Because the kill signal is SIGKILL, terminationGracePeriodSeconds is not honored in this case.
Reason: Evicted with the message "The node was low on resource: memory": the node was overcommitted and the kubelet evicted the pod to relieve memory pressure. Revisit memory requests so scheduling reflects actual usage, or add node capacity.
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better manage and resolve OOMKilled (Exit Code 137) errors in Kubernetes:
Use tools like Prometheus and Grafana to monitor and analyze memory usage trends over time.
Regularly adjust memory requests and limits based on historical usage data to prevent over- or under-allocation.
Optimize your application to use libraries and algorithms that consume less memory.
Automatically adjust memory limits and requests based on real-time usage to handle load changes effectively.
Use profiling tools like JVM’s built-in tools to detect and fix memory leaks in your application.
Run kubectl describe pod [name] and save the content to a text file for future reference:
kubectl describe pod [name] > /tmp/troubleshooting_describe_pod.txt
Check the Events section of the describe pod text file, and look for the following message:
State:          Running
  Started:      Thu, 10 Oct 2019 11:14:13 +0200
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
...
Exit code 137 indicates that the container was terminated due to an out of memory issue. Now look through the events in the pod’s recent history, and try to determine what caused the OOMKilled error:
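One way to list the pod's recent events in chronological order is shown below; the pod name and namespace are placeholders:
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp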
If the pod was terminated because the container's memory limit was reached, decide whether the application genuinely needs more memory (in which case increase the memory limit and request) or whether a memory leak or sudden spike is pushing usage up and should be debugged in the application itself.
If the pod was terminated because of overcommit on the node, review the memory requests of the pods running on that node and adjust them so the scheduler's view of memory matches real usage, or add node capacity.
When adjusting memory requests and limits, keep in mind that when a node is overcommitted, Kubernetes kills pods according to the following priority order: BestEffort pods are killed first, then Burstable pods, and Guaranteed pods last.
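To check how overcommitted a node is, you can inspect its Allocated resources section, which compares the sum of pod memory requests and limits to the node's allocatable memory (the node name is a placeholder):
kubectl describe node <node-name>
# Look at the "Allocated resources" section: memory limits above 100% of allocatable means the node is overcommitted.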
To fully diagnose and resolve Kubernetes memory issues, you’ll need to monitor your environment, understand the memory behavior of pods and containers compared to the limits, and fine tune your settings. This can be a complex, unwieldy process without the right tooling.
The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong—simply because it can.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Komodor acts as a single source of truth (SSOT) for all of your K8s troubleshooting needs.
Exit code 137 does not always mean an OOM kill: it means the process received SIGKILL. If it was a memory limit OOM, you'll typically see Reason: OOMKilled in kubectl describe pod. If you don't see that reason, treat 137 as "killed" and confirm the real trigger via the pod's Events.
kubectl describe pod <pod-name> -n <namespace>
Related: Exit codes in Docker and Kubernetes
Look at each container’s status under containerStatuses. The container that was OOMKilled will show lastState.terminated.reason as OOMKilled and exitCode as 137.
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
Then pull logs for the specific container:
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous
OOMKilled usually means a container exceeded its memory limit and the kernel killed it. Evicted means the kubelet proactively terminated pods due to node pressure (for example, the node is low on memory) to keep the node stable.
# Check pod events for "Evicted" messaging
kubectl describe pod <pod-name> -n <namespace>

# Check node conditions (MemoryPressure, etc.)
kubectl describe node <node-name>
Kubernetes assigns a QoS class based on your requests/limits: BestEffort, Burstable, or Guaranteed. Under node pressure, Kubernetes evicts BestEffort pods first, then Burstable, and Guaranteed last.
Practical tip: For critical workloads, set memory request and limit (and consider making them equal) so you don’t end up as BestEffort.
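For example, a container whose CPU and memory requests equal its limits is placed in the Guaranteed class. This is a minimal sketch with placeholder names and values:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo        # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx:1.25          # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"            # requests == limits for every resource and container => Guaranteed QoS
        memory: "512Mi"
EOF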
Start with measured reality, not guesses: set the memory request close to the container's typical observed usage, and set the limit with enough headroom above its observed peak to absorb normal spikes.
Then adjust after you’ve observed memory over time (especially after releases and traffic spikes).
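To compare observed usage against what is currently configured, you can print each container's memory request and limit (pod name and namespace are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\t"}{.resources.limits.memory}{"\n"}{end}'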
Two common reasons: the container is still exceeding its own memory limit (for example, a leak or an undersized limit means usage keeps growing past whatever you set), or the node itself is under memory pressure and the kubelet is evicting pods regardless of the container's limit.
Confirm which path you’re on via pod Events and node conditions.
First, stabilize so you can debug:
# Scale down (Deployment example)
kubectl scale deploy/<deployment-name> -n <namespace> --replicas=0
Then fix the root cause: memory leak, unbounded cache, oversized payloads, or a regression introduced by a recent deploy.
VPA helps when your resource settings are consistently wrong and you want automated recommendations/updates for CPU/memory. It’s best for workloads with changing but trackable usage patterns.
Expect restarts when VPA applies updates (plan rollouts accordingly).
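A minimal VerticalPodAutoscaler object looks roughly like the following; this assumes the VPA components are installed in the cluster, and the Deployment name is a placeholder:
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa             # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # placeholder Deployment to autoscale
  updatePolicy:
    updateMode: "Auto"         # "Off" only emits recommendations without restarting pods
EOF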
If you see Reason: OOMKilled on the container status, it strongly points to the container limit path. If you see eviction messaging (or node MemoryPressure), it’s likely node pressure/overcommit.
kubectl describe pod <pod-name> -n <namespace>
kubectl describe node <node-name>
Monitoring container memory usage against limits together with node-level memory pressure catches "about to OOM" and "about to be evicted" scenarios early.
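For a quick manual check of the node-pressure side, you can print each node's MemoryPressure condition (a sketch; a real setup would alert on this via your monitoring stack):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}MemoryPressure={.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'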