It starts with a scale-down event or a memory spike. Kubernetes sends SIGTERM. The grace period ticks down. The container doesn’t exit. Thirty seconds later, SIGKILL fires and whatever state that container was holding is gone.
This scenario plays out across production clusters every day. Understanding the full signal lifecycle is how you engineer around it.
SIGKILL is a type of communication, known as a signal, used in Unix and Unix-like operating systems such as Linux to terminate a process immediately. It is used by Linux operators, and also by container orchestrators like Kubernetes, when they need to shut down a container or pod forcibly.
Linux commands 44.8% of the server operating system market, making the signals it uses to manage processes anything but an edge case.
A signal is a standardized message sent to a running program that triggers a specific action, such as terminating a process or handling an error. This form of Inter Process Communication (IPC) is how operating systems maintain control: when a signal is sent to a target process, the OS waits for atomic instructions to complete, then interrupts execution and handles the signal accordingly.
SIGKILL instructs the process to terminate immediately. It cannot be ignored or blocked. The process is killed, and if it is running threads, those are killed as well. If the SIGKILL signal fails to terminate a process and its threads, this indicates an operating system malfunction.
This is the strongest way to kill a process and can have unintended consequences, because there is no way to know whether the process completed its cleanup operations. Because it can result in data loss or corruption, it should only be used if there is no other option. In Kubernetes, a SIGTERM signal is always sent before SIGKILL, to give containers a chance to shut down gracefully.
For a deeper dive, see our related article on SIGTERM. This is part of a series of articles about Exit Codes.
In Linux and other Unix-like operating systems, there are several signals that can be used to stop or kill a process. The most common types are:

- SIGTERM (signal 15) – a polite request to terminate that the process can catch, handle, or ignore.
- SIGKILL (signal 9) – an immediate, forcible kill that cannot be caught, blocked, or ignored.
- SIGINT (signal 2) – an interrupt sent from the keyboard, typically Ctrl+C.
- SIGQUIT (signal 3) – similar to SIGINT, but also produces a core dump on exit.
- SIGHUP (signal 1) – sent when the controlling terminal closes; many daemons repurpose it to trigger a configuration reload.

A process terminated by a signal exits with code 128 + signal number, which is why a SIGKILLed container reports exit code 137. In Kubernetes, the delay between SIGTERM and SIGKILL is controlled by terminationGracePeriodSeconds, and a container killed for exceeding its memory limit is reported as OOMKilled.
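You can list the signals available on your system, and translate between numbers and names, with the kill command itself:

```shell
# List every signal the system supports
kill -l

# Translate a signal number to its name
kill -l 9    # KILL
kill -l 15   # TERM
```

This is handy when reading exit codes: subtract 128 from the code, then look the remainder up with kill -l.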
Itiel Shwartz, Co-Founder & CTO
In my experience, here are tips that can help you better manage SIGKILL (Signal 9) in Linux containers:
Customize the SIGTERM grace period in Kubernetes to suit your application’s shutdown requirements, ensuring clean termination before SIGKILL is sent.
Implement signal trapping in your containerized applications to handle SIGTERM and perform necessary cleanup.
Configure robust health checks and readiness probes to prevent sending SIGKILL to healthy containers.
Set appropriate resource limits and rightsize workloads to reduce OOMKilled errors, which often surface as SIGKILL.
For teams trying to balance reliability with spend, Kubernetes cost optimization is closely tied to setting requests and limits based on actual usage, not guesswork.
Regularly analyze logs for SIGTERM and SIGKILL signals to identify patterns and improve system reliability.
At scale, this kind of repetitive investigation is one of the clearest use cases for AI SRE and for an AI SRE agent that reduces MTTR and operational toil.
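As a minimal sketch of the signal-trapping tip above (the cleanup function and its echo messages are illustrative, not from the original), a shell entrypoint can catch SIGTERM and exit cleanly within the grace period:

```shell
#!/bin/sh
# Hypothetical container entrypoint that traps SIGTERM before exiting.

cleanup() {
  echo "SIGTERM received, flushing state..."  # stand-in for real cleanup work
  exit 0                                      # exit cleanly before SIGKILL would fire
}
trap cleanup TERM

echo "running (pid $$)"
kill -TERM $$    # simulate the orchestrator sending the stop signal
echo "never reached"
```

In Kubernetes terms, exiting within the termination grace period after SIGTERM means the container never sees the follow-up SIGKILL.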
If you are a Unix/Linux user, here is how to kill a process directly:

```shell
# Find the target process and note its PID
ps aux

# Ask the process to terminate gracefully (SIGTERM)
kill [ID]

# If it won't exit, force immediate termination (SIGKILL)
kill -9 [ID]
```
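To see the 128 + signal number convention in action, kill a background process with signal 9 and inspect its exit status (sleep 60 is just a stand-in for any long-running process):

```shell
# Start a long-running process in the background
sleep 60 &
pid=$!

# Force-kill it; kill -9 is shorthand for kill -s SIGKILL
kill -9 "$pid"

# Reap the child and report its exit status: 128 + 9 = 137
wait "$pid"
echo "exit status: $?"
```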
SIGKILL kills a running process instantly. For simple programs, this can be safe, but most programs are complex and made up of multiple procedures. Even seemingly insignificant programs perform transactional operations that must be cleaned up before completion.
If a program hasn’t completed its cleanup at the time it receives the SIGKILL signal, data may be lost or corrupted. You should use SIGKILL only in the following cases:

- The process ignores SIGTERM, or is stuck and will not respond to a graceful shutdown request.
- You deliberately want to stop the process without letting it run cleanup code, for example to preserve its state for forensic investigation.
- The process is suspected to be malicious and must be stopped immediately.
If you are a Kubernetes user, you can send a SIGKILL to a container by terminating a pod using the kubectl delete command. Kubernetes will first send the containers in the pod a SIGTERM signal. By default, Kubernetes gives containers a 30-second grace period, and afterwards sends a SIGKILL to terminate them immediately.
If your team is tracing shutdowns like this across many clusters and workloads, Komodor’s AI SRE Platform is built for visualizing, troubleshooting, and optimizing Kubernetes environments at scale.
When Kubernetes performs a scale-down event or updates pods as part of a Deployment, it terminates containers in three stages:

1. The kubelet runs the container’s preStop hook, if one is defined, and sends SIGTERM to the container’s main process.
2. Kubernetes waits for the termination grace period (30 seconds by default) while the container shuts down.
3. If the container is still running when the grace period expires, Kubernetes sends SIGKILL to terminate it immediately.
It is important to realize that while you can capture a SIGTERM in the container’s logs, by definition you cannot capture a SIGKILL, because it terminates the container immediately.
Not every pod shutdown follows the same path in Kubernetes. In the normal termination flow, Kubernetes starts with a graceful shutdown and only escalates to SIGKILL if the container is still running after the termination grace period expires. By default, that grace period is 30 seconds, but it can be changed with terminationGracePeriodSeconds.
In a standard pod deletion, rollout, or scale-down event, the kubelet begins shutdown by running any preStop hook first, if one is defined, and then asking the container runtime to send the stop signal to PID 1 in each container. In most cases that is SIGTERM, which gives the application a chance to finish in-flight work and clean up before exiting.
If the container is still running when the grace period ends, Kubernetes escalates to forcible shutdown. At that point, the container runtime sends SIGKILL to any remaining processes in the pod. This is the path most teams mean when they say a container was “killed by Kubernetes” after failing to stop cleanly.
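As a sketch of how these settings fit together (the pod name, image, and the 45-second value are illustrative assumptions, not from the article), a pod spec can extend the grace period and add a preStop hook like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-app            # hypothetical name for illustration
spec:
  # Extend the default 30-second window if shutdown needs more time
  terminationGracePeriodSeconds: 45
  containers:
    - name: app
      image: example.com/app:1.0   # placeholder image
      lifecycle:
        preStop:
          exec:
            # Runs before the kubelet sends SIGTERM to PID 1
            command: ["/bin/sh", "-c", "sleep 5"]
```

The grace period covers the preStop hook and the SIGTERM handling combined; only when it expires does the runtime escalate to SIGKILL.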
OOMKilled should be treated as a separate failure mode, not as a normal graceful-then-forced shutdown sequence. In practice, teams often see it as exit code 137, which is commonly associated with SIGKILL, but the troubleshooting flow is different: instead of asking “why didn’t the app exit during termination?”, the better question is “why did the container exceed available memory or hit its memory limit?”
To troubleshoot the difference, check the container’s last state with kubectl describe pod: an OOMKilled termination reason points to memory pressure, while exit code 137 without an OOM reason usually means the container failed to stop within its grace period. In the first case, compare the container’s memory usage against its requests and limits; in the second, look at how the application handles SIGTERM.
When SIGKILL is tied to memory pressure or badly sized requests and limits, the fix is usually not just incident response but better Kubernetes rightsizing and broader Kubernetes cost optimization practices.
If you are tuning managed clusters, see our guides to EKS cost optimization and GKE cost optimization for cloud-specific ways to reduce waste without trading away reliability.
Just like Kubernetes can send a SIGTERM or SIGKILL signal to shut down a regular container, it can send these signals to an NGINX Ingress Controller pod. However, NGINX handles signals in an unusual way: it treats SIGTERM and SIGINT as a fast shutdown, immediately closing all connections, and reserves graceful shutdown, in which in-flight requests are drained first, for the SIGQUIT signal.
And so, in a sense, NGINX treats SIGTERM and SIGINT like SIGKILL. If the controller is processing requests when the signal is received, it will drop the connections, resulting in HTTP server errors. To prevent this, you should always shut down the NGINX Ingress Controller using a QUIT command.
In the standard nginx-ingress-controller image (version 0.24.1), there is a command that can send NGINX the appropriate termination signal. Run this script to shut down NGINX gracefully by sending it a QUIT signal:

```shell
# Ask NGINX to shut down gracefully, then wait for it to exit
/usr/local/openresty/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit
while pgrep -x nginx; do
  sleep 1
done
```
SIGKILL is handled entirely by the operating system (the kernel). When a SIGKILL is sent for a particular process, the kernel scheduler immediately stops giving the process CPU time to execute user space code. When the scheduler makes this decision, if the process has threads executing code on different CPUs or cores, those threads are also stopped.
When a SIGKILL signal is delivered while a process or thread is executing system calls or I/O operations, the kernel switches the process to a “dying” state and still schedules it enough CPU time to resolve its in-flight kernel work.
Non-interruptible operations run until they are complete, but the kernel checks for the “dying” status before letting the process run any more user-space code. Interruptible operations terminate early once they detect that the process is dying. When all outstanding operations are complete, the process is given “dead” status.
When kernel operations complete, the kernel starts cleaning up the process, just like when a program exits normally. An exit code higher than 128 (128 plus the signal number, so 137 for SIGKILL) is assigned to the process, indicating that it was killed by a signal. A process killed by SIGKILL has no chance to process the received SIGKILL message.
At this stage the process transitions to “zombie” status, and the parent process is notified using the SIGCHLD signal. Zombie status means that the process has been killed, but the parent process can still read the dead process’s exit code using the wait(2) system call. The only resource consumed by a zombie process is a slot in the process table, which stores the process ID, exit code, and other statistics useful for troubleshooting.
If a zombie process lingers for more than a few minutes, this usually indicates a problem in its parent process, which should have reaped it promptly.
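The zombie lifecycle above can be observed from a shell, which plays the role of the parent process (sleep 60 is an arbitrary child):

```shell
# Start a child, SIGKILL it, and leave it unreaped for a moment
sleep 60 &
pid=$!
kill -9 "$pid"
sleep 1                        # give the kernel time to finish the kill

# Until the parent calls wait, the child shows up with STAT "Z" (zombie)
ps -o stat= -p "$pid" || true  # "|| true" in case ps is unavailable

# Reaping reads the exit code (137 = 128 + 9) and frees the process-table slot
wait "$pid"
echo "reaped with status $?"
```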
SIGKILL issues are rarely just about one signal. In real environments, they sit at the intersection of application shutdown behavior, resource tuning, and platform-team response time. That is where AI SRE and a Kubernetes platform become useful, because they connect troubleshooting, optimization, and operational scale instead of treating them as separate problems.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Acting as a single source of truth (SSOT) for all of your K8s troubleshooting needs, Komodor surfaces the events, changes, and context teams need to resolve issues quickly.
What is SIGKILL?
SIGKILL is a Unix/Linux signal that immediately and forcibly terminates a process. Unlike other signals, it cannot be ignored, blocked, or handled by the process itself. The operating system kernel executes it directly. It kills the process and all its threads instantly. Because it bypasses graceful shutdown, it can cause data loss or corruption and should only be used as a last resort.

What is the difference between SIGTERM and SIGKILL?
SIGTERM (Signal 15) is a gentle termination request that a process can catch, handle, or delay, allowing it to clean up before exiting. SIGKILL (Signal 9) is an immediate, unconditional kill that cannot be intercepted or ignored. Kubernetes always sends SIGTERM first, giving containers a 30-second grace period, and only sends SIGKILL if the container hasn’t exited by then.

How does Kubernetes use SIGTERM and SIGKILL to terminate pods?
When Kubernetes terminates a pod, it sends SIGTERM to all containers in the pod first. Containers have a default 30-second grace period to shut down cleanly. If a container is still running after that window expires, the kubelet sends a SIGKILL signal, which forces immediate termination. This grace period is configurable via terminationGracePeriodSeconds in the pod spec.

What exit code indicates a container was killed by SIGKILL?
A container forcefully terminated by SIGKILL exits with exit code 137 (128 + signal number 9). If a container shut down gracefully in response to SIGTERM, it exits with exit code 143 (128 + 15). Checking the exit code via kubectl describe pod is the fastest way to determine whether a container was killed forcefully or terminated on its own terms.

When should you use SIGKILL?
Use SIGKILL only when: a process is stuck in its cleanup routine and won’t respond to SIGTERM; you intentionally want to preserve the process state for forensic investigation; or the process is suspected to be malicious. For normal shutdowns, always attempt SIGTERM first. Skipping SIGTERM and jumping straight to SIGKILL risks data corruption or incomplete transactional operations.

Can an application catch or log SIGKILL?
No. Because SIGKILL immediately terminates the container at the kernel level, there is no opportunity for the container to log the event. You can capture SIGTERM in logs and should handle it in your application code, but SIGKILL leaves no application-level trace. To investigate SIGKILL events, check pod-level events using kubectl describe pod or host-level process monitoring.