What is SIGKILL (signal 9)
SIGKILL is a type of communication, known as a signal, used in Unix or Unix-like operating systems like Linux to immediately terminate a process. It is used by Linux operators, and also by container orchestrators like Kubernetes, when they need to shut down a container or pod on a Unix-based operating system.
A signal is a standardized message sent to a running program that triggers a specific action (such as terminating or handling an error). It is a type of Inter Process Communication (IPC). When an operating system sends a signal to a target process, it waits for atomic instructions to complete, and then interrupts the execution of the process, and handles the signal.
SIGKILL instructs the process to terminate immediately. It cannot be ignored or blocked. The process is killed, and if it is running threads, those are killed as well. If the SIGKILL signal fails to terminate a process and its threads, this indicates an operating system malfunction.
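The difference between a signal that can be ignored and one that cannot is easy to see in a shell. The following is a minimal sketch, assuming a POSIX-style shell. The child process here is a throwaway loop created only for the demonstration.

```shell
#!/bin/sh
# Start a child shell that ignores SIGTERM and then loops forever.
sh -c 'trap "" TERM; while :; do sleep 1; done' &
pid=$!
sleep 1                      # give the child time to install its trap

kill -TERM "$pid"            # the child ignores this
sleep 1
if kill -0 "$pid" 2>/dev/null; then
  echo "still running after SIGTERM"
fi

kill -KILL "$pid"            # cannot be trapped, blocked, or ignored
wait "$pid"
status=$?
echo "exit status: $status"
```

The reported exit status should be 137, which is 128 plus the signal number 9.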
This is the strongest way to kill a process and can have unintended consequences, because there is no way to know whether the process has completed its cleanup operations. Because it can result in data loss or corruption, it should only be used if there is no other option. In Kubernetes, a SIGTERM signal is normally sent before SIGKILL, to give containers a chance to shut down gracefully.
What are SIGTERM (Signal 15) and SIGKILL (Signal 9)? Options for Killing a Process in Linux
In Linux and other Unix-like operating systems, there are several operating system signals that can be used to kill a process.
The most common types are:
- SIGKILL (also known as Unix signal 9): kills the process abruptly, producing a fatal error. It is always effective at terminating the process, but can have unintended consequences.
- SIGTERM (also known as Unix signal 15): tries to kill the process, but can be blocked or handled in various ways. It is a more gentle way to kill a process.
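The two numbers are easy to mix up, and the shell can confirm them. A small sketch using the standard kill -l option, which translates a signal number into its name (the variable names are only for illustration):

```shell
#!/bin/sh
# Ask the shell which signal names belong to numbers 9 and 15.
sig9=$(kill -l 9)
sig15=$(kill -l 15)
echo "signal 9 is SIG$sig9, signal 15 is SIG$sig15"
```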
Using the Kill -9 Command
If you are a Unix/Linux user, here is how to kill a process directly:
- List currently running processes. The command ps -aux shows a detailed list of all running processes belonging to all users and system daemons.
- Identify the process ID of the process you need to kill.
- Do one of the following:
  - Run the kill [ID] command to try killing the process using the SIGTERM signal
  - Run the kill -9 [ID] command to kill the process immediately using the SIGKILL signal
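In practice the two commands are often combined: try SIGTERM first, and fall back to SIGKILL only if the process is still there after a timeout. The sketch below shows one way to do that. stop_process is a hypothetical helper name, not a standard command, and the 10 second default timeout is an arbitrary assumption.

```shell
#!/bin/sh
# stop_process PID [TIMEOUT_SECONDS]
# Hypothetical helper: send SIGTERM, wait, then escalate to SIGKILL.
stop_process() {
  pid=$1
  timeout=${2:-10}

  # Ask politely first. If the signal cannot be sent, the process is gone.
  kill -TERM "$pid" 2>/dev/null || return 0

  # Poll once per second until the process exits or the timeout runs out.
  while [ "$timeout" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    timeout=$((timeout - 1))
  done

  # Still alive: escalate.
  kill -0 "$pid" 2>/dev/null || return 0
  echo "process $pid did not exit after SIGTERM, sending SIGKILL" >&2
  kill -KILL "$pid" 2>/dev/null
}

# Example usage (the PID is a placeholder):
#   stop_process 12345 5
```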
When Should you Use SIGKILL as a Unix/Linux User?
SIGKILL kills a running process instantly. For simple programs, this can be safe, but most programs are complex and made up of multiple procedures. Even seemingly insignificant programs perform transactional operations that must be cleaned up before completion.
If a program hasn’t completed its cleanup at the time it receives the SIGKILL signal, data may be lost or corrupted. You should use SIGKILL only in the following cases:
- A process has a bug or issue during its cleanup process
- You don’t want the process to clean itself up, to retain data for troubleshooting or forensic investigation
- The process is suspicious or known to be malicious
How Can You Send SIGKILL to a Container in Kubernetes?
If you are a Kubernetes user, you can send a SIGKILL to a container by terminating a pod using the kubectl delete command.
Kubernetes will first send the containers in the pod a SIGTERM signal. By default, Kubernetes gives containers a 30 second grace period, and afterwards sends a SIGKILL to terminate them immediately.
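The grace period can be set per deletion with the --grace-period flag of kubectl delete. The sketch below wraps the two common variants in hypothetical helper functions (delete_pod and force_delete_pod are illustrative names), so nothing runs against a cluster until you call them with a pod name.

```shell
#!/bin/sh
# Hypothetical helpers around kubectl delete; the pod name is supplied by the caller.

# Delete a pod, allowing up to $2 seconds (default 30) between SIGTERM and SIGKILL.
delete_pod() {
  kubectl delete pod "$1" --grace-period="${2:-30}"
}

# Request immediate deletion with no grace period.
# Use with care: containers get no time to clean up.
force_delete_pod() {
  kubectl delete pod "$1" --grace-period=0 --force
}

# Example usage (the pod name is a placeholder):
#   delete_pod my-pod 60
```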
The Kubernetes Pod Termination Process and SIGKILL
When Kubernetes performs a scale-down event or updates pods as part of a Deployment, it terminates containers in three stages:
- The kubelet sends a SIGTERM signal to the container. You can handle this signal to gracefully terminate applications running on the container, and perform customized cleanup tasks.
- By default, Kubernetes gives containers a grace period of 30 seconds to shut down. This value is customizable.
- If the container does not exit and the grace period ends, the kubelet sends a SIGKILL signal, which causes the container to shut down immediately.
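Handling SIGTERM in the first stage can be as simple as a trap in the container's entrypoint script. The following is a minimal sketch that simulates it locally: entrypoint is a hypothetical stand-in for a real service, and the kill command plays the role of the kubelet.

```shell
#!/bin/sh
# Hypothetical entrypoint: a loop that cleans up when it receives SIGTERM.
entrypoint() {
  trap 'echo "SIGTERM received, cleaning up"; exit 0' TERM
  echo "service started"
  while :; do
    # Sleep in the background and wait on it, so the trap runs
    # as soon as the signal arrives instead of after the sleep ends.
    sleep 1 &
    wait $!
  done
}

entrypoint &                 # run the "container" in the background
pid=$!
sleep 1                      # let it install its trap

kill -TERM "$pid"            # what the kubelet does in the first stage
wait "$pid"
status=$?
echo "exit status: $status"
```

Because the handler calls exit 0 itself, the exit status here is 0 rather than 143. A process that handles SIGTERM chooses its own exit code.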
It is important to realize that while you can capture a SIGTERM in the container’s logs, by definition, you cannot capture a SIGKILL signal, because it immediately terminates the container.
Which Kubernetes Errors are Related to SIGKILL?
Any Kubernetes error that results in a pod shutting down will result in a SIGTERM signal sent to containers within the pod, and subsequently SIGKILL.
In the case of an OOMKilled error, where a container is killed because it exceeded its allotted memory, there is no SIGTERM stage: the Linux kernel's out-of-memory (OOM) killer sends a SIGKILL signal to the container process immediately.
To troubleshoot a container that was terminated by Kubernetes:
- At the Kubernetes level, you will see the Kubernetes error by running kubectl describe pod
- At the container level, you will see the exit code 143 if the container terminated gracefully with SIGTERM, or exit code 137 if it was forcefully terminated using SIGKILL
- At the host level, you will see the SIGTERM and SIGKILL signals sent to the container processes.
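Both exit codes follow the same convention: 128 plus the signal number, so 143 is 128 + 15 and 137 is 128 + 9. A small sketch that decodes them, assuming that convention (describe_exit is a hypothetical helper name):

```shell
#!/bin/sh
# describe_exit CODE
# Hypothetical helper: explain an exit code using the 128 + signal convention.
describe_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    signum=$((code - 128))
    signame=$(kill -l "$signum")
    echo "exit code $code: killed by SIG$signame (signal $signum)"
  else
    echo "exit code $code: process exited on its own"
  fi
}

describe_exit 137
describe_exit 143
```

Note that this is only a convention: a program can also call exit with a value above 128 on its own, so treat the result as a strong hint rather than proof.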
How Does SIGKILL Impact NGINX Ingress Controllers?
Just like Kubernetes can send a SIGTERM or SIGKILL signal to shut down a regular container, it can send these signals to an NGINX Ingress Controller pod. However, NGINX handles signals in an unusual way:
- When receiving SIGTERM, SIGINT – NGINX performs fast shutdown. The master process instructs the worker process to exit, waits only 1 second, and then sends it a SIGKILL signal.
- When receiving QUIT – NGINX performs graceful shutdown. It closes the listening ports to avoid receiving more requests, closes idle connections, and only exits after all worker processes exit.
And so, in a sense, NGINX treats SIGTERM and SIGINT like SIGKILL. If the controller is processing requests when the signal is received, it will drop the connections, resulting in HTTP server errors. To prevent this, you should always shut down the NGINX Ingress Controller using a QUIT signal.
Shutting down the NGINX Ingress Controller with QUIT instead of SIGTERM
In the standard nginx-ingress-controller image (version 0.24.1), there is a command that can send NGINX the appropriate termination signal. Run this script to shut down NGINX gracefully by sending it a QUIT signal:
/usr/local/openresty/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit
while pgrep -x nginx; do
  sleep 1
done
Under the Hood: How the SIGKILL Signal Works
SIGKILL is handled entirely by the operating system (the kernel). When a SIGKILL is sent for a particular process, the kernel scheduler immediately stops giving the process CPU time to execute user space code. When the scheduler makes this decision, if the process has threads executing code on different CPUs or cores, those threads are also stopped.
What happens when a process is killed while executing kernel code?
When a SIGKILL signal is delivered, if a process or thread is executing system calls or I/O operations, the kernel switches the process to “dying” state. The kernel schedules CPU time to allow the dying process to resolve its remaining concerns.
Non-interruptible operations will run until they are complete (but check for “dying” status before running more user-space code). Interruptible operations, when they identify the process is “dying”, terminate prematurely. When all operations are complete, the process is given “dead” status.
What happens when a process is marked “dead”?
When kernel operations complete, the kernel starts cleaning up the process, just like when a program exits normally. A result code higher than 128 is given to the process, indicating that it was killed by a signal. A process killed by SIGKILL has no chance to process the received SIGKILL message.
At this stage the process transitions to “zombie” status, and the parent process is notified using the SIGCHLD signal. Zombie status means that the process has been killed, but the parent process can read the dead process’s exit code using the wait(2) system call. The only resource consumed by a zombie process is a slot in the process table, which stores the process ID, exit status, and other “critical statistics” that enable troubleshooting.
If a process remains a zombie for more than a few minutes, this probably indicates an issue with the workflow of its parent process, which is not collecting the exit status.
Troubleshooting Kubernetes Pod Termination with Komodor
As a Kubernetes administrator or user, pods or containers terminating unexpectedly can be a pain and can result in severe production issues. The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective and time-consuming.
Some best practices can help minimize the chances of SIGTERM or SIGKILL signals affecting your applications, but eventually, something will go wrong—simply because it can.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Acting as a single source of truth (SSOT) for all of your k8s troubleshooting needs, Komodor offers:
- Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
- In-depth visibility: A complete activity timeline, showing all code and config changes, deployments, alerts, code diffs, pod logs and more, all within one pane of glass with easy drill-down options.
- Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
- Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.