What is SIGSEGV
SIGSEGV, also known as a segmentation violation or segmentation fault, is a signal used by Unix-based operating systems (such as Linux). It indicates an attempt by a program to write or read outside its allocated memory—either because of a programming error, a software or hardware compatibility issue, or a malicious attack, such as buffer overflow.
SIGSEGV is indicated by the following codes:
- In Unix/Linux, SIGSEGV is operating system signal 11
- In Docker containers, when a Docker container terminates due to a SIGSEV error, it throws exit code 139
The default action for SIGSEGV is abnormal termination of the process. In addition, the following may take place:
- A core file is typically generated to enable debugging
- SIGSEGV signals may logged in more detail for troubleshooting and security purposes
- The operating system may perform platform-specific operations
- The operating system may allow the process itself to handle the segmentation violation
SIGSEGV is a common cause for container termination in Kubernetes. However, Kubernetes does not trigger SIGSEGV directly. To resolve the issue, you will need to debug the problematic container or the underlying host.
SIGSEGV (exit code 139) vs SIGABRT (exit code 134)
SIGSEGV and SIGABRT are two Unix signals that can cause a process to terminate.
SIGSEGV is triggered by the operating system, which detects that a process is carrying out a memory violation, and may terminate it as a result.
SIGABRT (signal abort) is a signal triggered by a process itself. It abnormally terminates the process, closes and flushes open streams. Once it is triggered, it cannot be blocked by the process (similar to SIGKILL, but different in that SIGKILL is triggered by the operating system).
Before the SIGABRT signal is sent, the process may:
- Call the
abort()function in the
libclibrary, which unlocks the SIGABRT signal. Then the process can abort itself by triggering SIGABRT
- Call the
assert()macro, which is used in debugging, and aborts the program using SIGABRT if the assertion is false.
Exit codes 139 and 134 are parallel to SIGSEGV and SIGABRT in Docker containers:
- Docker exit code 139—means the container received a SIGSEGV by the underlying operating system due to a memory violation
- Docker exit code 134—means the container triggered a SIGABRT and was abnormally terminated
What Causes SIGSEGV?
Modern general-purpose computing systems include memory management units (MMUs). An MMU enables memory protection in operating systems like Linux—preventing different processes from accessing or modifying each other’s memory, except via a strictly controlled API. This simplifies troubleshooting and makes processes more resilient, because they are carefully isolated from each other.
A SIGSEGV signal or segmentation error occurs when a process attempts to use a memory address that was not assigned to it by the MMU. This can happen for three common reasons:
- Coding error—segmentation violations can occur if a process is not initialized properly, or if it tries to access memory through a pointer to previously freed memory. This will result in a segmentation violation in a specific process or binary file under specific circumstances.
- Incompatibility between binaries and libraries—if a process runs a binary file that is not compatible with a shared library, it can result in segmentation violations. For example, if a developer updates a library, changing its binary interface, but does not update the version number, an older binary may be loaded against the newer version. This may result in the older binary trying to access inappropriate memory addresses.
- Hardware incompatibility or misconfiguration—if segmentation violations occur frequently across multiple libraries, with no repeating pattern, this may indicate a problem with the memory subsystems on the machine or improper low-level system configuration settings.
Handling SIGSEGV Errors
On a Unix-based operating system, by default, a SIGSEGV signal will result in abnormal termination of the violating process.
Additional actions performed by the operating system
In addition to terminating the process, the operating system may generate core files to assist with debugging, and can also perform other platform-dependent operations. For
example, on Linux, you can use the grsecurity utility to log SIGSEGV signals in detail, to monitor for related security risks such as buffer overflow.
Allowing the process to handle SIGSEGV
On Linux and Windows, the operating system allows processes to handle their response to segmentation violations. For example, the program can collect a stack trace with information like processor register values and the memory addresses that were involved in the segmentation fault.
An example of this is segvcatch, a C++ library that supports multiple operating systems, and is able to convert segmentation faults and other hardware related exceptions to software language exceptions. This makes it possible to handle “hard” errors like segmentation violations with simple try/catch code. This makes it possible for software to identify a segmentation violation and correct it during program execution.
When troubleshooting segmentation errors, or testing programs to avoid these errors, there may be a need to intentionally cause a segmentation violation to investigate its impact. Most operating systems make it possible to handle SIGSEGV in such a way that they will allow the program to run even after the segmentation error occurs, to allow for investigation and logging.
Troubleshooting Common Segmentation Faults in Kubernetes
SIGSEGV faults are highly relevant for Kubernetes users and administrators. It is fairly common for a container to fail due to a segmentation violation.
However, unlike other signals such as SIGTERM and SIGKILL, Kubernetes does not trigger a SIGSEGV signal directly. Rather, the host machine on a Kubernetes node can trigger SIGSEGV when a container is caught performing a memory violation. The container then terminates, Kubernetes detects this, and may attempt to restart it depending on the pod configuration.
When a Docker container is terminated by a SIGSEGV signal, it throws exit code 139. This can indicate:
- An issue with application code in one of the libraries running on the container
- An incompatibility between different libraries running on the container
- An incompatibility between those libraries and hardware on the host
- Issues with the host’s memory management systems or a memory misconfiguration
To debug and resolve a SIGSEGV issue on a container, follow these steps:
- Get root access to the host machine, and review the logs to see additional information about the buggy container. A SIGSEGV error looks like the following in kubelet logs:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1bdaed0]
- Try to identify in which layer of the container image the error occurs—it could be in your specific application code, or lower down in the base image of the container.
docker pull [image-id]to pull the image for the container terminated by SIGSEGV.
- Make sure that you have debugging tools (e.g. curl or vim) installed, or add them.
kubectlto execute into the container. See if you can replicate the SIGSEGV error to confirm which library is causing the issue.
- If you have identified the library or libraries causing the memory violation, try to modify your image to fix the library causing the memory violation, or replace it with another library. Very often, updating a library to a newer version, or a version that is compatible with the environment on the host, will resolve the issue.
- If you cannot identify a library that is consistently causing the error, the problem may be on the host. Check for problems with the host’s memory configuration or memory hardware.
The process above can help you resolve straightforward SIGSEGV errors, but in many cases troubleshooting can become very complex and require non-linear investigation involving multiple components. That’s exactly why we built Komodor – to troubleshoot memory errors and other complex Kubernetes issues before they get out of hand.
Troubleshooting Kubernetes Container Termination with Komodor
As a Kubernetes administrator or user, pods or containers terminating unexpectedly can be a pain, and can result in severe production issues. Container termination can be a result of multiple issues in different components and can be difficult to diagnose. The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective and time-consuming.
Some best practices can help minimize the chances of SIGSEGV or SIGABRT signals affecting your applications, but eventually something will go wrong—simply because it can.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Acting as a single source of truth (SSOT) for all of your k8s troubleshooting needs, Komodor offers:
- Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
- In-depth visibility: A complete activity timeline, showing all code and config changes, deployments, alerts, code diffs, pod logs and etc. All within one pane of glass with easy drill-down options.
- Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
- Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.
Workflows automatically detects K8s issues, responding with a sequence of checks to quickly pinpoint the root cause and recommendation for the fix!...
These are the KubeCon 2021 talks you wouldn't want to miss!...
Metadata is essential for grouping resources, redirecting requests and managing deployments. Learn the best practices of using labels and annotations....