What are 5xx Errors
5xx errors are returned as part of the Hypertext Transfer Protocol (HTTP), which is the basis for much of the communication on the Internet and private networks. A 5xx error means “an error number starting with 5”, such as 500 or 503. 5xx errors are server errors—meaning the server encountered an issue and is not able to serve the client’s request.
5xx errors can be encountered when:
- A user browses a website and the web server is experiencing an error
- A software program accesses an API and the API server returns an error
- A component of a distributed system like Kubernetes fails to serve requests from other components
The most common 5xx errors are:
- 500 – Internal Server Error
- 501 – Not Implemented
- 502 – Bad Gateway
- 503 – Service Unavailable
- 504 – Gateway Timeout
- 509 – Bandwidth Limit Exceeded
- 511 – Network Authentication Required
In most cases, the client cannot do anything to resolve a 5xx error. Typically, the error indicates that the server has a software, hardware, or configuration problem that must be remediated.
This is part of an extensive series of guides about Observability.
What are HTTP Status Codes
HTTP is a client-server protocol—the client, known as a user-agent, connects to a server and makes requests. The server receives each request, handles it, and returns a response. It is common to have intermediaries known as proxies between the client and server, which relay requests and responses to their destination.
An HTTP request looks like this:
- The method indicates what operation the client wants to perform on the server. For example, GET means the client wants to read information.
- The version indicates which HTTP version is used by the client.
An HTTP response looks like this:
- The version indicates which HTTP version is implemented by the server.
- The status code is the response code. If this is a number starting with 5, the response indicates a server error.
- The status message is a verbal description of the error, which the client can display to the end-user.
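As a rough illustration of the two message formats, the sketch below takes a hand-written request and response and splits the first line of each into the parts described above. The host, path, and header values are made up for the example, and the parsing is deliberately minimal rather than a complete HTTP parser.

```python
# Hand-written example messages; the host, path, and headers are illustrative.
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "\r\n"
)
RAW_RESPONSE = (
    "HTTP/1.1 503 Service Unavailable\r\n"
    "Retry-After: 120\r\n"
    "\r\n"
)


def parse_request_line(raw: str) -> tuple[str, str, str]:
    """Split the first line of a request into (method, path, version)."""
    method, path, version = raw.split("\r\n", 1)[0].split(" ", 2)
    return method, path, version


def parse_status_line(raw: str) -> tuple[str, int, str]:
    """Split the first line of a response into (version, status code, status message)."""
    version, code, message = raw.split("\r\n", 1)[0].split(" ", 2)
    return version, int(code), message


print(parse_request_line(RAW_REQUEST))
print(parse_status_line(RAW_RESPONSE))
```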
HTTP groups status codes into the following classes:
- 1xx informational response – request was received and server continues working.
- 2xx successful – request was received and successfully performed.
- 3xx redirection – the request was redirected to another URL.
- 4xx client error – the request was incorrect or invalid and cannot be fulfilled.
- 5xx server error – problem on the server preventing it from fulfilling the request.
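The grouping above depends only on the first digit of the status code. A minimal Python sketch of that classification follows; the function names are illustrative, and the reason-phrase lookup relies on the standard library's `http.HTTPStatus` enum, which does not include non-standard codes such as 509.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the group a status code belongs to, based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def is_server_error(code: int) -> bool:
    """True for any code in the 500-599 range, including non-standard ones."""
    return 500 <= code <= 599


def describe(code: int) -> str:
    """Look up the standard reason phrase, if Python's http module knows the code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "(not in Python's HTTPStatus enum)"


for code in (200, 404, 503, 509):
    print(code, status_class(code), describe(code))
```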
Tips from the expert
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better handle 5xx server errors:
- Implement robust logging: Ensure detailed logging is in place to capture the context of server errors.
- Use APM tools: Employ application performance monitoring tools to diagnose and resolve issues causing 5xx errors.
- Set up error alerting: Configure alerts to notify your team immediately when 5xx errors occur.
- Analyze traffic patterns: Study traffic patterns to identify and mitigate spikes that may lead to 5xx errors.
- Deploy in blue-green or canary: Use blue-green or canary deployments to minimize the impact of changes causing 5xx errors.
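To make the alerting tip concrete, here is a small sketch of one possible approach: tracking the share of 5xx responses over the most recent requests and flagging when it crosses a threshold. The class name, window size, and threshold are assumptions chosen for illustration; a production setup would more likely rely on a monitoring or APM tool.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the share of 5xx responses among the most recent responses."""

    def __init__(self, window_size: int = 100, threshold: float = 0.05):
        self.threshold = threshold
        # Each entry is True if the response was a 5xx error, else False.
        self._recent = deque(maxlen=window_size)

    def record(self, status_code: int) -> None:
        self._recent.append(500 <= status_code <= 599)

    def error_rate(self) -> float:
        if not self._recent:
            return 0.0
        return sum(self._recent) / len(self._recent)

    def should_alert(self) -> bool:
        # Alert only when the rate is strictly above the threshold.
        return self.error_rate() > self.threshold
```

A caller would invoke `record()` for every response and check `should_alert()` periodically, or after each response, before notifying the team.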
Why Should You Care About 5xx Errors?
Significance of 5xx Errors for Web Admins
For a website owner or developer, a 5xx error indicates that a website user attempted to access a URL and could not view it. In addition, if search engine crawlers access a website and receive a 5xx error, they might abandon the request and remove the URL from the search index, which can have severe consequences for a website’s traffic.
Significance of 5xx Errors for API Developers
A 5xx error returned by an API indicates that the API is down, undergoing maintenance, or is experiencing another issue. When an API endpoint experiences a problem, returning a 5xx error code is good, expected behavior, and can help clients understand what is happening and handle the error on the client side.
In microservices architectures, it is generally advisable to make services resilient to errors in upstream services, meaning that a service can continue functioning even if an API it relies on returns an error.
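One simple way to make a service tolerate 5xx errors from a dependency is to fall back to the last good response, or to a default, instead of failing. The sketch below illustrates this idea only; `UpstreamServerError` and `ResilientReader` are hypothetical names, and real systems often combine this with retries, timeouts, or circuit breakers.

```python
from typing import Callable, Optional


class UpstreamServerError(Exception):
    """Hypothetical error raised when an upstream API answers with a 5xx status."""

    def __init__(self, status_code: int):
        super().__init__(f"upstream returned {status_code}")
        self.status_code = status_code


class ResilientReader:
    """Wrap an upstream call and fall back to the last good value on 5xx errors."""

    def __init__(self, fetch: Callable[[], dict], default: dict):
        self._fetch = fetch
        self._default = default
        self._last_good: Optional[dict] = None

    def get(self) -> dict:
        try:
            self._last_good = self._fetch()
            return self._last_good
        except UpstreamServerError:
            # Degrade gracefully instead of propagating the failure to our own callers.
            if self._last_good is not None:
                return self._last_good
            return self._default
```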
Significance of 5xx Errors for Kubernetes Users
In Kubernetes, a 5xx error can indicate:
- A node-level terminating condition—the node is not functioning or unable to respond to a request.
- A pod-level terminating condition—the pod may have been terminated (SIGKILL), or is about to be terminated and is currently in the termination grace period (SIGTERM).
Learn more in our detailed guide to Kubernetes troubleshooting
Understanding Different 5xx Server Error Codes
500—Internal Server Error
This error indicates that the server encountered an unexpected condition that was not specifically handled. Typically, this means the application could not fulfill the request because of a bug or an incorrect configuration.
501—Not Implemented
This error indicates that the server does not support the functionality requested by the client, or does not recognize the requested method. This could indicate that the server might support this functionality in the future.
502—Bad Gateway
This error indicates that the server is acting as a proxy or gateway and received an invalid response from an upstream server. In other words, the proxy could not obtain a valid response from the destination server.
Related content: Read our guide to Kubernetes 502 bad gateway.
503—Service Unavailable
This error indicates that the server is temporarily incapable of handling the request, for example because it is undergoing maintenance or is experiencing excessive loads.
The server may indicate the expected length of the delay in the Retry-After header. If there is no value in the Retry-After header, this response is functionally equivalent to response code 500.
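A client that wants to honor the Retry-After header has to handle both forms the header can take: a number of seconds or an HTTP date. The following is a minimal sketch under stated assumptions: the function name and the 5-second default used when the header is missing or unparseable are illustrative choices, not part of any standard.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(
    retry_after: Optional[str], now: datetime, default: float = 5.0
) -> float:
    """Return how long to wait before retrying a 503 response."""
    if retry_after is None:
        return default  # no hint from the server: use our own default
    value = retry_after.strip()
    if value.isdecimal():
        return float(value)  # delay given directly in seconds
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # Never return a negative delay if the date is already in the past.
    return max(0.0, (when - now).total_seconds())
```

The caller passes the current time explicitly (for example `datetime.now(timezone.utc)`), which keeps the function easy to test.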
Learn more in our detailed guide to Kubernetes service 503.
504—Gateway Timeout
This error indicates that an upstream server is not responding to the proxy in a timely manner. This does not necessarily indicate a failure of the upstream server; it may only be a delay in receiving a response, for example due to a connectivity or latency issue.
505—HTTP Version Not Supported
This error indicates that the web server does not support the major HTTP version that was used by the request. The response contains an entity stating why the version is not supported, and providing other protocol versions that the server does support.
506—Variant Also Negotiates
This error occurs when using Transparent Content Negotiation—a protocol that enables clients to retrieve one of several variants of a given resource. A 506 error code indicates a server configuration error, where the chosen variant starts a content negotiation, meaning that it is not appropriate as a negotiation endpoint.
507—Insufficient Storage
This error indicates that the client request cannot be executed because the server is not able to store a representation needed to finalize the request. This is a temporary condition, like a 503 error. It is commonly related to RAM or disk space limitations on the server.
508—Loop Detected
This error occurs in the context of the WebDAV protocol. It indicates that the server aborted a client operation because it detected an infinite loop. This can happen when a client performs a WebDAV request with Depth: infinity.
509—Bandwidth Limit Exceeded
This error indicates that the request exceeded the bandwidth limit defined by the server’s administrator. It is a non-standard code used by some web servers and hosting platforms. The server configuration defines an interval for bandwidth checks, and the limit is reset only at the end of each interval, so client requests continue to fail until the next reset.
510—Not Extended
This error indicates that the policy for accessing the requested resource was not met by the request. The server should provide the information the client needs to issue an extended request.
511—Network Authentication Required
This error indicates that the client needs to authenticate to gain network access, for example through a captive portal on a Wi-Fi network. The response should provide a link to a resource that allows users to authenticate themselves.
What Causes 5xx Server Errors
5xx errors can occur at multiple layers of the server environment. In a web application, these layers include:
- Content distribution network (CDN)
- Web server (such as Apache)
- Server-side language or framework (such as PHP)
- Content management system (such as WordPress)
- Plugins running within the CMS
In a Kubernetes application, these layers include:
- Load balancer or service mesh
- Services
- Pods
- Containers
- Applications running in containers
Here are a few common reasons for 5xx server errors, regardless of the type of application:
- Code bugs—the application serving the request is experiencing an error as a result of an internal bug.
- Updates—the application has been updated and the new version is not able to serve the request correctly.
- Incompatibilities—the application is not compatible with other software on the host or with hardware on the host.
- Operating system issues—operating system crashed, corrupted, or misconfigured.
- Hardware issues—hardware failure or misconfiguration on the host.
- Back-end failure—a back-end component the application relies on has failed or is not responding.
- Insufficient resources—the host may not have sufficient resources to serve the current application load.
- Insufficient bandwidth—the host’s network bandwidth may be exhausted by the current application load.
Resolving 5xx Errors
Debugging Server-Side Scripts in Web Applications
5xx server errors are often caused by custom scripts you are running on a web server. Here are a few things you should check if your web application returns a 5xx error:
- Check server permissions: your script may not have permission to perform the necessary operations on a file or folder. For example, the script may need to write files but may not have write permission to its folder.
- Check for script timeouts: the script may have timed out. Coding errors or other issues might cause a script to use excessive resources or get stuck in a loop.
- Check for server timeouts: in some cases the script itself is working properly but the server is not, for example because it is restarting or disconnected from the network.
- Check for .htaccess errors: on an Apache web server, the .htaccess file defines the web server’s configuration for a specific directory. A syntax or coding error in the .htaccess file can result in 500 errors.
- Check for script-specific errors: turn on error logging in your web framework to identify what is wrong with the custom script. There may be errors returned by the runtime environment or logged by the script itself.
- Check for server-specific errors: consult with the hosting provider or server administrator to see if they are familiar with an error caused by the specific server or a component interacting with the server.
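When working through the checks above, it often helps to know which URLs are actually returning 5xx responses. The sketch below counts them in web server access log lines. It assumes the logs follow the widely used Common Log Format, where the quoted request line is followed by the status code; the sample lines and paths are invented for illustration.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it,
# as laid out in Common Log Format (an assumption about the log layout).
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Invented sample lines for demonstration.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "POST /checkout HTTP/1.1" 500 512',
    '203.0.113.9 - - [10/Oct/2023:13:55:38 +0000] "POST /checkout HTTP/1.1" 500 512',
    '203.0.113.9 - - [10/Oct/2023:13:55:39 +0000] "GET /api/items HTTP/1.1" 503 0',
]


def count_server_errors(lines) -> Counter:
    """Count 5xx responses per (path, status code) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status").startswith("5"):
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts


for (path, status), n in count_server_errors(SAMPLE_LOG).most_common():
    print(f"{status} {path}: {n}")
```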
Debugging 5xx Server Errors in NGINX
The NGINX documentation recommends an interesting technique to debug 5xx errors in an NGINX server when it is used as a reverse proxy or load balancer—setting up a special debug server and routing all error requests to that server. The debug server is a replica of the production server, so it should return the same errors.
There are a few benefits to this approach:
- The debug server only receives error requests, so its logs will contain only errors, making investigation and resolution easy.
- The debug server does not need high performance, so it is possible to enable all logging and diagnostic tools, including stack trace and application profiling.
- You can use the max_conns parameter to limit the number of requests directed to the debug server, to avoid overwhelming it if there is a sudden spike of errors.
- It is easy to identify errors caused by resource issues on the production server: these are requests that return an error on the production server but work fine on the debug server.
You can use the following configuration to set up an application server and route errors to a debug server:
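    # Production application servers
    upstream app_server {
        server 172.16.0.1;
        server 172.16.0.2;
        server 172.16.0.3;
    }

    # Debug server; max_conns caps the number of simultaneous connections
    upstream debug_server {
        server 172.16.0.9 max_conns=20;
    }

    server {
        listen *:80;

        location / {
            proxy_pass http://app_server;
            # Intercept error responses from app_server and hand them to @debug
            proxy_intercept_errors on;
            error_page 500 503 504 @debug;
        }

        location @debug {
            proxy_pass http://debug_server;
            # "detailed" must be a log_format defined elsewhere in the http context
            access_log /var/log/nginx/access_debug_server.log detailed;
            error_log /var/log/nginx/error_debug_server.log;
        }
    }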
Debugging 5xx Errors in Kubernetes Nodes
There are two common causes of 5xx errors returned by a Kubernetes node: node-level termination and pod-level termination.
Node-level termination events
Nodes can return 5xx errors if an automated mechanism, or a human administrator, makes changes to the nodes without first draining them of Kubernetes workloads. For example, the following actions can result in a 5xx error on a node:
- An administrator performing maintenance on a node
- An administrator restarting a node
- A cloud service scaling down and terminating a node
- A process on the node attempting to restart it or shut it down
To diagnose and resolve a 5xx error on a node:
- Identify whether the node was shut down or modified by a staff member or an external process.
- Check whether the node is running. If it is not, restart it and ensure it rejoins the cluster.
- Log in to the node and review its logs to see what caused it to fail or misbehave.
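As a starting point for the first two checks, the sketch below scans node data for nodes that are cordoned or not reporting Ready. It assumes the input has the shape of `kubectl get nodes -o json` output; the sample JSON is hand-written and heavily trimmed, and the function name is illustrative.

```python
import json

# Hand-written, heavily trimmed sample in the shape of `kubectl get nodes -o json`.
SAMPLE_NODES_JSON = """
{
  "items": [
    {"metadata": {"name": "node-a"},
     "spec": {},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-b"},
     "spec": {"unschedulable": true},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-c"},
     "spec": {},
     "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}}
  ]
}
"""


def find_problem_nodes(nodes_json: str) -> dict[str, list[str]]:
    """Map node name to a list of issues (cordoned, or Ready condition not True)."""
    problems: dict[str, list[str]] = {}
    for node in json.loads(nodes_json).get("items", []):
        name = node["metadata"]["name"]
        issues = []
        if node.get("spec", {}).get("unschedulable"):
            issues.append("cordoned (unschedulable)")
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            issues.append("no Ready condition reported")
        elif ready.get("status") != "True":
            issues.append(f"Ready condition is {ready.get('status')}")
        if issues:
            problems[name] = issues
    return problems


print(find_problem_nodes(SAMPLE_NODES_JSON))
```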
Learn more in our guide to Kubernetes nodes
Pod-level termination events
When a pod is terminated due to eviction from a node, the following process occurs:
1. The Kubernetes control plane instructs the kubelet to terminate the pod.
2. The kubelet instructs the operating system to send a SIGTERM (15) signal to all containers running in the pod.
3. There is a configurable grace period, during which applications have the opportunity to gracefully shut down and close existing connections.
4. The operating system sends SIGKILL to kill any remaining containers in the pod.
5xx errors can occur in between steps 3 and 4. When applications are shutting down, they might fail to serve certain requests and return errors, which will typically be 502 (bad gateway) or 504 (gateway timeout).
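One common way to reduce errors during the grace period is for the application to react to SIGTERM by marking itself as not ready, finishing in-flight requests, and then exiting. The sketch below shows only the signal-handling part of that pattern; `ShutdownState` and `readiness_status` are hypothetical names, and how the readiness result is exposed (for example through a health endpoint) depends on your application.

```python
import signal
import threading


class ShutdownState:
    """Tracks whether the process has been asked to stop (for example via SIGTERM)."""

    def __init__(self):
        self._stopping = threading.Event()

    def handle_sigterm(self, signum, frame):
        # Runs when SIGTERM arrives; only set a flag and let the main loop wind down.
        self._stopping.set()

    def install(self):
        # Register the handler. signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    @property
    def stopping(self) -> bool:
        return self._stopping.is_set()


def readiness_status(state: ShutdownState) -> int:
    """Status code for a hypothetical readiness endpoint: 503 once shutdown has begun."""
    return 503 if state.stopping else 200
```

An application would call `install()` once at startup, then consult `state.stopping` to stop accepting new work while it completes requests that are already in progress.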
Resolving Kubernetes Server Errors with Komodor
5xx server errors indicate a problem with a Kubernetes node or software running within its containers. To troubleshoot a 5xx error, you must be able to contextualize a server error with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production. The major challenge is correlating 5xx errors with other events happening in the underlying infrastructure.
Komodor can help with our new ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. With this view you can rapidly:
- See service-to-node associations
- Correlate service and node health issues
- Gain visibility over node capacity allocations, restrictions, and limitations
- Identify “noisy neighbors” that use up cluster resources
- Keep track of changes in managed clusters
- Get fast access to historical node-level event data
Beyond node error remediations, Komodor can help troubleshoot a variety of Kubernetes errors and issues, acting as a single source of truth (SSOT) for all of your K8s troubleshooting needs. Komodor provides:
- Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
- In-depth visibility: A complete activity timeline, showing all code and config changes, deployments, alerts, code diffs, pod logs, and more, all within one pane of glass with easy drill-down options.
- Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
- Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.