In Kubernetes, you use probes to configure health checks that determine each pod’s state. Distributed, microservices-based systems must automatically detect unhealthy applications, reroute requests away from them, and restore broken components. Health checks are how Kubernetes addresses this challenge and keeps workloads reliable.
By default, Kubernetes observes a pod’s lifecycle and starts routing traffic to the pod when its containers move from the ‘Pending’ state to ‘Running’. It detects application crashes and restarts the unhealthy pod to recover. This basic setup is not enough to ensure health, especially when the application in the pod is wrapped by a daemon or process manager that keeps the container running even if the application itself fails.
Because Kubernetes considers a pod healthy and ready for requests as soon as all of its containers start, the application can receive traffic before it is actually ready. This can happen when an application needs to initialize some state, load data before handling application logic, or establish database connections.
This issue creates a gap between when the application is ready and when Kubernetes thinks it is ready. As a result, when the deployment starts to scale, unready applications might receive traffic and send back 500 errors.
Kubernetes health checks use probes that enable the kubelet, an agent running on each node, to validate the health and readiness of a container. Probes determine when a container is ready to accept traffic and when it should be restarted.
You can perform health checks via HTTP(S), TCP, command probes, and gRPC. We’ll show how this works with examples.
This is part of an extensive series of guides about Kubernetes troubleshooting.
What are the Three Types of Kubernetes Probes?
Here are the three probes Kubernetes offers:
- Liveness probe — determines if a container is still operating. If it is not, the kubelet shuts down and restarts the container.
- Readiness probe — determines if the application that runs in a container is ready to accept requests. If it is ready, Kubernetes allows matching services to send traffic to the pod. If it is not ready, the endpoints controller removes this pod from all matching services.
- Startup probe — determines if the application that runs in a container has started. If it has started, Kubernetes allows other probes to start functioning. Otherwise, the kubelet shuts down and restarts the container.
If a container does not define a given probe, Kubernetes treats that probe as if it always succeeds.
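To make the distinction concrete, here is a minimal sketch of a container that defines all three probes. The image name, port, and the /healthz and /ready paths are placeholder assumptions for illustration:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    startupProbe:            # gives a slow-starting app up to 30 x 5s to come up
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 5
    livenessProbe:           # restarts the container if this starts failing
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    readinessProbe:          # controls whether the pod receives Service traffic
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 10

While the startup probe is still running, the liveness and readiness probes are held back, which prevents a slow-starting application from being killed prematurely.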
Types of Kubernetes Health Checks
You can create health check probes by issuing requests against a container. Here is how to implement Kubernetes probes:
HTTP Requests
An HTTP request is a common mechanism for a liveness probe. You expose an HTTP endpoint by implementing any lightweight HTTP server within the container. The probe performs an HTTP GET request against this endpoint at the container’s IP address to check whether the service is alive. If the endpoint returns a success code, the kubelet considers the container alive and healthy. If not, the kubelet terminates and restarts the container.
Commands
You can configure a command probe so that the kubelet executes a command, such as cat /tmp/healthy, inside the container. If the command succeeds (exits with status 0), the kubelet considers the container alive and healthy. If not, it shuts down and restarts the container.
TCP Connections
A TCP socket probe tells Kubernetes to attempt a TCP connection on a specified port of the container. If the connection succeeds, Kubernetes considers the container healthy. TCP probes are useful for services such as gRPC or FTP servers, which already speak TCP but don’t expose a convenient HTTP endpoint.
gRPC
If you are running Kubernetes version 1.23 or earlier, you can add the grpc-health-probe binary to your container to enable gRPC health checks. From version 1.24 onward, Kubernetes supports gRPC health checks natively.
Health Check Internals
In a health check, you define the endpoint, interval, timeout, and grace period:
- Endpoint/CMD: This is the URL you call or the command you execute to verify health. An HTTP 200 status code is considered healthy, and a command exit status of 0 is considered successful. In the case of a TCP connection, the application is considered live if it accepts the connection.
- Interval: This is the time period between two health checks.
- Timeout: This is how long the prober waits for a response before counting the check as a failure.
- Grace period: This is how long Kubernetes waits after the container starts before it begins running the health check.
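In Kubernetes terms, these map roughly to the probe fields httpGet/exec/tcpSocket (endpoint or command), periodSeconds (interval), timeoutSeconds (timeout), and initialDelaySeconds (grace period). A minimal sketch of an HTTP probe that sets all four, where the /healthz path and port 8080 are placeholder assumptions:

livenessProbe:
  httpGet:
    path: /healthz          # endpoint to call
    port: 8080
  initialDelaySeconds: 15   # grace period after the container starts
  periodSeconds: 10         # interval between checks
  timeoutSeconds: 2         # how long to wait before counting a failure
  failureThreshold: 3       # consecutive failures before the container is restarted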
How to Configure Kubernetes Health Checks with Examples
Using HTTP(S) Protocol for Kubernetes Health Check
To configure an HTTP health check in Kubernetes, you need to define a liveness or readiness probe that makes an HTTP request to a specific endpoint in your application. Here’s an example of how to set up an HTTP liveness probe:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: myapp
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
In this configuration:
- path specifies the HTTP endpoint to check.
- port specifies the port on which the application is listening.
- initialDelaySeconds is the number of seconds after the container starts before the probe is initiated.
- periodSeconds specifies the interval between consecutive probes.
The probe sends an HTTP GET request to the /healthz endpoint. If it returns a 2xx status code, the container is considered healthy; otherwise, it is restarted.
To manually check if the pod is working, you can use a command like curl <pod-ip-address>:8080/healthz
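Since the heading mentions HTTP(S), note that httpGet probes also accept scheme and httpHeaders fields, so you can probe a TLS endpoint or send custom headers. A minimal sketch, assuming the application serves HTTPS on port 8443 and using a purely illustrative header name:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8443
    scheme: HTTPS              # probe over TLS (certificate verification is skipped)
    httpHeaders:
    - name: X-Health-Check     # hypothetical custom header
      value: kubelet
  initialDelaySeconds: 5
  periodSeconds: 10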
Using TCP Protocol for Kubernetes Health Check
TCP health checks are useful for applications that rely on TCP connections, such as databases or gRPC services. Here’s an example of a TCP liveness probe on a MySQL database. Before running this health check, ensure MySQL is installed and running on the pod.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: myapp
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    livenessProbe:
      tcpSocket:
        port: 3306
      initialDelaySeconds: 5
      periodSeconds: 10
In this configuration:
- tcpSocket specifies the port to which Kubernetes will attempt to establish a TCP connection.
- If the connection is successful, the container is considered healthy.
- initialDelaySeconds and periodSeconds function the same as in the HTTP probe.
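A liveness probe alone only restarts the container when the port stops responding. You can pair it with a readiness probe on the same port so that the pod is added to Service endpoints only once MySQL actually accepts connections; a minimal sketch reusing the values above:

readinessProbe:
  tcpSocket:
    port: 3306               # pod receives Service traffic only once this connects
  initialDelaySeconds: 5
  periodSeconds: 10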
Using Command Probes for Kubernetes Health Check
Command probes execute a command inside the container to determine its health. Here’s an example of a command liveness probe:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: myapp
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 10
In this configuration:
- The exec field specifies the command to run inside the container.
- If the command returns a zero exit code, the container is considered healthy.
- initialDelaySeconds and periodSeconds are used to control the timing of the probe.
Using gRPC Protocol for Kubernetes Health Check
Kubernetes supports gRPC health checks natively from version 1.24 onward. For earlier versions, you can use a helper tool like grpc-health-probe.
To configure a gRPC health check, first add the grpc-health-probe binary to your container image. Then, configure the gRPC health check in your pod definition:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: myapp
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    livenessProbe:
      exec:
        command:
        - /grpc-health-probe
        - -addr=:50051
      initialDelaySeconds: 5
      periodSeconds: 10
In this configuration:
- The command field specifies the grpc-health-probe binary and the gRPC server address.
- The probe checks the gRPC health endpoint to determine if the service is healthy.
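On Kubernetes 1.24 or later, you can skip the helper binary and use the built-in grpc probe type, which calls the standard gRPC Health Checking Protocol directly. A minimal sketch, where the port and the optional service name are placeholder assumptions:

livenessProbe:
  grpc:
    port: 50051
    service: myapp.health    # optional, hypothetical service name passed to the health endpoint
  initialDelaySeconds: 5
  periodSeconds: 10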
How Do Health Checks Enable Faster Troubleshooting?
To understand how health checks enable faster troubleshooting, consider the following example. Let’s say there is an application deployment with a service in front of it to balance the traffic. If the application does not come up, you can start troubleshooting by checking the health check. If you have set up a proper health check, the failing application pods will get killed, and you can put an alert on the pod status that tells you the application pods are failing the health check.
Sometimes, your application passes a health check and the application pods are up, but the application is still not receiving any traffic. This can happen if you have implemented a readiness check, but it is not successful. If the readiness check is failing, Kubernetes will not add your pods to the service endpoint and your service will not have any pods to send traffic to.
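To confirm this, check whether the Service has any ready endpoints (service_name and dep_namespace are placeholders, as in the commands below):

kubectl get endpoints service_name -n dep_namespace

If the readiness probe is failing, the ENDPOINTS column will be empty or will not include the affected pod’s IP.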
There are a few commands that can help you debug issues more quickly.
Use the below command to see if your containers are up and running. This command will show how many pods are up in your deployment.
kubectl get deployment deployment_name -n dep_namespace
If you see that the pods are not up, look at the deployment descriptions or events with the following command. This will show how many pods are up in the replica set.
kubectl describe deployment deployment_name -n dep_namespace
Then, you can take the replica set name and check what is happening in the ReplicaSet events. This will show if there is any issue bringing up your pods.
kubectl describe replicaset replicaset_name -n dep_namespace
Lastly, you can describe your pods to see if they are failing the health checks.
kubectl describe pod pod_name -n dep_namespace
You can also try looking at the pod’s logs to identify why it failed the health checks, using the below command.
kubectl logs -f pod_name -n dep_namespace
In addition, checking the events in deployments, replica sets, pods, and pod logs will tell you a lot about any issues. If you are running a StatefulSet, you can use the same commands for troubleshooting.
You can also check if the endpoint object in the service has your pod IPs or not.
kubectl describe service service_name -n dep_namespace
If you have a load-balancer service, you will be able to see your instances attached to the load balancer. If your health check is failing, the load balancer will remove the instances, and traffic won’t be forwarded to the instances.
Kubernetes events are very important when you are troubleshooting. Most of the time, you will be able to find the issue in one of the Kubernetes events. You can easily see events related to any Kubernetes object using the Kubernetes describe command.
Avoiding Common Health Check Pitfalls
There are several common pitfalls you may run into when running Kubernetes health checks:
- HTTP applications should not have TCP health checks, as these will mark your application as healthy as soon as it binds the port, even if your actual HTTP service is not running. Instead, write a proper health endpoint that checks the application’s dependencies before reporting it as live.
- Always put readiness checks in your servers. This ensures that your application does not prematurely receive traffic it cannot serve.
- Avoid TCP health checks for databases like Redis. A Redis instance can accept TCP connections as soon as the server process starts, but that doesn’t mean it has joined the cluster, assumed its master or replica role, or finished applying its configuration. In these cases, use a command (exec) probe to make sure Redis, or any other database, is in your desired state (see the sketch after this list).
- Avoid verifying dependencies in your health check that are not necessary for the application to be running. Also, avoid health check loops. For example, application X needs applications Y and Z to be online, and Z needs X to be online. In this scenario, if one application goes down, they will all go down, as they are dependent on each other.
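As an illustration of the Redis point above, here is a minimal sketch of an exec liveness probe that checks the instance’s replication role rather than just the open port. It assumes redis-cli is available in the image and that the expected role is master; adapt the check to whatever “desired state” means for your setup:

livenessProbe:
  exec:
    command:
    - sh
    - -c
    # fail the probe unless the instance reports the expected role
    - redis-cli info replication | grep -q '^role:master'
  initialDelaySeconds: 10
  periodSeconds: 15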
Conclusion
Health checks are clearly important for every application. The good news is that they are easy to implement and, if done properly, enable you to troubleshoot issues faster. If you log exactly why a health check failed, you can pinpoint and solve issues easily.
While these tips can (and will) help minimize the chances of things breaking down, eventually, something else can go wrong – simply because it can.
This is where Komodor comes in – Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.
Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.
By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.