5 Ways to Make Kubernetes Auditing an Effective Habit 

Kubernetes has several components that produce logs and events describing everything that happens in a cluster. Keeping track of all this data becomes extremely challenging when you run Kubernetes at scale.

With so many components generating logs, organizations need a centralized place to see it all. But this is only half your problem. You also need to correlate logs coming from different components to draw the right conclusions and take effective actions.

Auditing thus takes on even greater importance, helping you keep an eye on your system by:

  • Tracking production issues and preventing recurrence 
  • Making sure you’re not lagging on security and exposing your data to malicious actors
  • Achieving certifications like PCI DSS or HIPAA

This post will explore auditing, what Kubernetes events or logs you should be watching, and a few steps to take for establishing an effective auditing practice.

Since traditional VMs have a limited number of components, you can monitor them easily. Simply using a less or tail command will allow you to watch a few files like syslog, lastlog, and dpkg logs.
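
For example, on a typical Debian-based VM (exact file paths vary by distribution):

tail -f /var/log/syslog (to stream system messages as they arrive)
less /var/log/dpkg.log (to page through package installation history)
lastlog (to list the most recent login of every user)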

In Kubernetes, however, there are multiple components, with each component possibly running on different machines and producing its own logs and events. 

So how do you achieve proper auditing in Kubernetes with so many things to keep an eye on? First, let’s discuss what effective auditing entails. 

Getting auditing right allows you to answer the what, when, who, which, and where surrounding an event:

  • What exactly happened and do any logs explain the given behavior? 
  • When did the event occur? The exact timeline is crucial for identifying the cause.
  • Who triggered the event? Every event has a trigger; your job is to find it. 
  • Which privilege level was involved? Knowing which level was compromised is crucial for reinforcing the principle of least privilege.
  • Which components were involved?
  • Where did it start?

Your auditing will only be effective when you know what to monitor for complete visibility. Let’s review everything your auditing should cover in a Kubernetes ecosystem.

API Server Logs

The API server is the most important part of Kubernetes since all the other components talk to it to get the information they need to perform any action. 

Monitoring the API server’s logs will help you discover any unwanted activity. This can be tricky in managed offerings like Amazon EKS, Azure AKS, and Google GKE, where control-plane logs are only available through the provider’s logging service. If you run your own control plane, things are a bit easier, as you can enable audit logging and review the API server logs at locations you define.
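
For self-managed control planes, a minimal sketch of enabling API server audit logging looks like this (the file paths are assumptions; adjust them to your setup). First, define an audit policy:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata   # record request metadata (user, verb, resource, timestamp) for every request

Then point the API server at it with the --audit-policy-file=/etc/kubernetes/audit-policy.yaml and --audit-log-path=/var/log/kubernetes/audit.log flags.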

Kubernetes Events

When dealing with Kubernetes objects, each object typically has one or more controllers working on it. When these controllers perform an action, they emit events that are visible via the Kubernetes API. You can retrieve these events using the following kubectl commands:

kubectl get events (to retrieve events in the current namespace; add -A for all namespaces)
kubectl get events -n namespace (to retrieve all events for a namespace)
kubectl get events --watch (to stream events in real time) 
kubectl get events --field-selector involvedObject.name=my-pod,involvedObject.kind=Pod (to retrieve events for a specific pod)

You can also view events at the end of the output when you run kubectl describe on any object.
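
For example (the pod and namespace names are placeholders):

kubectl describe pod my-pod -n my-namespace (events appear in the Events section at the end of the output)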

Container Logs

The smallest deployable unit in Kubernetes is the pod, which wraps one or more containers. You can access container logs for pods using the following commands:

kubectl logs podname -n namespace -c containername (to retrieve logs from a specific container in a pod)
kubectl logs deployment/deploymentname -c containername (to retrieve logs from a container in one of a deployment’s pods)

These are two basic examples; visit the Kubernetes documentation for the full list.

Kubelet Logs

Each node in Kubernetes runs a process called the kubelet. This process is responsible for receiving the desired state from the API server and executing it on its node. Simply look up the logs for this process on your worker node and then tail them.

On systemd-based Linux systems, you can find them using the command:

journalctl -u kubelet
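
To stream new entries in real time, add the -f flag:

journalctl -u kubelet -f (to follow kubelet logs as they are written)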

Metrics and Health Checks

Keeping an eye on metrics also plays an important role in identifying event triggers. For example, memory pressure can cause the kubelet to evict pods due to an out-of-memory (OOM) condition, and high CPU can cause health check failures.

Health checks on your pods are also critical. Making sure you have alerts for CPU, memory, and health checks will help you catch more than 80% of issues in production. 
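
As a minimal sketch, liveness and readiness probes on a container might look like this (the names, port, and endpoint paths are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: my-app:1.0   # hypothetical image
      livenessProbe:
        httpGet:
          path: /healthz   # kubelet restarts the container if this endpoint stops responding
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready     # traffic is withheld until this endpoint succeeds
          port: 8080
        periodSeconds: 5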

Monitoring all of the above is key to achieving efficient auditing, but it is only one part. Organizations must also establish a proper process and make sure it is adhered to across all departments. Below, we discuss a few steps to help you do this. 

1. Perform Regular Audits

Compiling audit reports for your clusters and their overall health and performance, as well as regularly publishing them across your organization, establishes transparency. This enables everyone to identify actions (e.g., upgrades, optimizations, or cleanups) and then put them in their sprints. 

2. Establish a Clear Process 

A clearly defined process is critical for quickly identifying and fixing issues. One option is that whenever an issue is found, the finder creates a Jira ticket and pushes it into the owner’s sprint. The owner then addresses the problem, after which proper guardrails should be put in place to avoid recurrence.

For example, let’s say an issue is found where an application port was exposed to the public. A ticket is created and assigned to DevOps to restrict the port. DevOps then takes action on it and adds a step to the pipeline so that no such exposed ports can reach production.
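
As a rough sketch, such a pipeline guardrail could be as simple as a shell check over the rendered manifests (the manifests/ path and the naive rule are assumptions; a policy engine would be more robust):

# Fail the build if any manifest defines a publicly exposed Service
if grep -rn "type: LoadBalancer" manifests/; then
  echo "Publicly exposed Service found; blocking deployment" >&2
  exit 1
fi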

3. Implement Alerting 

Alerts make sure the right person is notified when an event occurs. With proper review, the security team can catch issues before they escalate; this again entails placing guardrails to prevent recurrence, emphasizing a proactive approach to security.

Your auditing tool should connect to an alerting tool such as PagerDuty and then send alerts out via Slack or email based on their level of severity. For example, low- and medium-severity alerts can be sent via email, high-severity ones via Slack, and phone calls reserved for critical alerts.
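
If you use Prometheus Alertmanager, a minimal routing sketch along these lines might look like this (the receiver names and severity labels are assumptions, and the receivers rely on SMTP/Slack settings configured elsewhere):

route:
  receiver: email-team            # default: low and medium severity go to email
  routes:
    - matchers:
        - severity="high"
      receiver: slack-oncall      # high severity goes to Slack
    - matchers:
        - severity="critical"
      receiver: pagerduty-phone   # critical severity pages via PagerDuty

receivers:
  - name: email-team
    email_configs:
      - to: team@example.com      # assumes global SMTP settings are configured
  - name: slack-oncall
    slack_configs:
      - channel: "#alerts"        # assumes a global slack_api_url is configured
  - name: pagerduty-phone
    pagerduty_configs:
      - routing_key: YOUR-INTEGRATION-KEY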

4. Ensure Secure Access Control

Access control can be problematic if many developers are working on a Kubernetes cluster. It’s crucial to perform regular audits to make sure the principle of least privilege is being followed, which, in turn, helps avoid unnecessary elevated permissions.
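
A few kubectl commands make a good starting point for such an audit (the user and namespace are placeholders, and impersonation requires the appropriate permissions):

kubectl auth can-i --list --as=jane -n dev (to list everything a given user can do in a namespace)
kubectl get rolebindings -A -o wide (to review namespaced role bindings across the cluster)
kubectl get clusterrolebindings -o wide (to review cluster-wide bindings)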

There are tools to help with this. Komodor provides you with a centralized, cross-cluster view of what access has been given to what users. Its out-of-the-box policies and roles allow you to govern permissions to make sure users are granted the proper access. 

5. Aggregate Logs and Events

None of the above matters if you don’t have all logs and events in one place, correlated and presented in a single dashboard. With so much data, you need a way to make sense of it and identify what is of primary importance; this is where an aggregated and correlated view helps.
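
For the aggregation half, a minimal Fluent Bit sketch that ships container logs off every node might look like this (the Elasticsearch host is an assumption; any centralized log backend works):

# Tail container logs on each node
[INPUT]
    Name  tail
    Path  /var/log/containers/*.log
    Tag   kube.*

# Enrich records with pod metadata
[FILTER]
    Name   kubernetes
    Match  kube.*

# Ship everything to a central store
[OUTPUT]
    Name   es
    Match  *
    Host   elasticsearch.logging.svc
    Port   9200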

When you’re running a production workload in Kubernetes, you have to be sure that nothing goes wrong in terms of security and compliance. You may mistakenly expose a service to the public, or run every instance of a service on a single node of the K8s cluster, so that when that node goes down, your app crashes with it. Issues like these translate into improperly granted elevated permissions, decreased resiliency, and more.

When done properly, auditing can help you avoid such problems and achieve a safer production environment. Establishing a proper auditing practice around logs and events ensures that no issue gets lost—and haunts you later on. To do this, developers must give the same attention to security and other Kubernetes-related issues as they do to their normal deliveries. 

Komodor can help developers implement an effective auditing habit. It provides a centralized place to aggregate and correlate logs, identify action items, and alert the appropriate teams. Explore how Komodor can complement your Kubernetes strategy and book a demo today.