Komodor is an autonomous AI SRE platform for Kubernetes. Powered by Klaudia, it’s an agentic AI solution for visualizing, troubleshooting and optimizing cloud-native infrastructure, allowing enterprises to operate Kubernetes at scale.
Auto-generated transcript below.
**0:04** Hello and welcome back to the Rawkode Academy. I am your host, David Flanagan, although you may know me from across the internet as Rawkode. Today I’m going to guide you through Komodor. If you’re not familiar with Komodor, it is a SaaS-based product to help you troubleshoot and debug your Kubernetes clusters. I have a few opinions about it, and I don’t often cover SaaS-based products. However, Komodor just this week announced their new free tier, meaning you don’t need to pay to get started with Komodor. Not only that, Komodor sponsored my time to produce the advanced scheduling demo, and lastly, they’ve also committed to being on an episode of Clustered, where I’m going to put them to the test. I’m going to give them a broken cluster, and they’re convinced they can use Komodor to fix it. So thank you, Komodor, for sponsoring the advanced scheduling video. I’m sorry, and thank you, for joining me on Clustered very, very soon.
**1:15** But today, let’s focus on the tutorial. I’ll show you how you can get started with Komodor. We’re going to start from the beginning but also showcase some advanced use cases for Komodor. The first thing we need to do is go to Komodor.com. From here, you can feel free to read the marketing material, go to resources, pricing, documentation, whatever you want. I’m going to start by logging in, for which I use my Google account.
**1:49** So immediately, we’re presented with the service list. This is a list of all the microservices, or maybe huge services, who knows, deployed to your Kubernetes cluster. Now, you won’t see this list right away. This is because I’ve already added my first Kubernetes cluster, but let me walk you through the process of doing that yourself. Down at the bottom left, you will see the Integrations button. When you select this, you can click on add a cluster. You can give it whatever name you want and hit next. This will give you the command that you copy and run in your terminal. It will add a Helm repository, deploy the Helm chart with the Komodor agent, and then you click next, where it will wait for the connection and confirm it. From there, go back to the home page, and you’ll see your Kubernetes services from your cluster.
**2:44** There’s a couple of nice things on this page right off the bat. First, all my services are healthy, hey! But secondly, we get a good overview of the workloads running in this cluster. You can see I have a bunch of Prometheus stuff, I’ve got 1Password Connect, cert-manager, lots of cool stuff. Now, if things weren’t all healthy, we could either exclude the healthy ones or we could filter on their health status. If you have more than one cluster, you can filter by that too. And if you only want to take a look at a particular namespace, in my case, let’s just take a look at my community namespace. You’ll see that I’m only running a single service. If I want to view the platform and the community namespaces, I can do so as well.
**3:41** If you want to filter by workload type, we can click on DaemonSet and see DaemonSets, just the basics that you would expect from a service overview of your cluster. The last thing I’ll point out on this page is at the top right. Here we can sort by a few options. By default, it is on health, which makes sense. If there’s something that is unhealthy in your cluster, you want to see that first. The other view that I’ve been enjoying over the last few days is namespace. It’s a good way to break it down by namespace without specifically filtering on a namespace itself. And if you’re only worried about things that have changed recently, go to last modified, and you’ll see the most recent resources that have been modified within your cluster. I deployed Prometheus today, so we can see Prometheus front and center. And that’s the service overview. It’s not life-changing, but it’s very valuable, with just enough functionality to maybe pry kubectl out of your hands when things go wrong.
**4:39** So let’s see what else we can do with Komodor. We also have the Jobs option on the left, although I have no jobs in my cluster. However, this is just the same as Services. If you are using the Job object or the CronJob object, you will see them listed here. Next, we have the events. This will show you all the events from your Kubernetes cluster. Now, this is something that can typically be quite overwhelming to do from the kubectl command line, because events come fast and furious in a Kubernetes cluster. And when we have an abundance of information, bringing in a visual layer to that information is how we develop understanding. So let’s see how we can understand the events within a Kubernetes cluster with Komodor.
**5:31** Much like the service page, we have the ability to filter these events on cluster and namespace. However, now we can filter by individual service. We can filter by the event type. We have the ability to filter on the status of the event as well as deploy details and availability reasons, and we’ll get into more of these in just a moment. But first, let’s take a look at my platform namespace. Now here, we can see all the events as Komodor was deployed to my cluster and went through the discovery phase, that is, discovering all the workloads and resources within my cluster. From here, we can click on the service name, which slides out a nice kind of popover modal dialog, meaning we don’t really lose our original context when we’re debugging, which I think is really important for a debugging tool. So, very nice addition.
**6:22** We have the service name, the health status. You can see all the events for the service as well as the pods, the nodes they are scheduled on, and some additional information which gives us access to the labels and annotations on the service. Okay, let’s pop into the monitoring namespace, and we’ll select our Grafana service. Currently, we only see information about Grafana, which we’d expect. We can see the events again, the nodes, pods, and our labels and annotations.
**6:59** Now, before we take a look at the best practice recommendations, let’s pop back over to events, and see here we have the related resources button. Now, this is quite nice because it allows us to select other resources within the same namespace if we want to be able to collectively group them and view their events together. So I’ll pick one of the Kubernetes Services and I’ll mark this as related. I’ll pop over to ConfigMaps, where I’ll select the API server one, and I’ll pick one more, which is to pop over to Secrets and select another config. We apply the selection, and there you’ll see that the events listed for this resource include the related resources.
**7:40** Now, I think this feature could be improved. I’d love to see Komodor scan the YAML for referenced ConfigMaps, Secrets, and Services with matching selectors and hook this up for me. However, doing it manually if there’s a few resources that I do want to group collectively isn’t exactly the end of the world, so it’s a cool feature and one that could have some really interesting improvements over time. So let’s go back to the information screen, and we’ll see these best practice warnings.
**8:13** So when we click this, we have a bunch of checks, and here we can see that our deployment has one replica. Now, this is a warning just because if we lose that replica, we’ve lost our service, so maybe you want to run two or three. However, you know your services better than any tool can, so feel free to use the ignore button. For Grafana, maybe we determined that we do only ever want one, and we’re happy for that to be offline if something goes wrong. We can just say ignore for 14 days, 30 days, 90 days, or forever. So perhaps I’m not ready to make a decision on whether this is good or bad yet, and I’ll ignore it for a couple of weeks.
**8:54** Next, we have a critical warning telling us that this workload has no liveness probe. If we expand it, it tells us that liveness probes are designed to ensure that an application stays in a healthy state. When a liveness probe fails, the pod will be restarted. This is a pretty neat behavior. The kubelet monitors our workloads, and if it needs to kick them, it kicks them. So you should always try and have some of these best practices whenever possible, and Komodor brings that front and center. So I’m not going to ignore that one, because you know what, I should have a liveness probe on that workload.
**9:29** Now, we’ve got some passed ones here where we have a readiness probe, we have CPU and memory constraints, and the last one is just a pull policy. It’s not good practice to have an image pull policy of Always. It’s usually preferred to set it to IfNotPresent. It just means when the workload restarts, you don’t need to go to an image registry and see if it can be pulled down, and it usually means you’re using some sort of alias tag system. Again, we want to kind of get away from that as much as possible. So it’s not critical, but it is a warning that maybe you need to update this. And I think this is a nice way to gain more insights and understanding of the services within our cluster.
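Taken together, the checks in this section are asking for a deployment spec along these lines. This is a hedged sketch, not the actual manifest from the demo: the image tag, probe endpoint, port, and resource numbers are placeholder assumptions.

```yaml
# Hypothetical deployment fragment showing what the best-practice checks
# want to see: more than one replica, liveness and readiness probes,
# resource constraints, and a pull policy other than Always.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 2                        # more than one replica survives a pod loss
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:9.5.2   # pinned tag, not an alias like latest
          imagePullPolicy: IfNotPresent  # skip the registry check on restart
          livenessProbe:                 # kubelet restarts the pod if this fails
            httpGet:
              path: /api/health
              port: 3000
          readinessProbe:                # gate traffic until the app is ready
            httpGet:
              path: /api/health
              port: 3000
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              memory: 256Mi
```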
**10:10** So let’s see what else we can do from the events page. So I’m not going to filter on an individual service, I don’t think that shows us anything new, but if we scroll down, we can filter on the event type. So let’s filter by one of these event types and see what information we get back. Let’s start with one of the most common ones, which is config change. This is going to tell you when a ConfigMap is created, modified, or deleted within your cluster. So let’s create one. Here I have cm.yaml, which is a ConfigMap called rawkode. If we go to the terminal, we can apply this to our monitoring namespace.
**10:55** Let’s make a quick change to that ConfigMap and say that we no longer want key: value. Instead, we want name: David. Go to our terminal, apply this one more time, and let’s go visualize this with Komodor. So right away, we can see that a ConfigMap was created in the monitoring namespace, called rawkode. We click on this, we have all green because it was the first time this ConfigMap was created. We then have our change, and this time we can see that key: value was removed and name: David was added. And if you want to view this in more detail, you can expand the diff. We can see the data changed along with some metadata about the resource as well.
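For reference, the cm.yaml used in this step might look something like the following sketch. The namespace and the exact key spellings are assumptions reconstructed from the narration, not a copy of the demo file:

```yaml
# cm.yaml — first version, applied with:
#   kubectl apply -f cm.yaml --namespace monitoring
apiVersion: v1
kind: ConfigMap
metadata:
  name: rawkode
data:
  key: value
  # The second apply replaces the line above with:
  #   name: David
  # which is the diff Komodor surfaces as a config change event.
```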
**11:47** And this is one of those really simple but very valuable features. When things go wrong in a Kubernetes cluster, it’s rarely because the resources haven’t changed. It’s because of our changes that things sometimes go wrong. Human error is probably still the biggest cause of problems in a Kubernetes cluster. So it’s crucial that you understand when config changes in your cluster and how that can have a cascading effect on the workloads within your cluster. And your ability to see those changes as they happen will substantially lower your mean time to recovery.
**12:24** So beyond config change, let’s filter on availability issues. Now, availability issues give us an understanding of when a workload was unavailable, perhaps because the pod was being restarted or the probes were failing. If we take a look at the Grafana one here, you can see that this pod was unhealthy and why it was unhealthy. Well, because it was ContainerCreating. Of course, it’s not healthy if it’s still creating. Also, what’s nice here is it shows you each of the containers and the status for them too. If you want, you can click the live pods and logs button.
**13:09** This will show us the current pods in our cluster for that selector, where we could pop one open and go to logs. So it’s nice having logs right front and center when required. If we pop back to details, we can see the conditions that tell us if our pod is healthy. We have the containers, the images they are running, the pull policy, the ports, the mounts, and the arguments, all the useful information that you need. We have the ability to see the tolerations, the volumes, and of course, the events associated with this workload. If you’re a fan of the kubectl describe command, you can click describe and get the exact output on the screen, as so.
**13:54** So to reiterate: from the events page, we’ve seen a pod with availability issues. We went to the current instance of this pod, we’ve seen there is no problem, and we had all the information we needed to debug a problem, to debug if there was something wrong. Now, the rest of the Komodor UI is pretty much more of the same. We can break down all of the resources. We can see nodes, we can click on a node, we can see all of its conditions, the capacity, and allocatable resources across CPU, memory, storage, etc. We have the ability to cordon and drain a node if we wish. For workloads, it’s the same. We have deployments, so we can click, we can edit, we can scale, we can restart. You can do this for most of the resources within your cluster. For storage, we can see storage classes; for config, we go to ConfigMaps and we can see them. Pretty much you’re getting a visual representation of everything you can do with the kubectl command.
**15:06** We can even list the custom resource definitions within our cluster and search. So I’m not going to spend any more time going through this because these web pages are dashboards, and as we all know, dashboards are not to be looked at until something goes wrong. So how do we get information from Komodor to give us alerts when our attention is needed? And for that, we have monitors.
**15:39** We can expand our cluster, and we can see that we have some rules already in place. These are shipped by default by Komodor. We have an availability monitor. This will let us know if any of our workloads are less than 80% available for more than 10 seconds. If we need 10 pods in our deployment and for more than 10 seconds we have 7 or fewer, we’ll get an alert. If our CronJobs are failing, we get another. We can get alerts for when deployments are updated, and we can also get alerts for when our nodes are not healthy.
**16:17** So let’s take a look at one of these alerts and then configure our own. Here is the deployment alert. This is going to let us know whenever a workload is modified within our cluster. Using the integrations that you have configured with Komodor, you can use these as destinations. We can use a standard webhook or publish a message to Slack. I have a channel called SRE, and I’m going to click save. So now, if we modify a deployment, we should get a notification to my Slack channel.
**16:52** So let’s test it. I have my Slack here, but first, we need a deployment change. So I’m just going to do this through the Komodor UI. I’m going to go to deployments, and we’ll modify the cert-manager deployment. We can click on edit YAML, where I’m just going to add a new label. We hit apply. We can see that we have new events, and we can see our manual action to edit a deployment. We click on it, we see the change. So even though we can see the deployment changed here, this will not trigger a Slack notification, because it uses the resource’s generation rather than its resourceVersion, which is good, because do we really need a notification that a label changed on a workload when the workload itself was not restarted, rescheduled, or modified?
**17:55** Now, let’s make one more change to a resource. This time we’re going to add an environment variable, which is going to have a value of hi. Now, this will trigger a new generation, and if we pop over to Slack, we’ll see the notification in the SRE channel. And if we click on this, it takes us directly to the event with the change. We can see that the revision and generation of this resource went from one to two because of an environment variable addition. This workload change is also denoted here by the new deploy event, which tells us that the image didn’t change, but other aspects did. So given that, we have a pretty sophisticated troubleshooting and debugging tool here.
**18:54** It’s also worth noting that Komodor has pretty elevated, privileged access to your cluster, and as such, we need to be able to trust it. Luckily, Komodor provides the ability to protect some of our more sensitive information from being leaked through the Komodor UI. Let’s go to resources, workloads, and pods. If I select the default namespace, I have a new super secret workload. If we click on this, there’s not a lot to see here, but if we go to the logs, we have a password. So how do we prevent Komodor from leaking such information? Now, it doesn’t have to be just stdout logging. Although when applications crash, sometimes they do dump the environment, revealing some very sensitive information. But also, how do we redact this from ConfigMaps and other sources of sensitive information? Fortunately, by default, Komodor hashes all the information that it pulls from Secret resources, but we do need to put in a couple of extra steps to protect stdout and/or ConfigMaps, although I hope you’re not storing too much sensitive information in a ConfigMap.
**20:13** And I’m going to configure this through the Komodor UI. So the first thing we want to do is to go to configuration and ConfigMaps, where I can select the komodor namespace. Here we have the Kubernetes watcher config, and I’m going to go straight to the edit page. You’ll see we have two settings here on lines 20 and 21, called redact and redact logs. These take a list of expressions to redact from Kubernetes resources and from log data. Now, these accept regex pattern matching, like you can do with any sophisticated logging library, but I’m going to keep it very simple for this demo. I’m going to explicitly say that I want my password123 omitted, and we’ll add one more. This time we’ll do a regex match for anything that starts with password=; we’ll open the matcher to .* and we’ll stick a space on the end.
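As a rough sketch, the edited section of that watcher ConfigMap could look like this. The key names and value syntax here are assumptions reconstructed from the narration ("redact" for resource data, "redact logs" for log data), not a confirmed Komodor configuration schema; check the Komodor docs for the real field names.

```yaml
# Hypothetical fragment of the Komodor k8s-watcher config, showing the
# two redaction lists described above.
redact:
  - password123          # literal phrase to omit from resource data
redactLogs:
  - "password=.* "       # regex: redact anything following password= up to a space
```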
**21:18** So now we will have to kick the Komodor agent so that it reloads its configuration. We can just delete it, and already we can see we have a new one running. So let’s go back to our pod in the default namespace, where we have our super secret workload. And if we view the logs, our password123 has been redacted. So let’s modify this at the deployment level: we can say edit YAML, and we have password123, but let’s also add password=hello-not-secret, like so. That caused this pod, which we can see here, to be terminated, with a new one running. And if we pop open the logs for it here, you can see that both values have been properly redacted.
**22:32** So this is a very important feature, but also a very cumbersome feature because security is hard. It’s never easy. It would probably be worthwhile for your team or organization to have a convention to, well, not log sensitive values, but if you do, always make sure there’s some marker in place so you can configure tools like Komodor and other logging systems to redact that information as fast as possible. And you can even run the container locally to test your redactions before pushing them to your Kubernetes cluster.
**23:13** If I go to my terminal, I have a justfile. It’s like a Makefile, however, it allows positional arguments on targets, and it’s generally just a little bit nicer to work with. From here, we can say redact, and you can already see the autocomplete here and the documentation from the justfile, but we provide a redaction phrase and a log line. So I’m going to say let’s redact rawkode, and then the log line that I want to test is “I am rawkode, hear me type”. just pulls down a container image, does a little bit of plumbing, and then shows you the input log and the output, and you can see what it was before and after the redaction.
**23:59** This means that you can test your regex patterns all you want. Say you want to do password=.*?rawkode, because maybe we got it wrong, and then in our test we use password=rawkod, without the e. So this won’t redact, but we have a problem we can fix. So let’s run that again with the e on the end, and oh, it’s still broken. Well, clearly, I don’t know how to spell rawkode. There we go, when you do it right, it works. That string was redacted, because we were able to test the regex.
**24:46** Now, this is really cool. You can actually plumb in a shell script, a whole bunch of example log lines that you have from your application, plus redactions that you know always have to be satisfied, and hook this into your CI system. That way, you know right away whenever you’ve got secrets leaking that should be redacted.
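As a minimal sketch of that CI idea — this is not Komodor’s actual redaction engine, just the same pattern reproduced with Python’s re module, with illustrative patterns and sample lines — you can assert that known-sensitive samples always come back clean:

```python
import re

# Hypothetical CI check mirroring the justfile recipe above: run every
# redaction pattern over sample log lines and fail if anything leaks.
PATTERNS = [r"password=\S+", r"rawkode"]

def redact(line: str) -> str:
    """Replace every match of every pattern with a [REDACTED] marker."""
    for pattern in PATTERNS:
        line = re.sub(pattern, "[REDACTED]", line)
    return line

# Sample log lines that must always come back clean.
SAMPLES = [
    "login ok password=hunter2 user=rawkode",
    "I am rawkode, hear me type",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        cleaned = redact(sample)
        # Fail the build if a known-sensitive token survives redaction.
        assert "password=h" not in cleaned and "rawkode" not in cleaned
        print(cleaned)
```

Wiring this into CI means a misspelled pattern (the rawkod-without-an-e mistake above) fails the pipeline instead of silently leaking into your cluster’s logs.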
**25:05** So that’s a quick overview of Komodor. There’s an awful lot to love, and it gives you great visibility into the wonderfully complex system that is Kubernetes running our wonderfully complex applications, which are our microservices. This was just part one. There’s going to be a part two of this video dropping early next week, and in part two we’ll be taking a look at more of the Komodor integrations. We’ll see how to integrate your source control via GitHub. We’ll also take a look at hooking up to Sentry for exception tracking, and Grafana and Alertmanager, giving you full visibility across all of your observability stack. And then at the end of next week, part three will drop, where we take a look at two final features: one, the humble webhook and how we can get information from Komodor to do whatever the hell we please, and then one of my favorite features, the vcluster integration, deploying Komodor to all your virtual clusters in multi-tenant Kubernetes environments. We’ll be back next week with the next video. Until then, have a wonderful day, and I’ll see you all soon.