You can also view the full presentation deck here.
Jared: Good morning, good afternoon or good evening, depending on where you are in the world. Welcome to today’s DevOps.com webinar, brought to you by Techstrong and Fairwinds. My name is Jared Harris, and I’ll be your host for today’s webinar. We’ve got an exciting webinar ahead of us, but before we begin, I’d like to go over a few housekeeping notes. Today’s event is being recorded. You’ll receive an email shortly after the webinar concludes with a link to access the recording.
If you have any questions, please submit them in the Q&A tab, which is on the right side of your screen. That’s also where you’ll find the chat tab, if you want to engage with our speakers as well as fellow audience members. We’ve also got a few polls during the webinar, so stay tuned; we’d love for you to participate in those and see what you think. At the conclusion of the webinar, we’re also going to give away four $25 Amazon gift cards. You’ve got to be in it to win it, so stick around for the entire presentation to see if you’re a winner.
Let’s go ahead and get started with today’s webinar: Kubernetes open-source validation. I’m joined today by Kendall Miller, technical evangelist at Fairwinds; Andy Suderman, the CTO here at Fairwinds; and Nir Shtein, software developer at Komodor. It’s my pleasure to turn the floor over to you, Kendall, to get us started. Thank you so much for being here.
Kendall: Thanks, Jared. Am I still sharing my screen? Yeah. Can you bring my screen share back up? There we go. Okay. So, welcome, folks. This is a talk on Kubernetes open-source validation. We’re going to be talking with Nir and Andy about things that Fairwinds and Komodor are both doing in the space. If you’re here for something else, that’s not what we’re talking about. This is what we’re talking about. So, I’m excited to have you here. Anyway, let’s dive in. Real quick, by way of introduction, Nir, why don’t you start? And then we’ll do Andy and myself.
Nir: Yeah, of course. So, first of all, hello to everyone. Very happy to be here. I’m a software engineer with Komodor and the main developer behind the ValidKube open-source project, Komodor’s first open-source project. Generally, I very much love Kubernetes, love open source, and of course, programming. So, it’s going to be fun today.
Kendall: Glad to have you here, Nir. Andy?
Andy: Hi, I’m Andy. Thanks for having me. CTO of Fairwinds, open-source developer. Lots of experience with Kubernetes. Excited about all things open source and all things Kubernetes. Just always love to talk about that stuff.
Kendall: Open-source and everything open-source, all things Kubernetes. Glad to have you all here. My name is Kendall. I’m a technical evangelist and advisor for Fairwinds. I’ve been here from the very beginning talking about all things Kubernetes for more years than I’d like to admit at this point. But we’ll just skip over that detail, and we’ll tell you about Fairwinds and we’ll tell you about Komodor, and then we’ll dive in.
So, real quick, Fairwinds has been in business for a little over 7 years, focused on all things Kubernetes for the vast majority of that. We seek to help companies deploy faster and deploy more confidently, and we do that by building tools for configuration validation in Kubernetes. So, we build a whole bunch of open source, which we’re going to be talking a lot about today, and we also have a SaaS product that helps folks get the things they’re deploying into Kubernetes correct: getting them configured correctly, so they’re not over-provisioned, under-provisioned, over-permissioned, under-permissioned, etc. There are a lot of things in Kubernetes that are complicated and hard to get right, because it still feels like a new paradigm for most people. And Fairwinds exists to build software to make that easy for you. So, if you’re using Kubernetes, I can almost guarantee we have some open source that’s relevant for you, on the resource side or security or just correct configuration: having confidence that what you’re doing, you’re doing correctly. So, that’s who Fairwinds is, trying to simplify that complexity. And I’ll turn it over to you, Nir, to talk about Komodor.
Nir: Yeah. So, I’ll introduce Komodor very briefly. Komodor is the first Kubernetes-native troubleshooting platform. We give a variety of solutions on top of Kubernetes to help our customers maintain clusters and prevent and solve issues in Kubernetes. As you say, and as we all know, Kubernetes is a very large and very complex system, with many capabilities and many things to know about the platform.
So, to make things simple and to solve problems more efficiently, we created Komodor. First of all, Komodor gives you visibility into the cluster. And we don’t only give visibility into what happens now; we also give a timeline of what happened in the past, a week ago, a month ago. It can be a config change. It can be a bad state of some deployment. Because of that, when you come to solve problems and you have incidents, your life will be much easier.
The alternative is looking through so many tools and trying to solve things using kubectl commands, and it can be very complicated and very hard to solve problems that way. Komodor is pretty simple. We tell our customers to install the Komodor agent on their clusters using Helm or Kustomize. Then the agent reports what’s happening on the cluster. Komodor processes millions of Kubernetes events every day and brings those events into our platform. From there, you can do everything you need to troubleshoot and prevent issues on Kubernetes. So, this is a brief overview of Komodor.
Kendall: Thank you, Nir. That’s great. So, before we dive in too deep, let’s start with a poll. We’d like to ask folks, “Where are you in your Kubernetes journey?” We’ll have Jared bring that up. You can select from “I’m learning about containers and Kubernetes,” “Planning to use it in 6 to 12 months,” “Using it only in testing and development,” or “Using it in production.” And, yeah, we’ll show the results live as they come in here. So, click some buttons and we’ll see those results come in. Did we close it?
Jared: Yeah, we’ve got some people responding. I think it’s just a little delayed.
Kendall: Okay, okay. Great. Yeah, I can submit my answer. There we go. Ah, there we go. Okay, I’m seeing some results coming in, but it’s weird that it’s not showing on my screen. Here we go. Okay. There’s some coming in. You can keep responding here for a minute. We’ve got about 20 responses. I know there are a lot more of you out there, so we’ll give it a second just to see where folks are at. This is the divide we tend to see: there are a lot of you just getting started with Kubernetes, just learning about it, and then a lot of you using it in production. And there are increasingly fewer in the middle, because we’re seeing this dichotomy. People dive in and get to production very quickly, even if they’re not confident yet. Or they just stay in that “Well, maybe we should do this” phase and don’t dive in with both feet. So, this is a pretty common split. Give it about 10 more seconds and we’ll wrap up the poll. [Singing] This is me singing a song to pass the time. Okay, 5, 4, 3, 2, 1. There we go. Okay, poll’s closed.
And let’s see, yeah, so in total, about 40% of you are learning about containers and Kubernetes, almost 30% are using it in production, about 30% are using it in testing and development, and then a very small percentage are planning to use it in the next 6 to 12 months. So, that’s good. Good spread. Thanks for that. We collect this over time, aggregated, so we can see how the audience is changing over time too. It’s useful for us for that.
So, we’re going to be talking about some best practices that may seem obvious. And the bulk of our conversation is going to get into some of the open-source tooling that Fairwinds has and that Komodor has released, and how some of those play well together. But first, let’s dive into a few questions, since both Andy and Nir have been using Kubernetes for a long time, probably at a level different than the average Kubernetes user. So, let’s start with some best practices. What are the things that are maybe obvious to someone who’s been doing this for a long time but aren’t obvious when you’ve just moved to Kubernetes and you’re new to the whole ecosystem? Andy, you want to start?
Andy: Sure. I’ll start with my favorite one that I always, always bring up, because I think it’s core to how Kubernetes works, how scheduling and scaling work in Kubernetes, and how you get all the benefits you need out of Kubernetes: your resource requests and limits. If you don’t set those right out of the gate, you’re going to have a lot of problems overall. I’ve talked about this in several different forums, and I’ve written tools around it. I think it’s super important. Your requests dictate how your pod gets scheduled and how load is distributed across your cluster. And your limits are your safeguards, so that you don’t run into noisy neighbors and take down other workloads within your distributed environment.
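To make the requests-and-limits point concrete, here is a minimal sketch (not the code of any tool discussed in this webinar) that walks a Pod spec, represented as a plain Python dict shaped like the Kubernetes manifest, and flags containers that don’t set CPU and memory requests and limits. The function name `missing_resources` and the example images are hypothetical.

```python
from typing import Any


def missing_resources(pod_spec: dict[str, Any]) -> list[str]:
    """List 'container: section.resource' for every CPU/memory request or limit not set."""
    problems: list[str] = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            values = resources.get(section, {})
            for resource in ("cpu", "memory"):
                if resource not in values:
                    problems.append(f"{name}: {section}.{resource}")
    return problems


# Hypothetical Pod spec: "web" sets everything except a CPU limit; "sidecar" sets nothing.
pod_spec = {
    "containers": [
        {
            "name": "web",
            "image": "example/web:1.4.2",
            "resources": {
                "requests": {"cpu": "250m", "memory": "256Mi"},
                "limits": {"memory": "512Mi"},
            },
        },
        {"name": "sidecar", "image": "example/sidecar:0.9.0"},
    ]
}

print(missing_resources(pod_spec))
```

Run as-is, this prints `['web: limits.cpu', 'sidecar: requests.cpu', 'sidecar: requests.memory', 'sidecar: limits.cpu', 'sidecar: limits.memory']`.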
Kendall: And this is only a problem, Andy, because Kubernetes is doing the bin packing for you, right? Where historically, I would write an application and put it on a machine. And so, I picked how big that machine was, and that’s what determined the resource requests and limits. But now, Kubernetes is figuring out how to pack things into a machine. And unless I tell it what to do, it’s going to what? I mean, give a couple of examples of some of the things that go wrong when you forget.
Andy: Yeah, I mean, I’ve seen pods that consume an entire node’s worth of memory. Then what happens is the Linux kernel tries to manage that and says, “Oh, we’re out of memory. I need to kill some stuff off.” And when you’ve got 15 different things running on that node, you don’t know what it’s going to kill first. It could take out the kubelet, which is the thing that runs your pods on that node, and now that node just doesn’t work anymore. I’ve seen it take out the networking components on that node, so all of a sudden that node can’t reach anything on the network, nothing can talk to it, and you get a dead node. So, that’s kind of the high end of the problem.
The other end is like if you set your requests or limits too low, you may not have enough resources, or you may start getting CPU throttled. And so, your performance will be highly degraded, and you won’t get the performance you’re expecting out of your application. And I’m trying to think of other good ones. I did a whole talk on this recently, actually, so, I should have some.
Kendall: No, it’s okay. That’s good for whetting your appetite. But having your resource requests and limits wrong can also just mean you’re spending way more money than you need to, because you’re wildly over-provisioning your workload. So, that’s another common thing.
Kendall: The average dev goes, “Well, I want my workload to work. I don’t have a clue what my resource requests and limits should be. So, let’s just say 8 gigabytes and 800 whatever, all the CPU.”
Andy: “My MacBook has 64 gigs of memory, and 15, 16 cores. So, that’s what I’m going to give my pod.”
Kendall: “That’s what I’m going to give my pod.” That’s exactly right. Nir, what are some things that are obvious to you that people often don’t notice when they’re getting started?
Nir: There are so many of them, so it’s hard to choose. But I’ll start with the simplest thing, in my opinion, which is to maintain good YAML. We all know that, eventually, all resources are defined in YAML, and the YAML represents the desired state of a resource. Because Kubernetes does so much automation, it strives to make the actual state match the content of the YAML. So, I recommend maintaining hygienic YAML and keeping it clean and as small as possible, while including the important metadata inside it. What I saw in the past was huge YAMLs containing so much unneeded [inaudible] data and so many fields, and I was struggling to see what was going on inside. Only after I cleaned the YAML was I able to understand what I needed to understand in the first place.
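The cleanup Nir describes can be sketched roughly as follows: take a manifest exported from a live cluster (as a Python dict) and drop the `status` section plus metadata that the API server fills in, while keeping identifying metadata such as name, namespace, and labels. The field list is an illustrative assumption, not the exact rule set of KubeNeat or any other tool, and `neaten` is a hypothetical name.

```python
import copy
from typing import Any

# Metadata the API server populates on live objects; an illustrative list, not exhaustive.
SERVER_METADATA_FIELDS = ("uid", "resourceVersion", "creationTimestamp", "generation", "managedFields", "selfLink")
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def neaten(manifest: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the manifest without status and server-populated metadata."""
    cleaned = copy.deepcopy(manifest)  # leave the caller's dict untouched
    cleaned.pop("status", None)
    metadata = cleaned.get("metadata", {})
    for field in SERVER_METADATA_FIELDS:
        metadata.pop(field, None)
    annotations = metadata.get("annotations", {})
    annotations.pop(LAST_APPLIED, None)
    if not annotations:
        # Drop the annotations map entirely once nothing meaningful is left in it.
        metadata.pop("annotations", None)
    return cleaned


# Hypothetical ConfigMap as it might come back from a cluster.
live = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
        "name": "app-config",
        "namespace": "default",
        "labels": {"app": "checkout"},
        "uid": "0b6e1c2a-example",
        "resourceVersion": "48213",
        "creationTimestamp": "2022-01-01T00:00:00Z",
        "annotations": {LAST_APPLIED: "{}"},
    },
    "data": {"LOG_LEVEL": "info"},
}

print(neaten(live))
```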
And besides keeping your YAML small and neat, I also recommend including important metadata in the annotations and in the labels. Annotations hold arbitrary, non-identifying metadata, and they’re generally used by [inaudible] tools or libraries that you use. Labels tend to be the opposite: they hold identifying metadata that is meaningful and relevant to users, and you can use labels to select groups or divide objects into subsets. So, it’s really important to specify your labels properly.
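Label selection can be illustrated with a small equality-based matcher, similar in spirit to a selector’s `matchLabels`. Real selectors also support set-based `matchExpressions`, which this sketch ignores; the `matches` helper, the Pod names, and the label values are hypothetical.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selection: every key/value pair in the selector must be present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical Pods carrying identifying labels.
pods = [
    {"name": "checkout-7d9f", "labels": {"app": "checkout", "tier": "backend", "env": "staging"}},
    {"name": "checkout-5c2a", "labels": {"app": "checkout", "tier": "backend", "env": "production"}},
    {"name": "web-1b3e", "labels": {"app": "web", "tier": "frontend", "env": "production"}},
]

backend_prod = [p["name"] for p in pods if matches({"tier": "backend", "env": "production"}, p["labels"])]
print(backend_prod)  # ['checkout-5c2a']
```

This is why consistent, meaningful labels matter: a selector can only find what the labels actually say.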
And the last two things I recommend for good YAML: first, keep all your sensitive data in Secrets and then have your environment variables reference the Secret. The same goes for ConfigMaps: keep all complex data, and all data shared across many services, in ConfigMaps, and then reference the ConfigMap. That way, your YAML stays neat and small without so much information in it.
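Here is a sketch of the environment section Nir describes, built as a Python dict: sensitive values come in through `valueFrom.secretKeyRef`, and a whole ConfigMap is pulled in through `envFrom.configMapRef`, so no secret value is written into the manifest itself. The helper name, the Secret and ConfigMap names, and the key-to-variable naming rule are assumptions for illustration.

```python
from typing import Any


def container_env(secret_name: str, secret_keys: list[str], config_map_name: str) -> dict[str, Any]:
    """Build env/envFrom entries that reference a Secret and a ConfigMap instead of inlining values."""
    return {
        # One env var per Secret key; e.g. the key "db-password" becomes DB_PASSWORD.
        "env": [
            {
                "name": key.upper().replace("-", "_"),
                "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
            }
            for key in secret_keys
        ],
        # Keys in the ConfigMap are exposed to the container as environment variables.
        "envFrom": [{"configMapRef": {"name": config_map_name}}],
    }


env_section = container_env("app-secrets", ["db-password", "api-token"], "app-config")
print(env_section["env"][0]["name"])  # DB_PASSWORD
```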
And the last thing, which I think is the most important thing to include in the YAML, is liveness and readiness probes. Liveness and readiness probes are a huge topic, and I won’t deep dive into them. But generally, readiness probes are there to determine whether your container is ready to accept traffic, and liveness probes are there to determine whether your container needs to be restarted. By defining readiness and liveness probes, you can ensure that your application doesn’t enter a broken state or some other state you don’t want it to enter.
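As a rough illustration of the two probe types, the sketch below builds HTTP probes as Python dicts using the standard probe fields (`httpGet`, `periodSeconds`, `failureThreshold`). The paths, port, and image are hypothetical, and `approx_seconds_to_restart` is a simplification that ignores `initialDelaySeconds` and `timeoutSeconds`.

```python
from typing import Any


def http_probe(path: str, port: int, period_seconds: int = 10, failure_threshold: int = 3) -> dict[str, Any]:
    """An HTTP GET probe; it counts as failed after failure_threshold consecutive failed checks."""
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


def approx_seconds_to_restart(liveness: dict[str, Any]) -> int:
    """Rough time from a hang to a restart: consecutive failures needed times the check interval."""
    return liveness["periodSeconds"] * liveness["failureThreshold"]


container = {
    "name": "api",
    "image": "example/api:2.3.1",
    # Readiness: while this fails, the Pod is taken out of Service endpoints and gets no traffic.
    "readinessProbe": http_probe("/ready", 8080, period_seconds=5),
    # Liveness: if this keeps failing, the kubelet restarts the container.
    "livenessProbe": http_probe("/healthz", 8080),
}

print(approx_seconds_to_restart(container["livenessProbe"]))  # 10 s * 3 failures = 30
```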
I can tell you that many times, when I didn’t define readiness probes, I thought, “Oh, my application is up and running,” and then I scratched my head trying to understand what the problem was. And the problem was that my service wasn’t configured right. I wasn’t able to catch this because I hadn’t defined readiness probes.
Kendall: Yeah. I mean, there’s… and to go back to your YAML bits, I think one of the reasons good hygiene in your YAML, good YAML hygiene… trying to figure out how to line up those words correctly. I mean, one of the reasons that’s so important is that, basically, all that you do in Kubernetes is YAML.
Nir: Are YAMLs.
Kendall: Like, the entire thing is just writing configuration. It’s not the old days of managing fleets of computers with chef cookbooks. It’s just all YAML, period.
Nir: Kubernetes is doing everything for you.
Nir: It’s doing the hard work. And a little misconfiguration can lead to a fire in your production, yeah? A little misconfiguration in a config map can lead to a wrong client ID, and then your application doesn’t work.
Kendall: Which is why, Nir, we suggest that every company email all of their latest YAML to each other at the end of the day, zip that file, and share it company-wide, so that we know what the latest YAML is, right? Now, I mean, keeping it all as code in Git, I think that’s obvious. I know you both laughed, but it’s not always obvious to everybody. So, having it all as code somewhere in a repository that makes sense is also key. Okay, one thing that comes up a lot… And real quick, I should pause and say, please, folks, if you have questions, there is a Q&A tab in your browser as part of this. Feel free to ask questions as we go. We will have some time for Q&A at the end, but if I see a question come through that is relevant, we may pause in the moment and talk about it. So, please feel free to ask questions as we go.
One question I get asked a lot when it comes to Kubernetes is environment segregation: “How should I decide?” And I know everything in operations is “It depends.” But spend just a second talking about how you think about this problem, like, “How many clusters should I have? Should I separate between clients? Or should it all be namespaces? Or does it not matter, and I’ll just run everything in the same spot?” Obviously, it’s a different answer for everybody’s company, because it depends. But give us some guardrails on how to think about environment separation. What are some of the high-level principles?
Nir: Yeah. So, I’ll start off by saying that when you use Kubernetes, you have two options for separating your environments. You can do it via clusters, meaning you have a different cluster for each environment. Or you can do it via namespaces, which are a Kubernetes resource, meaning each environment gets its own namespace. I think the advantage of separating your environments via clusters is that you don’t have any dependencies between your clusters. Your staging cluster can collapse, everything can go wrong there, and your production will still be up and running. I think this is the most important benefit of separating environments via clusters.
You can also run cluster-wide tests, like RBAC changes or a Cluster Autoscaler change that you want to test. And when you have one cluster and separate the environments via namespaces, I think the big advantage is that you don’t need to invest so much in setting up many clusters. You don’t need to invest in upgrading every cluster. You don’t need to invest in monitoring and specific metrics for each cluster.
Andy: Yeah, I would definitely agree with everything there. I think what we typically recommend is a hybrid approach of those 2, where you have…
Nir: Mm-hmm. Exactly. Yeah.
Andy: You have a production cluster and a nonproduction cluster, and then you separate all of your environments in your nonproduction cluster by namespace. That way, you get the benefit of being able to test your cluster-wide changes in the nonproduction environment, and you also don’t have to spin up 100 clusters, one per dev or whatever you end up wanting to do.
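The hybrid layout Andy recommends can be expressed as a simple placement rule. In this sketch, the cluster names and the `app-<environment>` namespace pattern are hypothetical; the name check follows the Kubernetes rule that namespace names be RFC 1123 DNS labels (lowercase alphanumerics and `-`, starting and ending with an alphanumeric, at most 63 characters).

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', starting and ending with an alphanumeric.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def placement(environment: str) -> tuple[str, str]:
    """Return (cluster, namespace) for an environment under a prod / non-prod hybrid layout."""
    if environment == "production":
        # Production gets its own cluster, so non-prod experiments can't take it down.
        return ("prod-cluster", "app")
    # Every other environment shares one non-production cluster, isolated by namespace.
    namespace = f"app-{environment}"
    if len(namespace) > 63 or not DNS_LABEL.fullmatch(namespace):
        raise ValueError(f"invalid namespace name: {namespace!r}")
    return ("nonprod-cluster", namespace)


for env in ("production", "staging", "dev-alice"):
    print(env, placement(env))
```

The design choice is the one from the conversation: cluster-level changes (RBAC, autoscaling) get tested in the shared non-production cluster before they reach production, without a cluster per developer.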
Kendall: Yep, trying to find my mute button. Great. Well, anything else we want to touch on in this topic before we move on? I know we could probably talk about the basics for a long time. But let me put it this way. If I’m moving from Windows to Linux and I’ve never used Linux before, one of the first things you’re going to tell me is, “Just forget everything you ever learned, right? Because it’s all different.” Would you start with that for Kubernetes? Because it is kind of a completely different paradigm. You’re no longer thinking about managing individual devices; you’re thinking about all the configuration at a very high level, in a declarative way, rather than, “I’m going to reach in and affect things.” Is there anything else where you would just say, “When you’re getting started, here’s the best practice that you need to consider”?
Andy: I mean, first thing, it depends on where you’re coming from, right? Because there are declarative systems for managing VM-based infrastructure as well. They’re not as popular right now, and it’s not what we’re here to talk about, but they do exist. So, maybe you’re already in that declarative mindset. But when you’re talking about containerizing an application and then putting it in Kubernetes, I think what you have to focus on is baking security in from the beginning. You’ve got to think about how you’re building your container, how your container runs, and how you’re configuring your container to run in Kubernetes after that. But it all comes back to, in the very beginning, how’s your app built in your container? Does your app have to run as root? That’s a bad practice in a VM or in a container, but it’s something you should consider.
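The question of whether your app has to run as root shows up in Kubernetes as the container’s `securityContext`. The sketch below flags a few common root- and privilege-related settings on a single container dict; `root_risks` is a hypothetical helper, and it deliberately ignores the Pod-level `securityContext`, which a real check would also need to consider.

```python
from typing import Any


def root_risks(container: dict[str, Any]) -> list[str]:
    """Flag container-level securityContext settings related to root or privilege escalation."""
    ctx = container.get("securityContext", {})
    risks: list[str] = []
    if ctx.get("runAsNonRoot") is not True:
        risks.append("runAsNonRoot is not set to true")
    if ctx.get("runAsUser") == 0:
        risks.append("runAsUser is 0 (root)")
    # An unset allowPrivilegeEscalation is treated as allowed in this sketch.
    if ctx.get("allowPrivilegeEscalation", True):
        risks.append("allowPrivilegeEscalation is not disabled")
    if ctx.get("privileged", False):
        risks.append("container runs privileged")
    return risks


legacy = {"name": "legacy", "image": "example/legacy:1.0.0"}
hardened = {
    "name": "api",
    "image": "example/api:2.3.1",
    "securityContext": {"runAsNonRoot": True, "runAsUser": 10001, "allowPrivilegeEscalation": False},
}

print(root_risks(legacy))
print(root_risks(hardened))  # []
```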
Nir: And I can add another thing to what Andy said. When you use Kubernetes, most of the time the application gets blamed when things go wrong. But we need to remember that, because we use Kubernetes, many things can go wrong because of the cluster: like I said, misconfigurations, network problems, some broken PVC that your stateful application is relying on. So, I think when you’re running an application on Kubernetes, it’s very important to log information related to the cluster, like, “What is the status of my nodes? What is the status of my config maps, PVCs, persistent volumes?” Things like that.
Kendall: Yeah. Great. Okay, there are a few more things we’re going to talk about here, and then we’re going to move on and actually dive into some of the open-source stuff and have a conversation around ValidKube from Komodor and some of the software from Fairwinds as well. But before we go there, one of the things we commonly fiddle with at Fairwinds is API deprecations. Andy, this is a space you’ve played in too. But, Nir or Andy, how often do you see API deprecations being a problem? And what does it look like when it becomes a problem?
Andy: I mean, it’s a problem all the time, but there are a few key Kubernetes releases that make it particularly painful. In the migration from Kubernetes 1.15 to 1.16, many, many API versions got removed. The important distinction is deprecation versus removal. The Kubernetes deprecation policy is that an API is first marked as deprecated, and then I think it has to be kept around for a minimum of 2 or 3 minor versions (I don’t remember the actual policy) before it’s actually…
Nir: It’s 5.
Andy: Is it 5? Okay. Before it’s removed from the API. And once it’s removed from the API, then you have problems, because now I can’t deploy that API version anymore. So, my YAML (coming back to what Nir was talking about earlier) has to have the newer API version in it; otherwise, I just can’t deploy it to the cluster. We’re seeing it right now: we’re upgrading all of our clusters to 1.22, and there are several major API removals in that version that we’re dealing with.
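The deprecation-versus-removal problem boils down to a lookup: given a manifest’s `apiVersion` and `kind` and the Kubernetes version you’re upgrading to, has that API been removed? The sketch below uses a tiny, hand-picked excerpt of real removals (Pluto maintains its own, much more complete list), and `check_manifest` is a hypothetical name, not Pluto’s API.

```python
from typing import Any, Optional

# (apiVersion, kind) -> (Kubernetes version it was removed in, replacement apiVersion).
# A small excerpt for illustration only.
REMOVED_APIS = {
    ("extensions/v1beta1", "Deployment"): ("1.16", "apps/v1"),
    ("apps/v1beta2", "Deployment"): ("1.16", "apps/v1"),
    ("networking.k8s.io/v1beta1", "Ingress"): ("1.22", "networking.k8s.io/v1"),
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"): ("1.22", "apiextensions.k8s.io/v1"),
}


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.22' into (1, 22) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def check_manifest(manifest: dict[str, Any], target_version: str) -> Optional[str]:
    """Return a warning if the manifest's API is gone in target_version, else None."""
    key = (manifest.get("apiVersion"), manifest.get("kind"))
    if key not in REMOVED_APIS:
        return None
    removed_in, replacement = REMOVED_APIS[key]
    if version_tuple(target_version) >= version_tuple(removed_in):
        return f"{key[1]} {key[0]} was removed in {removed_in}; use {replacement}"
    return None


ingress = {"apiVersion": "networking.k8s.io/v1beta1", "kind": "Ingress"}
print(check_manifest(ingress, "1.21"))  # None: still served, though deprecated
print(check_manifest(ingress, "1.22"))
```

Comparing tuples rather than strings matters here: as strings, "1.9" would sort after "1.22".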
Kendall: Yeah. Nir, anything to add to that?
Kendall: Okay. So, then we’re going to dive in in just a second to these open-source projects. But let’s start with you, Nir. What got you into…? Like, we’re going to talk about ValidKube, which is a tool for making sure that things are configured in a valid way, right? What got you into this problem? Why are you solving this problem? Why is this interesting to you? Talk a little bit about that, and then we’ll dive into what it is.
Nir: Mm-hmm. So, when people start learning about Kubernetes and dipping into the Kubernetes world, the first thing they do is make things work. They deploy YAML, roll out, and everything seems fine and works. But on the second day of learning Kubernetes, you want to improve your YAMLs. You want to improve your quality. You want to apply some best practices. You want to make sure your container images are secure. You want to make sure your YAML is valid. You want to make it neat and small. For that reason, we serve those tools, which I’ll describe later on, in ValidKube in a simple way, so that users starting out with Kubernetes won’t have to install the tools on their local machines. I think this is the main reason.
Kendall: Yeah. Great. Andy, what got you into solving those problems?
Andy: Customers. So, we’ve been running clusters for customers. We’ve run hundreds of different clusters across a lot of different verticals, a lot of different languages, a lot of different types of software, a lot of deployment methods, and multiple cloud providers. And what we saw while running other people’s clusters is that we could build them as stable and secure and awesome as we wanted, but when the customer goes to deploy something into it, it’s a distributed environment, and all these problems introduce other problems that wake me up in the middle of the night as a cluster operator. So basically, my driver for all the various tools I’ve worked on in my time here is helping people not make mistakes in their clusters so that I don’t get woken up in the middle of the night, if we’re really going to boil it down.
Kendall: Yeah, good reason. Yeah, it doesn’t matter how good the cluster is if everything that’s deployed into it is terrible. Okay, so let’s dive in and let’s… Nir, let’s turn it over to you. Talk about ValidKube. Why does this exist? What does it do? And go from there.
Nir: Yeah. So, as I mentioned, ValidKube is the first open-source project of Komodor, hopefully one of many. ValidKube currently combines four open-source tools that help you with Kubernetes best practices, hygiene, and security. It’s a very simple tool. Basically, it’s a website that is maintained by Komodor and is public for everyone who wants to use it, and we invite people to come. You enter your YAML as input, choose one of the four tools, and get the output you want.
The four tools we currently have in ValidKube are these. The first one is Validate. As it sounds, it validates the YAML against the Kubernetes schema and checks that all the required fields are there, so you don’t miss anything. The second tool is KubeNeat. The purpose of this tool is to clean your YAML and make it neat and small by removing all unnecessary and irrelevant fields. The third tool is Trivy. Trivy is a security tool that scans for vulnerabilities in container images, file systems, and Git repositories. In ValidKube, we use it specifically for scanning the container images referenced in the Kubernetes YAML. The fourth and last tool is Polaris. I’m sure you’ll explain Polaris later on, so I won’t explain it now. And by the way, two of the tools, Trivy and KubeNeat, are powered by Aqua Security. And I think this is it. It’s just a website: you enter YAML and get the result you want. Very simple.
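Real schema validation checks a manifest against the Kubernetes OpenAPI schema; the sketch below only captures the “required fields” idea with a couple of hand-coded rules. The rules chosen (top-level `apiVersion`, `kind`, and `metadata`, a `metadata.name`, and a Deployment’s `spec.selector` and `spec.template`) are a small assumed subset, and `missing_required` is a hypothetical helper, not ValidKube’s code.

```python
from typing import Any


def missing_required(manifest: dict[str, Any]) -> list[str]:
    """Return dotted paths of required fields that are absent (small hand-coded subset)."""
    missing = [field for field in ("apiVersion", "kind", "metadata") if field not in manifest]
    # generateName is ignored here for simplicity; this sketch expects an explicit name.
    if "metadata" in manifest and not manifest["metadata"].get("name"):
        missing.append("metadata.name")
    if manifest.get("kind") == "Deployment":
        spec = manifest.get("spec", {})
        missing.extend(f"spec.{field}" for field in ("selector", "template") if field not in spec)
    return missing


draft = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 2}}
print(missing_required(draft))  # ['spec.selector', 'spec.template']
```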
Kendall: And, Nir, I mean get the result that I want, what happens when I actually put in YAML? Does it fix it for me? Does it tell me where the problems are and how to fix it? What does the actual interface look like?
Nir: Yeah. So, the 4 tools like tell you what are the problems, what you need to improve. And you can run it all the time until your YAML is fixed and what you want.
Kendall: Okay. Great. You eventually get like a passing score that you’ve fixed all the things or something. Is that right?
Nir: What, Kendall?
Kendall: You eventually get like a, “Yes, you’ve fixed everything. Like, the problems are all gone.”
Nir: You don’t get any errors, yeah?
Kendall: Yeah. Okay, okay.
Nir: It depends on the tool that you use. You can integrate any tool into ValidKube, even if it’s a CLI, as long as it just takes YAML as input.
Kendall: Cool. Okay, appreciate it. And, Andy, give an overview of Polaris. Well, this is one of the tools in ValidKube. That’s part of why we’re here. And to be clear, we’re here with the Komodor folks because they’ve built this integration that includes Polaris. So, we’re talking about how all of these things play nice with each other. And, yeah, go for it.
Andy: Yep. So, Polaris is a CLI tool, it can function as an admission controller in your cluster, and it can also run in your cluster. Polaris checks your Kubernetes configuration and objects for best practices: some of the things we talked about earlier today, like setting your resource requests and limits, some of the important security settings, not using the Docker latest image tag, and various things like that. You can run it against the objects in your cluster, and it’ll tell you where you’re not doing a good job. It plugs into ValidKube to scan your YAML files. We’ve got 20 built-in checks across 3 categories: security, efficiency, and reliability. You can also add custom checks in the form of JSON Schema. It also has configuration for exemptions, so if you know a certain thing has to violate some best practices, you can allow that. And as I mentioned, it can run as an admission controller, so if you really just want to stop anyone from deploying anything into your cluster that violates these best practices, you can do that.
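One of the checks Andy mentions, avoiding the `latest` tag, is easy to sketch. This is not Polaris’s implementation: `image_tag_problem` is a hypothetical helper that flags images with no tag (which resolve to `latest`) or an explicit `latest` tag, while accepting digest-pinned references and registry hosts with ports.

```python
from typing import Optional


def image_tag_problem(image: str) -> Optional[str]:
    """Describe why an image reference is unpinned, or return None if it looks pinned."""
    if "@" in image:
        # Pinned by digest, e.g. repo@sha256:...
        return None
    # Look only after the last '/', so a registry port like host:5000 isn't mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "no tag, so it resolves to latest"
    if last_segment.rsplit(":", 1)[1] == "latest":
        return "uses the latest tag"
    return None


for image in ("nginx", "nginx:latest", "registry.example.com:5000/team/app:1.4.2"):
    print(image, "->", image_tag_problem(image))
```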
Kendall: Great. And you’ve got a couple more to go through here. We’re going to be covering several Fairwinds open-source projects. But talk about Goldilocks, because we touched on resource requests and limits.
Andy: Sure, yeah. So, when we tell people to set their resource requests and limits, they say, “This is kind of hard. What should I set it to?” We talked about, “Oh, well, my Mac has 64 gigs of memory. Maybe that’s what my pod needs.” Well, that’s probably not what your pod needs. So, Goldilocks creates a VerticalPodAutoscaler object (the Vertical Pod Autoscaler is another project within Kubernetes) that provides resource recommendations based on real-time usage. It sits there and watches your pod run, and all of your pods run, and it says, “Hey, these use, on average, 500 megabytes of memory, and that’s what we’re going to recommend you set it to.”
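The core idea behind these recommendations can be sketched as “size the request to observed usage, plus headroom.” The version below uses a nearest-rank percentile of memory samples and a percentage margin; the 90th percentile, the 15% margin, the sample values, and `recommend_request` are assumptions for illustration. The actual VPA recommender is more sophisticated (it keeps decaying histograms of usage), and this sketch swaps Andy’s “on average” for a percentile so occasional peaks are covered.

```python
def recommend_request(samples_mib: list[int], percentile: int = 90, margin_pct: int = 15) -> int:
    """Memory request (MiB): nearest-rank percentile of observed usage plus a safety margin."""
    if not samples_mib:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mib)
    # Nearest-rank percentile: the ceil(p/100 * n)-th smallest sample (1-based), in integer math.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    observed = ordered[rank - 1]
    # Add the margin and round up, again in integer math.
    return (observed * (100 + margin_pct) + 99) // 100


# Hypothetical memory usage samples for one container, in MiB.
usage = [300, 320, 310, 500, 330, 340, 305, 315, 325, 800]
print(recommend_request(usage))  # 90th percentile is 500 MiB; plus 15% -> 575
```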
Kendall: Great. Makes that easy. Look, everybody has a hard time with the resource requests and limits thing. Just use Goldilocks. It makes the problem go away. Go ahead. Pluto.
Andy: It may be a little reductive, Kendall. But anyway, we talked about deprecated API versions, and Pluto is the tool that helps with that. Pluto checks your YAML, your Helm manifests, and your Helm charts for deprecated API versions. This one’s a little bit tricky. You can run it against the cluster, and it will check your Helm releases in the cluster. But there are various reasons why it’s not 100% reliable in-cluster, because Kubernetes does a really nice job of translating API versions from one to another, and it’s actually relatively tricky to figure out which version was in the YAML that was used to deploy something. This is why it’s super important to run tools like Pluto and Polaris against your infrastructure as code, your YAML, rather than live in your cluster, or in addition to live in your cluster, we should say.
Kendall: Yeah. And Nova?
Andy: Nova. So, Nova checks for outdated stuff. When you’re trying to find known vulnerabilities in your cluster and in your code, most of the time it’s just because stuff’s out of date. Originally, Nova was written to look at the Helm releases in your cluster, and if there’s a new version of that Helm chart available, it recommends it. And as of just this week, actually, Nova now looks at your Docker image versions and tries to find newer versions to recommend to you.
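At its simplest, “is this out of date?” is a version comparison. The sketch below picks the newest stable release that is newer than what’s running; it assumes plain `MAJOR.MINOR.PATCH` versions (optionally `v`-prefixed), skips pre-releases, and is not Nova’s actual logic. `newest_upgrade` and the version lists are hypothetical.

```python
from typing import Optional


def parse_semver(version: str) -> tuple[int, int, int]:
    """'v4.1.3' -> (4, 1, 3); any pre-release suffix after '-' is dropped."""
    core = version.lstrip("v").split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch)


def newest_upgrade(current: str, available: list[str]) -> Optional[str]:
    """Newest stable version in `available` that is newer than `current`, or None."""
    candidates = [
        v for v in available
        if "-" not in v and parse_semver(v) > parse_semver(current)
    ]
    return max(candidates, key=parse_semver) if candidates else None


print(newest_upgrade("4.0.1", ["3.9.0", "4.0.1", "4.1.0", "4.2.0-rc.1", "4.1.3"]))  # 4.1.3
```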
Andy: Yeah. Nice.
Kendall: I knew this was coming. I didn’t know that that got launched. Cool.
Andy: I think it was yesterday that I released them. Yeah.
Kendall: So, let’s talk at a high level: what are some of the things people should consider when choosing OSS tools? We’ve talked about how Kubernetes is this new world, this new environment. We’ve talked about this adjustment, the best practices that a lot of people miss. We’ve shown you open-source tools and said, “Here are some ways to succeed at that, from both Komodor and Fairwinds.” But say I’m Bob the system administrator: “What do I need to know when I’m considering OSS tools? What are the things that I need to think about?”
Andy: Easy, just pick it if it comes from Komodor or Fairwinds. Done.
Nir: Good answer.
Kendall: Nir, you want to start with that one?
Kendall: I mean, do we think about how much effort it is, or how it fits into a spec? Or what do we do about that?
Nir: Yeah, yeah. I can start. So, I think the first 2 main things to consider when choosing any open-source tool are the effort involved in adopting the project and the effort involved in maintaining it. That means: what effort does it take to integrate the project into the company? And what effort does it take after that to maintain it?
First, I want to look at adoption. There are several things I want to look at. First, is the project easy or hard to implement? And either way, how much time is it going to take me? An hour, a day, months? That can be a very, very, very important thing to consider. Does it require specific skills, like a programming language or knowledge of some framework? For example, if this project is written in Python and I want to extend it in the future, but none of my developers know Python, it may be a poor choice. I want to look at whether the project has prerequisites, like some infrastructure. Do I need to install something before I use the project? Do I need to write some code or some configuration in order to integrate it?
So, let’s assume I adopt the project and integrate it into my software. Now I want to look at maintenance. First, how much time is it going to take me to maintain it? In a given week, do I need to put a full-time developer on it? Or do I need to put in 5 minutes a week, 10 minutes, 1 hour? If I find a bug in the project, what do I do? If I’d like to extend the project with new features and new capabilities, what do I do? Can I do it easily, or is it going to be hard? Do I need to add monitoring and metrics and things like that? And if so, do they come out of the box, or do I need to implement them myself? I think all those questions can give me a really good estimate of the effort of adoption and maintenance.
Andy: Yeah. I think another big one I’d add is the XKCD… am I getting that right? The XKCD comic where all the software in the world is built on layers of open source, and there’s this 1 little project holding up the entire thing, and it’s maintained by 1 guy in a basement somewhere and nobody knows who that is. It’s worth seeing who maintains a project, how well it’s maintained, and when it was last updated. Because it’s not uncommon to go out looking for something that solves your problem, find it, and then find that the last time code was pushed to that project was 3 and a half years ago. And that’s like, “Great, they solve my problem, but it’s not kept up to date, and there’s no way it’s going to be kept up to date.” That’s a big part of the consideration too.
Nir: Yeah, exactly.
Andy: Go ahead, Nir.
Nir: So, exactly as you say, the last release might have been a few months ago, and that can really indicate poor maintenance of the project. When I look at an open-source project, I will definitely check when the last release was, and I will definitely look at the release notes. If I see that all the release notes contain just “Fix, fix, fix,” it can indicate the project is mostly in maintenance mode. But if I see new features and new capabilities coming in, that definitely encourages me to choose the project.
And I will also look at the issues that are currently open on the project. How many issues are open? How long does it take until someone responds to an issue? How long does it take until someone closes an issue? All those things can determine whether the project is maintained well or badly, and I definitely want to choose one that is well maintained.
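The signals listed here (time since the last release, open issues, time to first response, time to close) are easy to compute once you have the dates, for example exported from a project’s issue tracker. The Python sketch below assumes a hypothetical list of issue records with `opened`, `first_response`, and `closed` dates; it doesn’t call any real API, and what counts as “too long” is left to your judgment.

```python
from datetime import date
from statistics import median


def maintenance_signals(last_release: date, issues: list, today: date) -> dict:
    """Summarize maintenance health from release and issue dates.

    Each issue is a dict with an 'opened' date and optional
    'first_response' and 'closed' dates (hypothetical record shape).
    """
    response_days = [(i["first_response"] - i["opened"]).days
                     for i in issues if i.get("first_response")]
    close_days = [(i["closed"] - i["opened"]).days
                  for i in issues if i.get("closed")]
    return {
        "days_since_last_release": (today - last_release).days,
        "open_issues": sum(1 for i in issues if not i.get("closed")),
        "median_days_to_first_response": median(response_days) if response_days else None,
        "median_days_to_close": median(close_days) if close_days else None,
    }


issues = [  # hypothetical issue history
    {"opened": date(2022, 5, 1), "first_response": date(2022, 5, 2), "closed": date(2022, 5, 10)},
    {"opened": date(2022, 5, 15), "first_response": date(2022, 5, 20)},
    {"opened": date(2022, 5, 20)},
]
signals = maintenance_signals(date(2022, 3, 1), issues, today=date(2022, 6, 1))
print(signals)
```

Medians are used rather than averages so that one issue left open for years doesn’t dominate the picture.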
Kendall: Yeah, yeah. Okay, the other thing I’d add is how easy it is to install across a big environment. There are a lot of tools that are really easy to install in one cluster and really hard to install in 10 or 20 or 50 or 1,000.
Andy: I think one last thing we haven’t talked about yet is licensing. Licensing is super important. It’s not a hot, flashy, fun topic, but not all open-source licenses are created equal. Some open-source licenses can be rather detrimental, especially if you’re trying to incorporate the software into a SaaS offering or something like that, and they can definitely get you into issues down the road if you’re not paying attention. So, filter out anything that’s GPL-licensed if you’re planning to use it in your software offering. Be careful there.
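This licensing advice can be turned into an automated check if you already have each dependency’s SPDX license identifier (for example, from an SBOM or a package manager’s metadata). The Python sketch below flags GPL and AGPL identifiers for review; the dependency names are hypothetical, and which licenses to flag is a policy decision for your organization, not something this code settles.

```python
# Sketch: flag dependencies whose SPDX license ID matches a review policy.
# The prefixes are an assumed policy. "LGPL-2.1-only" does NOT start with
# "GPL-", so LGPL is not flagged unless you add "LGPL-" to the tuple.
FLAGGED_PREFIXES = ("GPL-", "AGPL-")


def flag_licenses(dependencies: dict, prefixes: tuple = FLAGGED_PREFIXES) -> list:
    """Return the sorted names of dependencies whose license starts with a flagged prefix."""
    return sorted(name for name, spdx_id in dependencies.items()
                  if spdx_id.startswith(prefixes))


deps = {  # hypothetical dependency -> SPDX license identifier
    "web-framework": "Apache-2.0",
    "json-helper": "MIT",
    "db-driver": "GPL-3.0-only",
    "analytics-lib": "AGPL-3.0-or-later",
    "compression-lib": "LGPL-2.1-only",
}
print(flag_licenses(deps))
```

AGPL is often the bigger concern for SaaS specifically, because its obligations extend to software that users interact with over a network.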
Kendall: Yeah, yeah, that can get you in trouble with your offering. Okay, I want to leave some time for Q&A at the end, and we’ve got a few things to go through. So, I’m going to spend a minute sharing the Fairwinds Insights platform. And then, Nir, if you want to talk a little bit about some of the Komodor stuff too, I’ll give you some time for that.
Kendall: Then we’ll get into Q&A. I’m not seeing a lot of Q&A questions come through, but if people have questions, feel free to drop them in there. So, real quick, at a high level, Fairwinds Insights is a SaaS offering from Fairwinds that marries together a bunch of the open-source tools we talked about today from the Fairwinds side. We focus on security, cost optimization, and policy and guardrails. Kubernetes is still hard. It’s that whole problem of, “I want to configure things correctly so I don’t make mistakes.” And that’s where Insights sits: it helps you build sane defaults and succeed at it.
So, I’m going to walk you through a real quick demo. I’m not going to touch on everything in Insights, because this is a very quick demo. If you want to see more, get in touch with us, and we can go deeper and talk about the specific ways this might be useful to you and your organization. When you log into Fairwinds Insights, you get an overview of common issues across your clusters. Here I’m clicked into the cluster view, and I can see a whole bunch of clusters running in my organization, with a health score tracked over time, so you can see how each of these is doing. Some have a 95, some have a 90 or a 94, and one has an 89. My overall organization is getting an A minus.
So, if I click into a cluster, this is the proxima-prod cluster, I get an overview of the cluster. Anything above the bar here is when action items are introduced to the cluster, so the new problematic things. And the green is when we’ve removed things from the cluster. So, the cluster is, in theory, getting better over time. And you can see this health score. We’ve got an A here with only 27 critical issues, but we’ve checked almost 3,000 things across the board and prioritized them: critical, high, medium, low, and passing, and we track the health score over time. Then there are some cost metrics in here as well, which I’m going to show you in just a second.
So, all of that rolls up into a big, long list of action items. For the action items, we’re pulling from a bunch of reports, which I skipped over and will come back to in just a sec, and we’re giving you a list of what’s going wrong in the cluster. The critical issues tend to be on the security side, but we also cover reliability, efficiency, and so on. For example, a missing label is high priority, but not critical. You can click on it, and it’ll tell you, “Hey, this label is missing in your workload,” or whatever it is. So, you don’t have to be a super senior engineer to click on this, read the description, and read how to go about fixing it so that you can improve your cluster.
Now, if you wait until there’s a whole bunch of issues running in your cluster, you’ve waited too long. And platform administrators and system administrators are like, “I don’t want a gigantic list of all the things I did wrong. That’s terrible. If anything, that’s just going to stress me out, knowing there are a million things that aren’t working.” So, we’ve shifted this left into the development and deployment workflow, so that developers in a service ownership environment can own the things they’re deploying into Kubernetes, see what they’ve configured incorrectly, and fix it before it’s deployed. So, you can integrate this in CI, or at the admission controller level.
Here, you can see an engineer trying to push some code. CircleCI is passing all of its tests, but Fairwinds Insights is failing it. So, that engineer, already familiar with Git because this is the world they live in, clicks on Details here and is loaded into the UI. In theory, they’re logged in, which I thought I was, but apparently I’m not. I’m going to log in real quick. Just a sec. They’re going to log in, and they’re going to get that. That’s not good. This is, of course, the demo gods I didn’t sacrifice to beforehand. It’s going to load you into the GUI, and it’s going to show you this. So, here are the issues that we have. And I can click on this, and it’s going to tell me what the issue is and how to go about fixing it.
So, a normal engineer working in a normal workflow suddenly has something that’s going to enforce policy. You can use the policy out of the box, or you can write custom policy, but either way it’s going to keep you from causing problems across your organization. So, I’m going to go back to this organization and back into this cluster to show you where we’re getting all of these reports from. And while that’s loading, I’m going to come over here and show you that we also have an efficiency tab.
So, these are some of our cost metrics. We break down relative total cost and relative daily cost across your workloads. And then we give you recommendations based on the quality of service that you know each service needs to receive. For example, here’s the Prometheus service. We’ve said it’s critical. You’re currently spending $6.08 per month on it, and with our recommendations you’ll save $1. $1 is no big deal; there’s only one pod running here. But if you have lots and lots of pods of a big service running and they’re wildly misconfigured, you’re going to see really big cost savings. We had a customer come in, run this on their cluster very briefly, and find a quarter million dollars’ worth of savings, which very quickly justifies the software. So, it’s really easy to get there very quickly.
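Shifting left in CI comes down to a pipeline step that exits non-zero when a scan finds something serious, which is what makes the check show up as failed next to the other tests. The following Python sketch shows only that gating logic. The finding format and the default threshold are assumptions, the severity order is taken from the levels mentioned in the demo, and it does not use the Fairwinds Insights API or CLI.

```python
# Sketch of a CI policy gate: decide a build's exit code from scan findings.
# Finding shape and threshold are hypothetical.
SEVERITY_RANK = {"passing": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(findings: list, fail_at: str = "high") -> int:
    """Return 1 (fail the build) if any finding is at or above `fail_at`, else 0."""
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f for f in findings if SEVERITY_RANK[f["severity"]] >= threshold]
    for f in blocking:
        print(f"BLOCKING [{f['severity']}] {f['resource']}: {f['message']}")
    return 1 if blocking else 0


findings = [  # hypothetical scan output
    {"severity": "high", "resource": "Deployment/web", "message": "label is missing"},
    {"severity": "low", "resource": "Deployment/web", "message": "no liveness probe"},
]
exit_code = gate(findings)
# In a real pipeline step you would finish with sys.exit(exit_code).
```

Making the threshold a parameter lets a team start by blocking only critical findings and tighten it later.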
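The savings math behind a recommendation like this is simple: price the resources a workload requests, price the recommended requests, and multiply the difference by the number of replicas. That is why one small Prometheus pod saves $1 while a large, misconfigured service can save far more. The per-core and per-GiB prices in this Python sketch are made-up placeholders, not Fairwinds Insights’ cost model.

```python
# Sketch: monthly cost of requested resources and savings from right-sizing.
# Prices are hypothetical placeholders; real prices depend on your provider and nodes.
HOURS_PER_MONTH = 730          # common approximation (8,760 hours / 12)
PRICE_PER_CORE_HOUR = 0.04     # hypothetical $ per requested vCPU-hour
PRICE_PER_GIB_HOUR = 0.005     # hypothetical $ per requested GiB-hour


def monthly_cost(cpu_cores: float, memory_gib: float, replicas: int = 1) -> float:
    """Estimated monthly cost of the requested CPU and memory across replicas."""
    hourly = cpu_cores * PRICE_PER_CORE_HOUR + memory_gib * PRICE_PER_GIB_HOUR
    return round(hourly * HOURS_PER_MONTH * replicas, 2)


def monthly_savings(current: tuple, recommended: tuple, replicas: int = 1) -> float:
    """Cost difference between current and recommended (cpu_cores, memory_gib) requests."""
    return round(monthly_cost(*current, replicas=replicas)
                 - monthly_cost(*recommended, replicas=replicas), 2)


# Same over-provisioning, one replica versus fifty.
print(monthly_savings((1.0, 4.0), (0.5, 2.0), replicas=1))
print(monthly_savings((1.0, 4.0), (0.5, 2.0), replicas=50))
```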
This is what our recommendation engine looks like. We show what you’re actually using versus what your requests and limits currently are, so we’re trying to spell this out and make it easy for you. And to go back real quick, this is where this is all coming from: Polaris, Nova, and Pluto, which we’ve talked about, are Fairwinds open-source projects. We also integrate Trivy, which Nir talked about briefly. We have support for Open Policy Agent, kube-hunter, kube-bench, etc. There’s a whole bunch here. We’re trying to put this all in 1 place and make it easy and accessible for you. If you’re using Kubernetes and you want to have confidence you’re using it correctly, look us up. That’s why Fairwinds Insights exists. Happy to have the conversation. And with that, I want to turn it over to you, Nir, to talk more about Komodor and some of your offerings.
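A usage-based recommendation like the one on this screen can be approximated by taking a high percentile of observed usage and adding headroom that depends on how critical the service is. This Python sketch is one such heuristic, not the Insights recommendation engine; the percentile, the headroom percentages per service class, and the sample data are all assumptions.

```python
import math

# Hypothetical headroom (in percent) added on top of observed usage per service class.
HEADROOM_PCT = {"critical": 30, "normal": 15, "best-effort": 0}


def percentile(samples: list, pct: int) -> int:
    """Nearest-rank percentile of integer samples (pct in 1..100)."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]


def recommend_request(usage_millicores: list, service_class: str = "normal", pct: int = 90) -> int:
    """Recommended CPU request in millicores: usage percentile plus class headroom, rounded up."""
    base = percentile(usage_millicores, pct)
    headroom = HEADROOM_PCT[service_class]
    return -(-base * (100 + headroom) // 100)   # integer ceiling division


usage = [100, 120, 110, 300, 130, 115, 125, 105, 140, 118]  # hypothetical samples
current_request = 500                                        # hypothetical current request
recommended = recommend_request(usage, "critical")
print(f"current {current_request}m, recommended {recommended}m")
```

Using a percentile rather than the maximum keeps a single spike (the 300m sample) from inflating the request, while the per-class headroom gives critical services more buffer than best-effort ones.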
Nir: Yeah. So, I’ll share a bit about Komodor. As I said before, Komodor tries to help our users troubleshoot problems when they happen. And more than that, we’d like to discover issues before our users do. That’s the reason we launched our playbooks and monitors. Each user can configure monitors for their own needs. A monitor can be, for example, a node monitor that triggers every time a node becomes [inaudible] or enters some unknown state. When a node enters that state, an event is sent to Komodor, subject to a threshold the user sets. And then we run playbooks, meaning a variety of checks on the node. Is the node overcommitted? What is the CPU usage of this node? Are there any pods that were evicted from the node?
Generally, the purpose of these checks is to see what the problem is. And we also want to provide the solution, so we give users an end-to-end solution. Some incident happens in Kubernetes, Komodor detects the incident and alerts the user via Slack, Teams, Opsgenie, it doesn’t matter. The user learns about the incident and dives into Komodor. Meanwhile, we try to uncover the root cause of the problem and then provide simple instructions for how to solve it. So, you have the problem, what caused it, and the solution. That’s generally what playbooks and monitors are in Komodor. Beyond the node monitor, we’re also adding a variety of other monitors, such as workload-related monitors, PVC monitors, and many others.
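A node monitor with a user-set threshold boils down to “alert only if the node has been out of the Ready state for at least N minutes,” which avoids paging on a brief blip. The Python sketch below assumes a hypothetical, time-ordered list of observations of the node’s `Ready` condition status (`"True"`, `"False"`, or `"Unknown"`, the values Kubernetes reports for node conditions); it is not Komodor’s implementation.

```python
from datetime import datetime, timedelta


def should_alert(observations: list, threshold: timedelta) -> bool:
    """Return True if the node has been continuously not-Ready for >= threshold.

    `observations` is a time-ordered list of (timestamp, ready_status) pairs,
    where ready_status is "True", "False", or "Unknown".
    """
    unhealthy_since = None
    for timestamp, status in observations:
        if status == "True":
            unhealthy_since = None          # node recovered; reset the clock
        elif unhealthy_since is None:
            unhealthy_since = timestamp     # start of a not-Ready streak
    if unhealthy_since is None:
        return False
    latest_timestamp = observations[-1][0]
    return latest_timestamp - unhealthy_since >= threshold


start = datetime(2022, 6, 1, 12, 0)  # hypothetical observation times
observations = [
    (start, "True"),
    (start + timedelta(minutes=1), "Unknown"),
    (start + timedelta(minutes=4), "Unknown"),
    (start + timedelta(minutes=6), "False"),
]
print(should_alert(observations, threshold=timedelta(minutes=5)))
```

Once this returns True, the playbook checks (allocation versus capacity, CPU usage, evicted pods) would run against that node.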
Kendall: Great, thank you. We had 1 question come through, so we’ll get to that in just a second. But before we dive into the Q&A, let’s do this polling question. Jared, if you can pull this up: what’s the greatest opportunity to improve your Kubernetes environment? Do you just need help with the basics? Are you looking for general best practices, improving your security posture, saving money, or improving the reliability of apps running in Kubernetes? Where do you need to see improvement? I’m going to come over to this poll tab. It should be up on your screen, but you can also go to the poll tab and respond. I want your answers. Oh, there we go, 11 answers. Okay, there’s a delay for me for whatever reason; it takes a minute for them to show up. Yeah, they’re coming in now. Jared, for whatever reason it’s not updating live on my screen, but maybe it is on other people’s.
Jared: Yeah, I’m only seeing it in the poll tab, not on the live screen. I’m not sure why.
Kendall: That’s okay. Yep, there we go, in and out. Okay. We’ll give that another second to come in. Looks like improving reliability and getting help with the basics are at the top, the 2 biggest categories right now. Are your answers still coming in? Sorry, I should not be yawning on this. Whew, I’m boring myself. That’s bad. I gotta pep myself up. It’s lunchtime. Okay. We’ll wrap up that poll in 5, 4, 3, 2. Okay, here are the answers, for folks’ sake: improving the reliability of apps running in Kubernetes is on top. I guess that aligns with why you’re here. Although I’m surprised to see no one choosing improving the security posture of your cluster. Security and reliability are pretty intimately intertwined, just to keep y’all tuned into that. But anyways.
Okay, and let’s see, can we close this? We do have a Kubernetes Best Practice Guide available if you want to check that out. It’s a white paper Fairwinds put together that gives you a high-level overview of some of the best practices, covering security, cost optimization, reliability, policy enforcement, and monitoring and alerting. The link will be shared with you, so you don’t have to write it down in a hurry. And Komodor has a DevOps handbook for Kubernetes. Anything to add about this, Nir?
Nir: No, it’s a great book, and you can download it for free.
Kendall: Okay, both of these resources will be sent out, so you don’t have to scramble to write them down. I do want to take just a minute to get to questions, and yes, Dave will share the link to the white paper and such. Okay, 2 questions have come through. The first one is, “Where do I get started with the Kubernetes open-source projects you just shared?” I think that one is somewhat obvious: you can go to Komodor’s website or Fairwinds’ website and find any of these open-source projects. Also, check out our GitHub. I’m assuming it’s the same for you, Nir. Although ValidKube is a website you go to and plug things into if you want to, right?
Nir: Mm-hmm, exactly.
Andy: We also have a Slack community for our open-source projects.
Kendall: Yeah. So, if you have questions and you want to pop in there and meet some of the other people who are using them, go do that on the Fairwinds side. And then this question came through: “What Kubernetes resources does Komodor cover?” I think maybe they’re asking specifically about ValidKube, but I’ll leave that to you, Nir.
Nir: Yeah, I think ValidKube generally covers all the resources that exist in Kubernetes; it depends on the tools you want. Within Komodor, we currently cover most of the resources, and now we’re working to cover all the resources that exist in Kubernetes and are meaningful for users. That can be workloads, storage, network, basically everything you need to troubleshoot and see what is happening on your cluster.
Kendall: Okay. And with that, I will wrap up and hand it back to you, Jared, to wrap this up.
Jared: Alright, that was outstanding. I’d like to thank you, Kendall, Andy, and Nir, for taking the time to join our discussion and share your expertise with us. Quick reminder that today’s session was recorded. Following this panel, you’ll receive an email with a link to access the on-demand recording. You can also find the recording on the devops.com and containerjournal.com websites. Just go to devops.com/webinars or containerjournal.com/webinars and look in the on-demand section of either website, and you’ll see it waiting for you there.
Now, on to the winners of our $25 gift card drawing. We’ve got Naresh T., Musa C., Kelly V., and Maria V. Congratulations to the 4 of you. Please keep an eye on your inbox to claim your gift card. If you don’t see the email, please be sure to check your spam folder. You can also reach out to me; my email should show up when you receive the link to the on-demand recording. Kendall, Andy, Nir, again, thank you so much for taking the time to be with us today. If you have any final remarks, this is the time to do so.
Andy: Thanks for having us.
Nir: Thank you, everyone.
Jared: Alright. I would like to thank Fairwinds for sponsoring today’s webinar, and I would like to thank each of you for joining us today for the entirety of the presentation. If you don’t mind, please take a moment as soon as we end the session today to take our survey. It’s a quick 3 questions. Tell us your thoughts, your feelings, and anything you want. Anyway, this is Jared Harris, signing out. Have a great day.