Itiel Shwartz: Hello everyone, and welcome to another episode of the Kubernetes for Humans podcast. Today on the show I have Stefana with me. Stefana, do you want to introduce yourself?
Stefana Muller: Sure, Itiel. Thanks for having me. I’m Stefana Muller. I run infrastructure at Salesforce, coming from the OwnBackup acquisition, which many of you might have heard about a year and a half ago. Nice to be on the podcast.
Itiel Shwartz: So Stefana, let’s start with a bit about you. What’s your background? When did you get into tech? When did you get into ops? And how is life at Salesforce?
Stefana Muller: Well, that means rewinding a couple of years to remember where I came from, but I can definitely give you some background. I’ve been working in the industry for about 26 years. I keep counting up; at some point I should stop counting, I think. It’s been a long journey: pre-cloud, then cloud, and now AI. Many transitions have happened along the way. I started in technical support, then moved on to product management, product management in infrastructure, security engineering, and, for the past six to seven years, infrastructure. Before this I worked at some healthcare companies, at large enterprises, and at small startups. And now I am at Salesforce after the acquisition of a great product company called OwnBackup. At Salesforce it’s been a big journey and a shift, I guess you could say. At Own we were running everything ourselves: 15 regions across the world and 52 clusters, which is quite large for a small company. Lots of data, petabytes of data backed up every day. Now we are part of a much larger organization with much greater complexity. And I think we’re even more enabled here at Salesforce to use AI on a regular basis, almost like pair programming. So it’s a bit of a shift for my career, going from being, I guess you could say, the center of infrastructure to being just a part of it. But the cool part is that I’m able to explore in a lot of different directions: efficiencies, and ways to enable my team to explore different uses of AI in our infrastructure today.
Itiel Shwartz: Okay, that’s one of the more interesting stories I think we’ve had: technical support to product to operations. That’s quite a journey.
Stefana Muller: Quite a trajectory, right? It shouldn’t have gone that way. It’s not that common. Yeah.
Itiel Shwartz: So before we jump into the Salesforce days, maybe share a bit about OwnBackup: what the challenges were, what the problems were. Obviously you did backup, right? That was the product, one can only assume.
Stefana Muller: That was the original name of the company. Yep. We actually changed the name to Own because we ended up doing more than backup. Go figure. So we had to change the name; I didn’t mention that earlier. Interestingly, the product initially focused on backup for the SaaS providers out there: Salesforce, Microsoft, Workday, and others. It helped customers who use those SaaS offerings back up their data so that they have an offsite backup, away from the SaaS provider’s hosting, and can do a precision restore of data when things go wrong. Because, as you can imagine, things go wrong often in those environments. You have thousands of users using the portal to log their tickets, to use the CRM, whatever it is. With mistakes happening on a regular basis, precision repair and the ability to restore just one line or one object is very useful for a company. Then we expanded into archival solutions so that customers can move some of the old data out of their environment into an offsite backup, if you will. But we made that archive accessible, so customers could not just archive their data but also see it. They can actually query that data, so the archive isn’t just shipped away for audit purposes; they’re able to look at it. That gave them the confidence to archive their data, if you can imagine. “Let’s take the last 10 years of your data and get it out of your production system” is a scary proposition for many customers. So that was the trajectory. And we also had a seeding product that helped you seed data into your systems for testing, proofs of concept, and things like that.
Itiel Shwartz: We could go all the way into Own’s product line, and the product aspects are interesting, but let’s talk a bit about resiliency. You said you had 15 different regions, right? The reason is maybe resiliency. So maybe you can share a bit about the challenges you had setting it all up, the problems, and where Kubernetes fit in.
Stefana Muller: Absolutely. I would say resiliency was one reason for the multiple regions and multiple environments, but the other was data residency. Say a customer is in Japan, for example. They want their data to remain in Japan. So if they’re primarily working in Tokyo, they want an offsite backup in Osaka. They do not want their data to leave the country. Sometimes it’s the law that requires this and sometimes not. That is why we had to have many regions around the world: we’re serving a global customer base that wants its data to reside in the country of its choosing. We also couldn’t work only in AWS, because not every customer wanted their data backed up to AWS. Maybe they were a competitor, maybe they had standardized on something else. So we had an option to back up to Azure, and we now run our environment on both clouds, AWS and Azure. That multi-cloud challenge was even harder before we were using Kubernetes everywhere. We had a sprawl of EC2 instances, lots of expense, and no easy way to scale. We knew we had to move to Kubernetes, and this is part of my journey through Own: I was helping them move there throughout. By the way, we’re still doing it. Don’t tell everybody. Oh wait, we’re on the podcast. We are still struggling with a few areas. Not struggling, but they’re really gnarly areas. I call them gnarly; that’s a good California word. Basically it means difficult, something with a lot mixed into it: a lot of spaghetti code that we’ve got to unravel before moving it over to Kubernetes to make it efficient. So I guess the big thing is that by moving to EKS and AKS we were able to standardize our stack across clouds. That made it a lot easier to maintain, with less code for our application teams to think about. They don’t have to think about the substrate.
They don’t have to think about the cloud. They’re thinking about what they’re running, and both clouds can handle the same effort, if you will. So that’s kind of where we started.
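The residency rule described here (a Tokyo customer gets an offsite copy in Osaka, on the cloud of their choosing, and the data never leaves the country) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region catalog, the names `Region`, `REGIONS`, and `pick_backup_region`, and the country codes are all hypothetical and are not Own’s or Salesforce’s actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    country: str  # ISO-style country code, illustrative only
    cloud: str    # "aws" or "azure"


# Hypothetical catalog; a real one would come from the platform's own inventory.
REGIONS = [
    Region("tokyo", "JP", "aws"),
    Region("osaka", "JP", "aws"),
    Region("japan-east", "JP", "azure"),
    Region("frankfurt", "DE", "aws"),
    Region("virginia", "US", "aws"),
]


def pick_backup_region(primary: str, cloud: str, regions=REGIONS) -> Region:
    """Return an offsite region in the same country as `primary`, on `cloud`."""
    by_name = {r.name: r for r in regions}
    if primary not in by_name:
        raise ValueError(f"unknown primary region: {primary}")
    country = by_name[primary].country
    candidates = [
        r for r in regions
        if r.country == country and r.cloud == cloud and r.name != primary
    ]
    if not candidates:
        # Refuse rather than silently placing the backup in another country.
        raise LookupError(f"no in-country {cloud} backup region for {primary}")
    return candidates[0]


print(pick_backup_region("tokyo", "aws").name)  # osaka
```

The design choice worth noting is the failure mode: when no in-country region exists, the sketch raises an error instead of falling back to a region abroad, since a residency requirement is a hard constraint rather than a preference.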
Itiel Shwartz: That makes sense. No, it does.
Stefana Muller: I can go on, though. I could tell you some of the challenges we faced after.
Itiel Shwartz: No, I guess you did. But let’s do a quick jump to the Salesforce days. So everyone is happy; there was a big acquisition, right? And then usually comes the not-so-fun part of integrating products or infrastructure. Do you still have autonomy? Does everything run in some Salesforce cloudy thingy? Maybe you can share a bit about the technical integration.
Stefana Muller: Yeah, so the first stage of integration at Salesforce was very focused on, “Let’s make sure you’re secure enough for us to call you Salesforce.” They wanted to take a look at what we were running. They weren’t asking us to change yet. As part of that audit, that due diligence, we did red-team testing. We did a ton of things to identify any holes in the process. We even had a bug bounty, which was a really cool event that found a lot of issues that we were able to close, thank goodness. When you’re a startup you don’t have the power behind that. Salesforce did, so they were able to put hundreds of people into a bug bounty to find those bugs for us. That leveled us up. Before we were allowed to be sold on what we call Salesforce contracts, we had to raise our security level. I’m not saying our security was low; they had specific standards that they wanted us to abide by, and we agreed with them. So that was the first task. The second task was...
Itiel Shwartz: How much time did it take? I must ask. It sounds like a big task.
Stefana Muller: So, the first task that I just talked about: we got acquired in November, and we were finalized by March the year after. So basically November to March; it was a very short timeline. I will say, I’m going to give my team a pat on the back. We were pretty good. We were one of the easier acquisitions to acquire, and I’m going to plug our founder, Ariel Burkman, because he was the head of security at OwnBackup, and he was the one who kept us whole. It was really in a good space. Maybe it wasn’t as scalable. That’s what Salesforce wanted: if you’re going to do security in this way, it must be scalable. You can’t have a single person watching logs, right? You need automation here. That is what we focused on the most. In the past year, from that March until now, we’ve been working on adopting the tools that Salesforce uses across the enterprise. This not only allows us to integrate with the large security presence across the team, but also to learn about their standards in what they call FKP, or Falcon Kubernetes Platform, perhaps. I actually don’t know what the acronym stands for, but basically it’s the Kubernetes infrastructure at Salesforce. The eventual move for our team is to take everything we’ve done in the Own infrastructure and migrate it over to Falcon, or what they externally call Hyperforce. Hyperforce is that trust layer that Salesforce talks about all the time. It is the infrastructure layer of the trust; that’s where the standardization exists. So that’s what my team is doing this year. Last year we focused on adopting all the tools, along with some other things. We implemented Karpenter in our EKS clusters, and we’re working on it in AKS as well, for proper scaling. We are also implementing a few other tools for cost efficiency in Kubernetes. It can get quite expensive, especially moving from EC2 instances that were pretty easy to scope; costs can sprawl in Kubernetes land. So there have been a lot of little tweaks to get us to the scalability that’s necessary. One of the big reasons for that is that instead of a small sales team at Own selling our product, we now have the force of Salesforce selling it. It is expanding rapidly, and we have to keep up. I can’t be building new regions every two minutes. I need the existing multi-tenant infrastructure to scale automatically.
Itiel Shwartz: When you talk about scale, how much did you scale, if you can talk about it? Is it legal?
Stefana Muller: I can share some numbers. I can’t talk about financial numbers, but I can talk about some of the public customer numbers. We went from 7,000 customers, which took us 10 years to get to, to over 11,000. So that’s a big jump in one year, right? The other thing I can talk about is the number of clusters. We went from about 18 clusters to 52 clusters. That’s a lot of clusters. We also have the complexity of our FedRAMP environment: we have a FedRAMP Moderate environment that we have to manage separately, which made things a bit more complex, since we need folks to focus on the security of that environment and take it to the next step, FedRAMP High, which is the standard at Salesforce. I can’t really give too many numbers. I would love to, but I know Salesforce will be like, “Stefana, what are you doing over there?”
Itiel Shwartz: No, no, no. We don’t want you to get into trouble just for the podcast.
Stefana Muller: I will mention that it’s with the same number of team members, right? The team did not scale. So my goal was to figure out, “Okay, we have a lot of toil. How do I get rid of all of that so my team can focus on the really important parts?” That has been a big journey for the team. I don’t want them cleaning nodes manually or restarting servers. By the way, they did a lot of that in the beginning. But now I don’t want them restarting servers in the middle of the night. “Why is the web server down again? Restart it.” I’m teasing, but that’s the kind of thing we got paged for. Now that we’ve moved to Kubernetes, that no longer exists. It’s very resilient; it restarts automatically.
Itiel Shwartz: It solves a lot of problems and brings a couple of new ones as well, right? That’s life. And with that, I think it’s a good segue to talk a bit about AI. You said that at Salesforce they are pushing you, or maybe enabling you even more, to use AI. So please tell us: where are you using AI right now? And where do you think you are heading in your AI usage when we’re talking about 2026, 2027?
Stefana Muller: So, where are we right now? It’s actually quite exciting at Salesforce right now. As a VP of infrastructure at a startup, I wouldn’t usually get to touch AI tools on a regular basis; we didn’t have the funding for it. Salesforce has opened it up and said, “You can use it. Use whatever you need to efficiently grow your velocity.” So I am excited, because I’m using it myself, and my team uses it daily. Six months ago we would ask AI a question whenever we had a problem, and it would respond. Over the last month or so we have shifted to pair programming with AI. That’s a mental shift, but it’s also an operational, day-to-day shift in how you work. Instead of “let me ask the AI this question because I got stuck while coding,” my team doesn’t start coding until they have told the AI what the project is. We start with the project context. Most of my team, I hope all of my team, has now moved over to Claude Code. That seems to be the best harness for the team so far, not only because of the different models available under Claude, but because it has so many, I would call them attachments. There are many MCPs I can use. There are many skills I can use, and I can build skills very easily. By the way, I built my own skill for Claude Code, if anyone wants to check it out. As I said to somebody yesterday, AI has democratized who can actually touch infrastructure. It’s enabling a lot of team members to cross boundaries that they didn’t cross before. Some SREs are touching infrastructure directly, and some DevOps engineers are touching SRE work and doing troubleshooting. I think it’s going to change the titles in the future. I’m not there yet, but it’s changing team structure. It’s unblocking team members to move faster, which has been great.
The other key thing we are doing, which I don’t think everyone realizes is so important, is documentation. I mean, maybe we do realize it’s important; we hear about it all the time. But it’s not just a stale wiki or Confluence page that we’re worried about. It’s the documentation in the code. One of the key things AI does is make it very easy, first, to document old code and figure out what happened there. Why did this happen? When did it happen? Who did it? All without doing that work manually. Second, it helps us automatically enforce documentation in our new code. In infrastructure, that has been huge, because the biggest problem I have is usually that one guy over there who knows everything. Let’s call him Alon Ofek. He might be on my team, I don’t know. He’s my director of DevOps. Anytime something goes wrong, I’ve got to call him, so the guy can’t go on vacation. But now it’s very different: we’ve put a harness within Claude that watches our repositories and doesn’t allow code check-ins unless they have a specific documentation structure. The team only needs to ask Claude to document; they don’t need to do much, which is great. And now Claude is enforcing it. So I just gave you two of the big things we’re doing here. Of course there are challenges as well, but those are two of the big things that I feel my team is accelerating on with AI today.
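The idea of refusing a check-in that lacks a required documentation structure can be illustrated with a small Python check. This is a minimal sketch, not the harness described in the conversation: the conversation does not say what the required structure is, so the rule used here (a module docstring plus a docstring on every public function and class) and the names `missing_docstrings` and `allow_checkin` are assumptions made for the example.

```python
import ast
from typing import List


def missing_docstrings(source: str) -> List[str]:
    """List the module and public definitions in `source` that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    if ast.get_docstring(tree) is None:
        missing.append("<module>")
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Assumed convention: names starting with "_" are private and exempt.
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


def allow_checkin(source: str) -> bool:
    """Allow the check-in only when nothing is missing documentation."""
    return not missing_docstrings(source)
```

A check like this could run as a pre-commit hook or a CI step over the changed files; the returned list of undocumented names gives the author (or an AI assistant asked to "document this") a concrete to-do list.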
Itiel Shwartz: And where are we going? You talked about what is happening right now, which I’d call enablement, or empowerment, of the different team members. Where are we heading? What’s stopping us? What’s not stopping us?
Stefana Muller: If I could predict that, I don’t think I would be just working at Salesforce as some person, right? I don’t know. I will tell you that every day I think we’re headed in a specific direction, and I push everyone there. That’s kind of my job: to get everybody on the same bandwagon and move everyone forward. And then we find something new. The models are shifting so quickly that they’re actually surprising me, and I haven’t been this surprised in many years. Remember, I started my life with no cell phones, then went to pagers, then to BlackBerrys. So I started pretty early in this process of technology. I think the hardest thing is figuring out where we’re going and what we want it to do. I’ll comment on what we want AI to do for us. I would love to see more secure ways of trusting AI to make production changes. Today I still need a human in the loop, and sometimes the human in the loop is the bottleneck. I ask the AI to find something: it does the troubleshooting, grabs a bunch of logs, figures out the root cause, and comes up with a solution, but I’m not yet ready to trust that solution. That’s because we’re in a highly regulated environment. I hold customer data very closely. It is like my baby. I do not allow AI to touch that customer data. So we have to be very careful about how we implement and what guardrails we put in place. That’s the one thing I wish we had better solutions for on the security front, and I’m seeing it happen pretty quickly. Companies are coming in and saying, “Hey, I found a way to find secrets in code.” Very quickly there are point solutions. But you know how the market shifts when something new happens? It sprawls. Everyone’s got a solution for every little point thing, and then it condenses.
We’re not at the condensing part yet. So it’s hard for me to figure out which point solutions I not just need but want to invest in. That’s what I’m looking at now: what do I want to invest in? Are they third-party solutions? Are they open-source solutions? And because it’s changing so rapidly, Itiel, I’m done here; I think my brain can’t keep up with the rapid changes. No, that’s not the only thing. What I wish I had is a road map. You know how we had the road map for the perfect DevOps environment, or how CNCF put stuff out? It was great. They had a vision. Now the vision keeps changing, and I’m doing my best. But that’s my goal: to figure out how I can trust AI more in production, with troubleshooting and fixing, so that it doesn’t wake people up at night. PagerDuty doesn’t have to go off; there’s just an alert in the morning to let you know this happened. That would be a really cool shift, because I don’t know of any engineer who enjoys waking up at 2:00 a.m. Unless it’s for fun.
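The guardrails mentioned above (keep AI away from customer data, and keep a human in the loop for production changes) can be pictured as a simple approval gate in front of any AI-proposed fix. The Python sketch below is hypothetical: the `ProposedAction` fields, the three outcomes, and the policy order are assumptions chosen to mirror the conversation, not a description of Salesforce’s actual controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    description: str
    environment: str             # e.g. "staging" or "production"
    touches_customer_data: bool
    reversible: bool


def decide(action: ProposedAction) -> str:
    """Return "deny", "needs_human", or "auto_apply" for an AI-proposed action."""
    # Hard rule from the conversation: AI never touches customer data.
    if action.touches_customer_data:
        return "deny"
    # Production changes, and anything that cannot be undone, wait for a person.
    if action.environment == "production" or not action.reversible:
        return "needs_human"
    # Assumed policy: low-risk, reversible, non-production fixes may proceed.
    return "auto_apply"


restart = ProposedAction("restart web deployment", "production", False, True)
print(decide(restart))  # needs_human
```

Moving toward the "alert in the morning" goal would mean widening the `auto_apply` branch over time, for example to specific well-understood production remediations, while the customer-data rule stays absolute.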
Itiel Shwartz: I tend to agree. I really do. And how does it work with the broader Salesforce? Are you talking about your own department, or is there some huge Salesforce infra committee or mandate?
Stefana Muller: I can’t talk too much about all of Salesforce globally, just because I’m sure there’s public information that will be shared soon. What I can say is that the engineering organization and the infrastructure and security organization, which are two separate organizations, are fully empowered to use Claude Code on a regular basis and to try out ways to remove toil. That’s absolute. Everyone is expected to have a story, and rapidly. Let me be clear, they want it rapidly. They don’t want you to wait three months for a release. We’re a very large company, and releasing is difficult at that size, but they don’t want us innovating with AI on a time frame of months. It’s days and weeks, and even hours in some cases. We have an organization that runs the AI strategy for the company, and we also build AI tools. So we’re not just users of the tools; we also build them for the industry: Agentforce and Slackbot, which by the way is like my best friend. I don’t know if you have access to Slackbot, but since it has all the context of what I’ve been doing every day, it pretty much handles my day-to-day. I no longer have that overhead of thoughts. It prepped me for this meeting, in case you were wondering. I don’t know if I could live without it anymore. Have you seen that as well, where you start using these tools and you’re like, “Wow, how did I ever manage my calendar?”
Itiel Shwartz: I’m completely with you. Like you said, even I remember times without cell phones, right? And now I can’t imagine a year without a cell phone. Not even a day, maybe. So yeah, things change. Stefana, any last words to wrap it up?
Stefana Muller: I think the big thing that’s out there, that everyone has been talking about for years now since AI started expanding so rapidly, is what’s happening to our jobs. And I think it’s really important, because you’re talking about Kubernetes for humans. Let’s talk about SRE and the role of infrastructure engineers in this AI space. I always like to say that AI is really great at automating things away and making its own decisions, but it can’t own the consequences. It’s not accountable for those things. So I implore everyone to remind themselves that they provide this great context, and that they are also the accountable party when using an AI tool. So no, I don’t think AI is taking the jobs of SREs and DevOps engineers. I think it’s removing the barriers that keep those team members from doing more in their jobs. I no longer have to ask the application team when there’s a code problem; I can actually get AI to help me troubleshoot. So it’s really interesting. I’m more in the driver’s seat of the environment, instead of being a passenger with AI at the helm.
Itiel Shwartz: Okay. With that, I think we will wrap it up. Stefana, thanks a lot for being here and for taking part in the SRE conference. I’m quite sure that by the time this episode airs, the event will already have happened. I’m sure it’s going to be a successful one.
Stefana Muller: Yeah, I’m really excited for the conference. So looking forward to it. It’s just in a couple weeks.
[Music] Kubernetes for Humans.
This is an AI generated transcript of the conversation