#005 – Kubernetes For Humans Podcast with Gwen Shapira (Nile)

[Intro Music]

Itiel Shwartz: Hello, everyone, and welcome back to another episode of the Kubernetes for Humans podcast. Today on the show, we have Gwen Shapira. Gwen, I’m super happy to have you here.

Gwen Shapira: I’m super happy to be here. I love the podcast name, Kubernetes for Humans. It hints at the rest of the Kubernetes content out there.

Itiel Shwartz: Yeah, when it comes to the content itself, Kubernetes is often seen as something meant for very smart people. First, introduce yourself, and then we can dive into the Kubernetes world.

Gwen Shapira: For sure. Hi, I’m Gwen. Right now, I’m a co-founder of a company called Nile, where we’re building a platform, mostly a database, for people who are building software-as-a-service products. We decided to do this based on our experience in our previous company, where we had to turn an open-source project into software as a service. We discovered all the ways in which this was much harder than it should have been, and of course, Kubernetes was heavily involved. Before that, I worked on large databases running in data centers for large companies, like Hadoop, MySQL, Oracle, and others.

Itiel Shwartz: Was your expertise really in building the database itself, or was it more around the surrounding tooling?

Gwen Shapira: I didn’t write MySQL algorithms, but I did fix MySQL bugs a few times. My focus was mostly on the usage and tooling around it. Today, you might call it platform engineering. A lot of it was architectural, like data modeling, scaling out versus scaling up—problems that are always relevant.

Itiel Shwartz: How did you get into software? Was it a passion from school or something else?

Gwen Shapira: It was a bit of a passion in school. I liked engineering and building things. I was fortunate enough to have a computer at home from a relatively young age. A neighbor, who was a computer science professor, gave me my first programming book when I was around 10 years old. I learned a bit of Logo, where you had the turtle walking around the screen. But I have to say, it wasn’t my dream growing up to be a software developer. I actually failed to get into medical school after trying pretty hard. They were very selective programs. So, I needed a plan B, and my Jewish parents were like, “Okay, if not a doctor, then a lawyer, and if not that, what else pays well?”

Itiel Shwartz: Being a lawyer is a good thing in a Jewish home as well, right? My father is a lawyer, so I’m in that stereotype too. But engineering counts as well; my uncle is an engineer. It’s like one lawyer, one engineer, one doctor.

Gwen Shapira: Yeah, I don’t remember engineering being a big deal growing up, but my younger brother is a lawyer, and my dad is a doctor, in case it wasn’t obvious.

Itiel Shwartz: So, you worked in a couple of different companies. When and why did you start working with Kubernetes? What was the context around it?

Gwen Shapira: I joined Confluent when it was very small—engineer number 13. They hired engineers 11, 12, and 13 on the same day. When I joined, there was already a product, because of Kafka, and it was really about helping companies use it and building an ecosystem around it. A few months later, Confluent realized we needed a cloud solution. We couldn’t just be an on-prem company; that wasn’t the future. We brought in some engineers who had worked on Kafka on Kubernetes at their previous startup. At the time, running a stateful service on Kubernetes was considered insane, but we wanted to solve for the future, not the past.

Itiel Shwartz: So, just for context, Confluent is the creator of Kafka, right? You took it from LinkedIn, open-sourced it, and made it big, like the commercial product.

Gwen Shapira: Exactly. When we started our Kubernetes cluster, the goal was always to host Kafka for other users as a service, not just for internal use. Later, we took what we built, like our operator, and made it part of the Confluent offering for on-prem.

Itiel Shwartz: I wanted to ask about platform engineering and building something for other people. Was there a platform team at Confluent?

Gwen Shapira: Absolutely. From day zero of the cloud, there was a platform team. Initially, it was just two people who we hired opportunistically. They had already worked on Kafka on Kubernetes and became our platform team. As soon as you tell anyone in your company that you’re running your product on Kubernetes, you’re the platform team. We worked closely to discover things together. Over time, the platform team grew, and by the time I left, it was probably more than half of engineering—around 200 to 300 people.

Itiel Shwartz: That’s a huge growth rate. Can you share some of the challenges you faced in those early days?

Gwen Shapira: In the beginning, the biggest challenges were technical, mostly around Kubernetes behavior being surprising or not a good match for what we needed. Kubernetes makes everything look easy, but we found that even small changes could cause major issues, like a network restart across the entire cluster. We didn’t expect that, and most of the surprises were around networking. One of the biggest challenges was synchronizing when Kafka believes a machine is available with when Kubernetes believes it is available. It turned out that Kafka was very particular about when it had network access, which didn’t always align with Kubernetes’ timing.

Itiel Shwartz: Most companies, even our customers, don’t run stateful things inside Kubernetes. I think it’s still one of the biggest taboos in Kubernetes, even now. But your team at Confluent built around having state in Kubernetes. What did you learn from that? Is running stateful workloads in Kubernetes going to be the next big thing, or is it already big?

Gwen Shapira: It’s interesting. I didn’t know it was still a taboo. I knew that when we started, it was, but Kubernetes has gotten better over time. A lot of things improved because there was demand to run stateful workloads. Realistically speaking, if you have a stateful workload, what are you supposed to do with it? We know how Kubernetes works, and we like it. I’m not going back to writing my own bash scripts. I think there’s demand for it, and companies and open-source projects are stepping up to meet it. I do wish companies like Google invested more time in making Kubernetes better for stateful workloads instead of telling us not to do it, but the community has done the right thing.

Itiel Shwartz: I think it makes total sense that if you’re running all of your applications, you’d run the state part inside Kubernetes too, because it’s part of your application. Cloud providers, on the other hand, don’t really have a good incentive to push that, because it makes their unique IP less dominant. They want you to use their managed databases. But we run only caches on Kubernetes; we don’t have our main database on Kubernetes. The early warning signs were that you should treat your application as cattle, not pets. But my database is actually my pet.

Gwen Shapira: Exactly. You want to take good care of it because it’s your database, you love it. If you’re running your database yourself, you’ve already invested a lot in backup scripts, replication, failover, and tuning. I wouldn’t say moving to Kubernetes doubles that investment; it just shifts parts of it. The other option is to offload that responsibility to a cloud provider. In that case, you’re willing to pay the exorbitant fees for something like Aurora because you don’t want to do the pet care and feeding yourself.

Itiel Shwartz: The ideal was always serverless, right? You don’t want to care about the resources or the database; you just want everything to work. We use AWS RDS for Postgres, and I’m happy with that, but if someone offered me a better database at a reasonable price with no downtime, I’d pay for it. So, what should they do?

Gwen Shapira: This is always fascinating to me. When I managed databases and later Hadoop for large companies, I’d ask them when they were moving to the cloud, and the storage part was a big concern. They’d worry about not having control over the storage, but these companies had serious storage issues on-prem too. Hadoop would break down every single day at 3 a.m. because someone else using the same storage array was taking a large data backup. If you’re better at running Postgres than AWS, keep doing it. Consider selling it as a service. But I don’t know that it’s true for most companies.

Itiel Shwartz: I can be good at managing Postgres, but I really don’t want to be.

Gwen Shapira: I think it’s similar for Kubernetes. At Confluent, we had good Kubernetes experts, but even there, we eventually considered using managed Kubernetes. I’m not sure how it ended up, but it’s a legitimate thing to check: what am I paying for, and what am I getting?

Gwen Shapira: The interesting part is if you’re a platform team for a company, what’s your obligation to the application team? If I, as an application owner, could get better service by going to a vendor, you’re maybe not doing a great job as my platform team.

Itiel Shwartz: That’s completely true. There’s always a healthy competition between what the platform team provides versus everything else.

Gwen Shapira: Absolutely. So, maybe I’ll share a bit about my startup. It relates a lot to what we’ve been discussing. We’ve been inspired by Kubernetes and the ideas around it about how platforms should behave. The expectation today is that platforms will give you APIs, they will reconcile things, they will be self-healing, and they’ll provide information that allows developers to fix things when they’re broken. But they’ll also hide problems that developers cannot fix.
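
The reconcile-and-self-heal behavior Gwen describes can be illustrated with a minimal sketch. This is not Kubernetes or Nile code: the function `reconcile` and the `desired` and `actual` dictionaries are hypothetical names, and plain in-memory dictionaries stand in for a real API and real infrastructure. The idea it tries to show is only that the platform repeatedly compares declared state with observed state and acts on the difference.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Move `actual` replica counts toward `desired` and return the actions taken.

    Both arguments are hypothetical stand-ins: `desired` is what the user
    declared, `actual` is what the platform currently observes.
    """
    actions: List[str] = []

    # Bring every declared service to its declared replica count.
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
        actual[name] = want

    # Remove anything that is running but was never declared.
    for name in list(actual):
        if name not in desired:
            actions.append(f"stop {actual[name]} x {name}")
            del actual[name]

    return actions


desired = {"api": 3, "worker": 2}
actual = {"api": 3, "worker": 2}

# Simulate a failure: every worker replica disappears.
actual["worker"] = 0

first = reconcile(desired, actual)   # repairs the drift
second = reconcile(desired, actual)  # nothing left to do
```

In this sketch the user never says "restart the workers"; they only declare `desired`, and running the same loop again after the repair produces no further actions.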

Itiel Shwartz: Can you explain more about how this idea of self-healing and reconciliation loops inspired your startup?

Gwen Shapira: We’ve seen that people who really bought into Kubernetes—and it’s an increasing number of companies—are willing to let go of a lot of control. My one big lesson from Kubernetes is that when you’re using a platform, it’s better not to micromanage it. When I tried to place all my services exactly how I thought was optimal, it didn’t work well. But when I trusted Kubernetes to manage those details within certain constraints, I ended up with a more stable, reliable architecture. Kubernetes was smarter than me about a lot of things, and I think a lot of people have learned that. Developers are now more willing to trust the platform to manage details that they were forced to care about in the past but maybe shouldn’t have.

Itiel Shwartz: I think the key word here is “trust.” Once you trust that Kubernetes is reliable enough to handle networking, storage, and so on, it’s much easier to move your company to use it. That’s really the key here.

Itiel Shwartz: To conclude, I always like to ask my guests for a prediction. Where is Kubernetes headed? Three years from now, we’ll look back at this episode. What do you think will happen in the ecosystem?

Gwen Shapira: I feel like I’m going to make the prediction that everyone is making, and that I’ve been waiting to see come true for the last three years. Maybe in the next three years, it will finally happen. We always say that Kubernetes is a platform for platforms, and we’ll see more being built on top so that companies can ignore Kubernetes. Large companies will have a platform team, and application teams can ignore Kubernetes. So far, I haven’t had the best luck in ignoring Kubernetes, but who knows, maybe in three years, someone starting a new company can build on Kubernetes without knowing all the details we were forced to learn.

Itiel Shwartz: That sounds good. Any last words?

Gwen Shapira: Keep building cool stuff. My guess is that everyone listening is a developer. Let’s build really awesome products and make something amazing.

Itiel Shwartz: I couldn’t agree more. Gwen, it was a pleasure having you on the show, and best of luck with your startup.

Gwen Shapira: Thank you, and good luck to you too.

[Outro Music]

Gwen Shapira is the Co-Founding CPO at Nile. Previously, she was a system architect at Confluent (employee no. 12!), helping customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle ACE Director, an author of “Hadoop Application Architectures”, and a frequent presenter at data-driven conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.

Itiel Shwartz is CTO and co-founder of Komodor, a company building the next-gen Kubernetes management platform for Engineers.

He worked at eBay, Forter, and Rookout as the first developer.

A backend and infra developer turned ‘DevOps’, he is an avid public speaker who loves talking about infrastructure, Kubernetes, Python, observability, and the evolution of R&D culture. He is also the host of the Kubernetes for Humans Podcast.

Please note: This transcript was generated using automatic transcription software. While we strive for accuracy, there may be slight discrepancies between the text and the audio. For the most precise understanding, we recommend listening to the podcast episode.