#006 – Kubernetes for Humans Podcast with Nick Jones (EscherCloudAI)

Itiel Shwartz: Hello everyone, and welcome back to the Kubernetes for Humans podcast. Today, I have with me Nick Jones. Nick, do you want to introduce yourself?

Nick Jones: Sure. I’m Nick Jones. I’m currently employed as the Head of Kubernetes and Platform Engineering at a company called EscherCloudAI, where we build cloud services with a focus on sustainability. All our servers are immersion-cooled, and we have some cool tricks to reclaim heat and energy from the cooling process. My team is responsible for building a managed Kubernetes service so that people can do their AI, ML, or any other science on our GPUs. I’m also an OpenUK Ambassador and a CNCF Ambassador, so I’m involved in those projects as well.

Itiel Shwartz: That sounds great! Before we dive in, let’s talk a bit about your background. What got you into computer science, and how did that journey lead you to Kubernetes?

Nick Jones: I’ve been in the industry for a long time, about 25-26 years at this point. Like many people, I got into computing thanks to video games. I was at university, not exactly studying computer science, but computing was part of it. My final year project involved developing a virtual hotel where I was rendering hotel rooms. To speed up the rendering process, I heard that you could link multiple computers together, which led me to discover that I enjoyed getting computers to talk to each other almost as much as the 3D modeling itself.

After university, I started working for an automotive development company with tons of high-end 3D workstations. I was doing a lot of online gaming back then, particularly Quake, and when id Software released the Quake 3 test for Linux first, I thought, “Why not install Linux at home to play it ahead of my friends?” That’s how it all started to snowball.

Over the years, my career naturally evolved. I got into cloud computing, worked at Sun Microsystems on Sun Grid, which was infrastructure as a service before the term was even coined, predating EC2 by a couple of years. Fast forward another 10-15 years, and I was working at a UK startup providing public cloud services based on OpenStack and Ceph. We did a lot of things right technically, but the company fell apart due to issues on the colocation side of the business. That’s when I started getting more involved in community stuff, contributing upstream, and really embracing the spirit of open source.

Eventually, I worked at Mesosphere, where I focused on Kubernetes. Mesosphere was known for Apache Mesos, which was used at phenomenal scales, but Kubernetes has clearly won out due to its community-driven approach. After Mesosphere, I moved to Rancher, focused more on Kubernetes, and then to SUSE after Rancher was acquired. Now, I’m at EscherCloudAI, managing Kubernetes services.

Itiel Shwartz: That’s quite a journey. It’s interesting to hear about the whole Mesos vs. Kubernetes era. For those who weren’t around 7-8 years ago, it did feel like a war between Docker Swarm, Mesos, and Kubernetes. Docker was this huge thing, and then the question became, “Where do I run my containers?”

Nick Jones: Absolutely. At the time, people were still figuring out containerization and orchestration. Docker Swarm seemed like it might be the go-to because it was easy to use and had the Docker brand. But Kubernetes captured people’s imaginations with its comprehensive API and extensibility, along with the credibility of being a reimplementation of Google’s Borg. The community effort behind Kubernetes really made it the winner in the end.

Itiel Shwartz: Let’s switch gears a bit. Suppose I’m a senior platform engineer at a company currently running on EC2, and we want to migrate to Kubernetes. What are the biggest pitfalls in implementing Kubernetes?

Nick Jones: The biggest shift is almost always cultural. Kubernetes isn’t just another tool; it requires a change in how your developers, operations teams, and everyone involved approach their work. You need a cohesive engineering organization where everyone understands the challenges across the development, shipping, and operations phases.

One common problem I saw during my consultancy days was traditional, siloed organizations trying to adopt Kubernetes without integrating their teams. Developers would get frustrated with how long it took to provision virtual machines, so they’d take it upon themselves to install Kubernetes, which often led to issues because there wasn’t synergy between Kubernetes and the underlying infrastructure. Successful Kubernetes adoption requires collaboration across teams and a clear understanding of what problems you’re trying to solve.

Itiel Shwartz: So in plain English, what should I do first? Start with culture lectures?

Nick Jones: Not exactly. You need to understand what problems your team wants to solve. Are your developers eager to ship stuff in containers? Do they want to adopt microservices? You need to assess these needs before jumping into Kubernetes. Kubernetes isn’t always the answer, and it’s not a one-size-fits-all solution. Start by finding a small project to migrate first, learn from it, and iterate as you go.

Itiel Shwartz: That makes sense. Even when everyone is aligned, it still takes time. What’s the biggest barrier in such cases?

Nick Jones: When everyone’s aligned, things can go more smoothly, especially if you’re using managed Kubernetes or other managed services. That way, you’re offloading a lot of the operational burden. However, when you start taking on more yourself, things get trickier. This is where platform engineering comes in, providing a dedicated team to manage the infrastructure and tooling needed to keep developers productive.

Itiel Shwartz: Let’s talk about AI and Kubernetes. It’s a hot topic. What’s your take on running AI models on Kubernetes clusters?

Nick Jones: At EscherCloudAI, we’re building a stack from the ground up that’s optimized for running workloads on GPUs, particularly AI. There’s a lot of crossover between traditional HPC and AI, especially in batch scheduling and training. People have been won over by the Kubernetes API, so there’s a focus on bringing HPC-like job scheduling to Kubernetes. There are projects like Volcano and Armada that are making strides in this area, and we’re keeping a close eye on them. It’s a rapidly evolving part of the industry.

Itiel Shwartz: Cost is a big factor when migrating AI projects to Kubernetes. Is there any easy way to optimize the default scheduler for AI tasks?

Nick Jones: It’s tricky because GPUs aren’t particularly sharable resources. With CPUs, you can specify fractions, but with GPUs, it’s all or nothing. NVIDIA offers virtual GPUs and MIG to slice up GPUs, but efficiently packing jobs is still challenging. The way GPUs are currently consumed in Kubernetes will likely evolve, but for now, it’s an area where improvements are needed.
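
To make the contrast Nick describes concrete, here is a minimal Python sketch of how CPU and GPU request quantities differ. It assumes the GPU is exposed under the `nvidia.com/gpu` resource name used by NVIDIA's device plugin; the helper names `parse_cpu` and `parse_gpu` are hypothetical and exist only for this illustration, not in any Kubernetes client library.

```python
from fractions import Fraction

# Assumption: GPUs are advertised under the NVIDIA device plugin's resource name.
GPU_RESOURCE = "nvidia.com/gpu"


def parse_cpu(quantity: str) -> Fraction:
    """Parse a CPU quantity such as "500m" (half a core) or "2" (two cores)."""
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 1000m equals one full core.
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def parse_gpu(quantity: str) -> int:
    """Parse a GPU quantity, rejecting anything that is not a whole device."""
    value = Fraction(quantity)
    if value.denominator != 1 or value < 0:
        raise ValueError(f"{GPU_RESOURCE} must be a whole number, got {quantity!r}")
    return int(value)


# A hypothetical container request: half a CPU core, but one entire GPU.
requests = {"cpu": "500m", GPU_RESOURCE: "1"}
cpu_cores = parse_cpu(requests["cpu"])
gpu_count = parse_gpu(requests[GPU_RESOURCE])
```

A job that needs only a fraction of a GPU still has to request a whole one in this model, which is why packing GPU jobs efficiently is harder than packing CPU jobs.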

Itiel Shwartz: We’re almost out of time. What’s your take on Kubernetes three years from now? What trends do you see rising?

Nick Jones: I hope we’ll see more tools that make Kubernetes easier to consume and adopt. Kubernetes is reaching a high level of production maturity, so the focus should shift to building on top of it, making it more user-friendly and accessible for different workloads. Eventually, Kubernetes will become less visible, just a piece of the infrastructure puzzle, and we’ll see more focus on what can be built on top of it.

Itiel Shwartz: What’s the “Sausage Cloud”?

Nick Jones: That’s a project I run with a friend. We both had glorified home labs and decided to combine forces. A friend of ours owns a former nuclear bunker in Scotland, and we’ve set up our hardware there. It’s a glorified home lab that runs OpenStack and Ceph, plus a bunch of other stuff on top. It’s still a hobby, but it’s grown a bit. We call it Sausage Cloud because the virtual machines are named after different types of sausages—it seemed like a fun idea at the time.

Itiel Shwartz: Any final notes?

Nick Jones: I’d like to mention Kubernetes Community Days UK, happening in London on October 17th and 18th. It’s a community-run, non-profit event with great talks and workshops. If you’re interested, check it out at kcd.io. Early bird tickets are still available.

Itiel Shwartz: Thanks a lot, Nick. It was a pleasure having you today. Good luck!

Nick Jones: Thanks for having me!

[Music]

Nick Jones is a CNCF and OpenUK Ambassador, currently employed as Head of Cloud Native and Platform Engineering at EscherCloudAI, where he’s helping to build a next-generation managed Kubernetes service tailored for AI workloads, backed by a sustainable approach to running infrastructure. He is also a serial organizer of meetups, including Cloud Native and Kubernetes Edinburgh and CNK Manchester, as well as larger events such as Kubernetes Community Days UK. Nick is passionate about new technology, but simultaneously romantic about the old, with a penchant for decrepit Silicon Graphics and Sun Microsystems hardware.

Itiel Shwartz is CTO and co-founder of Komodor, a company building the next-gen Kubernetes management platform for engineers.

He has worked at eBay, Forter, and Rookout (as the first developer).

A backend and infra developer turned ‘DevOps’, he is an avid public speaker who loves talking about infrastructure, Kubernetes, Python, observability, and the evolution of R&D culture. He is also the host of the Kubernetes for Humans Podcast.

Please note: This transcript was generated using automatic transcription software. While we strive for accuracy, there may be slight discrepancies between the text and the audio. For the most precise understanding, we recommend listening to the podcast episode.