#013 – Kubernetes for Humans Podcast with Ian Nowland (Datadog)

Itiel Shwartz: Hello everyone, and welcome to another episode of the Kubernetes for Humans podcast. My name is Itiel Shwartz, and today with me on the show is Ian. Ian, can you please introduce yourself?

Ian Nowland: Yeah, my name is Ian. I’m originally from Australia, but I’ve lived in the US for about 17 years now. For the last four years of my career, I was in engineering at Datadog, running what we called Core Engineering, which is more platform engineering. In the last six months since leaving that job, I’ve been working on a book on platform engineering, trying to get some of my lessons into writing.

Itiel Shwartz: First of all, I’ll be honest here—I’ve known you for quite some time, I think about two years, and I think you’re one of the biggest experts I’ve talked with regarding platform engineering, mainly because of your experience at both AWS and Datadog, which are huge companies that cater to the rest of the developer world. I’d be happy if we could start at the beginning. You weren’t born writing books on platform engineering. Can you share a bit about your background—what made you go into tech, then into platforms, and also around Kubernetes?

Ian Nowland: Sure. My background is a bit non-traditional for software engineering. I have a mechanical engineering degree because I was super interested in robotics. I graduated from the University of Sydney, which had a good mechatronics program, thinking I’d do a PhD in robotics. But this was right after Y2K, and the industry was in a downturn. I realized maybe a PhD wasn’t the right path, and I also really liked writing code. I ended up working in industrial automation at Honeywell and quickly became a pure software engineer. Around 2006, I realized the Sydney market wasn’t great in terms of roles, and Amazon had sent a recruiting team out to Australia. Amazon was mostly a bookstore back then, just starting to sell CDs and DVDs, but I saw it as a great step in my career. They made it easy for me to move to the US, so I made the leap.

Once I was at Amazon, I got really excited by AWS and the infrastructure level. The first project I worked on was what we now call a platform, but it was actually Elastic MapReduce. We were figuring out how to run it on EC2, which got me really interested in the internals of EC2. About two years into that, around 2011, I transferred into EC2, which was just going through hypergrowth. At the time, EC2 was essentially an open-source hypervisor with MySQL and a workflow. Everything was breaking in every possible way due to the insane growth. It was two years of bailing water, fixing problems as fast as we could.

Then, the guys who were really into back computing always had this vision of competing with bare metal. I got to initiate what’s now called Nitro. I was the first person on it, built it from scratch, and built the team up. I took it through the first three releases, covering three generations of the underlying technology. But I completely burned myself out along the way. Nitro became an internal platform within Amazon, and I wasn’t capable of managing the politics of it all at the time. So, I took a big career reset and ended up working at Two Sigma for a couple of years, working on Kubernetes.

Itiel Shwartz: Not a lot of people worked on EC2 back then, so I wonder—were most changes technical, like how to support and make it work, or more about how people would utilize it? Were there very hard technological problems, or was it more about the usability of EC2?

Ian Nowland: EC2 virtualization at the time was a lot like what you see in the container space today. Xen had lots of edge cases and noisy neighbor problems. EC2 was going through a lot of variability issues, and all the teams on the data plane side were working to get Xen into production. It had all these weird bottlenecks where different tenants could cause issues for others. It was like every single thing that could potentially cause a noisy neighbor was causing one. The managerial challenges were about how to keep growth up while all this was happening. We had the V1 platform in production, which was hyper-growing, and had to figure out how to build the V2 platform. There were about 10 different ideas, so a big part of the challenge was deciding whether to double down on Xen or pursue more radical ideas, like offloading everything to a secondary system. It was a lot of keeping up with hypergrowth while trying to figure out the future strategy.

Itiel Shwartz: You left AWS in its early days, when EC2 was still flourishing. You mentioned working at Two Sigma; that’s where you really got hands-on with platform engineering and migrated from on-prem to public cloud using Kubernetes, right?

Ian Nowland: Yes, exactly. Two Sigma was actually where I started thinking deeply about platform engineering. We were using Kubernetes to pivot from on-prem to the public cloud, going hybrid between AWS and Google. The main reason for Kubernetes was that the firm had been very successful on-premises and didn’t want to move out of its data centers too quickly. Kubernetes was a good decision, but it was very early Kubernetes, so we went through a lot of the hype cycle and the reality of it. 

Itiel Shwartz: Just to clarify, what year and Kubernetes version was this, if you remember?

Ian Nowland: I got there in 2016, and they started with Kubernetes in 2015. It was very early, and we had a lot of challenges, like how to upgrade clusters without waking people up. Another big challenge was networking. They had their own storage system, which was very noisy. There were lots of problems with how legacy software was written, and we had to tackle all these issues to make it work. The other part was dealing with tech debt. They had 10 years of tech debt from owning their own system, so it was the classic problem of how to bring over the long tail of historical decisions onto this very modern platform.

Itiel Shwartz: I think most companies, especially those that make money, are not necessarily cloud-native. Here at Komodor, everything is cloud-native and built on Kubernetes, but I think most of the industry is more like where you worked—companies that aren’t tech companies at heart. They’re hedge funds or banks, where Kubernetes is just a means to an end. Can you talk more about that migration process and the challenges you faced?

Ian Nowland: It’s a great point. This fintech company, like a lot of New York finance firms, was hiring the highest caliber people. But they built software differently because it’s usually not 24/7, so they ended up with very bespoke solutions to particular problems. They were very risk-averse, especially in the platform layer. You had a trading team that didn’t want to touch containers and wanted to run on bare metal where they controlled the cores. On the other side, the data engineers and scientists liked containers and the idea of improved productivity but didn’t care about containers themselves. They just wanted to get to Python as fast as possible because all the best tools were in Python.

So, what that looked like internally was a messy process of building relationships and compromising on the vision to get things shipped. You had to listen to your customers rather than your team about the technology. Success was about winning internal customers one by one. We embedded our engineers in customers’ teams more than I had ever done at Amazon because it was the right thing to do. That was what got our technology adopted. We managed to carry some legacy stuff into a new clean system while making some compromises along the way.

Itiel Shwartz: The money-makers for many companies are still in legacy systems. When I worked at eBay eight years ago, the heavy lifting was still done by a catalog system that was written 20 years ago. Most of the industry is still converting parts of these legacy systems to new microservices. 

Ian Nowland: Absolutely. Legacy systems have proven themselves, and no one really likes to change them. That was true at Two Sigma, too. We had fun with platform engineering, migrating from Mesos to Kubernetes, which was itself a step forward. But after that, I wanted something more technology-driven with faster timelines, which took me back towards infrastructure. By then, I was living in New York and heard great things about Datadog. The reputation was that engineers really liked the tools, which was appealing because I was used to hating all my tools.

I joined initially to manage their original metrics time series database, but over time we pivoted to a platform versus product organization. At Datadog, we invested a lot in our in-house tools, and by the end, I was managing about 700 people. It was very standard platform engineering: true infrastructure teams, in-house data platforms, application platforms, API layers, etc.

Itiel Shwartz: You led a big team through hypergrowth and acquisitions at Datadog. Can you share one or two challenges or success stories from that time?

Ian Nowland: Datadog’s success was heavily based on acquisitions, like the French company Logmatic, which became the basis of Datadog’s logging product. Integrating different company cultures and technology was challenging. For instance, Datadog was a Go and Python shop, while Logmatic was a Java shop. Supporting a bunch of French engineers with their own infrastructure and toolset was a challenge for our very US-based, New York-based team. But Datadog was good at keeping the culture of collaboration. We had to build up our automation and abstractions around things like Kafka, which took years. It was a lot of compromising—helping the new team while building the platform at the same time.

Itiel Shwartz: It’s always a struggle between V1 and V2, especially when the money-maker is usually the V1. 

Ian Nowland: Exactly. It’s about changing the car’s engine while it’s going down the road. The V1 is still carrying the business, and while everyone knows it needs to be replaced, it’s a tough challenge. But if you go in with a product-first growth mindset, it’s just how things happen. Good management is about figuring out how to succeed and eventually having good platforms, even if you didn’t start with them.

Itiel Shwartz: After four years at Datadog, you’re now taking a break, but I know you have a lot of thoughts on platform engineering and even managing platform engineers. Can you share some of those thoughts?

Ian Nowland: Sure. I’ve always had a bit of a reputation for criticizing Google, but it comes from a place of respect. I saw this ‘you build it, you run it’ culture and thought it was antithetical to building good platforms and good engineering. You need to have both software engineers and infrastructure engineers working together. Success is having them collaborate effectively; failure is having them siloed. At Datadog, the whole idea of the product was to bring dev and ops together, but it was a continual challenge.

Coming out of Datadog, I’m writing a book on platform engineering with a former manager of mine, focusing on the idea of working together with a product mindset to build platforms. We’ve been working on it for the last six months, and it’s about 90% done.

Itiel Shwartz: That’s interesting. I think a lot of people are trying to figure out exactly that—the balance between abstraction and the complexity inherent in most products. Are you seeing any trends in the industry, or anything interesting in the companies you’re talking to?

Ian Nowland: A lot of people are struggling with the complexity of Kubernetes. The challenge for observability companies is to not just present observability but to simplify it for most users. I’m particularly interested in simplifying the networking layer, especially as it ties to autoscaling. The abstractions are still not fully mapping to business problems. There’s also the question of running stateful workloads like PostgreSQL on Kubernetes. I think today it’s fairly safe to reason about the properties and faults of the underlying technology in terms of how it will impact Postgres. But the historical fear was about losing state unnecessarily due to infrastructure issues. I think we’re getting to a point where Kubernetes is stabilizing, and in a few years, it could be revolutionary once it’s easier and more stable to run stateful workloads on it.

Itiel Shwartz: What’s your prediction for where the industry is heading in the next five years, especially in terms of platform engineering?

Ian Nowland: As I’ve been writing about platform engineering, I keep asking why this approach makes sense and whether it will get commoditized. What I see is that each company’s growth, both in terms of products and technology, is fairly separate. Everyone starts off similarly, but then they make different choices. My hope is that platform teams will get more power to move up the stack and that this movement will lead to better technology that truly leverages the business. If we can get to a point where most application developers don’t have to care about infrastructure, that would be a great outcome.

Itiel Shwartz: Any last notes—about your book, your life, anything else?

Ian Nowland: My book is in progress and should come out early next year. It’s aimed at people managing platform engineering teams and dealing with the challenges of inheriting legacy systems.

Itiel Shwartz: It’s been a pleasure. Enjoy life!

Ian Nowland: Thank you.

[Music]

Ian Nowland has been in the software development industry for over 25 years, with the last 14 in management. That includes a long stint in AWS EC2 and a shorter stint at the quant hedge fund Two Sigma. Up until recently, he served as SVP of Core Engineering at Datadog, managing about half the engineering org that built the company’s core data and platform systems. Ian is now enjoying his post-Datadog days and working on a new book about platform engineering.

Itiel Shwartz is CTO and co-founder of Komodor, a company building the next-gen Kubernetes management platform for engineers.

Worked at eBay, Forter, and Rookout as the first developer.

Backend & Infra developer turned ‘DevOps’, an avid public speaker who loves talking about infrastructure, Kubernetes, Python observability, and the evolution of R&D culture. He is also the host of the Kubernetes for Humans Podcast.

Please note: This transcript was generated using automatic transcription software. While we strive for accuracy, there may be slight discrepancies between the text and the audio. For the most precise understanding, we recommend listening to the podcast episode.