#027 – Kubernetes for Humans Podcast with Ben Sigelman (ServiceNow)

Itiel Shwartz: Hello everyone, and welcome to another episode of the Kubernetes for Humans podcast. Today, we have a very special guest from ServiceNow, Ben Sigelman. Hello, Ben!

Ben Sigelman: Hi, thank you so much for having me. It’s good to be here.

Itiel Shwartz: Pleasure! Can you introduce yourself a bit and tell us about your background?

Ben Sigelman: Oh, my least favorite topic—myself. Yeah, so I’m Ben. I’ve been working in and around observability as a discipline for a long time. I’m somewhat shocked to say it’s almost half my life. I’m in my mid-40s. I joined Google right out of college in 2003. I spent a couple of years working on ad stuff, which I really didn’t enjoy, and then I kind of fell into this trap of trying to solve the observability problem. I’m happy and sad to say I’m nowhere close to solving it, but it’s been really fun. So, I’ve been working on this more or less continuously for a really long time.

I started a company in 2015, LightStep, which initially found traction doing tracing stuff before broadening out into more of a full-platform observability thing. Along the way, I realized there were a lot of problems with standardizing how data gets out of systems, so I had a lot to do with the creation of OpenTracing and then OpenTelemetry. LightStep was acquired by ServiceNow about three years ago, and now, at ServiceNow, I’ve been running the observability business and helping with their strategy around cloud-native stuff.

Itiel Shwartz: That’s a super interesting story. Maybe let’s start at the beginning—Google, 2003. How did you move from ads to observability? It doesn’t seem like an obvious path.

Ben Sigelman: It doesn’t make sense, but it was a good thing. Yeah, in 2003, Google was less than a year from going public. It still seemed like this magic new search engine. It was less than five years old as a company, but internally, they had realized before I was hired that they were going to make a pile of money. There was a big difference between people who were hired before and after that, in terms of equity and such. I was probably employee number 1,000, give or take.

Google was small enough that if you wanted to talk to someone, you could just ask, and they would talk to you. It was really interesting. Google had spent that early period of its life during the dot-com crash, which presented some challenges, but one of the best things was that they were able to hire an enormous number of incredibly qualified people who were coming out of these computer science research labs that were all folding at that time. They took some of the best people from places like DEC Labs, HP, Microsoft Research, and others. These people were not just fixing code but really designing things the way a researcher would.

This is a segue into how I ended up leaving the ads world. I didn’t like my job at Google. I was incredibly naive. To make that concrete: on my first day, I had never worked in the tech industry; I’d never even had an office job. I had worked at an ice cream store and done a bunch of music stuff in the summers during college. I joined, had no idea what was going on, and my tech lead, who wasn’t even my manager, was sitting next to me. It was like 10:45 in the morning, and I really had to go pee. I asked him, “Andrew, can I go to the bathroom?” He looked at me and said, “Yes, and never ask that ever again.” I didn’t know anything, and I thought Google was this shining beacon of progress. But I was working in the bowels of the ad system, and as a 20-something who didn’t believe in advertising as a business model, I was not pleased.

Then they had this thing where they placed everyone in a 15- or 20-dimensional vector space, where the dimensions were things like your office location, programming languages, and so on. They set up a 30-minute meeting, with no context, between you and the person furthest from you in that space. The person they set me up with was Sharon Perl, one of these distinguished researchers who’d come out of DEC Research Labs. She was so smart and qualified and was working on multiple interesting projects, including a prototype of a distributed tracing system that she had built with Mike Burrows and Dick Sites. It didn’t really work, but she described what it did: following transactions as they flowed through Google systems with no effort from the developer. I thought that was so interesting, way more interesting than what I was doing. My manager had 120 direct reports, so he had no clue what any of us were doing. I decided I was going to work on that tracing project instead, and I just loved it.

This was late 2004 or early 2005. At some point, I formally asked to work on it, and Google, being a flexible, bottom-up organization at the time, said, “Great, do it.”

Itiel Shwartz: Today, with OpenTelemetry and OpenTracing, everyone says, “Yeah, sure, you should bake it in.” But back then, was it groundbreaking? Were there existing tools in other places? Was it interesting to trace a request back then?

Ben Sigelman: It was definitely interesting at Google. It wasn’t something you could expect to exist. Let’s remember that at that point, GitHub hadn’t been founded yet. None of this stuff existed as we know it today. Google was forced to clean-room an entire stack. The only thing they really used was Linux; the rest was user space code they wrote themselves. Almost nothing they used was off-the-shelf open-source software. Most people trying to do significant computing at the time would go and buy expensive, high-quality machines from Sun running Solaris. Those machines were reliable, especially compared to the commodity hardware we used at Google. But Google’s model was to build reliable systems from unreliable hardware and make it all redundant at the software layer in user space.

Google created these very distributed systems because the individual components weren’t meant to be reliable, so there was a lot of redundancy, which meant a lot of distribution of compute. It was a different environment from what was happening elsewhere. I don’t want to make it sound like Google was so smart that no one else could have thought of this, but it was difficult to do in practice if the software ecosystem wasn’t designed for it.

The initial instrumentation for Dapper was a couple of hundred lines of code, and that gave us coverage of many layers of the software stack. Tracing was incredibly helpful in that environment—downright necessary, actually. Google hit that wall before others because of their approach to building software.

Itiel Shwartz: Let’s fast forward a bit—you left Google, but why, and how did that happen?

Ben Sigelman: I was there for almost ten years. It’s a long time, right? I’m a mortal person, and I wanted different experiences. I felt like I was still learning things day-to-day, but every time I took on a new project, it felt very similar to the previous one in terms of the technology and the people. I just felt like I was in a rinse and repeat. If I had been 58 years old, I would have stayed at Google, but at the time, I was still pretty young, and I thought, “I’m going to leave eventually; I don’t want to do this for the rest of my career, so maybe I’ll just leave now.” So I left and, right after Google, started an incredibly unsuccessful social media company. I had these grand aspirations about working in consumer tech. I’m glad to say I’ve been cured of that: I have no interest in ever working on consumer tech again. But out of the ashes of that, I just wanted to do something I knew was valuable, so I got back into observability.

Itiel Shwartz: From consumer tech to ServiceNow. Why LightStep, and what was the ecosystem like back then?

Ben Sigelman: The vision for LightStep was actually pretty coherent and unchanging throughout the life of the company. If you look at applications that make a difference in the world, there’s going to be a race to move quickly because of capitalism. To do that, people are going to hire a lot of engineers. I believe the only way to get engineers to be efficient when you have more than a couple of dozen of them is to divide them into small teams that can work autonomously. That leads you to things like microservices and Kubernetes. I had a strong belief that this basic architecture—having separately deployable pieces that talk to each other—was going to happen. At the time, it wasn’t completely obvious that Kubernetes was going to win, but it was clear that architecture and way of working were the future.

I knew from working at Google that if you were going to work on a system like that, you needed something like tracing. It wasn’t necessarily clear how we were going to get there, but I knew it was going to be needed. Our timing was impeccable, though it was a fluke, really. I thought we were several years early and was prepared to wait. But it turns out we were right on time, which was lucky. The trouble with startups is that as soon as you have a good idea, people take notice, and we had enormous competition early on. Companies like New Relic, AppDynamics, and Dynatrace were already around, and they had built their products around single-process servers in a three-tier architecture, monitored with an agent. Datadog was very problematic for us. Honeycomb started around the same time as we did, and we’ve always had similar interests. It’s important in a startup not to treat your competitors as enemies; in many cases, you’re both trying to make a market happen faster, and it’s not a winner-take-all market.

I think we were right about the market and did a lot of things well from an execution standpoint. The decision to join ServiceNow felt a bit difficult at the time—it’s hard as a founder to sell your company. But in retrospect, I’m glad we did. ServiceNow is perceived as a place where you go to request a new laptop if you work in a large corporation, but they’re doing a lot of other things. They’re doing a large business in what I would call the Old Guard of operations. The problems we’re solving have a lot to do with fundamentals like SLOs and MTTR and less to do with distributed tracing per se, but I think it’s really interesting. I’ve enjoyed the last couple of years focusing on trying to decouple our customers from their problems via software.

Itiel Shwartz: Let’s go back to before the acquisition. Who was your target persona, and why would a platform engineer need something like that? Was it just for them, or were you targeting the broader organization?

Ben Sigelman: That’s a good question. Over the years, I’ve always believed that observability infrastructure should be in the loop for a lot of personas: certainly devs, but also customer success, financial operations, and more. There are proof points for how that can work. It’s been challenging to execute on, not just for us but for many competitors, and the reason comes back to how customers like to buy software. It’s difficult to get a Director of SRE and a Customer Service Manager to collaborate on a purchase, but if you’re delivering value to both and want to charge them for it, that collaboration is ultimately necessary. I think a lot of the issues fundamentally have to do with go-to-market rather than product technology. We’ve had customers in many different roles over the years who benefited from what we were doing if they were willing to spend the time to make that happen and integrate it. But it’s difficult to sell a single product that touches on those personas unless you’re part of a genuine platform.

To answer your question, we were selling to whatever engineering leader was responsible for production. Sometimes that was a VP of Engineering, sometimes a Director of SRE, sometimes a Director of Platform Engineering. It differed, but their job was to ensure the reliability of customer-facing, revenue-generating software systems. The users were mostly people who took on-call shifts. They didn’t use us only while on call, but they had that role. Sometimes they were also developers, but that was the user persona.

Itiel Shwartz: That’s interesting. What’s your philosophy on how much time a developer should spend in an observability tool? Is it supposed to be 5% of their time, 10%, never, always? What’s your take on where we should go as an industry?

Ben Sigelman: That’s a good question. I don’t have a particularly rigid opinion on that. I don’t think a developer should be spending a lot of time looking at a big dashboard with lots of little squiggly lines. One of my concerns about observability is that the term has become too general to be meaningful. It could be used to describe almost any beneficial tool, and it’s been washed out as a marketing term to mean all sorts of different things. When people say “observability,” I think they mean different things. If you mean a big dashboard of squiggly lines, I don’t think a developer should be spending much time on that. But wouldn’t it be nice if, when you’re looking in your editor, you could see annotated over the lines of code what’s happening in production? For example, this little section of code has very high variance for performance. There are all sorts of things that could be built into the tooling that developers use, which could actually be very helpful for them.

I’ve seen demos over the years of almost magical things that have happened when we’ve built observability way, way left into developer tools, including IDEs. The reason it’s been hard to deploy is due to technology maturity and deployment issues, but I would love to see that. I’m not sure if that counts as observability, though.

Itiel Shwartz: Before founding Komodor, I worked at a startup named Rookout that was acquired by Dynatrace. I was the first developer and wrote the IDE plugin for Rookout back then. We faced a lot of technology challenges, which were hard to solve, but we managed to solve most of them. There’s also a mentality or cultural issue: developers often don’t think in production terms. Production is this beast that feels far away from them. Sometimes, I was able to show developers relevant logs from production, and they were surprised. They didn’t realize it was their application running somewhere on the cloud. There are a lot of technology issues but also a lot of cultural ones.

When we founded Komodor, we tried to think about what a developer would need in their observability tooling when it comes to solving a problem in production or understanding their system in production. Even that varies quite a lot, but it’s a question of both technology and culture. From a cultural standpoint, at least, the shift-left movement is strong. I think when you founded the company, it was way too soon for developer empowerment because the market wasn’t ready yet. With the rise of IDPs, Backstage, and DevX, it seems much more prevalent now. Do you see the same thing?

Ben Sigelman: That’s interesting. First of all, I didn’t know you were at Rookout. Rookout was a really interesting product—I loved it. As far as developers being the ones carrying pagers, I think what you’re saying is true. Some devs are amazing—they think about the whole system, the customer, the software, and how it all fits together. But there are definitely people who don’t, whether it’s due to a lack of training or engagement. Forcing people to be on call is one way to address that issue. Now they care about it.

The thing I don’t like about that model is that it solves a lot of problems for software that’s still growing, but most software that we depend on isn’t being maintained actively. Most of it is in maintenance mode. The software can still break, even if it breaks less frequently because it’s not changing as much. It seems unrealistic to have a role like “maintainer ops,” where people are mostly just trying to understand and maintain the system from a production standpoint. The DevOps model, where people carry pagers, might work, but it can’t be the full solution. If it is, I wouldn’t call that a developer.

Itiel Shwartz: That’s super interesting. We’re almost out of time. Maybe share a bit more about why ServiceNow acquired you, the current challenges in the observability space, and what the future holds for ServiceNow.

Ben Sigelman: I think the thesis for the acquisition was that observability is a big market, and ServiceNow wasn’t really in it. Now they are. But the more interesting angle is that ServiceNow does a lot of things beyond ITSM, although it’ll take decades for the world to realize that. ServiceNow is doing a lot of other things, especially in highly regulated markets. Those are the markets where there’s a need for a truly integrated story across the new and the old. There’s no such thing as a fully cloud-native application at a giant bank—there are cloud-native services, but those services depend on very non-cloud-native, legacy software that often runs on-prem and is maintained by people who don’t even know how to write code. How does that all fit together? It’s a mess, to be honest.

ServiceNow has built incredibly large businesses by creating rigor and structure in areas that are messy, and this is a very messy area. What’s interesting to me is how to use signals from observability, not just to add observability dashboards to the ServiceNow platform, but to use that data to give people a common system of record across different personas, organizations, and technology generations. That’s what I find interesting. At LightStep, as we got bigger and started to work with more big enterprise software systems, we would hit this event horizon where you get outside of the Kubernetes ecosystem, and suddenly, there are these dependencies that break a lot, but we couldn’t see them. They were outside of our purview. So, it was either build an integration for all of legacy software, which as a startup wasn’t attractive, or solve the business problems as part of a larger, integrated platform that’s partnered with many of our competitors, which is what it takes to solve this problem. That’s what excited me about the acquisition, and I think that’s what excited ServiceNow as well.

Itiel Shwartz: Last question—what does the future hold for ServiceNow and for Kubernetes or the ecosystem?

Ben Sigelman: I think we’re at a point where Kubernetes-based systems need to take the blinders off and acknowledge that the rest of the technology ecosystem also needs to be integrated in some way. Integrating the old and the new is really tough, but that feels most urgent to me.

Itiel Shwartz: Ben, it was a pleasure having you here on the show. Enjoy Las Vegas, enjoy ServiceNow, and the conference. Hope to talk again soon.

Ben Sigelman: Thank you so much.

[Music]

Ben Sigelman is the General Manager of ServiceNow Cloud Observability, which solves for the reliability and performance of cloud and cloud-native applications while extending the scope and leverage of the broader Now Platform.

Previously, he co-founded and was CEO of Lightstep, which ServiceNow acquired in 2021, and co-created both the OpenTracing and OpenTelemetry projects. Ben also helped define modern observability with his work on tracing and metrics monitoring at Google (the Monarch and Dapper projects) and was a pioneer in SRE best practices and tooling.

He holds dual Bachelor of Science degrees in Mathematics and Computer Science from Brown University.

Itiel Shwartz is CTO and co-founder of Komodor, a company building the next-gen Kubernetes management platform for engineers.

He previously worked at eBay and Forter, and was the first developer at Rookout.

He is a backend and infrastructure developer turned ‘DevOps’ and an avid public speaker who loves talking about infrastructure, Kubernetes, Python observability, and the evolution of R&D culture. He is also the host of the Kubernetes for Humans Podcast.

Please note: This transcript was generated using automatic transcription software. While we strive for accuracy, there may be slight discrepancies between the text and the audio. For the most precise understanding, we recommend listening to the podcast episode.