Episode #58 30:36 2026-05-13

#058 – The Future of AI and Platform Engineering with Blake Sherwood (Smarsh)

Blake Sherwood
Distinguished Technologist, Smarsh


Episode Overview

Itiel Shwartz sits down with Blake Sherwood, Distinguished Technologist at Smarsh, to unpack what AI is actually changing inside a regulated, petabyte-scale compliance and e-discovery business. Blake shares how Smarsh moved more than 20,000 services from Pivotal Cloud Foundry to Kubernetes, where their early AI-on-observability experiments fell flat, and why solving a decade of inconsistent logging practices unlocked more value than any model choice. The conversation digs into how Smarsh re-architected ingestion, storage, and network paths to make AI viable, why analytical thinking from adjacent industries beats hiring purely for AI experience, and the quiet risk of losing engineering muscle memory as teams lean on AI tools. Blake closes with a prediction that AI's biggest payoff will come from finally delivering on the long-promised potential of platform engineering and AI SRE through managed frameworks and platforms.

In this episode we discuss:

  • What a Distinguished Technologist actually does, and the scale of Smarsh's compliance and e-discovery footprint across clouds and colos
  • Migrating 20,000+ services from Pivotal Cloud Foundry to Kubernetes, and the lessons from a nine-month, hundred-plus-person modernization
  • Why AI-on-observability experiments stalled until Smarsh fixed a decade of inconsistent logging and ingestion practices
  • Infrastructure changes required to make AI viable: ingestion, on-prem aggregation, network latency, identity, and AWS org structure
  • Hiring and team building in an AI world: analytical thinking from adjacent industries vs. hiring purely for AI experience

Key Takeaways

1
Most enterprise AI efforts stall in a 'phase zero' of open-ended experimentation; progress comes from naming a specific problem to solve rather than asking what to do with AI.
2
AI didn't fix Smarsh's observability problem on its own — they had to solve an underlying information dilemma by standardizing logging, story-based events, and libraries before AI delivered value.
3
Infrastructure choices made five-plus years ago, especially around network latency, data locality, and AWS org structure, often don't lean into an AI-first world and need deliberate, low-risk rework.
4
Hiring for AI is less about prior AI experience and more about analytical thinking; some of Blake's strongest AI-era hires came from retail and medical backgrounds.
5
The biggest near-term shift won't be AI for AI's sake — it will be AI supercharging platform engineering and AI SRE so the original DevOps and platform promises finally become real.

Itiel Shwartz: Hello everyone, and welcome to another episode of the Kubernetes for Humans podcast. Today on the show we have a special guest. Blake, do you want to introduce yourself?

Blake Sherwood: Yep, sure. Thank you. I’m Blake Sherwood, a distinguished technologist currently working with Smarsh. I spend a lot of time on AI and platform.

Itiel Shwartz: So, what is a distinguished engineer? What is Smarsh? And what do you actually do on a day-to-day basis?

Blake Sherwood: I often ask myself that question. What is a distinguished technologist? The way I explain it (everyone jokes that I need a monocle) is that I’m not spending time just in engineering. I spend time in product, sales, marketing, you name it. It’s across the board. It’s someone who is involved in, and influences, significant traction inside the company. As for Smarsh, we deal with high-compliance and regulatory environments. We do data capture, everything from e-discovery to conduct. We process a lot of data, and I’m talking petabytes upon petabytes. We process...

Itiel Shwartz: For what? Petabytes is, you know, only a number. What’s the scale, or how much can we talk about? How big is this operation, in public numbers? I don’t want to get you in trouble here, Blake.

Blake Sherwood: So, I think the number I can use is somewhere north of a couple of petabytes a day of data processing. We obviously look after a lot of data. To give you an indication of the scale, we’re across five cloud locations and, at last count (I could be wrong on this one), about 17 colocation facilities as well. But that’s an amalgamation of where Smarsh has been over the last 15-plus years.

Itiel Shwartz: Okay, that sounds big indeed. And you’re a distinguished engineer, so maybe you can share a bit about your journey. I guess you didn’t start your career, or your time at Smarsh, like that. So how did you start? What happened? Walk us through what you did and why you did it.

Blake Sherwood: Yeah, sure. My early career actually started in tourism. I have very fond memories of sitting in a very, very low-lit office and my boss coming to me and saying, “We’re going to develop a website.” And I’m like, “Okay, great. When’s that due?” He goes, “Friday.” And it was often Tuesday. As much as I like to think back on those things, it taught me one very important thing very early in my career: things change at lightning speed. So starting in tourism gave me very good exposure to a volatile, changing environment. From there, I moved into mining, believe it or not. I developed on-board SCADA systems for mining cutting machinery.

Itiel Shwartz: What kind of mining are we talking about? Not Bitcoin mining, right? Gold, copper, iron? What did you guys mine?

Blake Sherwood: Coal. Coal at the time.

Itiel Shwartz: That feels even more outdated, in a way. Sorry, I don’t want to insult anyone.

Blake Sherwood: Yeah, yeah. It was back when I was living in Australia; I’m currently in London. So coal was kind of a thing.

Itiel Shwartz: Okay. I heard coal is going to come back, by the way. Now, with the lack of oil, coal is at its highest in something like the last 15 years. So we were in travel, we were in coal, and then somehow you ended up as a distinguished engineer. Walk us through what happened.

Blake Sherwood: Around that time, I got a taste for everything non-engineering as well. I found myself doing things like DevOps and architecture, I got into platform engineering, and I also spent a lot of time in the product space. I didn’t think about it too much back then, but I transitioned from engineering and architecture to product, mergers and acquisitions, compliance, and the systems built around that environment. Then I got into HRIS systems and background screening, and then I went into fintech. So I’ve covered a very, very broad spectrum of industries. And I’ve spent the last 5 years with Smarsh.

Itiel Shwartz: Okay. You’ve been at Smarsh for quite a long time, I want to say, Blake.

Blake Sherwood: Yeah, coming on 5 years now.

Itiel Shwartz: Oh, nice. So maybe walk us through your different roles in the organization, and then we’ll start talking about Kubernetes and AI, which I think are the most interesting parts.

Blake Sherwood: Yeah, so I’ll talk a little bit about my first couple of years at Smarsh and what I focused on. For my first 3 years, I predominantly worked in delivery and DevOps, on projects like standing up infrastructure for our customers and running larger data-migration projects. It was predominantly about taking large orchestrated tasks that took, say, 15 to 20 days and bringing them down to about a day. From there I transitioned into large-scale modernization initiatives for our technologies. With Pivotal Cloud Foundry, we did quite a big project, which I was heavily involved in, moving to Kubernetes. That moved north of 20,000 independently deployed services onto Kubernetes. It was a substantial effort: it took almost 9 months and north of 100 people. It was a big win for the company and a massive technology benefit for us: scale, high availability, the works, a modern tech stack. I think everyone can agree there are a lot of things to love and hate about Kubernetes, but there’s a lot we love. So we focused on a lot of those different initiatives.

Itiel Shwartz: Okay. I know you’re quite active on LinkedIn as well, and we’re going to talk about SRE. But even before getting to the SRE part, what’s your take on what’s happening in the market? Feel free to mention anything: Claude Code, agentic features inside a product, agentic for testing or for infra. What’s your take on how the landscape has changed over the last 5 years, ever since you started at Smarsh?

Blake Sherwood: So, it’s definitely changing. I think we can all align on that. The hype is real, to a degree; I say that very tongue in cheek. But the thing I’ve picked up on the most (I’ve been doing this for, I think, officially 25 years now) is that if you look at all the changes over the last almost two decades, there have been a lot of things that claim to save you time. Whether it’s platform engineering, DevSecOps, delivery-focused initiatives, or DevOps, there have been a lot of different momentums or movements that say, “I’m going to save you thousands upon thousands of hours.” Honestly, I feel like the technology finally caught up to the promise. When AI hit the scene, it changed a lot of different things, and it didn’t just impact our roles; it impacted all the adjacent ones as well: product, sales, marketing. The downstream and upstream impacts on a lot of business-level initiatives are quite insane. The interesting thing that we haven’t quite grasped in a lot of ways yet is that it’s just a tool. While the magic is real, the promise is amazing, and the opportunity is clearly groundbreaking, it is just another moment in time where a tool has come into the space, one that can back up what it says. So what’s exciting for me is watching this change, watching the roles in the industry itself change, and seeing what’s going to come next. Look at everything else that’s happened: when platform engineering finally got its legs, the industry shifted. When DevOps got its legs, the industry shifted. I’m really keen to see what is going to genuinely shift in the next five to seven years with the advent of AI. And one thing I take away from the last couple of years, especially at Smarsh, is that I don’t think we’ve seen the ceiling yet. I don’t think we’ve seen where this is going, or where it’s going to stop: where it becomes democratized and very, very commoditized across a lot of industries. So I think it’s quite interesting.

Itiel Shwartz: You’re talking about the ceiling. Before talking about how far it can go, maybe share what’s happening in your organization. On a scale of, I don’t know, 1 to 10, how close are we to the ceiling?

Blake Sherwood: I’d say, from a ceiling-number perspective, we’re probably around a six.

Itiel Shwartz: It’s not that far, Blake. It’s not that far from the ceiling. I would have said a lower number. So maybe you can share what you already do, and what you think the future holds when it comes to using AI in general, or for SRE as well.

Blake Sherwood: Yeah. So the way we’ve been looking at it, and I talk about this a little bit in my talk, is that it’s a complete rewiring of how we think about work. It didn’t really take anything away; in a lot of ways it actually made things a little more difficult, because the way we adjust to using a tool is paramount to the outcomes we want to achieve from it. I jokingly said to someone recently that it’s like trying to paint a wall with a hammer. If you give someone a paintbrush, it goes much more smoothly. But if you give someone a roller, the result is even better. For the last couple of months we’ve been focusing on how we get from a paintbrush to a roller, and how we change the conversation from “here’s a tool” to “here’s the outcome I’m seeking to achieve.” The conversations have definitely shifted, and I’ve been in a lot of them recently, where we look at things like AI SRE and the signals that come back from it. How does it change decisions? Not just the fact that AI SRE is a great place to be from an autonomous-workforce perspective, but what decisions are changing? What’s changing in the landscape? We shifted to those conversations probably around 8 months ago, and the way we looked at AI was completely different. It focused on how AI links to real value inside the enterprise, which is where I think a lot of enterprises are still struggling. And the applicability to things like Kubernetes is, I think, where the ceiling itself is probably a two.
From the point of view of how arduous implementation is inside the enterprise, I’d say it’s still a six. But in the way we apply AI to a technology, I think we’re nowhere near the roof yet. What we focus on inside Smarsh is how AI translates to going faster or being more productive, and what it really changes and shifts in a business and the way it perceives its outcomes and value, not AI for AI’s sake. We spend a lot of time in the AI SRE space right now going back and forth on what works and what doesn’t. But yeah, it’s about translating value. That’s where we sit and where we focus.

Itiel Shwartz: So what works and what doesn’t? If you’ve already done the hard parts, share with our listeners. A lot of them are platform engineers. What works and what doesn’t work in this space?

Blake Sherwood: Yeah. I think what doesn’t work, and what I’ve watched not just at Smarsh but everywhere else, is that a lot of people go through a phase zero and a phase one. To give you context, phase zero is “What do you want to do with AI?” Try out everything. Often you end up 12 months down the road (and we saw a little bit of this internally as well) without anything really changing inside the company. There’s no difference to productivity. People feel faster. They feel more productive. They feel happy with the technology. All good things, right? But as far as the enterprise is concerned, nothing really shifts. I think a lot of companies go through this phase. What I find doesn’t work is continually asking “What do you want to do with AI?” You’re going to stay in that phase. Where I noticed a shift personally, when you get to the next stage, is when you start saying, “This is what we’re going to do with AI,” or “This is the problem we’re going to solve with AI.” I think that breaks the paralysis, the indecision or constant discovery about what outcome you’re trying to achieve. You start shifting into a realm (and we’ve definitely seen this) where enterprises focus on the reality of “Oh, now I have a tangible outcome because I set out to solve a problem.” So that’s where the second stage, what works, comes in: can you articulate what you’re trying to solve? Discovery is good, but solutioning is better.

Itiel Shwartz: Let’s give an example. Maybe give us a couple of examples of things you tried (again, if it’s legal for you to say) that didn’t turn out as well as you thought, and maybe why. Real examples of “I tried building X and it failed.” And you can also bring two or three good examples, where you did something and everyone was super happy.

Blake Sherwood: Yeah. I think one that really fell on its face a little bit (and, I mean, experimentation is a very good thing) was observability and correlation. We’ve had a couple of different experiments where we set out to process all our logs, and Smarsh has a lot of logs: north of 41 terabytes a day. Getting insight, the decisions you need to make, things like anomaly detection and observability: I think this is a really easy space for people to gravitate towards with AI. At that volume, you’re looking at a couple of different ways to try to gain insight. Either you’re going through your typical RAG approaches or other ways of interpreting the data, or you’re going through a processing environment where you prepend a lot of decision frameworks around it. We spent a lot of time focusing on how to gain better insight into things that went wrong. Whether it was an engineering activity, a product activity, or a solutions architect activity is kind of irrelevant at this point. We have so much data in the observability space that we needed to find a way to actually do something with it, because just as our customer data is very valuable to our customers, our observability data is very valuable to our business. I think we made about four or five attempts at processing agents that would be cost-effective, which was a very big thing with AI, and that would get the right amount of data out to make the right decisions. We got too many false positives. The way we held and processed our data needed to change. We figured out that while we were putting a lot of AI technologies on it, we had to make a simple choice: solve the information dilemma.
And we had to fix some poor practices built up over 10 years of not getting our logging data out in a consistent way, a standard format, or a prescribed direction. Fixing that allows technologies like AI to play a little nicer, and it leads to outcomes where costs are more appropriate and don’t hit the bottom line. Once we did that, shifting away from the obsession with implementing AI for AI’s sake and realizing that we needed to fix some practices, things went a little bit smoother. It’s “What are you trying to apply AI to?” rather than flipping it and retroactively rigging AI into an ecosystem that needed to be cleaned up a little bit. For us it was pretty much standard logging concepts: structured logging implementations and standard libraries. We have a lot of systems, so that took a while. Going into the positive ones, we get a lot of heavy interaction in things like deployment. It’s whether you can do things consistently, taking deterministic workflows into a non-deterministic environment, and how that works out. We’re spending a lot of time on this right now, and it’s a work in progress, so I won’t go into too much depth, but we’re building out a workflow orchestration system that takes business value and signals from things like AI SRE and makes decisions from a human-guided perspective. With systems the size of ours, we want to make better-informed decisions. This isn’t about AI doing things for us; we’re still the final step in the chain. But going back to our observability data and the volume we have, it’s about how we make better deployment and orchestration decisions for the end user. Can we tighten our maintenance windows to be less disruptive? Can we catch cases where a feature is going out this week that really should go out next week because we’ve got a major customer event? So that’s more on the work-in-progress side of things.
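Blake doesn’t describe Smarsh’s actual log schema or internal libraries, so the following is only an illustration of the kind of “structured logging implementations and standard libraries” he mentions. It is a minimal sketch assuming Python’s standard `logging` module; the field names (`ts`, `level`, `service`, `event`) and the helpers `build_logger`, `log_event`, and `ServiceFilter` are hypothetical, not anything named in the episode.

```python
import io
import json
import logging
from datetime import datetime, timezone
from typing import Any, TextIO


class ServiceFilter(logging.Filter):
    """Hypothetical: stamps every record passing through a handler with its service name."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def filter(self, record: logging.LogRecord) -> bool:
        record.service = self.service
        return True  # never drop records; this filter only enriches them


class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON line with a fixed, shared set of top-level keys."""

    def format(self, record: logging.LogRecord) -> str:
        entry: dict[str, Any] = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "event": record.getMessage(),
        }
        # Context travels as key/value fields instead of being baked into free text.
        # setdefault keeps callers from overwriting the shared core keys.
        for key, value in getattr(record, "fields", {}).items():
            entry.setdefault(key, value)
        return json.dumps(entry, sort_keys=True)


def build_logger(service: str, stream: TextIO) -> logging.Logger:
    """Hypothetical shared-library entry point: one JSON-lines logger per service."""
    logger = logging.getLogger(f"shared.{service}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    handler = logging.StreamHandler(stream)
    handler.addFilter(ServiceFilter(service))
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace, so repeated calls don't duplicate output
    return logger


def log_event(logger: logging.Logger, event: str, **fields: Any) -> None:
    """Emits a named event with structured fields at INFO level."""
    logger.info(event, extra={"fields": fields})


if __name__ == "__main__":
    buf = io.StringIO()
    log = build_logger("ingest", buf)
    log_event(log, "batch_ingested", tenant="t-42", records=1200, latency_ms=85)
    print(buf.getvalue().strip())
```

Because every service emits the same top-level keys, downstream tooling (a log aggregator, an anomaly detector, or an AI SRE agent) can filter on `service` and `event` directly instead of reverse-engineering each team’s free-text format, which is the kind of consistency Blake says had to come before AI added value.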

Itiel Shwartz: Did you change your infrastructure to match AI’s expectations or needs? Maybe you can share about that, because even internally we’re having a lot of discussions on that topic.

Blake Sherwood: Yeah, we had to change a lot. The observability one’s an easy one to talk to. We had to change the way we ingested data. We had to look at our external vendors and the way we work with and store data, for a gamut of different reasons. We had to look at internal supplier activities through AWS. We had to make a lot of decisions about who we want to work with and where we want to store the data, and a lot of the time it’s not practical to bring things in-house. If the last 10 years have taught me anything personally, it’s that the whole build-versus-buy conversation is a big deal. But there are certain aspects that lean into your outcomes, like observability data for AI implementations. For our product-facing offerings to customers, the amount of metrics and log data produced by those activities meant we had to build on-premises logging and aggregation systems, because the latency was insane. We focused on a lot of those activities across our infrastructure, and it even went into how we manage our AWS infrastructure setup: the org structures we implemented 5 years ago, as recommended at the time, don’t necessarily lean into an AI-first world. So we’re making small adjustments that focus on things like security and identity. We’re still in the process of changing a lot, at a pace that’s sustainable and low-risk, focusing on small, specific changes that allow AI to excel. But the big ones for us were the things that are constantly overlooked: network metrics, logs, latency, where your data is stored, and how fast you can get it back out. Making those decisions up front allowed us to really take advantage of AI in our current environment.

Itiel Shwartz: Did you, or the company, change the kind of people you hire or train, Blake? It’s something we hear about more and more. Did anything change in the hiring process, promotion processes, or anything else given this revolution?

Blake Sherwood: I’ve had a lot of conversations about this one recently. The talent acquisition space is obviously feeling its own impacts. But the question that’s always asked of me is: do we hire only people with AI experience? And the flip side: do we hire people with no AI experience and then teach them? How do we tackle this as a company? How do we develop people? How do we give people a place to learn and grow, while also having a healthy expectation that we need outcomes in this space just like anyone else? I think it’s still an open question. We definitely don’t think (and this was the last conversation I had) that hiring specific AI talent right now is best for the solutions we saw them working in. In a lot of ways, it’s about someone coming out of platform engineering who has an aptitude for AI, or someone coming out of level-two tech support who has an aptitude for AI. The way we think about it, people who have been sitting in these adjacent roles and industries for quite some time are often the best people to train and use AI, because of the way they think about the problem. Is coding, like a lot of technology, getting a little bit commoditized? You can teach people coding a lot faster nowadays, especially with the advent of AI. But you can’t teach analytical skills. And we’ve often found, at least in my experience, that the best talent comes from industries you would not think to look at. I’ve made two hires over the last 3 years that I’m particularly happy with. One came from retail, and one came from medical. And they excel with AI now. It’s just the way they think. So I don’t think there’s a hard-and-fast rule. It comes down to how we perceive value, problems, and outcomes.

Itiel Shwartz: It’s a great answer. And by the way, we’re all talking about the same problems, the same issues. I think a lot of the more senior folks need to understand what we should change across the space. Maybe I’ll ask you: what do you think are the main blockers or challenges we currently face as an industry, or that you face at Smarsh, whichever framing you choose? What’s the biggest, hardest thing that currently prevents us from leveraging AI even more?

Blake Sherwood: I think one right now is compute. From the conversations I’m having right now, it seems the world continues to run out of compute. I only bring that up because Anthropic just went through an outage, and I sat there thinking, “My Claude Code’s down. What do I do?” I had a mental block for 20 minutes and realized I couldn’t remember how to write a simple line of code. I think that’s a little scary. As I mentioned, I’ve been doing this for a really long time, and I genuinely couldn’t write an opening line. I sat there and panicked a little bit, then went, “Oh yeah, I remember how to do this,” went to Google, which is what I used to do for a long time, and figured it out. The challenge for the industry, I think, is that this is going to happen a lot more with things we take for granted right now, and we see a little bit of this at Smarsh as well. Adopting AI at such speed, you could argue that in one way we’re losing trust in ourselves, and I think that’s an industry-wide issue. As I mentioned, AI was the first one to back up its claims. It might have felt the same back when platform engineering hit, because platform engineering said you would never have to build another pipeline ever again, never have to build another deployment script for a database ever again. I don’t think that quite eventuated. There were a lot of SME technologies out there that tried to address it, but I think you still had to fall back onto donate ability. But I think the scariest thing we’re going to have to realize is that as these technologies align with us and how we use them, we’re going to start losing the muscle memory for how to do simple things.
Much like the way I would sit down with some juniors, teach them how to do things, give them best practices, and show them what I do, I think it’s going to get to a point where you have to have AI train IC1s and IC2s, because I think your seniors and your principals, as we get further down the road, another 10 years from now, are going to face a bit of a harsh reality: a lot of people sitting there going, “Well, I don’t have access to AI. I don’t know how to do this anymore.” That’s a big fear of mine personally, and frankly I think it’s a big fear of people online as well. The technology is here to stay; we just have to change how we think about it and how we interact with it.

Itiel Shwartz: Good words, good words. To finish up, maybe give some predictions on where we’re heading. We talked about it between the lines, but what do you think are the biggest revolutions that will happen over the next couple of years?

Blake Sherwood: I think the one thing I’m really excited for, and I’m starting to see it, is personal agents that work. I think OpenClaw, or whatever its latest rebrand was, was a big step in that direction. The frontier labs are starting to tackle that problem, and we’re starting to see releases leaning in that direction. But one thing that really stands out for me, beyond personal agents, is managed frameworks and managed platforms. If you look back at things like DevOps and platform engineering, the promise was real, but the technology didn’t quite keep up. I think AI is going to make those promises a reality. It’s not necessarily that you’re using AI to solve the problems those areas tried to solve. I think it’s going to supercharge things like platform engineering, things like AI SRE or concepts in that regard, and a lot of other things. It’s going to bridge communication. It’s going to bridge business value, and it’s going to recenter the conversation away from how many lines of code you’re going to run and toward what outcomes changed: which outcomes actually move the needle for a business. I think that’s going to come from implementing or building managed platforms that take the concepts we’ve been talking about for the last 10 years and make them real. It’s not so much going to be about AI. It’s going to be about how AI makes what we’ve been talking about, and what made this technology revolution possible, better.

Itiel Shwartz: Good ending words. Thank you for coming, Blake. It’s always good to have you talking.

[Music] Kubernetes for Humans.

This is an AI-generated transcript of the conversation

About the Guest

Blake Sherwood
Distinguished Technologist, Smarsh
Blake Sherwood is a Distinguished Technologist at Smarsh, where he works across engineering, product, sales, and marketing on AI and platform initiatives in a highly regulated compliance and e-discovery environment that processes north of a couple of petabytes a day across five clouds and roughly seventeen colo facilities. Over a 25-year career, he has moved from tourism web work and on-board SCADA systems for coal mining in Australia into DevOps, platform engineering, architecture, M&A, HRIS, background screening, and fintech. At Smarsh, where he is approaching five years, he was heavily involved in a nine-month migration of more than 20,000 independently deployed services from Pivotal Cloud Foundry onto Kubernetes and now focuses on AI SRE, workflow orchestration, and platform engineering modernization.