How Sophos Reduced Kubernetes MTTR and Unlocked Developer Velocity

Company size: 6,000 employees

Industry: Cybersecurity

Komodor installation: 10-100 clusters on AWS

67% reduction in MTTR

47% increase in development velocity

79% faster K8s onboarding

“The biggest benefit has been the unification of key observability signals, enriched with infrastructure and release events. When issues arise, Klaudia AI saves valuable time and has proven to be an exceptional support tool for troubleshooting our environments.”

Matija Topfer

Senior Cloud Architect

About Sophos

Sophos is a cybersecurity leader defending 600,000 organizations globally with an AI-driven platform and expert-led services. Its solutions combine machine learning, automation, and real-time threat intelligence with frontline human expertise from Sophos X-Ops to deliver advanced, 24/7 threat monitoring, detection, and response.

Sophos offers industry-leading managed detection and response (MDR) alongside a comprehensive portfolio of cybersecurity technologies — including endpoint, network, email, and cloud security, extended detection and response (XDR), identity threat detection and response (ITDR), and next-gen SIEM. Together with expert advisory services, these capabilities help organizations proactively reduce risk and respond faster, with the visibility and scalability needed to stay ahead of evolving threats.

The Problem

Sophos was embarking on an ambitious project to launch a new, innovative, proprietary product designed to be Kubernetes-native from end to end. Because this was a greenfield project, Sophos aimed to get the implementation right from the start and fully embrace cloud-native principles. The initiative also served as a pilot for evaluating Kubernetes for future product development.

Adopting a new container orchestration platform always carries technical and strategic risks. A successful outcome, however, would be key to unlocking unprecedented development velocity and faster time-to-market for the organization.

The Challenge

For Sophos, uptime and Mean Time to Repair (MTTR) were critical metrics that had to be optimized. Historically, identifying and triaging the root causes of incidents had been a time-consuming process. The challenge was compounded because traditional observability tools often missed crucial event data such as configuration changes, parameter updates, and deployments. Without that data, it was difficult to correlate cause and effect, which hurt both operational efficiency and customer satisfaction.

The dedicated Incident and Problem Management Team (RCA), responsible for handling L1 and L2 incidents, had built playbooks for some of the more mature services. However, this process still relied on manual effort and escalations, and it ultimately fell short of the challenges Kubernetes introduced.

The introduction of Kubernetes brought additional complexity. The existing tools covered general observability, but they lacked Kubernetes-specific management and observability capabilities. Moreover, different teams within Sophos had varying needs, which called for a solution that could serve diverse requirements while maintaining robust security and reliability.

The Solution

Komodor emerged as the ideal partner for Sophos, providing a comprehensive solution to its multifaceted challenges. Komodor’s platform offered robust monitoring and observability capabilities for the new application, ensuring operational availability and reliability from the outset. The platform provided enhanced visibility and context by collecting both each service’s current state and crucial events (changes, updates, and parameter modifications). This proved a key differentiator from traditional tools that offered only logs, metrics, and traces. The added context, together with a single-pane-of-glass view of workload and infrastructure health across all services, drastically reduced the need to check multiple individual dashboards.

The platform is used by, and provides genuine value to, Site Reliability Engineers (SREs), DevOps teams, and developers across the organization, enabling a unified approach to Kubernetes management. For the Sophos team, the single, unified view saved time on routine checks by letting engineers verify system status quickly; the team described it as a “huge time saver and a platform confidence builder.” Komodor also promoted developer enablement by translating complex Kubernetes insights into developer-friendly language. This offloaded basic troubleshooting from DevOps and SRE teams and allowed developers to own their workloads and troubleshoot independently.

Komodor’s involvement shortened the time Sophos needed to be ready on Day 1, giving the team a crucial head start against potential delays. This accelerated the release and go-to-market strategy for the new product.

The platform simplified Kubernetes management through centralized monitoring, multi-cluster, multi-cloud, and hybrid capabilities, and robust Role-Based Access Control (RBAC) and Kubernetes user management features. The RBAC capabilities also added a layer of security and auditability that Sophos preferred over direct kubectl access, allowing the team to audit all changes and pinpoint the source of issues. By enabling faster detection, investigation, and remediation of incidents, Komodor contributed directly to maintaining uptime and quickly became an “invaluable” tool in the team’s day-to-day work.

To further optimize incident response, Sophos leveraged Klaudia, Komodor’s autonomous AI SRE. Klaudia acts as an “upgrade” to the team’s human engineers: it performs correlation and analysis automatically, so engineers no longer need to search for information manually and build a correlation map in their minds. This made troubleshooting dramatically faster. Instead of checking logs and metrics, engineers simply click the Klaudia tab, which provides a full analysis in 30 seconds to a minute. The analysis is consistently “on spot,” delivering a high-quality, complete picture that sometimes requires no further human input or troubleshooting. Crucially, the automated RCA also provides detailed remediation guidance.

This collaboration not only enabled a successful Kubernetes migration but also laid a solid foundation for future cloud-native projects within Sophos.
