Komodor Blog

Welcome to Komodor's blog, your go-to resource for insights on all things Kubernetes. Stay tuned for expert advice, in-depth tutorials, and the latest industry trends to help you throughout your K8s journey.

AI SRE in Practice: Accelerating Engineer Onboarding with Contextual Expertise

6 min read

Part 7 of our AI SRE in Practice Series. This scenario walks through how AI-augmented knowledge transfer changes the onboarding experience, using a real example from a containers team implementing changes to HiveMQ infrastructure.

AI SRE in Practice: Diagnosing AWS CNI IP Exhaustion Before Widespread Outage

6 min read

Part 6 of our AI SRE in Practice Series. In this scenario we walk through an AWS CNI IP exhaustion incident where 15 services experienced outages before platform teams identified the root cause.

Contextualizing AI SRE: How Klaudia Leverages Organizational Knowledge

5 min read

For an AI SRE to be safe and effective, it cannot rely on generic training data alone. It needs context. Klaudia solves this through a dual-layer approach to context engineering: the Organization Blueprint and the Knowledge Base Integration.

AI SRE in Practice: Tracing Policy Changes to Widespread Pod Failures

6 min read

Part 5 of our AI SRE in Practice Series. This scenario walks through a policy enforcement incident where a seemingly minor configuration change caused widespread pod failures that required deep investigation across the cluster to understand the scope and root cause.

From Blueprint to Production: Building a Kubernetes MCP Server

3 min read

This post details how to build an MCP server that connects AI agents (like Claude Desktop or Cursor) to a Kubernetes cluster, enabling natural language control over kubectl operations.

Building Trust in the Machine: A Guide to Architecting Agentic AI for SRE

5 min read

This article explores the technical realities of building Klaudia, an agentic AI solution for Cloud-Native infrastructure.

Komodor Named a Representative Vendor in the 2026 Gartner® Market Guide for AI Site Reliability Engineering Tooling

3 min read

Komodor's AI SRE platform helps organizations maximize uptime, reduce cloud costs, and simplify operations across complex, cloud-native environments.

Komodor AI SRE vs. OSS AI Agent: A Technical Comparison of Agentic AI for Kubernetes Troubleshooting

6 min read

When a new, competing open-source Kubernetes troubleshooting agent launched, we put both tools through identical real-world failure scenarios of the kind our customers typically encounter. The objective was to benchmark Klaudia Agentic AI against the open-source agent and compare how each performs across common Kubernetes failures.

AI SRE in Practice: Resolving Node Termination Events at Scale

6 min read

Part 4 of our AI SRE in Practice Series. In this part we examine what happens when a node terminates unexpectedly, and tackle the harder questions of why it happened and how to prevent it from happening again.