Komodor Blog

Welcome to Komodor's blog, your go-to resource for insights on all things Kubernetes. Stay tuned for expert advice, in-depth tutorials, and the latest industry trends to help you throughout your K8s journey.

Komodor AI SRE vs. OSS AI Agent: A Technical Comparison of Agentic AI for Kubernetes Troubleshooting

6 min read

When a new, competing open-source Kubernetes troubleshooting agent was launched, we decided to put it and Klaudia Agentic AI through the same real-world failure scenarios our customers typically encounter. The objective was to benchmark the two agents and compare their performance across these common Kubernetes failures.

AI SRE in Practice: Resolving Node Termination Events at Scale

6 min read

Part 4 of our AI SRE in Practice Series. In this part we examine what happens when a node terminates unexpectedly, and tackle the harder questions of why it happened and how to prevent it from happening again.

Komodor Appoints Ziv Harfenist as Chief Financial Officer 

2 min read

Komodor, the autonomous AI SRE platform for cloud-native infrastructure and operations, today announced the appointment of Ziv Harfenist as Chief Financial Officer (CFO) and the promotion of Yogev Goldis to Chief People Officer (CPO).

AI SRE in Practice: Diagnosing Configuration Drift in Deployment Failures

5 min read

Part 3 of our AI SRE in Practice Series. In this part we cover how an AI SRE helps diagnose configuration drift in deployment failures.

AI SRE in Practice: Resolving GPU Hardware Failures in Seconds

4 min read

Part 2 of our AI SRE in Practice Series. In this post we discuss how an AI SRE resolves GPU hardware failures in seconds.

When is it ok or not ok to trust AI SRE with your production reliability?

3 min read

This series demonstrates what AI SRE trained on real workloads actually looks like in practice. We're going to walk through real troubleshooting scenarios that our customers encounter daily, showing the before and after of AI-powered investigations.

From Promise to Practice: What Real AI SRE Can Actually Do When Production Breaks

4 min read

This series demonstrates what AI SRE trained on real workloads actually looks like in practice. We're going to walk through real troubleshooting scenarios that our customers encounter daily, showing the before and after of AI-powered investigations.

7 Kubernetes Predictions for 2026 – AI Will Push SRE to its Limit

3 min read

SRE teams are about to feel even more pressure. GPU-heavy computing is breaking the assumptions today's clusters were built on, while enterprises are beginning to trust autonomous operations and cost pressure is pushing consolidation across the cloud-infrastructure stack. Based on these forces, here are my 2026 Kubernetes predictions, along with some best-practice recommendations to help platform teams prepare for what reliable operations will mean next year.

Kubernetes v1.35: The Release That Tackles the Industry’s $100 Billion Waste Problem

8 min read

There's a bigger story here that every platform team needs to understand: K8s is finally acknowledging that cluster utilization is fundamentally broken.