Contextualizing AI SRE: How Klaudia Leverages Organizational Knowledge

The escalating complexity of modern Kubernetes and cloud-native environments presents a challenge that raw configuration data, or “manifests,” simply cannot solve. To move beyond reactive automation and achieve true autonomy, an AI SRE must understand not just what is deployed, but why. 

Komodor’s autonomous agentic AI SRE, Klaudia, is built on three premises: no two incidents are the same, large-scale environments vary significantly, and deterministic approaches to troubleshooting cannot keep up with the intricacies of each enterprise’s infrastructure.

Klaudia provides engineers with this vital architectural context through a dual-layer approach, combining the Organization Blueprint for structural knowledge with a Knowledge Base for procedural truth.

The Context Gap: Why Kubernetes Data Isn’t Enough

Kubernetes manifests tell you what is deployed, but they rarely explain why. They don’t capture the architectural dependencies, compliance boundaries, or “tribal knowledge” that live in the heads of senior engineers.

For an AI SRE to be safe and effective, it cannot rely on generic training data alone. It needs context. Klaudia solves this through a dual-layer approach to context engineering: the Organization Blueprint and the Knowledge Base Integration.

1. The Organization Blueprint: Structural Truth

The Organization Blueprint is a concise, declarative document that captures the “invisible” logic of your infrastructure. Unlike standard RAG (Retrieval-Augmented Generation) implementations, where chunks of text are retrieved per query, the Blueprint is loaded in full into every session. It acts as a permanent “cheat sheet” that is injected into all investigations and informs every decision Klaudia makes, ensuring her reasoning aligns with your specific business rules and architectural constraints.

It defines:

  • External Dependencies: Critical links that aren’t visible inside the cluster, such as Crossplane-managed databases or external APIs (e.g., “The payment service depends on an external fraud detection API”).
  • Topology & Compliance: Multi-region failover rules and data residency constraints (e.g., “User data must never leave prod-eu-west due to GDPR”).
  • Operational Constraints: The why behind configurations, such as specific resource requests required for model loading rather than traffic load, or namespaces that require CAB approval before changes.
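To make "loaded in full into every session" concrete, here is a minimal Python sketch. The Blueprint text, its layout, and the `build_session_context` helper are hypothetical illustrations, not Komodor's actual format or API. The point is only that the whole document is prepended to the prompt, with no retrieval or filtering step.

```python
# Hypothetical Blueprint text; the real document format is not specified here.
BLUEPRINT = """\
External Dependencies:
- The payment service depends on an external fraud detection API.
Topology & Compliance:
- User data must never leave prod-eu-west due to GDPR.
Operational Constraints:
- Some namespaces require CAB approval before changes.
"""


def build_session_context(blueprint: str, question: str) -> str:
    """Prepend the entire Blueprint to the prompt; nothing is retrieved or filtered."""
    return (
        "ORGANIZATION BLUEPRINT (always loaded in full)\n"
        f"{blueprint}\n"
        "INVESTIGATION\n"
        f"{question}\n"
    )


context = build_session_context(BLUEPRINT, "Why is the payment service failing?")
print(context)
```

Because the full text travels with every investigation, the document has to stay short, which is why size matters so much for this layer.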

2. The Knowledge Base: Procedural Truth

While the Blueprint defines structure, the Knowledge Base defines procedure. Klaudia integrates with your existing documentation (Confluence, Notion, Wikis) to index runbooks, post-mortems, and troubleshooting guides.

This layer utilizes Semantic Search. When an investigation matches a known pattern, Klaudia queries this vectorized index to retrieve specific “how-to” steps. Instead of offering a generic Kubernetes fix, Klaudia can surface your organization’s specific protocol, for example, extracting the exact steps to restart a proprietary queue from a legacy runbook.
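The retrieval step can be sketched in a few lines of Python. This is a toy stand-in, not Klaudia's implementation: it swaps real vector embeddings for a simple bag-of-words cosine similarity, and the runbook names and contents are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict

# Hypothetical runbook index; a real system would store embedding vectors.
RUNBOOKS: Dict[str, str] = {
    "restart-queue": "Steps to restart the proprietary queue: drain consumers, then restart the broker.",
    "rotate-certs": "How to rotate TLS certificates for the ingress controller.",
}


def vectorize(text: str) -> Counter:
    """Turn text into a word-count vector (a crude substitute for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, docs: Dict[str, str]) -> str:
    """Return the name of the document most similar to the query."""
    q = vectorize(query)
    return max(docs, key=lambda name: cosine(q, vectorize(docs[name])))


best = search("queue consumers are stuck, need to restart the queue", RUNBOOKS)
print(best, "->", RUNBOOKS[best])
```

Only the best-matching snippet is pulled into the investigation, in contrast to the Blueprint, which is always present.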

The “Senior SRE” Mental Model

The distinction between these two layers is critical for preventing hallucinations and unsafe actions:

  • The Knowledge Base answers: “How do I restart the payment service?”
  • The Blueprint answers: “Do not restart the payment service during business hours.”

By combining these layers, Klaudia passes the “Mirror Test” – reaching the same conclusion a human expert would when provided with the same tools and context. It moves AI from a generic chatbot to a context-aware partner that understands not just Kubernetes, but your Kubernetes.
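The interplay of the two layers can be sketched as a guard around a runbook lookup. Everything below is hypothetical: the business-hours window, the service name, and the runbook steps are assumptions chosen to mirror the example above, not values from a real Blueprint or Knowledge Base.

```python
from datetime import datetime, time

# Hypothetical Blueprint constraint: no restarts of these services from 09:00 to 17:00.
BUSINESS_HOURS = (time(9, 0), time(17, 0))
PROTECTED_SERVICES = {"payment-service"}

# Hypothetical Knowledge Base entry: the "how" for a restart.
RUNBOOK_STEPS = {
    "payment-service": ["Drain in-flight transactions", "Restart the service"],
}


def plan_restart(service: str, now: datetime) -> dict:
    """Check the structural rule first, then return the procedural steps."""
    start, end = BUSINESS_HOURS
    if service in PROTECTED_SERVICES and start <= now.time() < end:
        return {
            "allowed": False,
            "reason": "Blueprint: do not restart during business hours",
            "steps": [],
        }
    return {
        "allowed": True,
        "reason": "No Blueprint constraint applies",
        "steps": RUNBOOK_STEPS.get(service, []),
    }


print(plan_restart("payment-service", datetime(2024, 1, 1, 10, 0)))
print(plan_restart("payment-service", datetime(2024, 1, 1, 22, 0)))
```

The constraint is checked before any steps are returned, so knowing how to do something never overrides whether it is allowed.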

Synthesis: Translating Tribal Knowledge into AI Insights

By combining K8s manifests with the Org Blueprint and internal Knowledge Base, Klaudia moves from reactive automation to autonomous operations. This synthesis allows the AI to interpret “invisible” data.

  • External Dependencies
    • The Manifest Sees: A standard internal service named payments-service.
    • The Blueprint Explains: Architectural dependency: payments-service egress to the external Fraud-Detection API is a hard requirement for transaction processing logic.
  • Infrastructure Beyond the Cluster
    • The Manifest Sees: A standard service endpoint.
    • The Blueprint Explains: This is an AWS RDS instance managed via Crossplane and injected via External Secrets Operator. Issues may require checking the Crossplane Claim status or AWS health.
  • Multi-Cluster Logic
    • The Manifest Sees: A pod running in the prod-eu-west cluster.
    • The Blueprint Explains: GDPR Residency Constraint: This workload contains EU customer data and must never fail over to the US primary cluster.
  • Scaling Constraints
    • The Manifest Sees: High memory requests (16Gi) for an idle service.
    • The Blueprint Explains: This service requires 16Gi at startup specifically to load a heavy ML model; reducing requests will cause an OOMKill loop regardless of traffic.
  • Compliance & Operational Rules
    • The Manifest Sees: Taints on a specific node group.
    • The Blueprint Explains: These are PCI-regulated nodes. SOC2 compliance requires 99.9% uptime on the auth-service, making any automated restart a high-risk action requiring manual CAB approval.
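As one illustration of the Scaling Constraints row, the following Python sketch shows how a Blueprint note could veto a right-sizing suggestion that looks sensible from the manifest alone. The service name `ml-inference`, the `MEMORY_FLOORS` mapping, and the simplified quantity parser are assumptions made for this example.

```python
# Binary suffixes used by Kubernetes memory quantities (simplified subset).
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

# Hypothetical Blueprint entry: minimum memory a service needs at startup.
MEMORY_FLOORS = {"ml-inference": "16Gi"}


def to_bytes(quantity: str) -> int:
    """Parse quantities such as '16Gi' or plain byte counts; other suffixes are not handled."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def review_memory_change(service: str, proposed: str) -> str:
    """Reject a proposed memory request that falls below the Blueprint floor."""
    floor = MEMORY_FLOORS.get(service)
    if floor and to_bytes(proposed) < to_bytes(floor):
        return f"reject: {service} needs {floor} at startup to load its model"
    return "accept"


print(review_memory_change("ml-inference", "4Gi"))
print(review_memory_change("ml-inference", "16Gi"))
```

Without the floor, an idle service holding 16Gi would look like an obvious candidate for reduction.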

Decoding the Architecture: Org Blueprint vs. Knowledge Base

Organizations often confuse the Org Blueprint with a standard Knowledge Base. While both are critical, they serve distinct roles in the “Contextual Layer” of Klaudia’s architecture.

| Feature | Knowledge Base (The ‘How’) | Org Blueprint (The ‘Why’) |
| --- | --- | --- |
| Content Type | Procedural: Runbooks, guides, and past postmortems. | Structural: Architectural truth, dependencies, and topology. |
| Size & Scope | Extensive: MBs of data across hundreds of documents. | Focused: A single document, strictly under 10KB. |
| Usage by AI | Searches and retrieves snippets when a pattern matches. | Loaded in full; always present to inform reasoning. |
| Mental Model | A searchable library for reference. | A permanent cheat sheet on the desk. |
| Question Answered | “How do I restart the payment queue?” | “Is it safe to restart this service right now?” |

Best Practices for Blueprint Construction

The Org Blueprint is the “architectural truth” of your system. It is the encoded version of the tribal knowledge that lives in your senior engineers’ heads. It defines the relationships and constraints that are invisible to a standard kubectl get command.

The Blueprint is defined by three specific characteristics:

  1. Conciseness: It is a focused, high-level structural document that must be under 10KB.
  2. Persistence: Unlike a Knowledge Base, which is searched on demand, the Blueprint is loaded in full into every AI session, ensuring it is always “top of mind.”
  3. Purpose: It explains the logic behind configurations, such as why a Pod Disruption Budget (PDB) exists or why a specific namespace has strict taints.
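The conciseness requirement is easy to check mechanically. The sketch below assumes that 10KB means 10 × 1024 bytes of UTF-8 text; the source does not say whether the limit is 10,000 or 10,240 bytes, so treat the constant as an assumption.

```python
# Assumption: "10KB" is interpreted as 10 * 1024 bytes of UTF-8 encoded text.
MAX_BLUEPRINT_BYTES = 10 * 1024


def blueprint_size_ok(text: str) -> bool:
    """Return True when the encoded Blueprint is strictly under the size limit."""
    return len(text.encode("utf-8")) < MAX_BLUEPRINT_BYTES


draft = "User data must never leave prod-eu-west due to GDPR.\n"
print(blueprint_size_ok(draft))
```

Measuring encoded bytes rather than characters matters when the document contains non-ASCII text.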

Effective Blueprints focus on high-level structural truth rather than procedural steps.

The Do’s and Don’ts

  • DO: Document the “Invisible”—business logic, external API relationships, and the “why” behind resource requests.
  • DO: Write for a “Smart Stranger”, a senior SRE who understands Kubernetes but is new to your specific environment.
  • DON’T: Include step-by-step commands (e.g., kubectl delete pod). These belong in the Knowledge Base.
  • DON’T: Repeat data found in manifests. Klaudia already knows your Pod labels and Service names.
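The "no step-by-step commands" rule can be approximated with a small lint pass over a draft Blueprint. This is a heuristic sketch: the pattern only looks for lines that begin with a `kubectl` or `helm` invocation, and the draft text is invented for the example.

```python
import re
from typing import List, Tuple

# Heuristic assumption: a line that starts with a CLI invocation is procedural.
COMMAND_PATTERN = re.compile(r"^\s*(?:[-•*]\s*)?(?:\$\s*)?(kubectl|helm)\s+\w+")


def find_procedural_lines(blueprint: str) -> List[Tuple[int, str]]:
    """Return (line number, text) for lines that look like shell commands."""
    hits = []
    for number, line in enumerate(blueprint.splitlines(), start=1):
        if COMMAND_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


draft = """\
The auth-service runs on PCI-regulated nodes; automated restarts need CAB approval.
kubectl delete pod auth-service-0
"""
print(find_procedural_lines(draft))
```

Any line this flags is a candidate to move into a runbook in the Knowledge Base.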

Conclusion: The Result of Context-Aware AI

By fusing the technical “what” from Kubernetes manifests with the Organization Blueprint’s structural “why” and the Knowledge Base’s procedural “how,” Klaudia transforms into a context-aware, autonomous partner. This synthesis of tribal and operational knowledge allows the AI to interpret “invisible” constraints, prevent unsafe actions, and consistently pass the “Senior SRE Mirror Test.” The result is a radical reduction in incident resolution time, moving organizations toward a fully autonomous stage of SRE operations.

Klaudia achieves unparalleled operational sophistication by ingeniously fusing three critical layers of organizational knowledge. It starts with the technical “what,” derived directly from raw Kubernetes manifests and operational telemetry, providing a precise, real-time snapshot of the infrastructure’s state and configuration.

This technical foundation is then elevated by the integration of the Organization Blueprint. This blueprint represents the structural “why”—the codified, desired state of the organization’s architecture, compliance standards, security policies, and performance SLOs. It is the repository of tribal knowledge regarding architectural intent, risk tolerance, and governance.

Finally, the Knowledge Base supplies the procedural “how.” This encompasses documented runbooks, post-mortems, historical incident data, and established best practices for deployment, remediation, and maintenance.

This unique synthesis transforms Klaudia from a mere automation tool into a context-aware, autonomous operational partner. By cross-referencing the active manifest (‘what’) against the codified intent (‘why’) and historical wisdom (‘how’), the AI can interpret complex, “invisible” constraints—such as unspoken inter-service dependencies, unwritten capacity planning rules, or deeply embedded organizational anti-patterns. This deep contextual understanding allows Klaudia to proactively prevent actions that, while technically valid, would be operationally unsafe or non-compliant.

The ultimate measure of this intelligence is its ability to consistently pass the “Senior SRE Mirror Test.” This means any action Klaudia takes, or any recommendation it makes, is indistinguishable from the judgment of the most experienced Senior Site Reliability Engineer. The practical result is a radical, demonstrable reduction in Mean Time To Resolution (MTTR) and Mean Time To Detection (MTTD), effectively decoupling organizational growth from the linear scaling of SRE headcount, and propelling organizations toward a fully autonomous stage of Site Reliability Engineering operations.