K8sGPT is a tool that uses large language models (LLMs), including those from OpenAI, Azure, Cohere, Amazon Bedrock, Amazon SageMaker, and Google Vertex AI, to improve the management and automation of Kubernetes clusters. It also integrates with open-source LLMs such as Meta's Llama for on-premises use.
Kubernetes is an open-source platform for automating the deployment, scaling, and operation of application containers. K8sGPT integrates with Kubernetes to provide intelligent insights, automate routine tasks, and improve operational efficiency.
K8sGPT uses LLMs to analyze logs, monitor performance metrics, and predict potential issues before they escalate. This helps in maintaining the health and performance of Kubernetes clusters, reducing downtime, and ensuring optimal resource utilization.
K8sGPT is open source under the Apache 2.0 license. It has over 5K GitHub stars, over 80 contributors, and has been accepted as a Sandbox project by the Cloud Native Computing Foundation (CNCF).
You can get K8sGPT from the official GitHub repo.
The tool offers the following features:
- Continuous scanning of Kubernetes clusters to detect anomalies and potential issues
- AI-generated, plain-language explanations of detected problems
- Anonymization of sensitive identifiers, such as pod names, before prompts are sent to an AI provider
- Support for multiple hosted AI backends as well as locally hosted models
K8sGPT operates similarly to an experienced Site Reliability Engineer (SRE), providing continuous monitoring and analysis of Kubernetes clusters to detect anomalies and potential issues. It starts with a data collection process where it selectively gathers information from the clusters. It ensures that only relevant data is used, maintaining privacy and security by anonymizing collected data and filtering out unnecessary information.
Once the data is collected, K8sGPT uses the LLM of your choice to interpret and analyze the information, much like an SRE would. For example, if a pod isn’t running, K8sGPT checks the event stream to identify possible causes, such as a missing service account in a replica set. This allows it to generate precise problem explanations using generative AI models, sometimes uncovering issues that even seasoned SREs might overlook.
K8sGPT supports integrations with OpenAI, Azure, Cohere, Amazon Bedrock, Amazon SageMaker, Google Gemini, and Vertex AI. It anonymizes pod names before sending prompts to these providers, helping keep cluster data secure. It also supports connections to local models, for organizations that prefer not to send their data externally.
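As a rough sketch of that anonymization idea (a hypothetical illustration only, not K8sGPT's actual implementation), a sensitive pod name can be swapped for a stable placeholder before a prompt leaves the machine:

```shell
# Hypothetical sketch of prompt anonymization -- not K8sGPT's real code.
# Replace a sensitive pod name with a stable placeholder before the text
# is sent to an external AI provider. (cksum is used here only for
# illustration; it is not a cryptographic hash.)
pod_name="payments-api-6f7d9c-abc12"   # assumed example name
token=$(printf '%s' "$pod_name" | cksum | cut -d' ' -f1)
prompt="Pod pod-${token} is failing its readiness probe"
echo "$prompt"
```

The provider then reasons over `pod-<token>` instead of the real name, and the tool can map the placeholder back to the original identifier locally.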
Itiel Shwartz
Co-Founder & CTO
Based on my experience, here are a few ways to make better use of K8sGPT in your organization:
Beyond default metrics, set up custom metrics relevant to your applications to allow K8sGPT to provide more precise insights and recommendations.
Use K8sGPT to automate security scans of your containers and Kubernetes configurations, ensuring compliance with best practices and identifying vulnerabilities.
Combine K8sGPT with tools like OPA (Open Policy Agent) to automate the enforcement of policies across your Kubernetes environments, ensuring consistency and security.
Configure K8sGPT to alert on detected anomalies in real time, enabling quicker response times to potential issues.
For large-scale Kubernetes environments, consider fine-tuning custom LLMs tailored to your specific Kubernetes workloads and integrate them with K8sGPT for specialized insights.
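For the local-model route mentioned above, K8sGPT's localai backend can point at a self-hosted, OpenAI-compatible endpoint. The model name and base URL below are placeholders; adjust them to whatever you actually serve:

```shell
# Register a locally hosted model (LocalAI / OpenAI-compatible server).
# "llama3" and the URL are placeholder values for your own deployment.
k8sgpt auth add --backend localai --model llama3 --baseurl http://localhost:8080/v1
```

This keeps all prompt data inside your own network.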
This tutorial is adapted from the official K8sGPT documentation.
To install K8sGPT on your Linux or Mac machine, you will use Homebrew, a popular package manager. Follow these steps to ensure a smooth installation process:
1. Install Homebrew if you don't have it already (see https://brew.sh/; for Linux, see https://docs.brew.sh/Homebrew-on-Linux).
2. Add the K8sGPT tap:
brew tap k8sgpt-ai/k8sgpt
3. Install the K8sGPT CLI:
brew install k8sgpt
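Assuming the tap and install completed, you can sanity-check that the binary is on your PATH:

```shell
# Print the installed K8sGPT version to confirm the installation.
k8sgpt version
```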
To try out K8sGPT, you need a Kubernetes cluster. You can set up a local cluster using tools like KinD (Kubernetes in Docker) or Minikube. Below are the steps for setting up a KinD cluster, which is useful for local testing:
brew install kind
kind create cluster --name k8sgpt-demo
To leverage the AI capabilities of K8sGPT, you need to authenticate with an AI provider, such as OpenAI. Follow these steps to authenticate with OpenAI:
1. Generate an OpenAI API key (this command opens the OpenAI key-generation page in your browser):
k8sgpt generate
2. Register the key with the OpenAI backend:
k8sgpt auth add --backend openai --model gpt-4-turbo
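To confirm the backend was registered, K8sGPT can list its configured auth providers:

```shell
# List configured AI backends; the active/default provider is indicated.
k8sgpt auth list
```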
K8sGPT provides a variety of commands to interact with and analyze your Kubernetes cluster. You can view all available commands and their usage by running k8sgpt --help. This will display a list of commands along with a brief description of each.
k8sgpt --help
Ensure you are connected to the correct Kubernetes cluster before analyzing it with K8sGPT. For this example, use the KinD cluster you set up earlier:
1. Check the current Kubernetes context and ensure you are connected to the KinD cluster:
kubectl config current-context
kubectl get nodes
This will display the current context and the nodes in your cluster.
2. To demonstrate K8sGPT’s capabilities, create a pod with an intentional error. Create a new YAML file named bad-pod.yml with the following contents:
bad-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: bad-pod
  namespace: default
spec:
  containers:
  - name: broken-pod
    image: nginx:1.a.b.c
    ports:
    - containerPort: 80
      protocol: TCP
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
In this configuration, the image tag 1.a.b.c is invalid, so Kubernetes cannot pull the container image and the pod never starts; as a result, the readiness probe never succeeds.
3. Apply this configuration by running kubectl apply -f bad-pod.yml
kubectl apply -f bad-pod.yml
4. Use k8sgpt analyze to analyze the cluster and identify issues. This command will scan the cluster and list any detected problems. For the broken pod example, it will highlight the error related to the incorrect container image.
k8sgpt analyze
5. To explore additional flags and options for the analyze command, use k8sgpt analyze -h.
k8sgpt analyze -h
6. For a detailed explanation of the issues, run k8sgpt analyze --explain.
k8sgpt analyze --explain
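On larger clusters you may want to narrow the scan. The analyze command accepts filter and namespace flags, so the broken pod above can be targeted directly (the namespace value assumes the default namespace used in this tutorial):

```shell
# Restrict analysis to Pod resources in the default namespace,
# and ask the configured AI backend to explain what it finds.
k8sgpt analyze --explain --filter=Pod --namespace=default
```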
Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.
Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance.
By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.