Amazon Elastic Kubernetes Service (Amazon EKS) is a fully managed Kubernetes service provided by Amazon Web Services (AWS). It helps manage your containerized applications in the cloud as well as on-premises.
EKS monitoring involves observing and tracking the performance and health of your EKS clusters. Effective monitoring is crucial for identifying problems before they escalate into major issues, ensuring high availability and optimal performance of your Kubernetes workloads, and understanding how to improve performance and utilization of your EKS deployments.
This is part of a series of articles about Kubernetes monitoring.
Observability is the ability to understand the state of a system by observing its external outputs. In the context of Amazon EKS, observability involves understanding the state of your EKS clusters by observing output such as logs, metrics, and traces. By ensuring EKS clusters generate the right signals, you can identify issues faster, troubleshoot efficiently, and optimize your clusters for better performance.
Observability in Amazon EKS also involves understanding the dependencies and interactions between workloads within your EKS clusters. This is crucial in a microservices architecture where multiple services are interacting with each other. By understanding these interactions, you can identify bottlenecks and optimize workloads for better performance and reliability.
Achieving observability in EKS requires addressing several layers: the EKS control plane, EKS worker nodes, and the workloads and applications running within your Kubernetes clusters.
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better monitor EKS:
Use Prometheus for real-time metrics collection and Grafana for powerful visualization and alerting. This combination can provide more granular insights compared to native AWS tools.
Deploy Kube-state-metrics to expose Kubernetes cluster-level metrics such as pod counts, deployments, and resource limits. These metrics complement node and system metrics for a complete picture.
Use Fluent Bit to aggregate logs from EKS clusters. It’s lightweight and can be configured to send logs to various destinations, enhancing log management and analysis.
Configure Horizontal Pod Autoscalers (HPAs) to use custom metrics instead of just CPU and memory. This can help scale applications more intelligently based on application-specific performance indicators.
Turn on VPC Flow Logs to capture information about the IP traffic going to and from network interfaces in your VPC. This can help in diagnosing network issues and understanding traffic patterns.
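As a rough illustration of the last tip, the sketch below parses flow log records and counts rejected connections per destination. It assumes the default (version 2) VPC Flow Logs record format with space-separated fields; the function and class names are hypothetical, and a custom log format would need a different field list.

```python
from collections import Counter
from typing import NamedTuple, Optional

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


class FlowRecord(NamedTuple):
    srcaddr: str
    dstaddr: str
    dstport: int
    protocol: int
    byte_count: int
    action: str


def parse_flow_record(line: str) -> Optional[FlowRecord]:
    """Parse one record; return None if it is malformed or carries no traffic data."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    raw = dict(zip(FIELDS, parts))
    # NODATA and SKIPDATA records use "-" placeholders instead of values.
    if raw["log_status"] != "OK":
        return None
    return FlowRecord(
        srcaddr=raw["srcaddr"],
        dstaddr=raw["dstaddr"],
        dstport=int(raw["dstport"]),
        protocol=int(raw["protocol"]),
        byte_count=int(raw["bytes"]),
        action=raw["action"],
    )


def rejected_by_destination(lines: list[str]) -> Counter:
    """Count REJECT records per (destination address, destination port)."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_flow_record(line)
        if record is not None and record.action == "REJECT":
            counts[(record.dstaddr, record.dstport)] += 1
    return counts
```

A destination that accumulates many rejects is often a sign of a security group or network ACL rule blocking traffic that a workload expects to reach.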
Amazon provides several tools that can help you monitor EKS clusters and achieve observability.
CloudWatch Container Insights is a fully managed observability service that collects, aggregates, and summarizes metrics and logs from your containers. With Container Insights, you can monitor, troubleshoot, and set alarms for your Amazon EKS clusters.
Container Insights provides you with a detailed view of your EKS cluster’s performance, including CPU and memory utilization, network traffic, and disk I/O. It also provides insights into your cluster’s health, helping you identify issues before they affect your applications.
AWS Distro for OpenTelemetry (ADOT) is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. With ADOT, you can collect, correlate, and export telemetry data (metrics, traces, and logs) from your applications and infrastructure, providing a 360-degree view of EKS cluster performance.
ADOT supports a wide range of AWS services and open source tools, allowing you to collect telemetry data from multiple sources and export it to various AWS monitoring tools such as CloudWatch, X-Ray, and more.
Amazon DevOps Guru is a fully managed operations service that uses machine learning to analyze your operational data and provide you with actionable insights. It identifies potential issues and their probable causes, allowing you to proactively address them before they impact your applications.
With DevOps Guru, you can set up anomaly detection for your EKS clusters, and receive alerts when abnormal behavior is detected. It also provides you with recommendations on how to address the detected issues, helping you reduce downtime and improve your application’s performance.
AWS X-Ray is a distributed tracing service that helps you understand how your applications and services are performing and where bottlenecks are occurring. It provides you with an end-to-end view of requests as they travel through your EKS cluster, allowing you to trace their path and understand their impact on your users.
X-Ray’s service maps let you visualize your application’s architecture, showing how services are interconnected and where performance bottlenecks are occurring. This helps you identify issues faster, troubleshoot more efficiently, and optimize your clusters for better performance.
The Amazon CloudWatch Observability Operator for Kubernetes (CW Operator) makes it easy to set up and manage CloudWatch resources for your EKS cluster. It allows you to define CloudWatch Alarms, Dashboards, and Metrics using Kubernetes manifests, making it easier to monitor your clusters.
With the CW Operator, you can automate the process of setting up CloudWatch resources, saving you time and reducing the risk of errors. It also makes it easier to manage your monitoring setup, as you can use the same Kubernetes tools and workflows you are already familiar with.
Learn more in our detailed guide to Kubernetes observability.
Kubernetes control plane metrics are an essential part of EKS monitoring. They provide insight into the health and performance of the Kubernetes control plane components, such as the API server, etcd, the scheduler, and the controller manager.
Node and system metrics are another critical aspect of EKS monitoring. They provide insight into the health and performance of the worker nodes in the Kubernetes cluster.
CPU usage, memory usage, disk I/O, and network I/O are some of the key metrics to monitor at the node level. These metrics can help identify resource contention issues, potential bottlenecks, and any anomalies that might affect the performance of the applications running on the nodes.
In addition to the node-level metrics, system metrics like system load, system uptime, and system error rates are also important. These metrics can provide early warning signs of potential system failures.
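To make the node-level metrics concrete, here is a minimal sketch of how CPU utilization can be derived from two samples of cumulative per-mode CPU time (the shape of `/proc/stat`-style counters), followed by a simple threshold check. The function names, metric names, and limits are illustrative assumptions, not part of any specific tool.

```python
def cpu_utilization(previous: dict[str, float], current: dict[str, float]) -> float:
    """Return the busy fraction (0.0 to 1.0) between two cumulative CPU-time samples.

    Each sample maps a CPU mode (user, system, idle, ...) to the cumulative
    seconds spent in that mode. Only the "idle" mode is treated as not busy.
    """
    deltas = {mode: current[mode] - previous.get(mode, 0.0) for mode in current}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return 1.0 - deltas.get("idle", 0.0) / total


def breached_limits(readings: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Return the names of readings that exceed their configured limit."""
    return sorted(
        name for name, value in readings.items()
        if name in limits and value > limits[name]
    )
```

For example, feeding the computed CPU fraction and a memory fraction into `breached_limits` with limits of 0.8 and 0.9 would flag a node only when one of those values is exceeded.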
Application metrics are crucial for understanding the behavior and performance of your applications running on EKS. They can help identify application-specific issues which might not be visible at the Kubernetes or node level.
Key application metrics include request count, error rates, response time, and throughput. These metrics can help identify performance bottlenecks, potential failures, and any anomalies in the application behavior.
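The sketch below shows one way these four metrics could be computed from raw request records over a fixed time window. The `Request` type and `summarize` function are hypothetical, errors are assumed to mean HTTP 5xx responses, and the 95th percentile uses the nearest-rank method; a metrics library would normally do this work for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    status: int         # HTTP status code
    duration_ms: float  # time taken to serve the request


def summarize(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute request count, error rate, p95 response time, and throughput."""
    if not requests or window_seconds <= 0:
        raise ValueError("need at least one request and a positive window")
    count = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank percentile: ceil(0.95 * count), computed with integers.
    rank = (95 * count + 99) // 100
    return {
        "request_count": count,
        "error_rate": errors / count,
        "p95_ms": durations[rank - 1],
        "throughput_rps": count / window_seconds,
    }
```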
If you’re running your EKS clusters on AWS Fargate, there are additional metrics that you should monitor. These metrics can provide insights about the performance and cost-efficiency of your Fargate deployments.
Key metrics for EKS on Fargate include CPU usage, memory usage, network I/O, and storage I/O. These metrics can help identify resource contention issues, potential bottlenecks, and any anomalies that might affect the performance and cost-efficiency of your Fargate deployments.
Monitoring the health of your EKS clusters and applications is crucial for maintaining their performance and availability. This involves monitoring key health indicators at all levels—Kubernetes control plane, node, and application.
In addition to the key metrics mentioned earlier, it’s also important to monitor the status of the Kubernetes objects like pods, services, and deployments. This can help identify any issues with the Kubernetes objects which might affect the health of your clusters and applications.
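As a small example of object status monitoring, the following sketch scans a pod list shaped like the JSON returned by `kubectl get pods -o json` and reports pods that look unhealthy. The function name and the restart threshold are assumptions chosen for illustration.

```python
def unhealthy_pods(pod_list: dict, max_restarts: int = 3) -> dict[str, list[str]]:
    """Map pod name to the reasons it looks unhealthy; healthy pods are omitted."""
    problems: dict[str, list[str]] = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        reasons = []
        # Pending, Failed, and Unknown phases all deserve a closer look.
        if phase not in ("Running", "Succeeded"):
            reasons.append(f"phase={phase}")
        for container in status.get("containerStatuses", []):
            restarts = container.get("restartCount", 0)
            if restarts > max_restarts:
                reasons.append(f"{container['name']} restarted {restarts} times")
            if phase == "Running" and not container.get("ready", False):
                reasons.append(f"{container['name']} not ready")
        if reasons:
            problems[name] = reasons
    return problems
```

The input can be produced by loading the `kubectl` output with `json.load`, so the same check works against a saved snapshot or a live cluster.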
Log retention is another important aspect of EKS monitoring. By ensuring you have sufficient log retention, you can better troubleshoot issues, analyze trends, and maintain the security of your EKS clusters.
It’s important to configure log retention policies that meet your operational and compliance needs, but do not take up excessive storage space. These policies should define how long the logs should be retained, when they should be archived or deleted, and who should have access to them.
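A retention policy of this kind can be expressed as a simple age-based rule. The sketch below is a hypothetical illustration: `RetentionPolicy` and `retention_action` are made-up names, and the day counts are placeholders to be replaced by your own operational and compliance requirements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    hot_days: int     # days a log stays in searchable storage
    delete_days: int  # total days before the log is deleted


def retention_action(log_time: datetime, now: datetime, policy: RetentionPolicy) -> str:
    """Return "keep", "archive", or "delete" based on the age of the log."""
    age = now - log_time
    if age >= timedelta(days=policy.delete_days):
        return "delete"
    if age >= timedelta(days=policy.hot_days):
        return "archive"
    return "keep"
```

With a policy of 30 hot days and 365 total days, a 10-day-old log is kept, a 90-day-old log is archived, and a 400-day-old log is deleted.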
eBPF (extended Berkeley Packet Filter) is a powerful technology that can enhance EKS monitoring. It provides deep visibility into the Linux kernel with low performance overhead.
Monitoring Kubernetes with eBPF can provide insights about the network communications, file I/O, and system calls at the kernel level. This can help detect issues which might not be visible at the Kubernetes, node, or application level.
Komodor’s platform streamlines the day-to-day operations and troubleshooting of your Kubernetes apps. Regardless of which managed Kubernetes provider you use (and you may be using several), Komodor acts as a single pane of glass for monitoring your Kubernetes workloads. It provides enhanced visibility into your clusters and integrates with popular monitoring tools such as Datadog, Prometheus, or any of the EKS-specific tools mentioned above, for clear metric and event visualization. It also features static monitors that enforce best practices and prevent misconfigurations, and historical data retention that lets you see a complete timeline of events leading up to the current state.
Moreover, Komodor’s Workspace view reduces the cognitive load on K8s non-experts by filtering out irrelevant data, so they stay informed about their app’s performance and can act quickly when issues arise. By taming the flood of data coming from various dashboards and APMs, Komodor helps end users own their apps end to end and operate them independently.
To learn more about how Komodor can help you and your teams troubleshoot K8s, sign up for our free trial.