AKS Monitoring: Tools and 5 Critical Best Practices

What Is Azure Kubernetes Service Monitoring? 

AKS monitoring is the process of overseeing and managing the performance and availability of Azure Kubernetes Service (AKS) clusters. It involves the collection and analysis of metrics, logs, and traces from AKS to gain insights into the system’s health and performance.

AKS monitoring allows administrators to keep track of the status of their applications, detect anomalies, troubleshoot issues, and plan for capacity. It can provide visibility into various aspects of AKS, including node performance, pod status, network traffic, and more.

Azure provides a range of tools you can use to monitor AKS, including Azure Monitor, Container Insights, and Azure Log Analytics. These tools provide visibility into your AKS environment, allowing you to diagnose and resolve issues in your AKS clusters.

This is part of a series of articles about Kubernetes monitoring.

The Importance of Monitoring AKS 

Monitoring your AKS clusters is vital for several reasons: 

  • Keep workloads running smoothly by detecting and addressing issues before they impact your users. 
  • Obtain real-time insights into the performance and health of applications, enabling you to take corrective action promptly.
  • Ensure applications are meeting performance SLAs. By tracking key performance indicators (KPIs), you can ensure that your applications are delivering the expected level of service.
  • Optimize resource usage. By analyzing performance and usage metrics, you can identify opportunities for improving efficiency and reducing costs. For example, you might find that nodes are underutilized, or that pods are consuming more resources than they should and need to be optimized.

Tips from the expert

Itiel Shwartz

Co-Founder & CTO

Itiel is the CTO and co-founder of Komodor. A big believer in dev empowerment and moving fast, he has worked at eBay, Forter, and Rookout (as the founding engineer). Itiel is a backend and infra developer turned “DevOps” and an avid public speaker who loves talking about cloud infrastructure, Kubernetes, Python, observability, and R&D culture.

In my experience, here are tips that can help you better monitor AKS clusters:

Use Azure Monitor effectively

Configure Azure Monitor to collect and analyze metrics and logs for real-time operational insights.

Enable Prometheus scraping

Enable Prometheus metrics scraping for detailed performance data and alerting capabilities.

Implement control plane logging

Collect and analyze control plane logs for deeper insights into cluster operations and troubleshooting.

Set up network observability

Use Azure Monitor for Networks to track and analyze network traffic and identify potential bottlenecks.

Utilize traffic analytics

Implement traffic analytics to monitor network flows and optimize performance and security.

AKS Monitoring Tools 

Azure provides a range of tools for AKS monitoring. Let’s explore a few of them.

Azure Monitor

Azure Monitor is a comprehensive service that collects, analyzes, and visualizes metrics and logs from your Azure resources, including AKS. It provides real-time operational insights, allowing you to diagnose issues and understand trends.

Azure Monitor integrates with AKS, enabling you to collect metrics and logs from your AKS clusters. It also supports querying and alerting, allowing you to set up alerts based on specific conditions and send notifications when these conditions are met.

Managed Prometheus with Azure Monitor

Prometheus is an open source monitoring and alerting tool widely used in containerized and Kubernetes environments. Managed Prometheus with Azure Monitor is a fully managed Prometheus service for AKS, so you do not have to run or scale a Prometheus server yourself. It enables you to collect Prometheus metrics from your AKS clusters and analyze them using Azure Monitor.

Managed Prometheus integrates seamlessly with AKS, allowing you to monitor your clusters using the same Prometheus queries and dashboards you are familiar with. It also supports alerting, allowing you to set up Prometheus alert rules and receive notifications when these rules are triggered.

Microsoft Defender for Cloud

Microsoft Defender for Cloud is a security management tool that integrates with AKS to provide threat protection. It monitors your AKS clusters for potential security threats and provides recommendations for improving your security posture.

Defender for Cloud collects security-related logs and metrics from your AKS clusters and analyzes them using advanced analytics and threat intelligence. It also supports automated responses, allowing you to take quick action when a threat is detected.

Related content: Read our guide to Kubernetes monitoring tools

How to Monitor Your Azure Kubernetes Cluster 

Here are the main steps that will allow you to monitor your Azure Kubernetes clusters.

Enable Container Insights for your AKS Cluster

Container Insights is a feature of Azure Monitor that provides deep insights into the performance and health of your AKS clusters. It collects metrics, logs, and events from your AKS clusters and visualizes them in Azure Monitor.

To enable Container Insights for your AKS cluster, you turn on the monitoring add-on, which deploys a containerized version of the Azure Monitor agent to your cluster nodes. This agent collects the necessary data and sends it to Azure Monitor for analysis. Once enabled, you can view the collected data in Azure Monitor, where you can analyze it and set up alerts.
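
As a rough sketch of this step, the Python snippet below assembles the Azure CLI call that turns on the monitoring add-on for a cluster (az aks enable-addons with --addons monitoring). The cluster name, resource group, and workspace ID are placeholders, and the helper name build_enable_insights_cmd is made up for illustration. The command is only printed, not executed.

```python
import shlex
from typing import List, Optional


def build_enable_insights_cmd(
    cluster: str, resource_group: str, workspace_id: Optional[str] = None
) -> List[str]:
    """Return the argv that enables the AKS monitoring add-on (Container Insights)."""
    cmd = [
        "az", "aks", "enable-addons",
        "--addons", "monitoring",
        "--name", cluster,
        "--resource-group", resource_group,
    ]
    if workspace_id:
        # Optional: send data to an existing Log Analytics workspace.
        cmd += ["--workspace-resource-id", workspace_id]
    return cmd


# Placeholder names; replace them with your own cluster and resource group.
cmd = build_enable_insights_cmd("my-aks-cluster", "my-resource-group")
print(shlex.join(cmd))
```

To actually run it, you could pass the list to subprocess.run(cmd, check=True) on a machine where the Azure CLI is installed and signed in.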

Monitor Cluster Performance

Monitoring the performance of your AKS cluster involves tracking key metrics such as CPU usage, memory usage, network traffic, and more. These metrics can provide insights into the health and performance of your cluster.

You can monitor these metrics using Azure Monitor, which collects performance metrics from your AKS clusters and visualizes them in a dashboard. You can also set up alerts based on these metrics, enabling you to receive notifications when certain conditions are met.
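
To make the idea of tracking a key metric concrete, here is a minimal sketch that averages CPU utilization per node and labels each node. The sample values, the node names, and the 85% and 15% cutoffs are all invented for illustration; in practice the numbers would come from the metrics Azure Monitor collects.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical CPU samples per node, as a fraction of node capacity (0.0 to 1.0).
cpu_samples: Dict[str, List[float]] = {
    "node-a": [0.42, 0.55, 0.61],
    "node-b": [0.91, 0.95, 0.89],
    "node-c": [0.08, 0.05, 0.11],
}


def classify_nodes(
    samples: Dict[str, List[float]], high: float = 0.85, low: float = 0.15
) -> Dict[str, str]:
    """Label each node by its average utilization over the sample window."""
    labels = {}
    for node, values in samples.items():
        avg = mean(values)
        if avg >= high:
            labels[node] = "overloaded"
        elif avg <= low:
            labels[node] = "underutilized"
        else:
            labels[node] = "ok"
    return labels


print(classify_nodes(cpu_samples))
```

With these sample values, node-b is labeled overloaded and node-c underutilized, which is the kind of signal you would turn into an alert or a capacity decision.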

Create Container Insights Alert Rules

Container Insights supports alerting, allowing you to create alert rules based on specific conditions. These alert rules can help you detect and respond to issues in your AKS cluster.

To create an alert rule, you need to specify the condition that triggers the alert, the action to take when the alert is triggered, and the recipients of the alert notification. For example, you could create an alert rule that triggers when the CPU usage of a node exceeds a certain threshold and sends an email notification to your operations team.
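
The three parts of an alert rule described above can be modeled in a few lines of Python. This is a conceptual sketch only, not the Azure alerting API: the AlertRule class, the evaluate function, the 80% threshold, and the email address are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlertRule:
    name: str
    condition: Callable[[float], bool]  # what triggers the alert
    action: str                         # what to do when it fires
    recipients: List[str] = field(default_factory=list)  # who is notified


def evaluate(rule: AlertRule, value: float) -> List[str]:
    """Return the notifications the rule would send for one metric value."""
    if not rule.condition(value):
        return []
    return [
        f"{rule.action} -> {recipient}: {rule.name} fired (value={value})"
        for recipient in rule.recipients
    ]


high_cpu = AlertRule(
    name="node-cpu-high",
    condition=lambda cpu_percent: cpu_percent > 80.0,  # example threshold
    action="email",
    recipients=["ops-team@example.com"],
)

print(evaluate(high_cpu, 92.5))  # one notification
print(evaluate(high_cpu, 40.0))  # no notification
```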

AKS Monitoring Best Practices 

1. Enable Scraping of Prometheus Metrics for Your Cluster

It’s crucial that you enable scraping of Prometheus metrics for your cluster, as this will provide you with a wealth of information about your system’s performance. Prometheus metrics can provide:

  • Real-time data about your cluster’s CPU usage, memory consumption, and network traffic.
  • Data on the performance of your individual pods. You can monitor how much CPU and memory each pod is using, and identify any pods that are consuming more resources than they should.
  • Data on the performance of your nodes, so you can confirm that workloads are spread evenly across them and that no node is becoming overloaded.
  • Information about the health of your Kubernetes objects, such as deployments, services, and ReplicaSets. You can track the status of these objects, monitor their availability, and receive alerts if any of them fail.
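
The kinds of data listed above map onto PromQL queries. The sketch below pairs a few common expressions with a helper that builds a URL for the Prometheus HTTP API instant-query endpoint (/api/v1/query). The metric names are ones typically exposed by cAdvisor, node exporter, and kube-state-metrics, so treat them as assumptions about what your cluster scrapes. The base URL is a placeholder, and a managed endpoint also requires authentication, which is not shown.

```python
from urllib.parse import urlencode

# PromQL expressions; the metric names assume cAdvisor, node exporter,
# and kube-state-metrics are being scraped.
QUERIES = {
    "pod_cpu_cores": "sum by (namespace, pod) (rate(container_cpu_usage_seconds_total[5m]))",
    "pod_memory_bytes": "sum by (namespace, pod) (container_memory_working_set_bytes)",
    "node_cpu_busy": '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))',
    "unavailable_replicas": "kube_deployment_status_replicas_unavailable > 0",
}


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical endpoint, used here only to show the URL shape.
BASE_URL = "https://prometheus.example.com"
for name, expr in QUERIES.items():
    print(name, instant_query_url(BASE_URL, expr))
```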

2. Enable Community or Recommended Prometheus Alerts

There are two templates available in the Managed Prometheus service, which can help you automatically set up alerts for your cluster: 

  • Community alerts are alert rules selected from the Prometheus community. You can use this template if you don’t have any other alert rules enabled.
  • Recommended alerts are Prometheus alert rules equivalent to the custom metric alert rules. Use this template if you are moving from custom metrics to Prometheus metrics and want to keep the same alerts.
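
For a sense of what a single alert rule from such a template looks like, here is a sketch in the standard open source Prometheus rule-file layout, printed as JSON. It is written in the style of the community kubernetes-mixin rules. The group name, duration, and severity are illustrative, and the rules Azure actually enables may differ in name, expression, and thresholds.

```python
import json

# One rule in the style of the community (kubernetes-mixin) alerts.
# Values are illustrative, not copied from the Azure templates.
rule_group = {
    "groups": [
        {
            "name": "example-community-style-rules",
            "rules": [
                {
                    "alert": "KubePodCrashLooping",
                    "expr": (
                        "max_over_time(kube_pod_container_status_waiting_reason"
                        '{reason="CrashLoopBackOff"}[5m]) >= 1'
                    ),
                    "for": "15m",
                    "labels": {"severity": "warning"},
                    "annotations": {
                        "summary": "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping."
                    },
                }
            ],
        }
    ]
}

print(json.dumps(rule_group, indent=2))
```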

3. Create Diagnostic Settings to Collect Control Plane Logs for AKS Clusters

Control plane logs provide detailed records of the operations performed by the Kubernetes control plane, which can help you understand how your cluster is functioning and troubleshoot cluster-level issues.

To collect control plane logs for AKS clusters, you need to create diagnostic settings. These settings allow you to specify which logs you want to collect and where you want to store them. You can choose to send the logs to a Log Analytics workspace, a storage account, or an event hub.
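
As a sketch of what such a diagnostic setting might look like, the snippet below builds an Azure CLI call (az monitor diagnostic-settings create) that sends a few control plane log categories to a Log Analytics workspace. The setting name and both resource IDs are placeholders, the helper name build_diag_cmd is hypothetical, and the category list is only a subset you would adjust. The command is printed rather than run.

```python
import json
import shlex
from typing import List, Sequence

# A subset of AKS control plane log categories; adjust to your needs.
CATEGORIES = ("kube-apiserver", "kube-audit-admin", "kube-controller-manager", "kube-scheduler")


def build_diag_cmd(
    setting_name: str,
    cluster_resource_id: str,
    workspace_id: str,
    categories: Sequence[str] = CATEGORIES,
) -> List[str]:
    """Return the argv that creates a diagnostic setting for the given log categories."""
    logs = [{"category": category, "enabled": True} for category in categories]
    return [
        "az", "monitor", "diagnostic-settings", "create",
        "--name", setting_name,
        "--resource", cluster_resource_id,
        "--workspace", workspace_id,  # Log Analytics workspace destination
        "--logs", json.dumps(logs),
    ]


# Placeholder IDs; substitute the real resource IDs of your cluster and workspace.
diag_cmd = build_diag_cmd("aks-control-plane", "<cluster-resource-id>", "<workspace-resource-id>")
print(shlex.join(diag_cmd))
```

To target a storage account or an event hub instead, the same command accepts --storage-account or --event-hub in place of --workspace.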

Creating diagnostic settings for control plane logs also enables you to set up alerts based on specific log events. For example, you can create an alert that triggers whenever there’s a failed API request, indicating a potential issue with your cluster. Control plane logs also let you track changes made to your cluster, such as the creation or deletion of pods. This can help you understand the impact of these changes on your cluster’s performance and stability.
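
The two uses mentioned above, spotting failed API requests and tracking pod creation or deletion, can be illustrated with a small filter over audit records. The records here are invented and trimmed to a few fields of the Kubernetes audit Event schema (verb, objectRef, responseStatus). How the records reach your code, for example exported from a workspace or storage account, is left out.

```python
import json

# Hypothetical kube-audit records, trimmed to a few audit Event fields.
raw_events = [
    '{"verb": "create", "objectRef": {"resource": "pods", "namespace": "shop", "name": "cart-1"}, "responseStatus": {"code": 201}}',
    '{"verb": "get", "objectRef": {"resource": "secrets", "namespace": "shop", "name": "db"}, "responseStatus": {"code": 403}}',
    '{"verb": "delete", "objectRef": {"resource": "pods", "namespace": "shop", "name": "cart-0"}, "responseStatus": {"code": 200}}',
]
events = [json.loads(line) for line in raw_events]


def failed_requests(records):
    """Requests the API server answered with an HTTP status of 400 or higher."""
    return [r for r in records if r.get("responseStatus", {}).get("code", 0) >= 400]


def pod_changes(records):
    """Pod create and delete operations, regardless of outcome."""
    return [
        r for r in records
        if r.get("objectRef", {}).get("resource") == "pods"
        and r.get("verb") in {"create", "delete"}
    ]


for r in failed_requests(events):
    print("failed:", r["verb"], r["objectRef"]["resource"], r["responseStatus"]["code"])
for r in pod_changes(events):
    print("pod change:", r["verb"], r["objectRef"]["name"])
```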

4. Enable Network Observability

Network observability is a crucial aspect of AKS monitoring. It gives you visibility into your cluster’s network traffic, allowing you to understand how your applications are communicating with each other and with external services.

To enable network observability, you can use tools like Azure Monitor for Networks, which provides real-time network flow data for your AKS clusters. This data can help you identify network bottlenecks, troubleshoot connectivity issues, and ensure that your applications are communicating efficiently.

In addition, network observability can help you detect potential security threats. For example, if you see an unusually high amount of traffic coming from a specific IP address, this could indicate a potential DDoS attack. By monitoring your network traffic, you can detect such threats early and take action to protect your cluster.
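
A simple way to picture the check described above is to total traffic per source IP and flag any source far above the typical volume. The flow records, the IP addresses, and the factor of 20 over the median are all made up for this sketch. A real detector would use the flow data your tooling exports and a baseline tuned to your traffic.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical flow records: (source IP, bytes transferred).
flows: List[Tuple[str, int]] = [
    ("10.0.0.4", 1200),
    ("10.0.0.5", 900),
    ("203.0.113.7", 250000),
    ("10.0.0.4", 1500),
    ("10.0.0.6", 1100),
    ("203.0.113.7", 310000),
]


def suspicious_sources(records: List[Tuple[str, int]], factor: float = 20) -> Dict[str, int]:
    """Return sources whose total bytes exceed `factor` times the median per-source total."""
    totals: Counter = Counter()
    for source, nbytes in records:
        totals[source] += nbytes
    baseline = median(totals.values())
    return {source: total for source, total in totals.items() if total > factor * baseline}


print(suspicious_sources(flows))
```

With this sample data only 203.0.113.7 is flagged, since its total is far above the median of the per-source totals.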

5. Use Traffic Analytics to Monitor Network Traffic to and from Your Cluster

Lastly, an effective AKS monitoring strategy should include the use of traffic analytics. Traffic analytics is an Azure solution that provides visibility into user and application activity in your Azure networks. It analyzes Azure Network Watcher flow logs and provides insights into traffic flows in your Azure subscriptions. 

In the context of AKS, traffic analytics provides detailed insights into the network traffic to and from your cluster, helping you understand your application’s network behavior and optimize its performance. With traffic analytics, you can monitor the volume of network traffic, the source and destination of the traffic, and the protocols and ports being used. This information can help you identify potential bottlenecks or inefficiencies in your network.
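
To show how volume, endpoints, protocols, and ports can be pulled out of flow data, here is a minimal sketch that parses comma-separated flow tuples and totals bytes per protocol and destination port. It assumes the version 2 flow tuple layout noted in the code comment, which you should verify against your own flow logs. The tuples and the function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flow tuples in the assumed version 2 layout:
# time,src_ip,dst_ip,src_port,dst_port,protocol,direction,decision,
# state,packets_out,bytes_out,packets_in,bytes_in
TUPLES = [
    "1700000000,198.51.100.10,10.240.0.5,50432,443,T,I,A,E,12,4800,10,9200",
    "1700000005,198.51.100.11,10.240.0.5,50433,443,T,I,A,E,8,3000,7,6100",
    "1700000009,203.0.113.9,10.240.0.6,40000,22,T,I,D,B,,,,",
]


def _to_int(text: str) -> int:
    """Counters are empty for some flow states; treat empty as zero."""
    return int(text) if text else 0


def parse_flow_tuple(line: str) -> dict:
    f = line.split(",")
    return {
        "src_ip": f[1],
        "dst_ip": f[2],
        "dst_port": int(f[4]),
        "protocol": {"T": "TCP", "U": "UDP"}.get(f[5], f[5]),
        "allowed": f[7] == "A",
        "bytes": _to_int(f[10]) + _to_int(f[12]),  # both directions
    }


def bytes_by_port(lines: List[str]) -> Dict[Tuple[str, int], int]:
    """Total bytes per (protocol, destination port)."""
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for line in lines:
        flow = parse_flow_tuple(line)
        totals[(flow["protocol"], flow["dst_port"])] += flow["bytes"]
    return dict(totals)


print(bytes_by_port(TUPLES))
denied = [flow for flow in map(parse_flow_tuple, TUPLES) if not flow["allowed"]]
print([(flow["src_ip"], flow["dst_port"]) for flow in denied])
```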

Kubernetes Monitoring with Komodor

Komodor’s platform streamlines the day-to-day operations and troubleshooting process of your Kubernetes apps. Regardless of which Kubernetes Managed Service provider you may be using (and you may be using multiple!), Komodor acts as your single pane of glass for monitoring your Kubernetes workloads, providing enhanced visibility into your clusters and integrating with popular monitoring tools like Datadog, Prometheus or Grafana for clear metric and event visualization. Additionally, it features static monitors that enforce best practices and prevent misconfigurations, and historical data retention that lets you see a complete timeline of events leading up to the current state.

Moreover, Komodor’s Workspace view feature reduces the cognitive load on K8s non-experts by filtering out irrelevant data, ensuring that they stay informed about their app’s performance data and can take swift action when issues arise. By mitigating the overwhelming flow of data that emerges from various dashboards and APMs, Komodor helps end users own their apps end to end and operate them independently.

To learn more about how Komodor can help you and your teams troubleshoot K8s, sign up for our free trial.
