Observability is a very important aspect of software that’s often taken for granted. You need to have visibility into what your application is doing at different levels to better understand an issue when it occurs.
There are multiple open-source tools and initiatives to help you achieve improved visibility. When we talk about observability, there are three parts to consider: logs, traces and metrics. This article will look at how you can use two of the most popular open-source solutions in the monitoring space for your metric needs.
Prometheus and Grafana – The Basics
Prometheus and Grafana are two of the most widespread tools in the monitoring space. Prometheus is an open-source time-series database you can use to store metrics. It offers a label-based data model and flexible querying through PromQL, Prometheus’ native query language.
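To make PromQL a little more concrete, here is a minimal sketch of sending an instant query to Prometheus’ HTTP API (the /api/v1/query endpoint) and reading the result. The server address, the metric name http_requests_total, and the sample response are assumptions chosen for illustration; only the URL is built here, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against Prometheus' HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(response_body: str) -> dict:
    """Map each series' label set to its latest value from an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    results = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # the value arrives as a string
        results[labels] = float(value)
    return results


# Hypothetical query: per-second request rate over five minutes, grouped by job.
url = instant_query_url(
    "http://localhost:9090",
    "sum by (job) (rate(http_requests_total[5m]))",
)
print(url)

# Hand-written sample in the shape the API uses for vector results.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"job": "api"}, "value": [1700000000.0, "12.5"]}],
    },
})
print(parse_vector(sample))
```

Grafana issues queries of this kind on your behalf whenever a dashboard panel refreshes.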
Grafana is a web-based visualization tool that supports a wide range of charts and graphs. Grafana queries Prometheus for metrics and presents them via dashboards. You can also import open-source pre-defined dashboards very easily.
Both tools are free and easy to get started with, although you will need additional tools to address Prometheus’ scalability and high-availability limitations.
Prometheus and Grafana, along with some of Prometheus’ features, form a great start for your monitoring needs. Below is a diagram of the different components of this architecture and how they interact with each other.
Grafana sends the queries defined in each dashboard panel to Prometheus and builds charts from the results. Prometheus works on a pull-based model: it scrapes metrics at regular intervals from the sources defined in its configuration. You can enable support for push-based metrics using Prometheus Pushgateway. A metrics producer can push its metrics to Pushgateway, and Prometheus will pull the metrics from there.
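As an illustration of the pull model, the sketch below shows what a scrape target looks like from the application side: an HTTP endpoint at /metrics that returns metrics in Prometheus’ text exposition format. The metric name app_requests_total, the counter values, and the port are hypothetical, and in practice an official Prometheus client library would normally handle this for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real app would update these as it works.
REQUESTS = {"GET": 42, "POST": 7}


def render_metrics(requests_by_method: dict) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Total requests handled.",
        "# TYPE app_requests_total counter",
    ]
    for method, count in sorted(requests_by_method.items()):
        lines.append(f'app_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metrics whenever Prometheus scrapes /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


# Show what a scrape would return, without starting the server.
print(render_metrics(REQUESTS), end="")
```

Calling serve() would expose the endpoint so that a Prometheus scrape job pointed at that host and port could collect the counters.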
Alertmanager is another service you can use to route alerts and deliver notifications via Slack, email, or PagerDuty.
This is a very basic architecture, and you can face challenges when you want to horizontally scale. Thanos can help you scale Prometheus and achieve high availability, as it can query from multiple Prometheus servers in the backend. You can also opt for Trickster here as a reverse proxy and cache the data served on the Trickster layer, improving dashboard performance.
How to Install Prometheus and Grafana
There are multiple methods to deploy Prometheus and Grafana in your infrastructure.
Virtual Machine Installation
You can install Grafana and Prometheus in two virtual machines or on the same one. If you’re installing in production, you should use different machines. Make sure you have connectivity between these two machines so that Grafana can talk to Prometheus.
Simply download a pre-built binary for Prometheus, run it, and give it the configuration file. You can download the binary from these locations. The location of the configuration file is generally at:
Go ahead and define all configurations for Prometheus metric sources in this file.
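For a sense of what such a configuration contains, here is a small sketch that renders a minimal Prometheus configuration with a global scrape interval and a list of static scrape jobs. The job names and target addresses are hypothetical; global, scrape_interval, scrape_configs, job_name, static_configs, and targets are standard Prometheus configuration keys.

```python
def render_prometheus_config(jobs: dict, scrape_interval: str = "15s") -> str:
    """Render a minimal Prometheus YAML config from {job_name: [targets]}."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
    ]
    for job_name, targets in jobs.items():
        lines.append(f"  - job_name: {job_name}")
        lines.append("    static_configs:")
        lines.append("      - targets:")
        for target in targets:
            lines.append(f'          - "{target}"')
    return "\n".join(lines) + "\n"


# Hypothetical jobs: Prometheus scraping itself plus one application host.
config = render_prometheus_config({
    "prometheus": ["localhost:9090"],
    "example-app": ["10.0.0.5:8000"],
})
print(config, end="")
```

Each entry under scrape_configs is one metric source that Prometheus pulls from on the configured interval.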
Step-by-step installation instructions are available here. There are other installation methods like Docker installation, or you can use config management tools like Chef, Puppet, Ansible, and SaltStack. You can find installation methods for all these here.
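On Debian or Ubuntu, once Grafana’s APT repository is configured, you can install Grafana with the package manager (the package is named grafana; the service it installs is grafana-server):
sudo apt-get install grafana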
After starting the grafana-server service, you can reach your server at localhost:3000. Log in with the default username and password (admin for both on a fresh install), after which you have to connect Grafana to Prometheus. To do that, add the Prometheus server as a data source in Grafana. Follow this document from Prometheus to add it as a data source.
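If you would rather script this step than click through the UI, Grafana’s HTTP API exposes a POST /api/datasources endpoint for it. The sketch below only builds the request; the URLs and the admin/admin credentials are assumptions for a fresh local install and should be replaced with your own.

```python
import base64
import json
from urllib.request import Request


def build_datasource_request(
    grafana_url: str, prometheus_url: str, user: str, password: str
) -> Request:
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend performs the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Assumed local addresses and default credentials; adjust for your setup.
request = build_datasource_request(
    "http://localhost:3000", "http://localhost:9090", "admin", "admin"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running Grafana instance.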
Both of these tools have Helm charts available for the installation on Kubernetes. Follow these links to find them:
You can install both of these using Helm commands. After that, as in the previous steps, you’ll have to add Prometheus as a data source. In the case of Kubernetes, you point the Grafana data source at the Kubernetes service that exposes Prometheus.
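Inside the cluster, a service is reachable through its DNS name, so the data source URL usually follows the pattern service.namespace.svc.cluster-domain. A small sketch, assuming a hypothetical service called my-prometheus in a monitoring namespace, port 9090, and the default cluster.local domain (check the actual service name and port with kubectl get svc):

```python
def prometheus_service_url(
    service: str,
    namespace: str,
    port: int = 9090,
    cluster_domain: str = "cluster.local",
) -> str:
    """Build the in-cluster URL Grafana can use to reach a Prometheus service."""
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}"


# Hypothetical service name and namespace.
print(prometheus_service_url("my-prometheus", "monitoring"))
```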
Apart from Helm charts, Grafana and Prometheus can also be deployed using Prometheus Operator and Grafana Operator. Operators extend the Kubernetes API with custom resources and controllers, typically written in Go, though any programming language can be used; you can use them to install and run tools on top of Kubernetes.
When you install Prometheus Operator, the ServiceMonitor resource lets you discover your metrics sources. In the case of a Helm chart installation, you can rely either on the Prometheus configuration file or on annotations on the Kubernetes pods from which the metrics have to be scraped.
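The annotation approach can be sketched as follows. The prometheus.io/scrape, prometheus.io/path, and prometheus.io/port annotations are a widely used convention implemented through relabeling rules in the scrape configuration, not something built into Prometheus itself; the function and the default port below are hypothetical and only mimic the decision those rules make.

```python
from typing import Optional


def scrape_url(pod_ip: str, annotations: dict, default_port: int = 8080) -> Optional[str]:
    """Return the URL to scrape for a pod, or None if the pod has not opted in."""
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    path = annotations.get("prometheus.io/path", "/metrics")
    port = annotations.get("prometheus.io/port", str(default_port))
    return f"http://{pod_ip}:{port}{path}"


# A pod that opts in and overrides the port, and one with no annotations.
print(scrape_url("10.1.2.3", {"prometheus.io/scrape": "true", "prometheus.io/port": "9102"}))
print(scrape_url("10.1.2.4", {}))
```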
What Is ServiceMonitor?
If you’re familiar with Kubernetes operators, you know that you can define custom resource definitions (CRDs) and then code your operator to act on objects of those types. Prometheus Operator provides a CRD for ServiceMonitor; you can then create ServiceMonitor objects with a kubectl apply command. You can find more details about this here.
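As a rough sketch of what such an object looks like, the snippet below builds a ServiceMonitor manifest and prints it as JSON, which kubectl apply accepts alongside YAML. The resource name, namespace, app label, and port name are hypothetical; the apiVersion, kind, selector, and endpoints fields come from the Prometheus Operator CRD.

```python
import json


def service_monitor(
    name: str,
    namespace: str,
    app_label: str,
    port_name: str = "metrics",
    interval: str = "30s",
) -> dict:
    """Build a ServiceMonitor that scrapes services labeled app=<app_label>."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Which Kubernetes services to monitor.
            "selector": {"matchLabels": {"app": app_label}},
            # Which named service port to scrape, and how often.
            "endpoints": [
                {"port": port_name, "path": "/metrics", "interval": interval}
            ],
        },
    }


manifest = service_monitor("example-app", "monitoring", "example-app")
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running kubectl apply -f on it would create the object, provided the Prometheus Operator CRDs are already installed in the cluster.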
If you’re unsure which method to choose to install Prometheus and Grafana on Kubernetes, go for the operator way. Operators follow Kubernetes-native patterns, which helps with self-healing and lets you define Prometheus and Grafana constructs as Kubernetes resources.
How Prometheus and Grafana Can Help You Troubleshoot Production Issues
Prometheus lets you capture all your application, infrastructure, database, and cloud metrics, while Grafana lets you visualize them and have a centralized view of everything happening in your environment. Having such capabilities is great when there are any production issues.
One of the major aspects of visibility is the ability to identify a problem as early as possible. Prometheus alerts will accomplish this for you. You can put alerts on your metrics and get informed whenever there’s any threshold breach.
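To illustrate how a threshold alert behaves, here is a simplified simulation of a rule that fires only after the condition has held for a set duration, similar in spirit to combining an expr such as "value > 0.8" with "for: 2m" in a Prometheus alerting rule. The sample data, threshold, and state names are illustrative assumptions rather than Prometheus’ actual implementation.

```python
def alert_states(samples: list, threshold: float, for_seconds: int) -> list:
    """Return the alert state at each sample: inactive, pending, or firing.

    samples is a time-ordered list of (timestamp_seconds, value) pairs.
    """
    states = []
    active_since = None  # when the current breach started, if any
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held_for = timestamp - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # the breach ended, so the timer resets
            states.append("inactive")
    return states


# Hypothetical CPU utilization sampled once a minute.
samples = [(0, 0.2), (60, 0.9), (120, 0.95), (180, 0.97), (240, 0.3)]
print(alert_states(samples, threshold=0.8, for_seconds=120))
# → ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

Requiring the breach to persist before firing keeps short spikes from paging anyone.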
You can also add alerts in Grafana so that if Prometheus stops working, Grafana notifies you, which effectively alerts you when your monitoring itself is down.
There are a few aspects pertaining to visibility that are tough to capture, for example, application changes or deployments.
In real-life scenarios, an estimated 80% to 90% of production issues happen because some human error lets a wrong configuration go to production. If you have a history of all the changes executed, it becomes much easier to identify which change caused a problem. But with Kubernetes deployments, it’s hard to retrieve the full change history for your cluster and application deployments.
Komodor can be a great solution to this, as it can be easily integrated with Prometheus and many other tools like Datadog, GitHub, Sentry, etc. Komodor gives you a holistic view of your cluster on a single screen, with key data like cluster status, resource usage, and config and application changes. Plus, it can automate all monitoring and alerting.
Komodor even provides playbooks on alerts and suggestions on what could have gone wrong, as well as multi-cluster and multi-cloud support so you can be cloud-agnostic.
Prometheus and Grafana have many advantages; the biggest are that they’re free, relatively easy to understand, and provide good observability into your stack. On the other hand, you have to invest time in managing them manually as well as money in hosting them. You’ll also need to spend considerable engineering effort to scale Prometheus beyond a certain point, and as we all know, with scale comes complexity, which makes breakage more likely.
This is where Komodor comes in as a native K8s platform, helping you monitor your entire K8s stack, identify issues, uncover their root cause and understand the necessary action to troubleshoot efficiently and independently. To learn more about how Komodor can make it easier to empower you and your teams to troubleshoot K8s, sign up for our free trial.