1. Introduction
1.1. Background and motivation
You may be wondering why you should integrate Prometheus into Checkmk at all. An important note at this point: our integration of Prometheus is aimed at all users who already run Prometheus. By integrating Prometheus into Checkmk, we close the gap that has opened up here, so that you do not have to keep checking two monitoring systems.
This enables you to correlate the data from the two systems, accelerate any error analysis and, at the same time, facilitate communication between Checkmk and Prometheus users.
Finally, context again
As a most pleasant side benefit of this integration, it is likely that your metrics from Prometheus automatically receive a meaningful context thanks to Checkmk. For example, while Prometheus correctly shows you the amount of main memory used, you do not have to take any extra manual steps in Checkmk to find out how much of the total available memory this is. As banal as this example may be, it shows at which points Checkmk makes monitoring easier — even in some of the smallest details.
1.2. Exporter or PromQL
The integration of the most important exporters for Prometheus is provided via a special agent. The following exporters are available, each of which is described in section 2.2: Node Exporter, cAdvisor and kube-state-metrics.
If we do not yet support the exporter you need, experienced Prometheus users also have the option of sending self-defined queries to Prometheus directly via Checkmk. This is performed using Prometheus’ own query language, PromQL.
2. Setting up the integration
2.1. Creating a host
Since Prometheus has no concept of hosts, first create a place in which the desired metrics are gathered. This host forms the central point of contact for the special agent, which later distributes the delivered data to the correct hosts in Checkmk. To do this, create a new host using the WATO module of the same name.
If the specified host name does not correspond to an FQDN, enter the IP address at which the Prometheus server can be reached.
Make all other settings for your environment and confirm your selection with Save & go to folder.
2.2. Create a rule for the Prometheus datasource
Before Checkmk can find metrics from Prometheus, you must first set up the special agent using the Prometheus rule set. You can find this in WATO via Setup > Agents > VM, Cloud, Container > Prometheus. Regardless of which exporter you want to use, there are several options for customizing the connection to your Prometheus server's web frontend. Note that before version 2.0.0 only the port option was available.
Prometheus connection option: Specify here how the Prometheus server should be contacted. Especially if the server is only accessible via HTTPS, the options Custom URL or Host name are important and useful, since the certificate rarely includes the IP address.
Prometheus web port: The port only needs to be changed if it differs from the Prometheus server default.
Basic authentication: If a login is required, specify it here.
Protocol: After installation the web frontend is provided via HTTP. If you have secured the access with HTTPS, change the protocol here accordingly.
You can see the default values in the following screenshot:
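As a rough illustration of how the connection options above combine, the following Python sketch builds the URL of an instant query against the Prometheus HTTP API (`/api/v1/query`) and a Basic authentication header. The function and parameter names are hypothetical and are not taken from the special agent. Port 9090 is the usual Prometheus default.

```python
import base64
from typing import Dict, Optional
from urllib.parse import urlencode

PROMETHEUS_DEFAULT_PORT = 9090  # usual default port of the Prometheus web frontend


def build_query_url(
    address: str,
    promql: str,
    protocol: str = "http",
    port: Optional[int] = PROMETHEUS_DEFAULT_PORT,
    custom_url: Optional[str] = None,
) -> str:
    """Return the URL for an instant query (hypothetical helper)."""
    if custom_url is not None:
        # "Custom URL": the given URL is used as-is, protocol and port are ignored
        base = custom_url.rstrip("/")
    else:
        # "Host name" or "IP address": combine protocol, address and port
        base = f"{protocol}://{address}"
        if port is not None:
            base += f":{port}"
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    """Return the HTTP header for Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

With HTTPS it is usually the host name or the custom URL that matches the certificate, for example `build_query_url("prometheus.example.com", "up", protocol="https")`.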
Integration using Node Exporter
If, for example, you now want to integrate the hardware components of a Scrape Target from Prometheus, use the Node Exporter. Select Add new Scrape Target, and from the dropdown menu that opens, select Node Exporter:
Furthermore, here you can select which hardware or which operating system instances are to be queried by the Node Exporter. The services created in this way use the same check plug-ins as those used for other Linux hosts. Their behavior is therefore identical to what you already know, so you can quickly configure thresholds or work with graphs without having to adapt to anything new.
Normally the agent tries to assign the data to the hosts in Checkmk automatically. This also applies to the host in Checkmk that fetches the data. However, if the data from the Prometheus server contains neither the IP address, the FQDN, nor localhost, use the Explicitly map Node Exporter host option to specify which host from the Prometheus server data is to be assigned to the Prometheus host in Checkmk.
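The assignment described above can be pictured with the following minimal sketch. It assumes that each Node Exporter target is identified by an instance label of the form `host:port`. The helper names and the exact matching logic are hypothetical and only illustrate the idea, not the actual implementation of the special agent.

```python
from typing import Iterable, Optional


def instance_host(instance: str) -> str:
    """Strip the port from an instance label such as 'web01.example.com:9100'."""
    return instance.rsplit(":", 1)[0] if ":" in instance else instance


def find_prometheus_host_instance(
    instances: Iterable[str],
    ip_address: str,
    fqdn: str,
    explicit_mapping: Optional[str] = None,
) -> Optional[str]:
    """Return the instance whose data belongs to the Prometheus host in Checkmk."""
    if explicit_mapping is not None:
        # An explicit mapping replaces the automatic candidates
        candidates = {explicit_mapping}
    else:
        # Automatic assignment: IP address, FQDN or localhost
        candidates = {ip_address, fqdn, "localhost"}
    for instance in instances:
        if instance_host(instance) in candidates:
            return instance
    return None  # nothing matched: an explicit mapping would be needed
```

If the function returns `None` for your data, this corresponds to the case in which the explicit mapping option is required.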
Integration using cAdvisor
The cAdvisor exporter enables the monitoring of Docker environments, and returns metrics on usage and performance data.
Via the selection menu Entity level used to create Checkmk piggyback hosts you can determine whether and how the data from Prometheus should be collected in a pre-aggregated form. You can choose from the following three options:
Both - Display the information for both pod and container levels
Container - Display the information on container level
Pod - Display the information for pod level
If you select either Both or Container, also define the name under which hosts are created for your containers. The following three options are available for the naming, with Short being the default:
Short - Use the first 12 characters of the docker container ID
Long - Use the full docker container ID
Name - Use the container’s name
Please note that your selection here affects the automatic creation and deletion of hosts according to your dynamic host configuration.
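The three naming options can be summarized in a short sketch. The function name and the lowercase option keys are hypothetical. Only the behavior (first 12 characters, full ID, container name) follows the descriptions above.

```python
def piggyback_host_name(container_id: str, container_name: str, naming: str = "short") -> str:
    """Derive the host name for a container (hypothetical helper)."""
    if naming == "short":
        return container_id[:12]  # Short: first 12 characters of the container ID
    if naming == "long":
        return container_id  # Long: the full container ID
    if naming == "name":
        return container_name  # Name: the container's name
    raise ValueError(f"unknown naming option: {naming!r}")
```

Since the resulting name is the name of the host in Checkmk, it also decides which hosts the dynamic host configuration creates and deletes.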
With Monitor namespaces matching you have the option of limiting the number of monitored objects. All namespaces which are not covered by the regular expressions will accordingly be ignored.
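The effect of such a filter can be sketched as follows. The sketch assumes that a pattern matches from the beginning of the namespace name (Python's `re.match`). Whether the rule matches exactly this way is an assumption, and the function name is hypothetical.

```python
import re
from typing import Iterable, List


def filter_namespaces(namespaces: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only the namespaces covered by at least one regular expression."""
    compiled = [re.compile(pattern) for pattern in patterns]
    # Assumption: a pattern matches from the start of the namespace name
    return [ns for ns in namespaces if any(rx.match(ns) for rx in compiled)]
```

For example, the patterns `prod-` and `default$` would keep `prod-web`, `prod-db` and `default`, while `kube-system` would be ignored.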
Integration using kube-state-metrics
Within a Kubernetes cluster, deployments, nodes and pods can be queried with the kube-state-metrics exporter. The mechanics here are largely the same as for the Node Exporter or the cAdvisor described above: You select the metrics that you want to monitor. The only difference is the Cluster name field, which you use to determine the name of the host under which the data for a cluster should be displayed.
Integration via PromQL
As already mentioned, with the help of the special agent it is also possible to send requests to your Prometheus servers via PromQL. Enter the port via which Prometheus can be reached, and select Service creation using PromQL queries > Add new Service. Use the Service Name field to determine what the new service should be called in Checkmk.
Next, select Add new PromQL query and use the Metric label field to specify the name of the metric to be imported into Checkmk. Now enter your query in the field PromQL query. It is important that this query returns only a single value.
In this example, Prometheus is queried about the number of running and blocked processes. In Checkmk these processes and the two metrics — Running and Blocked — are then combined in a service called Processes.
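To make the single-value requirement more concrete, here is a minimal sketch. It assumes the Node Exporter metrics `node_procs_running` and `node_procs_blocked` and wraps them in `sum()` so that each query yields exactly one value. These queries are only illustrative. The helper parses the JSON body of an instant query response from the Prometheus HTTP API and rejects anything that is not a single value. All names are hypothetical.

```python
import json

# Illustrative definition of the "Processes" service with its two metrics
PROCESSES_SERVICE = {
    "service_name": "Processes",
    "queries": {
        "Running": "sum(node_procs_running)",
        "Blocked": "sum(node_procs_blocked)",
    },
}


def single_value(response_text: str) -> float:
    """Extract the one value from an instant query response (hypothetical helper)."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] == "scalar":
        # A scalar result has the form [timestamp, "value"]
        return float(data["result"][1])
    result = data["result"]
    if len(result) != 1:
        raise ValueError(f"expected exactly one value, got {len(result)}")
    # A vector sample carries its value as [timestamp, "value"]
    _timestamp, value = result[0]["value"]
    return float(value)
```

A query such as `node_procs_running` without aggregation returns one sample per scrape target and would therefore be rejected as soon as more than one target exists.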
From version 2.0.0 you can also assign thresholds to these metrics. To do this, activate Metric levels and then choose between Lower levels or Upper levels. Note that the levels are always entered as floating point numbers, even for metrics that only return integers.
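How such levels could be evaluated is sketched below. The state numbers 0, 1 and 2 correspond to OK, WARN and CRIT. The exact comparison rules (upper levels reached at or above the value, lower levels when the metric falls below the value) are an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Optional, Tuple

OK, WARN, CRIT = 0, 1, 2

Levels = Tuple[float, float]  # (warning level, critical level), always floats


def evaluate_levels(value: float, upper: Optional[Levels] = None, lower: Optional[Levels] = None) -> int:
    """Return the state of a metric value for the given levels."""
    state = OK
    if upper is not None:
        warn, crit = upper
        if value >= crit:
            return CRIT
        if value >= warn:
            state = WARN
    if lower is not None:
        warn, crit = lower
        if value < crit:
            return CRIT
        if value < warn:
            state = WARN
    return state
```

An integer metric such as the number of running processes is simply compared against the floating point levels, for example `evaluate_levels(5, upper=(5.0, 10.0))`.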
Assigning a rule to the Prometheus host
Assign this rule explicitly to the host you just created, and confirm your entries with Save.
2.3. Service Discovery
Now that you have configured the special agent, it is time to run a service discovery on the Prometheus host.
3. Dynamic host configuration
3.1. General configuration
Monitoring Kubernetes clusters is probably one of the most common tasks that Prometheus performs. To integrate the sometimes very short-lived containers, which are orchestrated by Kubernetes and monitored with Prometheus, into Checkmk as well without great effort, it is advisable to set up a dynamic host configuration. The data from the individual containers is forwarded to Checkmk as piggyback data.
Simply create a new connection using WATO > Hosts > Dynamic config > New connection, select Piggyback data as the connector type, and use Add new element to define the conditions under which new hosts should be created dynamically.
Also consider whether your environment requires hosts to be deleted again dynamically when no more data arrives at Checkmk via the piggyback mechanism. Set the option Delete vanished hosts accordingly.
3.2. Special feature in interactions with cAdvisor
Containers usually receive a new ID when they are restarted. In Checkmk, the metrics of the host with the old ID are not automatically transferred to the host with the new ID. In most cases that would not make any sense. For containers, however, it can be very useful: if a container is merely restarted, you probably do not want to lose its history. To achieve this, do not create the containers under their ID but under their name (option Name - Use the container's name in the Prometheus rule). You can then still use the Delete vanished hosts option in the dynamic host configuration to delete containers that no longer exist, without having to fear that their history is lost as well. Instead, the history is continued under the identical container name, even if it is actually a different container that uses the same name.
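The difference between the two naming approaches can be illustrated with a small simulation. It is only a sketch with invented sample data: a container named `webshop` is restarted once and receives a new ID, and the measured values are stored per host name.

```python
from collections import defaultdict
from typing import Dict, List

# Invented sample data: the same container before and after a restart
RUNS = [
    {"id": "aaaaaaaaaaaa1111", "name": "webshop", "cpu": [0.2, 0.3]},
    {"id": "bbbbbbbbbbbb2222", "name": "webshop", "cpu": [0.4]},
]


def simulate_history(naming: str) -> Dict[str, List[float]]:
    """Collect the values per host name for the given naming option."""
    history: Dict[str, List[float]] = defaultdict(list)
    for run in RUNS:
        # "name" uses the container name, anything else the short container ID
        host = run["name"] if naming == "name" else run["id"][:12]
        history[host].extend(run["cpu"])
    return dict(history)
```

With `simulate_history("name")` all values end up under the single host `webshop`, whereas `simulate_history("short")` produces two separate hosts, each with only part of the history.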