
1. Foreword

There may still be changes to both this description and the new Kubernetes monitoring feature in Checkmk version 2.1.0 itself. Please see the Changes to this article on GitHub regarding updates to this description, and our Werks regarding changes to the feature itself.

Since the Kubernetes integration of Checkmk is deployed natively within Kubernetes itself, we also rely directly on the README files in the GitHub repositories. In particular, the Instructions for installing the agent are a primary source for reading up on the currently recommended procedures.

1.1. Getting started with Kubernetes monitoring

For an introduction to the new monitoring of Kubernetes, we recommend the two videos Kubernetes Monitoring with Checkmk and Detecting issues and configuring alerts for Kubernetes clusters.

1.2. Differences from the previous Kubernetes monitoring

Kubernetes monitoring in Checkmk has been rewritten from scratch. The amount of data that can be monitored has grown significantly. Since the technical basis for Kubernetes monitoring is fundamentally different in Checkmk 2.1.0, it is not possible to transfer or convert existing monitoring data for your Kubernetes objects.

2. Introduction

Kubernetes has been the most widely used tool for container orchestration for quite some time. Checkmk helps you monitor your Kubernetes environments.

Starting with version 2.1.0, you can use Checkmk to monitor the following Kubernetes objects:

  • Cluster

  • Nodes

  • Deployments

  • Pods

  • DaemonSets

  • StatefulSets

For a complete listing of all available check plugins for Kubernetes monitoring, see our Catalog of Check Plug-ins.

3. Prerequisites in the cluster

To be able to monitor your Kubernetes cluster in Checkmk, you first need to create the prerequisites in your cluster. First and foremost, you tell the cluster which pods/containers to deploy and how to configure them.

3.1. Setting up the Helm repository

Currently, we recommend installing the Kubernetes monitoring with the tool helm, as it is also suitable for less experienced users and standardizes the management of configurations. Helm is a kind of package manager for Kubernetes. You can use it to include repositories as sources and easily add the Helm charts they contain to your cluster like packages. To do this, first make the repository known. In the following example, we use the name tribe29 to make it easier to reference the repository later; you can of course use any other name:
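A sketch of the two commands, assuming the chart is published under the GitHub Pages URL of the tribe29/checkmk_kube_agent repository; please check the repository's README for the currently valid URL:

user@host:~$ helm repo add tribe29 https://tribe29.github.io/checkmk_kube_agent
user@host:~$ helm repo update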

3.2. Adjustments to the configuration

With Helm, you do not create the necessary configuration files entirely on your own. To set certain parameters across all configurations, you pass along a control file, the so-called values.yaml. As a starting point, we recommend the template we provide. Copy it and adapt it to your environment.
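One way to obtain a local copy for editing is to dump the chart's default values; this sketch assumes the repository was added under the name tribe29 as shown above:

user@host:~$ helm show values tribe29/checkmk > values.yaml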

Since we cannot know in advance how your Kubernetes cluster is set up, we have chosen the safest option for how the Checkmk collectors are started: By default, they do not expose any ports to be reached from the outside. To allow you to access the collectors later, adjust these settings accordingly.

For simplicity, let’s take our template as a starting point. We support two communication paths by default: the query via Ingress and the query via NodePort. Depending on which variant you support in your cluster, the configuration will vary.

Provide communication via Ingress

If you use Ingress to control access to your services, adjust the parts already prepared in values.yaml accordingly. For a better overview, only the relevant part is shown in the following example. Set the value enabled to true and adjust the remaining values according to your environment:

  ingress:
    enabled: true
    className: ""
    annotations:
      nginx.ingress.kubernetes.io/rewrite-target: /
    hosts:
      - host: checkmk-cluster-collector.local
        paths:
          - path: /
            pathType: Prefix
    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

Provide communication via NodePort

You can also provide access to the services directly through a port. This is necessary if you do not use Ingress. Again, only the relevant section is shown in the following example. Set the value type to NodePort and remove the comment from the value nodePort:

  service:
    # if required specify "NodePort" here to expose the cluster-collector via the "nodePort" specified below
    type: NodePort
    port: 8080
    nodePort: 30035
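Before installing anything, you can have Helm render the final manifests from your values.yaml locally as a quick sanity check; this sketch assumes the repository name tribe29 from above:

user@host:~$ helm template checkmk tribe29/checkmk -f values.yaml | less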

3.3. Creating the configuration files

After customizing values.yaml or creating your own, use the following command to create all the necessary configuration files to set up your Kubernetes cluster for monitoring in Checkmk:

user@host:~$ helm upgrade --install --create-namespace -n cmk-monitoring checkmk tribe29/checkmk -f values.yaml

Since the command is not self-explanatory, we provide an explanation of each option below:

  • helm upgrade --install: This part is the basic command to send the configuration to the Kubernetes cluster.

  • --create-namespace: In Kubernetes you always specify to which namespace the configuration should be added. You need this option if the namespace does not yet exist. Helm will then create it.

  • -n cmk-monitoring: This option specifies the namespace to which the configuration should be added. cmk-monitoring is just an example of what it could be called.

  • checkmk: checkmk is an example name for this installation of the chart (the release name). Ideally, you leave this name as it is, because only then will you automatically benefit from Kubernetes objects getting short names.

  • tribe29/checkmk: The first part of this option describes the repository you created with the earlier command. The second part, after the slash, is the package in which the information required to create the configuration of your Kubernetes monitoring is located.

  • -f values.yaml: Finally, specify the configuration file that you created or customized earlier. It contains all the customizations to be included in the configuration files created with helm.

After you run the command, your Kubernetes cluster is prepared for monitoring with Checkmk. The cluster itself will now ensure that the necessary pods and the containers within them are running and accessible.
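If you want to double-check the result, both helm and kubectl can show what was installed; the namespace cmk-monitoring and the release name checkmk are the examples from above:

user@host:~$ helm list -n cmk-monitoring
user@host:~$ kubectl get pods -n cmk-monitoring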

3.4. Alternative: Set up via manifest

Normally, it does not make sense to customize the manifests (configuration files) yourself: on the one hand, this requires detailed knowledge of the architecture of the Checkmk Kubernetes collectors; on the other hand, manual customization is much more error-prone. With helm, for example, you set up communication over TLS once, instead of adding it in all the relevant places in the manifests yourself.

However, if you don't use helm, or want control over all the details of the setup, you can still go this route.

To do so, first download the manifests we have pre-built from our corresponding repository at GitHub. We have split the whole configuration into several files to facilitate their maintenance or to provide more concise files for clearly defined purposes.

You need at least the following five files:

  • 00_namespace.yaml: Creates the namespace named checkmk-monitoring.

  • checkmk-serviceaccount.yaml: Creates the service account named checkmk and the cluster role named checkmk-metrics-reader in the namespace checkmk-monitoring.

  • cluster-collector.yaml: Creates the cluster collector that gives this file its name. Among other things, a service account named cluster-collector is created in the namespace checkmk-monitoring, and the service accounts are assigned roles within the cluster. In addition, the deployment named cluster-collector is defined.

  • node-collector.yaml: Analogous to cluster-collector.yaml, but for the nodes.

  • service.yaml: Creates the service named cluster-collector in the namespace checkmk-monitoring, and a service named cluster-collector-nodeport in the same namespace. The port for the NodePort is also specified here.

If you don't want to clone the whole repository right away (which you are of course free to do), you can use the following command to download just the five files you need:

user@host:~$ URL='https://raw.githubusercontent.com/tribe29/checkmk_kube_agent/main/deploy/kubernetes/'; for i in 00_namespace checkmk-serviceaccount cluster-collector node-collector service; do wget "${URL}${i}.yaml"; done

If you also want to set up a network policy and a pod security policy, you also need the following two files:

  • network-policy.yaml

  • pod-security-policy.yaml

user@host:~$ URL='https://raw.githubusercontent.com/tribe29/checkmk_kube_agent/main/deploy/kubernetes/'; for i in network-policy pod-security-policy; do wget "${URL}${i}.yaml"; done

In the files cluster-collector.yaml and node-collector.yaml you have to fill four placeholders with concrete content. In both files you will find places where main_<YYYY.MM.DD> is written. Replace these placeholders with tags of our Kubernetes collector on Docker Hub. For example, you could use the following command to replace all occurrences of main_<YYYY.MM.DD> with the build tag from March 1, 2022:

user@host:~$ sed -i 's/main_<YYYY.MM.DD>/main_2022.03.01/g' node-collector.yaml cluster-collector.yaml
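A quick check that nothing was missed; if the command prints nothing, all placeholders have been replaced:

user@host:~$ grep -n 'main_<YYYY.MM.DD>' node-collector.yaml cluster-collector.yaml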

For communication with the outside world, a service of type NodePort is needed. It enables access from outside the cluster and is fixed to TCP port 30035 in the service.yaml file. If this port is already in use in your cluster, please change it accordingly.
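For example, a one-liner along these lines changes the port; it assumes the port appears in service.yaml exactly as nodePort: 30035, and 30036 is just a hypothetical replacement:

user@host:~$ sed -i 's/nodePort: 30035/nodePort: 30036/' service.yaml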

Once you have made these settings, you can apply these manifest files collectively to your cluster. To do this, run the following command from the manifest location:

user@host:~$ kubectl apply -f .
namespace/checkmk-monitoring created
serviceaccount/checkmk created
clusterrole.rbac.authorization.k8s.io/checkmk-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/checkmk-metrics-reader-binding created
serviceaccount/cluster-collector created
clusterrolebinding.rbac.authorization.k8s.io/checkmk-cluster-collector created
clusterrolebinding.rbac.authorization.k8s.io/checkmk-token-review created
deployment.apps/cluster-collector created
serviceaccount/node-collector-machine-sections created
serviceaccount/node-collector-container-metrics created
clusterrole.rbac.authorization.k8s.io/node-collector-container-metrics-clusterrole created
podsecuritypolicy.policy/node-collector-container-metrics-podsecuritypolicy created
clusterrolebinding.rbac.authorization.k8s.io/node-collector-container-metrics-cluterrolebinding created
daemonset.apps/node-collector-container-metrics created
daemonset.apps/node-collector-machine-sections created
service/cluster-collector created
service/cluster-collector-nodeport created

You can also use kubectl to check whether the manifests have been applied correctly. To do this, use the following command to display all the pods in the checkmk-monitoring namespace:

user@host:~$ kubectl get pods -n checkmk-monitoring
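If you want to wait until all pods actually report ready, for example in a script, kubectl can block until that condition is met; a small sketch:

user@host:~$ kubectl wait --for=condition=Ready pods --all -n checkmk-monitoring --timeout=120s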

Furthermore, you can also check all services within the namespace as follows:

user@host:~$ kubectl get svc -n checkmk-monitoring

4. Set up the monitoring in Checkmk

Next, in the GUI of Checkmk, we move on to setting up the special agent and a rule for automatically creating hosts for your Kubernetes objects. However, to set up the special agent, a few prerequisites need to be met first:

4.1. Store password (token) in Checkmk

The best way to store the password (token) of the service account is to store it in the password store of Checkmk. This is the most secure variant, because you can separate the storage and use of the password organizationally. Alternatively, enter it directly in plain text when creating the rule (see below).

If you have kept the default checkmk-monitoring as the namespace for monitoring your Kubernetes cluster, the following command line will extract the password directly from the output of kubectl get secrets:

user@host:~$ kubectl get secret $(kubectl get serviceaccount checkmk -o=jsonpath='{.secrets[*].name}' -n checkmk-monitoring) -n checkmk-monitoring -o=jsonpath='{.data.token}' | base64 --decode
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJjaGVjay1tayIsI
mt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJjaGVjay1tay10b2tlbi16OWhicCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5
hbWUiOiJjaGVjay1tayIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjIxODE3OWEzLTFlZTctMTFlOS1iZjQzLTA4MDAyN2E1ZjE0MSIsInN1YiI6I
nN5c3RlbTpzZXJ2aWNlYWNjb3VudDpjaGVjay1tazpjaGVjay1tayJ9.gcLEH8jjUloTeaAj-U_kRAmRVIiETTk89ujViriGtllnv2iKF12p0L9ybT1fO-1Vx7XyU8jneQRO9lZw8JbhVmaPjrkEc8
kAcUdpGERUHmVFG-yj3KhOwMMUSyfg6wAeBLvj-y1-_pMJEVkVbylYCP6xoLh_rpf75JkAicZTDmhkBNOtSf9ZMjxEmL6kzNYvPwz76szLJUg_ZC636OA2Z47qREUtdNVLyutls7ZVLzuluS2rnfoP
JEVp_hN3PXTRei0F5rNeA01wmgWtDfo0xALZ-GfvEQ-O6GjNwHDlsqYmgtz5rC23cWLAf6MtETfyeEJjRqwituhqUJ9Jp7ZHgQ%

The password is really that long. If you work directly under Linux, you can add a | xsel --clipboard at the end. Then the password is not printed at all, but copied directly to the clipboard (as if you had copied it with the mouse):

user@host:~$ kubectl get secret $(kubectl get serviceaccount checkmk -o=jsonpath='{.secrets[*].name}' -n checkmk-monitoring) -n checkmk-monitoring -o=jsonpath='{.data.token}' | base64 --decode | xsel --clipboard

Add the password to the Checkmk password store via Setup > General > Passwords > Add password, e.g. with the ID and the title kubernetes.


4.2. Import CA of the service account into Checkmk

In order for Checkmk to trust the Certificate Authority (CA) of the service account, you must store the CA certificate in Checkmk. You can read out the certificate - provided you have kept checkmk-monitoring as the namespace - with the following command:

user@host:~$ kubectl get secret $(kubectl get serviceaccount checkmk -o=jsonpath='{.secrets[*].name}' -n checkmk-monitoring) -n checkmk-monitoring -o=jsonpath='{.data.ca\.crt}' | base64 --decode
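If copying such a long block from the terminal is awkward, you can also write the certificate to a file first; this is just a variant of the same command with the output redirected:

user@host:~$ kubectl get secret $(kubectl get serviceaccount checkmk -o=jsonpath='{.secrets[*].name}' -n checkmk-monitoring) -n checkmk-monitoring -o=jsonpath='{.data.ca\.crt}' | base64 --decode > ca.crt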

Copy everything here, including the lines BEGIN CERTIFICATE and END CERTIFICATE, and add the certificate in the Setup menu under Setup > General > Global settings > Site management > Trusted certificate authorities for SSL.


4.3. Create Piggyback source host

Create a new host in Checkmk in the usual way and name it mykubernetesclusterhost, for example. As the title and host name suggest, this host is used to collect the Piggyback data and also to map all services and metrics at the cluster level. Since this host only receives data via the special agent, set the IP address family option to No IP.

4.4. Set up dynamic host configuration

To keep the numerous Kubernetes objects separate from the rest of your monitoring environment, it is a good idea to first create a folder via Setup > Hosts > Add folder in which the dynamic host configuration can automatically create all necessary hosts. Creating or using such a folder is optional, however.

However, it is absolutely necessary to set up a connector for the piggyback data. Via Setup > Hosts > Dynamic host management > Add connection you get to the page for the corresponding setup. First enter a title and then click show more under Connection Properties.

Next, click Add new element and under Create hosts in select the folder you created earlier.

In a Kubernetes environment, where monitorable and monitored objects naturally come and go, it is also recommended to enable the Automatically delete hosts without piggyback data option. What exactly this option does and under what circumstances hosts are then actually deleted is explained in the section Automatically deleting hosts in the article on dynamic host configuration.

Now enter the previously created Piggyback source host under Restrict source hosts and enable the Discover services during creation option.

The Connection Properties section of this new connector might look like the following afterwards:

Sample dynamic host configuration settings.

4.5. Setting up the special agent

Now that all the prerequisites are in place in the cluster and in Checkmk, you can turn your attention to configuring the special agent. This can be found via Setup > Agents > VM, Cloud, Container > Kubernetes.

First of all, you need to assign a name for the cluster you want to monitor. You can choose this name freely. It is used to give a unique name to all objects that originate from exactly this cluster. For example, if you enter mycluster here, the names of the hosts of all pods from this cluster will later start with pod_mycluster. The next part of the host name will then always be the namespace in which this Kubernetes object exists.

Under Token, now select the previously created entry from the password store of Checkmk.

Under API server connection > Endpoint, Checkmk now requires the URL (or IP address) at which your Kubernetes API server can be reached. The port must also be specified if the service is not provided via a virtual host. The easiest way to find out this address, if you don't already have it handy, depends on your Kubernetes environment. The following command gives you the endpoint of the API server as IP address and port; you will find it as the last entry under server in the shortened output:

user@host:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://10.73.42.21:6443
  name: my-kubernetes

If the server is provided via a DNS record, the output will look more like this instead:

user@host:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://DFE7A4191DCEC150F63F9DE2ECA1B407.mi6.eu-central-1.eks.amazonaws.com
  name: xyz:aws:eks:eu-central-1:150143619628:cluster/my-kubernetes
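Alternatively, kubectl cluster-info prints the address of the API server directly; the first line of its output looks similar to this:

user@host:~$ kubectl cluster-info
Kubernetes control plane is running at https://10.73.42.21:6443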

If you have stored the CA of your cluster - as described above - in Checkmk, you can select Verify the certificate under SSL certificate verification.

If your Kubernetes API server is only accessible via a proxy or special timeouts are required for the connection, you can enter them under HTTP proxy and TCP timeouts.

Next, you have the choice of enriching the monitoring of your Kubernetes cluster with usage data collected by the Checkmk cluster collector. To do this, specify the protocol, URL, and port of the cluster collector under Collector NodePort/Ingress endpoint. If you set it up using our manifests, the port is 30035 by default; if you customized the port in the service.yaml file, change it here accordingly. You can find the URL or IP address of the NodePort in the description of the cluster-collector pod. Just run the following command and look for the line starting with Node: in the output:

user@host:~$ kubectl describe pod $(kubectl get pods --no-headers -o custom-columns=":metadata.name") | grep -A5 "Name:.*cluster-collector"
Name:         cluster-collector-5b7c8468cf-5t5hj
Namespace:    checkmk-monitoring
Priority:     0
Node:         minikube/172.16.23.2
Start Time:   Wed, 03 Mar 2022 20:54:45 +0100
Labels:       app=cluster-collector
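In the Node: line, the part after the slash (here 172.16.23.2) is the IP address of the node. Alternatively, you can list the addresses of all nodes directly:

user@host:~$ kubectl get nodes -o wide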

With the options Collect information about… you can now finally select which objects within your cluster should be monitored. Our preselection covers the most relevant objects. If you decide to monitor the pods of CronJobs as well, please refer to the inline help on this point.

Last but not least, you can choose whether you want to monitor only certain namespaces within your clusters or whether explicit namespaces should be excluded from monitoring. You specify this using the Monitor namespaces option.

Your rule might now look like the following:

Exemplary completed rule for Kubernetes special agent.

Important: Under Conditions > Explicit hosts you must now re-enter the previously created host:

Rules for special agents must always be set to explicit hosts, as seen here.

Next, save the rule and perform a service discovery on this host. It will immediately show the first cluster-level services:

Exemplary view of the first service discovery after the configuration is complete.

Afterwards, activate all the changes you made and let the dynamic host configuration do the work from now on. It will generate all hosts for your Kubernetes objects after a short time.

5. Labels for Kubernetes objects

Checkmk automatically generates labels for Kubernetes objects such as clusters, deployments, or namespaces during service discovery. All labels that Checkmk automatically generates for Kubernetes objects start with cmk/kubernetes/. For example, a pod always gets a label for its node (cmk/kubernetes/node:mynode), a label showing that the object is a pod (cmk/kubernetes/object:pod), and a label for the namespace (cmk/kubernetes/namespace:mynamespace). This makes it very easy to create filters and rules for all objects of the same type or in the same namespace.

6. Hardware/Software Inventory

Checkmk's Kubernetes monitoring also supports the HW/SW inventory.

Exemplary view of hardware and software inventory of a pod.

7. Removing Checkmk

If you have deployed Checkmk to your cluster via our manifests, you can remove the accounts, services, and so on that were created just as easily as they were set up. To do this, go back to the directory containing the YAML files and run the following command:

user@host:~$ kubectl delete -f .
namespace "checkmk-monitoring" deleted
serviceaccount "checkmk" deleted
clusterrole.rbac.authorization.k8s.io "checkmk-metrics-reader" deleted
clusterrolebinding.rbac.authorization.k8s.io "checkmk-metrics-reader-binding" deleted
serviceaccount "cluster-collector" deleted
clusterrolebinding.rbac.authorization.k8s.io "checkmk-cluster-collector" deleted
clusterrolebinding.rbac.authorization.k8s.io "checkmk-token-review" deleted
deployment.apps "cluster-collector" deleted
serviceaccount "node-collector-machine-sections" deleted
serviceaccount "node-collector-container-metrics" deleted
clusterrole.rbac.authorization.k8s.io "node-collector-container-metrics-clusterrole" deleted
podsecuritypolicy.policy "node-collector-container-metrics-podsecuritypolicy" deleted
clusterrolebinding.rbac.authorization.k8s.io "node-collector-container-metrics-cluterrolebinding" deleted
daemonset.apps "node-collector-container-metrics" deleted
daemonset.apps "node-collector-machine-sections" deleted
service "cluster-collector" deleted
service "cluster-collector-nodeport" deleted