1. Hosts, services and agents
So, Checkmk is ready. But before we start with the actual monitoring, we will briefly explain some important terms. First of all, there is the host. A host in Checkmk is any standalone physical or virtual system monitored by Checkmk. Usually these are things with their own IP address (servers, switches, SNMP devices, virtual machines), but they can also be, for example, Docker containers or other logical objects that do not have an IP address of their own. Each host always has one of the states UP, DOWN, UNREACH or PEND.
On each host a number of services are monitored. A service can be anything — for example, a file system, a process, a hardware sensor, a switch port — but it can also simply be a certain metric such as CPU utilization or RAM usage. Each service can have one of the states OK, WARN, CRIT, UNKNOWN or PEND.
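The two sets of states can be captured in a small data model. The following Python sketch is purely illustrative and is not how Checkmk represents hosts internally; the class names, the `problems()` helper and the example service names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class HostState(Enum):
    """The four host states named in the text."""
    UP = "UP"
    DOWN = "DOWN"
    UNREACH = "UNREACH"
    PEND = "PEND"  # pending: no check result yet


class ServiceState(Enum):
    """The five service states named in the text."""
    OK = "OK"
    WARN = "WARN"
    CRIT = "CRIT"
    UNKNOWN = "UNKNOWN"
    PEND = "PEND"


@dataclass
class Host:
    name: str
    state: HostState = HostState.PEND
    # service name (a file system, a process, a metric, ...) -> current state
    services: dict[str, ServiceState] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Names of all services currently in WARN, CRIT or UNKNOWN."""
        bad = {ServiceState.WARN, ServiceState.CRIT, ServiceState.UNKNOWN}
        return sorted(name for name, state in self.services.items() if state in bad)


server = Host("mycmkserver", HostState.UP, {
    "Filesystem /": ServiceState.OK,
    "CPU utilization": ServiceState.WARN,
    "Memory": ServiceState.OK,
})
print(server.problems())  # ['CPU utilization']
```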
In order for Checkmk to be able to request data from a host, an agent is necessary. This is a small program that is installed on the host and provides data on the state (or 'health') of the host on request. Servers running Windows, Linux, or Unix can only be monitored effectively by Checkmk if you install a Checkmk agent, provided by us, on them. Network devices and many appliances usually come with an agent built in by the manufacturer, which Checkmk can easily query using the standardized SNMP protocol. Cloud services such as Amazon Web Services (AWS) or Azure instead provide an interface ('API') that Checkmk can query via HTTP.
2. Preliminary considerations for DNS
Even though Checkmk does not require name resolution of hosts, a well-maintained Domain Name System (DNS) makes configuration much easier and avoids errors, since Checkmk will then be able to resolve the host names on its own without you needing to enter IP addresses in Checkmk.
So setting up a monitoring system is a good opportunity to check whether your DNS is up to date and, if necessary, to add any missing entries.
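To find missing DNS entries before adding hosts, you can check which host names your resolver can answer. The following Python sketch uses only the standard `socket` module; the helper names and the example host name `switch01.example.invalid` are hypothetical, and the check covers IPv4 only.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the IPv4 addresses the system resolver finds for hostname, [] if none."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # each entry is (family, type, proto, canonname, (address, port))
    return sorted({info[4][0] for info in infos})


def missing_dns_entries(hostnames: list[str]) -> list[str]:
    """Host names that do not resolve and would need an explicit IP address in Checkmk."""
    return [name for name in hostnames if not resolve(name)]


if __name__ == "__main__":
    print(missing_dns_entries(["localhost", "switch01.example.invalid"]))
```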
3. Folder structure for hosts
Checkmk manages your hosts in a hierarchical tree of folders, quite analogous to the files in your operating system. If you only monitor a handful of hosts, this may not seem very important. But remember: Checkmk is designed to monitor thousands and even tens of thousands of hosts, so good organization is half the battle.
Before you add the first hosts to Checkmk, it is therefore worth thinking about the structure of these folders. On the one hand, the folder structure is useful for your own overview. More importantly, however, it can be used for configuring Checkmk: any host attribute can be defined at a folder and is then automatically inherited by its subfolders and the hosts they contain. Setting up a well-considered folder structure from the beginning is therefore essential, especially (but not only) for the configuration of large environments.
Once you have created a folder structure, you can change it — but you must do so very carefully. Moving a host to another folder can have the effect of changing its attributes without you being aware of it.
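How folder inheritance works, and why moving a host can quietly change its attributes, can be illustrated with a few lines of Python. This is a simplified model, not Checkmk's implementation: the attribute keys `site` and `snmp_community`, the folder paths and the site name `central` are assumptions; `muc` is the example site from the Munich scenario further down.

```python
def effective_attributes(folder_attrs: dict[str, dict], folder_path: str,
                         host_attrs: dict) -> dict:
    """Merge attributes from the main folder down to folder_path, then the host's own.

    folder_attrs maps a path such as "" (main folder), "munich" or
    "munich/linux" to the attributes set directly on that folder.
    Deeper folders override higher ones; host attributes override everything.
    """
    result: dict = {}
    parts = folder_path.split("/") if folder_path else []
    for depth in range(len(parts) + 1):
        # depth 0 is the main folder (""), then each subfolder on the way down
        result.update(folder_attrs.get("/".join(parts[:depth]), {}))
    result.update(host_attrs)
    return result


folders = {
    "": {"site": "central", "snmp_community": "public"},  # main folder
    "munich": {"site": "muc"},
    "network": {"snmp_community": "s3cr3t"},
}

# The same host without attributes of its own, in two different folders:
print(effective_attributes(folders, "munich", {}))
print(effective_attributes(folders, "network", {}))  # moving it changes "site"
```

Nothing about the host itself changed between the two calls; only its folder did, and with it the inherited `site`.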
The key question when building the folder structure that will be most useful to you is which criteria to organize the folders by. The criteria can differ at each level of the tree. For example, you can distinguish by location at the first level and by technology at the second.
The following classification criteria have proven themselves in practice:
Location/Geography
Organization
Technology
Sorting by location is an obvious choice in larger companies, especially if you distribute the monitoring over several Checkmk servers. Each server then monitors, for example, one region or one state/country. If your folders map this distribution, you can define, for example, in the folder 'Munich' that all hosts in this folder are to be monitored from the Checkmk site 'muc'.
Alternatively, 'organization' (i.e. the answer to the question 'Who is responsible for a host?') may be a more meaningful criterion, since location and responsibility may not always be the same. For example, it may be that one group of your colleagues is responsible for the administration of Oracle, regardless of the actual physical location of the corresponding hosts. If, for example, the folder 'Oracle' is intended for the hosts of the Oracle colleagues, it is easy to configure in Checkmk that all hosts below this folder are only visible to these colleagues and that they can even maintain their hosts there themselves.
Structuring by technology could, for example, provide one folder for Windows servers and one for Linux servers. This would simplify implementing a rule such as 'The sshd process must run on all Linux servers'. Another example is the monitoring of devices such as switches or routers via SNMP. Here, no Checkmk agent is used; instead, the devices are queried via SNMP. If these hosts are grouped in a separate folder, you can make the settings necessary for SNMP, such as the 'Community', directly at the folder.
Since a folder structure can only rarely reflect the complexity of reality, Checkmk provides host tags as a supplementary means of structuring, which are covered in a separate chapter on fine-tuning the monitoring. For more information on the folder structure, see the articles on the administration and structuring of hosts.
4. Creating folders
You can access the administration of folders and hosts via the navigation bar, the Setup menu, the Hosts topic and the Hosts entry. The Main page is then displayed:
Before we create the first folder, we will briefly discuss the structure of this page, since you will find the various elements on most Checkmk pages in the same or a similar format. Below the page title Main you will find the breadcrumb path, which shows you where you are currently located within the Checkmk interface. Below this, the menu bar is displayed, which summarizes the possible actions on this page in menus and menu items. The menus in Checkmk are always context-specific, i.e. you will only find menu entries for actions that make sense on the current page.
Below the menu bar you will find the action bar, in which the most important actions from the menus are offered as buttons that can be clicked directly. You can hide the action bar with the button to the right of the Help menu and show it again with the corresponding button. When the action bar is hidden, its icons are displayed in the menu bar instead.
Since we are currently on an empty page (without folders and without hosts), the important actions for creating the first object are additionally offered via even larger buttons — so that the options offered by the page cannot be overlooked. These buttons will disappear after the first object has been created.
Now let’s get back to the reason we are on this page: the creation of folders. One folder, the main folder, exists in every freshly set up Checkmk system. It is called Main, as you can see in the title of the page. Below the main folder, we will now create the three folders Windows, Linux, and Network as a simple exercise.
Create the first of the three folders by selecting one of the actions offered to create a folder, e.g. the Add folder button. On the new page Add folder enter the folder name in the first box Basic settings:
In the above image, the Show less mode is active and only the entry that is absolutely necessary for creating a folder is displayed. Confirm the entry with Save.
Analogous to the Windows folder, create the other two folders Linux and Network.
After that, the situation will look like this:
Tip: When you point the mouse at the tab or the top of a folder icon, the folder unfolds to reveal the icons you need to perform important actions with the folder (change the properties, move the folder or delete it).
One more tip: At the top right of each page you will find information on whether any changes have accumulated in the meantime and, if so, how many. Since we have created three folders, there are three changes, but they do not need to be activated yet. We will deal with activating changes in more detail below.
5. Adding the first host
Now everything is in place for adding the first host to the monitoring, and what could be more obvious than monitoring the Checkmk server itself? Of course, Checkmk won’t be able to report its own total failure, but this is still useful: it gives you not only an overview of CPU and RAM usage, but also a number of metrics and checks concerning the Checkmk system itself.
The procedure for including a Linux host (as well as a Windows host, by the way) is in principle always as follows:
Download the agent
Install the agent
Create the host
Register the agent
Finally, once the host has been created in the configuration environment, the services can be configured and the changes activated for the monitoring environment.
5.1. Downloading the agent
Since the Checkmk server is a Linux machine, you will need the Checkmk agent for Linux.
For the Raw Edition, you can find the agent’s Linux packages via Setup > Agents > Linux:
In the commercial editions, Setup > Agents > Windows, Linux, Solaris, AIX takes you to a page that also gives you access to the Agent Bakery, with which you can 'bake' individually configured agent packages. From this page, the Related > Linux, Solaris, AIX files menu item will take you to the agent files page as in the Raw Edition.
Download the package file: choose the RPM file format for Red Hat Enterprise Linux (RHEL) based systems and SLES, or the DEB file format for Debian and Ubuntu.
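If you are unsure which format your distribution needs, the file `/etc/os-release` tells you. The following Python sketch assumes that its `ID` and `ID_LIKE` fields identify the distribution family; the mapping covers only the families named above and raises an error for anything else.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of /etc/os-release (values may be quoted)."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        data[key] = value.strip().strip('"').strip("'")
    return data


def agent_package_format(os_release_text: str) -> str:
    """Return "deb" or "rpm" for the distribution described by os-release."""
    info = parse_os_release(os_release_text)
    ids = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    if any(i in ("debian", "ubuntu") for i in ids):
        return "deb"
    if any(i in ("rhel", "centos", "fedora", "sles", "suse", "opensuse") for i in ids):
        return "rpm"
    raise ValueError(f"unknown distribution family: {ids}")


print(agent_package_format("ID=ubuntu\nID_LIKE=debian\n"))  # deb
```

On the host itself you would pass the content of `/etc/os-release` instead of the sample string.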
5.2. Installing the agent
For the following installation example, we assume that the downloaded package file is located in the /tmp/ directory. If you have downloaded the file to another directory, replace /tmp/ with the actual directory in the following installation command. Similarly, replace the name of the package file with the name of the file you downloaded. The package file is only needed during the installation and can be deleted once the installation has been completed.
Note: In our example the agent will be installed on the Checkmk server itself, i.e. you do not need to copy the package file to another computer. If the downloaded file is not on the host on which the agent is to be installed, you must first copy the file to that host, for example with the command line tool scp. This is done in the same way as for the installation of the Checkmk software itself, as described in the Linux installation articles, for example, the one for Debian and Ubuntu.
The installation is performed as root on the command line. For the RPM file, use rpm, preferably with the -U option, which stands for Update and ensures that the installation goes through without errors even if an older version of the agent is already installed:
root@linux# rpm -U /tmp/check-mk-agent-2.2.0p1-1.noarch.rpm
And for the DEB file, use the dpkg -i command:
root@linux# dpkg -i /tmp/check-mk-agent_2.2.0p1-1_all.deb
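When scripting the installation, the choice between the two commands can be derived from the file extension. This Python sketch only builds the command lines shown above and does not run anything; the function name is hypothetical.

```python
from pathlib import Path


def install_command(package: str) -> list[str]:
    """Return the argv that installs an agent package, mirroring the commands above:
    "rpm -U" for .rpm files and "dpkg -i" for .deb files."""
    suffix = Path(package).suffix
    if suffix == ".rpm":
        return ["rpm", "-U", package]
    if suffix == ".deb":
        return ["dpkg", "-i", package]
    raise ValueError(f"not a Linux agent package: {package}")


print(install_command("/tmp/check-mk-agent_2.2.0p1-1_all.deb"))
```

To actually execute the command, you could pass the list as root to `subprocess.run(..., check=True)`.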
Installing the agent package also installs the Agent Controller, which is used, among other things, to establish TLS encryption of the connection to the Checkmk server during registration. The Agent Controller requires a Linux distribution with the init system systemd, which has been standard in most Linux distributions since 2015. For the rare cases in which the Agent Controller cannot be used, see the article Monitoring Linux in legacy mode.
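Whether a system is running systemd can be checked the same way systemd's own `sd_booted()` function does it: by testing whether the directory `/run/systemd/system/` exists. A minimal Python sketch of that check (the function name is hypothetical):

```python
import os


def booted_with_systemd(run_dir: str = "/run/systemd/system") -> bool:
    """Mirror sd_booted(): this directory only exists if systemd is the running init system."""
    return os.path.isdir(run_dir)


print(booted_with_systemd())
```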
This completes the installation of the agent. You can leave the command line open. It will be needed again when registering the host.
5.3. Creating a host
After installing the agent on the host, you can add the host to Checkmk’s configuration environment, namely in the prepared Linux folder. Just a reminder: in this example, the Checkmk server and the host to be monitored are of course one and the same machine.
In the Checkmk interface, open the same Main page where you have already created the three folders: Setup > Hosts > Hosts. There, change to the Linux folder by clicking on that folder.
Click Add host to open the Add host page:
As with the creation of the three folders above, the Show less mode will still be active. Therefore, Checkmk only shows the most important host attributes in the menu — those that are necessary to create a host. If you are interested, you can see the rest by clicking the ellipsis at each of the open submenus and by opening the two collapsed submenus at the bottom of the page. As mentioned at the beginning, Checkmk is a complex system that has an answer to every question. That’s why you can configure so much on a host (but not only there).
Tip: On many pages — including this one — you can also display help texts for the attributes. To do this, select Show inline help from the Help menu. The selected setting remains active on other pages until you switch off the help. The following image shows the inline help for the IPv4 address parameter:
But now for the inputs for creating the first host. You only need to fill in one field, namely Hostname in the Basic settings.
This name is free-form and can be assigned as required. However, you should know that the host name is of central importance, because it serves as an internal ID (or key) for unambiguous identification of the host at all points in the monitoring. Since it is so important in Checkmk and is so often used, you should think carefully about the naming of your hosts. A host name can be changed at a later date, but this is a time-consuming process and should be avoided.
It is best if the host can be resolved under its name in the DNS. If this is the case, you will be finished with this form. If not, or if you do not want to use DNS, you can also enter the IP address manually in the IPv4 address field.
Note: To ensure that Checkmk can always run stably and with good performance, it maintains its own cache for the resolution of host names. For this reason, the failure of the DNS service does not lead to a failure of the monitoring. Detailed information on host names, IP addresses and DNS can be found in the article on host administration.
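The idea behind such a cache can be sketched in Python: names are resolved once, later lookups are answered from the cache, and a failed refresh keeps the last known address. This illustrates the principle under these assumptions and is not Checkmk's actual caching code; the class name, the fake DNS function and the host `web01` are hypothetical.

```python
import socket
from typing import Callable


class CachingResolver:
    """Illustrative host name cache, not Checkmk's implementation."""

    def __init__(self, lookup: Callable[[str], str] = socket.gethostbyname) -> None:
        self._lookup = lookup
        self._cache: dict[str, str] = {}

    def resolve(self, hostname: str) -> str:
        """Serve from the cache; only ask DNS for names not seen before."""
        if hostname not in self._cache:
            self._cache[hostname] = self._lookup(hostname)
        return self._cache[hostname]

    def refresh(self) -> None:
        """Re-resolve all cached names; keep the old address if DNS fails."""
        for hostname in list(self._cache):
            try:
                self._cache[hostname] = self._lookup(hostname)
            except OSError:  # socket.gaierror is a subclass of OSError
                pass  # DNS unavailable: keep the last known address


# Demo with a fake DNS that stops answering after the first lookup
_fake_dns = {"web01": "192.0.2.10"}


def fake_lookup(name: str) -> str:
    if name not in _fake_dns:
        raise socket.gaierror(f"cannot resolve {name}")
    return _fake_dns[name]


resolver = CachingResolver(fake_lookup)
before = resolver.resolve("web01")  # answered by the (fake) DNS
_fake_dns.clear()                   # simulate a DNS outage
resolver.refresh()                  # lookup fails, cached entry survives
after = resolver.resolve("web01")   # still answered from the cache
print(before, after)
```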
A host must exist in the configuration environment before it can be registered in the next step. So, complete the creation of the host for now by clicking Save & view folder.
5.4. Registering the host
Registering the host with the Checkmk server establishes a trust relationship between the two. From then on, host and server communicate only via a connection encrypted with Transport Layer Security (TLS).
Registration is done by calling the Agent Controller cmk-agent-ctl as root on the command line. For the command you need the names of the Checkmk server (mycmkserver in the example), the Checkmk site (mysite) and the host (localhost) as set up in Checkmk in the preceding section. The last option is the name of a Checkmk user with access to the REST API. You can use cmkadmin for this:
root@linux# cmk-agent-ctl register --hostname localhost --server mycmkserver --site mysite --user cmkadmin
If the specified values were correct, you will be asked to confirm the identity of the Checkmk site to which you want to connect. For clarity we have shortened the server certificate output to be confirmed here in this example:
Attempting to register at mycmkserver, port 8000. Server certificate details:
PEM-encoded certificate:
-----BEGIN CERTIFICATE-----
MIIC9zCCAd+gAwIBAgIUM7th5NaTjbkXVo1gMXVDC3XkX4QwDQYJKoZIhvcNAQEL
[...]
jbXj75+c48W2u4O0+KezRDIG/LdeVdk0Gq/kQQ8XmdqgObDU7mJKBArkuw==
-----END CERTIFICATE-----
Issued by:
Site 'mysite' local CA
Issued to:
mysite
Validity:
From Tue, 28 Feb 2023 15:55:26 +0000
To Thu, 28 Feb 3022 15:55:26 +0000
Do you want to establish this connection? [Y/n]
> Y
Please enter password for 'cmkadmin'
> *****
Registration complete.
Confirm with Y and then, when requested, enter the password for the cmkadmin user to complete the process.
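If you register many hosts, it can help to generate the command lines. The following Python sketch builds exactly the command shown above, using only the options that appear there; the additional host names `web01` and `db01` are hypothetical. Each command must still be run as root on the respective host and remains interactive (certificate confirmation and password prompt).

```python
import shlex


def register_command(hostname: str, server: str, site: str, user: str) -> list[str]:
    """argv for cmk-agent-ctl register, using only the options shown above."""
    return ["cmk-agent-ctl", "register",
            "--hostname", hostname,
            "--server", server,
            "--site", site,
            "--user", user]


# One command line per host, quoted safely for a shell
for host in ["localhost", "web01", "db01"]:
    print(shlex.join(register_command(host, "mycmkserver", "mysite", "cmkadmin")))
```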
If no error message is displayed, the encrypted connection has been established. All data will subsequently be transmitted in compressed form via this connection.
After this (for the time being last) excursion to the command line, we will continue, again in the Checkmk interface.
5.5. Diagnostics
Murphy’s law — "Everything that can go wrong will go wrong" — unfortunately still applies to Checkmk. Things can go wrong, especially when you are trying them for the first time. Good tools for diagnosing errors are therefore important.
During the creation of a host, Checkmk not only offers to save the entries (host name and IP address) on the Add host page, but also to test the connection to the host. Following the short interruption for registration, we will now catch up on this connection test. On the Linux page, click the icon on the host you have just created to open the host properties. In the action bar of the Properties of host page you will find, among other things, the Save & run connection tests button. Click on this button.
The Test connection to host page will be displayed and Checkmk will try to reach the host in various ways. For Linux and Windows hosts only the two upper boxes are interesting:
The output in the Agent box assures you that Checkmk can successfully communicate with the agent you have previously installed and registered manually on the host.
In further boxes you can see how Checkmk tries to make contact via SNMP. This predictably leads to SNMP errors in this example, but this is very useful for network devices, which we will discuss below.
On this page you can try a different IP address in the Host Properties box if necessary, run the test again and even transfer the changed IP address directly to the host properties with Save & go to host properties.
Click this button (whether you have changed the IP address or not) and you will end up back on the Properties of host page.
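If the Agent box reports a connection problem, a first check from the Checkmk server is whether the relevant TCP port can be reached at all. The Python sketch below tests only TCP reachability, which is far less than Checkmk's connection test does; it assumes the Checkmk agent's default port 6556, and 8000 is merely the registration port from the example output above, which can differ on your system.

```python
import socket

AGENT_PORT = 6556            # default port of the Checkmk agent on the host
AGENT_RECEIVER_PORT = 8000   # registration port on the server in the example output


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be established within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


print(port_open("localhost", AGENT_PORT))
```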
By the way, you can find more diagnostic possibilities in the Linux agent article.
5.6. Configuring services
Once the host itself has been included, the really interesting part begins — the configuration of its services. On the host properties page mentioned above, click Save & run service discovery and the Services of host page will appear.
On this page you specify which services you want to monitor on the host. If the agent on the host is accessible and running correctly, Checkmk automatically finds a number of services and suggests these for the monitoring (shown here in an abbreviated form):
For each of these services, there are the following options:
Undecided : You have not yet decided whether to monitor this service.
Monitored : The service is currently being monitored.
Disabled : You have chosen not to monitor the service.
Vanished : The service was being monitored, but it now no longer exists.
This page shows all services sorted into tables by the above categories. As you have not yet configured any service, you will only see the Undecided table.
If you click Monitor undecided services, all of the services will be directly added to the monitoring and all of the Undecided services will become Monitored services.
Conversely, services can also disappear, for example, when a file system has been removed. These services will then appear in the monitoring as UNKNOWN and on this page as Vanished and can be removed from the monitoring with Remove vanished services.
For now, it’s easiest to click the Accept all button, which does everything at once (adding missing services and removing vanished ones) and additionally applies any changes found to the host labels.
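The effect of these buttons can be modeled with plain set operations. The following Python sketch is a simplification under stated assumptions: services are identified by name only, host labels are ignored, and the service names are made up.

```python
def classify(found: set[str], monitored: set[str], disabled: set[str]) -> dict[str, set[str]]:
    """Sort services into the four categories of the Services of host page (simplified)."""
    return {
        "undecided": found - monitored - disabled,  # found, but no decision yet
        "monitored": found & monitored,             # monitored and still present
        "disabled": set(disabled),                  # deliberately not monitored
        "vanished": monitored - found,              # monitored, but no longer found
    }


def monitor_undecided(found: set[str], monitored: set[str], disabled: set[str]) -> set[str]:
    """Monitor undecided services: return the new set of monitored services."""
    return monitored | classify(found, monitored, disabled)["undecided"]


def remove_vanished(found: set[str], monitored: set[str], disabled: set[str]) -> set[str]:
    """Remove vanished services: return the new set of monitored services."""
    return monitored - classify(found, monitored, disabled)["vanished"]


def accept_all(found: set[str], monitored: set[str], disabled: set[str]) -> set[str]:
    """Both actions at once (the host label part of Accept all is not modeled)."""
    return remove_vanished(found, monitor_undecided(found, monitored, disabled), disabled)


found = {"CPU load", "Memory", "Filesystem /", "Filesystem /data"}
monitored = {"CPU load", "Filesystem /old"}  # "/old" has since been removed
disabled = {"Memory"}
print(sorted(classify(found, monitored, disabled)["vanished"]))  # ['Filesystem /old']
print(sorted(accept_all(found, monitored, disabled)))  # ['CPU load', 'Filesystem /', 'Filesystem /data']
```

On a first visit nothing is monitored or disabled yet, so every found service lands in the undecided category, as described above.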
You can always visit this page later to customize the configuration of the services. Sometimes new services are created by changes to a host, for example, when you include a Logical Unit Number (LUN) as a file system or configure a new Oracle database instance. These services then reappear as Undecided, at which point you can include them in the monitoring individually or all at once.
5.7. Activating changes
Checkmk initially saves all changes you make only in a temporary 'configuration environment' that does not yet influence the currently running monitoring. Only by 'activating the pending changes' will they be transferred to the monitoring. You can read more about the background to this in the article on configuring Checkmk.
As mentioned above, at the top right of each page you will find information on how many changes have accumulated so far that have not yet been activated. Click the link with the number of changes to go to the Activate pending changes page, which lists, among other things, the changes not yet activated under Pending changes:
Now click the Activate on selected sites button to apply the changes.
Shortly after, you will be able to see the result in the sidebar in Overview, which now shows the number of hosts (1) and the number of services you previously selected. In the standard dashboard, which you can reach by clicking on the Checkmk logo in the top left of the navigation bar, you will also now be able to see that the system has become filled with life.
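The separation between configuration environment and running monitoring can be modeled as two copies of the configuration plus a list of pending changes. The following Python sketch is a toy model, not Checkmk's implementation; the class name, method names and change texts are hypothetical. It also shows what reverting pending changes means.

```python
import copy


class ConfigEnvironment:
    """Toy model of pending versus activated changes."""

    def __init__(self) -> None:
        self.setup: dict[str, dict] = {"folders": {}, "hosts": {}}  # what you edit
        self.active = copy.deepcopy(self.setup)  # what the monitoring runs with
        self.pending: list[str] = []

    def add_folder(self, name: str) -> None:
        self.setup["folders"][name] = {}
        self.pending.append(f"add folder {name}")

    def add_host(self, name: str, folder: str) -> None:
        self.setup["hosts"][name] = {"folder": folder}
        self.pending.append(f"add host {name}")

    def activate(self) -> int:
        """Transfer the configuration to the monitoring; return the number of changes."""
        self.active = copy.deepcopy(self.setup)
        count = len(self.pending)
        self.pending.clear()
        return count

    def revert(self) -> None:
        """Discard pending changes by restoring the last activated configuration."""
        self.setup = copy.deepcopy(self.active)
        self.pending.clear()


env = ConfigEnvironment()
for folder in ("Windows", "Linux", "Network"):
    env.add_folder(folder)
print(len(env.pending))                     # 3, like the counter at the top right
env.add_host("localhost", "Linux")
print("localhost" in env.active["hosts"])  # False: not yet activated
print(env.activate())                      # 4
```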
You have now successfully transferred the first host and its services into the monitoring — Congratulations!
More detailed information on the Linux agent can be found in the article on Linux monitoring. You can find information on how to revert pending changes in Configuration of Checkmk.
6. Monitoring Windows
Just as for Linux, Checkmk also has its own agent for Windows. It is supplied as an MSI package. You will find it just one menu entry away from the Linux agent. Once you have downloaded the MSI package and copied it to your Windows computer, you can install it by double-clicking, as is usual with Windows.
Once the agent has been installed, you can create the host in Checkmk, register it on the command line, and add it to the monitoring. Follow the same procedure as described above for the Linux host, but create the host in the designated Windows folder.
Since Windows is structured differently from Linux, the agent will naturally find other services.
For a detailed introduction to this subject, see the article on Windows monitoring.
7. Monitoring with SNMP
Professional quality switches, routers, printers and many other devices and appliances already have a built-in interface for monitoring from the manufacturer — the Simple Network Management Protocol (SNMP). Such devices can be monitored very easily with Checkmk — and you don’t even need to install an agent.
The basic procedure is always the same:
In the device’s management interface, enable SNMP for read access from the IP address of the Checkmk server.
Assign a Community when doing so. This is nothing more than a password for access. Since this is usually transmitted in plain text in the network, it only makes limited sense to choose a very complicated password. Most users simply use the same community for all devices within a company. This also greatly simplifies the configuration in Checkmk.
In Checkmk, create the host for the SNMP device as described above, this time in the designated Network folder.
In the host properties, in the Monitoring agents box, check Checkmk agent / API integrations and select No API integrations, no Checkmk agent.
In the same Monitoring agents box, check SNMP and select SNMP v2 or v3.
If the Community is not public, under Monitoring agents again activate the SNMP credentials entry, select SNMP community (SNMP Versions 1 and 2c) and enter the Community in the input field below.
For the last three steps above, the result should look like the following screenshot:
Tip: If you have created all SNMP devices in a separate folder, simply carry out the configuration of the Monitoring agents for the folder. This will automatically apply these settings to all the hosts in this folder.
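Before configuring Checkmk, you can also check from the Checkmk server's command line whether a device answers SNMP requests with your community at all. The following Python sketch assumes that the Net-SNMP command line tools (`snmpget`) are installed, covers SNMP v2c only, and queries the standard objects `sysDescr.0` and `sysUpTime.0`; the device name `switch01` is hypothetical.

```python
import subprocess

SYS_DESCR = "1.3.6.1.2.1.1.1.0"   # SNMPv2-MIB::sysDescr.0 (device description)
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"  # SNMPv2-MIB::sysUpTime.0 (uptime)


def snmpget_command(host: str, community: str = "public",
                    oids: tuple[str, ...] = (SYS_DESCR, SYS_UPTIME)) -> list[str]:
    """Build a Net-SNMP snmpget command line for an SNMP v2c read request."""
    return ["snmpget", "-v", "2c", "-c", community, host, *oids]


def snmp_reachable(host: str, community: str = "public", timeout: float = 15) -> bool:
    """Run snmpget; True if the device answered (exit code 0)."""
    try:
        result = subprocess.run(snmpget_command(host, community),
                                capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # snmpget not installed, or no answer in time
    return result.returncode == 0


print(snmpget_command("switch01"))
```

`snmp_reachable("switch01", "public")` then returns `True` only if the device answered the request.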
The rest runs as usual. If you want, you can take a look at the Test connection to host page with the Save & go to connection tests button. There you can immediately see whether access via SNMP works, here for a switch, for example:
On the Properties of host page, click Save & run service discovery to display the list of all services. Naturally, this looks completely different from Linux or Windows. By default, Checkmk monitors all ports that are currently in use on a device; you can customize this later as you wish. In addition, one service, which is always OK, shows you general information about the device, and another service shows its uptime.
A detailed description can be found in the article on monitoring via SNMP.
8. Clouds, containers and virtual machines
You can also monitor cloud services, containers and virtual machines (VMs) with Checkmk, even if you do not have access to the actual servers. For this purpose, Checkmk uses the application programming interfaces (APIs) provided by the manufacturers. These interfaces always use HTTP or HTTPS for access.
The basic principle is always the following:
Set up an account for Checkmk in the manufacturer’s management interface.
Create a host in Checkmk to access the API.
Set up a configuration for this host to access the API.
For the monitored objects such as VMs, EC2 instances, containers, etc., create additional hosts in Checkmk or automate their creation.
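Automating the last step essentially means comparing what the API reports with the hosts that already exist in Checkmk. The following Python sketch shows only that comparison; the shape of the API objects, the naming scheme `<prefix>-<name>` and the prefix `aws` are assumptions, and actually creating or deleting hosts is left out.

```python
def plan_hosts(api_objects: list[dict], existing: set[str],
               prefix: str) -> tuple[list[str], list[str]]:
    """Return (hosts to create, hosts to remove) for the objects reported by an API.

    Only hosts carrying the prefix count as managed by this automation,
    so manually created hosts are never scheduled for removal.
    """
    wanted = {f"{prefix}-{obj['name']}".lower() for obj in api_objects}
    managed = {h for h in existing if h.startswith(f"{prefix}-")}
    return sorted(wanted - existing), sorted(managed - wanted)


api_objects = [{"name": "Web-1"}, {"name": "db-1"}]  # e.g. VMs reported by the API
existing = {"aws-web-1", "aws-old-2", "localhost"}
print(plan_hosts(api_objects, existing, "aws"))  # (['aws-db-1'], ['aws-old-2'])
```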
The User Guide contains step-by-step instructions for setting up monitoring of Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), Docker, Kubernetes and VMware ESXi.