1. Basic configuration
1.1. Initialization at first start
At this point you should have mounted the appliance in a rack, or installed and started it in a virtual machine on either VirtualBox or VMware ESXi.
During the first start, a message prompts you to select the desired language:

The selected language will be set for the whole device. Afterwards, a message appears prompting you to initialise the data volume:

Confirm this dialogue and wait for device startup to be resumed and to complete. The status view will now be displayed on the local console:

This view shows you general status information and the most important configuration options for your device.
1.2. Network and access configuration via the console
From the status view you can get to the configuration menu by pressing the F1 key:

To put the device into operation, you now need to set up the network configuration and specify the device password.
Network configuration
First set up the network via the Network Configuration item. You will be asked for the IP address, the netmask and the optional default gateway, one after the other.
In most cases the device also needs to reach network devices outside of its own network segment. For this purpose the default gateway must be configured as well.
Once these values have been entered, the configuration is activated, meaning the device is immediately reachable via the network at the entered IP address.
One way of testing this is to send a ping from another device in the network.
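The ping test can also be scripted from another computer in the network. The following is a minimal sketch that calls the operating system's ping utility; the function names are illustrative and not part of the appliance, and the script assumes that a ping command is installed on the computer running it.

```python
import platform
import subprocess


def ping_command(host: str, count: int = 3) -> list[str]:
    """Build a ping command line; the count flag differs between Windows and Unix-like systems."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host: str, count: int = 3) -> bool:
    """Return True if the ping utility exits with code 0, i.e. replies were received."""
    result = subprocess.run(
        ping_command(host, count),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Calling is_reachable with the IP address you entered on the console should return True once the network configuration is active and the device is connected.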
Enabling the web interface
A large part of the device’s configuration is carried out via the web interface. Access to this web interface is protected by a password – the device password – which you will first need to define. The factory settings do not include a device password, which means that you cannot yet access the web interface.
In the configuration menu, select Device Password in order to specify the password. The password must be at least 8 characters long and must contain at least one lower case letter, one upper case letter and one digit.
Then select Web Configuration from the configuration menu to enable the web interface.
Once you have completed these steps you will see that the console’s status view will have changed:

In the Device Infos box you will see the configured IP address and in the Access box Web Configuration: on. If you have already connected the device correctly to your network, you will also see in the Status box that the network connection is active: LAN: UP.
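If you generate or manage device passwords with a script, you can check a candidate against the password rules stated above before entering it. This is only a sketch of those rules: the function name is hypothetical, and it assumes that "letter" means the ASCII letters a-z and A-Z, which the appliance may interpret differently.

```python
import re


def check_device_password(password: str) -> list[str]:
    """Return the list of violated rules; an empty list means all rules are met."""
    problems = []
    if len(password) < 8:
        problems.append("must be at least 8 characters long")
    if not re.search(r"[a-z]", password):
        problems.append("must contain a lower case letter")
    if not re.search(r"[A-Z]", password):
        problems.append("must contain an upper case letter")
    if not re.search(r"[0-9]", password):
        problems.append("must contain a digit")
    return problems
```

Returning the violated rules instead of a plain yes or no makes it easier to tell a user what needs to be changed.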
Protecting access to the console
When you started the appliance, you may have noticed that there was no password prompt. Anyone who has direct access to the rack or the virtualisation solution’s management interface is able to change the basic configuration via the console.
You should therefore activate password protection via the Console Login item in the configuration menu. If protection is activated, the device password is requested before the status view is displayed and settings can be changed.
You will then see the Console Login: on entry in the status view in the Access box.

1.3. Basic settings on the web interface
Once you have enabled access to the web interface through the previous configuration, you can now access the web interface via a web browser on a computer connected to the device via the network.
To do this, enter the appliance URL in the address bar of the browser, in this case http://192.168.178.60/.
Here you can see the web interface’s login screen:

When you have logged in with the previously-set device password, the main menu will open. From here you can access all of the features in the web interface.

Select Device Settings to view the most important device settings and to change these if necessary:

Clicking on a parameter name takes you to the page for editing that setting.
If you have DNS servers available in your environment you should now configure one or more of these so that the resolution of host names can be used. If you have one or more NTP servers for time synchronisation available in your environment, enter these as IP addresses or host names under NTP Servers.
If emails are to be sent from your device – such as notifications in case of problems – you must configure the Outgoing Emails option. To do this enter the SMTP relay server responsible for this device and any access data required. All emails generated on the device will be sent to this server. Under this setting you can also configure all emails generated by the device’s operating system (e.g. in the case of critical errors) to be sent to a particular email address.

This completes the basic device configuration and you can proceed with the installation of the Checkmk software and the setup of the first monitoring site.
2. Administering Checkmk software versions
Starting with appliance version 1.4.14, Checkmk software is no longer pre-installed on the appliance.
The Checkmk software for installation on the appliance is available to you as a CMA file. Download the CMA file — either in the customer portal for the Standard Edition and Managed Services Edition, or on the download page for the Free Edition. You will find the CMA file after selecting the appropriate Checkmk edition, version and platform appliance.
After downloading the CMA file, select Check_MK versions from the main menu. On the following page, select the CMA file from your hard disk using the file selection dialogue and confirm your selection by clicking on Upload & Install.
The Checkmk software will be uploaded to the device. Depending on the network connection between your computer and the device this may take a few minutes. Once uploading is complete you will see the new version in the table of installed versions:

It is possible to install several Checkmk versions on the device at the same time. This allows several sites to be run using differing versions, and individual sites to be changed to newer or older versions independently of one another. For example, you can install a new version, try it out in a test site first, and then update your production site once testing has been successful.
You load and install another Checkmk software version in the same way as the first one. The result will look like this:

If a software version is not being used by any site you have the option to delete it with the recycle bin icon.
3. Administering monitoring sites
3.1. Creating a site
In the main menu of the web interface, click on Site Management. On this page you have access to all of the monitoring sites on this device. You can configure, update and delete sites, as well as create new ones.
The first time you open the page it will be empty. To create your first site, click on the Create New Site button. On the following page you can specify the initial site configuration.

Start by entering a site ID which serves to uniquely identify the site.
The ID may only contain letters, numbers, hyphens (-) and underscores (_), must start with a letter or underscore and may be a maximum of 16 characters in length.
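The rules for the site ID can be expressed as a single regular expression, which is handy if you want to check planned site names in advance. The following sketch assumes that "letters" means the ASCII letters a-z and A-Z; the names SITE_ID_PATTERN and is_valid_site_id are illustrative only.

```python
import re

# First character: letter or underscore.
# Up to 15 further characters: letters, digits, hyphens or underscores (16 in total).
SITE_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_-]{0,15}")


def is_valid_site_id(site_id: str) -> bool:
    """Return True if the whole string satisfies the site ID rules."""
    return SITE_ID_PATTERN.fullmatch(site_id) is not None
```

For example, mysite and _test-1 are accepted, whereas 1site (starts with a digit) and any ID longer than 16 characters are rejected.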
Now select the Checkmk version with which to create the site. You will be offered all versions that are installed in the administration of Checkmk software versions.
Finally, define the user name and password of the Checkmk administrator. You can leave all other settings as they are for the time being. If necessary, you can edit these later via the site editing page.
Click Create Site to create the site. This may take a few seconds. Once the site has been created and started you will be taken to the list of all sites:

The list is short and currently only shows the site just created with the ID mysite and its status, here running. With the button on the far right in the Control column you can stop or start the site. On the far left in the Actions column, symbols are shown for the possible actions you can apply to the site, from left to right: Edit, Update, Rename, Clone, Delete and Login.
You can now log in to the site that has been started, either by clicking on the site ID or by calling up the URL of the site in the address bar of your web browser, in our example: http://192.168.178.60/mysite.
In the site’s login dialogue, enter the access data you specified when creating the site.
Once you have logged in you can set up Checkmk in the usual manner — the first steps are described in the article on Getting started with monitoring.
The snapin Checkmk Appliance is available in all sites and for all administrators. You will find it in the sidebar:

The entries in this snapin will take you from your sites directly to the appliance’s web interface.
3.2. Updating a site
Updating a site means switching it to a new Checkmk software version. First install the desired new version as described in the chapter on administration of Checkmk software versions.
Then list the sites on the appliance’s web interface (Main Menu > Site Management):

Make sure that the site is not running, i.e. if the Status is currently running, stop the site (Control > Stop).
Then, under Actions, click on the update icon.
The following page lists the possible target versions for the update:

Select the Target Check_MK version and then click on Update now. After a short time, the update messages are displayed:

With the Back button you can return to the list of sites in which the new Checkmk version is now shown in the Version column. You can now restart the site.
Note: The update of a site in the appliance follows the same principle as the update on a Linux server. In the event of problems, error messages or conflicts, you can obtain detailed information on the update process in the article Updates and Upgrades.
3.3. Migrating a site
It is a common requirement to migrate existing sites from other Linux systems to a Checkmk appliance. The Checkmk appliance offers a page with which you can carry out this migration.
The following requirements need to be met:
You need to have a network connection between the source system and your device.
The Checkmk version of the source site needs to be installed on your device (architecture changes from 32-bit to 64-bit are possible).
The source site needs to be stopped during the migration.
Open the main menu of the web interface and click on Site Management. Then click on the Migrate Site button, which will take you to the following page:

On this page under Source Host you first need to configure the host address (host name, DNS name or IP address) of the source system which you want to migrate the site from. Next under Source site ID you need to enter the site ID of the site you want to migrate.
The migration of the site is done via SSH.
To get access to the source site, you need to provide the credentials of a user who is able to connect to the source system and access all of the source site’s files.
You can use the root user of the source system or, if you have configured a password for the site user, the site user’s credentials.
Optionally you can choose to let the migration create the site with a new site ID on your device, or carry the original ID unchanged over to the new device.
You additionally have the option to skip the carrying-over of performance data (measurements, graphs) and historical data during the migration. This can be useful if you don’t need an exact copy of the source site and only want to duplicate it – e.g. for testing purposes.
After you have filled in the parameters and confirmed with Start, the progress of the migration will be displayed:

Once the migration has completed you can finalise the migration by clicking on Complete. You will be returned to the site management where you can start and manage the newly-imported site in the usual way.

4. Updating the firmware
You can update your device’s software, i.e. the firmware of the appliance, to a newer version during operation — or also change back to an older version. Before updating the firmware, you must obtain the new firmware.
You can download the appliance firmware as a CFW file from the customer portal for the Standard Edition and Managed Services Edition, or from the download page for the Free Edition. The CFW file can be found in the download area at the product Checkmk Appliance.
Note: Be sure to select firmware that matches the edition installed on your appliance. However, if you want to upgrade from the Free Edition to a full version of the Enterprise Editions, this is also possible by selecting the firmware for the full version and upgrading to it.
After downloading the CFW file, select Firmware Update from the main menu and on the following file selection dialogue page, select the CFW file from your hard drive:

Confirm with a click on Upload & Install. The firmware will now be loaded onto your device. Depending on the network connection, this may take a few minutes.
Once the file has been recognised as valid firmware, the Confirm Firmware Update dialogue will be displayed. Depending on the differences between the current version and the one to be installed, various messages will appear telling you what to do with your data during the update.
Change of the first digit (major release) of the version number: You must back up the data from your device manually and restore it after the update. An update cannot be performed without data migration.
Update to a higher number in the second digit (minor release): The update can be carried out without data migration. You are advised to back up your data beforehand in any case.
Downgrade to lower number in the second digit: You must back up the data from your device manually and restore it after the update. An update cannot be performed without data migration.
Change of the third digit (patch) of the version number: The update can be carried out without data migration. You are advised to back up your data beforehand in any case.
At this point, if necessary, you can cancel the dialogue with No and first make a backup. Then start a new attempt.
Important: If you confirm the Confirm Firmware Update dialogue with Yes!, the device will be rebooted immediately!
During the restart, the selected firmware will be installed. This will cause restarting to take much longer than usual. It will normally take less than 10 minutes however. Afterwards, another reboot will be carried out automatically, which then completes the firmware update. The console status view will show the newly-installed firmware version.
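The rules listed above for when a manual backup and restore is required can be summarised in a few lines of code. The following sketch assumes version numbers consisting of three dot-separated integers, such as 1.4.14; the function names are hypothetical and the appliance itself remains the authority on what the Confirm Firmware Update dialogue reports.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a version such as '1.4.14' into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def needs_data_migration(current: str, target: str) -> bool:
    """Return True if data must be backed up manually and restored after the update."""
    cur_major, cur_minor, _ = parse_version(current)
    new_major, new_minor, _ = parse_version(target)
    if new_major != cur_major:
        return True  # change of the first digit (major release)
    if new_minor < cur_minor:
        return True  # downgrade in the second digit (minor release)
    return False  # minor upgrade or patch change: no data migration needed
```

Even where the function returns False, the recommendation above to back up your data beforehand still applies.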
5. Device settings
5.1. Changing the language
During the basic configuration you specified the language for your device. You can change this at any time, either via the console configuration or via the device settings in the web interface. As with all other settings in this dialogue, changes take effect immediately when saved.
5.2. Changing the network configuration
During the basic configuration you specified the network configuration of your device. You can change this at any time, either via the console configuration or via the device settings in the web interface. If you made an error when specifying the network configuration and the device is no longer accessible via the network you can only correct the settings on the console.
5.3. Configuring host and domain names
Host and domain names serve to identify a computer in the network. When sending emails for example, these names are used to form the sender address. In addition, the configured host name is added as a source host to all log entries that are sent to a syslog server. This makes it easier to assign the entries.
5.4. Configuring name resolution
In most environments DNS servers are used to translate IP addresses into host names and vice versa. Host names or FQDNs (Fully Qualified Domain Names) are frequently used for monitoring instead of IP addresses.
In order to use the name resolution on your device, you must configure the IP addresses of at least one DNS server in your environment. It is recommended to enter at least two DNS servers.
Only when you have configured this option can you use host and domain names (in the configuration of NTP or mail servers for example).
5.5. Configuring time synchronisation
The system time of the device is used for many purposes, such as for recording measurement data or writing log files. A stable system time is therefore very important. This is best ensured by using a time synchronisation service (NTP).
To activate the synchronisation enter the host address of at least one time server under NTP server.
5.6. Forwarding syslog entries
Log messages are generated on the device by the operating system and some permanently running processes. They are initially written into a local log via syslog.
You can also send these entries to a central or higher-level syslog server where they can be evaluated, filtered or archived.
Select the item Syslog to configure the forwarding.
In the dialogue box that appears next you can configure which protocol you wish to use for forwarding. Syslog via UDP is more widely used, but not as reliable as via TCP. So if your syslog server supports both protocols it is recommended to use TCP.
You also need to configure the host address of the syslog server that is to receive the log messages.
5.7. Changing the default web page
If you access the host address of the device directly via the web browser without entering a path, by default you will be taken to the device’s start page. However, it is also possible for you to be forwarded directly to a monitoring site of your choice.
You can configure this using the setting HTTP access without URL. Via this setting, select the monitoring site to open instead of the web interface.
The Appliance home page can then be reached via the URL along with the path – for example 192.168.178.60/webconf.
5.8. Configuring outgoing emails
So that you can send emails from the device (in the case of events during monitoring for example), the forwarding of emails to one of your mail servers must be configured using Outgoing Emails.
In order for the sending of emails to work you must have at least configured the host address of your mail server as an SMTP relay server. This server will then receive the emails from your device and forward them.
However, configuring the SMTP relay server alone is only sufficient if your mail server accepts emails via anonymous SMTP. If your mail server requires authentication, you need to activate the appropriate login method under the Authentication item and enter the access data of an account that can log onto the mail server.
If you still do not receive any emails after the configuration, it is worth taking a look at the device’s system log. All attempts to send emails are logged here.
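When troubleshooting, it can help to verify independently of the appliance that the relay server accepts mail at all. The following sketch uses Python's standard smtplib from any computer in the same network. The port 25 and the use of STARTTLS before login are assumptions about your mail server, and the function names and addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Create a short plain-text test mail."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "SMTP relay test"
    message.set_content("Test message to verify that the SMTP relay accepts mail.")
    return message


def send_via_relay(
    relay_host: str,
    message: EmailMessage,
    username: Optional[str] = None,
    password: Optional[str] = None,
    port: int = 25,  # assumption: the relay listens on the standard SMTP port
) -> None:
    """Hand the message to the relay, anonymously or with authentication."""
    with smtplib.SMTP(relay_host, port, timeout=10) as smtp:
        if username is not None and password is not None:
            smtp.starttls()  # assumption: the relay offers STARTTLS before login
            smtp.login(username, password)
        smtp.send_message(message)
```

If send_via_relay raises an exception without credentials but succeeds with them, the relay does not accept anonymous SMTP and the Authentication item needs to be configured on the device.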
The device itself can send system emails if there are critical problems (e.g. a job cannot be executed or a hardware problem has been detected). In order to receive these emails you must configure an email address to which these emails are to be sent using Send local system mails to.
5.9. Changing access to Checkmk agents
A Checkmk agent is installed on the device and, in the basic setting, can only be queried by the device itself. This allows you to create a site on the device and directly add the device itself to the monitoring.
It is also possible to make the Checkmk agent accessible from another device, meaning the device can also be monitored by another Checkmk system (e.g. in a distributed environment by a central server). For this purpose, you can configure a list of IP addresses that are allowed to contact the Checkmk agent.
6. Remote access via SSH
6.1. Access options
You can activate various access types for the SSH remote management protocol. Basically, access to the console and direct access to the sites are supported. Access with the system user root is possible but neither recommended nor supported, because it makes it very easy to damage the configuration or the software.
6.2. Activating site login via SSH
You can activate access to the command line of the individual monitoring sites, enabling you to view and control the entire environment of the site.
This access is controlled via the site management. In the settings dialogue of each individual site you can activate and deactivate access as well as set a password to protect access.

6.3. Activating console via SSH
It is possible to activate access to the console of the device via the network, enabling you to view and adjust the basic configuration of the device even without direct access to the device.
You can enable access via the configuration dialogue of the console. To do this, select the menu item Activate console via SSH.

When you activate this option, you will be asked to enter a password. You must enter this password when connecting as the setup user via SSH.
Access will be automatically enabled directly after confirming this dialogue.
You can now connect to the device as the setup user using an SSH client (e.g. PuTTY).
You can check whether access is currently enabled by looking at the Access box of the console’s status screen.
6.4. Activating root access via SSH
It is possible to activate access to the device as the root system user. After the device has been initialised, however, this access is deactivated. Once it is activated, you can log onto the device as the root user via SSH.

Commands you execute on the device as root can cause lasting alteration or damage, not only to your data, but also to the delivered system. The manufacturer shall accept no liability for alterations you make in this way. Only activate and use the root user if you are sure what you are doing and only for diagnostic purposes.
You can enable access via the configuration dialogue of the console. To do this, select the menu item Root access via SSH.

Then set the option to enable.

When you activate this option, you will be asked to enter a password. You must enter this password when connecting as the root user via SSH.
Access will be automatically enabled directly after confirming this dialogue.

You can now connect to the device as the root user using an SSH client (e.g. PuTTY).
You can check whether access is currently enabled by looking at the Access box of the console’s status screen.
7. Protecting the appliance-GUI via TLS
7.1. Setting up TLS access
By default the web interface of your device is accessed via HTTP in plain text. You can protect this access via HTTPS (TLS), so that the data is transferred in encrypted form.
You can open the configuration by pressing the Web access type button in the device settings.
7.2. Installing a certificate
In order to encrypt data traffic, the device first needs a certificate and a private key. There are several ways to install these:
Create a new certificate and have it signed by a certification authority by sending a certificate signing request (CSR).
Upload an existing private key and certificate.
Create a new certificate and sign it yourself.
You can choose one of the options above that fits your requirements and possibilities. Certificates signed by certification authorities generally have the advantage that clients can automatically verify the authenticity of the host (device) at the time of access. This is normally the case with official certification authorities.
If a user accesses the web interface via HTTPS and the certificate is either self-signed or signed by a certification authority not trusted by the user, this will cause a warning to appear in the user’s web browser first.
Creating a new certificate and having it signed
To create a new certificate, select the option New certificate. In the dialogue box that follows, you now enter device and operator information, which is then stored on the certificate and can be used by both the certification authority and clients later on to verify the certificate.

Once you have confirmed the dialogue box with Save, you can download the certificate signing request (CSR) file from the web access page. You must provide this file to your certification authority. You will then receive a signed certificate from your certification authority and, where necessary, a certificate chain (often consisting of intermediate and/or root certificates). You will usually receive these in the form of .pem or .crt files or directly in PEM-encoded text form.

You can now transfer the signed certificate to the device via the Upload certificate dialogue. If you have received a certificate chain you can likewise upload it via this dialogue.
Once you have confirmed the dialogue with Upload you can continue configuring the types of access.
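If you are unsure whether a file received from your certification authority contains only the signed certificate or also a certificate chain, you can count the PEM blocks in it before uploading. The sketch below merely splits PEM-encoded text into its individual certificate blocks. It does not validate the certificates, and the function name is hypothetical.

```python
import re

# A PEM certificate is the text between (and including) the BEGIN and END marker lines.
PEM_CERTIFICATE = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(text: str) -> list[str]:
    """Return every PEM certificate block found in the text, in file order."""
    return PEM_CERTIFICATE.findall(text)
```

A result with one entry indicates a single certificate; several entries indicate that the file also contains intermediate and/or root certificates.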
Creating a new certificate and signing it yourself
To create a new certificate select the option New certificate. In the dialogue box that follows you now enter device and operator information, which is then stored on the certificate, and which can later be used by clients to verify the certificate.

In the last section Signing method you now select Create a self-signed certificate. After that you can specify the maximum validity period of the certificate.
Once this validity period has expired you must generate a new certificate. This should be done in good time before the expiration so that there are no problems accessing your device.
Once you have confirmed the dialogue with Save you can continue configuring the types of access.
Uploading an existing certificate
If you have an existing certificate along with a private key and wish to use this to protect HTTPS traffic, you can transfer these files to your device via the Upload certificate dialogue.
Once you have confirmed the dialogue with Upload you can continue configuring the types of access.
7.3. Configuring access types
Once you have installed a certificate you can now configure the access types according to your requirements.
If you wish to protect access to your device via HTTPS you are recommended to select the HTTPS enforced (incl. redirect from HTTP to HTTPS) option. The device will only respond via HTTPS, but will redirect all incoming HTTP requests to HTTPS. This means that users who inadvertently access the web interface via HTTP, either directly or via bookmarks, will automatically be redirected to HTTPS.
If it is very important that not a single request goes over the network in plain text, you can select the option HTTPS only. This setting will cause users accessing via HTTP to receive an error message.
You can also enable HTTP and HTTPS at the same time. However, this setting is only recommended in exceptional cases, for migration purposes or for testing.
If you ever want to deactivate HTTPS, you can do this by selecting the HTTP only option.
7.4. Displaying current configuration/certificates
On the access type configuration page you can see the types of access currently active as well as information regarding the current certificate.

8. Device control
8.1. Restarting / Shutting down
You can restart or shut down the device over both the web interface and the console.
In the web interface you will find the menu items Reboot device and Shutdown device under the point Control device in the main menu. The device will execute the action immediately after the command has been selected.

In the console you can open the device control menu by pressing F2.


You should only shut down your rail2 appliance if you have physical access to the system, since you can only restart the device by disconnecting and restoring the power.
8.2. Restoring the factory configuration
You can reset your device to its factory settings. This means that any changes you have made to the device (e.g. your device settings, monitoring configuration or recorded statistics and logs) will be deleted. When resetting the settings the firmware version currently installed will be retained – the firmware installed with the device as delivered will not be restored.
You can perform this action on the console. To do this, press the F2 key on the status screen and select Factory Reset in the dialogue box that follows. Confirm the next dialogue box by clicking on yes. Your data will now be deleted from the device and the device then restarted immediately. The device will now start with a fresh configuration.
9. Backup
9.1. Basics
In order to preserve your monitoring data in case of a hardware failure or similar destruction, a backup of your data can be configured via your appliance’s web user interface.
To be certain that the data really is safe, it must be saved to another device – a file server for example. To do this, first use the mount management to configure the network share to be used for the backup. This share is then defined as the target when configuring the data backup. Once this is completed, you can create a backup job that saves a backup of your system to the network share at predefined intervals.
The full backup includes all of the configurations defined on the system, installed files, and likewise your monitoring sites.
The backup is executed online, i.e. during active operations. However, this is only fully possible if all monitoring sites on the appliance use Checkmk 1.2.8p6, 1.4.0i1 or a daily build from 22.07.2016 or newer. Active sites using older versions will be stopped before the backup and restarted after it.
9.2. Automatic backup
To set up an automatic data backup, configure one or more backup jobs. A backup data set must be created on the target system for each backup job. When each new backup is completed, the previous backup will be deleted – meaning that on the target system double the storage allocation will be temporarily required.
The backup does not manage multiple generations. If you need to retain more copies over an extended time frame, you will need to create these yourself.
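One way of keeping several generations yourself is a small script on the backup target, for example on the file server, that copies each completed backup directory to a separate archive location and removes the oldest copies. The following is only a sketch under these assumptions: it is run after a backup job has finished, it relies on the -complete suffix described in the section on the backup format, and the function name, the timestamp naming and the default of seven generations are all illustrative.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def keep_generation(
    backup_dir: Path,
    archive_root: Path,
    keep: int = 7,
    now: Optional[datetime] = None,
) -> Path:
    """Copy a completed backup into archive_root and keep only the newest copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if not backup_dir.name.endswith("-complete"):
        raise ValueError(f"{backup_dir.name} is not marked as complete")

    # The timestamp makes each generation unique and sortable by name.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    prefix = backup_dir.name + "."
    destination = archive_root / (prefix + stamp)
    shutil.copytree(backup_dir, destination)

    # Remove everything except the newest `keep` generations of this backup.
    generations = sorted(
        (path for path in archive_root.iterdir() if path.name.startswith(prefix)),
        key=lambda path: path.name,
    )
    for outdated in generations[:-keep]:
        shutil.rmtree(outdated)
    return destination
```

The archive location should lie outside the directory used as the backup target, so that the copies are not mistaken for backups managed by the appliance.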
9.3. Configuring the backup
With the help of the file system management, first configure your network share. In our example a network share is configured under the file path /mnt/auto/backup.
Next, select the Device backup item in the web interface’s main menu, and in the next menu open the Backup target. Then create a New backup target. The title and the ID can be chosen freely. Under the Target directory for backup item, configure the path of the mounted network share, in this case /mnt/auto/backup. The Is mountpoint option must be active if you are backing up to a network share – the backup then verifies that the share really is mounted.

Once the backup target has been created, return to the Device backup page and from there select New job. Here again you can choose an ID and a title. Next, select the newly-created backup target and define the desired periods for running the backup.

After saving you will see an entry for your new backup job on the Appliance backup page. The scheduled time for the next execution will be shown at the end of this line. As soon as the job is running or has completed, its status will be shown in this view. Here you can also start backups manually or, if needed, interrupt running ones.

To test your newly created job, click on the Play-icon. You will see in the table that your job is currently running. By clicking on the Log-icon you can display the job’s progress in the form of a log output.

As soon as the backup has completed this will also be shown in the table.

9.4. Backup format
Every backup job creates a directory on the backup target. This directory’s name conforms to the following schema:
Appliance backups:
Checkmk_Appliance-[HOSTNAME]-[LOCAL_JOB_ID]-[STATE]
Site backups:
Checkmk-[HOSTNAME]-[SITE]-[LOCAL_JOB_ID]-[STATE]
In the placeholder fields, any - (minus) characters are replaced by + so that they cannot be confused with the field separators.
During the backup the directory will be saved with the suffix -incomplete. Once completed, the directory is renamed and the suffix changed to -complete.
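The naming schema above, including the replacement of - by + inside the fields and the state suffix, can be reproduced as follows. This can be useful if you want to locate a particular backup directory on the backup target with a script. The function names are hypothetical; only the schema itself is taken from this section.

```python
def _escape(field: str) -> str:
    """Replace '-' by '+' so the field cannot be confused with the separators."""
    return field.replace("-", "+")


def appliance_backup_dir(hostname: str, job_id: str, complete: bool) -> str:
    """Directory name of an appliance backup."""
    state = "complete" if complete else "incomplete"
    return f"Checkmk_Appliance-{_escape(hostname)}-{_escape(job_id)}-{state}"


def site_backup_dir(hostname: str, site: str, job_id: str, complete: bool) -> str:
    """Directory name of a site backup."""
    state = "complete" if complete else "incomplete"
    return f"Checkmk-{_escape(hostname)}-{_escape(site)}-{_escape(job_id)}-{state}"
```

Because the fields never contain a minus character after escaping, such a name can be split unambiguously at the - separators again.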
A file named mkbackup.info containing the meta information pertaining to the backup is saved in the directory. Alongside this file a number of archives are saved to the directory.
The archive named system contains the appliance’s configuration, and system-data contains the data file system’s data, excluding that of the monitoring sites. The monitoring sites are saved in separate archives that use the site-[SITENAME] naming schema.
Depending on the backup’s mode, these archives are saved with the .tar file extension for uncompressed and unencrypted, .tar.gz for compressed but unencrypted, and .tar.gz.enc for compressed and encrypted archives.
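As an illustration, the archive names that can be expected in a backup directory may be derived as follows. This sketch covers only the three combinations named above; the text does not state an extension for uncompressed but encrypted archives, so that case is rejected. The function names are hypothetical.

```python
from typing import List


def archive_extension(compressed: bool, encrypted: bool) -> str:
    """Map the backup mode to the file extension described in the text."""
    if not compressed and not encrypted:
        return ".tar"
    if compressed and not encrypted:
        return ".tar.gz"
    if compressed and encrypted:
        return ".tar.gz.enc"
    # Uncompressed + encrypted is not described in the manual text.
    raise ValueError("no extension documented for uncompressed, encrypted archives")


def expected_archives(site_names: List[str], compressed: bool, encrypted: bool) -> List[str]:
    """List the archive file names for an appliance backup with the given sites."""
    ext = archive_extension(compressed, encrypted)
    stems = ["system", "system-data"] + [f"site-{name}" for name in site_names]
    return [stem + ext for stem in stems]
```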
9.5. Encryption
If you want to encrypt your backup you can configure this directly from the web user interface. Your backed-up data will then be completely encrypted before being transferred to the backup target. The encryption uses a predefined encryption key. This key is protected by a password that you define when creating the key. Keep the key and its password in a safe place, as the backed-up data can only be retrieved with both.
To this end, open the Device backup page and from there select the Backup keys page. Here you can create a new encryption key. When entering the password be sure to use a sufficiently complex character string – the longer and more complex your password, the harder it is for an attacker to decrypt your key and thus your backup.

Once you have created your key, download it and retain it in a secure location.

An encrypted backup can only be restored with the encryption key and its corresponding password.
Now, on the Device backup page, edit the backup job that is to create the encrypted backups. There, activate the Encryption item and select the newly created encryption key.

Once you have confirmed the dialogue, the next backup will be automatically encrypted.
9.6. Compression
It is possible to compress the data during the copy procedure. This can be useful if you need to save bandwidth or if space on the target system is limited.
Be aware, however, that compression requires noticeably more CPU time, so the backup procedure will take longer. As a rule it is advisable not to activate compression.

Uncompressed backups are only supported as of Checkmk version 1.2.8p5. If you run monitoring sites with older versions, you must activate compression for the complete backup.
9.7. Recovery
Using the web user interface’s built-in functions you can only perform a complete restore. Restoring individual files via the web interface is not provided. This is nevertheless possible via the command line by manually unpacking the archives from the backup.
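A manual partial restore could look roughly like the following Python sketch. It assumes an unencrypted archive (.tar or .tar.gz) that a standard tar implementation can read; encrypted archives (.tar.gz.enc) would have to be decrypted first, which is not covered here. The function names, paths and the prefix are hypothetical examples.

```python
import os
import tarfile
from typing import List


def list_members(archive_path: str) -> List[str]:
    """Return the names of all entries in a .tar or .tar.gz archive."""
    # Mode "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(archive_path, mode="r:*") as archive:
        return archive.getnames()


def extract_subtree(archive_path: str, prefix: str, dest_dir: str) -> List[str]:
    """Extract only the entries at or below `prefix` into dest_dir."""
    prefix = prefix.rstrip("/")
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r:*") as archive:
        wanted = [
            member
            for member in archive.getmembers()
            if member.name == prefix or member.name.startswith(prefix + "/")
        ]
        # Use the safer "data" extraction filter where this Python version has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(path=dest_dir, members=wanted, **kwargs)
        return [member.name for member in wanted]
```

Extracting into a scratch directory first, rather than directly over live data, makes it possible to inspect the files before copying them back.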
If you wish to restore a complete backup on a currently running appliance, select the Restore item on the Device backup page, and on the next page select the backup target from where you want to source the backed-up data. Once the backup target has been selected a list of all of its available backups will be shown.

Next, click on the arrow beside the backup you wish to use. Once you have confirmed the security prompt, the restore will start.
While the restore is running you can view its progress by refreshing the Restore page that will be automatically displayed.

At the end of the restore the appliance will automatically restart – following this new start the restore will be complete.
Disaster recovery
If you need to completely restore an appliance, disaster recovery consists of the following steps:
Start with an appliance in the factory default configuration (a new, identical appliance, or an appliance that has been reset to the factory defaults).
Ensure that the firmware version matches that of the backup.
Configure the following minimum settings on the console:
Network settings.
Access to the web interface.
In the web interface, configure:
the backup source from which you wish to restore.
for an encrypted backup, upload the encryption key.
Now start the restore as described in the preceding chapter.
9.8. Monitoring
As of Checkmk version 1.4.0i1, the service discovery on the appliance finds an additional service for every configured backup job: Backup [JOB-ID]. This service reports potential problems with the backup and displays useful values such as size and duration.
9.9. Special features with clusters
The complete configuration of the backups, including the encryption keys, is synchronised between the cluster nodes. The cluster nodes run the backups separately, and likewise save separate directories for their backups on the backup target.
The active cluster node backs up the complete appliance including the data from the data file system and from the monitoring sites. The inactive cluster node saves only its local appliance configuration.
Thus, when restoring a backup, only an active cluster node’s backup can restore the monitoring sites.
10. Mounting network file systems
10.1. Introduction
If, for example, you wish to make a backup on a network share, you must first configure the required network file system.
NFS (version 3), Windows network shares (Samba or CIFS) and SSHFS (SFTP) are currently supported.
10.2. Mounting a network file system
In the web user interface’s main menu select the item Manage mounts and from here create a new file system. Enter an ID that will later be used in devices to identify the file system.

Next select whether and how the file system is to be mounted. We recommend automatic mounting on access and automatic unmounting when inactive.
Next configure the type of share to be mounted, and finally, depending on this, the necessary settings for mounting the share - for example the file server’s network address and the exported file path in the case of NFS.
Once saved the newly-configured file system and its current status can be viewed in the file system management. By clicking on the plug icon you can manually mount the file system to test that the configuration is correct.

If there are problems, you may find error messages in the system log.
11. Failover cluster
11.1. Basics
You can combine two Checkmk appliances into a failover cluster. All configurations and data are synchronized between the two devices. The devices that are connected as a cluster are also called nodes. One of the nodes in the cluster assumes the active role, i.e. performs the tasks of the cluster. Both nodes continuously exchange information about their status. As soon as the inactive node recognizes that the active node can no longer fulfill its tasks – due to a failure for example – the inactive node takes over the tasks and becomes the active node.
The failover cluster is there to increase the availability of your monitoring installation by protecting a device or individual components against hardware failures. The clustering is not a substitute for data backups.
The cluster ensures a shorter downtime in the following situations:
There are two servers: one (active) server performs tasks, such as Monitoring, and the other (inactive) server simply checks that the first server is fulfilling its tasks.
If the active server can no longer access the network, it cannot perform its tasks (for example, Monitoring).
The inactive server notices this and takes over the tasks automatically.
The active server becomes inactive and the inactive server is now active – thus swapping their roles.
The server that is now active and performing the monitoring has also taken over the resources.
If you carry out a firmware update you can update the nodes individually. While one node is being updated the other node will continue to perform the monitoring.
11.2. Prerequisites
In order to build a cluster you will first need two compatible Checkmk appliances. The following models can be clustered with one another:
2x Checkmk rack1
2x Checkmk rack4
2x Checkmk rail2
2x Checkmk virt1
1x Checkmk rack1 and 1x Checkmk virt1
In addition, the two devices must use compatible firmware, version 1.1.0 or later.
The devices must be wired with at least two mutually-independent network connections. It is recommended to use as direct a connection as possible between the devices, and to make a further connection over your LAN.
To increase the availability of network connections, you should, instead of using two connections via individual network connectors, create a bonding configuration that uses all four of the Checkmk rack1’s network connectors. Use the LAN1 and LAN2 interfaces for the connection to your network, and the LAN3 and LAN4 interfaces for the direct connection between the devices.
Virtual machines: If you want to set up a cluster with two Checkmk virt1 appliances under VirtualBox, for example for testing, you should do without the bonding configuration and its total of four network interfaces; bonding is unreliable under VirtualBox, if it works at all. Even if both VMs run on the same machine, so that there are no separate physical links, each VM still needs two virtual network interfaces so that a separate channel for synchronizing the data can be set up later. You can easily add these in the VirtualBox management interface of the virt1 machine.
So instead of setting up the bonding as shown below, simply activate the unused second network interface. Do not assign it to your normal LAN subnet (e.g. 192.168.178.0/24) but to a separate subnet (e.g. 192.168.100.0/24). For the actual clustering you then select your two individual interfaces instead of the bonding interfaces.
11.3. Migration of existing installations
Devices that were delivered and initialised with the firmware version 1.1.0 or higher can be clustered without migration.
Devices initialised with earlier firmware must first be updated to version 1.1.0 or higher. The device’s factory settings then need to be restored, preparing the device for clustering. Please note that, in order to prevent data loss during this procedure, you must first back up your data from the device and then restore it.
11.4. Configuration of the cluster
This guide assumes that you have already pre-configured both devices so that the web interface can be opened with a web browser.
Before actually setting up the cluster you must first prepare both devices. This mainly involves adapting the network configuration to fulfill clustering requirements (see prerequisites).
The following shows the configuration of a cluster with two Checkmk rack1 devices. The resulting cluster is shown in the diagram below.
The interface designations LAN1, LAN2 etc., used in the diagram correspond to the designations of the physical interfaces on the device. In the operating system, LAN1 corresponds to the device eth0, LAN2 to the device eth1 etc.

This configuration complies with the recommendations for the clustering of two Checkmk rack1s. You can of course use IP addresses suitable for your environment. Make sure however that the internal cluster network (bond1 in the diagram) uses a different IP network to the ‘external’ network (bond0 in the diagram).
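Whether two planned networks are really separate can be checked in advance with Python’s standard ipaddress module. The following is a minimal sketch; the function name networks_are_separate is hypothetical, and the subnets in the usage note are the example subnets used in this chapter.

```python
import ipaddress


def networks_are_separate(external_cidr: str, internal_cidr: str) -> bool:
    """Return True if the two IP networks do not overlap."""
    # strict=False also accepts interface addresses such as 192.168.178.100/24
    # and reduces them to their network (192.168.178.0/24).
    external = ipaddress.ip_network(external_cidr, strict=False)
    internal = ipaddress.ip_network(internal_cidr, strict=False)
    return not external.overlaps(internal)
```

For example, networks_are_separate("192.168.178.0/24", "192.168.100.0/24") is true, whereas a planned internal network of 192.168.178.128/25 would overlap the external one and be rejected.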
Network configuration
Open the web interface of the first node, select Device settings and then Network settings at the top. You will now be on the network settings configuration page. There are two modes available to you here. Simple Mode, which only lets you configure your device’s LAN1 interface, is activated by default.

The Advanced mode is required for clustering. In order to activate this mode click on the Advanced mode button at the top and confirm the security prompt.
On the following page you will see all of the network interfaces available in the device. Only the interface eth0 (corresponding to LAN1, shown as enp0s17 in the screenshot) will currently have a configuration, which was applied by Simple Mode.

Now create the first bonding interface bond0 by clicking on Create Bonding. In the dialogue that follows, enter all data as shown in the diagram below and confirm the dialogue with Save.

Now create the second bonding interface bond1 with the appropriate configuration.

After you have created the two bonding interfaces, in the network configuration dialogue you will be able to review all of the settings for the network interfaces …

… and likewise the bondings:

Once you have successfully completed all configuration steps, make the settings effective by clicking on Activate Changes. The new network settings will then be loaded. After a few seconds the network interface configuration will look like this, with the status OK for the interfaces:

And the bonding configuration will look like this:

Now, with the appropriate settings, repeat the configuration of network settings on your second device.
Host names
Devices to be connected in a cluster must have different host names. You can specify these now in the device settings.
In our example, we configure node1 as the host name on the first device and node2 on the second device.
Connecting the cluster
Having completed preparations you can now continue setting up the cluster. To do this open the Clustering module in the main menu of the first device (here node1) in the web interface and click on Create Cluster.
Now enter the appropriate configuration in the cluster creation dialogue and confirm the dialogue with Save. If you require more information about this dialogue, click on the icon beside the Checkmk logo in the top right-hand corner. Context help will then appear in the dialogue explaining the individual options.

On the following page you can connect the two devices to form a cluster. To do this you need to enter the password of the web interface of the second device. This password is used once to establish the connection between the two devices. Then confirm the security prompt if you are sure that you want to overwrite the data of the target device with the IP address displayed.

Once this connection is successful, cluster setup is commenced. You can have the current status displayed on the cluster page. As soon as the cluster has been successfully built, the synchronisation of monitoring data will start from the first to the second node. While this synchronisation is still taking place, all resources – including any monitoring sites you may have – will be started on the first node.

From now on you can use the cluster IP address (here 192.168.178.110) to access the resources of the cluster (e.g. your monitoring sites), regardless of which node currently holds the resources.
11.5. The state of the cluster
When the first synchronisation is complete, your cluster will be fully operational. You can view the state at any time on the cluster page.

Using the status screen on the console you can also view the current state of the cluster in a summarised form in the Cluster box. The role of the respective node is shown after the current status with (M) for the main host and (S) for the subordinate host.

11.6. Special cases in the cluster
Access to resources
All requests to the monitoring sites (e.g. web interface access) as well as incoming messages (e.g. SNMP traps or syslog messages to the event console or requests to Livestatus) should normally always be sent via the cluster IP address.
Only in exceptional cases (e.g. diagnostics or updates of a particular node) should you need to access the individual nodes directly.
Device settings
The settings (e.g. time synchronisation or name resolution settings) that have been made independently on the individual devices until now are synchronised between the two nodes in the cluster.
However, you can only execute these settings on the node that is active at the time. The settings are locked on the inactive node.
There are some device-specific settings, (e.g. those of the management interface of the Checkmk rack1) which you can adapt to the individual devices at any time.
IP addresses or host names of the nodes
To be able to edit the IP configuration of the individual nodes, you must first disable the connection between the nodes. To do this click on Disconnect cluster on the cluster page. You can then adapt the desired settings via the web interface of the individual nodes.
Once you have made the adjustments you must now select Reconnect cluster on the cluster page. If the nodes can be successfully reconnected the cluster will resume operation after a few minutes. You can see the status on the cluster page.
Administering Checkmk versions and monitoring sites
The monitoring sites and Checkmk versions are also synchronised between the two nodes. You can only modify these in the web interface of the active node.
If you access the cluster IP address directly for this purpose, you will always be referred to the device on which you can configure these things.
11.7. Administrative tasks
Firmware updates in the cluster
The firmware version of a device is not synchronised in cluster operation. The update is thus carried out for each node. You have the advantage however that one node can continue performing the monitoring while the other node is updated.
When updating to a compatible firmware version, you should always proceed as follows:
First open the Clustering module in the web interface of the node to be updated.
Now click on the heart symbol in the column of this node and confirm the security prompt that follows. This will put the node into maintenance state.
Nodes that are in maintenance state release all resources currently active on the node, upon which the other node takes control of them.
While a node is in maintenance state, the cluster is not failsafe – so if the active node is now switched off, the inactive node in maintenance state will not take control of the resources. If you now additionally put the second node into maintenance state, all resources will be shut down. These will only be reactivated when a node is taken out of its maintenance state. You must always remove the maintenance state again manually.
If the cluster page shows the following, the node is in maintenance state.

You can now perform the firmware update on this node in the same way as on a standalone device.
After you have successfully performed the firmware update, open the cluster page once more and remove the maintenance state of the updated device. The device will then automatically merge into cluster operation, upon which the cluster again becomes fully functional.

It is recommended to run the same firmware version on both nodes. You should therefore next repeat the same procedure for the other node.
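The maintenance-state behaviour described in this procedure can be summarised in a small model. The following Python sketch is only a simplified illustration of the rules stated above, not the appliance’s actual cluster logic; the Node class and the resource_holder function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    online: bool = True
    maintenance: bool = False

    @property
    def eligible(self) -> bool:
        # A node may hold resources only if it is running and not in maintenance.
        return self.online and not self.maintenance


def resource_holder(current: Optional[str], nodes: List[Node]) -> Optional[str]:
    """Return the node that should hold the resources, or None if they are shut down."""
    by_name = {node.name: node for node in nodes}
    holder = by_name.get(current) if current is not None else None
    if holder is not None and holder.eligible:
        return holder.name  # no reason to move the resources
    for node in nodes:
        if node.eligible:
            return node.name  # the other node takes control
    return None  # no eligible node: all resources are shut down
```

In this model, putting node1 into maintenance moves the resources to node2. If node2 is then switched off, the function returns None, which reflects that the cluster is not failsafe while a node is in maintenance state.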
11.8. Disbanding clusters
It is possible to remove the nodes from a cluster and continue running them separately. When doing so you can continue using the synchronised configuration on both devices or, for example, reset one of the devices to factory settings and reconfigure it.
You can remove one or both nodes from the cluster during operation. If you wish to use both nodes, you must ensure that the data synchronisation is in good working order beforehand. You can verify this on the cluster page.
In order to disband a cluster, click on Disband Cluster on the cluster page of the web interface. Read the text of the confirmation prompt that follows. Depending on the situation, this text explains the state each device will be in following the disconnection.

The disconnection of the devices must be carried out on both nodes separately, so that both devices can be run separately in future.
If you only wish to use one of the devices in future, disconnect the cluster on the device you intend to continue using and then restore the factory settings on the other device.
Once you have disconnected a node from the cluster the monitoring sites will not be started automatically. If you wish to start the monitoring sites, you will need to do so via the web interface.
Exchanging a device
If the hard drives of the old device are in good order, you can take them from the old device and insert them into the new device, wire the new device in exactly the same way as the old device was wired, and then switch it on. After starting, the new device will merge into the cluster in the same way as the old device.
If you want to completely replace an old device with a new one, you should proceed in the same way as when disbanding the cluster completely (see previous chapter). To do this select one of the previous devices, disconnect this device from the cluster and create a new cluster with this device and the new device.
11.9. Diagnostics and troubleshooting
Logging
Cluster administration is largely automatic: processes on the nodes decide which resources are to be started and stopped on which device. This behaviour is recorded in detailed log entries. You can access these entries from the cluster page by pressing the Cluster Log button.
Please note that these entries – just like the other system messages – are lost when restarting the device. If you would like to keep the messages for longer you can download the current log file over your browser or set up a permanent forwarding of log messages to a syslog server.
12. SMS notifications
12.1. Hardware
It is possible to attach a GSM modem to the device in order to have SMS notifications sent over it by Checkmk (in the event of critical problems for example).
At the moment it is not possible to order a UMTS/GSM modem together with your appliance or later as an accessory. However, there are several modems, such as the MTD-H5-2.0, that are compatible with the appliance.
12.2. Starting up the modem
In order to put the modem into operation you must insert a functioning SIM card, attach the modem to a free USB connector on your appliance using the enclosed USB cable, and connect the modem to the mains using the enclosed power adapter.
As soon as this has been done the device will automatically detect the modem and set it up. Open the device’s web interface and select the Manage SMS module. The current state of the modem as well as the connection with the mobile phone network will be displayed on this page.

If you need to enter a PIN to use your SIM card, you can specify this PIN under SMS settings.

12.3. Diagnostics
If sent messages do not reach you, you can view all sent and unsent messages, as well as messages awaiting sending, on the Manage SMS page. The entries in these lists are kept for a maximum of 30 days and then automatically deleted.
It is possible, via the menu item Send test SMS, to send a test SMS to a number of your choice.
The telephone number must be entered without leading zeros and without a leading plus sign, e.g. 491512345678 for a mobile phone number in Germany.
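If you keep numbers in the common international notation, they can be converted to the required format with a small helper. The following Python sketch is an illustration only; the function name normalize_sms_number is hypothetical. Note that it does not add a country code, so a purely national number such as 0151 2345678 would not yield a usable result.

```python
def normalize_sms_number(raw: str) -> str:
    """Strip separators, a leading plus sign and leading zeros from a phone number."""
    # Remove common visual separators first.
    cleaned = "".join(ch for ch in raw if ch not in " -/()")
    # "+49..." and "0049..." both become "49...".
    cleaned = cleaned.lstrip("+").lstrip("0")
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"not a valid phone number: {raw!r}")
    return cleaned
```

With this helper, both "+49 151 2345678" and "0049 151 2345678" become 491512345678.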

You will find further information on possible SMS sending errors in the SMS Log.
13. Administering RAID on the racks
13.1. The RAID system
Your rack has two hard drive bays on the front. These are marked with numbers 1 and 2. The hard disks installed here are interconnected in a RAID-1 array (mirror) so that your data is stored redundantly on both hard disks. If one of the hard disks fails the data is still available on the second hard disk.
13.2. Administration in the web interface
You can view the state of the RAID in your device’s web interface. To do this select the item RAID-Setup in the main menu of the web interface. This screen also gives you the option to repair the RAID if necessary.

13.3. Exchanging a defective hard drive
If a hard drive is detected as being defective, it will be marked as defective in the web interface. On the device itself, depending on the nature of the error, this will be indicated by a blue flashing LED at the hard drive bay.

Moving the small lever on the left-hand side of the bay will unlock the fixing mechanism, enabling you to pull the frame out of the housing together with the hard drive. You can now loosen the screws on the underside of the frame and remove the defective hard drive. Now mount the new hard drive into the frame and push the frame back into the free bay of the device.
If the device is switched on while you are exchanging the hard drive, the RAID rebuild will start automatically. You can view this procedure’s progress in the web interface.

Failsafe operation is only restored once the RAID has been completely repaired.
The hard drive must be at least the same size as the RAID itself. You can verify this in the RAID status view.
13.4. Both hard drives defective
If the device detects that both hard drives are defective or have been removed from the device a restart is automatically triggered.
14. Management interface in the rack
Your rack has a built-in management interface that allows network access to the device even when it is not powered on. You can use the web interface of this management interface, for example, to control the device if it is not switched on or no longer accessible, and to remotely control the local console.
If you would like to use the management interface, you must first connect the dedicated IPMI LAN connector to your network.

For security reasons we recommend connecting the IPMI LAN to a dedicated management network where possible.
The management interface is delivered deactivated. You can activate and configure it via the Management Interface setting in the device settings.

You must assign a separate IP address for the management interface and specify dedicated access data for the access to the management interface.
Once you have saved these settings you can open the management interface’s IP address with your web browser and log in there using the access data you just specified.
15. Diagnostics
15.1. Logs
Despite careful tests, it cannot be altogether ruled out that unexpected errors may occur, which are difficult to diagnose without looking at the operating system.
One option is to have the log entries that are generated on the system sent via syslog to a syslog server. However the log entries of the individual monitoring sites are not processed via syslog, meaning they are not forwarded and can only be viewed on the device.
In order to make diagnostics on the device easier there is a view that displays the device’s various log files. You can go to this view by clicking on the Log Files menu item in the web interface’s main menu.

You can select the device’s logs and view their current content here.

The system log is reinitialized each time the device is started up. If you would like to keep the log entries you must send them to a syslog server.
It is also possible to view the system log on the local console. The latest entries from the system log are displayed on the second terminal. You can access this terminal via the key combination CTRL+ALT+F2. All kernel messages are displayed on the third terminal. In the case of hardware problems, you will find the relevant messages here. This terminal can be accessed via the key combination CTRL+ALT+F3. The key combination CTRL+ALT+F1 will take you back to the status screen.
15.2. Available Memory
The device’s system memory, minus the amount needed by the system processes of the Checkmk appliance, is available to your monitoring sites.
To provide a stable system platform a fixed amount of memory is reserved for the mandatory system processes. The exact amount of reserved memory depends on your device configuration:
Standalone device (no cluster configuration): 100 MB
Clustered: 300 MB
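The resulting calculation is simple arithmetic and can be sketched as follows. The reserved amounts are the values listed above; the function name available_memory_mb and the 8192 MB in the usage note are illustrative assumptions.

```python
# Memory reserved for the mandatory system processes, in MB.
RESERVED_MB = {"standalone": 100, "clustered": 300}


def available_memory_mb(total_mb: int, clustered: bool) -> int:
    """Return the memory left for the monitoring sites, in MB."""
    reserved = RESERVED_MB["clustered" if clustered else "standalone"]
    if total_mb < reserved:
        raise ValueError("total memory is smaller than the reserved amount")
    return total_mb - reserved
```

A device with 8192 MB of system memory would thus leave 8092 MB for the monitoring sites as a standalone device and 7892 MB as a cluster node.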
If you want to know exactly how much memory is available to your monitoring sites and how much is currently being used, you can monitor your device using Checkmk. After a service discovery the host automatically monitors a service User_Memory which shows you the current and historical values.
If your monitoring sites try to consume more memory than is available, one of the processes of the monitoring sites is automatically killed. This is done by standard mechanisms in the Linux kernel.
16. Service and support
16.1. Manual
If you encounter any problems during start-up or operation please consult this manual first.
16.2. Internet
You can get up-to-date support information from our website. You will find the latest version of the documentation here as well as general information which is regularly updated and more detailed than this manual.
16.3. Firmware
You will find the latest firmware versions on our website. You can access this firmware using the access data for your current support contract.
16.4. Hardware support
In the event of hardware failure please contact us by email at cma-support@checkmk.com, or call us on +49 89 99 82 097 - 20. The problem will be handled by the distributor directly and in accordance with the maintenance agreed upon.
16.5. Software support
In the case of a software fault – whether firmware or Checkmk monitoring software – please contact us via your company’s own support address. Support will be provided based on the agreed support contract.