1. Basic configuration

1.1. Initialisation during the first start

At this point you should have mounted the appliance in a rack, or installed and started it in a virtual machine, either VirtualBox or VMware ESXi.

During the first start there will be a message for specifying the desired language:

[Screenshot: language selection dialogue at first start]

The selected language will be set for the whole device. Afterwards, a message appears prompting you to initialise the data volume:

[Screenshot: dialogue for initialising the data volume]

Confirm this dialogue and wait for device startup to be resumed and to complete. The status view will now be displayed on the local console:

[Screenshot: status view on the local console]

This view shows you general status information and the most important configuration options for your device.

1.2. Network and access configuration via the console

From the status view you can get to the configuration menu by pressing the F1 key:

[Screenshot: configuration menu on the console]

To put the device into operation, you now need to set up the network configuration and specify the device password.

Network configuration

First set up the network via the Network Configuration item. You will be asked for the IP address, the netmask and the optional default gateway, one after the other.

In most cases the device will need to also access network devices outside of its own network segment. The default gateway must also be configured for this purpose.

Once these values have been entered, the configuration will be activated, meaning the device is immediately reachable via the network at the entered IP address. One way of testing this is to send a ping from another device in the network.

Enabling the web interface

A large part of the device’s configuration is carried out via the web interface. Access to this web interface is protected by a password – the device password – which you will first need to define. The factory settings do not include a device password, which means that you cannot yet access the web interface.

In the configuration menu, select Device Password in order to specify the password. The password must be at least 8 characters long and must contain at least one lower case letter, one upper case letter and one digit.
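The password rules above can be checked before typing the password on the console. The following is only a minimal sketch of the documented rules; the function name is hypothetical, and it assumes that "letter" and "digit" refer to ASCII characters, which the text does not state.

```python
import re


def is_valid_device_password(password: str) -> bool:
    """Check the documented rules: at least 8 characters, with at least
    one lower case letter, one upper case letter and one digit.

    Assumption: only ASCII letters and digits count towards the rules.
    """
    return (
        len(password) >= 8
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[0-9]", password) is not None
    )
```

For example, is_valid_device_password("Monitor1ng") is accepted by this sketch, while "monitoring" is rejected because it contains neither an upper case letter nor a digit.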

Then select Web Configuration from the configuration menu to enable the web interface.

Once you have completed these steps you will see that the console’s status view will have changed:

[Screenshot: console status view after the basic configuration]

In the Device Infos box you will see the configured IP address and in the Access box Web Configuration: on. If you have already connected the device correctly to your network, you will also see in the Status box that the network connection is active: LAN: UP.

Protecting access to the console

When you started the appliance, you may have noticed that there was no password prompt. Anyone who has direct access to the rack or the virtualisation solution’s management interface is able to change the basic configuration via the console.

Therefore you should activate password protection: in the configuration menu via the Console Login menu item. If protection is activated, the device password is requested before the status view is displayed and settings can be changed.

You will then see the Console Login: on entry in the status view in the Access box.

[Screenshot: console status view with Console Login: on]

1.3. Basic settings on the web interface

Once you have enabled access to the web interface through the previous configuration, you can now access the web interface via a web browser on a computer connected to the device via the network. To do this enter the appliance URL in the address bar of the browser, in this case http://192.168.178.60/. Here you can see the web interface’s login screen:

[Screenshot: login screen of the web interface]

When you have logged in with the previously-set device password, the main menu will open. From here you can access all of the features in the web interface.

[Screenshot: main menu of the web interface]

Select Device Settings to view the most important device settings and to change these if necessary:

[Screenshot: Device Settings page]

Clicking on a parameter name takes you to the page for editing that setting.

If you have DNS servers available in your environment you should now configure one or more of these so that the resolution of host names can be used. If you have one or more NTP servers for time synchronisation available in your environment, enter these as IP addresses or host names under NTP Servers.

If emails are to be sent from your device – such as notifications in case of problems – you must configure the Outgoing Emails option. To do this enter the SMTP relay server responsible for this device and any access data required. All emails generated on the device will be sent to this server. Under this setting you can also configure all emails generated by the device’s operating system (e.g. in the case of critical errors) to be sent to a particular email address.

[Screenshot: Device Settings page with the configured values]

This completes the basic device configuration and you can proceed with the installation of the Checkmk software and the setup of the first monitoring site.

2. Administering Checkmk software versions

Starting with appliance version 1.4.14, Checkmk software is no longer pre-installed on the appliance.

The Checkmk software for installation on the appliance is available to you as a CMA file. Download the CMA file either from the customer portal for the Standard Edition and Managed Services Edition, or from the download page for the Free Edition. You will find the CMA file after selecting the appropriate Checkmk edition and version, with the appliance as platform.

After downloading the CMA file, select Check_MK versions from the main menu. On the following page, select the CMA file from your hard disk using the file selection dialogue, and confirm your selection by clicking on Upload & Install.

The Checkmk software will be uploaded to the device. Depending on the network connection between your computer and the device this may take a few minutes. Once uploading is complete you will see the new version in the table of installed versions:

[Screenshot: table of installed Checkmk versions after the first upload]

It is possible to install several Checkmk versions on the device at the same time. This allows several sites to be run using differing versions, and individual sites to be changed to newer or older versions independently of one another. For example, you can install a new version and first try it out in a test site, and then update your production site once testing has been successful.

You load and install another Checkmk software version in the same way as the first one. The result will look like this:

[Screenshot: table of installed Checkmk versions after the second upload]

If a software version is not being used by any site you have the option to delete it with the recycle bin icon.

3. Administering monitoring sites

3.1. Creating a site

In the main menu of the web interface, click on Site Management. On this page you have access to all of the monitoring sites on this device. You can configure, update and delete sites, as well as create new ones.

The first time you open the page it will be empty. To create your first site, click on the Create New Site button. On the following page you can specify the initial site configuration.

[Screenshot: page for creating a new site]

Start by entering a site ID which serves to uniquely identify the site. The ID may only contain letters, numbers, hyphens (-) and underscores (_), must start with a letter or underscore and may be a maximum of 16 characters in length.
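The site ID rules above map directly onto a regular expression. This is a minimal sketch, assuming that "letters" and "numbers" mean ASCII characters; the function and constant names are hypothetical and not part of the appliance.

```python
import re

# First character: letter or underscore.
# Up to 15 further characters: letters, digits, hyphens or underscores.
# Together this gives the documented maximum length of 16 characters.
SITE_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_-]{0,15}")


def is_valid_site_id(site_id: str) -> bool:
    """Return True if site_id satisfies the documented site ID rules."""
    return SITE_ID_PATTERN.fullmatch(site_id) is not None
```

With this sketch, "mysite" and "_test-1" are accepted, while "1site" (starts with a digit) and "my site" (contains a space) are rejected.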

Now select the Checkmk version with which to create the site. You will be offered all versions that are installed in the administration of Checkmk software versions.

Finally, you define the user name of the Checkmk administrator with password. You can leave all other settings as they are for the time being. If necessary, you can edit these later via the site editing page.

Click Create Site to create the site. This may take a few seconds. Once the site has been created and started you will be taken to the list of all sites:

[Screenshot: list of sites]

The list is short and currently only shows the site just created with the ID mysite and its status, here running. With the button on the far right in the Control column you can stop or start the site. On the far left in the Actions column, symbols are shown for the possible actions you can apply to the site, from left to right: Edit, Update, Rename, Clone, Delete and Login.

You can now log in to the site that has been started, either by clicking on the site ID or by calling up the URL of the site in the address bar of your web browser, in our example: http://192.168.178.60/mysite. In the site’s login dialogue, enter the access data you specified when creating the site.

Once you have logged in you can set up Checkmk in the usual manner — the first steps are described in the article on Getting started with monitoring.

The snapin Checkmk Appliance is available in all sites and for all administrators. You will find it in the sidebar:

[Screenshot: Checkmk Appliance snapin in the sidebar]

The entries in this snapin will take you from your sites directly to the appliance’s web interface.

3.2. Updating a site

Updating a site means moving it to a new Checkmk software version. First install the desired new version as described in the chapter on administering Checkmk software versions.

Then list the sites on the appliance’s web interface (Main Menu > Site Management):

[Screenshot: list of sites]

Make sure that the site is not running, i.e. if the Status is currently running, stop the site (Control > Stop). Then, under Actions, click on the Update icon.

The following page lists the possible target versions for the update:

[Screenshot: selection of the target version for the site update]

Select the Target Check_MK version and then click on Update now. After a short time, the update messages are displayed:

[Screenshot: messages of the site update]

With the Back button you can return to the list of sites in which the new Checkmk version is now shown in the Version column. You can now restart the site.

Note: The update of a site in the appliance follows the same principle as the update on a Linux server. In the event of problems, error messages or conflicts, you can obtain detailed information on the update process in the article Updates and Upgrades.

3.3. Migrating a site

It is commonly required to migrate existing sites from other Linux systems to a Checkmk appliance. The Checkmk appliance offers a dedicated page with which you can carry out this migration.

The following requirements need to be met:

  • You need to have a network connection between the source system and your device.

  • The Checkmk version of the source site needs to be installed on your device (architecture changes from 32-bit to 64-bit are possible).

  • The source site needs to be stopped during the migration.

Open the main menu of the web interface and click on Site Management. Then click on the Migrate Site button, which will take you to the below menu:

[Screenshot: Migrate Site page]

On this page under Source Host you first need to configure the host address (host name, DNS name or IP address) of the source system which you want to migrate the site from. Next under Source site ID you need to enter the site ID of the site you want to migrate.

The migration of the site is done via SSH. To get access to the source site, you need to provide the credentials of a user which is able to connect to the source system and access all of the source site’s files. You can use the root user of the source system or, if you have configured a password for the site user, you can use the site user credentials.

Optionally you can choose to let the migration create the site with a new site ID on your device, or carry the original ID unchanged over to the new device.

You additionally have the option to skip the carrying-over of performance data (measurements, graphs) and historical data during the migration. This can be useful if you don’t need an exact copy of the source site and only want to duplicate it – e.g. for testing purposes.

After you have filled in the parameters and confirmed with Start, the progress of the migration will be displayed:

[Screenshot: progress of the site migration]

Once the migration has completed you can finalise the migration by clicking on Complete. You will be returned to the site management where you can start and manage the newly-imported site in the usual way.

[Screenshot: completed site migration]

4. Updating the firmware

You can update your device’s software, i.e. the firmware of the appliance, to a newer version during operation, or switch back to an older version. Before updating the firmware, you must obtain the new firmware.

You can download the appliance firmware as a CFW file from the customer portal for the Standard Edition and Managed Services Edition, or from the download page for the Free Edition. The CFW file can be found in the download area under the product Checkmk Appliance.

Note: Be sure to select firmware that matches the edition installed on your appliance. However, if you want to upgrade from the Free Edition to a full version of the Enterprise Editions, this is also possible by selecting the firmware for the full version and upgrading to it.

After downloading the CFW file, select Firmware Update from the main menu and on the following file selection dialogue page, select the CFW file from your hard drive:

[Screenshot: firmware upload page]

Confirm with a click on Upload & Install. The firmware will now be loaded onto your device. Depending on the network connection, this may take a few minutes.

Once the file has been recognised as valid firmware, the Confirm Firmware Update dialogue will be displayed. Depending on the differences between the current version and the one to be installed various messages will appear telling you what to do with your data during the update.

  • Change of the first digit (major release) of the version number: You must back up the data from your device manually and restore it after the update. An update cannot be performed without data migration.

  • Update to a higher number in the second digit (minor release): The update can be carried out without data migration. You are advised to back up your data beforehand in any case.

  • Downgrade to lower number in the second digit: You must back up the data from your device manually and restore it after the update. An update cannot be performed without data migration.

  • Change of the third digit (patch) of the version number: The update can be carried out without data migration. You are advised to back up your data beforehand in any case.
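The four cases above can be condensed into a small decision function. This is only a sketch of the documented rules, not the logic the appliance itself uses; the function names are hypothetical, and it assumes purely numeric version numbers of the form major.minor.patch such as 1.4.14.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a version such as '1.4.14' into (major, minor, patch).

    Assumption: exactly three numeric parts separated by dots.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def needs_manual_data_migration(current: str, target: str) -> bool:
    """Return True if the data must be backed up manually and restored
    after the firmware update, according to the cases listed above."""
    cur_major, cur_minor, _ = parse_version(current)
    new_major, new_minor, _ = parse_version(target)
    if new_major != cur_major:
        return True   # change of the major release
    if new_minor < cur_minor:
        return True   # downgrade of the minor release
    return False      # minor upgrade or patch change
```

For example, going from 1.4.14 to a hypothetical 1.5.0 returns False in this sketch, while going back to a hypothetical 1.3.20 returns True. A backup beforehand is advisable in every case, as noted above.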

At this point, if necessary, you can cancel the dialogue with No and first make a backup. Then start a new attempt.

Important: If you confirm the Confirm Firmware Update dialogue with Yes!, the device will be rebooted immediately!

During the restart, the selected firmware will be installed. This makes the restart take much longer than usual, although it normally takes less than 10 minutes. Afterwards, another reboot will be carried out automatically, which then completes the firmware update. The console status view will show the newly-installed firmware version.

5. Device settings

5.1. Changing the language

During the basic configuration you specified the language for your device. You can change this at any time, either via the console configuration or via the device settings in the web interface. Like all other settings in this dialogue, changes take effect immediately when saved.

5.2. Changing the network configuration

During the basic configuration you specified the network configuration of your device. You can change this at any time, either via the console configuration or via the device settings in the web interface. If you made an error when specifying the network configuration and the device is no longer accessible via the network you can only correct the settings on the console.

5.3. Configuring host and domain names

Host and domain names serve to identify a computer in the network. When sending emails for example, these names are used to form the sender address. In addition, the configured host name is added as a source host to all log entries that are sent to a syslog server. This makes it easier to assign the entries.

5.4. Configuring name resolution

In most environments DNS servers are used to translate IP addresses into host names and vice versa. Host names or FQDNs (Fully Qualified Domain Names) are frequently used for monitoring instead of IP addresses.

In order to use the name resolution on your device, you must configure the IP addresses of at least one DNS server in your environment. It is recommended to enter at least two DNS servers.

Only when you have configured this option can you use host and domain names (in the configuration of NTP or mail servers for example).

5.5. Configuring time synchronisation

The system time of the device is used for many purposes, such as for recording measurement data or writing log files. A stable system time is therefore very important. This is best ensured by using a time synchronisation service (NTP).

To activate the synchronisation enter the host address of at least one time server under NTP server.

5.6. Forwarding syslog entries

Log messages are generated on the device by the operating system and some permanently running processes. They are initially written into a local log via syslog.

You can also send these entries to a central or higher-level syslog server where they can be evaluated, filtered or archived.

Select the item Syslog to configure the forwarding.

In the dialogue box that appears next you can configure which protocol you wish to use for forwarding. Syslog via UDP is more widely used, but not as reliable as via TCP. So if your syslog server supports both protocols it is recommended to use TCP.

You also need to configure the host address of the syslog server that is to receive the log messages.

5.7. Changing the default web page

If you access the host address of the device directly via the web browser without entering a path, you will be taken to the device’s start page by default. However, it is also possible to be forwarded directly to a monitoring site of your choice.

You can configure this using the setting HTTP access without URL. Via this setting, select the monitoring site to open instead of the web interface. The appliance’s start page can then still be reached by adding the path to the URL – for example 192.168.178.60/webconf.

5.8. Configuring outgoing emails

So that you can send emails from the device (in the case of events during monitoring for example), the forwarding of emails to one of your mail servers must be configured using Outgoing Emails.

In order for the sending of emails to work you must have at least configured the host address of your mail server as an SMTP relay server. This server will then receive the emails from your device and forward them.

However configuring the SMTP relay server is only sufficient as long as your mail server accepts emails via anonymous SMTP. If your mail server requires authentication, then you need to activate the appropriate login method under the Authentication item and indicate the access data of an account that can log onto the mail server.

If you still do not receive any emails after the configuration, it is worth taking a look at the device’s system log. All attempts to send emails are logged here.

The device itself can send system emails if there are critical problems (e.g. a job cannot be executed or a hardware problem has been detected). In order to receive these emails you must configure an email address to which these emails are to be sent using Send local system mails to.

5.9. Changing access to Checkmk agents

A Checkmk agent is installed on the device and in the basic setting can only be queried by the device itself. You can use it to create a site on the device and directly add the device to the monitoring.

It is also possible to make the Checkmk agent accessible from another device, meaning the device can also be monitored by another Checkmk system (e.g. in a distributed environment by a central server). For this purpose, you can configure a list of IP addresses that are allowed to contact the Checkmk agent.
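Once the address of another monitoring server has been added to this list, you can check from that server whether the agent answers. The sketch below is an illustration under assumptions that the text does not confirm: it assumes that the agent listens on TCP port 6556 (the usual default port of the Checkmk agent) and returns its output as unencrypted text. The function name is hypothetical.

```python
import socket


def fetch_agent_output(host: str, port: int = 6556, timeout: float = 5.0) -> str:
    """Connect to a Checkmk agent and return everything it sends.

    Assumptions: the agent listens on TCP port 6556 and answers in plain
    text. If the calling host is not in the list of allowed IP addresses,
    the connection is expected to be refused or to return no data.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:          # the agent closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Called from an allowed host, for instance as fetch_agent_output("192.168.178.60"), the returned text should begin with an agent section header if access is configured correctly.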

6. Remote access via SSH

6.1. Access options

You can activate various access types for the SSH remote management protocol. Basically

  • access to the console and

  • direct access to the sites

are supported. Access as the system user root is possible, but it is neither recommended nor supported, because as root it is very easy to damage the configuration or the software.

6.2. Activating site login via SSH

You can activate access to the command line of the individual monitoring sites, enabling you to view and control the entire environment of the site.

This access is controlled via the site management. In the settings dialogue of each individual site you can activate and deactivate access as well as set a password to protect access.

[Screenshot: site settings with SSH login and password]

6.3. Activating console via SSH

It is possible to activate access to the console of the device via the network, enabling you to view and adjust the basic configuration of the device even without direct access to the device.

You can enable access via the configuration dialogue of the console. To do this, select the menu item Activate console via SSH.

[Screenshot: console configuration menu with the item Activate console via SSH]

When you activate this option, you will be asked to enter a password. You must enter this password if you are connecting as a setup user via SSH. Access will be automatically enabled directly after confirming this dialogue.

You can now connect to the device as a setup user using an SSH Client (e.g. PuTTY).

You can check whether access is currently enabled by looking at the Access box of the console’s status screen.

6.4. Activating root access via SSH

It is possible to activate access to the device as the system user root. After the device has been initialised, however, this access is deactivated by default. Once it has been activated, you can log on to the device as root via SSH.

Caution: Commands you execute on the device as root can cause lasting alteration or damage, not only to your data, but also to the delivered system. The manufacturer shall accept no liability for alterations you make in this way. Only activate and use the root user if you are sure what you are doing and only for diagnostic purposes.

You can enable access via the configuration dialogue of the console. To do this, select the menu item Root access via SSH.

[Screenshot: console configuration menu with the item Root access via SSH]

Then set the option to enable.

[Screenshot: enabling root access via SSH]

When you activate this option you will be asked to enter a password. You must enter this password if you are connecting as a root user via SSH. Access will be automatically enabled directly after confirming this dialogue.

[Screenshot: password dialogue for root access via SSH]

You can now connect to the device as a root user using an SSH Client (e.g. PuTTY).

You can check whether access is currently enabled by looking at the Access box of the console’s status screen.

7. Protecting the appliance GUI via TLS

7.1. Setting up TLS access

By default the web interface of your device is accessed via HTTP in plain text. You can protect this access via HTTPS (TLS), so that data is transferred in encrypted form.

You can open the configuration by pressing the Web access type button in the device settings.

7.2. Installing a certificate

In order to encrypt data traffic the device next needs a certificate and a private key. There are several ways available for you to install these.

  • Create a new certificate and have it signed by a certification authority by sending a certificate signing request (CSR).

  • Upload an existing private key and certificate.

  • Create a new certificate and sign it yourself.

You can choose one of the options above that fits your requirements and possibilities. Certificates signed by certification authorities generally have the advantage that clients can automatically verify the authenticity of the host (device) at the time of access. This is normally the case with official certification authorities.

If a user accesses the web interface via HTTPS and the certificate is either self-signed or signed by a certification authority not trusted by the user, this will cause a warning to appear in the user’s web browser first.

Creating a new certificate and having it signed

To create a new certificate, select the option New certificate. In the dialogue box that follows, you now enter device and operator information, which is then stored on the certificate and can be used by both the certification authority and clients later on to verify the certificate.

[Screenshot: dialogue for creating a certificate signing request]

Once you have confirmed the dialogue box with Save, you can download the certificate signing request (CSR) file from the web access page. You must provide this file to your certification authority. You will then receive a signed certificate from your certification authority and, where necessary, a certificate chain (often consisting of intermediate and/or root certificates). You will usually receive these in the form of .pem or .crt files or directly in PEM-encoded text form.

[Screenshot: Upload certificate dialogue]

You can now transfer the signed certificate to the device via the Upload certificate dialogue. If you have received a certificate chain you can likewise upload it via this dialogue.

Once you have confirmed the dialogue with Upload you can continue configuring the types of access.

Creating a new certificate and signing it yourself

To create a new certificate select the option New certificate. In the dialogue box that follows you now enter device and operator information, which is then stored on the certificate, and which can later be used by clients to verify the certificate.

[Screenshot: dialogue for creating a self-signed certificate]

In the last section Signing method you now select Create a self-signed certificate. After that you can specify the maximum validity period of the certificate.

Once this validity period has expired you must generate a new certificate. This should be done in good time before the expiration so that there are no problems accessing your device.
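To renew the certificate in good time, it helps to know how many days remain. The following sketch computes this from the certificate's "not after" date. It assumes you already have that date as text in the form used by OpenSSL and by Python's ssl module (for example "Jan  5 09:34:43 2018 GMT"); how you obtain it from the device is not covered here, and the function name is hypothetical.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the number of days until the given 'not after' date.

    not_after: expiry date in the form 'Jan  5 09:34:43 2018 GMT'.
    now: reference time as seconds since the epoch (defaults to the
         current time). A negative result means the certificate has expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY
```

A threshold of, say, 30 remaining days for triggering the renewal would be a local policy decision, not something the appliance prescribes.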

Once you have confirmed the dialogue with Save you can continue configuring the types of access.

Uploading existing certificate

If you have an existing certificate along with a private key and wish to use this to protect HTTPS traffic, you can transfer these files to your device via the Upload certificate dialogue.

Once you have confirmed the dialogue with Upload you can continue configuring the types of access.

7.3. Configuring access types

Once you have installed a certificate you can now configure the access types according to your requirements.

If you wish to protect access to your device via HTTPS you are recommended to select the HTTPS enforced (incl. redirect from HTTP to HTTPS) option. The device will only respond via HTTPS, but will redirect all incoming HTTP requests to HTTPS. This means that users who inadvertently access the web interface via HTTP, either directly or via bookmarks, will automatically be redirected to HTTPS.

If it is very important that not a single request goes over the network in plain text, you can select the option HTTPS only. This setting will cause users accessing via HTTP to receive an error message.

You can also have a simultaneous configuration of HTTP and HTTPS. However this setting is only recommended in exceptional cases, for migration purposes or for testing.

If you ever want to deactivate HTTPS, you can do this by selecting the HTTP only option.
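The behaviour of the four access types can be summarised as a small lookup table. This is a descriptive sketch only: "HTTP and HTTPS" is a hypothetical label for the simultaneous configuration, because the text does not give the exact option name, and "unavailable" for HTTPS requests under HTTP only is an assumption derived from HTTPS being deactivated.

```python
# Expected reaction of the device per access type and request scheme.
ACCESS_TYPES = {
    "HTTP only": {"http": "served", "https": "unavailable"},
    "HTTPS enforced (incl. redirect from HTTP to HTTPS)": {
        "http": "redirect to HTTPS",
        "https": "served",
    },
    "HTTPS only": {"http": "error", "https": "served"},
    "HTTP and HTTPS": {"http": "served", "https": "served"},  # hypothetical label
}


def expected_response(access_type: str, scheme: str) -> str:
    """Look up how the device reacts to a request with the given scheme."""
    return ACCESS_TYPES[access_type][scheme.lower()]
```

For example, expected_response("HTTPS only", "http") yields "error", which matches the error message described above.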

7.4. Displaying current configuration/certificates

On the access type configuration page you can see the types of access currently active as well as information regarding the current certificate.

[Screenshot: current access types and certificate information]

8. Device control

8.1. Restarting / Shutting down

You can restart or shut down the device over both the web interface and the console.

In the web interface you will find the menu items Reboot device and Shutdown device under the point Control device in the main menu. The device will execute the action immediately after the command has been selected.

[Screenshot: Control device page in the web interface]

In the console you can open the device control menu by pressing F2.

[Screenshot: device control menu on the console]

Caution: You should only shut down your rail2 appliance if you have physical access to the system, since you can only restart the device by disconnecting and restoring the power.

8.2. Restoring the factory configuration

You can reset your device to its factory settings. This means that any changes you have made to the device (e.g. your device settings, monitoring configuration or recorded statistics and logs) will be deleted. When resetting the settings the firmware version currently installed will be retained – the firmware installed with the device as delivered will not be restored.

You can perform this action on the console. To do this press the F2 key on the status screen and select Factory Reset in the dialogue box that follows. Confirm the next dialogue box with yes. Your data will now be deleted from the device and the device will then be restarted immediately. The device will now start with a fresh configuration.

9. Backup

9.1. Basics

In order to preserve your monitoring data in case of a hardware failure or similar destruction, a backup of your data can be configured via your appliance’s web user interface.

To be certain the data really is backed up, it must be saved to another device – a file server for example. For this, first use the mount management to configure the network share to be used for the backup. This share is then defined as the target when configuring the data backup. Once this is done, a backup job can be created that saves a backup of your system to the network share at predefined intervals.

The full backup includes all of the configurations defined on the system, installed files, and likewise your monitoring sites.

The backup is executed online, i.e. during active operation. However, this is only fully possible when all monitoring sites on the appliance use Checkmk 1.2.8p6, 1.4.0i1, or a daily build from 22.07.2016 or newer. Active sites using older versions will be stopped before the backup and restarted afterwards.

9.2. Automatic backup

To set up an automatic data backup, configure one or more backup jobs. For each backup job, one backup data set is kept on the target system. When a new backup has been completed, the previous backup is deleted, which means that the target system temporarily needs twice the storage space.

The backup does not manage multiple generations. If you need to retain several copies over an extended period, you will have to create them yourself.
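One way to keep several generations is to copy each completed backup directory to a separate archive location on the file server before the next run replaces it. The following is a minimal sketch of that idea, not a feature of the appliance. The function name, the archive location and the date stamp format are all assumptions; the check for the -complete suffix relies on the directory naming described in the section on the backup format.

```python
import shutil
from pathlib import Path


def keep_generation(backup_dir: Path, archive_root: Path, stamp: str) -> Path:
    """Copy a completed backup directory to archive_root/<name>-<stamp>.

    backup_dir:   directory written by a backup job (must end in -complete)
    archive_root: existing directory in which the generations are collected
    stamp:        text that distinguishes generations, e.g. a date
    """
    if not backup_dir.name.endswith("-complete"):
        raise ValueError(f"{backup_dir.name} is not a completed backup")
    target = archive_root / f"{backup_dir.name}-{stamp}"
    shutil.copytree(backup_dir, target)  # fails if the target already exists
    return target
```

Such a function would typically be run on the file server after each scheduled backup, with the removal of old generations left to your own retention rules.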

9.3. Configuring the backup

First use the file system management to configure your network share. In our example, a network share is configured under the file path /mnt/auto/backup.

Next, select the Device backup item in the web interface’s main menu, and on the following page open the Backup target. Then create a New backup target. The title and the ID can be chosen freely. Under the Target directory for backup item, configure the path of the mounted network share, in this case /mnt/auto/backup. The Is mountpoint option must be active if you are backing up to a network share; with it, the backup verifies that the share really is mounted.

[Screenshot: creating a new backup target]

Once the backup target has been created, return to the Device backup page and from there select New job. Here again you can choose an ID and a title. Next, select the newly-created backup target and define the desired periods for running the backup.

[Screenshot: creating a new backup job]

After saving you will see an entry for your new backup job on the Appliance backup page. The scheduled time of the next execution is shown at the end of this line. As soon as the job has started or completed, its status is shown in this view. Here you can also start backups manually or, if needed, interrupt running backups.

cma de backup job list 2

To test your newly created job, click on the Play-icon. You will see in the table that your job is currently running. By clicking on the Log-icon you can display the job’s progress in the form of a log output.

cma de backup job log 2

As soon as the backup has completed this will also be shown in the table.

cma de backup list complete 2

9.4. Backup format

Every backup job creates a directory on the backup target. This directory’s name conforms to the following schema:

  • Appliance backups: Checkmk_Appliance-[HOSTNAME]-[LOCAL_JOB_ID]-[STATE]

  • Site backups: Checkmk-[HOSTNAME]-[SITE]-[LOCAL_JOB_ID]-[STATE]

In the placeholder fields, any - (minus) characters are replaced by + so that they cannot be confused with the field separators.

During the backup the directory will be saved with the suffix: -incomplete. Once completed the directory is renamed and the suffix changed to: -complete.
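
The naming schema can be reproduced with a few lines of Python, which is useful if you want to locate a backup directory from a script. This is a sketch based only on the schema described above. The function names are hypothetical, and the host, job and site names in the usage example are made up.

```python
from typing import Optional


def _escape(field: str) -> str:
    """Replace "-" by "+" so the field cannot clash with the separator."""
    return field.replace("-", "+")


def backup_dir_name(hostname: str, job_id: str, state: str,
                    site: Optional[str] = None) -> str:
    """Build the directory name of an appliance or site backup.

    state is "incomplete" while the backup runs and "complete" afterwards.
    """
    if site is None:
        parts = ["Checkmk_Appliance", _escape(hostname)]
    else:
        parts = ["Checkmk", _escape(hostname), _escape(site)]
    parts += [_escape(job_id), state]
    return "-".join(parts)
```

For example, backup_dir_name("node-1", "daily", "complete") yields Checkmk_Appliance-node+1-daily-complete.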

A file named mkbackup.info, containing the meta information for the backup, is saved in the directory. Alongside this file, a number of archives are saved to the directory.

The archive named system contains the appliance’s configuration, system-data contains the data file system’s data – excluding that of the monitoring sites. The monitoring sites are saved in separate archives that use the site-[SITENAME] naming schema.

Depending on the backup’s mode, these data sets are saved with the .tar file extension for uncompressed and unencrypted, .tar.gz for compressed but unencrypted, and .tar.gz.enc for compressed and encrypted archives.
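
Putting the archive names and file extensions together, the following sketch lists the archive files you would expect to find in a backup directory. It covers only the three modes named above. Whether an encrypted but uncompressed mode exists is not stated here, so the sketch rejects that combination. The function names are assumptions.

```python
from typing import List


def archive_extension(compressed: bool, encrypted: bool) -> str:
    """Map the backup mode to the archive file extension."""
    if not compressed and not encrypted:
        return ".tar"
    if compressed and not encrypted:
        return ".tar.gz"
    if compressed and encrypted:
        return ".tar.gz.enc"
    raise ValueError("encrypted but uncompressed is not described in the manual")


def expected_archives(sites: List[str], compressed: bool,
                      encrypted: bool) -> List[str]:
    """Return the archive file names for the given sites and mode."""
    extension = archive_extension(compressed, encrypted)
    names = ["system", "system-data"] + [f"site-{site}" for site in sites]
    return [name + extension for name in names]
```

A compressed, unencrypted backup with a single site named prod (a made-up name) would therefore contain system.tar.gz, system-data.tar.gz and site-prod.tar.gz, in addition to mkbackup.info.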

9.5. Encryption

If you want to encrypt your backup, you can configure this directly in the web user interface. Your backed-up data is then completely encrypted before being transferred to the backup target. The encryption uses an encryption key that you create beforehand. This key is protected by a password that you define when creating the key. Keep both the key and its password in a safe place, as the backed-up data can only be retrieved with both of them.

To this end, open the Device backup page and from there select the Backup keys page. Here you can create a new encryption key. When entering the password be sure to use a sufficiently complex character string – the longer and more complex your password, the harder it is for an attacker to decrypt your key and thus your backup.

cma de backup key new 2

Once you have created your key, download it and retain it in a secure location.

cma important

An encrypted backup can only be restored with the encryption key and its corresponding password.

Now, on the Device backup page, edit the backup job that is to create the encrypted backups. There, activate the Encryption item and select the newly created encryption key.

cma de backup job edit encrypt 2

Once you have confirmed the dialogue, the next backup will be automatically encrypted.

9.6. Compression

It is possible to compress the data during the copy procedure. This can be useful if you need to save bandwidth or if space on the target system is limited.

Please be aware, however, that compression requires noticeably more CPU time, so the backup will take longer. As a rule, it is advisable not to activate compression.

cma tip

Uncompressed backups are only supported from Checkmk version 1.2.8p5 onwards. If you run monitoring sites with older versions, you must activate compression for the complete backup.

9.7. Recovery

Using the web user interface’s built-in functions, you can only perform a complete restore. Restoring individual files via the web interface is not supported. This is nevertheless possible on the command line by manually unpacking the archives from the backup.

If you wish to restore a complete backup on a currently running appliance, select the Restore item on the Device backup page, and on the next page select the backup target from where you want to source the backed-up data. Once the backup target has been selected a list of all of its available backups will be shown.

cma de backup restore list 2

Next, click on the arrow beside the backup you wish to use. After you have confirmed the security prompt, the restore will start.

While the restore is running, you can follow its progress by refreshing the Restore page, which is displayed automatically.

cma de backup restore log 2

At the end of the restore the appliance restarts automatically. After this restart the restore is complete.

Disaster recovery

If you need to completely restore an appliance, disaster recovery consists of the following steps:

  • Start with an appliance in the factory default configuration (a new, identical appliance, or an appliance that has been reset to the factory defaults).

  • Ensure that the firmware version matches that of the backup.

Configure the following minimum settings on the console:

  • Network settings.

  • Access to the web interface.

In the web interface, configure:

  • the backup source from which you wish to restore.

  • for an encrypted backup, upload the encryption key.

Now start the restore as described in the preceding section.

9.8. Monitoring

From Checkmk version 1.4.0i1, the service discovery on the appliance finds a new service for every configured backup job: Backup [JOB-ID]. This service reports potential problems with the backup and displays useful values such as size and duration.

9.9. Special features with clusters

The complete configuration of the backups, including the encryption keys will be synchronised between the cluster nodes. The cluster nodes run the backups separately, and likewise save separate directories for their backups on the backup target.

The active cluster node backs up the complete appliance, including the data from the data file system and the monitoring sites. The inactive cluster node saves only its local appliance configuration.

Thus, when restoring a backup, only an active cluster node’s backup can restore the monitoring sites.

10. Mounting network file systems

10.1. Introduction

If, for example, you wish to make a backup to a network share, you must first configure the required network file system.

Currently supported are the network file system NFS (version 3), Windows shares (Samba or CIFS), and SSHFS (SFTP).

10.2. Mounting a network file system

In the web user interface’s main menu select the item Manage mounts and from there create a new file system. Enter an ID that will later be used on the device to identify the file system.

cma de mount new 2

Next, select if and how the file system is to be mounted. We recommend mounting automatically on access and unmounting automatically when inactive.

Next, configure the type of share to be mounted and, depending on this type, the settings needed for mounting it, for example the file server’s network address and the exported path in the case of NFS.

Once saved the newly-configured file system and its current status can be viewed in the file system management. By clicking on the plug icon you can manually mount the file system to test that the configuration is correct.

cma de mount list 2

If problems occur, you may find error messages in the system log.

11. Failover cluster

11.1. Basics

You can combine two Checkmk appliances into a failover cluster. All configurations and data are synchronized between the two devices. The devices that are connected as a cluster are also called nodes. One of the nodes in the cluster assumes the active role, i.e. performs the tasks of the cluster. Both nodes continuously exchange information about their status. As soon as the inactive node recognizes that the active node can no longer fulfill its tasks – due to a failure for example – the inactive node takes over the tasks and becomes the active node.

The failover cluster is there to increase the availability of your monitoring installation by protecting a device or individual components against hardware failures. The clustering is not a substitute for data backups.

The cluster ensures a shorter downtime in the following situations:

  • There are two servers: one (active) server performs tasks, such as Monitoring, and the other (inactive) server simply checks that the first server is fulfilling its tasks.

  • If the active server can no longer access the network, it cannot perform its tasks (for example, Monitoring).

  • The inactive server notices this and takes over the tasks automatically.

  • The active server becomes inactive and the inactive server is now active – thus swapping their roles.

  • The server that is now active and performing the monitoring has also taken over the resources.

  • If you carry out a firmware update you can update the nodes individually. While one node is being updated the other node will continue to perform the monitoring.

11.2. Prerequisites

In order to build a cluster you will first need two compatible Checkmk appliances. The following models can be clustered with one another:

  • 2x Checkmk rack1

  • 2x Checkmk rack4

  • 2x Checkmk rail2

  • 2x Checkmk virt1

  • 1x Checkmk rack1 and 1x Checkmk virt1

In addition, the two devices must use a compatible firmware, and at least version 1.1.0.
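
The prerequisites above can be summarised as a small check. This sketch encodes only the model pairs listed and the minimum firmware version 1.1.0. It does not check whether two different firmware versions are compatible with each other, because the rules for that are not given here. The function names and the short model identifiers are assumptions.

```python
from typing import Tuple

# Model combinations that can be clustered, as listed above.
SUPPORTED_PAIRS = {
    frozenset(["rack1"]),
    frozenset(["rack4"]),
    frozenset(["rail2"]),
    frozenset(["virt1"]),
    frozenset(["rack1", "virt1"]),
}
MIN_FIRMWARE = (1, 1, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric version such as "1.1.0" into (1, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def can_cluster(model_a: str, firmware_a: str,
                model_b: str, firmware_b: str) -> bool:
    """Check the model pair and the minimum firmware of both devices."""
    models_ok = frozenset([model_a, model_b]) in SUPPORTED_PAIRS
    oldest = min(parse_version(firmware_a), parse_version(firmware_b))
    return models_ok and oldest >= MIN_FIRMWARE
```

A rack1 and a virt1 that both run at least firmware 1.1.0 pass this check, whereas a rack4 combined with a rail2 does not.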

The devices must be wired with at least two mutually-independent network connections. It is recommended to use as direct a connection as possible between the devices, and to make a further connection over your LAN.

To increase the availability of network connections, you should – instead of using two connections via individual network connectors – create a bonding configuration that uses all four of the Checkmk rack1’s network connectors. Use the LAN1 and LAN2 interfaces for the connection to your network, and the LAN3 and LAN4 interfaces for the direct connection between the devices.

Virtual machines: If you want to run a cluster of two ‘Checkmk virt1’ appliances under VirtualBox, for example for testing, you should do without the bonding configuration and its total of four network interfaces. Under VirtualBox, bonding is unreliable, if it works at all. Even if both VMs run on the same machine, so that there are no separate physical links, you still need two virtual network interfaces in order to set up a separate channel for synchronizing the data later. You can easily add these in the VirtualBox management interface of the virt1 machine.

So instead of setting up the bonding as shown below, simply activate the unused second network interface. Do not assign it to your normal LAN subnet (e.g. 192.168.178.0/24), but to a separate subnet (e.g. 192.168.100.0/24). For the actual clustering, select your two individual interfaces instead of the bonding interfaces.

11.3. Migration of existing installations

Devices that were delivered and initialised with the firmware version 1.1.0 or higher can be clustered without migration.

Devices initialised with earlier firmware must first be updated to version 1.1.0 or higher. The device’s factory settings then need to be restored, which prepares the device for clustering. Please note that, to prevent data loss, you must back up your data from the device before this procedure and restore it afterwards.

11.4. Configuration of the cluster

This guide assumes that you have already pre-configured both devices so that the web interface can be opened with a web browser.

Before actually setting up the cluster you must first prepare both devices. This mainly involves adapting the network configuration to fulfill clustering requirements (see prerequisites).

The following shows the configuration of a cluster of two Checkmk rack1 devices. The resulting cluster looks as shown in the diagram below.

The interface designations LAN1, LAN2 etc., used in the diagram correspond to the designations of the physical interfaces on the device. In the operating system, LAN1 corresponds to the device eth0, LAN2 to the device eth1 etc.

cluster

This configuration complies with the recommendations for the clustering of two Checkmk rack1 devices. You can of course use IP addresses suitable for your environment. Make sure however that the internal cluster network (bond1 in the diagram) uses a different IP network to the ‘external’ network (bond0 in the diagram).
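
If you are unsure whether two address ranges are really separate, you can check this with Python's standard ipaddress module before entering the values. The sketch below is an illustration only; the function name and the host addresses in the usage example are assumptions.

```python
import ipaddress


def networks_are_separate(external: str, internal: str) -> bool:
    """Return True if the two interface addresses lie in different networks.

    Both arguments are given as address/prefix, e.g. "192.168.178.100/24".
    """
    external_net = ipaddress.ip_interface(external).network
    internal_net = ipaddress.ip_interface(internal).network
    return not external_net.overlaps(internal_net)
```

For example, networks_are_separate("192.168.178.100/24", "192.168.100.1/24") returns True, while two addresses from 192.168.178.0/24 would return False.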

Network configuration

Open the web interface of the first node and select Device settings, then Network settings at the top. You will now be on the network settings configuration page. Two modes are available here. The Simple Mode, which only lets you configure your device’s LAN1, is activated by default.

cma de net 1 2

The Advanced mode is required for clustering. In order to activate this mode click on the Advanced mode button at the top and confirm the security prompt.

On the following page you will see all of the network interfaces available in the device. Currently only the interface eth0 (corresponding to LAN1, shown as enp0s17 in the screenshot) has a configuration, which was applied by the Simple Mode.

cma de net 2 3

Now create the first bonding interface bond0 by clicking on Create Bonding. In the dialogue that follows, enter all data as shown in the diagram below, and confirm the dialogue with Save.

cma de net 3 2

Now create the second bonding interface bond1 with the appropriate configuration.

cma de net 4 2

After you have created the two bonding interfaces, in the network configuration dialogue you will be able to review all of the settings for the network interfaces …

cma de net 5 a

… and likewise the bondings:

cma de net 5 b

Once you have successfully completed all configuration steps, make the settings effective by clicking on Activate Changes. The new network settings will then be loaded. After a few seconds the network interface configuration will look like this, with the status OK for the interfaces:

cma de net 6 a

And the bonding configuration will look like this:

cma de net 6 b

Now, with the appropriate settings, repeat the configuration of network settings on your second device.

Host names

Devices to be connected in a cluster must have different host names. You can specify these now in the device settings. In our example, we configure node1 as a host name on the first device and node2 on the second device.

Connecting the cluster

Having completed preparations you can now continue setting up the cluster. To do this open the Clustering module in the main menu of the first device (here node1) in the web interface and click on Create Cluster.

Now enter the appropriate configuration in the cluster creation dialogue and confirm the dialogue with Save. If you require more information about this dialogue, click on the icon beside the Checkmk logo in the top right-hand corner. Context help will then appear in the dialogue explaining the individual options.

cma de cluster 1 2

On the following page you can connect the two devices to form a cluster. To do this you need to enter the password of the web interface of the second device. This password is used once to establish the connection between the two devices. Then confirm the security prompt if you are sure that you want to overwrite the data of the target device with the IP address displayed.

cma de cluster 2 2

Once this connection is successful, cluster setup is commenced. You can have the current status displayed on the cluster page. As soon as the cluster has been successfully built, the synchronisation of monitoring data will start from the first to the second node. While this synchronisation is still taking place, all resources – including any monitoring sites you may have – will be started on the first node.

cma de cluster 4 2

From now on you can, using the cluster IP address (here 192.168.178.110), access the resources of the cluster (e.g., your monitoring sites), regardless of the node by which the resources are currently being held.
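
A quick way to confirm that the cluster IP address answers, regardless of which node currently holds the resources, is a simple TCP connection test from another machine. This is a minimal sketch. The function name is hypothetical, and port 80 is an assumption (use 443 if you access the web interface via HTTPS).

```python
import socket


def is_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

With the example address from above you would call is_reachable("192.168.178.110").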

11.5. The state of the cluster

When the first synchronisation is complete, your cluster will be fully operational. You can view the state at any time on the cluster page.

cma de cluster 5 2

Using the status screen on the console you can also view the current state of the cluster in a summarised form in the Cluster box. The role of the respective node is shown after the current status with (M) for the main host and (S) for the subordinate host.

cma de cluster 6 2

11.6. Special cases in the cluster

Access to resources

All requests to the monitoring sites (e.g. web interface access) as well as incoming messages (e.g. SNMP traps or syslog messages to the event console or requests to Livestatus) should normally always be sent via the cluster IP address.

Only in exceptional cases (e.g. diagnostics or updates of a particular node) should you need to access the individual nodes directly.

Device settings

The settings (e.g. time synchronisation or name resolution settings) that have been made independently on the individual devices until now are synchronised between the two nodes in the cluster.

However, you can only change these settings on the node that is active at the time. The settings are locked on the inactive node.

There are some device-specific settings (e.g. those of the management interface of the Checkmk rack1) which you can adapt on the individual devices at any time.

IP addresses or host names of the nodes

To be able to edit the IP configuration of the individual nodes, you must first disable the connection between the nodes. To do this click on Disconnect cluster on the cluster page. You can then adapt the desired settings via the web interface of the individual nodes.

Once you have made the adjustments you must now select Reconnect cluster on the cluster page. If the nodes can be successfully reconnected the cluster will resume operation after a few minutes. You can see the status on the cluster page.

Administering Checkmk versions and monitoring sites

The monitoring sites and Checkmk versions are also synchronised between the two nodes. You can only modify these in the web interface of the active node.

If you access the cluster IP address for this purpose, you will always reach the device on which these settings can be changed.

11.7. Administrative tasks

Firmware updates in the cluster

The firmware version of a device is not synchronised in cluster operation. The update is therefore carried out separately on each node. This has the advantage that one node can continue performing the monitoring while the other node is updated.

When updating to a compatible firmware version, you should always proceed as follows:

First open the Clustering module in the web interface of the node to be updated.

Now click on the heart symbol in the column of this node and confirm the security prompt that follows. This will put the node into maintenance state.

Nodes that are in maintenance state release all resources currently active on the node, upon which the other node takes control of them.

While a node is in maintenance state, the cluster is not failsafe – so if the active node is now switched off, the inactive node in maintenance state will not take control of the resources. If you now additionally put the second node into maintenance state, all resources will be shut down. These will only be reactivated when a node is taken out of its maintenance state. You must always remove the maintenance state again manually.
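
The rules for the maintenance state can be expressed as a small decision function. This is a simplified model for illustration and not the cluster software's actual logic. The function name and the state names "online", "maintenance" and "offline" are assumptions.

```python
from typing import Dict, Optional


def resource_holder(current: str, states: Dict[str, str]) -> Optional[str]:
    """Return the node that holds the cluster resources, or None.

    current is the node holding the resources so far; states maps each
    node name to "online", "maintenance" or "offline".
    """
    if states[current] == "online":
        return current
    # The current holder is gone: only a node that is online (and thus
    # not in maintenance state) may take over.
    for node, state in states.items():
        if node != current and state == "online":
            return node
    # No eligible node is left, so all resources stay shut down.
    return None
```

With node1 in maintenance state and node2 switched off, the function returns None, which matches the warning above that the cluster is not failsafe during maintenance.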

When the cluster page looks as follows, the node is in maintenance state:

cma de cluster 7 2

You can now perform the firmware update on this node in the same way as on a standalone device.

After you have successfully performed the firmware update, open the cluster page once more and remove the maintenance state of the updated device. The device will then automatically merge into cluster operation, upon which the cluster again becomes fully functional.

cma de cluster 5 2

It is recommended to run the same firmware version on both nodes. You should therefore next repeat the same procedure for the other node.

11.8. Disbanding clusters

It is possible to remove the nodes from a cluster and continue running them separately. When doing so you can continue using the synchronised configuration on both devices or, for example, reset one of the devices to factory settings and reconfigure it.

You can remove one or both nodes from the cluster during operation. If you wish to use both nodes, you must ensure that the data synchronisation is in good working order beforehand. You can verify this on the cluster page.

In order to disband a cluster, click on Disband Cluster on the cluster page of the web interface. Read the text of the confirmation prompt that follows. Depending on the situation, this text explains the state the respective device will be in after the disconnection.

cma de cluster 8 2

The disconnection of the devices must be carried out on both nodes separately, so that both devices can be run separately in future.

If you only wish to use one of the devices in future, disconnect the cluster on the device you intend to continue using and then restore the factory settings on the other device.

Once you have disconnected a node from the cluster the monitoring sites will not be started automatically. If you wish to start the monitoring sites, you will need to do so via the web interface.

Exchanging a device

If the hard drives of the old device are in good order, you can remove them from the old device and insert them into the new device. Wire the new device in exactly the same way as the old one and then switch it on. After starting, the new device will join the cluster in the same way as the old device.

If you want to completely replace an old device with a new one, you should proceed in the same way as when disbanding the cluster completely (see previous chapter). To do this select one of the previous devices, disconnect this device from the cluster and create a new cluster with this device and the new device.

11.9. Diagnostics and troubleshooting

Logging

Cluster administration is largely automatic: processes on the nodes decide on which device which resources are started and stopped. This behaviour is logged in detail. You can access these log entries from the cluster page by clicking the Cluster Log button.

Please note that these entries, just like the other system messages, are lost when restarting the device. If you would like to keep the messages for longer you can download the current log file via your browser or set up a permanent forwarding of log messages to a syslog server.

12. SMS notifications

12.1. Hardware

It is possible to attach a GSM modem to the device in order to have SMS notifications sent over it by Checkmk (in the event of critical problems for example).

At the moment it is not possible to order a UMTS/GSM modem together with your appliance or later as an accessory. There are, however, several modems, such as the MTD-H5-2.0, that are compatible with the appliance.

12.2. Starting up the modem

In order to put the modem into operation you must insert a functioning SIM card, attach the modem to a free USB connector on your appliance using the enclosed USB cable, and connect the modem to the mains using the enclosed power adapter.

As soon as this has been done the device will automatically detect the modem and set it up. Open the device’s web interface and select the Manage SMS module. The current state of the modem as well as the connection with the mobile phone network will be displayed on this page.

appliance usage manage sms

If you need to enter a PIN to use your SIM card, you can specify this PIN under SMS settings.

appliance usage configure sms

12.3. Diagnostics

If sent messages do not reach you, you can view all sent, unsent and queued messages on the Manage SMS page. The entries in these lists are kept for a maximum of 30 days and then deleted automatically.

It is possible, via the menu item Send test SMS, to send a test SMS to a number of your choice. The telephone number must be entered without leading zeros and without a leading plus sign, e.g. 491512345678 for a mobile phone number in Germany.

cma de sms 3
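
If you keep phone numbers in the usual international notation, a small helper can convert them into the format expected here. This is a sketch with a hypothetical function name. It assumes that the input already contains the country code, because a purely national number such as 0151… cannot be completed automatically.

```python
def normalize_sms_number(raw: str) -> str:
    """Strip "+", spaces, punctuation and leading zeros from a number.

    The input must already include the country code, for example
    "+49 1512 345678" or "0049 1512 345678".
    """
    digits = "".join(ch for ch in raw if ch.isdigit())
    digits = digits.lstrip("0")
    if not digits:
        raise ValueError(f"No usable digits in phone number: {raw!r}")
    return digits
```

Both "+49 1512 345678" and "0049 1512 345678" are converted to 491512345678.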

You will find further information on possible SMS sending errors in the SMS Log.

13. Administering RAID on the racks

13.1. The RAID system

Your rack has two hard drive bays on the front. These are marked with numbers 1 and 2. The hard disks installed here are interconnected in a RAID-1 array (mirror) so that your data is stored redundantly on both hard disks. If one of the hard disks fails the data is still available on the second hard disk.

13.2. Administration in the web interface

You can view the state of the RAID in your device’s web interface. To do this select the item RAID-Setup in the main menu of the web interface. This screen also gives you the option to repair the RAID if necessary.

cma de rack1 raid ok

13.3. Exchanging a defective hard drive

If a hard drive is detected as defective, it is displayed in the web interface with the status defective. On the device itself, depending on the nature of the error, this is indicated by a blue flashing LED at the hard drive bay.

cma de rack1 raid broken

Moving the small lever on the left-hand side of the bay will unlock the fixing mechanism, enabling you to pull the frame out of the housing together with the hard drive. You can now loosen the screws on the underside of the frame and remove the defective hard drive. Now mount the new hard drive into the frame and push the frame back into the free bay of the device.

If the device is switched on while you are exchanging the hard drive, the RAID rebuild will start automatically. You can view this procedure’s progress in the web interface.

cma de rack1 raid repair

Failsafe operation is only restored once the RAID has been completely repaired.

The hard drive must be at least the same size as the RAID itself. You can verify this in the RAID status view.

13.4. Both hard drives defective

If the device detects that both hard drives are defective or have been removed from the device a restart is automatically triggered.

14. Management interface in the rack

Your rack has a built-in management interface that allows network access to the device even when it is not powered on. You can use the web interface of this management interface, for example, to control the device if it is not switched on or no longer accessible, and to remotely control the local console.

If you would like to use the management interface, you must first connect the dedicated IPMI LAN connector with your network.

cma important

For security reasons we recommend connecting the IPMI LAN with a dedicated management network where possible.

The management interface is delivered deactivated. You can activate and configure it via the Management Interface setting in the device settings.

appliance usage management interface in the rack

You must assign a separate IP address for the management interface and specify dedicated access data for the access to the management interface.

Once you have saved these settings you can open the management interface’s IP address with your web browser and log in there using the access data you just specified.

15. Diagnostics

15.1. Logs

Despite careful tests, it cannot be altogether ruled out that unexpected errors may occur, which are difficult to diagnose without looking at the operating system.

One option is to have the log entries that are generated on the system sent via syslog to a syslog server. However the log entries of the individual monitoring sites are not processed via syslog, meaning they are not forwarded and can only be viewed on the device.

In order to make diagnostics on the device easier there is a view that displays the device’s various log files. You can go to this view by clicking on the Log Files menu item in the web interface’s main menu.

cma webconf logs 2

You can select the device’s logs and view their current content here.

cma note

The system log is reinitialized each time the device is started up. If you would like to keep the log entries you must send them to a syslog server.

It is also possible to view the system log on the local console. The latest entries from the system log are displayed on the second terminal. You can access this terminal via the key combination CTRL+ALT+F2. All kernel messages are displayed on the third terminal. In the case of hardware problems, you will find the relevant messages here. This terminal can be accessed via the key combination CTRL+ALT+F3. The key combination CTRL+ALT+F1 will take you back to the status screen.

15.2. Available Memory

The system memory of the device is available to your monitoring sites, minus the amount of memory needed by the system processes of the Checkmk appliance.

To provide a stable system platform, a fixed amount of memory is reserved for the mandatory system processes. The exact amount of reserved memory depends on your device configuration:

  • Standalone device (no cluster configuration): 100 MB

  • Clustered: 300 MB
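
As a rough calculation, the memory left for the monitoring sites is the total memory minus the reserved amount. The sketch below uses only the two values listed above. The function name is hypothetical, and the total of 8192 MB in the usage example is a made-up figure.

```python
# Reserved memory in MB, as listed above.
RESERVED_MB = {"standalone": 100, "clustered": 300}


def memory_for_sites_mb(total_mb: int, clustered: bool) -> int:
    """Return the memory in MB that remains for the monitoring sites."""
    reserved = RESERVED_MB["clustered" if clustered else "standalone"]
    return max(total_mb - reserved, 0)
```

A standalone device with 8192 MB would leave 8092 MB for the monitoring sites, and the same device in a cluster 7892 MB.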

If you want to know exactly how much memory is available to your monitoring sites and how much is currently being used, you can monitor your device using Checkmk. After a service discovery the host automatically monitors a service User_Memory which shows you the current and historical values.

If your monitoring sites try to consume more memory than is available, one of the processes of the monitoring sites is automatically killed. This is done by standard mechanisms of the Linux kernel.

16. Service and support

16.1. Manual

If you encounter any problems during start-up or operation please consult this manual first.

16.2. Internet

You can get up-to-date support information from our website. You will find the latest version of the documentation here as well as general information which is regularly updated and more detailed than this manual.

16.3. Firmware

You will find the latest firmware versions on our website. You can access this firmware using the access data for your current support contract.

16.4. Hardware support

In the event of hardware failure please contact us by email at cma-support@checkmk.com, or call us on +49 89 99 82 097 - 20. The problem will be handled by the distributor directly and in accordance with the maintenance agreed upon.

16.5. Software support

In the case of a software fault – whether firmware or Checkmk monitoring software – please contact us via your company’s own support address. Support will be provided based on the agreed support contract.
