1. OMD - The Open Monitoring Distribution
The Checkmk monitoring system uses the Open Monitoring Distribution (OMD). Founded by Mathias Kettner, OMD is an open source project that revolves around the convenient and flexible installation of a monitoring solution made up of various components. The abbreviation OMD might already be familiar to you from the RPM/DEB package installation.
An OMD-based installation is distinguished by a number of characteristics:
the ability to run multiple instances (or ‘sites’) in parallel
the ability to operate instances with differing versions of the monitoring software
an intelligent and easy to operate upgrade/downgrade-mechanism
uniform file paths — regardless of which Linux-platform is installed
a clear separation of data and software
a very simple installation — with no dependence on third-party software
a perfect preconfiguration of all components
2. Creating instances (or ‘sites’)
Perhaps the best thing about OMD is that it can manage any number of monitoring instances on a single server. These are also referred to as sites. Each instance is a self-contained monitoring system which runs independently of the others.
An instance always has a distinct name, specified at its creation. This name is also the name of the Linux user that is created at the same time, so it must conform to the conventions for user names under Linux.
An instance is created with the omd create command, which must be executed as root:
root@linux# omd create mysite
Creating temporary filesystem /omd/sites/mysite/tmp...OK
Updating core configuration...
Generating configuration for core (type cmc)...Creating helper config...OK
OK
Restarting Apache...OK
Created new site mysite with version 1.6.0.

  The site can be started with omd start mysite.
  The default web UI is available at http://myServer/mysite/

  The admin user for the web applications is cmkadmin with password: lEnM8dUV
  For command line administration of the site, log in with 'omd su mysite'.
  After logging in, you can change the password for cmkadmin with 'htpasswd etc/htpasswd cmkadmin'.
When the cmkadmin user is created, a password is randomly generated and displayed.
What takes place during the creation of an instance named mysite?
An operating system user mysite and a group mysite will be created.
A new home directory /omd/sites/mysite will be created and assigned.
This home directory will be populated with configuration files and sub-directories.
A basic configuration will be created for the new instance.
Important: Avoid a name that is already in use on the server, for example as the user name of another service. A duplicate can cause problems.
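Because the instance name doubles as a Linux user name, it can be worth checking a candidate name before calling omd create. The following is only a minimal sketch and not part of OMD: the function name check_site_name is hypothetical, and the naming rule it applies (a letter or underscore first, then letters, digits or underscores, at most 16 characters) is an assumed, conservative rule rather than OMD's documented one.

```shell
# Hypothetical helper: check a candidate site name before 'omd create'.
# Returns 0 if usable, 1 if the name breaks the assumed naming rule,
# 2 if a Linux user with that name already exists.
check_site_name() {
    name=$1
    # Assumed rule: starts with a letter or underscore, then only
    # letters, digits or underscores.
    case "$name" in
        ''|[!a-zA-Z_]*|*[!a-zA-Z0-9_]*) return 1 ;;
    esac
    # Assumed length limit of 16 characters.
    [ "${#name}" -le 16 ] || return 1
    # The site name becomes a Linux user name, so it must not be taken.
    if id -u "$name" >/dev/null 2>&1; then
        return 2
    fi
    return 0
}
```

As root, a call such as check_site_name mysite && omd create mysite would then only create the site if the name passes both checks.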
2.1. User and group IDs
In some cases it is also desirable to specify the user and group ID of the new user. This is done with the -u and -g options, e.g.:
root@linux# omd create -u 6100 -g 180 mysite
An overview of the further options can be shown with omd create --help.
The most important options are:
-u UID: The new user will be created with the user ID UID.
-g GID: The new user’s group will be created with the group ID GID.
--reuse: OMD assumes that the new user already exists, and does not create it.
The new site’s
3. Instance User (Site User)
The further administration of the instance is best performed with the rights of the newly created user. Switching users is done with su:
root@linux# su - mysite
Please note that the minus sign following su is essential. It ensures that switching users performs all of the operations that take place during a normal login. In particular, all environment variables will be set correctly, and your session will continue as mysite in the home directory of the instance.
As the instance user you can execute all important operations affecting this site. Specifying the instance name then becomes unnecessary when issuing omd commands.
4. Starting and stopping instances
Your instance is now ready to be started. This can be done as root with omd start mysite. It is generally better, though, to work with the instance as the instance user (site user):
OMD[mysite]:~$ omd start
Starting Livestatus Proxy-Daemon...OK
Starting rrdcached...OK
Starting CMC Rushing Ahead Daemon...OK
Starting Check_MK Micro Core...OK
Starting dedicated Apache for site mysite...OK
Initializing Crontab...OK
Unsurprisingly, stopping is achieved with omd stop:
OMD[mysite]:~$ omd stop
Removing Crontab...
Stopping dedicated Apache for site mysite....OK
Stopping Check_MK Micro Core...killing 15085...OK
Stopping CMC Rushing Ahead Daemon...killing 15071....OK
Stopping rrdcached...waiting for termination...OK
Stopping Livestatus Proxy-Daemon...killing 15049....OK
Starting and stopping an instance is nothing other than starting or stopping a collection of services. These can also be individually managed by specifying the name of the service, e.g.:
OMD[mysite]:~$ omd start apache
Starting dedicated Apache for site mysite...OK
The names of the various services can be found in the ~/etc/init.d directory. Please note the leading tilde: it represents the home directory of the instance user (the site directory). This is not the same as /etc/init.d.
In addition to start and stop, there are also further commands such as reload and status.
Reloading Apache is, for example, always necessary following a manual change to the Apache configuration. Please note that this does not apply to the global Apache process on the Linux server, but rather to the site’s own dedicated Apache process:
OMD[mysite]:~$ omd reload apache
Reloading dedicated Apache for site mysite....OK
To keep an overview of the state of the site after all of these starts and stops, simply use omd status:
OMD[mysite]:~$ omd status
liveproxyd: stopped
rrdcached: running
cmcrushd: running
cmc: stopped
apache: running
crontab: running
-----------------------
Overall state: partially running
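In a script it can be handy to know exactly which services are down. The following sketch filters status lines of the form shown above; the function name services_not_running is hypothetical, and it assumes plain, uncoloured "service: state" lines followed by a dashed separator, as in the example output.

```shell
# Hypothetical helper: list the services that are not running.
# Reads 'omd status' style lines ("service: state") on standard input
# and stops at the dashed separator before the overall state.
services_not_running() {
    awk -F': *' '
        /^-+$/ { exit }                          # separator: summary follows
        NF >= 2 && $2 != "running" { print $1 }  # any state other than running
    '
}
```

As the site user this could be used as omd status | services_not_running, which for the output above would list liveproxyd and cmc.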
5. Deleting instances
Deleting an instance is as easy as creating one, with the omd rm command. The instance will first be stopped automatically.
root@linux# omd rm mysite
PLEASE NOTE: This action removes all configuration files and variable data of the site.

In detail the following steps will be done:
- Stop all processes of the site
- Unmount tmpfs of the site
- Remove tmpfs of the site from fstab
- Remove the system user <SITENAME>
- Remove the system group <SITENAME>
- Remove the site home directory
- Restart the system wide apache daemon
 (yes/NO): yes
It goes without saying that this action also deletes all of the instance’s data!
If you are no fan of confirmation prompts, or wish to perform the deletion as part of a script, the deletion can be forced with the -f option. Attention: here the -f must be placed before the rm:
root@linux# omd -f rm mysite
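Since the forced variant asks no questions, a script may want a small guard of its own before calling it. This is a sketch under assumptions: remove_site is a hypothetical name, and the only check it makes is that a site directory exists under /omd/sites.

```shell
# Hypothetical helper for scripts, to be run as root: delete a site
# without a prompt, but only if its directory under /omd/sites exists.
remove_site() {
    site=$1
    if [ -z "$site" ] || [ ! -d "/omd/sites/$site" ]; then
        echo "remove_site: no such site: $site" >&2
        return 1
    fi
    # -f is placed before rm; this deletes all of the site's data.
    omd -f rm "$site"
}
```

The guard only protects against typos and empty variables; it does not replace a backup.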
6. Configuring the components
As already mentioned, OMD is a system that integrates multiple software components into a monitoring system. Some of these components are optional, and for some there are alternatives or different operational settings. All of this can be conveniently configured with omd config, which offers both a scripting mode and an interactive mode. The latter can be opened by a site user with:
OMD[mysite]:~$ omd config
As soon as you alter a setting while the site is running, omd config will point out that the site must first be stopped, and will do this as needed.
Please don’t forget to restart the site once the work is complete. omd config will NOT do this for you automatically.
Those who don’t like the interactive mode, or prefer to work with scripts, can set the individual variables on the command line. For this there is the omd config set command. The following example sets the CORE variable to cmc:
OMD[mysite]:~$ omd config set CORE cmc
As always, this can also be performed as root if the site’s name is added as an argument:
root@linux# omd config mysite set CORE cmc
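Because the site has to be stopped for a change and is not restarted automatically, a scripted change usually wraps the three steps together. The following is a sketch only; set_site_option is a hypothetical name and the function is meant to be run as the site user.

```shell
# Hypothetical helper, to be run as the site user: change one setting
# and bring the site back up afterwards.
set_site_option() {
    variable=$1
    value=$2
    rc=0
    # The site must be stopped before a setting is changed.
    omd stop
    omd config set "$variable" "$value" || rc=$?
    # omd config does not restart the site itself, so do it here,
    # even if setting the variable failed.
    omd start || rc=$?
    return "$rc"
}
```

For example, set_site_option CORE cmc performs the change shown above and leaves the site running again.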
The current configuration of all variables can be viewed using omd config show:
OMD[mysite]:~$ omd config show
APACHE_MODE: own
APACHE_TCP_ADDR: 127.0.0.1
APACHE_TCP_PORT: 5000
AUTOSTART: off
CMCRUSHD: on
CORE: cmc
[...]
6.2. Commonly used settings
There are numerous settings in omd config. The most important are:
CORE: Selection of the monitoring core. As well as the Checkmk Micro Core (CMC), the standard Nagios core is still available. In earlier versions Nagios was the default.
MKEVENTD: Activates the Checkmk Event Console, with which syslog messages, SNMP traps and other events can be processed.
MKNOTIFYD: Enterprise Editions: Activates the notification spooler. Its first purpose is to forward notifications generated on remote sites to a central system, which requires mknotifyd on both the central and the remote sites. It can additionally be used to deliver notifications asynchronously.
Set this to
LIVESTATUS_TCP: Allows external access to the status data of this site. This makes it possible to build a distributed monitoring, in which the status of this instance is incorporated into the central instance. Please only activate it in a secure network.
7. Copying and renaming instances
It is sometimes useful to create a copy of an instance, for testing purposes or in preparation for an update. Of course one could simply copy the /omd/sites/alt directory to /omd/sites/neu. That will not work, however, because:
Many configuration files include the site’s name.
In addition, absolute paths containing the site’s name appear in numerous locations.
Not least, a user and a group with the site’s name, to which everything belongs, must exist.
To simplify the copying of an instance, there is the omd cp command, which takes all of these factors into consideration. Its use is very simple: as arguments, enter the name of the existing site followed by the name of the new one. For example:
root@linux# omd cp alt neu
The copy can only work if:
The site has been stopped.
No processes that belong to the instance user are running.
The above points ensure that at the time of the copy the instance is in a consistent state and cannot change during the action.
7.1. Limiting data volume
If a large number of hosts are being monitored, the volume of data to be copied can be quite substantial. The greater part of this is the performance data stored in RRD files, but the log files containing historic events can also produce large data volumes. If the history is not required (for example, if only testing is being performed), these can be omitted from the copy. In such cases the following options can be added to omd cp:
--no-rrds: The copy will exclude performance data (RRDs)
--no-logs: All log files and remaining historic data will be excluded
-N: This is an abbreviation of --no-rrds --no-logs
The order of the options is important:
root@linux# omd cp --no-rrds alt neu
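Since the source site must be stopped for the copy, a test clone is typically made in three steps: stop, copy, start again. The sketch below strings these together; clone_site is a hypothetical name, and restarting the source afterwards is an assumption about what is wanted.

```shell
# Hypothetical helper, to be run as root: copy a site for testing
# without its performance data.
clone_site() {
    src=$1
    dst=$2
    rc=0
    # The source site has to be stopped before it can be copied.
    omd stop "$src"
    omd cp --no-rrds "$src" "$dst" || rc=$?
    # Bring the original site back up whether or not the copy worked.
    omd start "$src"
    return "$rc"
}
```

With the names from the example, clone_site alt neu would leave alt running again and neu stopped, ready to be started separately.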
7.2. Renaming instances
Renaming an instance is performed with the omd mv command. This works similarly to the copy command and has the same prerequisites. The options for restricting the data volume are not available, since the data is only being moved to another directory and is not being duplicated. For example:
root@linux# omd mv alt neu
7.3. Further options for cp and mv
Both operations create a new Linux user in exactly the same way as omd create does, thus some of the options for omd create are also available for use:
-u UID: The new user will be created with the user ID UID.
-g GID: The new user’s group will be created with the group ID GID.
--reuse: OMD assumes that the new user already exists and does not create it.
The new site’s
8. Showing changes with omd diff
When creating a new Checkmk instance, the omd create command populates the etc directory with numerous predefined configuration files. A number of directories will also be created under var.
In the course of time a number of these files will probably have been customised. When you later wish to determine which files are no longer in the condition originally supplied, the omd diff command can provide the answer. Amongst other things, this is useful before an update of Checkmk, since your changes could conflict with changes in the default files.
When called without additional arguments, all changed files are listed:
OMD[mysite]:~$ omd diff
 * Deleted var/log/nagios.log
 * Changed content var/check_mk/wato/auth/auth.php
 * Changed content etc/htpasswd
 ! Changed permissions etc/htpasswd
 * Changed content etc/diskspace.conf
 * Changed content etc/auth.secret
 * Changed content etc/apache/apache.conf
You can also enter a query for a specific directory:
OMD[mysite]:~$ omd diff etc/apache
 * Changed content etc/apache/apache.conf
If you wish to see the changes in detail, simply enter the complete file name:
OMD[mysite]:~$ omd diff etc/apache/apache.conf
--- /dev/fd/63 2017-01-24 09:14:46.248968199 +0100
+++ /omd/sites/mysite/etc/apache/apache.conf 2017-01-24 09:12:37.705355164 +0100
@@ -66,8 +66,8 @@
 StartServers 1
 MinSpareServers 1
 MaxSpareServers 5
-ServerLimit 128
-MaxClients 128
+ServerLimit 64
+MaxClients 64
 MaxRequestsPerChild 4000
 ###
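Before an update it can help to reduce the listing to just the files whose content differs. The following sketch does that with sed; changed_files is a hypothetical name, and the line format it expects ("* Changed content PATH") is taken from the example output above, so it is an assumption that other versions print the same wording.

```shell
# Hypothetical helper: print only the paths whose content has changed.
# Reads 'omd diff' style lines such as "* Changed content etc/htpasswd"
# on standard input; permission changes and deletions are skipped.
changed_files() {
    sed -n 's/^[[:space:]]*[*!][[:space:]]*Changed content[[:space:]]*//p'
}
```

As the site user, omd diff etc | changed_files would then give a plain list of customised files under etc that can be reviewed one by one with omd diff.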
9. Backing-up and restoring instances
9.1. Backing-up instances with omd backup
The site management in Checkmk has a built-in mechanism for backing up and restoring Checkmk instances. The omd backup and omd restore commands are the basis for this: they pack all of an instance’s data into a tar archive and extract that data again for a restore.
From version 1.4.0, Checkmk additionally provides the Backup WATO module, which makes backup and restore possible without the command line and also enables regular backup jobs to be set up.
Backing up an instance with omd backup does not require root permissions. An instance user can perform it. Simply enter the name of the backup file to be created as an argument:
OMD[mysite]:~$ omd backup /tmp/mysite.tar.gz
Please note, however:
The created file is a gzip-compressed tar archive. Therefore use .tar.gz or .tgz as the file extension.
Do not store the backup in the instance directory, since this directory is itself backed up completely. Every subsequent backup would then contain a copy of ALL of its predecessors!
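Both points can be checked mechanically before a backup is started. This is a minimal sketch with a hypothetical function name, backup_target_ok; it only looks at absolute paths as written and does not resolve symbolic links.

```shell
# Hypothetical helper: sanity-check the archive path for 'omd backup'.
# Only absolute paths are handled; returns 0 if the path looks usable.
backup_target_ok() {
    site=$1
    target=$2
    case "$target" in
        /omd/sites/"$site"/*) return 1 ;;   # inside the site directory
        /*.tar.gz|/*.tgz)     return 0 ;;   # gzip-compressed tar archive
        *)                    return 1 ;;   # relative path or other suffix
    esac
}
```

A script could call backup_target_ok mysite /tmp/mysite.tar.gz and only run omd backup if the check succeeds.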
If the backup’s target directory is not writable for the instance user, the backup can instead be performed as root. In this case an additional argument is always required, specifying the name of the instance to be backed up:
root@linux# omd backup mysite /var/backups/mysite.tar.gz
The backup contains all of the instance’s data, except for the volatile data under tmp/. With the tar tzf command you can easily take a look at the file’s contents (the additional v in the following example shows details such as owner and permissions):
OMD[mysite]:~$ tar tvzf /tmp/mysite.tar.gz | less
lrwxrwxrwx mysite/mysite 0 2017-01-24 09:02 mysite/version -> ../../versions/2017.01.16.cee
drwxr-xr-x mysite/mysite 0 2017-01-24 09:12 mysite/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/styles/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/scripts/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/templates/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/gadgets/
9.2. Backup without history
The lion’s share of an instance’s data is the performance data retained in the RRDs. The monitoring history can also be very large. If neither of these is absolutely required, the following options omit them, making the backup smaller and faster. The options must be placed after the word backup:
--no-rrds: Omits the RRD databases (performance data)
--no-logs: Omits the monitoring history stored in the log files
-N: An abbreviation of --no-rrds --no-logs
OMD[mysite]:~$ omd backup -N /tmp/mysite.tar.gz
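For regular use, such a reduced backup is often written to a dated file. The following sketch shows one way to do that as the site user; the function name backup_without_history and the file naming scheme are assumptions made for illustration.

```shell
# Hypothetical helper, to be run as the site user: write a dated backup
# without RRDs and log files into the given directory.
backup_without_history() {
    dir=$1
    # The site user's name is also the site name; allow an override.
    site=${2:-$(id -un)}
    target="$dir/$site-$(date +%Y-%m-%d).tar.gz"
    # Options such as -N go directly after the word backup.
    omd backup -N "$target" && echo "$target"
}
```

On success the function prints the path of the archive it created, so a calling script can copy or rotate the file. The target directory should lie outside the site directory.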
9.3. Backing up a running instance
A backup does not require the instance to be stopped, and can therefore be executed while the system is running. To ensure a consistent state of the RRDs used for recording the performance data, the omd backup command automatically switches the round-robin cache to a mode in which running updates are written only to the journal, and no longer to the RRDs. The journal files are backed up last, so that as much as possible of the performance data generated during the backup is also included in it.
9.4. Restore
Restoring a backup is as simple as creating one. The omd restore command restores an instance from a backup. This is even possible as an instance user. The instance must be stopped for this procedure. The instance will not be newly created (which would require root permissions); rather, it will be completely emptied and then refilled:
OMD[mysite]:~$ omd stop
OMD[mysite]:~$ omd restore /tmp/mysite.tar.gz
Following the restore the instance can be restarted:
OMD[mysite]:~$ omd start
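The three steps (stop, restore, start) lend themselves to a small wrapper. This is a sketch only; restore_site is a hypothetical name, the function is meant to be run as the site user, and the readability check on the archive is an added precaution rather than something omd requires.

```shell
# Hypothetical helper, to be run as the site user: restore the site
# from an archive and start it again.
restore_site() {
    archive=$1
    # Check first, so the site is not stopped for a missing archive.
    if [ ! -r "$archive" ]; then
        echo "restore_site: cannot read $archive" >&2
        return 1
    fi
    # The site must be stopped before it is emptied and refilled.
    omd stop
    omd restore "$archive" || return 1
    omd start
}
```

If the restore itself fails, the function returns without starting the site, so the state can be inspected first.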
A restore can also be performed as root. If an instance with the same name already exists, it must first be deleted. This can be done either with omd rm, or by simply including the --reuse option. --kill additionally ensures that the existing instance is first stopped. It is not necessary to specify the instance’s name with restore, since this is contained in the backup:
root@linux# omd restore --reuse --kill /var/backup/mysite.tar.gz
root@linux# omd start mysite
When operating as root, you can restore the instance under a different name from the one in the backup. Include the desired alternative name as an argument following the word restore:
root@linux# omd restore mysite2 /var/backup/mysite.tar.gz
Restoring site mysite2 from /var/backup/mysite.tar.gz...
 * Converted ./.modulebuildrc
 * Converted ./.profile
 * Converted .pip/pip.conf
 * Converted etc/logrotate.conf
The long list of conversions found here has the same function as in the renaming of instances described earlier: the instance’s name is included in numerous configuration files, and these occurrences are automatically replaced by the new name.
9.5. Live migration of instances with backup & restore
The omd backup and omd restore commands can, in good old Unix tradition, also work with standard input and output instead of files. Instead of a path for the tar file, simply enter a hyphen (-).
In this way a pipe can be constructed and the data ‘streamed’ directly to another computer without requiring intermediate files. The larger the backup, the more advantageous this will be since no temporary space in the backed up server’s file system will be needed.
The following command backs up an instance to another computer using SSH:
root@linux# omd backup mysite - | ssh user@otherserver "cat > /var/backup/mysite.tar.gz"
If you prefer to reverse the SSH access, that is, to log in to the Checkmk instance from the backup server, that is also possible, as shown in the following example. For this, an SSH login as the instance user must first be permitted:
root@otherserver# ssh mysite@checkmkserver "omd backup -" > /var/backup/mysite.tar.gz
If you combine the above with an omd restore that reads the data from standard input, you can copy a complete, running instance from one server to another without needing any additional space for a backup file:
root@otherserver# ssh mysite@checkmkserver "omd backup -" | omd restore -
And now the same procedure with the SSH access reversed, this time from the source system to the target system:
root@linux# omd backup mysite - | ssh root@otherserver "omd restore -"
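One thing to watch with such pipes is that, by default, the shell reports only the exit status of the last command, so a failing omd backup could go unnoticed. The following sketch wraps the command above and turns on pipefail where the shell offers it; migrate_site is a hypothetical name, and logging in as root on the target is carried over from the example.

```shell
# Hypothetical helper, to be run as root on the source server: stream
# a site to another server without an intermediate archive file.
# The body runs in a subshell so the shell option does not leak out.
migrate_site() (
    site=$1
    target_host=$2
    # Where the shell supports it, let a failing backup fail the pipe.
    if (set -o pipefail) 2>/dev/null; then
        set -o pipefail
    fi
    omd backup "$site" - | ssh "root@$target_host" "omd restore -"
)
```

A call such as migrate_site mysite otherserver then corresponds to the command shown above, with the difference that, in shells that support pipefail, a non-zero exit status also signals a failed backup on the source side.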