Checkmk
to checkmk.com

1. OMD - The Open Monitoring Distribution

The Checkmk monitoring system uses the Open Monitoring Distribution (OMD). Founded by Mathias Kettner, OMD is an Open Source project which revolves around the user-friendly and flexible installation of a monitoring solution made up of various components. The abreviation OMD might already be familiar to you as part of the RPM/DEB-Package installation.

An OMD-based installation is distinguished by a number of characteristics:

  • The ability to run multiple monitoring sites in parallel

  • The ability to operate sites with differing versions of the monitoring software

  • An intelligent and convenient mechanism for updating the software

  • Uniform file paths — regardless of which Linux-platform is installed

  • A clear separation of data and software

  • A very simple installation — with no dependence on third-party software

  • A perfect preconfiguration of all components

OMD is managed on the command line, using the omd command — more precisely, a set of omd commands for the various actions used for the management of the monitoring sites, for example, omd create for creating a site. The most important omd commands are presented in this article.

The first command is omd help, which displays an overview of the available omd commands. You can get help for any of these commands by adding the --help option after the command, e.g. omd create --help. The two dashes before help are important here, because without them omd create help would have already created your first site with the name help.

2. Creating sites

Perhaps the best thing about OMD is that it can manage any number of monitoring sites simultaneously on a single server. Each site is a self-contained monitoring system which runs independently of the others.

A site always has a distinctive name, specified at its creation. This name is the same as that of the Linux user which is created at the same time. The site’s name conforms to the same conventions as user names under Linux.

A creation is performed with the omd create command. This must be executed as the root user:

root@linux# omd create mysite
Adding /opt/omd/sites/mysite/tmp to /etc/fstab.
Creating temporary filesystem /omd/sites/mysite/tmp...OK
Updating core configuration...
Generating configuration for core (type nagios)...
Precompiling host checks...OK
Executing post-create script "01_create-sample-config.py"...OK
Restarting Apache...OK
Created new site mysite with version 2.1.0p17.cre.

  The site can be started with omd start mysite.
  The default web UI is available at http://linux/mysite/

  The admin user for the web applications is cmkadmin with password: YzxfoFZh
  For command line administration of the site, log in with 'omd su mysite'.
  After logging in, you can change the password for cmkadmin with 'cmk-passwd cmkadmin'.

When creating the cmkadmin user a password will be randomly-generated and issued.

What takes place during the creation of a site named mysite?

  • An operating system user mysite, and a group mysite will be created.

  • A new home directory /omd/sites/mysite will be created and assigned. This directory is also referred to as the site directory.

  • This home directory will be populated with configuration files and sub-directories.

  • A basic configuration will be created for the new site.

Note: It is not possible to create a new site with a name that is already assigned on the server as the name of a 'normal' user.

2.1. User and group IDs

In some cases it is also desirable to specify the user/group ID of the new user to be created. This is performed with the -u and -g options, e.g.:

root@linux# omd create -u 6100 -g 180 mysite

An overview of the further options can be shown with omd create --help. The most important options are:

-u UID

The new user will be created with the User-ID UID.

-g GID

The new user’s group will be created with the Group-ID GID.

--admin-password mypassword

The new user is created with the password mypassword — instead of the randomly generated one.

--reuse

OMD assumes that the new user already exists, and does not create it. The home directory of this user must be located below /omd/sites/ and must be empty.

-t SIZE

The new site’s temporary file system will be created with the SIZE value. SIZE has the suffix M (Megabyte), G (Gigabyte) or % (percentage of RAM). Example: -t 4G

3. Site user

You can execute the omd commands as root user or as site user. Under root you have more possibilities. For example, only root can create a site, which is logical, because of course first a site must be created before a user can be created for it. Since you can execute a command on root that applies simultaneously to all existing sites, you must include the name of the particular site you are interested in with the omd command.

Once the new site has been created, you should run any other omd commands only as the site user. As a site user you can execute all important operations affecting this site.

Switching users is done with su:

root@linux# su - mysite

Note that the minus sign following the su is essential. It ensures that switching users processes all of the operations that take place during a normal login. In particular, all environment variables will be correctly set, and your session will start as mysite in the /omd/sites/mysite site directory:

OMD[mysite]:~$

Once you are logged in as a site user, you usually don’t need to include a site name with omd commands, since such a command is applied only to the site you are logged in to.

If you have multiple Checkmk versions installed on your Checkmk server, the corresponding OMD version is also installed with each of these versions. This can result in a long list of software versions over time. Since omd commands can also differ between versions, it is sometimes interesting to know which OMD version you are currently working with.

  • As a site user, you always use the omd commands for the Checkmk version currently installed on the site, which you can display with omd version.

  • As a root user, you execute the commands for the default version that is also used when creating a site — this is usually the latest version installed on the server. You can display the default version with omd version and change it with omd setversion.

4. Starting and stopping sites

Your site is now ready to be started — which can be done as root with omd start mysite. It is however, fundamentally better to work with the site as the site user:

OMD[mysite]:~$ omd start
Creating temporary filesystem /omd/sites/mysite/tmp...OK
Starting agent-receiver...OK
Starting mkeventd...OK
Starting rrdcached...OK
Starting npcd...OK
Starting nagios...OK
Starting apache...OK
Starting redis...OK
Initializing Crontab...OK

Unsurprisingly, stopping is done with omd stop:

OMD[mysite]:~$ omd stop
Removing Crontab...OK
Stopping redis...killing 484382...OK
Stopping apache...killing 484371...OK
Stopping nagios...OK
Stopping npcd...OK
Stopping rrdcached...waiting for termination...OK
Stopping mkeventd...killing 484279...OK
Stopping agent-receiver...killing 484267...OK
Stopping 1 remaining site processes...OK

Starting and stopping a site is nothing more than starting or stopping a collection of services. These can also be individually managed by specifying the name of the service, for example:

OMD[mysite]:~$ omd start apache
Temporary filesystem already mounted
Starting apache...OK

The names of the various services can be found in the ~/etc/init.d directory. Note the tilde (~) prefixing the path name — this represents the home directory for the site user (the site directory). ~/etc/init.d and /etc/init.d are different directories.

Alongside start and stop, there are also the omd commands restart, reload and status. Reloading Apache is, for example, always necessary following a manual change to the Apache configuration:

OMD[mysite]:~$ omd reload apache
Reloading apache

Note that this does not apply to the global Linux server’s Apache process, but rather the site’s own dedicated Apache process.

In order to be able to maintain an overview of the state of the site following all of the starts and stops, simply use omd status:

OMD[mysite]:~$ omd status
agent-receiver: stopped
mkeventd:       stopped
rrdcached:      stopped
npcd:           stopped
nagios:         stopped
apache:         running
redis:          stopped
crontab:        stopped
-----------------------
Overall state:  partially running

5. Configuring the components

As already mentioned, OMD integrates multiple software components into a monitoring system. In so doing, some components are optional, and for some there are alternatives or different operational settings. All of this can be conveniently configured with the omd config command. There are also interactive and scripting modes.

5.1. Interactive configuration

As the site user you can simply call the interactive mode with:

OMD[mysite]:~$ omd config
Main menu of `omd config`.
In the omd config menu you navigate with the cursor and enter keys

As soon as you change a setting while the site is running, OMD will inform you that your site must be stopped first and does this as needed:

Hint when changing a setting while the site is running.
The configuration can only be changed when the site is not running

Don’t forget to restart the site following the completion of the work. omd config will not do this for you automatically.

5.2. Configuration via script mode

Those who don’t like the interactive mode, or prefer to work with scripts, can set the individual settings as variables via the command line. For this there is the omd config set command. The following example sets the AUTOSTART variable to off:

OMD[mysite]:~$ omd config set AUTOSTART off

This can be also performed as root if the site’s name is added as an argument:

root@linux# omd config mysite set AUTOSTART off

The current assignment of all variables can be viewed using omd config show:

OMD[mysite]:~$ omd config show
ADMIN_MAIL:
AGENT_RECEIVER: on
AGENT_RECEIVER_PORT: 8005
APACHE_MODE: own
APACHE_TCP_ADDR: 127.0.0.1
APACHE_TCP_PORT: 5008
AUTOSTART: off
[...]

The command output above is abreviated for clarity and shows only the first entries.

5.3. Commonly used settings

There are numerous settings in omd config. The most important are:

VariableDefaultMeaning

AUTOSTART

on

Set this to off if you want to prevent an automatic starting of the site when the computer is started. This is primarily of interest for test installations that should not normally start by themselves.

CORE

nagios (Raw Edition),
cmc (Enterprise Editions)

Selection of the monitoring core. In the CEE Checkmk Enterprise Editions the Nagios core can be selected instead of the Checkmk Micro Core (CMC). The CRE Checkmk Raw Edition has only nagios as its monitoring core.

MKEVENTD

on

Activates the Event Console with which the syslog messages, SNMP traps and other events can be processed.

LIVESTATUS_TCP

off

Allows external access to the status data for this site. This can be used to set up a distributed monitoring. The status of this (remote) site can be incorporated into the central site. Enable this setting only in a secure network.

Note: You can also see these variables under the same names in the interactive mode.

6. Copying and renaming sites

6.1. Copying sites

It is sometimes useful to create a copy of a site for testing purposes or when preparing for an update. Of course one could simply copy the /omd/sites/mysite_old directory to /omd/sites/mysite_new. That will however not work as desired, because:

  • many configuration files include the site’s name,

  • in several places absolute paths which start with /omd/sites/mysite_old also appear,

  • and not least, at the operating system level there must be a user, including its associated group that owns the site and by default has the same name as the site.

To simplify copying a site, there is instead the omd cp command, which takes all of this into account. Run the command as root and simply enter the name of the existing site followed by the name of the new one. For example:

root@linux# omd cp mysite_old mysite_new

The copy can only work if:

  • the site is stopped and

  • no processes are running that belong to the site user.

Both of these ensure that the site is in a consistent state at the time of copying and does not change during the operation.

6.2. Migrating the configuration

OMD could originally only handle the files that were actually created during the creation of the site with omd create, and which also contained the site’s ID ($OMD_SITE). These files can be found in the site directory ~/etc with this command:

OMD[mysite]:~$ grep -r $OMD_SITE etc

Previously, OMD could not do anything with configuration files that were created later via work with the Checkmk site (the configurations of hosts that had been added at a later date, for example). From a purely technical point of view, this behavior corresponds exactly to the scope of OMD. However, the expectation of most users is that an omd cp creates a completely new site that can continue to be used productively — including its own monitoring configuration.

From Checkmk version 2.1.0 OMD can now also customize the most important elements of the Checkmk configuration. By the way, you don’t have to do anything, the whole migration described below takes place automatically.

A typical example: In a host’s properties you can use the Monitored on site attribute to manually specify which site this host should be monitored on, for example mysite_old. After an omd cp mysite_old mysite_new the value will change to mysite_new accordingly. (Previously this procedure would have resulted in the entry Unknown site (mysite_old)).

The actual technical implementation of this migration is as follows: OMD detects changes to the site ID and then executes the post-rename-site -v -o mysite_new command. The individual migration steps are subsequently processed completely automatically via the so-called rename actions plugins, which you can find in the Git repository at cmk/post_rename_site/plugins/actions.

Migration also includes informing you about anything that cannot be migrated automatically.

Here’s a concrete example: you are using distributed monitoring and rename both the central site and a remote site.

Central site: The sites.py plug-in detects that this is a central site and updates, among other things, the URL prefix value, which can be found in the connection settings of the local site under Setup > General > Distributed Monitoring.

Remote site: The warn_remote_site.py plug-in recognizes that it is a remote site and accordingly indicates that the central site must be checked and manually customized if necessary. This in turn means that in the distributed monitoring settings on the central site, the remote site’s new name must be entered in the connection settings to the renamed remote site — OMD of course cannot do this from a remote computer.

OMD itself informs you in detail about the whole procedure in the terminal. Here you can see an example of the migration messages from the omd cp output when renaming a central site — separated into success and warning messages. The processed rename action plugins are numbered individually. First the output from the automatically performed migration tasks (shortened here):

...
Executing post-cp script "01_cmk-post-rename-site"...
-|  1/6 Distributed monitoring configuration...
-|  2/6 Hosts and folders...
-|  3/6 Update core config...
...

The second part of the output now contains tips regarding settings you may need to configure manually (here heavily abbreviated):

...
-|  4/6 Warn about renamed remote site...
-|  5/6 Warn about new network ports...
-|  6/6 Warn about configurations to review...
...

The Warn about configurations to review…​ item includes general notes on individual aspects that will generally need to be reviewed manually during a migration, such as hardcoded filters for views:

...
-| Parts of the site configuration cannot be migrated automatically. The following
-| parts of the configuration may have to be reviewed and adjusted manually:
-|
-| - Custom bookmarks (in users bookmark lists)
-| - Hard coded site filters in custom dashboards, views, reports
-| - Path in rrdcached journal files
-| - NagVis maps or custom NagVis backend settings
-| - Notification rule "site" conditions
-| - Event Console rule "site" conditions
-| - "site" field in "Agent updater (Linux, Windows, Solaris)" rules (CEE/CME only)
-| - Alert handler rule "site" conditions (CEE/CME only)
-|
-| Done

Here is an overview of the six currently active plug-ins — the order here corresponds to the numbering in the above output:

Plug-inFunction

sites.py

Changes the site ID in various configuration files.

hosts_and_folders.py

Changes the site attribute of host and folder properties.

update_core_config.py

Updates the core configuration (cmk -U).

warn_remote_site.py

Warns when renaming a remote site.

warn_changed_ports.py

Notices of problems with multiple ports.

warn_about_not_migrated_configs.py

General tips for elements that should be checked manually.

6.3. Limiting data volume

If you are monitoring a large number of hosts with the site, the volume of data to be copied can be quite substantial. Most of this is produced by the measured values stored in the Round Robin Databases (RRD). But the log files containing historic events can also produce larger data volumes.

If the history is not required (for example, because you just want to test something quickly), these can be omitted from the copy. In such cases the following options can be added to omd cp:

--no-rrds

Copies the site without the RRDs.

--no-logs

Copies the site without log files and other historical data.

-N

Does both: -N is an abbreviation for --no-rrds --nologs.

The order of the option(s) is important:

root@linux# omd cp --no-rrds mysite_old mysite_new

6.4. Renaming sites

Renaming a site is performed with the omd mv command. This is done similarly to copying a site, has the same prerequisites and is also done including configuration migration. The options to restrict the data volume are not available since the data is only being moved to another directory and is not being duplicated.

Example:

root@linux# omd mv mysite_old mysite_new

6.5. Other options

As with creating a site, copying and renaming each creates a new Linux user. Therefore omd cp and omd mv also have some of the same options as omd create, e.g. to specify user and group IDs. For more detailed information, use the omd cp --help and omd mv --help commands.

7. Showing changes in configuration files

When creating a site, the omd create command fills the ~/etc directory with numerous predefined configuration files. A number of directories will also be created under ~/var and ~/local.

Now it will probably be the case that over the course of time a number of the files will have been customized. When after a time you wish to determine which files are no longer in the condition as originally supplied, the omd diff command can provide the answer. Amongst other things, this is useful before an update of Checkmk, since your changes could conflict with changes in the default files.

When called without further arguments, all changed files below the current directory will be listed:

OMD[mysite]:~$ omd diff
 * Changed content var/check_mk/wato/auth/auth.php
 ! Changed permissions var/check_mk/wato/auth/auth.php
 * Changed content etc/htpasswd
 * Changed content etc/diskspace.conf
 ! Changed permissions etc/diskspace.conf
 * Changed content etc/auth.secret
 * Changed content etc/mk-livestatus/xinetd.conf
 * Changed content etc/omd/allocated_ports
 * Changed content etc/apache/apache.conf
 * Deleted etc/apache/apache-own.conf

You can also enter a query for a specific directory:

OMD[mysite]:~$ omd diff etc/apache
 * Changed content etc/apache/apache.conf
 * Deleted etc/apache/apache-own.conf

If you wish to see the changes in detail, simply enter the path to the file:

OMD[mysite]:~$ omd diff etc/apache/apache.conf
74,75c74,75
< ServerLimit 64
< MaxClients 64
---
> ServerLimit 128
> MaxClients 128

8. Updating sites

The omd update command is used to update the monitoring software installed on the site to a later version. This is presented in detail in the Updating Checkmk article. Other useful omd commands related to software updates are also shown there as examples:

  • omd versions to list all installed software versions,

  • omd sites to list all existing sites with the versions installed on them,

  • omd version to display the default version used when creating a site,

  • omd setversion to set a different default version.

By the way, omd update is also used to upgrade to another edition, e.g. from the Free Edition to the Standard Edition.

9. Backing up and restoring sites

9.1. Creating a backup

The site management in Checkmk has a built-in mechanism for backing up and restoring Checkmk sites. The omd backup and omd restore commands are the basics for packing all of the site’s data into a tar archive, and respectively, extracting that data for a restore.

Note: Checkmk also offers the possibility of performing backups and restores without using the command line, via the GUI under Setup > Maintenance > Backups. There you can also create encrypted backups and scheduled backup jobs. See the Backups article to learn how to do this.

Backing up a site with omd backup does not require root permissions. A site user can perform this. Simply enter as an argument the name of the backup file to be created:

OMD[mysite]:~$ omd backup /tmp/mysite.tar.gz

Note that:

  • The created file type is a gzip-compressed tar archive. Therefore use .tar.gz or .tgz as the file extension.

  • Do not store the backup in the site directory, since this will of course be completely backed up – thus every subsequent backup will contain a copy of all of its predecessors.

  • If you create the backup as a site user, only the site user and their group will get read and write access to the tar archive.

If the backup’s target directory is not writable for a site user, you can also do the backup as root. In this case an additional argument is required, as always, specifying the site name to be backed up:

root@linux# omd backup mysite /var/backups/mysite.tar.gz

The backup contains all of the site’s data — except for the volatile data under ~/tmp/. With the tar tzf command one can easily have a look at the file’s contents:

OMD[mysite]:~$ tar tvzf /tmp/mysite.tar.gz  | less
lrwxrwxrwx mysite/mysite     0 2022-07-25 11:59 mysite/version -> ../../versions/2.1.0p8.cre
drwxr-xr-x mysite/mysite     0 2022-07-25 17:25 mysite/
-rw------- mysite/mysite   370 2022-07-26 17:09 mysite/.bash_history
-rw-r--r-- mysite/mysite  1091 2022-07-25 11:59 mysite/.bashrc
-rw-r--r-- mysite/mysite    63 2022-07-25 11:59 mysite/.modulebuildrc
-rw-r--r-- mysite/mysite  2066 2022-07-25 11:59 mysite/.profile
drwxr-xr-x mysite/mysite     0 2022-07-25 11:59 mysite/.version_meta/
drwxr-xr-x mysite/mysite     0 2022-07-20 11:40 mysite/.version_meta/skel/
-rw-r--r-- mysite/mysite  1091 2022-06-26 02:03 mysite/.version_meta/skel/.bashrc
-rw-r--r-- mysite/mysite    52 2022-07-20 09:02 mysite/.version_meta/skel/.modulebuildrc
-rw-r--r-- mysite/mysite  2055 2022-06-26 02:03 mysite/.version_meta/skel/.profile
drwxr-xr-x mysite/mysite     0 2022-07-20 11:40 mysite/.version_meta/skel/etc/
drwxr-xr-x mysite/mysite     0 2022-07-20 11:40 mysite/.version_meta/skel/etc/apache/
-rw-r--r-- mysite/mysite  1524 2022-06-26 02:03 mysite/.version_meta/skel/etc/apache/apache-own.conf

9.2. Backup without history

The lion’s share of the data to be moved during a site backup are the measured values and the log files with historical events. This is just as true when backing up as when copying a site. If you do not absolutely need this data, you can omit it and thus make the backup much faster and the resulting output file much smaller.

omd backup provides the same options to omit this data as omd cp does when copying. In the following example, the backup is created without measurement data and without the history stored in the log files:

OMD[mysite]:~$ omd backup -N /tmp/mysite.tar.gz

9.3. Backing up a running site

A backup can also be created from a running site. To ensure a consistent state of the Round Robin Databases (RRD) used for recording the measurement data, the omd backup command automatically alters the Round Robin cache to a mode with which the running updates are written only to the journal, and no longer to the RRDs. The journal files are the last to be backed up — thus it can be achieved that as much as possible of the measurement data that has been generated during the backup is also included in the backup.

9.4. Restore

Restoring a backup is as simple as creating a backup. The omd restore command restores a site from a backup — in the Checkmk version that was used to backup the site. Therefore, for the restore to work, this same version must be installed on the server.

The site is completely emptied and refilled. Before omd restore the site must be stopped and afterwards it must be restarted:

OMD[mysite]:~$ omd stop
OMD[mysite]:~$ omd restore /tmp/mysite.tar.gz
OMD[mysite]:~$ omd start

A restore can also be performed by a`root` user. Unlike when called by the site user, the site will be recreated with the backup.

So if there is still a site with the same name, you will need to delete it before the restore. This can be performed either with an omd rm, or by simply including the --reuse option with the omd restore. A --kill additionally ensures that the already existing site is stopped before the restore proceeds. You do not need to specify the site name in the command, because it is contained in the backup:

root@linux# omd restore --reuse --kill /var/backup/mysite.tar.gz
root@linux# omd start mysite

As root, you can also restore a site with a different name from that in the backup. To do this, specify the desired name as an argument after the word restore:

root@linux# omd restore mysite2 /var/backup/mysite.tar.gz
Restoring site mysite2 from /tmp/mysite.tar.gz...
 * Converted ./.modulebuildrc
 * Converted ./.profile
 * Converted etc/xinetd.conf
 * Converted etc/logrotate.conf

The long list of conversions that happen here has the same function as for copying and renaming sites described earlier. The site name is included in numerous configuration files, and with this procedure any such occurrences will be replaced automatically by the new name.

9.5. Live backup & restore to another server

The omd backup and omd restore commands can — in good old Unix tradition — also work via standard input/output instead of files. Instead of a path for the tar file, simply enter a hyphen (-).

In this way a pipe can be constructed and the data ‘streamed’ directly to another computer without requiring intermediate files. The larger the backup, the more advantageous this will be since no temporary space in the backed up server’s file system will be needed.

The following command backs up a site to another computer using SSH:

root@linux# omd backup mysite - | ssh user@otherserver "cat > /var/backup/mysite.tar.gz"

If you want to reverse the SSH access, i.e. prefer to connect from the backup server to the Checkmk site, this is also possible, as the following example shows. For this, first an SSH login as a site user must be permitted.

root@otherserver# ssh mysite@checkmkserver "omd backup -" > /var/backup/mysite.tar.gz

If you are clever, and combine the above with an omd restore which reads the data from the standard input, you can copy a complete, running site from one server to another — and without needing any additional space for a backup file:

root@otherserver# ssh mysite@checkmkserver "omd backup -" | omd restore -

And now again the whole thing with reversed SSH access — this time again from the source system to the target system:

root@linux# omd backup mysite - | ssh user@otherserver "omd restore -"

10. Deleting sites

Deleting a site is as easy as creating one — with the omd rm command as root. The site will first be automatically stopped.

root@linux# omd rm mysite
PLEASE NOTE: This action removes all configuration files
             and variable data of the site.

In detail the following steps will be done:
- Stop all processes of the site
- Unmount tmpfs of the site
- Remove tmpfs of the site from fstab
- Remove the system user <SITENAME>
- Remove the system group <SITENAME>
- Remove the site home directory
- Restart the system wide apache daemon
 (yes/NO): yes

Attention: It goes without saying that this action also deletes all of the site’s data!

If you are no fan of confirmation prompts, or wish to perform the deletion as part of a script, the deletion can be forced with the -f option. Attention: Here the -f must be placed before the rm:

root@linux# omd -f rm mysite

11. Files and directories

PathDescription

/omd/sites/mysite

Site directory for the site mysite.

~/etc/

The site’s configuration files are stored in this directory.

On this page