1. Introduction
In Checkmk you configure parameters for hosts and services by using rules. This feature makes Checkmk very effective in complex environments, and also brings a number of advantages to smaller installations. In order to clarify the principle of rule-based configuration we will compare it to the classic method.
1.1. The classic approach
As an example, let’s take the configuration of the thresholds for WARN and CRIT in the monitoring of file systems. With a database-oriented configuration, one would enter a line into a table for each file system:
Host | File system | Warning | Critical |
---|---|---|---|
myserver001 | | 90 % | 95 % |
myserver001 | | 90 % | 95 % |
myserver001 | | 90 % | 95 % |
myserver002 | | 85 % | 90 % |
myserver002 | | 85 % | 90 % |
myserver002 | /sapdata | 85 % | 95 % |
 | /var/trans | 100 % | 100 % |
This is relatively straightforward — but only because the table in this example is short. In practice there tend to be hundreds or thousands of file systems. Tools like copy & paste, and bulk operations can simplify the work, but the basic problem remains — how can you identify and implement a standard policy? What is the general rule? How should thresholds for future hosts be preset?
1.2. Rules-based is better!
A rules-based configuration, by contrast, consists of the policy itself!
We will replace the logic of the above table with a set of four rules. If we assume that myserver001 is a test system, and that in each case the first relevant rule applies to every file system, the result will be the same thresholds as in the table above:
File systems with the mount point /var/trans have a 100/100 % threshold.
The /sapdata file system on myserver002 has an 85/95 % threshold.
File systems on test systems have a 90/95 % threshold.
All (unspecified) file systems have an 85/90 % threshold.
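The four rules above can be modeled in a few lines of Python. This is only an illustrative sketch of the "first relevant rule applies" principle, not Checkmk's internal data model: the names `Rule`, `RULES`, `HOST_TAGS` and `levels_for` are hypothetical, as is the tag name `test`.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Levels = Tuple[int, int]  # (warning %, critical %)


@dataclass(frozen=True)
class Rule:
    description: str
    matches: Callable[[str, str, FrozenSet[str]], bool]
    levels: Levels


# Rules in priority order; the first matching rule wins.
RULES = [
    Rule("/var/trans on any host",
         lambda host, mount, tags: mount == "/var/trans", (100, 100)),
    Rule("/sapdata on myserver002",
         lambda host, mount, tags: host == "myserver002" and mount == "/sapdata", (85, 95)),
    Rule("test systems",
         lambda host, mount, tags: "test" in tags, (90, 95)),
    Rule("all other file systems",
         lambda host, mount, tags: True, (85, 90)),
]

# Assumption from the text: myserver001 is a test system.
HOST_TAGS = {
    "myserver001": frozenset({"test"}),
    "myserver002": frozenset(),
}


def levels_for(host: str, mount: str) -> Levels:
    """Return the levels of the first rule that matches this file system."""
    tags = HOST_TAGS.get(host, frozenset())
    for rule in RULES:
        if rule.matches(host, mount, tags):
            return rule.levels
    raise LookupError("no rule matched")  # unreachable while a catch-all rule exists
```

A host that is added later and carries no tags automatically receives the 85/90 % default, which is exactly the "policy for future hosts" that the table-based approach cannot express.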
Granted, for only two hosts that doesn’t achieve much, but with only a few more hosts it can quickly make quite a big difference. The advantages of the rules-based configuration are obvious:
The policy is clearly recognizable and can be reliably implemented.
You can change the policy at any time without needing to handle thousands of data sets.
Exceptions are always still possible, but are documented in the form of rules.
The incorporation of new hosts is simple and less fault-prone.
In summary, then: less work — more quality! For this reason, with Checkmk you will find an abundance of rules for customizing hosts and services — such as thresholds, monitoring settings, responsibilities, notifications, agent configuration and many more.
1.3. Types of rule sets
Within Setup Checkmk organizes rules in rule sets. Every rule set has the task of defining a specific parameter for hosts or services. Checkmk contains more than 700 rule sets! Here are some examples:
Host check command — defines how to determine whether hosts are UP.
Alternative display name for services — defines alternative names for services’ displays.
JVM memory levels — sets thresholds and other parameters for monitoring the memory usage of Java virtual machines (VM).
Every rule set is responsible either for hosts or for services, never for both. If a parameter can be defined for hosts as well as services, there is a pair of applicable rule sets, e.g., Normal check interval for host checks and Normal check interval for service checks.
A few rule sets, strictly speaking, don’t actually define parameters but rather create services. One example is the rules for active checks, which can be found at Setup > Services > HTTP, TCP, Email, …. With these you can, e.g., set up an HTTP check for specific hosts. These rules are classified as host rules, because if such a check exists on a host it is deemed to be a property of the host.
Further, there are rule sets that control the service discovery. With these you can, for example, via Windows service discovery define for which Windows services automatic checks should be created if they are found on a system. These are also host rules.
The bulk of the rule sets determine parameters for specific check plug-ins. An example is Network interfaces and switch ports. The settings in these rules are tailored very specifically to their respective plug-in. Such rule sets only apply to services that are based on this plug-in. If you are uncertain which rule set is responsible for which services, the best way to find out is to navigate directly from the service to the relevant rule. How to do this will be explained later.
1.4. Host tags
One thing we have so far not mentioned: In the above example there is a rule for all test systems. Where is it actually defined that a host is a test system?
In Checkmk, something like test system is known as a host tag. You can see which tags are available via Setup > Hosts > Tags. Some tags are already predefined, for example Test system, which is defined in the Criticality group.
Applying tags to hosts is done either explicitly in the properties of the host, or through inheritance in the folder hierarchy. How to do this is explained in the article on hosts. How to create your own tags, and what the predefined tags are about will be explained in the article on host tags.
2. Determining the correct rule sets
2.1. Host rule sets
If you wish to create a new rule that defines a parameter for one or more hosts, there are several ways to do so. The direct way is via the corresponding group in the Setup menu, in this case Setup > Hosts > Host monitoring rules:
In the following view, all rule sets relevant for host monitoring are displayed. The numbers following the names of these rule sets show the number of rules that have already been defined:
However, you can reach your goal somewhat faster via the search field. To do this, of course, you need to know approximately what the rule set is called. Here is the result of a search for host checks as an example.
Another way is via the menu item Hosts > Effective parameters in the properties of an existing host in the Setup or via the icon in the list of hosts of a folder.
There you will find not only all the rule sets that affect the host, but also the parameter currently effective for this host. In the example of Host check command no rule applies for the shown host, and it is therefore set to the Smart PING (only with Checkmk Micro Core) default value of the commercial editions. In the Raw Edition the default value is PING (active check with ICMP echo request).
Click on Host check command in order to see the complete rule set.
If a rule already exists, instead of the Default value the number of the rule defining this parameter appears.
Clicking on this takes you directly to the rule.
2.2. Service rule sets
The path to the rule sets for services is similar. The general access is via the Setup menu, in this case Setup > Services > Service monitoring rules or, more conveniently, via the search field.
If you are not yet very experienced with the names of the rule sets, then the path via the service is simpler. Similarly to the hosts, there is also a page in which all of a service’s parameters are shown and where you have the possibility of directly accessing the applicable rule sets. You can access this parameter page with the icon in a host’s list of services in the Setup. The icon takes you directly to the rule set that defines the parameter for the check plug-in for this service.
By the way — the icon for the parameter page is also found in the monitoring in every service’s action menu:
2.3. Enforced services
In the Setup menu you will also find an entry for Enforced Services. As the name suggests, you can use these rule sets to force services to be created on your hosts. Details can be found in the article about services. A small number of rule sets — such as Simple checks for BIOS/Hardware errors — can only be found under the enforced services. These are services which do not result from the service discovery, but are created manually by you.
2.4. Rule sets in use
In each of the aforementioned lists of rule sets — whether in the Host monitoring rules or the Service monitoring rules — you can use Related > Used rulesets in the menu bar, to display only the rule sets in which you have defined at least one rule. This is often a convenient way to get started if you want to make adjustments to your existing rules. Incidentally, some of the rules will have been generated by default when creating the Checkmk site and are a part of the sample configuration. These are also displayed here.
2.5. Ineffective rules
Monitoring is a complex matter. It can happen that there are rules which do not match a single host or service — either because you have made a mistake or because the matching hosts and services have disappeared. Such ineffective rules can be found in the aforementioned rule set listings via Related > Ineffective rulesets in the menu bar.
2.6. Obsolete rule sets
Checkmk is under constant development. Occasionally things are standardized, and some rule sets may be replaced by others. If you have such rule sets in use, the easiest way to find them is to do a rule search. Open it via Setup > General > Rule search. Then click in the menu bar on Rules > Refine search, select Search for deprecated rulesets as the option for Deprecated and select Search for rule sets that have rules configured as the option for Used. After an additional click on Search you get the desired overview.
3. Creating and editing rules
The following image shows the Filesystems (used space and growth) rule set with four rules configured:
New rules are created either with the Create rule in folder button, or by cloning an existing rule with the clone icon. Cloning creates an identical copy of the rule that you can then edit. A new rule created using the Create rule in folder button will always appear at the end of the list of rules, whereas a cloned rule will be displayed as a copy below the original rule from which it was cloned.
The sequence in which the rules are listed can be changed using the corresponding button. The sequence is important because rules positioned higher in the list always have priority over those located lower.
The rules are stored in the same folders from which you also manage the hosts. A rule’s scope is restricted to the hosts in its folder or in subfolders. In the case of conflicting rules, the rule lower in the folder structure has priority. In this way, for example, users with rights limited to certain authorized folders can create rules for their hosts without affecting the rest of the system. In a rule’s properties you can change its folder and thus ‘relocate’ it.
3.1. The analysis mode with ‘traffic lights’
When you access a rule set for a host or service in Setup, Checkmk will show you this rule set in the analysis mode. You can get there by clicking on the icon in the action menu in the Setup in the host or service list. The following Effective parameters of page shows the list of rules that apply to the host/service. To go to the analysis mode, click on the name of a rule set for which at least one rule exists, i.e. a set which is not set to the Default value:
This mode has two features. Firstly, a second button for setting rules appears: Add rule for current host or Add rule for current host and service.
With this you can create a new rule which has the appropriate current host or service already preselected. You can create an exceptional rule very easily and directly in this way. Secondly, a ‘traffic light’ icon appears in every line, the color of which shows whether and/or how this rule affects the current host, or respectively, service. The following conditions are possible:
This rule has no effect on the current host or service.
This rule matches and defines one or more parameters.
The rule matches, but because another rule higher in the hierarchy has priority, this rule is ineffective.
This rule matches. Another rule higher in the hierarchy has priority but doesn’t define all parameters, so at least one parameter is defined by this lower rule.
The last condition, in which the rule is a partial match, can only occur for rule sets in which a rule can define multiple parameters by selecting individual check boxes. In theory, each of these parameters can be set by a different rule. More on this later.
4. Rule characteristics
Each rule consists of three blocks. The first block contains general information about the rule, such as the rule’s name. The second block defines what the rule is supposed to do, i.e. which actions it is to perform. The third block contains the information on what, i.e. on which hosts or services, the rule is to be applied.
4.1. Rule properties
Everything in the first block, Rule Properties, is optional, and serves primarily for documentation:
The Description will be shown in the table of all rules in a rule set.
The Comment field can be used for a longer description. It only appears in a rule’s edit mode. Via the icon you can insert a date stamp and your login name in the text.
The Documentation URL is intended for a link to internal documentation that you maintain in another system (e.g., a CMDB). It will appear as the clickable icon in the rules table.
With the Do not apply this rule check box you can temporarily disable this rule. It will then be flagged with a corresponding icon in the table and is thus ineffective.
4.2. The defined parameters
The second section is different for each rule, but always specifies what should be done. The following illustration shows a widely-used type of rule (DB2 Tablespaces). You can use checkboxes to determine which individual parameters the rule should define. As described above, Checkmk determines which rule defines each individual parameter separately. The rule from the illustration therefore only defines the one value and leaves all other settings unaffected:
You can also control the values in this and other rules on a time/calendar basis. For example, you can set threshold values so that tablespace usage during business hours differs from that on weekends.
First click the Enable timespecific parameters button, then click Add new element. You will then see the time-dependent options:
Now select a time period in the Match only during time period list, and then select the parameters for which this time period should apply.
Some of the rule sets do not set a parameter, but only decide which hosts are in and which are not. An example is the rule set Hosts to be monitored, whose parameter range looks like this:
By selecting one of the two available values, you decide what to do with the affected hosts. Selecting Positive match (Add matching hosts to the set) will add the affected hosts to the set of hosts to be monitored. Selecting Negative match (Exclude matching hosts from the set) removes the affected hosts from the monitoring. The Positive match or Negative match refers to the content of the current rule. It is not an additional filter criterion for selecting hosts. You filter the set of affected hosts exclusively with the following Conditions.
4.3. Conditions
In the previous section, you defined how all those hosts or services that are affected by your rule are to be processed. In the third section Conditions you now define which hosts or services are to be acted on by the rule — and thus its effects. There are different types of conditions that must all be fulfilled for the rule to take effect. The conditions are therefore logically AND-linked:
Condition type
Here you have the option of using normal conditions as well as predefined conditions. These are managed via Setup > General > Predefined conditions. Here you simply give fixed names to the rule matches that you need again and again, and from then on simply refer to them in the rules. You can even change the content of these conditions centrally later, and all the rules will be automatically adjusted to suit. In the following example the predefined condition No VM has been selected:
Folder
With the Folder condition you define that the rule only applies to hosts in this folder — or a subfolder. If the setting is Main, this condition is applicable to all hosts. As described above, the folders have an effect on the rule’s sequence. Rules in lower folders always have priority over higher ones.
Host tags
Host tags restrict rules to hosts according to whether they have — or do not have — specific host tags. Here as well, AND-links are always used. Every other host tag condition in a rule reduces the number of hosts affected by the rule.
If you wish to make a rule applicable for two possible values of a tag (e.g., for Criticality both Productive system and Business critical), you cannot do this with a single rule. You will require a copy of the rule for each variant. Sometimes a negation can also help here: you can define as a condition that a tag is not present (e.g., not Test system). The so-called auxiliary tags are another possibility.
Because some users really use many host tags, we have designed this dialog so that not all host tag groups are displayed by default. You have to specifically select the one needed for the rule. It works like this:
In the selection box choose a host tag group.
Click Add tag condition — an entry for this group will then be added.
Select is or is not.
Select the desired tag as a comparison value.
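The way tag conditions combine can be sketched as follows. This is a simplified model under stated assumptions, not Checkmk code: the function `tags_match`, the tuple layout and the tag IDs (`criticality`, `prod`, `test`, `networking`, `lan`, `wan`) are hypothetical and only illustrate the AND-linking of is / is not conditions.

```python
from typing import Dict, List, Tuple

# One condition per tag group: (tag group, operator, tag value).
TagCondition = Tuple[str, str, str]


def tags_match(host_tags: Dict[str, str], conditions: List[TagCondition]) -> bool:
    """All conditions must hold (logical AND); an empty list matches every host."""
    for group, operator, value in conditions:
        if operator not in ("is", "is not"):
            raise ValueError(f"unknown operator: {operator}")
        has_tag = host_tags.get(group) == value
        # "is" requires the tag to be present, "is not" requires it to be absent.
        if has_tag != (operator == "is"):
            return False
    return True
```

Because every additional condition can only turn a match into a non-match, each further tag condition narrows the set of affected hosts. A single "is not Test system" condition covers all remaining values of the group, which is why a negation can sometimes replace several copies of a rule.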
Labels
You can also use the labels for conditions in rules. Include conditions with Add label condition — choose either has or has not to formulate a positive or negative condition, and then enter the label in the usual form key:value. Pay attention to the exact spelling including case-sensitivity here — otherwise the condition will not work correctly.
Note: If you have defined neither Host tags nor Host labels, the rule in question will always be applied to all hosts or services. If you have created several rules, subsequent rules may no longer be evaluated, see Types of rule analysis.
Explicit hosts
This type of condition is intended for exception rules. Here you can list one or more host names. The rule will apply only to these hosts. Note that if you check the Explicit hosts box but enter no hosts, then the rule will be completely ineffective.
Via the Negate option you can define a reverse exception. With this you can exclude explicitly named hosts from the rule. The rule will then apply to all hosts except the ones mentioned here.
Important: All host names entered here will be checked for an exact match. Checkmk is fundamentally case-sensitive in host names!
You can change this behavior to regular expressions by prefixing host names with a tilde (~). In this case, as always in the Setup:
The match is applied to the beginning of the host name.
The match is not case-sensitive.
A dot followed by an asterisk (.*) in a regular expression matches an arbitrary sequence of characters.
The following example shows a condition which all hosts will match whose names contain the character sequence my (or My, MY, mY etc.):
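The matching behavior described above can be approximated with Python's `re` module. The function `host_matches` is a hypothetical helper written for this illustration. It only mirrors the rules stated in the text (exact comparison for plain entries; prefix match without case sensitivity for entries starting with a tilde) and makes no claim about how Checkmk implements them.

```python
import re
from typing import List


def host_matches(host_name: str, entries: List[str], negate: bool = False) -> bool:
    """Check a host name against an Explicit hosts list as described in the text."""
    matched = False
    for entry in entries:
        if entry.startswith("~"):
            # Regular expression: anchored at the start, not case-sensitive.
            if re.match(entry[1:], host_name, re.IGNORECASE):
                matched = True
                break
        elif entry == host_name:
            # Plain entry: exact, case-sensitive comparison.
            matched = True
            break
    # With Negate, the rule applies to all hosts except the listed ones.
    return matched != negate
```

Under these assumptions, an entry such as `~.*my` would match any host whose name contains my in any capitalization, whereas `~my` would only match names that begin with it.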
Explicit services
For rules that are applicable to services there is a last type of condition that defines a match on a service’s name, or respectively — for rules that set check parameters — the check item’s name. With what exactly the match will be made can be seen in the caption. In our example it is the name (Instance) of a Tablespace:
A match with regular expressions fundamentally applies here. The sequence .*temp matches all tablespaces containing temp because the match is always applied to the start of the name. The dollar sign at the end of transfer$ represents the end and thereby forces an exact match. A tablespace with the name transfer2 will thus not match.
Don’t forget: for rules concerning Explicit services a match with the service name is required (e.g. Tablespace transfer). For check parameter rules a match with the item applies (e.g. transfer). The item is in fact the variable part of the service name, and determines to which tablespace it applies.
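The prefix-match semantics can be tried out with Python's `re.match`, which likewise anchors a pattern at the start of the string but not at its end. The helper `pattern_matches` is hypothetical, and Python's regular expression engine serves only as a stand-in here; details may differ from the engine Checkmk uses.

```python
import re


def pattern_matches(pattern: str, name: str) -> bool:
    """Prefix match: the pattern is anchored at the start of the name only."""
    return re.match(pattern, name) is not None
```

With this helper, `transfer` matches both transfer and transfer2, while `transfer$` matches only transfer. The pattern `temp` alone does not match mytemp, which is why the leading `.*` is needed to find temp anywhere in the name.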
Incidentally, there are services without an item. An example is CPU load. This exists only once for each host, so no item is required. It follows that rules for such check types do not have this condition either.
5. Rule analysis
Now we have described how rules are created. However, simply creating rules is not enough. As shown by the example in the Rules-based is better! section at the start of this article, a single rule is not sufficient to achieve the desired result. A more complex system of logically-sequenced rules is required for this. For that reason an understanding of how multiple rules interact also becomes important.
5.1. Types of rule analysis
In the introduction to the concept of rules, you saw that the first rule that applies always determines the final outcome. This is not the whole truth. There are a total of three different types of evaluation:
Evaluation | Action |
---|---|
First rule | The first matching rule defines the value. Further rules are not evaluated. This is the normal case for rules that define simple parameters. |
First rule per parameter | Each single parameter is defined by the first rule where that parameter is defined (checkbox ticked). This is the normal case for all rules with sub-parameters that are activated with checkboxes. |
All rules | All matching rules will add elements to the resulting list. This type is used, for example, when matching hosts and services to host, service and contact groups. |
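The three evaluation types can be contrasted in a short Python sketch. Each function receives the values of the rules that already match a given host or service, in rule order. The function names and the parameter keys used in the usage note (`levels`, `trend_range`) are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, Optional


def first_rule(matching_values: List[Any]) -> Optional[Any]:
    """'First rule': the first matching rule defines the value."""
    return matching_values[0] if matching_values else None


def first_rule_per_parameter(matching_values: List[Dict[str, Any]]) -> Dict[str, Any]:
    """'First rule per parameter': each key comes from the first rule that sets it."""
    result: Dict[str, Any] = {}
    for params in matching_values:
        for key, value in params.items():
            result.setdefault(key, value)  # keep the value of the earlier rule
    return result


def all_rules(matching_values: List[List[Any]]) -> List[Any]:
    """'All rules': every matching rule contributes its elements to the result."""
    result: List[Any] = []
    for elements in matching_values:
        result.extend(elements)
    return result
```

For example, if a higher rule sets only `levels` and a lower rule sets both `levels` and `trend_range`, `first_rule_per_parameter` takes `levels` from the higher rule and `trend_range` from the lower one. This corresponds to the partial match described for the analysis mode.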
The information on how the rule is evaluated is displayed for each rule set directly below the menu bar:
5.2. Rule evaluation explained in practice
So how exactly are rules evaluated when you have created several rules that apply to several hosts? To illustrate this, let’s take a simple example:
Let’s say you have three hosts and you want to set different periodically-repeated notifications for each of them (and also for all hosts added in the future) with the Periodic notifications during host problems rule:
Rule A: Host-1 every 10 minutes
Rule B: Host-2 every 20 minutes
Rule C: all hosts every 30 minutes (general rule to cover both Host-3 and any hosts added in the future).
If you now activate your configuration, Checkmk runs through the chain of rules from top to bottom. This results in the following evaluation:
Rule A applies to Host-1. The notification for Host-1 takes place every 10 minutes. This completes the processing for Host-1.
Rule A does not apply to Host-2. We continue with rule B. This applies to Host-2 so that Host-2 is notified every 20 minutes. This completes the processing for Host-2.
Rule A does not apply to Host-3, neither does Rule B. But rule C fits and is applied: notification for Host-3 is at 30-minute intervals. This also completes the processing for Host-3.
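The walk through the rule chain can be reproduced with a compact sketch. The tuple layout and the names `RULE_A`, `RULE_B`, `RULE_C` and `interval_for` are assumptions made for this illustration and do not reflect how Checkmk stores rules.

```python
from typing import List, Optional, Tuple

# (rule name, host the rule is restricted to or None for all hosts, interval in minutes)
NotificationRule = Tuple[str, Optional[str], int]

RULE_A: NotificationRule = ("A", "Host-1", 10)
RULE_B: NotificationRule = ("B", "Host-2", 20)
RULE_C: NotificationRule = ("C", None, 30)


def interval_for(host: str, chain: List[NotificationRule]) -> Optional[int]:
    """Walk the chain from top to bottom and stop at the first rule that applies."""
    for _name, only_host, minutes in chain:
        if only_host is None or only_host == host:
            return minutes
    return None
```

Called with the chain `[RULE_A, RULE_B, RULE_C]`, the function yields 10, 20 and 30 minutes for Host-1, Host-2 and Host-3. Since the loop returns at the first match, the result depends on the order of the list that is passed in.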
Please note here: Since 'The first matching rule defines the parameter' applies to this rule set, the processing of the rule chain is always terminated after the first match. The order of the rules is therefore decisive for the result! This becomes apparent when the order of the rules is changed and rules B and C are swapped:
Rule A: Host-1 every 10 minutes
Rule C: all hosts every 30 minutes
Rule B: Host-2 every 20 minutes
If the rule chain is now run through again from top to bottom for the individual hosts, the result also changes: Rule C now applies not only to Host-3, but also to Host-2, so that the notification for both hosts takes place every 30 minutes. This completes the processing for both hosts. Although Rule B would be relevant for Host-2, and was even written for this host, it will no longer be evaluated and applied. In the analysis mode, the process will then look like this:
By combining the various settings explained in this article — keeping in mind the correct processing order — you can use them to build complex rule chains for entire host complexes.