1. Mode of operation
IT-operations distinguish between two types of outage: planned and unplanned. The monitoring system intially cannot know if a detected outage was planned or not. With the concept of scheduled downtimes the system can be informed of a host’s or service’s planned outages by defining a scheduled downtime for the corresponding object. If a host or service is in such a maintenance period, it has the following effects:
A symbol appears in the views for the relevant host/service.
Problem notifications are deactivated during the downtime.
The affected hosts/services are not identified as having a problem in the Overview.
Scheduled downtimes are specially taken into account in the availability analysis.
For information a special alarm will be triggered at the start and end of a scheduled downtime.
2. Entering scheduled downtimes
Defining scheduled downtimes is achieved via commands. All actions pertaining to scheduled downtimes are available here in a single menu:
The Downtime Comment field must always be filled out. You can include a URL, such as http://www.example.com in this field, which will be replaced by a clickable link. There are multiple ways to define start and end time ranges. From the simple 2 hours, which defines the downtime as starting from now, to the entry of an explicit time range where a future maintenance can also be defined.
2.1. Flexible with max. duration
This option is useful when, e.g., you know that a host will enter a DOWN state for a few minutes, but the exact time of the event cannot be predicted. With this option the scheduled downtime does not begin automatically at a nominated time, rather first when a real Problem status appears for the host.
Example: You define a scheduled downtime as being from 14:00 to 16:00, and activate the flexible option with a duration of 30 minutes. At 14:00 the scheduled downtime will not activate, but rather is in a standby position. As soon as the host enters a DOWN or UNREACH state, the scheduled downtime (30 min) will begin and the ‘blue moon’ will appear. This will remain so for the duration of the time nominated in the option, regardless of the actual status of the host, and if need be beyond the end-time specified for the downtime.
Therefore with flexible scheduled downtimes the start/end time is only the time window in which the scheduled downtime can begin. If no problem status occurs within this time window the scheduled downtime will simply be skipped. These conditions of course also apply for services.
2.2. Also set downtime on child hosts
This option is useful for routers and switches, and also e.g., for virtualisation hosts. In this way Checkmk will also automatically set a scheduled downtime on all directly-connected hosts, and also on indirectly-connected hosts if the Do this recursively box is selected.
2.3. Repeat this downtime on a regular basis…
Here you can enter scheduled downtimes that repeat regularly. This will be explained in more detail in a separate section below.
2.4. Schedule downtimes on hosts
For example, if you set downtimes in the Services of Host view, the scheduled downtime would actually apply directly to all or all selected services. If this is not desired, you do not have to navigate to the Status of Host view first, but can use this option to set the downtime directly to the host.
3. Displaying and deleting scheduled downtimes
Scheduled downtimes have their own view in Checkmk — this is accessed via Views > Other > Downtimes:
As in every view, you can narrow the selection with the filter. With the commands, in this view you can remove one or more downtime(s), and even alter them retroactively (only in the Checkmk Enterprise Editions), e.g., if the times need to be extended when the downtime is proving to be longer than anticipated.
The Monitor > History > Downtime history view does not display the current scheduled downtimes, rather their histories — thus all events with which a scheduled downtime began or ended (with a natural end or via a delete command).
4. Regularly-scheduled downtimes
Some maintenance is performed regulary — a once-weekly automatic restart of a server, for example. Manually entering a scheduled downtime for each occasion would be time-consuming. If you would only like to silence the notifications, you could configure time periods and the Notification period for Hosts/Services rules set. These have various restrictions however — one important restriction being that global configuration permissions are required for setting time periods.
For this purpose the Checkmk Enterprise Editions offer the concept of automatic, periodically-recurring, scheduled downtimes. These can be set in two different ways.
4.1. Setting via commands
The first method works in the manner we are already familiar with for one-off scheduled downtimes — via a command — but with the additional Repeat this downtime on a regular basis every … check box.
With this you select the period when the maintenance should repeat. The first occurrence is entered as usual. The Custom time range button is available here. The periods are calculated from the start-time entered. The following possibilities are available:
The scheduled downtime repeats hourly at the the same time
Daily, at the the same time every day
Recurs every seven days on the same weekday and time of day as on the first occasion
Same as for weekly, but every 14 days
Same as for weekly, but now every 28 days
same nth weekday (from beginning)
With this you can achieve results such as “every second Monday in the month”. Here Checkmk takes the day of the week as the starting point, checks which day in the month it is, and bases the period on this day. If the starting date is the second Monday in the month, then a maintenance will be scheduled for the second Monday in every subsequent month.
same nth weekday (from end)
This is similar, except that it is calculated from the end of the month — for example “every last Friday in the Month”.
same day of the month
In this case the weekday is irrelevant. Here the date in the month is used. So, if the starting date is the 5th, the downtime will be scheduled to occur on the 5th of each month.
4.2. Definition using rules
An elegant alternative method for the configuration of periodic scheduled downtimes is to define them using rules. With Host Tags you can define things such as e.g., Every production Windows-server has a scheduled downtime every Sunday from 22:00 to 22:10.
You can in fact achieve almost the same results by using the host search to find all the affected servers, and then entering the scheduled downtime via a command. But this functions only with existing servers. If in the future a new host is added to the monitoring it will not be covered by this entry. Alternatively, if you work with rules this will not be a problem. A further advantage with rules is that the maintenance policy can be altered very easily at a later date — simply by modifying the rules.
The rules for recurring scheduled downtimes can be found under Setup > Host monitoring rules > Recurring downtimes for hosts respectivly Setup > Service monitoring rules > Recurring downtimes for services.
5. Scheduled downtimes and availability
As mentioned at the beginning, scheduled downtimes have an effect when evaluating the availability analysis. By default all scheduled downtimes are calculated in their own ‘pot’ and shown in the Downtime column.
Precisely how scheduled downtimes are to be assessed can be defined via Availability > Change computation options:
Honor scheduled downtimes
Scheduled downtimes are included in the availability graphs and displayed as a separate column. This is the standard procedure.
Exclude scheduled downtimes
Scheduled downtimes are ignored completely when calculating availability. All availability statistics refer only to the remaining time. Therefore — excluding scheduled downtimes, for what percentage of the time was the object available?
Ignore scheduled downtimes
Scheduled downtimes will not be factored in — only the object’s actual states are relevant.
Under Phases there is the additional Treat phases of UP/OK as non-downtime option. If this option is selected, then if the object, despite being in maintenance, still has an OK or UP state, the times are not treated as scheduled downtimes. Thus only the maintenance time that resulted in a real outage will be included in the calculations.