1. Introduction
If you use Unix-like operating systems, you are probably already familiar with so-called cronjobs.
Strictly speaking, cron is a daemon that manages recurring processes in the background and makes sure that they are executed at predefined intervals.
Of course, this does not have to be done by the cron program; it is merely the most common method for automating recurring jobs reliably under Linux, and similarly under AIX or Solaris.
Some of these jobs are essential for safe operation, so they should be included in the host monitoring.
In Checkmk you can achieve this with the mk-job script.
This small script is placed in front of the actual job command and then runs the job itself.
In the process, mk-job records a variety of measurement data and delivers it to Checkmk.
The most important measurement data include when the job was last executed and whether it was executed successfully.
The mk-job script is, as is often the case in Checkmk, a simple shell script that you can inspect at any time.
You therefore have maximum transparency and control at all times, even for the important jobs on your host.
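To make the wrapper idea concrete, here is a minimal sketch of a function that runs a command and records when it started and how it exited. It is not the mk-job source: the name run_and_record, its argument order and the output directory are assumptions made for illustration, and it records far fewer values than mk-job does.

```shell
# Hypothetical wrapper for illustration only, not the real mk-job.
# Usage: run_and_record <output dir> <job name> <command> [args...]
run_and_record() {
    out_dir=$1
    name=$2
    shift 2

    mkdir -p "$out_dir" || return 1

    # Start time in seconds since the epoch.
    start=$(date +%s)

    # Run the actual job and keep its exit code, even if it fails.
    rc=0
    "$@" || rc=$?

    # Store the two values in one file per job name.
    {
        printf 'start_time %s\n' "$start"
        printf 'exit_code %s\n' "$rc"
    } > "$out_dir/$name"

    # Pass the job's exit code on to the caller (for example cron).
    return "$rc"
}
```

Called as run_and_record /tmp/job-demo nightly-backup /usr/local/bin/backup, this sketch would run the backup and leave a file /tmp/job-demo/nightly-backup containing the two recorded lines.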
2. Setting up the plug-in
2.1. Setting up the mk-job program
First, set up the small script on the monitored host to be able to use it.
The easiest way to do this is to fetch the script directly from your Checkmk server with wget and make it executable, shown in the following example for a Linux server:
root@linux# wget -O /usr/local/bin/mk-job https://myserver/mysite/check_mk/agents/mk-job
root@linux# chmod +x /usr/local/bin/mk-job
If you want to install the script under AIX or Solaris, download mk-job.aix or mk-job.solaris instead.
If the wget program is not available, you can of course download the file in another way, for example by copying it with scp.
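After the download it can be worth confirming that the file really is an executable script, for example because a failed download may have saved an HTML error page instead. The following is a small sketch for that check; the function name verify_job_script is hypothetical, and the default path is simply the one used in the example above.

```shell
# Hypothetical helper: checks that the downloaded file exists,
# is executable, and starts with a shebang line.
verify_job_script() {
    script=${1:-/usr/local/bin/mk-job}

    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi

    # Read only the first line; an empty file leaves $first empty.
    first=
    IFS= read -r first < "$script" || true
    case $first in
        '#!'*)
            echo "ok: $script"
            return 0
            ;;
        *)
            echo "no shebang: $script"
            return 1
            ;;
    esac
}
```

Running verify_job_script without an argument checks /usr/local/bin/mk-job.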
2.2. Monitoring the first job
To monitor the first job, let's again take cron as a common example.
Let’s say you have a cronjob like this one:
5 0 * * * root /usr/local/bin/backup >/dev/null
This backup job is executed daily at 12:05 am under the user ID root.
To monitor this job, use an editor of your choice to prefix the command in this line with the mk-job script and a name.
The name will later be used as the service name in Checkmk and must therefore be unique on this host:
# Syntax:
# <minute> <hour> <day> <month> <day of week> <user> mk-job <service name> <command>
5 0 * * * root mk-job nightly-backup /usr/local/bin/backup >/dev/null
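The rewrite of such a line can also be sketched as a small shell function. This is only an illustration under some assumptions: the function name wrap_cron_line is hypothetical, the line is expected in the system crontab format shown above (five schedule fields, then the user, then the command), and the service name must not contain spaces.

```shell
# Hypothetical helper: inserts "mk-job <service name>" in front of the
# command of a crontab line that includes a user field.
# Usage: wrap_cron_line <service name> '<crontab line>'
wrap_cron_line() {
    name=$1
    line=$2

    # Split into the five schedule fields, the user, and the rest of
    # the line (the command, including any redirections).
    read -r min hour dom mon dow user cmd <<EOF
$line
EOF

    # Quote every field so that "*" is printed literally, not globbed.
    printf '%s %s %s %s %s %s mk-job %s %s\n' \
        "$min" "$hour" "$dom" "$mon" "$dow" "$user" "$name" "$cmd"
}
```

For example, wrap_cron_line nightly-backup '5 0 * * * root /usr/local/bin/backup >/dev/null' prints the monitored line shown above.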
When the newly defined cronjob is executed, mk-job will try to store the measurement results in the /var/lib/check_mk_agent/job/root directory.
Since the job directory also belongs to the user root, mk-job can create the user directory root if it does not already exist.
With each call, the agent looks in the directories under /var/lib/check_mk_agent/job/ and adds their contents to its output.
Such a result could look like the following, whereby for the sake of clarity only the relevant part of the agent output is shown here:
<<<job>>>
==> nightly-backup <==
start_time 1613509201
exit_code 0
real_time 2:06.03
user_time 0.62
system_time 0.58
reads 200040
writes 35536
max_res_kbytes 28340
avg_mem_kbytes 0
invol_context_switches 1624
vol_context_switches 2086
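To show how this output can be read by a script, here is a hedged sketch that lists every job whose exit_code is not 0. The function name failed_jobs is hypothetical, and the sketch assumes that each further job appears in the same layout as above, that is, a "==> name <==" line followed by key and value pairs.

```shell
# Hypothetical helper: reads a job section on stdin and prints
# "<job name> <exit code>" for every job with a non-zero exit code.
failed_jobs() {
    awk '
        # A header line such as "==> nightly-backup <==" starts a new job.
        /^==> .* <==$/ {
            job = $0
            sub(/^==> /, "", job)
            sub(/ <==$/, "", job)
            next
        }
        # Report the job if its exit code differs from 0.
        $1 == "exit_code" && $2 != 0 { print job, $2 }
    '
}
```

Fed with the example output above, the function prints nothing, because nightly-backup finished with exit code 0.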
3. Adding the service to Checkmk
In Checkmk you will automatically find the service once the job has been executed and the results saved accordingly. So, as usual, go to the service discovery and activate the service:
On the one hand, you will find all of the measurement points shown above as metrics and in predefined time series graphs. On the other hand, you will also find the measuring points in the summary and in the details for the service:
4. Monitoring the jobs of non-privileged users
If you also want to monitor jobs of users other than root, first create a user directory in the job directory /var/lib/check_mk_agent/job/ and assign ownership of it to the appropriate user, in the following command example to the user myuser:
root@linux# cd /var/lib/check_mk_agent/job/ && mkdir myuser && chown myuser:myuser myuser
Only then is mk-job able to store the results in this directory on behalf of the user.
5. Diagnostic options
If the set-up does not work, you have several options for tracking down the problem(s). Basically, always begin at the starting point of the chain and first check whether you have correctly included the 'mk-job' script, as described in the first steps.
Possible sources of error are:
mk-job cannot be found by cron because it is stored in a path that is not searched by cron. In this case, specify the full path to mk-job.
The service name contains spaces and has not been enclosed in quotes (").
The job is being run with a user which does not yet have its own directory for storing the results.
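The last point can be checked quickly from the shell. The following sketch reports whether a result directory exists and whether the current user may write to it; the function name job_dir_status is hypothetical, and the check is only meaningful when it is run as the user that executes the job.

```shell
# Hypothetical helper: prints "missing", "not-writable" or "ok"
# for the given result directory, seen from the current user.
job_dir_status() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo missing
    elif [ ! -w "$dir" ]; then
        echo not-writable
    else
        echo ok
    fi
}
```

Run as myuser, job_dir_status /var/lib/check_mk_agent/job/myuser should print ok once the directory has been created and handed over as described above.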
If the measurement results have been recorded and saved correctly, check on the Checkmk server whether the results are also passed on correctly by the agent.
You can display the agent output with the following command and pipe the output to the less command:
OMD[mysite]:~$ cmk -d myhost | less
Usually the relevant section <<<job>>> is located very far down in the output.
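Instead of scrolling, the section can also be cut out of the output with a small filter. This is a sketch under the assumption that the section starts with a line beginning with <<<job>>> and ends at the next line beginning with <<<; the function name job_section is hypothetical.

```shell
# Hypothetical filter: prints only the job section of an agent
# output that is read from stdin.
job_section() {
    awk '
        # Start printing at the job section header.
        /^<<<job>>>/ { in_job = 1; print; next }
        # Any other section header ends the job section.
        /^<<</ { in_job = 0 }
        # Print lines while inside the job section.
        in_job
    '
}
```

It could be combined with the command above as cmk -d myhost | job_section.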
If the results are not visible here, the reason may be that the agent does not have the necessary permissions to read the corresponding files. This can occur, for example, if the agent is not run as the user root and the result files cannot be read by other users:
root@linux# ls -l /var/lib/check_mk_agent/job/myUser/
total 5
-rw-rw---- 1 myUser myUser 186 Jul 21 11:58 nightly-backup
In such cases, either add a permission for all users to read the results:
root@linux# chmod 664 /var/lib/check_mk_agent/job/myUser/nightly-backup
Or you create a group and assign this group to all of the job files. The following command changes only the group ownership. The owner stays unchanged because nothing is specified before the colon:
root@linux# chown :myJobGroup /var/lib/check_mk_agent/job/myUser/nightly-backup
Make sure that you have created the group beforehand and have added the user under which the agent runs as a member of this group.
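Whether the first variant has taken effect can be verified with a short check of the mode string printed by ls -l. The sketch below only looks at the read bit for other users and does not evaluate group membership; the function name others_can_read is hypothetical.

```shell
# Hypothetical helper: succeeds if the "others" read bit is set.
# In a mode string such as "-rw-rw-r--" this is the 8th character.
others_can_read() {
    perms=$(ls -ld "$1") || return 2
    [ "$(printf '%s' "$perms" | cut -c8)" = r ]
}
```

For example, others_can_read /var/lib/check_mk_agent/job/myUser/nightly-backup && echo readable prints readable after the chmod 664 shown above.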
6. Files and directories
Path | Meaning |
---|---|
/usr/local/bin/mk-job | The script |
/var/lib/check_mk_agent/job/ | The usual directory under which the results are stored, sorted by user. Note that the path under AIX is different: |