
1. Introduction

Tasks that run regularly or on demand, as well as permanently running processes, can be monitored by analyzing log or status files, for example. However, this often means extra effort, because large volumes of data have to be read in order to extract only a little information.

To reduce this effort, Checkmk lets a program write its output directly to a file in Checkmk agent format. The agent collects all such files stored in the so-called spool directory and integrates their content into the agent output. The spool directory is useful, for example, for

  • the regular analysis of log files,

  • monitoring automatic backups,

  • the creation and checking of usage statistics from a database,

  • the control of cronjobs in general,

  • the development of your own checks, where it can be used to test sample output.

In Checkmk, the spool directory is supported by the agents in the following operating systems: Windows, Linux, AIX, FreeBSD, OpenWRT, and Solaris.

To use this feature without problems, keep a few points in mind.

2. File directory paths

The default file path to the spool directory is /var/lib/check_mk_agent/spool/ on Linux and other Unix systems, and C:\ProgramData\checkmk\agent\spool\ on Windows. For Linux and Unix you can customize the parent directory path with the Agent rules > Installation paths for agent files (Linux, UNIX) rule and the Base directory for variable data (caches, state files) option available there.

If you are working on a host that is already being monitored, you can look up the spool directory configured there in the agent output:

user@host:~$ check_mk_agent | grep SpoolDirectory
SpoolDirectory: /var/lib/check_mk_agent/spool
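
If a script needs the same information, the lookup can be sketched in Python as follows. The function name find_spool_directory is hypothetical, and the sample text is made up for illustration; only the SpoolDirectory line is taken from the output shown above.

```python
import re
from typing import Optional


def find_spool_directory(agent_output: str) -> Optional[str]:
    """Return the path from the 'SpoolDirectory:' line, or None if it is absent."""
    match = re.search(r"^SpoolDirectory:\s*(\S.*)$", agent_output, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


# Made-up sample; on a real host the text would be the output of check_mk_agent.
sample = "some other line\nSpoolDirectory: /var/lib/check_mk_agent/spool\n"
print(find_spool_directory(sample))
```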

Checkmk uses a single spool directory, which by default is owned by root. Multiple directories with different owners are not supported. However, you can create (initially empty) files in the spool directory and make another user the owner, who can then overwrite the contents of their file.

3. File names and contents

Spool files can contain text output in any of the formats processed by Checkmk. The files are appended to each other in the order present in the spool directory. The file extension used does not matter.

If you want to use a numeric scheme for sorting, prefix the filename with an underscore (_), because a number at the start of a filename is interpreted as an age check. Files starting with a dot are ignored.

To avoid confusion in the concatenated file contents, each spool file should

  • start with a section header, i.e. a line enclosed in <<< and >>> — even if only the local checks format is used in the file,

  • be terminated with a newline.

Thus, a local check that provides a service immediately may look like this:

/var/lib/check_mk_agent/spool/spooldummy.txt
<<<local>>>
0 "Spool Test Dummy" - This static service is always OK

Similarly, you can drop outputs that require a check plug-in on the Checkmk side:

/var/lib/check_mk_agent/spool/poolplugin.txt
<<<waterlevels>>>
rainbarrel 376
pond 15212
pool 123732
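
A program that generates such files can enforce both rules (section header first, newline at the end) in one place. The following is a minimal sketch; the function name render_section is hypothetical and not part of Checkmk.

```python
from typing import Iterable


def render_section(section: str, lines: Iterable[str]) -> str:
    """Build spool file content: a section header, the data lines, a final newline."""
    header = f"<<<{section}>>>"
    body = [line.rstrip("\n") for line in lines]
    return "\n".join([header, *body]) + "\n"


print(render_section("waterlevels", ["rainbarrel 376", "pond 15212", "pool 123732"]), end="")
```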

3.1. Piggyback section termination

If you use piggyback sections in a file, terminate this file with the line <<<<>>>>. This is the only way to ensure that if the readout order changes, the output following the piggyback output is reassigned to the host itself.
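
As an illustration, a generator for such a file could wrap the sections for another host and always append the terminating line. This is a sketch under assumptions: the function name render_piggyback and the host name pumphouse are hypothetical, and the opening line uses the usual piggyback header with four angle brackets around the host name.

```python
def render_piggyback(hostname: str, section_text: str) -> str:
    """Wrap agent sections for another host and close the piggyback block."""
    if not section_text.endswith("\n"):
        section_text += "\n"
    # '<<<<>>>>' hands all following output back to the host running the agent.
    return f"<<<<{hostname}>>>>\n{section_text}<<<<>>>>\n"


print(render_piggyback("pumphouse", "<<<waterlevels>>>\npond 15212\n"), end="")
```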

4. Age check of files

If a program can write correctly to its output file, everything is fine, regardless of whether the program itself ran successfully or not. But what if a program aborts before writing to disk, or a file system error prevents new files from being written?

For this situation you have the option to prefix the file name with an integer, for example 600MyCronjob. In this case the number is interpreted as the maximum age of the file in seconds. If the file is older, the agent ignores it, and the associated service in Checkmk changes to the UNKNOWN state because of the missing output. In the example of a file named 3900_hourly_cleaner.txt, the number is appropriate for a cronjob that runs every hour and is expected to finish in less than five minutes: 3600 seconds for the interval plus 300 seconds for the execution time.
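
The rule can be expressed in a few lines of Python, which is handy for checking a naming scheme before deploying it. This sketch only mirrors the description above and is not the agent's actual code; the function names are hypothetical, and the behavior at exactly the age limit is an assumption.

```python
import re
import time
from typing import Optional

_AGE_PREFIX = re.compile(r"^(\d+)")


def max_age_seconds(filename: str) -> Optional[int]:
    """Return the leading integer of a spool file name, or None if there is none."""
    match = _AGE_PREFIX.match(filename)
    return int(match.group(1)) if match else None


def is_outdated(filename: str, mtime: float, now: Optional[float] = None) -> bool:
    """True if the file is older than the maximum age encoded in its name."""
    limit = max_age_seconds(filename)
    if limit is None:
        return False  # no numeric prefix, so no age check applies
    now = time.time() if now is None else now
    return now - mtime > limit  # assumption: a file exactly at the limit is still valid


print(max_age_seconds("3900_hourly_cleaner.txt"))
```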

5. A practical example

Let’s assume you run a service where users log in and log out. In the log files for the service, you will find lines of the following three types:

/var/log/dummyapp.log
21/Oct/2022:12:42:09 User harrihirsch logged in from 12.34.56.78
21/Oct/2022:12:42:23 User zoezhang logged out after 10 min idle
21/Oct/2022:13:00:00 Current user count: 739

The process does not write the line with Current user count after each login/logout, but at fixed intervals. If the number of lines in the log file is low enough to read quickly, you can program a local check. This check reads the whole log file line by line each time and always sets the user count to the displayed value on occurrences of the line Current user count. On occurrences of the lines logged in and logged out it increases or decreases the user count. At the end, your check prints a line similar to the following:

1 "Frobolator User Count" count=1023 Watch out! Limit nearly used up.
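
Such a local check could look roughly like the following Python sketch. The thresholds of 1000 and 1200 users, the text for the OK state, the function names, and the decision never to let the count drop below zero are assumptions made for illustration.

```python
import re
from typing import Iterable

COUNT_RE = re.compile(r"Current user count: (\d+)")
WARN, CRIT = 1000, 1200  # hypothetical thresholds


def apply_line(count: int, line: str) -> int:
    """Update the user count according to a single log line."""
    match = COUNT_RE.search(line)
    if match:
        return int(match.group(1))  # authoritative value written by the service
    if "logged in" in line:
        return count + 1
    if "logged out" in line:
        return max(count - 1, 0)  # assumption: never report a negative count
    return count


def count_users(lines: Iterable[str], start: int = 0) -> int:
    """Evaluate all lines in order and return the resulting user count."""
    count = start
    for line in lines:
        count = apply_line(count, line)
    return count


def local_check_line(count: int) -> str:
    """Format the result in the local check format."""
    if count >= CRIT:
        state, text = 2, "Maximum number of users reached!"
    elif count >= WARN:
        state, text = 1, "Watch out! Limit nearly used up."
    else:
        state, text = 0, "User count is fine."
    return f'{state} "Frobolator User Count" count={count} {text}'


if __name__ == "__main__":
    with open("/var/log/dummyapp.log", encoding="utf-8") as log:
        print(local_check_line(count_users(log)))
```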

As your service becomes more popular, the local check takes longer and longer to run, until at some point this solution is no longer practical, even when it runs as an asynchronous agent plug-in. This is where proper use of the spool directory can dramatically increase the efficiency of check execution. In the next few paragraphs, we will show you how to modify the program that has been providing your local check so that it makes effective use of the spool directory.

First, the program should no longer terminate when it reaches the end of the log file. Instead, once the log file has been fully evaluated, it writes a spool file that contains the currently determined state of the service:

/var/lib/check_mk_agent/spool/1800_frobolator.txt
<<<local>>>
2 "Frobolator User Count" count=1200 Maximum number of users reached!

Then let your program wait for a certain period of time — whether a few seconds or several minutes should depend on the frequency of newly added log entries. It should then evaluate only the newly added lines and recalculate the state. If the numbers have changed, it will rewrite the spool file.

You program this procedure as an endless loop. To detect a possible crash of this program, name the spool file accordingly, for example 1800_frobolator.txt if 30 minutes without an update of the spool file indicates a problem with the service or the evaluating program.
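
The procedure described in this section can be sketched as a small Python class that remembers how far it has read. This is an illustration under stated assumptions, not a finished daemon: the class name LogFollower, the thresholds, the polling interval, and the text for the OK state are hypothetical, and log rotation is not handled.

```python
import re
import time

COUNT_RE = re.compile(r"Current user count: (\d+)")
WARN, CRIT = 1000, 1200  # hypothetical thresholds


class LogFollower:
    """Keeps a running user count and the byte offset already evaluated."""

    def __init__(self, log_path: str, spool_path: str) -> None:
        self.log_path = log_path
        self.spool_path = spool_path
        self.offset = 0            # bytes of the log file already evaluated
        self.count = 0             # current user count
        self.last_written = None   # count stored in the spool file, if any

    def _apply(self, line: str) -> None:
        match = COUNT_RE.search(line)
        if match:
            self.count = int(match.group(1))
        elif "logged in" in line:
            self.count += 1
        elif "logged out" in line:
            self.count = max(self.count - 1, 0)

    def _write_spool(self) -> None:
        if self.count >= CRIT:
            state, text = 2, "Maximum number of users reached!"
        elif self.count >= WARN:
            state, text = 1, "Watch out! Limit nearly used up."
        else:
            state, text = 0, "User count is fine."
        content = f'<<<local>>>\n{state} "Frobolator User Count" count={self.count} {text}\n'
        # The complete content is assembled first and written in one go.
        with open(self.spool_path, "w", encoding="utf-8") as spool:
            spool.write(content)
        self.last_written = self.count

    def poll(self) -> bool:
        """Evaluate newly added lines; rewrite the spool file if the count changed."""
        with open(self.log_path, "rb") as log:
            log.seek(self.offset)
            for raw in iter(log.readline, b""):
                if not raw.endswith(b"\n"):
                    break  # incomplete last line, pick it up on the next poll
                self.offset += len(raw)
                self._apply(raw.decode("utf-8", errors="replace"))
        if self.count == self.last_written:
            return False
        self._write_spool()
        return True


def run_forever(follower: LogFollower, interval_seconds: float = 30.0) -> None:
    """Endless loop: evaluate new lines, then wait."""
    while True:
        follower.poll()
        time.sleep(interval_seconds)
```

To start it, a call such as run_forever(LogFollower("/var/log/dummyapp.log", "/var/lib/check_mk_agent/spool/1800_frobolator.txt")) would be enough. Because the file is only rewritten when the count changes, a count that stays constant for longer than the age limit in the file name would also let the file expire.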

Instead of placing the program in the agent's plug-in directory, now start it as a daemon using the means provided by your operating system. Make sure that it is restarted automatically if it crashes or is terminated. Many server applications also provide the option to pipe log output to another program instead of, or in addition to, writing normal log files. Using this mechanism for an evaluation script that writes spool files is also a good idea.

6. Some fine points to consider

Reading files has different pitfalls than having the agent start processes at regular intervals. Keep the following points in mind for trouble-free operation.

6.1. Soft links and named pipes

In principle, files in the spool directory can also be soft links or named pipes. Note that the age check via file names does not work here, because the age of the soft link or the named pipe itself is evaluated and not the age of the written data. For named pipes, you must also ensure that the process writing to the pipe always supplies data. If no data is supplied, the Checkmk agent blocks while waiting and eventually times out.

If you need to allow unprivileged users to write to spool files, create empty files for these users and set their ownership accordingly. These users can then set a soft link on their own or write directly to the spool file.
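
To find entries where the age check cannot be relied on, a small audit script can help. The following sketch lists soft links and named pipes whose names start with a number; the function name age_check_caveats is hypothetical, and the script only inspects file types, it does not read any data.

```python
import os
import re
import stat
from typing import List, Tuple


def age_check_caveats(spool_dir: str) -> List[Tuple[str, str]]:
    """Return (name, kind) for numbered entries that are soft links or named pipes."""
    flagged = []
    for name in sorted(os.listdir(spool_dir)):
        if not re.match(r"\d", name):
            continue  # no numeric prefix, so no age check is involved
        # lstat looks at the entry itself and does not follow soft links.
        mode = os.lstat(os.path.join(spool_dir, name)).st_mode
        if stat.S_ISLNK(mode):
            flagged.append((name, "soft link"))
        elif stat.S_ISFIFO(mode):
            flagged.append((name, "named pipe"))
    return flagged
```

Calling age_check_caveats("/var/lib/check_mk_agent/spool") returns an empty list if all numbered entries are regular files.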

6.2. Locking and buffering

When writing longer programs that write multiple status lines to a spool file, it is tempting to open the output file for writing when the program starts. However, in this case the file remains completely empty until the write buffer is flushed to the destination for the first time, and it remains incomplete until the writing program has closed the file. The same applies if a file is locked exclusively for the duration of a long write operation.

For this reason, either write to the output file only once the entire content to be written is available, or write to a temporary file first and then copy it to the spool directory. Alternatively, use cat to transfer the contents of the temporary file into an existing file in the spool directory.
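
One way to apply this advice in Python is sketched below. Note that the sketch swaps the copy step for a rename with os.replace, which on POSIX systems replaces the destination in a single step when both paths are on the same file system. The temporary file gets a leading dot so that the agent ignores it while it is being written. The function name write_spool_file is hypothetical, and the sketch assumes that the writing user may create files in the spool directory, which is not the case for unprivileged users who only own a pre-created file.

```python
import os
import tempfile


def write_spool_file(spool_dir: str, name: str, content: str) -> str:
    """Write the complete content to a hidden temporary file, then rename it."""
    if not content.endswith("\n"):
        content += "\n"
    # The leading dot keeps the agent from reading the half-written file.
    fd, tmp_path = tempfile.mkstemp(prefix=".", suffix=".tmp", dir=spool_dir)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        final_path = os.path.join(spool_dir, name)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A reader therefore sees either the previous version of the file or the new one, but never a partially written file.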

6.3. Keeping an overview

Another problem can occur when different programs try to write to files with the same name. With many spool files, it is easy to lose track of which program actually writes to which spool file. In particular, if an incorrectly formatted spool file results in part of the agent output becoming unusable, this is very annoying and can result in a time-consuming search.

You can keep things tidy here by creating a file of the same name — prefixed by a dot — for each spool file, which contains information about the job and a contact person, if applicable. The contents of this hidden file are not transferred with it.
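
A program that writes a spool file can create this note itself. The following sketch is only one possible layout; the function name write_owner_note and the "Job:" and "Contact:" lines are hypothetical conventions, not something Checkmk prescribes.

```python
import os


def write_owner_note(spool_dir: str, spool_name: str, job: str, contact: str = "") -> str:
    """Create the hidden companion file '.<spool_name>' describing the writer."""
    note_path = os.path.join(spool_dir, "." + spool_name)
    lines = [f"Job: {job}"]
    if contact:
        lines.append(f"Contact: {contact}")
    with open(note_path, "w", encoding="utf-8") as note:
        note.write("\n".join(lines) + "\n")
    return note_path
```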
