Checkmk
to checkmk.com
Tip

This article is currently under construction and is being expanded on a regular basis.

1. Essentials for monitoring log files

The history of log file monitoring is a history full of misunderstandings. The misunderstandings already begin when we look at what log entries are and what, on the other hand, the services running under Checkmk display. Lines or entries in log files are 'by nature' event based. Checkmk, on the other hand, displays states. Read more about the difference between events and states in the Basic principles of monitoring with Checkmk article.

In Checkmk we circumvent this problem by defining when a service that maps one or more log files assumes a critical state. As a rule, we define 'become critical' when the log file contains messages that are

  • new,

  • not acknowledged, and

  • critical.

You should also use moderation when using logwatch. Logwatch is suitable for limited use and not for processing gigabytes or terabytes of log files. There are certainly more suitable tools for this. We strongly recommend using logwatch only on an ad hoc basis and not routinely. As you will see later in this article, it is easy to carry out an major part of the pre-filtering on the monitored host.

2. Prerequisites

Logwatch is a Python program and therefore requires a Python environment on the host. Python will already be installed in most Linux distributions and Solaris has also included Python 3.x for some time. If you want to monitor log files on a Windows host, there are various ways of achieving this.

CEE Users of our commercial editions can configure logwatch conveniently via the GUI and, using the Agent Bakery, have the plug-in inserted into the agent package. As soon as Checkmk notices that you are configuring a Python-based agent plug-in for a Windows host, the agent will also be given a virtual Python environment (venv).

If you are using one of our commercial editions but not the Agent Bakery, consult the following section for your Windows hosts.

2.1. Python for Windows in Checkmk Raw

Installing Checkmk Python (venv)

The installation package for the Windows agent from Checkmk Raw does not contain a Python environment. However, a corresponding cabinet file is already available on your Checkmk server. You can find this file called python-3.cab in the directory ~share/check_mk/agents/windows or in Checkmk via Setup > Agents > Windows > Windows Agent. Copy this file to your Windows host in the directory C:\Program Files (x86)\checkmk\service\install. There is already a file with this name and a file size of 0 byte. You must overwrite this file with the version from the Checkmk server. Then restart the Checkmk agent service. In Windows PowerShell with administrator rights, you can do this with the following command:

net stop checkmkservice; net start checkmkservice

Once the Windows service has been restarted, the virtual Python environment will have been automatically installed.

Installing a complete Python

Alternatively, you can also download and install a current Python package from python.org. Make sure to activate the following options during installation:

  • Install Python 3.x for all users. This will also automatically activate the Precompile standard library option, which is a good thing.

  • Add Python to environment variables

If you want to start testing immediately after installing Python, it is essential to restart the checkmkservice either via the Windows Task Manager or with the commands specified above, otherwise the service will not know about the new environment variables.

3. Monitoring log files

3.1. Installation on the host

Start by installing the agent plug-in. To do this, copy the file ~/share/check_mk/agents/plugins/mk_logwatch.py from your Checkmk server to the host in the directory /usr/lib/check_mk_agent/plugins/ (Linux) or C:\ProgramData\checkmk\agent\plugins (Windows). Make sure that the file is executable on the host. Further information on this step can be found in the section 'Manual installation' in the articles Monitor Linux and Monitor Windows.

CEE Users of our commercial editions can select Text logfiles (Linux, Solaris, Windows) during Configuration of the rule Deploy the Logwatch plugin and its configuration to automatically deploy the agent plug-in with the agent.

3.2. Configuring logwatch

In line with the initial considerations, logwatch will not monitor anything without being configured. Therefore, after installing the agent plug-in, it is essential to create a configuration file for the host to be monitored.

Configuration via the Agent Bakery

CEE In the commercial editions, first call up the rule for the agent plug-in Setup > Agents > Windows, Linux, Solaris, AIX > Agent rules > Text logfiles (Linux, Solaris, Windows). The default setting Deploy the Logwatch plugin and its configuration should normally be left as it is. However, if you want or need to transfer the configuration file logwatch.cfg to the host in a different way, the Deploy the Logwatch plugin without configuration option is still available here.

Continue with the option Retention period. The default setting here is one minute, which also corresponds to the preset check interval in Checkmk. This value should always be at least equal to the check interval. This option is primarily responsible for ensuring that no log messages are lost due to a service detection or the manual execution of cmk -d myhost. Further details can be found in the inline help for the option and in the Werk #14451 with which this option was introduced.

Now comes the section of the rule where things really get going — Configure a logfile section. We will start directly with the biggest stumbling block of recent years. In the Patterns for logfiles to monitor field, you will need to specify the log files that you want to monitor. You can do this individually and explicitly or with so-called glob patterns (glob for short). We are using the Python module glob here, for which there is a detailed documentation on docs.python.org. However, we would like to provide you with a few helpful examples right here.

For example, if you enter /var/log/my.log here, logwatch will monitor just this single log file. If you instead enter /var/log/*log here, logwatch will monitor all files that end with the character string log and which are located directly in the /var/log directory. If you want to monitor log files in all direct subdirectories of /var/, you can do this with the following glob, for example: /var/*/*log. We explicitly do not offer the glob ** for recursively searching a directory structure here, because we would end up with far too large a hit list far too quickly and leave the actual purpose of logwatch behind us.

The following table gives you a few more helpful examples of how you can use globs to actually monitor the files that require monitoring without having to specify them all individually:

Glob Pattern Explanation Example

/var/log/*

All files in /var/log.

/var/log/mylog /var/log/my.log

/var/log/*/*

All files in all direct subdirectories of /var/log/.

/var/log/foo/mylog /var/log/bar/mylog

/var/log/mylog?.log

All files in /var/log where the name begins with mylog, followed by a singe character and ending in .log.

/var/log/mylog1.log /var/log/mylog9.log

/var/log/mylog[123].log

All files in /var/log where the name begins with mylog, followed by either a 1, 2 or 3 and ending in .log.

/var/log/mylog1.log /var/log/mylog3.log

So when it comes to which files are 'matched' in the first step, we use no regular expressions and this may be enough for you to reach all the files you want.

However, if you now need to filter further, you can use the Regular expression for logfile filtering option to apply regular expressions to the hits from step 1 in a second step.

If you have collected all files /var/log/ and its direct subdirectories in the first step with /var/log/ and /var/log/*/*, you could use the regular expression error.log$|err$ to reduce the hit list to all files that end with err.log or err. Caution: The 'dot' (.) is again now an arbitrary character. This could, for example, leave the files /var/log/apache2/error.log, /var/log/mail.err and /var/log/cups/error_log.

As you can see, we have already provided you with two good and powerful tools for selecting the files to be monitored, so that logwatch can also check the other parameters and contents very quickly in the next step using a manageable file list. You can deepen your knowledge of the latter in the Regular expressions in Checkmk article.

With the Restrict the length of the lines option you can instruct logwatch to truncate excessively long lines after the specified number of characters.

The following option Watch the total size of the log file is useful for recognizing a defective log rotation. If you set 100 MiB here, you will receive a warning each time a particular log file has exceeded the specified size again.

The maximum number of lines that logwatch checks per run and file can be restricted with Restrict number of processed messages per cycle and with Restrict runtime of logfile parsing you can ensure that logwatch does not spend too long on a single file that may have been flooded with thousands and thousands of new entries since the last check.

If you activate one of the latter two options, you must also specify what should happen if the specified limit is exceeded. With our default setting, the associated service becomes critical and you receive a message that lines have been skipped or that the maximum runtime has been exceeded.

Handling of context messages is an option with which the volume of transferred data can become very large very quickly. So think carefully about whether only the log message that you think should generate a CRIT or WARN is important to you, or whether all lines that have been added since the last run of logwatch should be transferred to the Checkmk server. For small log files that only grow by a few lines every minute, the setting Do transfer context is certainly unproblematic. However, if 50 log files are monitored on a host, which suddenly contain 100,000 new lines with a length of 500 characters, we are already in the gigabyte range. In such an event, it may be enough to see that a large number of new messages have been added since the last check in order to initiate a check directly on the host concerned.

If you do need the context — i.e. the lines before and after the log message that is important to you — you can limit this to a certain number of lines before and after with the option Limit the amount of context data sent to the monitoring server.

With Limit the amount of data sent to the monitoring server you can limit the size of the transferred data in general.

Process new logfiles from the beginning is switched off by default. This sometimes leads to astonishment, because logwatch does not 'recognize' problems that are in log files and passes these on to the Checkmk server. In our opinion, nothing is older than yesterday’s newspaper and so are the log messages that were already in a log file before the first run of logwatch. During this very first run, logwatch does nothing more than note how many lines are already contained in the respective log. Only during the second run are the files checked for their content — i.e. the newly added lines.

Logwatch relies on actually being able to read the log files. Under the hood, logwatch goes to great lengths to recognize the coding of each log file. However, character encodings that are too exotic can lead to problems. If you can specify the character encoding of the monitored log files, UTF-8 is a very good choice. If this is not possible and logwatch does not manage to find out the encoding, you can make an explicit specification with Codec that should be used to decode the matching files.

With Duplicated messages management, if you activate this option, you can again save a bit of bandwidth, and the subsequent output in Checkmk will also be more readable. If you activate Filter out consecutive duplicated messages in the agent output, logwatch counts how often a line was repeated and writes this accordingly in the output instead of repeating the lines.

Finally, the lines in the log files that are of interest to you are now described using a regular expression, and assigned a state. If you want every line containing the word panic to lead to a CRIT in Checkmk, it is sufficient to enter panic in the Pattern(Regex) field after clicking on Add message pattern below Regular expressions for message classification. The functions of the other options offered are already described in great detail in the inline help at this point and are not duplicated here.

One point to note: The OK state may seem confusing at first glance. This is used to first transfer lines from a log file to the Checkmk server in order to then carry out the final classification. This brings us to an important point that shows how flexible logwatch can be when used correctly.

All of the options explained in this section become entries in the configuration file already mentioned, which is stored on the respective host. If you now want to make changes to the classification of certain messages, you may first have to edit the rule, then bake the agent and install it.

Alternatively, you can first transfer all of the interesting messages to the Checkmk server (for example with the OK state), and then on the Checkmk server (re-)classify them with the Logfile patterns rule. In this way you can save yourself the trouble of baking and rolling out the new agent, and after customizing the above-mentioned rule accordingly, will need to — onetime only — quickly activate the changes. You can find out exactly how to do this below in the chapter Reclassifying with log file patterns.

Manual configuration

CRE In the CRE Checkmk Raw you configure the agent plug-in as usual via a text file. As a rule, logwatch searches for a file called logwatch.cfg in the directories /etc/check_mk (Linux) or c:\ProgramData\checkmk\agent\config\ (Windows). An (almost) minimal configuration could look like this:

/etc/check_mk/logwatch.cfg
"/var/log/my.log" overflow=C nocontext=True
 C a critical message
 W something that should only trigger a warning

First, always enter a glob pattern here, followed by all the options to be applied. This is followed — with an indentation of one space — by a letter representing the desired state or function, and finally a regular expression that is compared with each line of the log file. With the above configuration, all new lines that have been added to the file /var/log/my.log since the last run of logwatch would be checked for the two patterns, a critical message and something that should only trigger a warning.

You can find a very comprehensive example configuration applicable for a site user in the ~/share/check_mk/agents/cfg_examples/logwatch.cfg file.

As all of the options that you can specify in such a configuration file have already been explained in the section Configuration via the Agent Bakery, only a list and brief description follows here. Refer to the above section for a detailed explanation.

Option in logwatch.cfg Counterpart Example Remark

regex

Regular expression for logfile filtering

regex='error.log$|err$'

encoding

Codec that should be used to decode the matching files

encoding=utf-8

maxlines

Restrict number of processed messages per cycle

maxlines=500

maxtime

Restrict runtime of logfile parsing

maxtime=23

overflow

In the case of an overflow

overflow=W

maxlinesize

Restrict the length of the lines

maxlinesize=123

maxoutputsize

Limit the amount of data sent to the monitoring server

maxoutputsize=10485760

Size given in bytes

skipconsecutiveduplicated

Duplicated messages management

skipconsecutiveduplicated=True

nocontext

Handling of context messages

nocontext=True

maxcontextlines

Limit the amount of context data sent to the monitoring server

maxcontextlines=55,66

4. Grouping of log files

The check belonging to the logwatch agent plug-in normally creates a separate service for each log file. By defining groupings using the Logfile Grouping rule, you can switch to the logwatch_groups check.

Further information will be added soon. Until then, consult the inline help for the Logfile Grouping rule.

5. Reclassifying with log file patterns

This section will be added soon. Until then, consult the inline help of the rule Logfile patterns.

6. Forwarding to the Event Console

In addition to the direct processing of log messages in Checkmk and a possible reclassification with the Logfile patterns rule, there is also the option of forwarding log lines obtained by logwatch to the Event Console. This is done using the Logwatch Event Console Forwarding rule, and is described in the The Event Console article.

7. Logwatch in monitoring

In monitoring, the display differs depending on the check plug-in used.

If you use either logwatch or logwatch_groups, you will find - after the necessary service detection - one service per log file or per grouping of log files (see Grouping of log files) that begins with Log. This is followed by the full path of the file or the group name.

If you forward your log messages to the Event Console, you will see one service per forwarding, depending on the setting of the rule logwatch Event Console Forwarding, which informs you about the number of forwarded log messages. In the case of bundled forwarding by the logwatch_ec plugin, this service is called Log Forwarding. If you use the Separate check option and thus the logwatch_ec_single plugin, the service name also starts with Log followed by the path of the log file. This service also informs you of the number of messages forwarded and if a log file cannot be read.

8. Files and directories

All paths for the Checkmk server are specified relative to the instance directory (e.g. /omd/sites/mysite).

Location Path Meaning

Checkmk server

~/share/check_mk/agents/cfg_examples/logwatch.cfg

example configuration file

Checkmk server

~/share/check_mk/agents/plugins/mk_logwatch.py

Python 3 agent plugin including explanations

Checkmk server

~/share/check_mk/agents/plugins/mk_logwatch_2.py

Python 2 agent plugin including explanations

Linux host

/etc/check_mk/logwatch.cfg

Configuration file - created by the Agent Bakery or manually

Linux host

/var/lib/check_mk_agent/logwatch.state.*

State files of mk_logwatch

Linux host

/var/lib/check_mk_agent/logwatch-batches/*

Location of the individual batches that mk_logwatch creates per query

Windows host

c:\ProgramData\checkmk\agent\config\logwatch.cfg

Configuration file - created by the Agent Bakery or manually

Windows host

c:\ProgramData\checkmk\agent\state

Storage location for the state files of mk_logwatch

Windows host

c:\ProgramData\checkmk\agent\state\logwatch-batches

Storage location of the individual batches that mk_logwatch creates per query

On this page