Tip

This article is currently under construction and is being expanded on a regular basis. Additionally, this is a machine translation based on the German version of the article.

1. Essentials for monitoring log files

The history of log file monitoring is a history full of misunderstandings. The misunderstandings already begin when we look at what log entries are and what, on the other hand, services display in Checkmk. Lines or entries in log files are "by nature" event based. Checkmk, on the other hand, displays states. Read more about the difference between events and states in the article Basic principles of monitoring with Checkmk - States and events.

In Checkmk we circumvent this problem by defining when a service that maps one or more log files assumes a critical state. As a rule, we define "become critical when the log file contains messages that are

  • new,

  • not acknowledged and

  • critical".

Use Logwatch in moderation. It is suited to selective use, not to processing gigabytes or terabytes of log data; other tools are better suited to that. We strongly recommend using Logwatch only where it is specifically needed and not across the board. As you will see later in the article, an important part of the prefiltering can easily be carried out on the monitored host.

2. Prerequisites

Logwatch is a Python program and therefore requires a Python environment on the host. Python will already be installed in most Linux distributions and Solaris has also included Python 3.x for some time. If you want to monitor log files on a Windows host, there are different ways to achieve this.

CEE Users of our commercial editions can configure Logwatch conveniently via the GUI and have the plugin inserted into the agent package with the agent bakery. As soon as Checkmk notices that you are configuring an agent plugin based on Python for a Windows host, the agent is also given a virtual Python environment (venv).

If you are using one of our commercial editions but not the agent bakery, you can consult the following section for your Windows hosts.

2.1. Python for Windows in Checkmk Raw

Install Checkmk Python (venv)

The installation package of the Windows agent from Checkmk Raw does not contain a Python environment. However, a corresponding cabinet file is already available on your Checkmk server. You can find this file, python-3.cab, in the directory ~/share/check_mk/agents/windows or in Checkmk via Setup > Agents > Windows > Windows Agent. Copy it to the directory C:\Program Files (x86)\checkmk\service\install on your Windows host. A file with this name and a size of 0 bytes already exists there; overwrite it with the version from the Checkmk server. Then restart the Checkmk agent service. In Windows PowerShell with administrator rights, you can do this with the following command:

net stop checkmkservice; net start checkmkservice

When the Windows service is restarted, the virtual Python environment is installed automatically.

Install Python completely

Alternatively, you can also download and install a current Python package from python.org. Make sure to activate the following options during installation:

  • Install Python 3.x for all users. This will also automatically activate the Precompile standard library option, which is a good thing.

  • Add Python to environment variables

If you want to start testing immediately after installing Python, it is essential to restart the CheckmkService either via the Windows Task Manager or with the commands specified above. Otherwise the service will not know about the new environment variables.

3. Monitor log files

3.1. Installation on the host

Start by installing the agent plugin. To do this, copy the file ~/share/check_mk/agents/plugins/mk_logwatch.py from your Checkmk server to the host in the directory /usr/lib/check_mk_agent/plugins/ (Linux) or C:\ProgramData\checkmk\agent\plugins (Windows). Make sure that the file is executable on the host. Further information on this step can be found in the section "Manual installation" in the articles Monitor Linux and Monitor Windows.

CEE Users of our commercial editions can use the rule Text logfiles (Linux, Solaris, Windows) with the option Deploy the Logwatch plugin and its configuration to deploy the agent plugin automatically with the agent.

3.2. Configuration of Logwatch

In line with the initial considerations, Logwatch does not monitor anything without configuration. Therefore, after installing the agent plugin, it is essential to create a configuration file for the monitored host.

Configuration via the Agent Bakery

CEE In the commercial editions, first call up the rule for the agent plugin Setup > Agents > Windows, Linux, Solaris, AIX > Agent rules > Text logfiles (Linux, Solaris, Windows). The default setting Deploy the Logwatch plugin and its configuration should normally be left as it is. However, if you want or need to transfer the configuration file logwatch.cfg to the host in a different way, the option Deploy the Logwatch plugin without configuration is still available here.

Continue with the option Retention period. The default setting here is one minute, which also corresponds to the preset check interval in Checkmk. The value should always be at least equal to the check interval. This option is primarily responsible for ensuring that no log messages are lost due to a service detection or the manual execution of cmk -d myhost. Further details can be found in the inline help for the option and in the Werk #14451 with which this option was introduced.

Now comes the section of the rule where things finally get going - Configure a logfile section. And we start directly with the biggest stumbling block of recent years. In the Patterns for logfiles to monitor field, you have to name the log files that you want to monitor. You can do this individually and explicitly or with so-called glob patterns (glob for short). We are using the Python module glob here, for which there is a detailed documentation on docs.python.org. However, we would like to provide you with a few helpful examples right here.

For example, if you enter /var/log/my.log here, Logwatch will monitor just this one log file. If you enter /var/log/*log here instead, Logwatch will monitor all files that end with the character string log and are located directly in the directory /var/log. If you want to monitor log files in all direct subdirectories of /var/, you can do this with the following glob, for example: /var/*/*log. We explicitly do not offer the glob ** for recursively searching a directory structure here, because we would end up with far too large a hit list far too quickly and leave the actual purpose of Logwatch behind us.

The following table gives you a few more helpful examples of how you can use globs to actually monitor the files that require monitoring without having to specify them all individually:

Glob pattern | Explanation | Example
/var/log/* | All files in /var/log | /var/log/mylog, /var/log/my.log
/var/log/*/* | All files in all direct subdirectories of /var/log/ | /var/log/foo/mylog, /var/log/bar/mylog
/var/log/mylog?.log | All files in /var/log where the name begins with mylog, followed by a single character and ending in .log | /var/log/mylog1.log, /var/log/mylog9.log
/var/log/mylog[123].log | All files in /var/log where the name begins with mylog, followed by either a 1, 2 or 3 and ending in .log | /var/log/mylog1.log, /var/log/mylog3.log

So when it comes to which files are "matched" in the first step, we use no regular expressions and this may be enough for you to reach all the files you want.
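The glob behavior described above can be tried out with Python's glob module directly. The following is a minimal sketch, not the plugin's code: it builds a throwaway directory that stands in for /var/log, and all file names as well as the helper matches() are made up for illustration. Filtering out directories with os.path.isfile is an assumption of this sketch.

```python
import glob
import os
import tempfile

# Throwaway directory standing in for /var/log; all file names are made up.
root = tempfile.mkdtemp()
for rel in ("mylog", "my.log", "mylog1.log", "mylog3.log", "mylog12.log",
            "foo/mylog", "bar/mylog"):
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


def matches(pattern):
    """Return matching regular files relative to root, sorted for stable output."""
    hits = glob.glob(os.path.join(root, *pattern.split("/")))
    return sorted(os.path.relpath(hit, root) for hit in hits if os.path.isfile(hit))


print(matches("*log"))            # every file directly in root ending in "log"
print(matches("*/*"))             # files in direct subdirectories only
print(matches("mylog?.log"))      # exactly one character after "mylog"
print(matches("mylog[123].log"))  # one of the characters 1, 2 or 3
```

Note that mylog12.log is matched by *log but not by mylog?.log, because ? stands for exactly one character.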

However, if you now need to filter further, you can use the option Regular expression for logfile filtering to apply regular expressions to the hits from step 1 in a second step.

If you collected all files in /var/log/ and its direct subdirectories in the first step with /var/log/* and /var/log/*/*, you could use the regular expression error.log$|err$ to reduce the hit list to all files that end with error.log or err. Caution: in a regular expression, the dot matches any single character again. The resulting list could therefore contain, for example, the files /var/log/apache2/error.log, /var/log/mail.err and /var/log/cups/error_log.
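The second filtering step can be illustrated with Python's re module. This is a sketch under assumptions: the hit list is hypothetical, and using re.search (a match anywhere in the path) is inferred from the example, since the expression would otherwise not match paths that start with /var.

```python
import re

# Hypothetical hit list produced by the glob step.
hits = [
    "/var/log/apache2/error.log",
    "/var/log/mail.err",
    "/var/log/cups/error_log",
    "/var/log/syslog",
    "/var/log/mail.log",
]

# Unescaped dot: matches any single character, including "_".
loose = re.compile(r"error.log$|err$")
# Escaped dot: matches only a literal ".".
strict = re.compile(r"error\.log$|err$")

filtered_loose = [path for path in hits if loose.search(path)]
filtered_strict = [path for path in hits if strict.search(path)]

print(filtered_loose)
print(filtered_strict)
```

If error_log should not be picked up, escaping the dot as in the second expression excludes it.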

As you can see, we have already provided you with two good and powerful tools for selecting the monitored files, so that Logwatch can also check the other parameters and contents very quickly in the next step using a manageable file list. You can deepen your knowledge of the latter in the article regular expressions in Checkmk.

With the option Restrict the length of the lines you can instruct Logwatch to cut off excessively long lines after the specified number of characters.

The following option Watch the total size of the log file is useful for recognizing a defective log rotation. If you set 100 MiB here, you will receive a warning each time a particular log file has grown by the set size again.

The maximum number of lines that Logwatch checks per run and file can be restricted with Restrict number of processed messages per cycle and with Restrict runtime of logfile parsing you can ensure that Logwatch does not spend too long on a single file that may have been flooded with thousands and thousands of new entries since the last check.

If you activate one of the latter two options, you must also specify what should happen if the specified limit is exceeded. With our default setting, the associated service becomes critical and you receive a message that lines have been skipped or that the maximum runtime has been exceeded.
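How a line limit, a runtime limit and an overflow state interact can be sketched as follows. This is an illustration only and not the plugin's implementation; the function name, its parameters and the return value are hypothetical, and the default overflow state C merely mirrors the default described above.

```python
import time


def process_new_lines(lines, maxlines=None, maxtime=None, overflow="C"):
    """Process lines until a limit is hit.

    Returns (processed_lines, overflow_state). overflow_state is None
    when all lines were processed within the limits.
    """
    start = time.monotonic()
    processed = []
    for number, line in enumerate(lines, start=1):
        if maxlines is not None and number > maxlines:
            return processed, overflow  # remaining lines are skipped
        if maxtime is not None and time.monotonic() - start > maxtime:
            return processed, overflow  # runtime budget (seconds) used up
        processed.append(line)
    return processed, None


print(process_new_lines(["a", "b", "c"], maxlines=2))
print(process_new_lines(["a", "b", "c"], maxlines=500, maxtime=23))
```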

Handling of context messages is an option with which the amount of transferred data can become very large very quickly. So think carefully about whether only the log message that you think should generate a CRIT or WARN is important to you, or whether all lines that have been added since the last run of Logwatch should be transferred to the Checkmk server. For small log files that only grow by a few lines every minute, the setting Do transfer context is certainly unproblematic. However, if 50 log files are monitored on a host, which suddenly contain 100,000 new lines with a length of 500 characters, we are already in the gigabyte range. In such an event, it may be enough to see that a large number of new messages have been added since the last check in order to initiate a check directly on the host concerned.
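The estimate can be checked with a few lines of arithmetic. The sketch assumes roughly one byte per character, which is a simplification that the text above does not state.

```python
log_files = 50
new_lines_per_file = 100_000
chars_per_line = 500  # assumption: about one byte per character

total_bytes = log_files * new_lines_per_file * chars_per_line
total_gib = total_bytes / 1024**3

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
```

That is 2.5 billion bytes for a single query of a single host.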

If you do need the context - i.e. the lines before and after the log message that is important to you - you can limit this to a certain amount of lines before and after with the options Limit the amount of context data sent to the monitoring server.
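The idea of limiting context to a number of lines before and after a hit can be sketched like this. It is an illustration under assumptions, not the plugin's code: the function name and the is_hit callback are hypothetical, and how the plugin treats overlapping context windows is not covered by this article.

```python
def select_with_context(lines, is_hit, before, after):
    """Keep every hit plus at most `before`/`after` neighboring lines."""
    keep = set()
    for index, line in enumerate(lines):
        if is_hit(line):
            first = max(0, index - before)
            last = min(len(lines), index + after + 1)
            keep.update(range(first, last))
    # Preserve the original order of the kept lines.
    return [lines[index] for index in sorted(keep)]


sample = ["a", "b", "panic: disk full", "c", "d", "e"]
print(select_with_context(sample, lambda line: "panic" in line, before=1, after=2))
```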

With Limit the amount of data sent to the monitoring server you can limit the size of the transferred data in general.

Process new logfiles from the beginning is switched off by default. This sometimes causes surprise, because Logwatch then does not "recognize" problems that are already present in a log file and does not pass them on to the Checkmk server. In our opinion, nothing is as old as yesterday's newspaper, and the same applies to log messages that were already in a log file before the first run of Logwatch. During this very first run, Logwatch does nothing more than note how many lines the respective log already contains. Only from the second run onward is the content checked, and then only the newly added lines.
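The behavior on the first and on later runs can be sketched as follows. This is a simplified illustration, not the plugin's implementation: the function name and the state dictionary are hypothetical, and the sketch remembers a byte offset per file instead of a line count to keep the code short.

```python
def read_new_lines(path, state, from_start=False):
    """Return the lines added since the last call.

    state maps a file path to the byte offset reached on the previous run.
    """
    with open(path, "rb") as handle:
        if path not in state and not from_start:
            # First run: only remember where the file currently ends.
            handle.seek(0, 2)
            state[path] = handle.tell()
            return []
        handle.seek(state.get(path, 0))
        data = handle.read()
        state[path] = handle.tell()
    return data.decode("utf-8", errors="replace").splitlines()
```

Called twice on the same file with the same state dictionary, the first call returns an empty list and the second call returns only what was appended in between. With from_start=True, the first call already returns the existing content.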

Logwatch relies on actually being able to read the log files. Under the hood, Logwatch goes to great lengths to recognize the encoding of each log file. However, character encodings that are too exotic can lead to problems. If you can influence the character encoding of the monitored log files, UTF-8 is a very good choice. If this is not possible and Logwatch cannot determine the encoding, you can specify it explicitly with Codec that should be used to decode the matching files.
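Why an explicit codec can be necessary is shown by the following sketch. It does not reproduce the plugin's detection logic; the function name and the fallback order (UTF-8 first, then Latin-1) are assumptions chosen for illustration.

```python
def decode_log_bytes(raw, encoding=None):
    """Decode raw log bytes, preferring an explicitly configured codec."""
    if encoding is not None:
        return raw.decode(encoding, errors="replace")
    for candidate in ("utf-8", "latin-1"):
        try:
            return raw.decode(candidate)
        except UnicodeDecodeError:
            continue
    # Not reached in practice: latin-1 can decode any byte sequence.
    return raw.decode("utf-8", errors="replace")


raw = "Störung".encode("cp437")            # bytes written by a legacy application
print(decode_log_bytes(raw))               # guessed codec: the umlaut is garbled
print(decode_log_bytes(raw, "cp437"))      # explicit codec: decoded correctly
```

Guessing produces a readable but wrong result here, because Latin-1 accepts every byte value. Only the explicit specification recovers the original text.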

With Duplicated messages management you can save a bit of bandwidth again and the subsequent output in Checkmk will also be more readable if you activate this option. If you activate Filter out consecutive duplicated messages in the agent output, Logwatch counts how often a line was repeated and writes this accordingly in the output instead of repeating the lines.
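Collapsing consecutive duplicates can be sketched with itertools.groupby. This illustrates the idea only: the function name is hypothetical, and the wording of the repeat notice is made up and will differ from the actual agent output.

```python
from itertools import groupby


def collapse_consecutive_duplicates(lines):
    """Replace runs of identical consecutive lines by one line plus a counter."""
    output = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        output.append(line)
        if count > 1:
            # Hypothetical notice format, for illustration only.
            output.append(f"[the above message was repeated {count - 1} times]")
    return output


print(collapse_consecutive_duplicates(["a", "a", "a", "b", "a"]))
```

Only consecutive repetitions are collapsed; the final "a" in the example is kept as a separate line because "b" interrupts the run.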

Finally, the lines in the log files that are of interest to you are now described using a regular expression and assigned a status. If you want every line containing the word panic to lead to a CRIT in Checkmk, it is sufficient to enter panic in the Pattern(Regex) field after clicking on Add message pattern below Regular expressions for message classification. The functions of the other options offered are already described in great detail in the inline help at this point and are not duplicated here.

Just this much: The status OK may seem confusing at first glance. It is used to first transfer lines from a log file to the Checkmk server in order to then carry out the final classification. And this brings us to an important point that shows how flexible Logwatch can be when used correctly.

All the options explained in this section become entries in the configuration file already mentioned, which is stored on the respective host. If you now want to make changes to the classification of certain messages, you may first have to edit the rule, then bake the agent and install it.

Alternatively, you can first transfer all interesting messages to the Checkmk server (for example with the status OK) and then (re-)classify them with the rule Logfile patterns on the Checkmk server. In this way, you can save yourself the trouble of baking and rolling out the new agent and only have to quickly activate the changes once after adjusting the above-mentioned rule accordingly. You can find out exactly how to do this below in the chapter Reclassify with logfile patterns.

Manual configuration

CRE In Checkmk Raw you configure the agent plugin as usual via a text file. As a rule, Logwatch searches for a file called logwatch.cfg in the directory /etc/check_mk (Linux) or c:\ProgramData\checkmk\agent\config\ (Windows). An (almost) minimal configuration could look like this:

/etc/check_mk/logwatch.cfg
"/var/log/my.log" overflow=C nocontext=True
 C a critical message
 W something that should only trigger a warning

First, always enter a glob pattern here, followed by all the options to be applied. This is followed - with an indentation of one space - by a letter representing the desired status or function and finally a regular expression that is compared with each line of the log file. With the above configuration, all new lines that have been added to the file /var/log/my.log since the last run of Logwatch would be checked for the two patterns "a critical message" and "something that should only trigger a warning".
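To make the structure of such a file tangible, here is a minimal Python sketch that reads the configuration shown above and classifies sample lines. It is not the plugin's parser: the function names are hypothetical, the parsing is deliberately simplified (Linux-style paths, no glob containing an equals sign), and "the first matching pattern wins" is an assumption of this sketch.

```python
import re
import shlex

SAMPLE_CONFIG = '''\
"/var/log/my.log" overflow=C nocontext=True
 C a critical message
 W something that should only trigger a warning
'''


def parse_config(text):
    """Parse sections: a glob line with options, then indented pattern lines."""
    sections = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0].isspace():
            # Pattern line: one status letter, then the regular expression.
            level, regex = raw.strip().split(None, 1)
            sections[-1]["patterns"].append((level, re.compile(regex)))
        else:
            # Header line: glob pattern(s) first, key=value options after.
            tokens = shlex.split(raw)
            globs = [token for token in tokens if "=" not in token]
            options = dict(token.split("=", 1) for token in tokens if "=" in token)
            sections.append({"globs": globs, "options": options, "patterns": []})
    return sections


def classify(line, patterns):
    """Return the status letter of the first matching pattern, else None."""
    for level, regex in patterns:
        if regex.search(line):
            return level
    return None


section = parse_config(SAMPLE_CONFIG)[0]
print(section["globs"], section["options"])
print(classify("12:00 a critical message from the daemon", section["patterns"]))
```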

As an instance user, you can find a very extensive example configuration in the file ~/share/check_mk/agents/cfg_examples/logwatch.cfg.

As all the options that you can specify in such a configuration file have already been explained in the section Configuration via agent bakery, only a list and reference follows here. Please refer to the above section for an explanation.

Option in logwatch.cfg | Counterpart | Example | Remark
regex | Regular expression for logfile filtering | regex='error.log$|err$' |
encoding | Codec that should be used to decode the matching files | encoding=utf-8 |
maxlines | Restrict number of processed messages per cycle | maxlines=500 |
maxtime | Restrict runtime of logfile parsing | maxtime=23 |
overflow | In case of an overflow | overflow=W |
maxlinesize | Restrict the length of the lines | maxlinesize=123 |
maxoutputsize | Limit the amount of data sent to the monitoring server | maxoutputsize=10485760 | Size in bytes
skipconsecutiveduplicated | Duplicated messages management | skipconsecutiveduplicated=True |
nocontext | Handling of context messages | nocontext=True |
maxcontextlines | Limit the amount of context data sent to the monitoring server | maxcontextlines=55,66 |

4. Grouping of log files

The check belonging to the agent plug-in called logwatch normally creates a separate service for each log file. By defining groupings using the Logfile Grouping rule, you can switch to the logwatch_groups check.

Further information will follow soon. Until then, please consult the inline help of the rule Logfile Grouping.

5. Reclassify with logfile patterns

This section will follow soon. Until then, please consult the inline help of the rule Logfile patterns.

6. Forwarding to the Event Console

In addition to the direct processing of log messages in Checkmk and a possible reclassification with the rule Logfile patterns, there is also the option of forwarding log lines obtained by Logwatch to the Event Console. This is done using the rule Logwatch Event Console Forwarding and is described in the article The Event Console.

7. Logwatch in monitoring

In monitoring, the display differs depending on the check plug-in used.

If you use either logwatch or logwatch_groups, you will find - after the necessary service detection - one service per log file or per grouping of log files (see Grouping of log files) that begins with Log. This is followed by the full path of the file or the group name.

If you forward your log messages to the Event Console, you will see one service per forwarding, depending on the setting of the rule Logwatch Event Console Forwarding, which informs you about the number of forwarded log messages. In the case of bundled forwarding by the logwatch_ec plugin, this service is called Log Forwarding. If you use the Seperate check option and thus the logwatch_ec_single plugin, the service name also starts with Log followed by the path of the log file. This service also informs you of the number of messages forwarded and if a log file cannot be read.

8. Files and directories

All paths for the Checkmk server are specified relative to the instance directory (e.g. /omd/sites/mysite).

Location | Path | Meaning
Checkmk server | ~/share/check_mk/agents/cfg_examples/logwatch.cfg | Example configuration file
Checkmk server | ~/share/check_mk/agents/plugins/mk_logwatch.py | Python 3 agent plugin including explanations
Checkmk server | ~/share/check_mk/agents/plugins/mk_logwatch_2.py | Python 2 agent plugin including explanations
Linux host | /etc/check_mk/logwatch.cfg | Configuration file - created by the agent bakery or manually
Linux host | /var/lib/check_mk_agent/logwatch.state.* | State files of mk_logwatch
Linux host | /var/lib/check_mk_agent/logwatch-batches/* | Location of the individual batches that mk_logwatch creates per query
Windows host | c:\ProgramData\checkmk\agent\config\logwatch.cfg | Configuration file - created by the agent bakery or manually
Windows host | c:\ProgramData\checkmk\agent\state | Storage location for the state files of mk_logwatch
Windows host | c:\ProgramData\checkmk\agent\state\logwatch-batches | Storage location of the individual batches that mk_logwatch creates per query
