This article is currently under construction and is being expanded on a regular basis.
1. Essentials for monitoring log files
The history of log file monitoring is a history full of misunderstandings. The misunderstandings already begin when we look at what log entries are and what, on the other hand, the services running under Checkmk display. Lines or entries in log files are 'by nature' event based. Checkmk, on the other hand, displays states. Read more about the difference between events and states in the Basic principles of monitoring with Checkmk article.
In Checkmk we work around this problem by defining when a service that represents one or more log files assumes a critical state. As a rule, such a service becomes critical when the log file contains messages that are
new,
not acknowledged, and
critical.
You should also use logwatch in moderation. Logwatch is suitable for limited use and not for processing gigabytes or terabytes of log files. There are certainly more suitable tools for this. We strongly recommend using logwatch only on an ad hoc basis and not routinely. As you will see later in this article, it is easy to carry out a major part of the pre-filtering on the monitored host.
2. Prerequisites
Logwatch is a Python program and therefore requires a Python environment on the host. Python will already be installed in most Linux distributions and Solaris has also included Python 3.x for some time. If you want to monitor log files on a Windows host, there are various ways of achieving this.
Users of our commercial editions can configure logwatch conveniently via the GUI and, using the Agent Bakery, have the plug-in inserted into the agent package.
As soon as Checkmk notices that you are configuring a Python-based agent plug-in for a Windows host, the agent will also be given a virtual Python environment (venv).
If you are using one of our commercial editions but not the Agent Bakery, consult the following section for your Windows hosts.
2.1. Python for Windows in Checkmk Raw
Installing Checkmk Python (venv)
The installation package for the Windows agent from Checkmk Raw does not contain a Python environment.
However, a corresponding cabinet file is already available on your Checkmk server.
You can find this file, called python-3.cab, in the directory ~/share/check_mk/agents/windows or in Checkmk via Setup > Agents > Windows > Windows Agent.
Copy this file to the directory C:\Program Files (x86)\checkmk\service\install on your Windows host.
A file with this name and a size of 0 bytes already exists there.
You must overwrite it with the version from the Checkmk server.
Then restart the Checkmk agent service.
In Windows PowerShell with administrator rights, you can do this with the following command:
net stop checkmkservice; net start checkmkservice
Once the Windows service has been restarted, the virtual Python environment will have been automatically installed.
Installing a complete Python
Alternatively, you can also download and install a current Python package from python.org. Make sure to activate the following options during installation:
Install Python 3.x for all users. This will also automatically activate the Precompile standard library option, which is a good thing.
Add Python to environment variables
If you want to start testing immediately after installing Python, it is essential to restart the checkmkservice either via the Windows Task Manager or with the commands specified above, otherwise the service will not know about the new environment variables.
3. Monitoring log files
3.1. Installation on the host
Start by installing the agent plug-in.
To do this, copy the file ~/share/check_mk/agents/plugins/mk_logwatch.py from your Checkmk server to the host in the directory /usr/lib/check_mk_agent/plugins/ (Linux) or C:\ProgramData\checkmk\agent\plugins (Windows).
Make sure that the file is executable on the host.
Further information on this step can be found in the section 'Manual installation' in the articles Monitor Linux and Monitor Windows.
Users of our commercial editions can have the agent plug-in deployed automatically with the agent. To do this, select the option Deploy the Logwatch plugin and its configuration in the rule Text logfiles (Linux, Solaris, Windows).
3.2. Configuring logwatch
In line with the initial considerations, logwatch will not monitor anything without being configured. Therefore, after installing the agent plug-in, it is essential to create a configuration file for the host to be monitored.
Configuration via the Agent Bakery
In the commercial editions, first call up the rule for the agent plug-in Setup > Agents > Windows, Linux, Solaris, AIX > Agent rules > Text logfiles (Linux, Solaris, Windows).
The default setting Deploy the Logwatch plugin and its configuration should normally be left as it is.
However, if you want or need to transfer the configuration file logwatch.cfg to the host in a different way, the Deploy the Logwatch plugin without configuration option is still available here.
Continue with the option Retention period.
The default setting here is one minute, which also corresponds to the preset check interval in Checkmk.
This value should always be at least equal to the check interval.
This option is primarily responsible for ensuring that no log messages are lost due to a service detection or the manual execution of cmk -d myhost.
Further details can be found in the inline help for the option and in the Werk #14451 with which this option was introduced.
Now comes the section of the rule where things really get going — Configure a logfile section.
We will start directly with the biggest stumbling block of recent years.
In the Patterns for logfiles to monitor field, you will need to specify the log files that you want to monitor.
You can do this individually and explicitly or with so-called glob patterns (glob for short).
We are using the Python module glob here, for which detailed documentation is available on docs.python.org.
However, we would like to provide you with a few helpful examples right here.
For example, if you enter /var/log/my.log here, logwatch will monitor just this single log file.
If you instead enter /var/log/*log here, logwatch will monitor all files whose names end with the character string log and which are located directly in the /var/log directory.
If you want to monitor log files in all direct subdirectories of /var/, you can do this with the following glob, for example: /var/*/*log.
We explicitly do not offer the glob ** for recursively searching a directory structure here, because the hit list would become far too large far too quickly and we would leave the actual purpose of logwatch behind us.
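The behavior of these globs can be tried out with Python's glob module directly. The following is only a minimal sketch: it builds a temporary directory as a stand-in for /var/log, and the file names in it are invented for illustration.

```python
import glob
import os
import tempfile

# Build a small throwaway directory tree that stands in for /var/log.
# All file names below are made up for this illustration.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "apache2"))
    for rel in ("my.log", "syslog", "mail.err", os.path.join("apache2", "error.log")):
        open(os.path.join(root, rel), "w").close()

    # "*log" matches names ending in "log" directly in the directory only.
    direct = sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, "*log")))

    # "*/*log" descends exactly one directory level, not further.
    one_level = sorted(
        os.path.relpath(p, root) for p in glob.glob(os.path.join(root, "*", "*log"))
    )

print(direct)     # ['my.log', 'syslog']
print(one_level)  # a single entry: error.log inside apache2
```

Note that mail.err is not matched by either pattern, and that the file in the subdirectory is only found by the second glob.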
The following table gives you a few more helpful examples of how you can use globs to actually monitor the files that require monitoring without having to specify them all individually:
| Glob Pattern | Explanation | Example |
|---|---|---|
| | All files in | |
| | All files in all direct subdirectories of | |
| | All files in | |
| | All files in | |
So when it comes to which files are 'matched' in the first step, we use no regular expressions and this may be enough for you to reach all the files you want.
However, if you now need to filter further, you can use the Regular expression for logfile filtering option to apply regular expressions to the hits from step 1 in a second step.
If you have collected all files in /var/log/ and its direct subdirectories in the first step with /var/log/ and /var/log/*/*, you could use the regular expression error.log$|err$ to reduce the hit list to all files that end with error.log or err.
Caution: In a regular expression, the dot (.) once again stands for any single character.
The hit list could therefore still contain, for example, the files /var/log/apache2/error.log, /var/log/mail.err and /var/log/cups/error_log.
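The effect of the unescaped dot can be checked with Python's re module. This is a sketch under the assumption that the expression is applied to the full path with search semantics; the candidate list is hypothetical.

```python
import re

# Hypothetical hit list from the first (glob) step.
candidates = [
    "/var/log/apache2/error.log",
    "/var/log/mail.err",
    "/var/log/cups/error_log",
    "/var/log/syslog",
    "/var/log/apache2/access.log",
]

# An unescaped "." matches any character, so "error.log$" also matches "error_log".
loose = re.compile(r"error.log$|err$")
kept = [path for path in candidates if loose.search(path)]

# Escaping the dot restricts the match to a literal ".".
strict = re.compile(r"error\.log$|err$")
kept_strict = [path for path in candidates if strict.search(path)]

print(kept)         # error.log, mail.err and error_log remain
print(kept_strict)  # only error.log and mail.err remain
```

If only files with a literal dot should remain, escaping it as `\.` is the usual way to express that.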
As you can see, we have already provided you with two good and powerful tools for selecting the files to be monitored, so that logwatch can also check the other parameters and contents very quickly in the next step using a manageable file list. You can deepen your knowledge of the latter in the Regular expressions in Checkmk article.
With the Restrict the length of the lines option you can instruct logwatch to truncate excessively long lines after the specified number of characters.
The following option Watch the total size of the log file is useful for recognizing a defective log rotation. If you set 100 MiB here, you will receive a warning each time a particular log file has exceeded the specified size again.
With Restrict number of processed messages per cycle you can limit the maximum number of lines that logwatch checks per run and file. With Restrict runtime of logfile parsing you can ensure that logwatch does not spend too long on a single file that may have been flooded with thousands of new entries since the last check.
If you activate one of the latter two options, you must also specify what should happen if the specified limit is exceeded. With our default setting, the associated service becomes critical and you receive a message that lines have been skipped or that the maximum runtime has been exceeded.
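How such limits interact with the overflow state can be illustrated with a small loop. This is a simplified sketch and not the plug-in's actual code; the function name, its parameters and the message texts are invented for this example.

```python
import time

def read_with_limits(lines, max_lines=None, max_seconds=None, overflow="C"):
    """Process lines until a limit is hit; return (processed, state, message)."""
    started = time.monotonic()
    processed = []
    for line in lines:
        # Stop as soon as the line budget for this run is used up.
        if max_lines is not None and len(processed) >= max_lines:
            return processed, overflow, "maximum number of lines exceeded, rest skipped"
        # Stop when this file has taken longer than allowed.
        if max_seconds is not None and time.monotonic() - started > max_seconds:
            return processed, overflow, "maximum runtime exceeded, rest skipped"
        processed.append(line)
    # No limit was hit: nothing to report.
    return processed, None, ""

print(read_with_limits(["a", "b", "c"], max_lines=2))
# (['a', 'b'], 'C', 'maximum number of lines exceeded, rest skipped')
```

A file that contains exactly as many new lines as the limit allows does not trigger the overflow state in this sketch.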
Handling of context messages is an option with which the volume of transferred data can become very large very quickly. So think carefully about whether only the log message that you think should generate a CRIT or WARN is important to you, or whether all lines that have been added since the last run of logwatch should be transferred to the Checkmk server. For small log files that only grow by a few lines every minute, the setting Do transfer context is certainly unproblematic. However, if 50 log files are monitored on a host, which suddenly contain 100,000 new lines with a length of 500 characters, we are already in the gigabyte range. In such an event, it may be enough to see that a large number of new messages have been added since the last check in order to initiate a check directly on the host concerned.
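The order of magnitude in this example is easy to verify. The short calculation below assumes roughly one byte per character, which is an approximation that depends on the encoding.

```python
log_files = 50
new_lines_per_file = 100_000
chars_per_line = 500  # assumption: about one byte per character

total_bytes = log_files * new_lines_per_file * chars_per_line

print(total_bytes)                       # 2500000000
print(round(total_bytes / 1024**3, 2))   # 2.33 (GiB)
```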
If you do need the context — i.e. the lines before and after the log message that is important to you — you can limit this to a certain number of lines before and after with the option Limit the amount of context data sent to the monitoring server.
With Limit the amount of data sent to the monitoring server you can limit the size of the transferred data in general.
Process new logfiles from the beginning is switched off by default. This sometimes causes surprise, because logwatch then does not report problems that were already present in a log file when it was first seen, and so does not pass them on to the Checkmk server. In our opinion, nothing is older than yesterday's newspaper, and the same applies to log messages that were already in a log file before the first run of logwatch. During this very first run, logwatch does nothing more than note how many lines the respective log already contains. Only from the second run onward is the content checked, and then only the newly added lines.
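The two-run behavior can be sketched as follows. This is a deliberately simplified illustration, not the plug-in's implementation: the function name and the in-memory `state` dictionary are hypothetical stand-ins, and log rotation is ignored.

```python
def new_lines(path, lines, state, from_beginning=False):
    """Return the lines added since the previous run (simplified sketch)."""
    if path not in state:
        # First run: only remember how many lines exist already.
        state[path] = len(lines)
        return list(lines) if from_beginning else []
    start = state[path]
    state[path] = len(lines)
    return lines[start:]

state = {}  # hypothetical stand-in for the plug-in's state files

first = new_lines("/var/log/my.log", ["old error"], state)
second = new_lines("/var/log/my.log", ["old error", "new error"], state)

print(first)   # []
print(second)  # ['new error']
```

With `from_beginning=True`, the first call would return the existing line as well, which corresponds to activating the option.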
Logwatch relies on actually being able to read the log files. Under the hood, logwatch goes to great lengths to detect the encoding of each log file. However, character encodings that are too exotic can lead to problems. If you can choose the character encoding of the monitored log files, UTF-8 is a very good choice. If this is not possible and logwatch does not manage to find out the encoding, you can make an explicit specification with Codec that should be used to decode the matching files.
With Duplicated messages management you can save a bit more bandwidth and make the subsequent output in Checkmk more readable. If you activate Filter out consecutive duplicated messages in the agent output, logwatch counts how often a line was repeated and writes this count to the output instead of repeating the line.
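The idea of collapsing consecutive duplicates can be sketched with itertools.groupby. The function name and the wording of the summary line are assumptions made for this example and do not reflect the exact output of logwatch.

```python
from itertools import groupby

def collapse_duplicates(lines):
    """Replace runs of identical consecutive lines by one line plus a count."""
    out = []
    for line, group in groupby(lines):
        count = sum(1 for _ in group)
        out.append(line)
        if count > 1:
            # Hypothetical summary text; the real output format may differ.
            out.append(f"[the above message was repeated {count - 1} times]")
    return out

print(collapse_duplicates(["disk full", "disk full", "disk full", "ok", "disk full"]))
# ['disk full', '[the above message was repeated 2 times]', 'ok', 'disk full']
```

Only directly consecutive repetitions are merged, so the last "disk full" in the example is kept as a separate line.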
Finally, the lines in the log files that are of interest to you are now described using a regular expression, and assigned a state.
If you want every line containing the word panic to lead to a CRIT in Checkmk, it is sufficient to enter panic in the Pattern(Regex) field after clicking on Add message pattern below Regular expressions for message classification.
The functions of the other options offered are already described in great detail in the inline help at this point and are not duplicated here.
One point to note: The OK state may seem confusing at first glance. This is used to first transfer lines from a log file to the Checkmk server in order to then carry out the final classification. This brings us to an important point that shows how flexible logwatch can be when used correctly.
All of the options explained in this section become entries in the configuration file already mentioned, which is stored on the respective host. If you now want to make changes to the classification of certain messages, you may first have to edit the rule, then bake the agent and install it.
Alternatively, you can first transfer all of the interesting messages to the Checkmk server (for example with the OK state), and then (re-)classify them on the Checkmk server with the Logfile patterns rule. This saves you the trouble of baking and rolling out a new agent: after customizing the rule, you only need to activate the changes. You can find out exactly how to do this below in the chapter Reclassifying with log file patterns.
Manual configuration
In Checkmk Raw, you configure the agent plug-in as usual via a text file.
As a rule, logwatch searches for a file called logwatch.cfg in the directory /etc/check_mk (Linux) or c:\ProgramData\checkmk\agent\config\ (Windows).
An (almost) minimal configuration could look like this:
"/var/log/my.log" overflow=C nocontext=True
 C a critical message
 W something that should only trigger a warning
First, always enter a glob pattern here, followed by all the options to be applied.
This is followed — with an indentation of one space — by a letter representing the desired state or function, and finally a regular expression that is compared with each line of the log file.
With the above configuration, all new lines that have been added to the file /var/log/my.log since the last run of logwatch would be checked for the two patterns a critical message and something that should only trigger a warning.
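To illustrate how the pattern lines relate to log lines, the following sketch reads the indented entries of the configuration above and classifies a few sample lines. It is not the parser used by mk_logwatch: the helper names, the "first match wins" rule and the "." marker for unmatched lines are assumptions made for this example.

```python
import re

CONFIG = '''"/var/log/my.log" overflow=C nocontext=True
 C a critical message
 W something that should only trigger a warning
'''

def parse_patterns(text):
    """Collect (state, compiled regex) pairs from the indented lines."""
    patterns = []
    for line in text.splitlines():
        if line.startswith(" ") and line.strip():
            state, regex = line.strip().split(" ", 1)
            patterns.append((state, re.compile(regex)))
    return patterns

def classify(line, patterns, default="."):
    # Assumption: the first matching pattern decides the state.
    for state, regex in patterns:
        if regex.search(line):
            return state
    return default

patterns = parse_patterns(CONFIG)
print(classify("12:00 a critical message from the app", patterns))  # C
print(classify("12:01 all quiet", patterns))                         # .
```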
As a site user, you can find a very comprehensive example configuration in the file ~/share/check_mk/agents/cfg_examples/logwatch.cfg.
As all of the options that you can specify in such a configuration file have already been explained in the section Configuration via the Agent Bakery, only a list and brief description follows here. Refer to the above section for a detailed explanation.
| Option in logwatch.cfg | Counterpart | Example | Remark |
|---|---|---|---|
| | Regular expression for logfile filtering | | |
| | Codec that should be used to decode the matching files | | |
| | Restrict number of processed messages per cycle | | |
| | Restrict runtime of logfile parsing | | |
| | In the case of an overflow | | |
| | Restrict the length of the lines | | |
| | Limit the amount of data sent to the monitoring server | | Size given in bytes |
| | Duplicated messages management | | |
| | Handling of context messages | | |
| | Limit the amount of context data sent to the monitoring server | | |
4. Grouping of log files
The check belonging to the logwatch agent plug-in normally creates a separate service for each log file.
By defining groupings using the Logfile Grouping rule, you can switch to the logwatch_groups check.
Further information will be added soon. Until then, consult the inline help for the Logfile Grouping rule.
5. Reclassifying with log file patterns
This section will be added soon. Until then, consult the inline help of the rule Logfile patterns.
6. Forwarding to the Event Console
In addition to the direct processing of log messages in Checkmk and a possible reclassification with the Logfile patterns rule, there is also the option of forwarding log lines obtained by logwatch to the Event Console. This is done using the Logwatch Event Console Forwarding rule, and is described in the The Event Console article.
7. Logwatch in monitoring
In monitoring, the display differs depending on the check plug-in used.
If you use either logwatch or logwatch_groups, you will find, after the necessary service detection, one service per log file or per grouping of log files (see Grouping of log files) whose name begins with Log.
This is followed by the full path of the file or the group name.
If you forward your log messages to the Event Console, you will see one service per forwarding, depending on the setting of the rule Logwatch Event Console Forwarding, which informs you about the number of forwarded log messages.
In the case of bundled forwarding by the logwatch_ec plug-in, this service is called Log Forwarding.
If you use the Separate check option and thus the logwatch_ec_single plug-in, the service name also starts with Log followed by the path of the log file.
This service also informs you of the number of messages forwarded and whether a log file could not be read.
8. Files and directories
All paths for the Checkmk server are specified relative to the site directory (e.g. /omd/sites/mysite).
| Location | Path | Meaning |
|---|---|---|
| Checkmk server | | Example configuration file |
| Checkmk server | | Python 3 agent plug-in including explanations |
| Checkmk server | | Python 2 agent plug-in including explanations |
| Linux host | | Configuration file, created by the Agent Bakery or manually |
| Linux host | | State files of mk_logwatch |
| Linux host | | Location of the individual batches that mk_logwatch creates per query |
| Windows host | | Configuration file, created by the Agent Bakery or manually |
| Windows host | | Storage location for the state files of mk_logwatch |
| Windows host | | Storage location of the individual batches that mk_logwatch creates per query |