This article is currently under construction and is being expanded on a regular basis. Additionally, this is a machine translation based on the German version of the article.
1. Essentials for monitoring log files
The history of log file monitoring is a history full of misunderstandings. They begin with the question of what log entries are and what, in contrast, services display in Checkmk. Lines or entries in log files are event-based by nature. Checkmk, on the other hand, displays states. Read more about the difference between events and states in the article Basic principles of monitoring with Checkmk - States and events.
In Checkmk we circumvent this problem by defining when a service that maps one or more log files assumes a critical state. As a rule, we define "become critical when the log file contains messages that are
new,
not acknowledged and
critical".
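The rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Checkmk implements it: the class LogMessage, its fields and the function service_state are hypothetical names chosen for this example, and the sketch only distinguishes OK and CRIT.

```python
from dataclasses import dataclass


@dataclass
class LogMessage:
    """Hypothetical representation of one log line (not a Checkmk data type)."""
    text: str
    level: str          # "C" for critical, anything else is treated as harmless here
    is_new: bool        # arrived since the last check
    acknowledged: bool  # already confirmed by a user


def service_state(messages):
    """Turn a list of events into a state: CRIT if any message is
    new, not acknowledged and critical, otherwise OK."""
    for msg in messages:
        if msg.is_new and not msg.acknowledged and msg.level == "C":
            return "CRIT"
    return "OK"
```

As soon as the critical message is acknowledged, the same list of messages yields OK again, which is what turns a stream of events into a state.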
You should also use Logwatch in moderation. Logwatch is suitable for selective use and not for processing gigabytes or terabytes of log files. There are certainly more suitable tools for this. We strongly recommend using Logwatch only for specific cases and not across the board. As you will see later in the article, it is easy to carry out an important part of the prefiltering on the monitored host.
2. Prerequisites
Logwatch is a Python program and therefore requires a Python environment on the host. Python will already be installed in most Linux distributions and Solaris has also included Python 3.x for some time. If you want to monitor log files on a Windows host, there are different ways to achieve this.
Users of our commercial editions can configure Logwatch conveniently via the GUI and have the plugin inserted into the agent package with the agent bakery. As soon as Checkmk notices that you are configuring an agent plugin based on Python for a Windows host, the agent is also given a virtual Python environment (venv).
If you are using one of our commercial editions but not the agent bakery, you can consult the following section for your Windows hosts.
2.1. Python for Windows in Checkmk Raw
Install Checkmk Python (venv)
The installation package of the Windows agent from Checkmk Raw does not contain a Python environment.
However, a corresponding cabinet file is already available on your Checkmk server.
You can find this file, called python-3.cab, in the directory ~/share/check_mk/agents/windows or in Checkmk via Setup > Agents > Windows > Windows Agent.
Copy this file to your Windows host in the directory C:\Program Files (x86)\checkmk\service\install.
There is already a file with this name and a file size of 0 bytes.
You must overwrite this file with the version from the Checkmk server.
Then restart the Checkmk agent service.
In Windows PowerShell with administrator rights, you can do this with the following command:
net stop checkmkservice; net start checkmkservice
When the Windows service is restarted, the virtual Python environment is installed automatically.
Install Python completely
Alternatively, you can also download and install a current Python package from python.org. Make sure to activate the following options during installation:
Install Python 3.x for all users. This will also automatically activate the Precompile standard library option, which is a good thing.
Add Python to environment variables
If you want to start testing immediately after installing Python, it is essential to restart the CheckmkService either via the Windows Task Manager or with the commands specified above. Otherwise the service will not know about the new environment variables.
3. Monitor log files
3.1. Installation on the host
Start by installing the agent plugin.
To do this, copy the file ~/share/check_mk/agents/plugins/mk_logwatch.py from your Checkmk server to the host in the directory /usr/lib/check_mk_agent/plugins/ (Linux) or C:\ProgramData\checkmk\agent\plugins (Windows).
Make sure that the file is executable on the host.
Further information on this step can be found in the section "Manual installation" in the articles Monitor Linux and Monitor Windows.
Users of our commercial editions can select the option Deploy the Logwatch plugin and its configuration when configuring the rule Text logfiles (Linux, Solaris, Windows) to have the agent plugin deployed automatically with the agent.
3.2. Configuration of Logwatch
In line with the initial considerations, Logwatch does not monitor anything without configuration. Therefore, after installing the agent plugin, it is essential to create a configuration file for the monitored host.
Configuration via the Agent Bakery
In the commercial editions, first call up the rule for the agent plugin Setup > Agents > Windows, Linux, Solaris, AIX > Agent rules > Text logfiles (Linux, Solaris, Windows).
The default setting Deploy the Logwatch plugin and its configuration should normally be left as it is.
However, if you want or need to transfer the configuration file logwatch.cfg to the host in a different way, the option Deploy the Logwatch plugin without configuration is still available here.
Continue with the option Retention period.
The default setting here is one minute, which also corresponds to the preset check interval in Checkmk.
The value should always be at least equal to the check interval.
This option is primarily responsible for ensuring that no log messages are lost due to a service detection or the manual execution of cmk -d myhost.
Further details can be found in the inline help for the option and in the Werk #14451 with which this option was introduced.
Now comes the section of the rule where things finally get going - Configure a logfile section.
And we start directly with the biggest stumbling block of recent years.
In the Patterns for logfiles to monitor field, you have to name the log files that you want to monitor.
You can do this individually and explicitly or with so-called glob patterns (glob for short).
We use the Python module glob here, which is documented in detail on docs.python.org.
However, we would like to provide you with a few helpful examples right here.
For example, if you enter /var/log/my.log here, Logwatch will monitor just this one log file.
If you enter /var/log/*log here instead, Logwatch will monitor all files whose names end with the character string log and that are located directly in the directory /var/log.
If you want to monitor log files in all direct subdirectories of /var/, you can do this with the following glob, for example: /var/*/*log.
We explicitly do not offer the glob ** for recursively searching a directory structure here, because the hit list would very quickly become far too large and we would leave the actual purpose of Logwatch behind us.
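To get a feeling for these globs, you can try them out with Python's glob module directly. The following sketch builds a small throwaway directory tree as a stand-in for /var/log (the file names are made up for this example) and evaluates three patterns against it. The function name demo_globs is hypothetical.

```python
import glob
import os
import tempfile


def demo_globs():
    """Evaluate a few glob patterns against a temporary stand-in for /var/log."""
    with tempfile.TemporaryDirectory() as root:
        log_dir = os.path.join(root, "var", "log")
        os.makedirs(os.path.join(log_dir, "apache2"))
        # Made-up example files, created empty.
        for rel in ("my.log", "syslog", "mail.err", os.path.join("apache2", "error.log")):
            open(os.path.join(log_dir, rel), "w").close()

        def names(pattern):
            # Return the hits relative to the stand-in directory, sorted for stable output.
            hits = glob.glob(os.path.join(log_dir, pattern))
            return sorted(os.path.relpath(hit, log_dir) for hit in hits)

        return {
            "my.log": names("my.log"),   # exactly one file
            "*log": names("*log"),       # files directly in the directory ending in "log"
            "*/*log": names("*/*log"),   # the same, but one directory level deeper
        }


for pattern, hits in demo_globs().items():
    print(pattern, "->", hits)
```

Note that *log does not descend into apache2, and */*log does not match files that lie directly in the top directory.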
The following table gives you a few more helpful examples of how you can use globs to actually monitor the files that require monitoring without having to specify them all individually:
Glob pattern | Explanation | Example |
---|---|---|
 | All files in | |
 | All files in all direct subdirectories of /var/log/ | |
 | All files in | |
 | All files in | |
So when it comes to which files are "matched" in the first step, we use no regular expressions and this may be enough for you to reach all the files you want.
However, if you now need to filter further, you can use the option Regular expression for logfile filtering to apply regular expressions to the hits from step 1 in a second step.
If you collected all files in /var/log/ and its direct subdirectories in the first step with /var/log/* and /var/log/*/*, you could use the regular expression error.log$|err$ to reduce the hit list to all files that end with error.log or err.
Caution: In a regular expression, the dot again stands for any single character.
This could, for example, leave the files /var/log/apache2/error.log, /var/log/mail.err and /var/log/cups/error_log in the hit list.
As you can see, we have already provided you with two powerful tools for selecting the monitored files, so that Logwatch can check the other parameters and the contents very quickly in the next step using a manageable file list. You can deepen your knowledge of regular expressions in the article regular expressions in Checkmk.
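The second filtering step can be reproduced with Python's re module. The sketch below is a simplified illustration: the function filter_logfiles and the candidate list are made up for this example, and it assumes that the expression is searched anywhere in the path, which may differ in detail from what Logwatch does internally.

```python
import re


def filter_logfiles(paths, regex):
    """Keep only the paths in which the regular expression is found."""
    compiled = re.compile(regex)
    return [path for path in paths if compiled.search(path)]


# Hypothetical hit list from the glob step.
candidates = [
    "/var/log/syslog",
    "/var/log/mail.err",
    "/var/log/apache2/error.log",
    "/var/log/cups/error_log",
    "/var/log/apache2/access.log",
]

remaining = filter_logfiles(candidates, r"error.log$|err$")
print(remaining)
```

The file error_log stays in the list because the unescaped dot also matches the underscore. If you only want a literal dot, write error\.log$ instead.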
With the option Restrict the length of the lines you can instruct Logwatch to cut off excessively long lines after the specified number of characters.
The following option Watch the total size of the log file is useful for recognizing a defective log rotation. If you set 100 MiB here, you will receive a warning each time a particular log file has grown by the set size again.
The maximum number of lines that Logwatch checks per run and file can be restricted with Restrict number of processed messages per cycle. With Restrict runtime of logfile parsing you can ensure that Logwatch does not spend too long on a single file that may have been flooded with thousands and thousands of new entries since the last check.
If you activate one of the latter two options, you must also specify what should happen if the specified limit is exceeded. With our default setting, the associated service becomes critical and you receive a message that lines have been skipped or that the maximum runtime has been exceeded.
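How such limits interact with the processing of a file can be sketched as follows. This is a simplified model and not the code of the agent plugin: the function process_with_limits, its parameters and the returned texts are hypothetical and only illustrate the principle of stopping early and reporting why.

```python
import time


def process_with_limits(lines, handle, max_lines=None, max_seconds=None):
    """Pass lines to handle() until a limit is hit.

    Returns None if all lines were processed, otherwise a short
    description of the limit that was exceeded (hypothetical wording).
    """
    start = time.monotonic()
    for index, line in enumerate(lines):
        if max_lines is not None and index >= max_lines:
            return "lines skipped"
        if max_seconds is not None and time.monotonic() - start > max_seconds:
            return "runtime exceeded"
        handle(line)
    return None


processed = []
overflow = process_with_limits(["a", "b", "c"], processed.append, max_lines=2)
print(processed, overflow)
```

In this model the caller decides what an overflow means, for example a critical state as in the default setting described above.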
Handling of context messages is an option with which the amount of transferred data can become very large very quickly. So think carefully about whether only the log message that you think should generate a CRIT or WARN is important to you, or whether all lines that have been added since the last run of Logwatch should be transferred to the Checkmk server. For small log files that only grow by a few lines every minute, the setting Do transfer context is certainly unproblematic. However, if 50 log files are monitored on a host, which suddenly contain 100,000 new lines with a length of 500 characters, we are already in the gigabyte range. In such an event, it may be enough to see that a large number of new messages have been added since the last check in order to initiate a check directly on the host concerned.
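The order of magnitude mentioned above is easy to check. The following calculation assumes one byte per character, which is a simplification (multi-byte encodings and protocol overhead are ignored). The function name context_volume_bytes is made up for this example.

```python
def context_volume_bytes(files, new_lines_per_file, chars_per_line):
    """Rough amount of data if every new line is transferred (1 byte per character assumed)."""
    return files * new_lines_per_file * chars_per_line


volume = context_volume_bytes(files=50, new_lines_per_file=100_000, chars_per_line=500)
print(f"{volume / 10**9:.1f} GB")  # prints "2.5 GB"
```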
If you do need the context - i.e. the lines before and after the log message that is important to you - you can limit it to a certain number of lines before and after with the option Limit the amount of context data sent to the monitoring server.
With Limit the amount of data sent to the monitoring server you can limit the size of the transferred data in general.
Process new logfiles from the beginning is switched off by default. This sometimes causes surprise, because Logwatch then does not "recognize" problems that are already in the log files and does not pass them on to the Checkmk server. In our opinion, nothing is older than yesterday’s newspaper, and the same applies to log messages that were already in a log file before the first run of Logwatch. During this very first run, Logwatch does nothing more than note how many lines are already contained in the respective log. Only during the second run are the files checked for their content - i.e. the newly added lines.
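The behavior during the first and second run can be modeled in a few lines. The sketch follows the description above and remembers a line count per file. The function new_lines, the state dictionary and the parameter from_beginning are hypothetical names, and the real plugin may store its state differently.

```python
def new_lines(lines, state, path, from_beginning=False):
    """Return the lines of a file that have not been seen before.

    state maps a file path to the number of lines seen so far
    (a simplified, hypothetical state format).
    """
    if path not in state and not from_beginning:
        # First run: only remember how long the file already is.
        state[path] = len(lines)
        return []
    start = state.get(path, 0)
    state[path] = len(lines)
    return lines[start:]


state = {}
log = ["old message 1", "old message 2"]
first_run = new_lines(log, state, "/var/log/my.log")
log.append("new message 3")
second_run = new_lines(log, state, "/var/log/my.log")
print(first_run, second_run)
```

With from_beginning=True, the first run would already return the two old messages.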
Logwatch relies on actually being able to read the log files. Under the hood, Logwatch goes to great lengths to recognize the encoding of each log file. However, character encodings that are too exotic can lead to problems. If you can influence the character encoding of the monitored log files, UTF-8 is a very good choice. If this is not possible and Logwatch does not manage to find out the encoding, you can make an explicit specification with Codec that should be used to decode the matching files.
With Duplicated messages management you can save a bit of bandwidth again and the subsequent output in Checkmk will also be more readable if you activate this option. If you activate Filter out consecutive duplicated messages in the agent output, Logwatch counts how often a line was repeated and writes this accordingly in the output instead of repeating the lines.
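The principle of collapsing consecutive duplicates can be illustrated with itertools.groupby. This is a sketch of the idea only: the function name and the wording of the repetition note are assumptions made for this example and do not necessarily match the actual output of Logwatch.

```python
from itertools import groupby


def collapse_consecutive_duplicates(lines):
    """Replace runs of identical consecutive lines by the line plus a repetition note."""
    result = []
    for line, group in groupby(lines):
        count = sum(1 for _ in group)
        result.append(line)
        if count > 1:
            # Hypothetical wording of the note.
            result.append(f"[the above message was repeated {count - 1} times]")
    return result


print(collapse_consecutive_duplicates(["timeout", "timeout", "timeout", "ok", "timeout"]))
```

Only directly consecutive repetitions are collapsed. The last timeout in the example is kept as a separate line because another message lies in between.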
Finally, the lines in the log files that are of interest to you are now described using a regular expression and assigned a status.
If you want every line containing the word panic to lead to a CRIT in Checkmk, it is sufficient to enter panic in the Pattern(Regex) field after clicking on Add message pattern below Regular expressions for message classification.
The functions of the other options offered are already described in great detail in the inline help at this point and are not duplicated here.
Just this much: The status OK may seem confusing at first glance. It is used to first transfer lines from a log file to the Checkmk server in order to then carry out the final classification. And this brings us to an important point that shows how flexible Logwatch can be when used correctly.
All the options explained in this section become entries in the configuration file already mentioned, which is stored on the respective host. If you now want to make changes to the classification of certain messages, you may first have to edit the rule, then bake the agent and install it.
Alternatively, you can first transfer all interesting messages to the Checkmk server (for example with the status OK) and then (re-)classify them with the rule Logfile patterns on the Checkmk server. In this way, you can save yourself the trouble of baking and rolling out the new agent and only have to quickly activate the changes once after adjusting the above-mentioned rule accordingly. You can find out exactly how to do this below in the chapter Reclassify with logfile patterns.
Manual configuration
In the Checkmk Raw Edition you configure the agent plugin as usual via a text file.
As a rule, Logwatch searches for a file called logwatch.cfg in the directories /etc/check_mk (Linux) or C:\ProgramData\checkmk\agent\config\ (Windows).
An (almost) minimal configuration could look like this:
"/var/log/my.log" overflow=C nocontext=True
 C a critical message
 W something that should only trigger a warning
First, always enter a glob pattern here, followed by all the options to be applied.
This is followed - with an indentation of one space - by a letter representing the desired status or function and finally a regular expression that is compared with each line of the log file.
With the above configuration, all new lines that have been added to the file /var/log/my.log since the last run of Logwatch would be checked for the two patterns "a critical message" and "something that should only trigger a warning".
You can find a very extensive example configuration as an instance user in the file ~/share/check_mk/agents/cfg_examples/logwatch.cfg.
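What the two pattern lines of the minimal configuration do with a log line can be sketched like this. The example assumes that the patterns are tried in the order given and that the first match decides, and it returns None when nothing matches. Both the function classify and these assumptions are part of the sketch, not a description of the plugin's internals.

```python
import re

# The two patterns from the minimal configuration above, in the same order.
PATTERNS = [
    ("C", re.compile(r"a critical message")),
    ("W", re.compile(r"something that should only trigger a warning")),
]


def classify(line, patterns=PATTERNS):
    """Return the letter of the first pattern found in the line, or None."""
    for level, regex in patterns:
        if regex.search(line):
            return level
    return None


for line in (
    "12:00:01 a critical message from the storage layer",
    "12:00:02 something that should only trigger a warning",
    "12:00:03 everything is fine",
):
    print(classify(line), line)
```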
As all the options that you can specify in such a configuration file have already been explained in the section Configuration via the Agent Bakery, only a list and reference follows here. Please refer to the above section for an explanation.
option in | counterpart | example | remark |
---|---|---|---|
regex | Regular expression for logfile filtering | | |
encoding | Codec that should be used to decode the matching files | | |
maxlines | Restrict number of processed messages per cycle | | |
maxtime | Restrict runtime of logfile parsing | | |
overflow | In case of an overflow | | |
maxlinesize | Restrict the length of the lines | | |
maxoutputsize | Limit the amount of data sent to the monitoring server | | Size given in bytes |
skipconsecutiveduplicated | Duplicated messages management | | |
nocontext | Handling of context messages | | |
maxcontextlines | Limit the amount of context data sent to the monitoring server | | |
4. Grouping of log files
The check called logwatch, which belongs to the agent plugin, normally creates a separate service for each log file.
By defining groupings using the Logfile Grouping rule, you can switch to the logwatch_groups check.
Further information will follow soon. Until then, please consult the inline help of the rule Logfile Grouping.
5. Reclassify with logfile patterns
This section will follow soon. Until then, please consult the inline help of the rule Logfile patterns.
6. Forwarding to the Event Console
In addition to the direct processing of log messages in Checkmk and a possible reclassification with the rule Logfile patterns, there is also the option of forwarding log lines obtained by Logwatch to the Event Console. This is done using the rule Logwatch Event Console Forwarding and is described in the article The Event Console.
7. Logwatch in monitoring
In monitoring, the display differs depending on the check plugin used.
If you use either logwatch or logwatch_groups, you will find - after the necessary service detection - one service per log file or per grouping of log files (see Grouping of log files) whose name begins with Log.
This is followed by the full path of the file or the group name.
If you forward your log messages to the Event Console, you will see one service per forwarding, depending on the setting of the rule Logwatch Event Console Forwarding, which informs you about the number of forwarded log messages.
In the case of bundled forwarding by the logwatch_ec plugin, this service is called Log Forwarding.
If you use the Separate check option and thus the logwatch_ec_single plugin, the service name also starts with Log followed by the path of the log file.
This service also informs you of the number of messages forwarded and if a log file cannot be read.
8. Files and directories
All paths for the Checkmk server are specified relative to the instance directory (e.g. /omd/sites/mysite).
location | path | meaning |
---|---|---|
Checkmk server | | example configuration file |
Checkmk server | | Python 3 agent plugin including explanations |
Checkmk server | | Python 2 agent plugin including explanations |
Linux host | | Configuration file - created by the agent bakery or manually |
Linux host | | State files of mk_logwatch |
Linux host | | Location of the individual batches that mk_logwatch creates per query |
Windows host | | Configuration file - created by the agent bakery or manually |
Windows host | | Storage location for the state files of mk_logwatch |
Windows host | | Storage location of the individual batches that mk_logwatch creates per query |