1. Introduction
Check plug-ins that work with SNMP are developed in a similar way to their agent-based relatives. The difference lies both in the service discovery process and in the check itself. With the agent-based check plug-ins, the agent plug-in is used to determine which data is sent to the Checkmk site, and pre-filtering (but no evaluation) often already takes place on the host. In contrast, with SNMP you must specify exactly which data fields you require and explicitly request these. With SNMP, these areas (branches of a tree) or individual data fields (leaves) are identified by OIDs (object identifiers).
A complete transfer of all data would theoretically be possible (using the so-called SNMP walk). However, even with fast devices this takes minutes, and with complex switches it can take over an hour. This is therefore already a problem during a discovery and even more so during the check itself. Here Checkmk takes a more targeted approach. Nevertheless, SNMP walks are available in Checkmk for debugging existing checks and developing your own checks.
If you do not yet have experience with SNMP, we recommend that you read the article on Monitoring via SNMP.
1.1. What works differently in SNMP?
Compared to a check plug-in for the Checkmk agent, there are some special features to note with SNMP. With a check plug-in for SNMP, a service discovery is divided into two phases.
As a first step, the SNMP detection function is used to detect the device.
This serves to determine whether the check plug-in is of any interest to the respective device and is carried out for every device that is monitored via SNMP.
For this purpose, a few OIDs are retrieved — individual ones, without an SNMP walk.
The most important of these is the sysDescr (OID: 1.3.6.1.2.1.1.1.0).
Under this OID, each SNMP device provides a description of itself, for example Flintstones, Inc. Fred Router rev23.
In the second step, the necessary monitoring data is retrieved for each of these candidates using SNMP walks.
These are then summarized in a table and provided to the check plug-in’s discovery function in the section argument, which then determines the items to be monitored.
A service is then generated for each of these items.
During the check it is then already known whether the plug-in should be executed for the device and thus a new SNMP detection is not necessary. The current monitoring data required for the plug-in is retrieved here via SNMP walks.
So what do you have to do differently with a check plug-in for SNMP compared to an agent-based one?
You do not need an agent plug-in.
You define the OIDs required for SNMP detection and the texts they should contain.
You decide which branches and leaves of the SNMP tree need to be fetched for monitoring.
1.2. Don’t be afraid of MIBs!
In this brief introduction we would like to discuss the notorious SNMP MIBs, about which there are many prejudices. The good news: Checkmk does not need MIBs! However, they can be an important aid when developing a check plug-in or troubleshooting existing check plug-ins.
What are MIBs? MIB literally means Management Information Base, which contains little more information than the abbreviation itself. Basically, an MIB is a human-readable text file that describes the branches and leaves in an SNMP data tree.
OIDs can identify branches or leaves. The branch description contains information on the system and subsystem information provided by the branch. If an OID references a leaf, the information in the MIB contains information on the data type (character string, fixed-point number, hex string, …), the value range and the representation. For example, temperatures can be stored as a fixed-point number with a sign on the Celsius scale with a resolution of 0.1° or without a sign in steps of 1.0° on the Kelvin scale.
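The two temperature representations mentioned above can be sketched as follows. This is only an illustration: the function names are hypothetical, and which encoding a given device actually uses has to be taken from its MIB. The raw values are assumed to arrive as strings.

```python
def decicelsius_to_celsius(raw: str) -> float:
    """Signed fixed-point value with a resolution of 0.1 degrees Celsius."""
    return int(raw) / 10.0


def kelvin_to_celsius(raw: str) -> float:
    """Unsigned value in whole steps on the Kelvin scale."""
    return int(raw) - 273.15


# The same kind of sensor could report either encoding, depending on the MIB.
print(decicelsius_to_celsius("-125"))
print(round(kelvin_to_celsius("300"), 2))
```

Without the MIB, a raw value such as 300 could just as well be read as 30.0 °C or as 300 K, which is why the representation information in the MIB matters.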
Checkmk provides a series of freely accessible MIB files. These describe very general fields in the global OID tree, but do not contain any manufacturer-specific fields. They are therefore not much help for self-developed check plug-ins.
So try to find the relevant MIB files for your specific device somewhere on the manufacturer’s website or even on the device’s management interface.
Install these files in the Checkmk site in the ~/local/share/check_mk/mibs/ directory.
You can then translate OID numbers into names using SNMP walks and thus more quickly find the data of interest for purposes of monitoring.
As already mentioned, well-maintained MIBs also contain interesting information in their comments.
You can easily view a MIB file with a text editor or the pager less.
2. Finding the correct OIDs
The crucial prerequisite for developing an SNMP-based check plug-in is that you know which OIDs contain the relevant information. For the example scenario presented, we have assumed that you have just commissioned a batch of routers of the type Flintstones, Inc. Fred Router rev23. You will often come across this fictitious device in manufacturer documentation and MIB comments. However, you have forgotten to enter the contact and location information for some devices. A self-written check plug-in for Checkmk should now help to identify these devices.
Tip: The example plug-in we have prepared is written in such a way that you can run it with almost any SNMP-capable device. You only need to adapt the character string to be compared. If you do not have a device at hand, you will find various simulation options in the chapter on Troubleshooting.
The first step is to carry out a complete SNMP walk. This involves retrieving all of the available data via SNMP. Checkmk can do this for you very easily. First include the device for which you want to develop a check plug-in in the monitoring. Make sure that its basic functions can be monitored. At the very least, the SNMP Info and Uptime services must be found and probably also at least one Interface. This will ensure that the SNMP access works properly.
Then switch to the Checkmk site’s command line.
Here you can execute a complete walk with the following command, shown here for the device with the host name mydevice01.
We recommend that you also use the -v option (for verbose):
OMD[mysite]:~$ cmk -v --snmpwalk mydevice01
mydevice01:
Walk on ".1.3.6.1.2.1"...3898 variables.
Walk on ".1.3.6.1.4.1"...6025 variables.
Wrote fetched data to /omd/sites/mysite/var/check_mk/snmpwalks/mydevice01.
As already mentioned, a complete SNMP walk can take minutes or even hours (even if the latter is a rare occurrence), so don’t get nervous if it takes a while to complete.
The walk’s result is saved in the ~/var/check_mk/snmpwalks/mydevice01 file.
This is a text file that is easy to read and starts like this:
.1.3.6.1.2.1.1.1.0 Flintstones, Inc. Fred Router rev23
.1.3.6.1.2.1.1.2.0 .1.3.6.1.4.1.424242.2.3
.1.3.6.1.2.1.1.3.0 546522419
.1.3.6.1.2.1.1.4.0 barney@example.com
.1.3.6.1.2.1.1.5.0 big-router-01
.1.3.6.1.2.1.1.6.0 Server room 23, Stonestreet 52, Munich
.1.3.6.1.2.1.1.7.0 72
.1.3.6.1.2.1.1.8.0 0
Each line contains an OID and then its value.
You will find the most important one in the very first line, namely sysDescr.
This should be a unique identifier for a hardware model.
The second line is also interesting:
Below 1.3.6.1.4.1 there are branches that hardware manufacturers can assign themselves. Here, Flintstones, Inc. has the fictitious manufacturer ID 424242.
Below this, the company has assigned 2 for routers and 3 for this specific model.
You will then find device-specific OIDs within this branch.
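If you want to inspect a stored walk programmatically, the line format described above is simple enough to read with a few lines of Python. The following is a minimal sketch under the assumption that every line has the form "OID, space, value" as in the excerpt; the helper names parse_walk and enterprise_id are hypothetical and not part of Checkmk.

```python
from typing import Dict, Optional

WALK = """\
.1.3.6.1.2.1.1.1.0 Flintstones, Inc. Fred Router rev23
.1.3.6.1.2.1.1.2.0 .1.3.6.1.4.1.424242.2.3
.1.3.6.1.2.1.1.4.0 barney@example.com
"""

ENTERPRISES = ".1.3.6.1.4.1."


def parse_walk(text: str) -> Dict[str, str]:
    """Map each OID to its value; the first space separates the two."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        oid, _, value = line.partition(" ")
        result[oid] = value
    return result


def enterprise_id(sys_object_id: str) -> Optional[int]:
    """Return the manufacturer ID below 1.3.6.1.4.1, or None if absent."""
    if not sys_object_id.startswith(ENTERPRISES):
        return None
    return int(sys_object_id[len(ENTERPRISES):].split(".")[0])


walk = parse_walk(WALK)
print(enterprise_id(walk[".1.3.6.1.2.1.1.2.0"]))  # 424242
```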
These OIDs are however not very meaningful. If the correct MIBs are installed, you can translate these into names in a second step. It is best to redirect the output from the following command, which would otherwise be displayed in the terminal, to a file:
OMD[mysite]:~$ cmk --snmptranslate mydevice01 > /tmp/translated
The /tmp/translated file reads like the original walk, but additionally shows the name of the OID in each line after the -->:
.1.3.6.1.2.1.1.1.0 Flintstones, Inc. Fred Router rev23 --> SNMPv2-MIB::sysDescr.0
.1.3.6.1.2.1.1.2.0 .1.3.6.1.4.1.424242.2.3 --> SNMPv2-MIB::sysObjectID.0
.1.3.6.1.2.1.1.3.0 546522419 --> DISMAN-EVENT-MIB::sysUpTimeInstance
.1.3.6.1.2.1.1.4.0 barney@example.com --> SNMPv2-MIB::sysContact.0
.1.3.6.1.2.1.1.5.0 big-router-01 --> SNMPv2-MIB::sysName.0
.1.3.6.1.2.1.1.6.0 Server room 23, Stonestreet 52, Munich --> SNMPv2-MIB::sysLocation.0
.1.3.6.1.2.1.1.7.0 72 --> SNMPv2-MIB::sysServices.0
.1.3.6.1.2.1.1.8.0 0 --> SNMPv2-MIB::sysORLastChange.0
In the above output, for example, the OID 1.3.6.1.2.1.1.4.0 has the value barney@example.com and the name SNMPv2-MIB::sysContact.0.
The OID names make it much easier to identify the OIDs of interest.
For the example presented, the OIDs 1.3.6.1.2.1.1.4.0 to 1.3.6.1.2.1.1.6.0 are sufficient.
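In a translated walk with thousands of lines, searching by name is usually quicker than scrolling. The sketch below assumes the line format shown above ("OID value --> name") and uses a hypothetical helper find_oids; a plain grep on the file would achieve much the same.

```python
from typing import List, Tuple

TRANSLATED = """\
.1.3.6.1.2.1.1.4.0 barney@example.com --> SNMPv2-MIB::sysContact.0
.1.3.6.1.2.1.1.5.0 big-router-01 --> SNMPv2-MIB::sysName.0
.1.3.6.1.2.1.1.6.0 Server room 23, Stonestreet 52, Munich --> SNMPv2-MIB::sysLocation.0
"""


def find_oids(text: str, needle: str) -> List[Tuple[str, str, str]]:
    """Return (oid, name, value) for every line whose name contains needle."""
    hits = []
    for line in text.splitlines():
        # Split at the last arrow, because a value may contain arbitrary text.
        left, sep, name = line.rpartition(" --> ")
        if not sep:
            continue
        oid, _, value = left.partition(" ")
        if needle.lower() in name.lower():
            hits.append((oid, name, value))
    return hits


for oid, name, value in find_oids(TRANSLATED, "sysLocation"):
    print(oid, name, value)
```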
3. Writing a simple check plug-in
You have now completed the preparatory work: Now you have a list of the OIDs that you want to read and evaluate. The task now is to use these notes to teach Checkmk which services are generated and when they should go to WARN or CRIT. The programming of a check plug-in in Python used for this has many parallels to an agent-based check plug-in. As there are some subtleties to consider, we will show the complete structure with all of the functions that are used.
3.1. Preparing the file
You will find a directory prepared for your own check plug-ins in the local hierarchy of the site directory.
This is ~/local/lib/check_mk/base/plugins/agent_based/.
Here in the file path, base stands for the part of Checkmk that is responsible for the actual monitoring and notifications.
The agent_based stands for all plug-ins that relate to the Checkmk agent, as well as SNMP agents (i.e. non-notification plug-ins).
The easiest way to work is to switch to this:
OMD[mysite]:~$ cd local/lib/check_mk/base/plugins/agent_based
This directory belongs to the site user and can therefore be edited by you. You can edit your check plug-in using any text editor installed on the Linux system.
So create the flintstone_setup_check.py file here for the new check plug-in.
The convention is that the file name reflects the name of the check plug-in as defined in the registration function register.check_plugin().
The .py file suffix is mandatory, because from version 2.0.0 of Checkmk the check plug-ins are always real Python modules.
An executable basic framework (download from GitHub), which you will expand step by step in the following tutorial, looks like this:
#!/usr/bin/env python3
from .agent_based_api.v1 import register, Result, Service, startswith, SNMPTree, State
def parse_flintstone(string_table):
    return {}
def discover_flintstone(section):
    yield Service()
def check_flintstone(section):
    yield Result(state=State.OK, summary="Everything is fine")
register.snmp_section(
    name = "flintstone_base_config",
    parse_function = parse_flintstone,
    detect = startswith(".1.3.6.1.2.1.1.1.0", "Flintstone"),
    fetch = SNMPTree(base='.1.3.6.1.2.1.1', oids=['4.0']),
)
register.check_plugin(
    name = "flintstone_setup_check",
    sections = ["flintstone_base_config"],
    service_name = "Flintstone setup check",
    discovery_function = discover_flintstone,
    check_function = check_flintstone,
)
First you will need to import the functions and classes required for the check plug-ins from Python modules.
The simplest method for this is with an import *, but you should avoid this, as it obscures which names are actually made available.
For our example, we will only import what will be used in the rest of the article:
from .agent_based_api.v1 import register, Result, Service, startswith, SNMPTree, State
New here in comparison to an agent-based check plug-in are startswith and SNMPTree.
SNMPTree is self-explanatory: it is a class for the representation of SNMP trees.
The startswith() function compares the content of an SNMP leaf with a character string.
More on this later.
3.2. Registering the SNMP section
Once you have obtained the correct OIDs, the actual development of the check plug-in can begin. When registering the SNMP section, you specify two things:
You identify the devices for which the check plug-in is to be executed.
In the following example, this is done with the startswith() function, which compares a character string with the start of the content of an OID leaf. Further assignment options are shown below.
You declare which OID branches or leaves are to be retrieved for monitoring.
This is done with the constructor of the SNMPTree class.
Extend the prepared example file so that the plug-in is only executed for a small number of devices, here the Flintstones, Inc. Fred Router models.
The OIDs for contact, device name and location are then retrieved for these devices.
These three OIDs are provided by each device.
If you want to test the example with real SNMP-capable devices, it is therefore sufficient to customize the model name to be recognized.
register.snmp_section(
    name = "flintstone_base_config",
    parse_function = parse_flintstone,
    detect = startswith(
        ".1.3.6.1.2.1.1.1.0",
        "Flintstones, Inc. Fred Router",
    ),
    fetch = SNMPTree(
        base = '.1.3.6.1.2.1.1',
        oids = ['4.0', '5.0', '6.0'],
    ),
)
The example also contains the name parameter with which the generated SNMP section is identified and a parse function, which we will discuss later.
The SNMP detection
Use the detect parameter to specify the conditions under which the discovery function should be executed.
In our example, this is the case if the value of the OID 1.3.6.1.2.1.1.1.0 (i.e. the sysDescr) begins with the text Flintstones, Inc. Fred Router (case-insensitive).
In addition to startswith, there is a whole range of other possible functions for identification.
There is also a negated form of each, which begins with not_.
Note that each function must be separately specified in the import statement.
Attribute | Function | Negation |
---|---|---|
equals() | The value of the OID matches the text | not_equals() |
contains() | The value of the OID contains the text | not_contains() |
startswith() | The value of the OID begins with the text | not_startswith() |
endswith() | The value of the OID ends with the text | not_endswith() |
matches() | The value of the OID corresponds to the regular expression | not_matches() |
exists() | The OID is available on the device. Its value may be empty. | not_exists() |
There is also the option of linking several attributes with all_of or any_of.
all_of requires all of the listed checks to succeed for positive recognition.
The following example assigns your check plug-in to a device if the text in the sysDescr begins with foo (or FOO or Foo) and the OID 1.3.6.1.2.1.1.2.0 contains the text .4.1.11863.:
detect = all_of(
    startswith(".1.3.6.1.2.1.1.1.0", "foo"),
    contains(".1.3.6.1.2.1.1.2.0", ".4.1.11863."),
)
In contrast, any_of is satisfied as soon as at least one of the criteria has been met.
Here is an example in which different values are permitted for the sysDescr:
detect = any_of(
    startswith(".1.3.6.1.2.1.1.1.0", "foo version 3 system"),
    startswith(".1.3.6.1.2.1.1.1.0", "foo version 4 system"),
    startswith(".1.3.6.1.2.1.1.1.0", "foo version 4.1 system"),
)
By the way: Are you familiar with regular expressions? If so, you could probably simplify this example and get by with just a single line:
detect = matches(".1.3.6.1.2.1.1.1.0", "FOO Version (3|4|4.1) .*")
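You can try out such a pattern with Python's re module before putting it into a plug-in. The sketch below is not Checkmk code: it assumes that the pattern has to match the complete value and that upper and lower case are ignored, in line with the case-insensitive behavior described for startswith above. The helper name matches_model is hypothetical.

```python
import re

PATTERN = "FOO Version (3|4|4.1) .*"


def matches_model(pattern: str, value: str) -> bool:
    """Assumed semantics: whole-value match, case-insensitive."""
    return re.fullmatch(pattern, value, re.IGNORECASE | re.DOTALL) is not None


for descr in [
    "foo version 3 system",
    "foo version 4 system",
    "foo version 4.1 system",
    "foo version 5 system",
]:
    print(descr, "->", matches_model(PATTERN, descr))
```

Note that the unescaped dot in 4.1 stands for any character, so a value such as foo version 4x1 system would match as well. If this matters for your devices, escape the dot in the pattern.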
And another important note: The OIDs that you pass to the SNMP detection for a check plug-in are retrieved from every device that is monitored via SNMP. This is the only way Checkmk can determine which devices the check plug-in should be applied to.
You should therefore be very careful when using manufacturer-specific OIDs.
Try to design your SNMP detection so that the sysDescr (1.3.6.1.2.1.1.1.0) and the sysObjectID (1.3.6.1.2.1.1.2.0) are checked first.
If you still need a different OID for exact identification, use all_of() and proceed as follows:
First check for sysDescr or sysObjectID.
In further arguments, you can then further restrict the group of devices for which your plug-in is to be executed.
detect = all_of(
    startswith(".1.3.6.1.2.1.1.1.0", "Flintstone"),  # first check sysDescr
    contains(".1.3.6.1.4.1.424242.2.3.37.0", "foo"),  # fetch vendor specific OID
)
This works thanks to the lazy evaluation principle:
As soon as one of the earlier checks fails, no further checks will be performed.
In the example above, the OID 1.3.6.1.4.1.424242.2.3.37.0 is only retrieved from devices whose sysDescr begins with Flintstone.
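The effect of lazy evaluation can be illustrated with a small stand-alone model. This is explicitly not Checkmk's implementation: startswith_model, contains_model, all_of_model and detect are hypothetical stand-ins that only record which OIDs would be requested from a device.

```python
from typing import Callable, Dict, List, Tuple

Getter = Callable[[str], str]
Check = Callable[[Getter], bool]


def startswith_model(oid: str, text: str) -> Check:
    return lambda get: get(oid).lower().startswith(text.lower())


def contains_model(oid: str, text: str) -> Check:
    return lambda get: text.lower() in get(oid).lower()


def all_of_model(*checks: Check) -> Check:
    # all() stops at the first False, so later checks never request their OID.
    return lambda get: all(check(get) for check in checks)


def detect(device: Dict[str, str], spec: Check) -> Tuple[bool, List[str]]:
    """Evaluate spec and report which OIDs were actually requested."""
    requested: List[str] = []

    def get(oid: str) -> str:
        requested.append(oid)
        return device.get(oid, "")

    matched = spec(get)
    return matched, requested


SPEC = all_of_model(
    startswith_model(".1.3.6.1.2.1.1.1.0", "Flintstone"),
    contains_model(".1.3.6.1.4.1.424242.2.3.37.0", "foo"),
)

other = {".1.3.6.1.2.1.1.1.0": "Some other switch"}
fred = {
    ".1.3.6.1.2.1.1.1.0": "Flintstones, Inc. Fred Router rev23",
    ".1.3.6.1.4.1.424242.2.3.37.0": "foo bar",
}

print(detect(other, SPEC))  # only sysDescr is requested
print(detect(fred, SPEC))   # both OIDs are requested
```

In this model, swapping the two arguments of all_of_model would cause the vendor-specific OID to be requested from every device, which is exactly what the recommended order avoids.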
3.3. Writing the parse function
As with agent-based plug-ins, the parse function in the SNMP-based check plug-in also has the task of converting the received SNMP data into a form that can be processed easily and, above all, with high performance.
You also receive the data here as a list. However, there are a few subtleties to consider, as it makes a difference whether you are querying leaves or branches. As a reminder — in our above example, leaves are requested:
fetch = SNMPTree(
    base = '.1.3.6.1.2.1.1',
    oids = ['4.0', '5.0', '6.0'],
)
If you temporarily extend the parse function with the print() function, you can display the data that Checkmk provides from this query when testing the check plug-in:
def parse_flintstone(string_table):
    print(string_table)
    return {}
You will receive a nested list which contains only one element in its first level, namely a list of the retrieved values:
[
    ['barney@example.com', 'big-router-01', 'Server room 23, Stonestreet 52, Munich']
]
The result looks a little different if you retrieve branches that contain multiple leaves.
Assume that the router can be equipped with a variable number of network cards whose name, connection status and speed can be read below 1.3.6.1.4.1.424242.2.3.23 …
fetch = SNMPTree(
    base = '.1.3.6.1.4.1.424242.2.3.23',
    oids = [
        '6', # all names
        '7', # all states
        '8', # all speeds
    ],
)
… then the two-dimensional list could possibly look like this:
[
    # Name, State, Speed
    ['net0', '1', '1000'],
    ['net1', '0', '100'],
    ['net2', '1', '10000'],
    ['net3', '1', '1000'],
]
All leaves available under an OID are written to a table column. For the rows to make sense, you should therefore only query OIDs that belong together, i.e. whose leaves describe the same objects.
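A parse function for such a table could turn each row into a dictionary entry keyed by the card name. The following sketch runs without Checkmk. The function name parse_flintstone_ifaces is hypothetical, and so are the interpretation of the state value 1 as "up" and the treatment of the speed as a plain integer; the real meaning of these columns would have to come from the device's MIB.

```python
from typing import Dict, List

STRING_TABLE = [
    ["net0", "1", "1000"],
    ["net1", "0", "100"],
    ["net2", "1", "10000"],
    ["net3", "1", "1000"],
]


def parse_flintstone_ifaces(string_table: List[List[str]]) -> Dict[str, dict]:
    """Convert rows of [name, state, speed] into a dictionary per card."""
    section = {}
    for name, state, speed in string_table:
        section[name] = {
            "up": state == "1",   # assumption: 1 means the link is up
            "speed": int(speed),  # assumption: speed is a whole number
        }
    return section


section = parse_flintstone_ifaces(STRING_TABLE)
print(section["net1"])
```

A dictionary keyed by name fits the later steps well: a discovery function can create one service per key, and the check function can look up its item directly.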
Tip: The last example shown for retrieving OID branches is also a part of our SNMP walk provided on GitHub, which you can use for simulations.
But now back to the example in which the OID leaves for contact, device name and location are queried: The following parse function simply copies each element of the inner list into a key-value pair in the returned dictionary:
def parse_flintstone(string_table):
    # print(string_table)
    result = {}
    result["contact"] = string_table[0][0]
    result["name"] = string_table[0][1]
    result["location"] = string_table[0][2]
    # print(result)
    return result
The result from the parse function will then look like this:
{
    'contact': 'barney@example.com',
    'name': 'big-router-01',
    'location': 'Server room 23, Stonestreet 52, Munich'
}
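The parse function above assumes that the device always returns exactly one row with three values. A slightly more defensive variant is sketched below. The name parse_flintstone_safe is hypothetical, and returning None for incomplete data is a design choice of this sketch; how Checkmk treats a section that is None should be verified in the API documentation before relying on it.

```python
from typing import Dict, List, Optional

FIELDS = ("contact", "name", "location")


def parse_flintstone_safe(string_table: List[List[str]]) -> Optional[Dict[str, str]]:
    """Return the three fields as a dictionary, or None if data is incomplete."""
    if not string_table or len(string_table[0]) < len(FIELDS):
        return None
    # zip() pairs each field name with the value at the same position.
    return dict(zip(FIELDS, string_table[0]))


print(parse_flintstone_safe(
    [["barney@example.com", "big-router-01", "Server room 23, Stonestreet 52, Munich"]]
))
print(parse_flintstone_safe([]))
```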
3.4. Registering the check plug-in
The check plug-in is registered in exactly the same way as described for an agent-based check plug-in.
Since in most cases you will be querying several SNMP branches and this will result in several SNMP sections, the sections parameter with the list of sections to be evaluated is usually required:
register.check_plugin(
    name = "flintstone_setup_check",
    sections = ["flintstone_base_config"],
    service_name = "Flintstone setup check",
    discovery_function = discover_flintstone,
    check_function = check_flintstone,
)
3.5. Writing the discovery function
The discovery function also corresponds to the example for agent-based check plug-ins.
For check plug-ins that only generate one service per host, a single yield is sufficient:
def discover_flintstone(section):
    yield Service()
3.6. Writing the check function
In the example, we want to check whether the contact, device name and location information is available. It is therefore sufficient to check which fields are empty in the check function and accordingly set the status to CRIT (if something is missing) or to OK (if everything is available):
def check_flintstone(section):
    missing = 0
    for e in ["contact", "name", "location"]:
        if section[e] == "":
            missing += 1
            yield Result(state=State.CRIT, summary=f"Missing information: {e}!")
    if missing > 0:
        yield Result(state=State.CRIT, summary=f"Missing fields: {missing}!")
    else:
        yield Result(state=State.OK, summary="All required information is available.")
Once the check function has been created, the check plug-in will be ready for use.
We have made this complete check plug-in available at GitHub.
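The decision logic of the check function can also be tried out without a Checkmk site by moving it into a plain helper. The sketch below uses a hypothetical function missing_fields that is not part of the published plug-in; it merely mirrors the test for empty fields.

```python
from typing import Dict, List

REQUIRED = ("contact", "name", "location")


def missing_fields(section: Dict[str, str]) -> List[str]:
    """Return the required keys whose value is empty or absent."""
    return [key for key in REQUIRED if not section.get(key)]


complete = {
    "contact": "barney@example.com",
    "name": "big-router-01",
    "location": "Server room 23, Stonestreet 52, Munich",
}
incomplete = {"contact": "", "name": "big-router-01", "location": ""}

print(missing_fields(complete))    # []
print(missing_fields(incomplete))  # ['contact', 'location']
```

A check function built on such a helper would yield CRIT when the returned list is non-empty and OK otherwise.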
3.7. Testing and activating the check plug-in
Testing and activation are carried out in the same way as for an agent-based check plug-in.
The first step is the service discovery for the plug-in:
OMD[mysite]:~$ cmk -vI --detect-plugins=flintstone_setup_check mydevice01
Discovering services and host labels on: mydevice01
mydevice01:
+ FETCHING DATA
[SNMPFetcher] Execute data source
[PiggybackFetcher] Execute data source
No piggyback files for 'mydevice01'. Skip processing.
No piggyback files for '198.51.100.123'. Skip processing.
+ ANALYSE DISCOVERED HOST LABELS
SUCCESS - Found no new host labels
+ ANALYSE DISCOVERED SERVICES
+ EXECUTING DISCOVERY PLUGINS (1)
1 flintstone_setup_check
SUCCESS - Found 1 services
As expected, the service discovery was successful. Now you can test the check contained in the check plug-in:
OMD[mysite]:~$ cmk -v --detect-plugins=flintstone_setup_check mydevice01
+ FETCHING DATA
[SNMPFetcher] Execute data source
[PiggybackFetcher] Execute data source
No piggyback files for 'mydevice01'. Skip processing.
No piggyback files for '198.51.100.123'. Skip processing.
Flintstone setup check All required information is available.
[snmp] Success, [piggyback] Success ...
After restarting the monitoring core …
OMD[mysite]:~$ cmk -R
Generating configuration for core (type nagios)...
Precompiling host checks...OK
Validating Nagios configuration...OK
Restarting monitoring core...OK
… the new service will then be visible in the monitoring.
4. Troubleshooting
As the troubleshooting in agent-based check plug-ins essentially also applies to SNMP-based check plug-ins, we will only deal with the SNMP specifics here.
4.1. Simulation options
Using saved SNMP walks in Checkmk
In the article on monitoring via SNMP we show in detail how you can create SNMP walks from the GUI and how you can use them for simulation. This also makes it possible to develop check plug-ins on test systems that cannot access the SNMP hosts for which you are developing a plug-in. In our GitHub repository you will find an example of an SNMP walk, which we use in this article and which you can use to develop and test the check plug-in.
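For quick experiments, you can also generate a small synthetic walk yourself, for example one with an empty contact field to provoke the CRIT state. The sketch below writes a file in the "OID value" format shown earlier into a temporary directory. The helper write_walk and the host name mydevice02 are hypothetical, and whether Checkmk accepts a hand-written walk with an empty value exactly like this is an assumption that you should verify on a test site.

```python
import tempfile
from pathlib import Path

SYSTEM = ".1.3.6.1.2.1.1"


def write_walk(path: Path, contact: str, name: str, location: str) -> None:
    """Write a minimal walk containing only a few system OIDs."""
    lines = [
        f"{SYSTEM}.1.0 Flintstones, Inc. Fred Router rev23",
        f"{SYSTEM}.2.0 .1.3.6.1.4.1.424242.2.3",
        f"{SYSTEM}.4.0 {contact}",
        f"{SYSTEM}.5.0 {name}",
        f"{SYSTEM}.6.0 {location}",
    ]
    path.write_text("\n".join(lines) + "\n")


# Written to a temporary directory here, not into the site directory.
target = Path(tempfile.mkdtemp()) / "mydevice02"
write_walk(target, contact="", name="big-router-02", location="Server room 23")
print(target.read_text())
```

To use such a file for simulation, it would have to be placed where Checkmk stores its walks, i.e. below ~/var/check_mk/snmpwalks/ under the name of the host.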
The dummy SNMP daemon
If you want to ensure that specific OIDs change depending on each other, it can be useful to program a dummy SNMP daemon that delivers consistent data.
The Python snmp-agent module can be an aid when programming such a dummy.
4.2. Uncooperative hardware
Before a device can be monitored with a new SNMP-based check plug-in, it must first be able to be monitored via SNMP. You can therefore find an overview of known problems with suggested solutions in the article on Monitoring via SNMP.
5. Files and directories
File Path | Description |
---|---|
~/local/lib/check_mk/base/plugins/agent_based/ | Location where self-written check plug-ins are stored. |
~/local/share/check_mk/mibs/ | Store SNMP MIB files here that are to be loaded automatically. |