1. Synthetic monitoring with Robot Framework
Checkmk Synthetic Monitoring is available in the commercial Checkmk editions, but it requires an additional subscription. You can however test the function with up to three tests free of charge and without a time limit.
With Checkmk you can monitor your own infrastructure very closely — right down to the question of whether a particular service, such as a web server, is running properly. If your website is operated via a third-party cloud service, you will not have access to the service itself, but you can use an HTTP check to verify whether the website is accessible. But will that say anything about the user experience? The fact that an online store is accessible does not mean that navigation, ordering processes and the like work smoothly.
This is where Checkmk Synthetic Monitoring comes in. With the Robotmk plug-in, Checkmk offers genuine end-to-end monitoring, i.e. the monitoring of running applications from the user’s perspective. The actual testing is carried out by the open-source software Robot Framework — of which Checkmk GmbH is also a member.
The automation software can be used to fully replicate a user’s behavior, for example to simulate order processes in online stores, 'click-by-click'.
The special thing about Robot Framework is that tests are not written in a fully-fledged programming language, but are defined using easy-to-use keywords such as Open Browser or Click Button. An Open Browser checkmk.com is sufficient to call up the Checkmk website. Several test cases are then combined in so-called test suites (in the form of a .robot file).
Robotmk can now trigger these Robot Framework test suites on the host and monitor their execution and results as services in Checkmk. In the Checkmk web interface you will then find the status, associated performance graphs and the original evaluations of Robot Framework itself.
1.1. Components
Multiple components work together to create this end-to-end monitoring, so here is a brief overview.
Checkmk server
Checkmk Synthetic Monitoring is realized via Robotmk, which uses an agent plug-in as a data collector and the Robotmk scheduler (on the monitored host) for triggering Robot Framework projects. Synthetic monitoring is activated and configured via the Robotmk Scheduler rule. Here you specify which test suites should be executed and how exactly Robot Framework should start them. Together, these settings are summarized in a plan. Once rolled out, the Robotmk scheduler on the target host ensures the scheduled execution of your Robot Framework suites.
In the monitoring, a number of new services will be generated: RMK Scheduler Status shows the status of the scheduler itself, i.e. whether test suites could be started successfully. There are also services for all configured test plans (such as RMK MyApp1 Plan) and individual tests from test suites (such as RMK MyApp1 Test). The services of the individual tests also include the original Robot Framework reports.
Last but not least, there are two optional service rules: Robotmk plan and Robotmk test provide for fine-tuning the plan and test services — for example, to effect status changes at certain runtimes.
Test machine
You must provide the Robot Framework test suites on a Windows host. For execution, Robot Framework requires access to its dependencies (Python, libraries, drivers for browser automation and so on). This configuration is independent of Checkmk and can be stored declaratively in a portable package. This is performed with the open-source command line tool RCC. This tool uses your configuration files in YAML format to build virtual Python environments including dependencies and the Robot Framework itself. The Robotmk scheduler running as a background process triggers this build and then executes the tests itself.
Such an RCC automation package with the package configuration (robot.yaml), the definition of the execution environment (conda.yaml) and the test suites (tests.robot) is also called a robot. RCC and the scheduler are rolled out with the Checkmk agent; the automation package itself must be made available on the host.
The great advantage of RCC is that the executing Windows host itself does not require a configured Python environment.
The agent itself is only required for the transfer of results, logs and screenshots. This also enables the monitoring of very long-running or locally very resource-intensive suites — provided that your Windows host has the corresponding capacities.
2. Monitoring test suites with Robotmk
In the following, we will show you how to include a test suite in the monitoring. As an example we will use a simple Hello World suite which only outputs two strings and waits briefly between them. An introduction to Robot Framework is of course not the subject of this article, but a brief look at the automation package and the demo test suite is necessary so that you can see which data ends up where in the monitoring.
The example runs on the basis of RCC, so that the Windows host does not have to be configured separately. The rcc.exe is rolled out with the agent and can be found under C:\ProgramData\checkmk\agent\bin\. You can download the sample suite as a ZIP file via GitHub.
The directory of the suite:
conda.yaml
robot.yaml
tests.robot
RCC can also process test suites based on a number of other programming languages, but for use in Checkmk it must be the Robot Framework declaration.
The suite directory now contains two important files: the declaration of the environment required for execution in the file conda.yaml and the actual tests in the file tests.robot (the suite). The robot.yaml file is not relevant for use in Checkmk, but is required by RCC.
For the sake of completeness, here is a brief look at the robot.yaml file:
tasks:
  task1:
    # (task definitions are not required by Robotmk,
    # but expected by RCC to be compatible with other Robocorp features)
    shell: echo "nothing to do"

environmentConfigs:
  - conda.yaml

artifactsDir: output
First, tasks defines which tasks, here tests, are to be executed at all. However, although this part is formally required by RCC, it is not used by Robotmk.
Robot Framework distinguishes between tests and tasks, which stand for automations. However, both are used in exactly the same way.
In the environmentConfigs area, only the conda.yaml is referenced, which takes care of the actual environment.
In this case, only the Python, Pip and Robot Framework dependencies are installed for the environment. The environment build later appears in the monitoring as RCC environment build status. The tests can only be processed and consequently monitored if the environment has been built successfully.
channels:
  - conda-forge
dependencies:
  - python=3.10.12
  - pip=23.2.1
  - pip:
      - robotframework==7.0
The actual test suite now looks like this:
*** Settings ***
Documentation      Template robot main suite.

*** Variables ***
${MYVAR}    Hello Checkmk!

*** Test Cases ***
My Test
    Log      ${MYVAR}
    Sleep    3
    Log      Done.
Here, only the value of the MYVAR variable is output; then, after a 3-second wait, Done. is output. You can set the value of the variable later in the Checkmk web interface. Otherwise the default Hello Checkmk! specified here is used.
You can run this test suite manually.
To do this, the agent and RCC must already be installed (or you can download the RCC binary yourself).
First navigate to your test suite directory, where the |
And this is exactly what the Robotmk scheduler does as soon as the agent plug-in has been activated.
2.1. Configure a rule for the agent plug-in
You can find the Robotmk scheduler under Setup > Agent rules > Robotmk scheduler (Windows). As the rule is quite extensive, here is a look at the empty configuration:
First, the scheduler requires the base directory in which all your test suites are located. Enter this arbitrary, explicit file path under Base directory of suites, for example C:\robots.
The Parallel plan groups that are shown now are a Checkmk-specific concept.
To explain this, we must first go down one hierarchy level: Here you can see the item Sequential plans. Such a sequential plan defines which suites are to be executed with which parameters. Robot Framework will process these suites one after the other. The reason for this is simple: in practice, tests are sometimes run on the desktop, and several test suites running at the same time could get in each other's way (think of them stealing control of the mouse cursor from each other).
The plan groups are now an encapsulation for sequentially executed plans — and are themselves executed in parallel. Again, the reasoning is simple: this allows test suites that do not rely on the desktop to be executed in their own plans without delay — the test suite used in this article is a good example of such processing.
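The relationship between plan groups and sequential plans can be pictured with a small Python sketch. It is only an illustration of the concept, not how the Robotmk scheduler is implemented; the group names, plan names and the helper functions run_group and run_all_groups are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def run_group(group_name, plans, log, lock):
    """Process the plans of one group strictly one after the other."""
    for plan in plans:
        with lock:
            # A real scheduler would start Robot Framework here.
            log.append((group_name, plan))


def run_all_groups(groups):
    """Give every plan group its own worker so that groups run in parallel."""
    log = []
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = [
            pool.submit(run_group, name, plans, log, lock)
            for name, plans in groups.items()
        ]
        for future in futures:
            future.result()  # re-raise any error from a worker
    return log


# Hypothetical configuration: desktop tests share one group,
# the headless Hello World suite gets a group of its own.
groups = {
    "desktop": ["shop_de", "shop_en"],
    "headless": ["hello_world"],
}
log = run_all_groups(groups)
desktop_order = [plan for group, plan in log if group == "desktop"]
print(desktop_order)
```

Within the desktop group the order of the plans is always preserved, while the headless plan does not have to wait for the desktop plans to finish.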
Back to the dialog: The only explicit setting is the execution interval, which you set under Group execution interval.
The plans in the plan group naturally have their own runtimes, determined by the timeout of a single execution and the maximum number of repeated executions in the event of failed tests. The execution interval of the plan group must therefore be greater than the sum of the maximum runtimes of all plans in the group. The maximum runtime of a plan is calculated as follows: Limit per attempt × (1 + Maximum number of re-executions).
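As a worked example of this calculation, here is a minimal Python sketch. The function names and the sample values (60 seconds with 2 re-executions, 120 seconds without re-executions) are assumptions chosen for illustration only.

```python
def max_plan_runtime(limit_per_attempt, max_reexecutions):
    """Limit per attempt x (1 + Maximum number of re-executions), in seconds."""
    return limit_per_attempt * (1 + max_reexecutions)


def min_group_interval(plans):
    """Sum of the maximum runtimes of all plans in one plan group."""
    return sum(max_plan_runtime(limit, retries) for limit, retries in plans)


def interval_is_valid(group_interval, plans):
    """The group execution interval must be greater than that sum."""
    return group_interval > min_group_interval(plans)


# Hypothetical plan group: (limit per attempt in seconds, re-executions)
plans = [(60, 2), (120, 0)]

print(min_group_interval(plans))      # 60 * 3 + 120 * 1 = 300
print(interval_is_valid(300, plans))  # False, the interval must be greater
print(interval_is_valid(600, plans))  # True
```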
Now it’s time to configure the first plan. You can enter any name under Application name. This name does not have to be unique! The name of the application to be monitored makes sense here, for example OnlineShop, or here in this example simply MyApplication. Of course, it can happen that this online store is tested several times, either by other test suites or by the same test suite with different parameters. In such cases, the Variant field is used to achieve unambiguous results despite identical names. For example, if the application OnlineShop is tested once in German and once in English (via corresponding parameters), you could use corresponding abbreviations here. The monitoring will then return results for My OnlineShop_en and My OnlineShop_de.
The specification under Relative path to test suite file or folder, however, is mandatory. The path is relative to the base directory specified above, e.g. mybot\tests.robot for the base directory C:\robots\. Alternatively, a directory (with several robot files) can also be specified here, in which case it would simply be mybot.
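How the base directory and the relative path combine can be checked with a few lines of Python. This is only a sketch of the path arithmetic; the helper resolve_suite_path is hypothetical and not part of Robotmk.

```python
from pathlib import PureWindowsPath


def resolve_suite_path(base_dir, relative_path):
    """Join the base directory of suites with the relative suite path."""
    relative = PureWindowsPath(relative_path)
    if relative.is_absolute():
        raise ValueError("the suite path must be relative to the base directory")
    return PureWindowsPath(base_dir) / relative


# A single suite file and a whole suite directory
print(resolve_suite_path(r"C:\robots", r"mybot\tests.robot"))
print(resolve_suite_path(r"C:\robots", "mybot"))
```

PureWindowsPath is used so that the example behaves the same on any operating system.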
Continue with the Execution configuration. Under Limit per attempt you define the maximum elapsed time — per attempt — that a test suite may run. With Robot Framework re-executions you can now instruct Robot Framework to repeat test suites completely or incrementally if tests fail. If the individual tests in a test suite are independent of each other, the incremental strategy is the best way to save time. If, on the other hand, the test suite tests a logical sequence, such as "Login → Call up product page → Product in shopping cart → Checkout", the test suite must of course be completely reprocessed. In the end, there is always only one result.
In the case of complete retries, only the results from self-contained suites are taken into account for the final result: If a test fails on its final retry, the test suite is counted as a failure. In the case of incremental retries, the final result is made up of the best partial results: If some tests only run successfully on the third attempt, the final result is also counted as a success. Reminder: The combination of attempts and maximum run times of all plans in a plan group determines their minimum execution interval.
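The difference between the two strategies can be sketched in Python as follows. The data model (one dictionary per attempt that maps test names to passed or failed) and the function names are assumptions for illustration; they do not reflect the internal data structures of Robotmk or Robot Framework.

```python
def final_result_complete(attempts):
    """Complete retries rerun the whole suite, so only the last attempt counts."""
    return dict(attempts[-1])


def final_result_incremental(attempts):
    """Incremental retries rerun only failed tests; keep the best result per test."""
    merged = {}
    for attempt in attempts:
        for test, passed in attempt.items():
            merged[test] = merged.get(test, False) or passed
    return merged


def suite_passed(result):
    """A suite counts as a success only if every test in the result passed."""
    return all(result.values())


# Incremental: "Search" only succeeds on the third attempt.
incremental_attempts = [
    {"Login": True, "Search": False},
    {"Search": False},
    {"Search": True},
]
# Complete: "Checkout" still fails on the final retry.
complete_attempts = [
    {"Login": True, "Checkout": False},
    {"Login": True, "Checkout": False},
]

print(suite_passed(final_result_incremental(incremental_attempts)))  # True
print(suite_passed(final_result_complete(complete_attempts)))        # False
```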
By default, execution via RCC is activated under Automated environment setup (via RCC), for which you must enter two values. Firstly, RCC requires the specification of where the robot.yaml file is located. Its primary purpose is to reference the conda.yaml file, which is responsible for setting up the Python environment, i.e. installing Python and dependencies. This specification is relative to the base directory that you have set above under Base directory of suites. The YAML files can be saved in subdirectories, but best practice is the top suite directory. For the above base directory C:\robots\ and the suite directory C:\robots\mybot it is accordingly mybot\robot.yaml.
With the following time limit for building the Python environment, you should bear in mind that sometimes large volumes of data need to be downloaded and set up. The required browsers in particular quickly add up to several hundred megabytes, but only for the first run. RCC only rebuilds environments if the content of conda.yaml has changed.
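The idea of rebuilding only when the content changes can be illustrated with a content hash. The following Python sketch is an assumption made for illustration and not a description of RCC's actual mechanism; the class EnvironmentCache and the function environment_key are hypothetical.

```python
import hashlib


def environment_key(conda_yaml_text):
    """Derive a key from the file content, so identical content gives an identical key."""
    return hashlib.sha256(conda_yaml_text.encode("utf-8")).hexdigest()


class EnvironmentCache:
    """Remembers which environment definitions have already been built."""

    def __init__(self):
        self._built = set()
        self.builds = 0

    def ensure(self, conda_yaml_text):
        """Return True if a (simulated) build was necessary, False on a cache hit."""
        key = environment_key(conda_yaml_text)
        if key in self._built:
            return False
        self.builds += 1  # a real tool would download and install here
        self._built.add(key)
        return True


original = "dependencies:\n  - python=3.10.12\n"
changed = "dependencies:\n  - python=3.10.13\n"

cache = EnvironmentCache()
print(cache.ensure(original))  # True, first build
print(cache.ensure(original))  # False, unchanged content
print(cache.ensure(changed))   # True, content has changed
```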
Under Robot Framework parameters you have the possibility to use some of the command line parameters of Robot Framework (which are also displayed by the command robot --help). If you want to use additional parameters, the option Argument files will help. A file specified here can contain any robot parameters. Further information about the individual parameters can be found in the inline help. For our example project, only the option Variables is activated and a variable MYVAR with the value My Value is set. Remember the command Log ${MYVAR} at the top of the file tests.robot? This is the corresponding reference.
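For orientation, the variable set in the rule corresponds to the --variable option of the robot command, and an argument file to its --argumentfile option. The following Python sketch assembles such a command line for a manual run. The helper build_robot_command and the file name args.txt are hypothetical, and how Robotmk itself assembles the call is not shown in this article.

```python
def build_robot_command(suite, variables=None, argument_file=None):
    """Assemble a robot command line as a list of arguments."""
    command = ["robot"]
    for name, value in (variables or {}).items():
        # robot expects variables in the form name:value
        command += ["--variable", f"{name}:{value}"]
    if argument_file:
        command += ["--argumentfile", argument_file]
    command.append(suite)
    return command


print(build_robot_command("tests.robot", {"MYVAR": "My Value"}))
print(build_robot_command("tests.robot", argument_file="args.txt"))
```

Passing the arguments as a list (for example to subprocess.run) avoids quoting problems with values that contain spaces, such as My Value.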
At the end of the suite configuration, there are three largely self-explanatory options. Execute plan as a specific user allows Robotmk to be executed in the context of a specific user account. Background: By default, Robotmk is executed in the context of the Checkmk agent (LocalSystem account), which has no authorization to access the desktop. Here a user can be specified who must be permanently logged in to a desktop session and who has access to graphical desktop applications accordingly.
With Assign plan/test result to piggyback host the results of the plan/test can be assigned to another host. For example, if Robot Framework is testing the ordering process of an online store, the results can be assigned to the corresponding web server.
Each test run produces data that is stored under C:\ProgramData\checkmk\agent\robotmk_output\working\suites\. By default, results from the last 14 days are retained, but you should bear in mind that large volumes of data can quickly accumulate here. At least 500 kilobytes of data are generated per run. With more complex test suites and embedded screenshots, for example, this can quickly add up to several megabytes of data. Depending on the execution interval, the size of the report and your documentation requirements, you should reduce the retention period accordingly.
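A rough estimate of the retained data volume helps with this decision. The following Python sketch assumes a constant size per run and treats the kilobytes mentioned above as KiB; the function name and the 10-minute interval are example values, not recommendations.

```python
def retained_volume_mib(interval_minutes, kib_per_run, retention_days=14):
    """Estimate the data volume kept on disk for one plan, in MiB."""
    runs_per_day = 24 * 60 / interval_minutes
    total_kib = runs_per_day * retention_days * kib_per_run
    return total_kib / 1024


# One run every 10 minutes, 500 KiB per run, 14 days retention:
# 144 runs per day * 14 days * 500 KiB = 1,008,000 KiB
print(retained_volume_mib(10, 500))  # 984.375
```

With embedded screenshots of several megabytes per run, the same interval would already reach many gigabytes within the retention period.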
Once here, you can now create further plans in this plan group or further plan groups.
At the end there are two more options, which in turn relate to the complete Robotmk scheduler configuration.
RCC profile configuration allows you to specify proxy servers and hosts to be excluded.
Grace period before scheduler starts can also be very useful: The scheduler starts together with the Checkmk agent before the desktop logon — which, of course, means that any tests on the desktop must fail. The start can be manually delayed using a grace period.
This completes the configuration and you can bake a new agent with the plug-in and then roll it out, manually or via the automatic agent updates.
Data in the agent output
The output in the agent is quite extensive: Error messages, status, configuration and test data are transmitted in several sections. The latter can be found in the robotmk_suite_execution_report section, here is an abbreviated excerpt:
<<<robotmk_suite_execution_report:sep(0)>>>
{
    "attempts": [
        {
            "index": 1,
            "outcome": "AllTestsPassed",
            "runtime": 20
        }
    ],
    "rebot": {
        "Ok": {
            "xml": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n <robot generator=\"Rebot 6.1.1 (Python 3.10.12 on win32)\" generated=\"20240319 16:23:19.944\" rpa=\"true\" schemaversion=\"4\">\r\n<suite id=\"s1\" name=\"Mybot\" source=\"C:\\robots\\mybot\">\r\n<suite id=\"s1-s1\" name=\"Tests\" source=\"C:\\robots\\mybot\\tests.robot\">\r\n<test id=\"s1-s1-t1\" name=\"Mytest\" line=\"6\">\r\n<kw name=\"Sleep\" library=\"BuiltIn\">\r\n<arg>3 Seconds</arg>\r\n<doc>Pauses the test executed for the given time.</doc>\r\n<msg timestamp=\"20240319 16:23:02.936\" level=\"INFO\">Slept 3 seconds</msg>\r\n<status status=\"PASS\" starttime=\"20240319 16:23:00.934\" endtime=\"20240319 16:23:02.936\"/>"
        }
    },
    "suite_id": "mybot",
    "timestamp": 1710861778
}
...
"html_base64":"PCFET0NUWVBFIGh0bWw+DQo8aHRtbCBsYW ...
Two areas are of particular interest here. Firstly, rebot: The rebot tool has produced the actual status report for Robot Framework from several partial results (hence re-bot). Secondly, the last line html_base64: The HTML reports from Robot Framework are transferred in base64-encoded form. Screenshots taken via tests are also transferred in this way, so the data volume of the agent output can be correspondingly extensive.
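To get a feeling for this data, the following Python sketch reads the attempt information and decodes the visible beginning of the html_base64 value. It uses only the fields shown in the excerpt above; the complete layout of the section is not documented here, so the reduced section_line is an assumption for illustration.

```python
import base64
import json

# Reduced to the fields that appear in the excerpt (the rebot part is omitted).
section_line = (
    '{"attempts": [{"index": 1, "outcome": "AllTestsPassed", "runtime": 20}], '
    '"suite_id": "mybot", "timestamp": 1710861778}'
)
report = json.loads(section_line)
outcomes = [attempt["outcome"] for attempt in report["attempts"]]
print(report["suite_id"], outcomes)

# The first 20 characters of the html_base64 value from the excerpt.
# base64 works in groups of 4 characters, so this prefix decodes on its own.
html_prefix = "PCFET0NUWVBFIGh0bWw+"
decoded = base64.b64decode(html_prefix).decode("utf-8")
print(decoded)  # <!DOCTYPE html>
```

The decoded prefix confirms that the value is simply the HTML report of Robot Framework.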
Data in monitoring
As soon as the Robotmk scheduler and the test suite have been run, the service discovery will produce three new services:
The service RMK Scheduler Status exists once and immediately after deployment. The services for plans and tests, here RMK MyApplication_mybot Plan and RMK MyApplication_mybot Test: /Test: My Test, are added to the monitoring as soon as the associated suites have been run for the first time.
2.2. Configuring service rules
Creating a rule for the plan status
Reminder: Maximum runtimes for plans were defined in the agent rule above. These runtimes can be evaluated with the Robotmk plan rule. For example, you can set the service to CRIT when 90 percent of all calculated timeouts have been reached.
In the Conditions box, there is the option of restricting the rule to specific plans.
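The percentage check can be pictured as follows. This Python sketch only illustrates the arithmetic; the function name, the comparison with "greater than or equal" and the sample values are assumptions, not the code of the Checkmk check.

```python
def plan_runtime_state(runtime, max_runtime, crit_percent=90.0):
    """Return CRIT once the runtime has used up the given share of the maximum runtime."""
    used_percent = 100.0 * runtime / max_runtime
    return "CRIT" if used_percent >= crit_percent else "OK"


# Hypothetical plan: limit per attempt 60 s with 2 re-executions, so 180 s maximum runtime
print(plan_runtime_state(20, 180))   # OK, about 11 percent used
print(plan_runtime_state(170, 180))  # CRIT, about 94 percent used
```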
Creating a rule for the test status
Additional data can also be retrieved for individual tests in the test suites via the Robotmk test rule. Here you will again find the option to monitor runtimes, both for tests and keywords. The monitoring of keywords is a Checkmk-specific function. The suite-internal status in the Robot Framework report could therefore be OK because the test suite was processed within the maximum permitted runtime, while the service in Checkmk is WARN or CRIT because a status change takes place at, for example, 80 percent of this maximum permitted runtime.
In addition, the Enable metrics for high-level keywords option can be used to generate metrics for higher-level keywords. This is particularly useful if your tests are organized in such a way that the higher-level keywords describe the 'what' and the lower-level keywords describe the 'how' — this gives you more abstract evaluations.
In this example, the threshold values for the maximum runtime of a test are 2 and 4 seconds. You will see the effects below in the chapter Robotmk in monitoring.
Once again, there is an explicit filter option in the Conditions box, here for individual tests.
2.3. Robotmk in monitoring
In monitoring, you will find services for the status of the Robotmk scheduler as well as the individual plans and tests — even if you have not created any separate service rules.
Scheduler status
The service RMK Scheduler Status is OK if the scheduler is running and has successfully built the execution environments.
Here in the image you can see the note Environment build took 1 second. One second to build a virtual Python environment with Pip and Robot Framework? This is possible because RCC is clever: files that have already been downloaded are reused and a new build is only carried out after changes have been made in conda.yaml. The first build would have taken 30 seconds or more.
Plan status
The status of a plan is reflected in a service named by application name and suite, for example RMK MyApplication_mybot Plan.
Test status
The evaluation of the tests is where it gets really interesting.
In the image you can now see the effect of the threshold values set above for the runtime of tests, here the 2 seconds for the WARN status. As the Sleep 3 Seconds instruction in the test itself already ensures a longer runtime, this service must go to WARN here, although the test was successful. The fact that the test was successful is shown by the Robot Framework report, which you can access via the log icon.
The report now clearly shows that the test and test suite have run successfully. At the bottom of the data you can also see the individual keywords, here for example Log ${MYVAR} together with the value My Value set in Checkmk for MYVAR.
Dashboards
Of course, you can build your own dashboards as usual — but you can also find two built-in dashboards under Monitor > Synthetic Monitoring.
3. Troubleshooting
3.1. Scheduler reports No Data
If the scheduler service does not receive any data, building the environment probably did not work. A common reason for this is network problems that prevent certain dependencies from being downloaded. In this case, take a look at the corresponding log file under C:\ProgramData\checkmk\agent\robotmk_output\working\environment_building.
3.2. Environment building fails: post-install script execution
This is a particularly interesting error that you might encounter on fresh Windows systems. The conda.yaml can also contain instructions that are to be executed after the installation of the dependencies, for example the initialization of the Robot Framework browser. Python commands are therefore executed at this point. By default, Windows 11 has aliases for python.exe and python3.exe that refer to the Microsoft Store. You must deactivate these aliases under 'Settings/Aliases for app execution'.
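Whether such an alias is still active can be checked with a few lines of Python. The sketch assumes that the Microsoft Store aliases are located in a directory named WindowsApps, which is the usual location but is not stated in this article; the function is_store_alias is hypothetical.

```python
import shutil
from pathlib import PureWindowsPath


def is_store_alias(executable_path):
    """Return True if the path points into a WindowsApps directory (assumed alias location)."""
    parts = [part.lower() for part in PureWindowsPath(executable_path).parts]
    return "windowsapps" in parts


# Check what "python" currently resolves to on this machine, if anything.
resolved = shutil.which("python")
if resolved is None:
    print("python not found in PATH")
elif is_store_alias(resolved):
    print(f"{resolved} looks like a Microsoft Store alias")
else:
    print(f"{resolved} looks like a regular Python installation")
```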
4. Files and directories
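File path | Description |
---|---|
C:\ProgramData\checkmk\agent\robotmk_output\working\suites\ | Log files and results of the suites |
C:\ProgramData\checkmk\agent\robotmk_output\working\environment_building | Log files for building virtual environments |
 | Messages of the RCC execution |
 | Log file of the agent plug-in |
 | |
 | Agent plug-in |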