How to set a fixed downtime in Checkmk with Ansible
Introduction
Checkmk 2 provides a useful HTTP API that can be used to do all sorts of things. The most useful feature for me is being able to set a downtime on my Linux hosts before installing updates on them with an Ansible Playbook.
See the Ansible Playbook at the end of this post for a full example that updates a Debian/Ubuntu host.
Ansible Code to set a fixed downtime
Add the following vars to your Ansible Playbook.
You need to:
- set your monitoring server and site in checkmk_url
- set the secret of the Checkmk automation user in checkmk_api_token (See in Checkmk under Users > automation > secret)
- specify the downtime duration in downtime
You can always check the API documentation of your installed Checkmk version in the left sidebar under Help > APIs.
vars:
  checkmk_url: https://{ YOUR MONITORING SERVER }/{ YOUR SITE }/check_mk/view.py?
  required_fields: _do_confirm=yes&_transid=-1&_do_actions=yes
  # if host, use hoststatus, if service, specify service and add '&service={{ service }}' in the url below
  # NOTE: this syntax may only work for host
  view: hoststatus
  site: cmk
  # set downtime e.g. minutes=10&_down_from_now=yes
  # or to remove: remove=Remove+all
  downtime: minutes=10&_down_from_now=yes
  username: automation
  checkmk_api_token: { YOUR automation secret }
  comment: "Systemupdate with Ansible"
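Writing the automation secret directly into the playbook means it ends up wherever the playbook is stored. As a minimal sketch, assuming the secret is exported as an environment variable on the control node (the name CHECKMK_AUTOMATION_SECRET is made up for this example), the variable could be filled with Ansible's env lookup instead:

```yaml
vars:
  # Assumption: CHECKMK_AUTOMATION_SECRET is a hypothetical variable name.
  # It has to be exported in the shell that runs ansible-playbook.
  checkmk_api_token: "{{ lookup('env', 'CHECKMK_AUTOMATION_SECRET') }}"
```

Ansible Vault would be another option. Either way, the rest of the playbook stays unchanged because it only refers to checkmk_api_token.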
In the tasks, add the downtime step:
tasks:
  - name: Set downtime in monitoring
    # Delegate to a host that can reach the monitoring server
    delegate_to: localhost
    uri:
      url: "{{ checkmk_url }}{{ required_fields }}&host={{ inventory_hostname }}&site={{ site }}&view_name={{ view }}&_down_{{ downtime }}&_down_comment={{ comment | replace(' ', '%20') }}&_username={{ username }}&_secret={{ checkmk_api_token }}"
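Because the automation secret is part of the URL, it can show up in verbose output or logs. The following variant is a sketch of how the same task could be hardened, not a tested drop-in replacement: it hides the task output with no_log, encodes the whole comment with Jinja2's urlencode filter instead of replacing only spaces, and sets a request timeout. The timeout value of 15 seconds is an arbitrary choice for this example.

```yaml
tasks:
  - name: Set downtime in monitoring
    # Delegate to a host that can reach the monitoring server
    delegate_to: localhost
    # Keep the URL, and with it the automation secret, out of the task output
    no_log: true
    uri:
      # urlencode also escapes characters such as '&' or '#' in the comment
      url: "{{ checkmk_url }}{{ required_fields }}&host={{ inventory_hostname }}&site={{ site }}&view_name={{ view }}&_down_{{ downtime }}&_down_comment={{ comment | urlencode }}&_username={{ username }}&_secret={{ checkmk_api_token }}"
      # Assumption: 15 seconds is only an example value
      timeout: 15
```

The trade-off of no_log is that a failing request is harder to debug, so it can help to disable it temporarily while setting things up.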
Ansible Playbook to update Debian or Ubuntu hosts with a downtime
This example checks whether any package updates are available. If so, it creates a downtime for the next 10 minutes on the site ‘cmk’, runs the package upgrade, and removes old and unused dependencies or kernels. In the last step it reboots the target host if required and finishes as soon as the host is reachable again, which it tests by running the uptime command on it.
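You need to:
- set your monitoring server and site in checkmk_url
- set the secret of the Checkmk automation user in checkmk_api_token (See in Checkmk under Users > automation > secret)
- specify the downtime duration in downtime
- pass the hosts to update in the target variable, for example with -e target=... on the ansible-playbook command line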
---
- name: "Update apt packages on host"
  hosts: '{{ target }}'
  vars:
    checkmk_url: https://{ YOUR MONITORING SERVER }/{ YOUR SITE }/check_mk/view.py?
    required_fields: _do_confirm=yes&_transid=-1&_do_actions=yes
    # if host, use hoststatus, if service, specify service and add '&service={{ service }}' in the url below
    # NOTE: this syntax may only work for host
    view: hoststatus
    site: cmk
    # set downtime e.g. minutes=10&_down_from_now=yes
    # or to remove: remove=Remove+all
    downtime: minutes=10&_down_from_now=yes
    username: automation
    checkmk_api_token: { YOUR automation secret }
    comment: "Systemupdate with Ansible"
  become: true
  become_user: root
  tasks:
    # First block: refresh the package index and look for pending updates
    - block:
        - name: Update repo cache
          apt:
            update_cache: yes
        - name: Check for available updates
          command: apt list --upgradable
          register: updates
          changed_when: false
    # Second block: only runs if apt listed at least one upgradable package
    - block:
        - name: Set downtime in monitoring
          # Delegate to a host that can reach the monitoring server
          delegate_to: localhost
          uri:
            url: "{{ checkmk_url }}{{ required_fields }}&host={{ inventory_hostname }}&site={{ site }}&view_name={{ view }}&_down_{{ downtime }}&_down_comment={{ comment | replace(' ', '%20') }}&_username={{ username }}&_secret={{ checkmk_api_token }}"
        - name: Update and upgrade apt packages
          apt:
            update_cache: yes
            upgrade: dist
            cache_valid_time: 3600
        - name: Remove old and unused dependencies
          apt:
            autoremove: true
            autoclean: true
        - name: Check if reboot required
          register: reboot_required_file
          stat:
            path: /var/run/reboot-required
            # get_md5 is no longer accepted by current Ansible releases;
            # get_checksum: false skips hashing the file instead
            get_checksum: false
        - name: Reboot machine
          reboot:
            msg: "Reboot initiated by Ansible due to kernel updates"
            connect_timeout: 5
            reboot_timeout: 300
            pre_reboot_delay: 0
            post_reboot_delay: 30
            test_command: uptime
          when: reboot_required_file.stat.exists
      # With no pending updates, apt prints only the "Listing..." header
      when: updates.stdout != "Listing..."
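The downtime above simply expires after 10 minutes. If the update finishes earlier, the downtime could be removed again with one more task at the end of the second block. This is an untested sketch based on the remove=Remove+all hint in the vars comments: it reuses the same URL and only overrides the downtime variable for this one task, so the request carries _down_remove=Remove+all instead of the duration.

```yaml
- name: Remove downtime in monitoring
  # Delegate to a host that can reach the monitoring server
  delegate_to: localhost
  vars:
    # Task-level override; the play-level value stays untouched
    downtime: remove=Remove+all
  uri:
    url: "{{ checkmk_url }}{{ required_fields }}&host={{ inventory_hostname }}&site={{ site }}&view_name={{ view }}&_down_{{ downtime }}&_down_comment={{ comment | replace(' ', '%20') }}&_username={{ username }}&_secret={{ checkmk_api_token }}"
```

As the value suggests, this is expected to remove all downtimes of the host, not only the one set by this playbook, so it is worth checking the result in Checkmk before relying on it.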