Introduction

Checkmk 2 provides a useful HTTP API that can be used to automate all sorts of things. The feature I find most useful is being able to set a downtime on my Linux hosts from an Ansible Playbook before installing updates on them.

See my Ansible Playbook below for a full example that updates a Debian/Ubuntu host.

Ansible Code to set a fixed downtime

Add the following vars to your Ansible Playbook.

You need to:

  • set your monitoring server and site in checkmk_url
  • set the secret of the Checkmk automation user in checkmk_api_token (found in Checkmk under Users > automation > secret)
  • specify the downtime duration in downtime

You can always look up the API documentation of your installed Checkmk version in the left sidebar under Help > APIs.

vars:
  checkmk_url: https://{ YOUR MONITORING SERVER }/{ YOUR SITE }/check_mk/view.py?
  required_fields: _do_confirm=yes&_transid=-1&_do_actions=yes
  # For a host downtime use the hoststatus view. For a service downtime,
  # specify the service and add '&service={{ service }}' to the URL below.
  # NOTE: this syntax may only work for hosts
  view: hoststatus
  site: cmk
  # Set a downtime, e.g. minutes=10&_down_from_now=yes
  # or remove it with: remove=Remove+all
  downtime: minutes=10&_down_from_now=yes
  username: automation
  # Quoted so that YAML does not parse the placeholder as a mapping
  checkmk_api_token: "{ YOUR automation secret }"
  comment: "Systemupdate with Ansible"
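
Instead of writing the automation secret into the playbook, you can read it on the control node at runtime. A minimal sketch using Ansible's env lookup; the environment variable name CHECKMK_AUTOMATION_SECRET is a hypothetical choice, and Ansible Vault would work just as well.

```yaml
vars:
  # Read the Checkmk automation secret from an environment variable on the
  # control node instead of storing it in the playbook.
  # CHECKMK_AUTOMATION_SECRET is a hypothetical name, pick your own.
  checkmk_api_token: "{{ lookup('env', 'CHECKMK_AUTOMATION_SECRET') }}"
```

Export the variable in your shell before running ansible-playbook. Because the secret still ends up in the request URL, adding no_log: true to the downtime task keeps it out of Ansible's output.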

In the tasks section, add the downtime step:

tasks:
  - name: Set downtime in monitoring
    # Delegate to a host that can reach the monitoring server
    delegate_to: localhost
    # Do not inherit a play-level become: true on the control node
    become: false
    uri:
      url: "{{ checkmk_url }}{{ required_fields }}&host={{ inventory_hostname }}&site={{ site }}&view_name={{ view }}&_down_{{ downtime }}&_down_comment={{ comment | urlencode }}&_username={{ username }}&_secret={{ checkmk_api_token }}"
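
The comment in the vars mentions that the same URL can also remove downtimes with remove=Remove+all. A minimal sketch, assuming you want to clear the downtime once the updates are done: the task overrides the downtime variable with task-level vars (which take precedence over play vars), so the URL contains _down_remove=Remove+all. Check the API documentation of your Checkmk version before relying on it.

```yaml
# Sketch: remove the downtime again after the update tasks have finished.
# Reuses the play vars and only overrides "downtime" for this task.
- name: Remove downtime in monitoring
  # Run on a host that can reach the monitoring server
  delegate_to: localhost
  become: false
  vars:
    # Expands to _down_remove=Remove+all in the URL below
    downtime: remove=Remove+all
  uri:
    url: "{{ checkmk_url }}{{ required_fields }}&host={{ inventory_hostname }}&site={{ site }}&view_name={{ view }}&_down_{{ downtime }}&_down_comment={{ comment | urlencode }}&_username={{ username }}&_secret={{ checkmk_api_token }}"
```

Place it at the end of the block that installs the updates, after the reboot task, so it only runs when updates were actually installed.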

Ansible Playbook to update Debian or Ubuntu hosts with a downtime

This example checks whether any package updates are available. If so, it sets a downtime for the next 10 minutes on the site ‘cmk’ and then upgrades the packages and removes old and unused dependencies and kernels. Finally, it reboots the target host if required and finishes as soon as the host is reachable again, which it verifies by running the uptime command on it.

You need to:

  • set your monitoring server and site in checkmk_url
  • set the secret of the Checkmk automation user in checkmk_api_token (found in Checkmk under Users > automation > secret)
  • specify the downtime duration in downtime
  • pass the hosts or group to update in the target variable, e.g. with -e target=...
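
The playbook sends inventory_hostname as the host parameter of the downtime request, so the host names in your inventory need to match the host names configured in Checkmk. A hypothetical inventory to illustrate this; the group and host names are placeholders.

```yaml
# Hypothetical inventory (e.g. inventory.yml). The host names must match
# the host names in Checkmk, otherwise the downtime is not set.
all:
  children:
    debian_servers:
      hosts:
        web01.example.com:
        db01.example.com:
```

With this inventory, passing -e target=debian_servers to ansible-playbook updates both hosts.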
---
- name: "Update apt packages on host"
  hosts: '{{ target }}'
  vars:
    checkmk_url: https://{ YOUR MONITORING SERVER }/{ YOUR SITE }/check_mk/view.py?
    required_fields: _do_confirm=yes&_transid=-1&_do_actions=yes
    # For a host downtime use the hoststatus view. For a service downtime,
    # specify the service and add '&service={{ service }}' to the URL below.
    # NOTE: this syntax may only work for hosts
    view: hoststatus
    site: cmk
    # Set a downtime, e.g. minutes=10&_down_from_now=yes
    # or remove it with: remove=Remove+all
    downtime: minutes=10&_down_from_now=yes
    username: automation
    # Quoted so that YAML does not parse the placeholder as a mapping
    checkmk_api_token: "{ YOUR automation secret }"
    comment: "Systemupdate with Ansible"
  become: true
  become_user: root
  tasks:
    - block:
      - name: Update repo cache
        apt:
          update_cache: yes

      - name: Check for available updates
        command: apt list --upgradable
        register: updates
        changed_when: false

    - block:
      - name: Set downtime in monitoring
        # Delegate to a host that can reach the monitoring server
        delegate_to: localhost
        # Do not inherit the play's become: true on the control node
        become: false
        uri:
          url: "{{ checkmk_url }}{{ required_fields }}&host={{ inventory_hostname }}&site={{ site }}&view_name={{ view }}&_down_{{ downtime }}&_down_comment={{ comment | urlencode }}&_username={{ username }}&_secret={{ checkmk_api_token }}"

      - name: Update and upgrade apt packages
        apt:
          update_cache: yes
          upgrade: dist
          cache_valid_time: 3600

      - name: Remove old and unused dependencies
        apt:
          autoremove: true
          autoclean: true

      - name: Check if reboot required
        register: reboot_required_file
        stat:
          path: /var/run/reboot-required
          get_checksum: false

      - name: Reboot machine
        reboot:
          msg: "Reboot initiated by Ansible due to kernel updates"
          connect_timeout: 5
          reboot_timeout: 300
          pre_reboot_delay: 0
          post_reboot_delay: 30
          test_command: uptime
        when: reboot_required_file.stat.exists

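      # 'apt list --upgradable' prints a "Listing..." header line first,
      # so updates are available only if there is more than one line
      when: updates.stdout_lines | length > 1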
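
Checkmk 2 also ships a REST API, documented under the same Help menu, which takes the automation user as a Bearer token instead of URL parameters. The following is a sketch of the downtime task against that API. The endpoint path and JSON fields follow the Checkmk 2.x REST API documentation as far as I know, so compare them with the API docs of your installed version. It assumes GNU date on the control node to compute the UTC start and end times of a 10-minute downtime, and checkmk_rest_url is a hypothetical variable.

```yaml
# Sketch: set a fixed 10-minute host downtime via the Checkmk 2.x REST API.
# checkmk_rest_url is a hypothetical variable, e.g.
# https://{ YOUR MONITORING SERVER }/{ YOUR SITE }/check_mk/api/1.0
- name: Set downtime in monitoring (REST API)
  delegate_to: localhost
  become: false
  uri:
    url: "{{ checkmk_rest_url }}/domain-types/downtime/collections/host"
    method: POST
    headers:
      # Automation users authenticate with "Bearer <username> <secret>"
      Authorization: "Bearer {{ username }} {{ checkmk_api_token }}"
      Accept: application/json
    # Sends the body as JSON and sets the Content-Type header
    body_format: json
    body:
      downtime_type: host
      host_name: "{{ inventory_hostname }}"
      # The pipe lookup runs on the control node (assumes GNU date)
      start_time: "{{ lookup('pipe', 'date -u +%Y-%m-%dT%H:%M:%SZ') }}"
      end_time: "{{ lookup('pipe', 'date -u -d \"+10 minutes\" +%Y-%m-%dT%H:%M:%SZ') }}"
      comment: "{{ comment }}"
    # Accept both codes, as the success status may differ between versions
    status_code: [200, 204]
```

It would replace the view.py task in the playbook; username, checkmk_api_token and comment come from the existing vars.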