Icinga2 and PagerDuty integration

E-mail is not a good way to get my attention in a timely fashion. E-mail is inherently asynchronous, and healthy minds may ignore it for hours or even days at a time. So how do I handle monitoring alerts? One way is by using PagerDuty, a service that can call, text, or send push notifications to you (among other features).

I followed the steps at PagerDuty’s Icinga2 Integration Guide, but no alerts were coming through. What went wrong?

Icinga2 role permissions, filters

I have Icinga2 and Icingaweb2 set up for monitoring hosts and services for myself, but I wanted to expand on my current configuration and let web developers manage monitoring for their assets (development and staging hosts and web servers).

webdev is the name of one of my host groups, defined in my /etc/icinga2/conf.d/groups.conf file:

object HostGroup "webdev" {
  display_name = "Web Development Hosts"
}

The hosts I want developers to be able to monitor are members of the webdev host group.

First I created a new role in the web interface under Configuration > Authentication > Roles.

NTP checks with icinga2

On my new Icinga2 monitoring host, I am slowly adding service checks to reach parity with my existing Nagios monitoring. Next on my list: NTP checks. The first step was to add a new service check to the Icinga2 configuration:

/etc/icinga2/conf.d/services.conf:

apply Service "ntp_time" {
  import "generic-service"
  check_command = "ntp_time"
  assign where host.vars.os == "Linux"
}

The service check produced an error, as seen in the icingaweb2 interface:

execvpe(/usr/lib64/nagios/plugins/check_ntp_time) failed: No such file or directory

Oh! I don’t have the appropriate Nagios plugin installed on the Icinga2 host.

sudo yum install nagios-plugins-ntp
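
A quick way to catch this kind of error before it shows up in icingaweb2 is to test for the plugin file first. The following is only a sketch: plugin_available is a hypothetical helper name, and the default directory is simply the one from the execvpe error above.

```shell
# Hypothetical helper: report whether a check plugin is present and executable.
# PLUGIN_DIR is an assumed override; the default is the path from the error above.
plugin_available() {
  plugin_dir="${PLUGIN_DIR:-/usr/lib64/nagios/plugins}"
  if [ -x "$plugin_dir/$1" ]; then
    echo "found: $plugin_dir/$1"
  else
    echo "missing: $plugin_dir/$1"
    return 1
  fi
}
```

Running plugin_available check_ntp_time on the Icinga2 host should print a "missing" line until the package is installed.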

The NTP service check now reports OK on some hosts, but on other hosts I get a different error:

CRITICAL: No response from NTP server

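The hosts that did not respond are all running chronyd, which does not answer NTP client requests unless it is told to. On each of them I edited /etc/chrony.conf and added an allow line for the Icinga2 host: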

allow 192.168.46.46

And restarted chronyd:

systemctl restart chronyd

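Now all but one host reports OK. The last remaining host to show an error? The Icinga2 host itself! Its check evidently arrives over the loopback address, so I added one more line to its /etc/chrony.conf: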

allow 127.0.0.1

Another chronyd restart, and the NTP service on all hosts reports OK.
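
With more than a handful of chronyd hosts, editing each file by hand gets tedious. Here is a minimal sketch of adding the allow line without duplicating it on repeat runs. ensure_chrony_allow is a hypothetical helper name, and it assumes the line should be matched exactly as written in the file.

```shell
# Hypothetical helper: append "allow <address>" to a chrony config file
# only if that exact line is not already present.
ensure_chrony_allow() {
  conf="$1"
  addr="$2"
  # -x matches the whole line, -F treats the pattern as a fixed string
  if grep -qxF "allow $addr" "$conf"; then
    echo "already allowed: $addr"
  else
    echo "allow $addr" >> "$conf"
    echo "added: allow $addr"
  fi
}
```

Run as root, ensure_chrony_allow /etc/chrony.conf 192.168.46.46 followed by systemctl restart chronyd would cover the steps above.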

NRPE: Unable to read output

This one was a real facepalm moment, but I thought I’d share in case anyone else runs into the same thing.

I’ve been working on migrating from Nagios to Icinga2. One of the services I monitor is whether or not a given host has any available yum updates. This service, which I label check_yum, worked on all my hosts except for the Icinga2 host. All the other services monitored on that host were working, but check_yum returned an error:

NRPE: Unable to read output

I tried running the test manually on the Icinga2 host:

/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_yum
NRPE: Unable to read output

I checked to make sure NRPE was listening, in this case via xinetd:

lsof -i

I checked the service definition to see what script/plugin NRPE runs:

cat /etc/nrpe.d/check_yum.cfg
command[check_yum]=/usr/lib64/nagios/plugins/check_updates -w 0 -c 10 -t 60

I tried to run that manually and…the file /usr/lib64/nagios/plugins/check_updates did not exist.

I installed the corresponding yum package:

sudo yum install nagios-plugins-check-updates

Now it works! It was a reminder to myself to check the basics before trying to troubleshoot network issues.
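
To check the basics in one pass, the plugin path can be pulled out of each NRPE command definition and tested directly. This is a rough sketch rather than a polished tool: check_nrpe_commands is a hypothetical helper, and it assumes each definition has the command[name]=/path/to/plugin form shown above, with no spaces in the plugin path.

```shell
# Hypothetical helper: for every command[...]= line in the given NRPE config
# files, print a MISSING line if the plugin it points to is not executable.
check_nrpe_commands() {
  for cfg in "$@"; do
    # Turn "command[check_yum]=/path/to/plugin -w 0 ..." into "check_yum /path/to/plugin"
    sed -n 's/^command\[\([^]]*\)\]=\([^ ]*\).*/\1 \2/p' "$cfg" |
    while read -r name path; do
      if [ ! -x "$path" ]; then
        echo "MISSING $name: $path"
      fi
    done
  done
}
```

Running check_nrpe_commands /etc/nrpe.d/*.cfg on the Icinga2 host would have flagged check_yum right away.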