Monitoring – The Accidental Developer

icinga2 and http_expect_body_regex

This check (along with the accompanying http variables) tries to confirm that the page includes an IP address or subnet:

http_expect_body_regex = "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}($|\/)"
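A quick way to see what that pattern accepts is to try it against a few sample bodies. This is only a sketch: it uses grep -E as a stand-in for the plugin's regex engine (an assumption, since both should treat the pattern as a POSIX extended regex but they are not the same code), and the body_matches helper and the sample strings are made up for illustration.

```shell
#!/bin/sh
# The pattern from the check, written as a POSIX extended regular expression.
# The "\/" from the config is a plain "/" here; a slash needs no escape in ERE.
pattern='^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}($|/)'

# body_matches is a hypothetical helper: it succeeds if any line of the
# given text matches the pattern.
body_matches() {
  printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Sample bodies (illustrative only).
for body in '192.168.46.46' '10.0.0.0/8' 'Your IP is 192.168.46.46' '999.999.999.999'; do
  if body_matches "$body"; then
    echo "match:    $body"
  else
    echo "no match: $body"
  fi
done
```

Two things stand out when running it: the pattern does not range-check octets, so 999.999.999.999 matches, and the leading ^ rejects a line where the address follows other text.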

However, the updated conf file didn’t pass validation:

Notifying a REST API from Icinga2

I wanted to send Icinga2 notifications to Slack. Some hosts and services don’t rise to the level of a PagerDuty notification, but e-mail just doesn’t cut it. A message in a Slack channel seemed an appropriate in-between.

This process is relatively straightforward, although I ran into some issues with SELinux that I will cover in this post.

Icinga2 and PagerDuty integration

E-mail is not a good way to get my attention in a timely fashion. E-mail is inherently asynchronous, and healthy minds may ignore it for hours or even days at a time. So how do I handle monitoring alerts? One way is by using PagerDuty, a service that can call, text, or send push notifications to you (among other features).

I followed the steps at PagerDuty’s Icinga2 Integration Guide, but no alerts were coming through. What went wrong?

Icinga2 role permissions, filters

I have Icinga2 and Icingaweb2 set up for monitoring hosts and services for myself, but I wanted to expand on my current configuration and let web developers manage monitoring for their assets (development and staging hosts and web servers).

webdev is the name of one of my host groups, defined in my /etc/icinga2/conf.d/groups.conf file:

object HostGroup "webdev" {
  display_name = "Web Development Hosts"
}

The hosts I want developers to be able to monitor are members of the webdev host group.

First I created a new role in the web interface under Configuration — Authentication — Roles:

NTP checks with icinga2

On my new Icinga2 monitoring host, I am slowly adding service checks to achieve parity with my existing Nagios monitoring. Next on my list: implementing NTP checks. The first step was to add a new service check to the Icinga2 configuration:

/etc/icinga2/conf.d/services.conf:

apply Service "ntp_time" {
  import "generic-service"
  check_command = "ntp_time"
  assign where host.vars.os == "Linux"
}

The service check produced an error, as seen in the icingaweb2 interface:

execvpe(/usr/lib64/nagios/plugins/check_ntp_time) failed: No such file or directory

Oh! I don’t have the appropriate Nagios plugin installed on the Icinga2 host.

sudo yum install nagios-plugins-ntp

The NTP service check now reports OK on some hosts, but on other hosts I get a different error:

CRITICAL: No response from NTP server

The hosts that did not receive a response are all using chronyd. I edited /etc/chrony.conf and added:

allow 192.168.46.46

And restarted chronyd:

systemctl restart chronyd

Now all but one host reports OK. The last remaining host to show an error? The Icinga2 host itself!

allow 127.0.0.1

Another chronyd restart, and the NTP service on all hosts reports OK.
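Before restarting chronyd on every host, it can help to confirm that the allow line is actually present. The following is a minimal sketch, not part of the original setup: chrony_allows is a hypothetical helper that does a literal comparison only, so it will not recognize an address covered by a subnet such as 192.168.46.0/24. It runs against a temporary file here rather than the real /etc/chrony.conf.

```shell
#!/bin/sh
# chrony_allows is a hypothetical helper: it succeeds if the given
# chrony.conf-style file has an uncommented "allow" line naming exactly
# this address. Literal match only; subnets are not interpreted.
chrony_allows() {
  conf=$1
  addr=$2
  awk -v addr="$addr" \
    '$1 == "allow" && $2 == addr { found = 1 } END { exit !found }' "$conf"
}

# Demonstrate against a temporary file instead of the real config.
tmp=$(mktemp)
printf '%s\n' '# Allow NTP client access from the monitoring host' \
  'allow 192.168.46.46' > "$tmp"

for addr in 192.168.46.46 127.0.0.1; do
  if chrony_allows "$tmp" "$addr"; then
    echo "$addr: allowed"
  else
    echo "$addr: no matching allow line"
  fi
done
rm -f "$tmp"
```

On a real host the call would look like chrony_allows /etc/chrony.conf 192.168.46.46.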

NRPE: Unable to read output

This one was a real facepalm moment, but I thought I’d share in case anyone else runs into the same thing.

I’ve been working on migrating from Nagios to Icinga2. One of the services I monitor is whether or not a given host has any available yum updates. This service, which I label check_yum, worked on all my hosts except for the Icinga2 host. All the other services monitored on that host were working, but check_yum returned an error:

NRPE: Unable to read output

I tried running the test manually on the Icinga2 host:

/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_yum
NRPE: Unable to read output

I checked to make sure NRPE was listening, in this case via xinetd:

lsof -i

I checked the service definition to see what script/plugin NRPE runs:

cat /etc/nrpe.d/check_yum.cfg
command[check_yum]=/usr/lib64/nagios/plugins/check_updates -w 0 -c 10 -t 60

I tried to run that manually and…the file /usr/lib64/nagios/plugins/check_updates did not exist.

I installed the corresponding yum package:

sudo yum install nagios-plugins-check-updates

Now it works! It was a reminder to myself to check the basics before trying to troubleshoot network issues.
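That "check the basics" step can be scripted. As a rough sketch, the two helpers below (both hypothetical names, not part of NRPE) pull the program path out of a command[...] definition and report whether it exists and is executable. It assumes one definition per line in the format shown above, with no spaces in the program path.

```shell
#!/bin/sh
# nrpe_plugin_path is a hypothetical helper: it prints the program path from
# the first "command[name]=..." line for the given command name.
nrpe_plugin_path() {
  conf=$1
  name=$2
  sed -n "s/^command\[$name]=\([^ ]*\).*/\1/p" "$conf" | head -n 1
}

# check_plugin_present reports whether that program exists and is executable.
# Returns 0 if present, 1 if missing, 2 if the command is not defined.
check_plugin_present() {
  plugin=$(nrpe_plugin_path "$1" "$2")
  if [ -z "$plugin" ]; then
    echo "no command[$2] definition in $1"
    return 2
  elif [ -x "$plugin" ]; then
    echo "ok: $plugin"
  else
    echo "missing or not executable: $plugin"
    return 1
  fi
}
```

For the case above, the call would be check_plugin_present /etc/nrpe.d/check_yum.cfg check_yum, which should point straight at the missing file before any time is spent on xinetd or the network.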

Nagios alert: CRITICAL: No response from NTP server

One of a pair of new hosts was causing the following Nagios alert today:

CRITICAL: No response from NTP server

Both of the new systems have the same configuration in theory, but based on the different results something clearly was overlooked.

I tried running the NTP check from the Nagios host:

Host 1

$ check_ntp -H ephemeralbox1.osric.net -w 0.1 -c 0.2
NTP OK: Offset -0.02545583248 secs|offset=-0.025456s;0.100000;0.200000;

Host 2

$ check_ntp -H ephemeralbox2.osric.net -w 0.1 -c 0.2
CRITICAL: No response from NTP server

The iptables rules look the same on both. The hosts are all on the same LAN, so there’s no firewall in the way.

Both systems are running chronyd:

Host 1

[chris@ephemeralbox1 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running

Host 2

[chris@ephemeralbox2 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running

Both systems are listening on port 123:

Host 1

[chris@ephemeralbox1 ssh]$ sudo lsof -i :123
COMMAND  PID   USER FD TYPE  DEVICE SIZE/OFF NODE NAME
chronyd 3027 chrony 3u IPv4 1095448      0t0  UDP *:ntp

Host 2

[chris@ephemeralbox2 ssh]$ sudo lsof -i :123
COMMAND  PID   USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 1241 chrony 3u IPv4  51276      0t0  UDP *:ntp

Finally, I found it, in the obvious place that perhaps I should have looked first. The /etc/chrony.conf file on Host 2 was missing the allow line for the Nagios host:

# Allow NTP client access from Nagios host
allow 192.168.100.100

Instead, the first place I looked was iptables. Blame the firewall, after all. The configurations were both pushed to these systems via Ansible playbooks, but apparently I had not included the role that updates the chrony.conf file on the second host. Looks like I need configuration management management!
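When two hosts are supposed to match, comparing the relevant config lines directly is faster than inspecting each service by hand. Here is a minimal sketch under some assumptions: copies of each host's /etc/chrony.conf have already been fetched to local files, and the helper names allow_lines and same_allow_lines are invented for this example.

```shell
#!/bin/sh
# allow_lines is a hypothetical helper: it prints the sorted, uncommented
# "allow" directives from a chrony.conf-style file.
allow_lines() {
  awk '$1 == "allow" { print $1, $2 }' "$1" | sort
}

# same_allow_lines succeeds when both files carry the same set of allow
# directives; otherwise it prints a diff and returns diff's exit status.
same_allow_lines() {
  a=$(mktemp)
  b=$(mktemp)
  allow_lines "$1" > "$a"
  allow_lines "$2" > "$b"
  rc=0
  diff "$a" "$b" || rc=$?
  rm -f "$a" "$b"
  return $rc
}
```

With local copies named, say, ephemeralbox1.chrony.conf and ephemeralbox2.chrony.conf (hypothetical filenames), same_allow_lines ephemeralbox1.chrony.conf ephemeralbox2.chrony.conf would print the allow line that is missing from the second file.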