One of a pair of new hosts was causing the following Nagios alert today:
CRITICAL: No response from NTP server
In theory, both new systems have the same configuration, but given the different results, something had clearly been overlooked.
I tried running check_ntp against each host from the Nagios host:
Host 1
$ check_ntp -H ephemeralbox1.osric.net -w 0.1 -c 0.2
NTP OK: Offset -0.02545583248 secs|offset=-0.025456s;0.100000;0.200000;
Host 2
$ check_ntp -H ephemeralbox2.osric.net -w 0.1 -c 0.2
CRITICAL: No response from NTP server
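"No response" just means that no NTP reply arrived before the timeout. To take Nagios out of the picture, a single client query can be sent by hand. The following is a minimal SNTP sketch in Python, not how check_ntp itself is implemented. The function names are hypothetical, and `classify` assumes that `-w`/`-c` bound the absolute offset. That assumption is consistent with the Host 1 output but is still only an assumption.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request(now):
    """Build a 48-byte client request: LI=0, VN=3, Mode=3 (first byte 0x1B).

    The local time goes in the transmit timestamp (bytes 40-47) so the
    server can echo it back as the originate timestamp.
    """
    seconds = int(now) + NTP_EPOCH_OFFSET
    fraction = int((now % 1) * 2**32)
    return bytes([0x1B]) + bytes(39) + struct.pack("!2I", seconds, fraction)


def ntp_to_unix(seconds, fraction):
    """Convert a 64-bit NTP timestamp (seconds, fraction) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def parse_reply(packet):
    """Return the server's receive (T2) and transmit (T3) timestamps."""
    if len(packet) < 48:
        raise ValueError("short NTP reply")
    # Bytes 32-39: receive timestamp; bytes 40-47: transmit timestamp.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", packet[32:48])
    return ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f)


def compute_offset(t1, t2, t3, t4):
    """Standard NTP clock offset estimate: ((T2 - T1) + (T3 - T4)) / 2."""
    return ((t2 - t1) + (t3 - t4)) / 2


def classify(offset, warn, crit):
    """Map an offset to a Nagios-style state (assumes thresholds bound |offset|)."""
    magnitude = abs(offset)
    if magnitude > crit:
        return "CRITICAL"
    if magnitude > warn:
        return "WARNING"
    return "OK"


def query(host, port=123, timeout=5.0):
    """Send one client request and return the estimated offset in seconds.

    Raises socket.timeout if the server never answers, which is the case
    check_ntp reported as "No response from NTP server".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(build_request(t1), (host, port))
        packet, _ = sock.recvfrom(512)
        t4 = time.time()
    t2, t3 = parse_reply(packet)
    return compute_offset(t1, t2, t3, t4)
```

Run from the Nagios host before the fix, `query("ephemeralbox2.osric.net")` would be expected to raise `socket.timeout`, matching the CRITICAL result. Host 1 should return a small offset that `classify(offset, 0.1, 0.2)` maps to "OK".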
The iptables rules look the same on both hosts, and all of the hosts are on the same LAN, so there’s no other firewall in the way.
Both systems are running chronyd:
Host 1
[chris@ephemeralbox1 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running
Host 2
[chris@ephemeralbox2 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running
Both systems are listening on port 123:
Host 1
[chris@ephemeralbox1 ssh]$ sudo lsof -i :123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 3027 chrony 3u IPv4 1095448 0t0 UDP *:ntp
Host 2
[chris@ephemeralbox2 ssh]$ sudo lsof -i :123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 1241 chrony 3u IPv4 51276 0t0 UDP *:ntp
Finally, I found it, in the obvious place that perhaps I should have looked first. The /etc/chrony.conf file on Host 2 was missing the allow line for the Nagios host:
# Allow NTP client access from Nagios host
allow 192.168.100.100
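By default chronyd serves time to no clients, and requests from hosts that no allow directive covers go unanswered. That explains why Host 2 was listening on port 123 but never replied. The following is a simplified Python model of that access check, not chrony's own parser. It understands only allow/deny lines with a single address or CIDR subnet, and the most specific matching rule wins, which is my reading of the chrony documentation. The function names are hypothetical.

```python
import ipaddress


def parse_access_rules(conf_text):
    """Collect (action, network) pairs from allow/deny lines.

    Simplified: only 'allow ADDR' / 'deny ADDR' with a plain address or
    CIDR subnet are understood. Bare 'allow', 'allow all', and chrony's
    shorthand subnet forms are ignored by this sketch.
    """
    rules = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # chrony treats lines starting with #, ;, ! or % as comments.
        if not line or line[0] in "#;!%":
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("allow", "deny"):
            try:
                network = ipaddress.ip_network(parts[1], strict=False)
            except ValueError:
                continue
            rules.append((parts[0], network))
    return rules


def is_allowed(client, rules):
    """Longest-prefix matching rule wins; no match means no access."""
    addr = ipaddress.ip_address(client)
    best = None
    for action, network in rules:
        if addr in network:
            # On equal prefix length the later line wins (an assumption).
            if best is None or network.prefixlen >= best[1].prefixlen:
                best = (action, network)
    return best is not None and best[0] == "allow"
```

With Host 2's original file, `is_allowed("192.168.100.100", rules)` returns False because nothing matches. After the allow line is added, it returns True. On a live system, `chronyc accheck 192.168.100.100` asks the running chronyd the same question directly.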
And yet the first place I looked was iptables. Blame the firewall, after all. Both configurations were pushed to these systems via Ansible playbooks, but apparently I had not included the role that updates the chrony.conf file on the second host. Looks like I need configuration management management!
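Both hosts were supposed to be identical, so a quick comparison of their chrony.conf files would have pointed straight at the missing line. The following is a minimal Python sketch. It assumes you have already copied each host's /etc/chrony.conf somewhere readable. The helper names are hypothetical, and whitespace is normalised so that only real directive differences show up.

```python
def directives(conf_text):
    """Return the set of non-comment lines, with whitespace normalised."""
    result = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and chrony-style comment lines.
        if not line or line[0] in "#;!%":
            continue
        result.add(" ".join(line.split()))
    return result


def config_drift(conf_a, conf_b):
    """Return (only_in_a, only_in_b) as sorted lists of directives."""
    a, b = directives(conf_a), directives(conf_b)
    return sorted(a - b), sorted(b - a)
```

Fed Host 1's and Host 2's files, the first list would be expected to contain `allow 192.168.100.100` and nothing else, assuming that was the only difference.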