I set up a Nagios server on a CentOS 7 VM (Virtual Machine):
sudo yum install epel-release
sudo yum install nrpe
sudo yum install nagios
By default it sets up some basic checks for localhost. When I checked the Nagios site at http://127.0.0.1/nagios/, I found that even PING was critical:
(No output on stdout) stderr: execvp(/usr/lib64/nagios/plugins/check_ping, ...) failed. errno is 2: No such file or directory
I checked the contents of the plugins directory:
# ls /usr/lib64/nagios/plugins
eventhandlers negate urlize utils.sh
Sure enough, the usual suspects, such as check_ping, are not there.
Eventually I stumbled onto the following document:
Nagios plugins for Fedora have all been packaged separately. For
example, to install the check_http just install nagios-plugins-http.
All plugins are installed in the architecture dependent directory
I installed some of the plugins following that convention:
sudo yum install nagios-plugins-load
sudo yum install nagios-plugins-ping
sudo yum install nagios-plugins-disk
sudo yum install nagios-plugins-http
sudo yum install nagios-plugins-procs
Now the corresponding plugins exist in
/usr/lib64/nagios/plugins, and Nagios reports OK for those checks on localhost.
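A quick way to see which plugins are still missing is to test for each expected file instead of reading the ls output by eye. The following is only a sketch: the function name missing_plugins is made up for illustration, and it assumes each package installs a plugin named check_ plus the package suffix (as check_ping and check_http do above).

```shell
#!/bin/sh
# Sketch: list which expected Nagios plugins are absent from a directory.
# PLUGIN_DIR defaults to the path from the post; override it to look elsewhere.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/lib64/nagios/plugins}"

# Print the name of each plugin that is not an executable file in PLUGIN_DIR.
# (missing_plugins is a hypothetical helper, not part of Nagios.)
missing_plugins() {
    for plugin in "$@"; do
        if [ ! -x "$PLUGIN_DIR/$plugin" ]; then
            echo "$plugin"
        fi
    done
}

# Assumed plugin names, one per package installed above.
missing_plugins check_load check_ping check_disk check_http check_procs
```

If the script prints nothing, every listed plugin is present; any name it prints still needs its nagios-plugins package.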
Recently, my VPS (Virtual Private Server) ran into some issues where it exceeded the maximum amount of RAM allotted under my subscription. When this happens, the web server software shuts down and does not restart until I manually restart it.
This is bad. I’m not always visiting my own web site, so it could be down for days without me knowing. Although I really need to identify what is using all the RAM, in the meantime I’ll settle for a monitoring system that will notify me when the server is down.
if curl -s --head http://osric.com/ | grep "200 OK" > /dev/null
then
    echo "The HTTP server on osric.com is up!" > /dev/null
else
    echo "The HTTP server on osric.com is down!"
fi
cURL lets you retrieve a URL from the command line and provides more options than Wget for a single URL. In this case, I used the silent switch (-s) to suppress the status/progress output and the head switch (--head) to retrieve only the document headers. The headers are then piped to grep, which searches for the string “200 OK” (the HTTP status message for a successful request).
I send the result of that to /dev/null so that the output doesn’t appear on the screen.
If grep does find 200 OK, then I send a success message to /dev/null. This is largely unnecessary, but it is nice to leave in to test the script in a successful case: just remove the > /dev/null. If it doesn’t find 200 OK, then there is a problem. It might not necessarily mean that the web server is down, but it definitely indicates a problem that needs to be identified.
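One caveat with matching the text “200 OK”: servers that answer over HTTP/2 send a status line like "HTTP/2 200" with no "OK", so the grep would report a healthy site as down. A possible alternative, sketched below, asks curl for the numeric status code through its --write-out option instead. The function names http_status and report_if_down and the 10-second timeout are my own assumptions, not part of the original script.

```shell
#!/bin/sh
# Sketch: report only when a URL does not answer with HTTP status 200.
# Function names and the 10-second timeout are illustrative assumptions.

# Print the numeric HTTP status for a URL; curl prints 000 if no response arrives.
http_status() {
    curl -s -o /dev/null --head --max-time 10 -w '%{http_code}' "$1"
}

# Stay silent and return 0 when the status is 200;
# otherwise print a message and return 1.
report_if_down() {
    url=$1
    status=$2
    if [ "$status" = "200" ]; then
        return 0
    fi
    echo "The HTTP server at $url returned status $status"
    return 1
}
```

To use it, call report_if_down http://osric.com/ "$(http_status http://osric.com/)" at the end of the script. As with the original, it produces output only when something looks wrong.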
I added a call to this script to a crontab to run every 5 minutes. If there is no output, nothing happens. If there is output, the output is sent to me via e-mail, which, assuming I am checking my e-mail religiously, should reduce server downtime.
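For reference, the crontab entry for a 5-minute schedule might look like the lines this snippet prints. The script path and e-mail address are placeholders I made up, and the helper crontab_lines exists only to show the two lines together.

```shell
#!/bin/sh
# Hypothetical values: adjust both for your own system.
SCRIPT_PATH="$HOME/bin/check_site.sh"
MAILTO_ADDR="you@example.com"

# Print the two crontab lines: MAILTO sets the recipient for any output,
# and */5 in the minute field runs the script every 5 minutes.
crontab_lines() {
    printf 'MAILTO=%s\n' "$MAILTO_ADDR"
    printf '*/5 * * * * %s\n' "$SCRIPT_PATH"
}

crontab_lines
```

The printed lines can be pasted into the editor opened by crontab -e. Without a MAILTO line, cron normally mails output to the local account that owns the crontab, so this assumes the machine is able to deliver mail.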