I enabled Nagios checks for free disk space on a group of servers today, and was hit with alerts containing the following error message:
DISK CRITICAL - /sys/kernel/config is not accessible: Permission denied
If you are looking for a solution, skip to the end. Some of my mistakes before finding the solution may be interesting though!
The wrong solution
Permission denied? I had a hunch SELinux was behind this. SELinux is behind every unexpected permissions problem lately. But before jumping to any conclusions, Google the error message. It’s rare to run across a problem that no one else has had before.
Sure enough, I found a Red Hat bug report describing the same issue with the check_disk plugin. The developers closed the bug, saying that a new version had been released, and that if it was still a problem someone should open a new bug against the new version. Initially I thought that was a terrible assumption. “We released a new version and have not confirmed whether or not this is still a bug. Therefore this isn’t a bug unless you do the work to re-report it.” Now that I have determined that it is not and likely never was a bug, I’m not sure I feel the same.
The bug report mentions a workaround for the Nagios check_disk failure: change the plugin’s SELinux type so that it runs unconfined. It is in fact a successful workaround. I don’t entirely like it, but assuming that only the root user can modify /usr/lib64/nagios/plugins/check_disk, it seems like an acceptable risk. Still, recall that one of the benefits of SELinux is that even if one process owned by a user is compromised, it doesn’t mean everything owned by that user gets compromised.
Compare before and after:
$ ls --context /usr/lib64/nagios/plugins/check_disk
-rwxr-xr-x. root root system_u:object_r:nagios_checkdisk_plugin_exec_t:s0 /usr/lib64/nagios/plugins/check_disk
$ sudo chcon -t nagios_unconfined_plugin_exec_t /usr/lib64/nagios/plugins/check_disk
$ ls --context /usr/lib64/nagios/plugins/check_disk
-rwxr-xr-x. root root system_u:object_r:nagios_unconfined_plugin_exec_t:s0 /usr/lib64/nagios/plugins/check_disk
The SELinux type is now set to nagios_unconfined_plugin_exec_t.
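When comparing contexts across several servers, it can help to pull out just the type field. The snippet below is a minimal sketch: the function name selinux_type is my own invention, and it assumes the usual user:role:type:level layout shown in the ls --context output above.

```shell
# Print the SELinux type (the third colon-separated field) of a context
# string such as system_u:object_r:nagios_checkdisk_plugin_exec_t:s0.
# selinux_type is a hypothetical helper name, not part of any SELinux tool.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# The two context strings are copied from the ls --context output above.
before='system_u:object_r:nagios_checkdisk_plugin_exec_t:s0'
after='system_u:object_r:nagios_unconfined_plugin_exec_t:s0'
echo "before: $(selinux_type "$before")"
echo "after:  $(selinux_type "$after")"
```

On a live host the context string could come from GNU stat, for example selinux_type "$(stat -c %C /usr/lib64/nagios/plugins/check_disk)".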
Can I set the SELinux context via Ansible?
I need to make this change on several servers, and I need the change to be documented and repeatable, so I need to see how to best make that happen via Ansible.
The Ansible sefcontext module says it’s similar to the semanage fcontext command, so it seemed like a good choice.
First I tried the semanage fcontext command directly:
$ sudo semanage fcontext -m -t nagios_unconfined_plugin_exec_t /usr/lib64/nagios/plugins/check_disk
ValueError: File spec /usr/lib64/nagios/plugins/check_disk conflicts with equivalency rule '/usr/lib64 /usr/lib'; Try adding '/usr/lib/nagios/plugins/check_disk' instead
$ sudo semanage fcontext -m -t nagios_unconfined_plugin_exec_t /usr/lib/nagios/plugins/check_disk
ValueError: File context for /usr/lib/nagios/plugins/check_disk is not defined
$ sudo semanage fcontext --list /usr/lib64/nagios/plugins/check_disk | grep check_disk
/usr/lib/nagios/plugins/check_disk regular file system_u:object_r:nagios_checkdisk_plugin_exec_t:s0
/usr/lib/nagios/plugins/check_disk_smb regular file system_u:object_r:nagios_checkdisk_plugin_exec_t:s0
The file looked like it had a defined context, didn’t it? A comment on the Fedora SELinux support list had good advice:
If the file context is not already defined in your local modification, you need to add is [sic], not modify
I tried again, adding instead of modifying, and comparing context before and after the change:
$ ls --context /usr/lib64/nagios/plugins/check_disk
-rwxr-xr-x. root root system_u:object_r:nagios_checkdisk_plugin_exec_t:s0 /usr/lib64/nagios/plugins/check_disk
$ sudo semanage fcontext -a -t nagios_unconfined_plugin_exec_t /usr/lib/nagios/plugins/check_disk
$ sudo restorecon /usr/lib64/nagios/plugins/check_disk
$ ls --context /usr/lib64/nagios/plugins/check_disk
-rwxr-xr-x. root root system_u:object_r:nagios_unconfined_plugin_exec_t:s0 /usr/lib64/nagios/plugins/check_disk
OK! That looks good. Now to do the same with Ansible. An excerpt of my Ansible role is below:
- name: Allow Nagios to execute check_disk (change SELinux type)
  sefcontext:
    # Use lib, not lib64
    # SELinux defines the equivalency rule '/usr/lib64 /usr/lib'
    target: /usr/lib/nagios/plugins/check_disk
    setype: nagios_unconfined_plugin_exec_t
    state: present
That didn’t work though. See the context returned below:
$ ls --context /usr/lib64/nagios/plugins/check_disk
-rwxr-xr-x. root root system_u:object_r:nagios_checkdisk_plugin_exec_t:s0 /usr/lib64/nagios/plugins/check_disk
Maybe it doesn’t run restorecon? A thread on the Ansible Project Google Group explains that “reload SELinux policy after commit” is not the same as restorecon:
The sefcontext module is roughly the functionality that ‘semanage fcontext’ provides you. It allows you to add SELinux file context mappings to the internal database.
Now, the module is not intended to change file contexts based on the mapping, just like ‘semanage fcontext’ does not do. (See man semanage)
As you said, you can do this with restorecon, or the file module….
The Ansible file module! I gave it a try. Here’s an excerpt of my new Ansible role:
- name: Allow Nagios to execute check_disk (change SELinux type)
  file:
    path: /usr/lib64/nagios/plugins/check_disk
    setype: nagios_unconfined_plugin_exec_t
Check the SELinux context:
$ ls --context /usr/lib64/nagios/plugins/check_disk
-rwxr-xr-x. root root system_u:object_r:nagios_unconfined_plugin_exec_t:s0 /usr/lib64/nagios/plugins/check_disk
That worked! That was easy! I use the Ansible file module all the time and didn’t even know that option was there!
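A side note on persistence, offered as a sketch rather than a tested recipe: as far as I understand, a type set directly on the file (with chcon, or with the file module’s setype) is not recorded in the file context mappings, so a later restorecon or filesystem relabel can put the old type back. Combining the two manual steps from earlier (semanage fcontext -a, then restorecon) avoids that. The helper below only prints those commands so they can be reviewed first. Its name, persistent_relabel_cmds, is made up for illustration, and the /usr/lib64 to /usr/lib rewrite simply mirrors the equivalency rule that semanage reported above.

```shell
# Print (do not run) the commands that record a persistent file context
# mapping and then apply it to the file on disk.
# persistent_relabel_cmds is a hypothetical helper name.
persistent_relabel_cmds() {
    path=$1
    setype=$2
    # Assumption: the only equivalency rule that matters here is
    # '/usr/lib64 /usr/lib', as reported by semanage in this post.
    case $path in
        /usr/lib64/*) spec="/usr/lib/${path#/usr/lib64/}" ;;
        *)            spec=$path ;;
    esac
    printf 'semanage fcontext -a -t %s %s\n' "$setype" "$spec"
    printf 'restorecon -v %s\n' "$path"
}

persistent_relabel_cmds /usr/lib64/nagios/plugins/check_disk nagios_unconfined_plugin_exec_t
```

Printing instead of executing keeps the sketch safe to try on any machine; the printed lines would need to be run as root on the target host.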
But wait. Have I just solved the wrong problem?
The real solution
Why does the error message say /sys/kernel/config is not accessible? That isn’t one of the disks in my system, is it?
Turns out, it is. Run mount to see all the filesystems. There are more of them than you might guess:
$ mount | grep /sys/kernel/config
configfs on /sys/kernel/config type configfs (rw,relatime)
What is configfs?
configfs is a ram-based filesystem that…is a filesystem-based manager of kernel objects
(from configfs.txt)
That’s the real problem. I’m checking a filesystem that I didn’t intend to, and one that Nagios probably shouldn’t have access to.
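To see which filesystem types are mounted on a host at all, the mount output can be reduced to a list of distinct types. This is a minimal sketch: mount_fs_types is a name I made up, and it assumes the Linux mount output format shown above ("device on directory type fstype (options)").

```shell
# Read mount-style lines on stdin and print each distinct filesystem
# type once, sorted. mount_fs_types is a hypothetical helper name.
mount_fs_types() {
    awk '{
        # The filesystem type is the word that follows the literal "type".
        for (i = 1; i < NF; i++)
            if ($i == "type") { print $(i + 1); break }
    }' | sort -u
}

# Sample input: the configfs line is copied from the output above;
# the other two lines are invented examples.
mount_fs_types <<'EOF'
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda1 on / type xfs (rw,relatime)
tmpfs on /run type tmpfs (rw,nosuid,nodev)
EOF
```

On a real host, mount | mount_fs_types gives the list of candidates to exclude or include.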
I checked the default config in /etc/nagios/nrpe.cfg and found these (the latter is commented-out by default, as shown below):
command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
#command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
My config had a custom definition in a cfg file in /etc/nrpe.d:
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 10 -c 5 -X devfs -X$
The above excludes devfs filesystems, which don’t even exist on my CentOS 7 hosts. The -X$ looks like a mistake, possibly a copy-paste error of a line truncated by the terminal.
I checked and confirmed that even without the bad exclusions, the check_disk command still produced the same error:
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 10 -c 5
I created a revised definition that excludes the problematic filesystem:
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 10% -c 5% -X configfs
It works! No more errors.
It might be better, of course, to include only the filesystems I expect to be checking:
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 10% -c 5% -N xfs
The above also works.
Be sure to see the check_disk plugin docs for the full list of parameters. It isn’t completely obvious, but the second example implies that, unless otherwise specified, check_disk tries to check all available filesystems.
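Before switching to an include-only definition, it is worth previewing which mount points a given type actually covers on the host. The filter below is a rough sketch under my own assumptions: mounts_of_type is an invented name, it reads mount-style lines, and it only approximates an include-by-type filter rather than reproducing check_disk’s own selection logic.

```shell
# List the mount points whose filesystem type equals $1, reading
# mount-style lines ("device on directory type fstype (options)") on stdin.
# mounts_of_type is a hypothetical helper name. Mount points that contain
# spaces are not handled.
mounts_of_type() {
    awk -v want="$1" '{
        for (i = 2; i < NF; i++)
            if ($i == "type" && $(i + 1) == want) { print $(i - 1); break }
    }'
}

# Sample input: the configfs line is from this post; the xfs lines are invented.
mounts_of_type xfs <<'EOF'
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda1 on / type xfs (rw,relatime)
/dev/sdb1 on /var type xfs (rw,relatime)
EOF
```

Running mount | mounts_of_type xfs on a real host shows whether every filesystem you care about is really of that type before you rely on -N xfs.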
Summary
- Read the error messages.
- Understand the error messages.
- Google the error messages, but the results don’t mean much if no one else read and understood the error messages.
- Don’t change SELinux settings unless absolutely necessary. The defaults are there for a reason.
Thank you for taking the time to document your experience and your discovery process, and particularly for drawing attention to the points in your closing summary. I would add to point 3 something like: “Understand the solution found by googling before you apply it: a good solution to the author’s problem is not always a good solution to your problem.”
Ironically, a colleague advised me, based on this post: “the engineers responsible for those servers don’t want us to run check_disk sudo root so we have to use ‘-x ext4’ to exclude the problem filesystem.” It just happens that the only ext4 fs on the box is the one with the application installed on it!
My challenge is different to yours, the solution does not apply, but your post has given me some ideas how to target the investigation…
… and the result: not related to selinux at all, simple file permissions issue.
I knew that check_disk doesn’t need read access to the filesystem to check it, but it seems it does need permission to traverse the file hierarchy up to the mount point! df exhibits similar behaviour, BTW.
Fact: a lot of details are changing. It is a pity that the documentation does not cover them. The most painful transition was from Nagios 3 to Nagios 4. I spent two weeks repairing configuration files.
Good work!