{"id":2035,"date":"2017-09-09T00:22:47","date_gmt":"2017-09-09T05:22:47","guid":{"rendered":"http:\/\/osric.com\/chris\/accidental-developer\/?p=2035"},"modified":"2017-09-09T00:23:15","modified_gmt":"2017-09-09T05:23:15","slug":"nagios-check_disk-returns-disk-critical-sys-kernel-config-not-accessible-permission-denied","status":"publish","type":"post","link":"https:\/\/osric.com\/chris\/accidental-developer\/2017\/09\/nagios-check_disk-returns-disk-critical-sys-kernel-config-not-accessible-permission-denied\/","title":{"rendered":"Nagios check_disk returns DISK CRITICAL &#8211; \/sys\/kernel\/config is not accessible: Permission denied"},"content":{"rendered":"<p>I enabled Nagios checks for free disk space on a group of servers today, and was hit with alerts containing the following error message:<br \/>\n<code>DISK CRITICAL - \/sys\/kernel\/config is not accessible: Permission denied<\/code><\/p>\n<p>If you are looking for a solution, skip to the end. Some of my mistakes before finding the solution may be interesting though!<\/p>\n<p><!--more--><\/p>\n<p><strong>The wrong solution<\/strong><\/p>\n<p><em>Permission denied<\/em>? I had a hunch <a href=\"https:\/\/wiki.centos.org\/HowTos\/SELinux\">SELinux<\/a> was behind this. SELinux is behind every unexpected permissions problem lately. But before jumping to any conclusions, Google the error message. It&#8217;s rare to run across a problem that no one else has had before.<\/p>\n<p>Sure enough, I found a <a href=\"https:\/\/bugzilla.redhat.com\/show_bug.cgi?id=1255948\">Red Hat bug report<\/a> describing the same issue with the check_disk plugin. The developers closed the bug, saying that a new version has been released, and if it is still a problem someone should open a new bug for the new version. Initially I thought that was a terrible assumption. &#8220;We released a new version and have not confirmed whether or not this is still a bug. 
Therefore this isn&#8217;t a bug unless you do the work to re-report it.&#8221; Now that I have determined that it is not and likely never was a bug, I&#8217;m not sure I feel the same.<\/p>\n<p>The bug report mentions a <a href=\"http:\/\/edvoncken.net\/2012\/01\/workaround-for-nagios-check_disk-failure-in-rhel-centos-6-2\/\">workaround for the Nagios check_disk failure<\/a>. It is in fact a successful workaround. I don&#8217;t entirely like it, but assuming that only the root user can modify <code>\/usr\/lib64\/nagios\/plugins\/check_disk<\/code> it seems like an acceptable risk. Still, recall that one of the benefits of SELinux is that even if another process owned by a user is compromised it doesn&#8217;t mean everything owned by that user gets compromised.<\/p>\n<p>Compare before and after:<\/p>\n<pre><code>$ ls --context \/usr\/lib64\/nagios\/plugins\/check_disk\r\n-rwxr-xr-x. root root system_u:object_r:nagios_checkdisk_plugin_exec_t:s0 \/usr\/lib64\/nagios\/plugins\/check_disk\r\n$ sudo chcon -t nagios_unconfined_plugin_exec_t \/usr\/lib64\/nagios\/plugins\/check_disk\r\n$ ls --context \/usr\/lib64\/nagios\/plugins\/check_disk\r\n-rwxr-xr-x. 
root root system_u:object_r:nagios_unconfined_plugin_exec_t:s0 \/usr\/lib64\/nagios\/plugins\/check_disk<\/code><\/pre>\n<p>The SELinux type is now set to <code>nagios_<strong>unconfined<\/strong>_plugin_exec_t<\/code>.<\/p>\n<p><strong>Can I set the SELinux context via Ansible?<\/strong><br \/>\nI need to make this change on several servers, and I need the change to be documented and repeatable, so I need to see how to best make that happen via Ansible.<\/p>\n<p>The Ansible <a href=\"http:\/\/docs.ansible.com\/ansible\/latest\/sefcontext_module.html\">sefcontext module<\/a> says it&#8217;s similar to the <code>semanage fcontext<\/code> command, so it seemed like a good choice.<\/p>\n<p>First I tried the <code>semanage fcontext<\/code> command directly:<\/p>\n<pre><code>$ sudo semanage fcontext -m -t nagios_unconfined_plugin_exec_t \/usr\/lib64\/nagios\/plugins\/check_disk\r\nValueError: File spec \/usr\/lib64\/nagios\/plugins\/check_disk conflicts with equivalency rule '\/usr\/lib64 \/usr\/lib'; Try adding '\/usr\/lib\/nagios\/plugins\/check_disk' instead\r\n$ sudo semanage fcontext -m -t nagios_unconfined_plugin_exec_t \/usr\/lib\/nagios\/plugins\/check_disk\r\nValueError: File context for \/usr\/lib\/nagios\/plugins\/check_disk is not defined\r\n\r\n$ sudo semanage fcontext --list \/usr\/lib64\/nagios\/plugins\/check_disk | grep check_disk\r\n\/usr\/lib\/nagios\/plugins\/check_disk                 regular file       system_u:object_r:nagios_checkdisk_plugin_exec_t:s0 \r\n\/usr\/lib\/nagios\/plugins\/check_disk_smb             regular file       system_u:object_r:nagios_checkdisk_plugin_exec_t:s0<\/code><\/pre>\n<p>The file looked like it had a defined context, didn&#8217;t it? 
A <a href=\"https:\/\/lists.fedoraproject.org\/archives\/list\/selinux@lists.fedoraproject.org\/thread\/BKZKFZVIBH3KMWBYUMPGYPSQQGYD5T52\/\">comment on the Fedora SELinux support list<\/a> had good advice:<\/p>\n<blockquote><p>If the file context is not already defined in your local modification, you need to add is [sic], not modify<\/p><\/blockquote>\n<p>I tried again, <em>adding<\/em> instead of <em>modifying<\/em>, and comparing context before and after the change:<\/p>\n<pre><code>$ ls --context \/usr\/lib64\/nagios\/plugins\/check_disk\r\n-rwxr-xr-x. root root system_u:object_r:nagios_checkdisk_plugin_exec_t:s0 \/usr\/lib64\/nagios\/plugins\/check_disk\r\n$ sudo semanage fcontext -a -t nagios_unconfined_plugin_exec_t \/usr\/lib\/nagios\/plugins\/check_disk\r\n$ sudo restorecon \/usr\/lib64\/nagios\/plugins\/check_disk\r\n$ ls --context \/usr\/lib64\/nagios\/plugins\/check_disk\r\n-rwxr-xr-x. root root system_u:object_r:nagios_unconfined_plugin_exec_t:s0 \/usr\/lib64\/nagios\/plugins\/check_disk<\/code><\/pre>\n<p>OK! That looks good. Now to do the same with Ansible. An excerpt of my Ansible role is below:<\/p>\n<pre><code>- name: Allow Nagios to execute check_disk (change SELinux type)\r\n  sefcontext:\r\n    # Use lib, not lib64\r\n    # SELinux defines the equivalency rule '\/usr\/lib64 \/usr\/lib'\r\n    target: \/usr\/lib\/nagios\/plugins\/check_disk\r\n    setype: nagios_unconfined_plugin_exec_t\r\n    state: present<\/code><\/pre>\n<p>That didn&#8217;t work though. See the context returned below:<\/p>\n<pre><code>$ ls --context \/usr\/lib64\/nagios\/plugins\/check_disk\r\n-rwxr-xr-x. root root system_u:object_r:nagios_checkdisk_plugin_exec_t:s0 \/usr\/lib64\/nagios\/plugins\/check_disk<\/code><\/pre>\n<p>Maybe it doesn&#8217;t run <code>restorecon<\/code>? 
A <a href=\"https:\/\/groups.google.com\/forum\/#!topic\/ansible-project\/IcgnAekwsbA\">thread on the Ansible Project Google Group<\/a> explains that &#8220;reload SELinux policy after commit&#8221; is not the same as <code>restorecon<\/code>.<\/p>\n<blockquote><p>The sefcontext module is roughly the functionality that &#8216;semanage fcontext&#8217; provides you. It allows you to add SELinux file context mappings to the internal database.<\/p>\n<p>Now, the module is not intended to change file contexts based on the mapping, just like &#8216;semanage fcontext&#8217; does not do. (See man semanage)<\/p>\n<p>As you said, you can do this with restorecon, or the file module&#8230;.<\/p><\/blockquote>\n<p>The <a href=\"http:\/\/docs.ansible.com\/ansible\/latest\/file_module.html\">Ansible file module<\/a>! I gave it a try. Here&#8217;s an excerpt of my new Ansible role:<\/p>\n<pre><code>- name: Allow Nagios to execute check_disk (change SELinux type)\r\n  file:\r\n    path: \/usr\/lib64\/nagios\/plugins\/check_disk\r\n    setype: nagios_unconfined_plugin_exec_t<\/code><\/pre>\n<p>Check the SELinux context:<\/p>\n<pre><code>$ ls --context \/usr\/lib64\/nagios\/plugins\/check_disk\r\n-rwxr-xr-x. root root system_u:object_r:nagios_unconfined_plugin_exec_t:s0 \/usr\/lib64\/nagios\/plugins\/check_disk<\/code><\/pre>\n<p>That worked! That was easy! I use the Ansible file module all the time and didn&#8217;t even know that option was there!<\/p>\n<p>But wait. Have I just solved the wrong problem?<\/p>\n<p><strong>The real solution<\/strong><br \/>\nWhy does the error message say <code>\/sys\/kernel\/config is not accessible<\/code>? That isn&#8217;t one of the disks in my system, is it?<\/p>\n<p>Turns out, it is. Run <code>mount<\/code> to see all the filesystems. 
There are more of them than you might guess:<\/p>\n<pre><code>$ mount | grep \/sys\/kernel\/config\r\nconfigfs on \/sys\/kernel\/config type configfs (rw,relatime)<\/code><\/pre>\n<p>What is <code>configfs<\/code>?<\/p>\n<blockquote><p>configfs is a ram-based filesystem that&#8230;is a filesystem-based manager of kernel objects<\/p><\/blockquote>\n<p>(from <a href=\"https:\/\/www.kernel.org\/doc\/Documentation\/filesystems\/configfs\/configfs.txt\">configfs.txt<\/a>)<\/p>\n<p>That&#8217;s the real problem. I&#8217;m checking a filesystem that I didn&#8217;t intend to, and one that Nagios probably shouldn&#8217;t have access to.<\/p>\n<p>I checked the default config in <code>\/etc\/nagios\/nrpe.cfg<\/code> and found these (the latter is commented-out by default, as shown below):<\/p>\n<pre><code>command[check_hda1]=\/usr\/lib64\/nagios\/plugins\/check_disk -w 20% -c 10% -p \/dev\/hda1\r\n#command[check_disk]=\/usr\/lib64\/nagios\/plugins\/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$<\/code><\/pre>\n<p>My config had a custom definition in a cfg file in <code>\/etc\/nrpe.d<\/code>:<\/p>\n<pre><code>command[check_disk]=\/usr\/lib64\/nagios\/plugins\/check_disk -w 10 -c 5 -X devfs -X$<\/code><\/pre>\n<p>The above excludes <code>devfs<\/code> filesystems, which don&#8217;t even exist on my CentOS 7 hosts. The <code>-X$<\/code> looks like a mistake, possibly a copy-paste error of a line truncated by the terminal.<\/p>\n<p>I checked and confirmed that even without the bad exclusions, the check_disk command still produced the same error:<\/p>\n<pre><code>command[check_disk]=\/usr\/lib64\/nagios\/plugins\/check_disk -w 10 -c 5<\/code><\/pre>\n<p>I created a revised definition that excludes the problematic filesystem:<\/p>\n<pre><code>command[check_disk]=\/usr\/lib64\/nagios\/plugins\/check_disk -w 10% -c 5% -X configfs<\/code><\/pre>\n<p>It works! 
No more errors.<\/p>\n<p>It might be better, of course, to <em>include <\/em>only the filesystems I expect to be checking:<\/p>\n<pre><code>command[check_disk]=\/usr\/lib64\/nagios\/plugins\/check_disk -w 10% -c 5% -N xfs<\/code><\/pre>\n<p>The above also works.<\/p>\n<p>Be sure to see the <a href=\"https:\/\/www.monitoring-plugins.org\/doc\/man\/check_disk.html\">check_disk plugin docs<\/a> for the full list of parameters. It isn&#8217;t completely obvious, but the second example implies that, unless otherwise specified, <code>check_disk<\/code> tries to check <em>all<\/em> available filesystems.<\/p>\n<p><strong>Summary<\/strong><\/p>\n<ol>\n<li>Read the error messages.<\/li>\n<li>Understand the error messages.<\/li>\n<li>Google the error messages, but the results don&#8217;t mean much if no one else read and understood the error messages.<\/li>\n<li>Don&#8217;t change SELinux settings unless absolutely necessary. The defaults are there for a reason.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>The Nagios check_disk plugin returned the error &#8220;DISK CRITICAL &#8211; \/sys\/kernel\/config is not accessible: Permission denied&#8221;. 
At first I blamed SELinux and tried to change the SELinux context, but in reality I was using a bad Nagios command definition.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[439,422],"tags":[423,348,458],"class_list":["post-2035","post","type-post","status-publish","format-standard","hentry","category-ansible","category-sysadmin","tag-ansible","tag-nagios","tag-selinux"],"_links":{"self":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/2035","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/comments?post=2035"}],"version-history":[{"count":18,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/2035\/revisions"}],"predecessor-version":[{"id":2053,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/2035\/revisions\/2053"}],"wp:attachment":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/media?parent=2035"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/categories?post=2035"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/tags?post=2035"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}