vCenter 6.x: Unable to deploy template

I had a VM on vSphere 5.5 that I was trying to move to a new ESXi 6.7 host via vCenter.

First I exported the OVF template from vSphere 5.5.

Then in vCenter, on the vSphere 6.7 host, I deployed the OVF template.

Error!

The "Deploy OVF template" operation failed for the entity with the following error message.

Unable to deploy template.

The error was vague. Why was it unable to deploy the template? Is there a compatibility issue between 5.5 and 6.7?

Re-bind host to FreeIPA

The sudo command on one particular FreeIPA-bound host was taking an exceedingly long time to run. And when it finally ran, it would not accept my current password, but rather my previous password — somehow still cached on the system. It was a strange problem.

Instead of trying to figure out exactly why it was happening, I decided to remove & re-bind the host to my FreeIPA domain.

Nagios alert: CRITICAL: No response from NTP server

One of a pair of new hosts was causing the following Nagios alert today:

CRITICAL: No response from NTP server

Both of the new systems have the same configuration in theory, but based on the different results something clearly was overlooked.

I tried running the NTP check from the Nagios host:

Host 1

$ check_ntp -H ephemeralbox1.osric.net -w 0.1 -c 0.2
NTP OK: Offset -0.02545583248 secs|offset=-0.025456s;0.100000;0.200000;

Host 2

$ check_ntp -H ephemeralbox2.osric.net -w 0.1 -c 0.2
CRITICAL: No response from NTP server

The iptables rules look the same on both. The hosts are all on the same LAN, so there’s no firewall in the way.

Both systems are running chronyd:

Host 1

[chris@ephemeralbox1 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running

Host 2

[chris@ephemeralbox2 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running

Both systems are listening on port 123:

Host 1

[chris@ephemeralbox1 ssh]$ sudo lsof -i :123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 3027 chrony 3u IPv4 1095448 0t0 UDP *:ntp

Host 2

[chris@ephemeralbox2 ssh]$ sudo lsof -i :123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 1241 chrony 3u IPv4 51276 0t0 UDP *:ntp

Finally, I found it. In the obvious place that perhaps I should have looked first. The /etc/chrony.conf file on Host 2 was missing the allow line for the Nagios host:

# Allow NTP client access from Nagios host
allow 192.168.100.100

And the first place I looked was iptables. Blame the firewall, after all. The configurations were both pushed to these systems via Ansible playbooks, but apparently I had not included the role that updates the chrony.conf file on the 2nd host. Looks like I need configuration management management!
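
A quick check along these lines might have caught the difference sooner. It is only a sketch: the function name has_chrony_allow is made up for illustration, and it matches the address literally, so it would not notice an allow line that covers the host through a subnet.

```shell
# Hypothetical helper: succeed if a chrony config file has an uncommented
# "allow" line for exactly the given address.
# Note: the address is compared literally; subnet rules such as
# "allow 192.168.100.0/24" are not evaluated.
has_chrony_allow() {
    conf=$1
    addr=$2
    awk -v a="$addr" '
        $1 == "allow" && $2 == a { found = 1 }
        END { exit !found }
    ' "$conf"
}
```

Run on each host, something like has_chrony_allow /etc/chrony.conf 192.168.100.100 || echo "missing allow line" would flag the host where the role was not applied.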

yum Error: requested datatype primary not available

I ran into a new-to-me yum error earlier today:

$ yum --quiet check-updates
Error: requested datatype primary not available

Following the tips on Unix & Linux StackExchange: Error: requested datatype primary not available, I:

  • ran yum clean all
  • disabled repositories one at a time to identify the repo that was causing the error

In my case, it turned out to be the extras repo. The following did not produce any errors:

$ yum --quiet --disablerepo=extras check-updates
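
Disabling repositories one at a time can be scripted. The following is a rough sketch, not what I actually ran: find_bad_repo and the YUM override variable are names I made up, and the repo IDs you pass in are whatever your system has enabled. It relies on yum check-update exiting with 1 on an error (0 means no updates, 100 means updates are available).

```shell
# Hypothetical helper: for each repo ID given, run check-update with only
# that repo disabled and print the repo if yum no longer reports an error.
# yum check-update exits 0 (no updates), 100 (updates available) or 1 (error).
# YUM may be set to substitute another command (an assumption, for testing).
find_bad_repo() {
    yum_cmd=${YUM:-yum}
    for repo in "$@"; do
        status=0
        "$yum_cmd" --quiet --disablerepo="$repo" check-update >/dev/null 2>&1 || status=$?
        if [ "$status" -ne 1 ]; then
            echo "$repo"
        fi
    done
}
```

For example, find_bad_repo base updates extras would print each repo whose removal makes the error go away.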

What is wrong with the extras repo? It is defined in /etc/yum.repos.d/CentOS-Base.repo, so I took a look at what was there:

[extras]
name=CentOS-$releasever - Extras
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

None of that looked unusual (or had changed recently), so back to Google.

I tried excluding the specific mirror that was listed for the extras repo (http://mirrors.unifiedlayer.com/centos/7.4.1708/extras/x86_64/) by adding unifiedlayer.com to the exclude line in /etc/yum/pluginconf.d/fastestmirror.conf, as described in yum and fastestmirror plugin. Although yum appeared to pick a different mirror it still gave me the same error.

It turns out, the mirror in question was “poisoned” (rerouted) by my DNS servers, as it had been identified (possibly erroneously) as malicious. As such, the domain still resolved but the path to the CentOS repository did not exist.

Excluding the domain in fastestmirror.conf did not seem to have the intended effect: yum still appeared to be contacting the bad mirror. I took the following steps, which resolved the error, although I can’t say I entirely understand why:

$ sudo yum makecache

This still produced the error.

I removed the bad entry from:

/var/cache/yum/x86_64/7/extras/mirrorlist.txt
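
I edited the file by hand, but the same removal could be done with a small function. This is a sketch under my own assumptions: remove_mirror is a made-up name, and it drops every line that contains the given string, keeping a .bak copy next to the original.

```shell
# Hypothetical helper: drop every line containing the given string from a
# cached mirrorlist file, keeping a .bak copy of the original.
remove_mirror() {
    list=$1
    pattern=$2
    cp "$list" "$list.bak" || return 1
    # grep -v exits 1 when no lines are left to print; that is not an error here.
    grep -v -F -- "$pattern" "$list.bak" > "$list" || [ $? -eq 1 ]
}
```

From a root shell, that would be remove_mirror /var/cache/yum/x86_64/7/extras/mirrorlist.txt unifiedlayer.com.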

Then I ran makecache again:

$ sudo yum makecache

No error this time! I tried running check-update:

$ yum check-update

No error!

Shouldn’t yum clean all have eliminated the bad cache value in /var/cache/yum/x86_64/7/extras/mirrorlist.txt?

Cache invalidation, one of the hard problems. At least I have steps to take if I run into this problem again.

SELinux, audit2why, audit2allow, and policy files

I’m no expert on SELinux, but I cringe whenever I read an online tutorial that includes the step “Disable SELinux.”

I ran into such a problem recently when I was installing Icinga. The service failed to start because of permissions issues creating the process ID (PID) file. One site suggested disabling SELinux, but I thought it was time to learn to update SELinux’s Type Enforcement (TE) policies instead.

First, I needed the audit2why tool, to explain what was being blocked and why:

# yum -q provides audit2why
policycoreutils-python-2.5-17.1.el7.x86_64 : SELinux policy core python
                                           : utilities
Repo        : base
Matched from:
Filename    : /usr/bin/audit2allow

I installed the policycoreutils-python package (and dependencies):

# yum install policycoreutils-python

I then ran audit2why against the audit log:

# audit2why -i /var/log/audit/audit.log
type=AVC msg=audit(1510711476.690:132): avc:  denied  { chown } for  pid=2459 comm="icinga" capability=0  scontext=system_u:system_r:nagios_t:s0 tcontext=system_u:system_r:nagios_t:s0 tclass=capability

        Was caused by:
                Missing type enforcement (TE) allow rule.

                You can use audit2allow to generate a loadable module to allow this access.

type=AVC msg=audit(1510711476.724:134): avc:  denied  { read write } for  pid=2465 comm="icinga" name="icinga.pid" dev="tmpfs" ino=19128 scontext=system_u:system_r:nagios_t:s0 tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file

        Was caused by:
                Missing type enforcement (TE) allow rule.

                You can use audit2allow to generate a loadable module to allow this access.

That’s still a little opaque. It’s not entirely clear to me why chown was blocked, for example. Look at the following specifics:

scontext=system_u:system_r:nagios_t:s0
tcontext=system_u:system_r:nagios_t:s0

To help decode that:

  • scontext = Source Context
  • tcontext = Target Context
  • _u:_r:_t:s# = user:role:type:security level
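
To make the decoding concrete, here is a small sketch that splits a context string into those four parts. The function name parse_context is my own invention. The level is taken as everything after the third colon, since a level can itself contain colons (for example s0:c0.c1023).

```shell
# Hypothetical helper: split an SELinux context into user, role, type and
# level. The level is "everything after the third colon" because it may
# contain colons of its own (for example s0:c0.c1023).
parse_context() {
    ctx=$1
    printf 'user=%s\n'  "$(printf '%s\n' "$ctx" | cut -d: -f1)"
    printf 'role=%s\n'  "$(printf '%s\n' "$ctx" | cut -d: -f2)"
    printf 'type=%s\n'  "$(printf '%s\n' "$ctx" | cut -d: -f3)"
    printf 'level=%s\n' "$(printf '%s\n' "$ctx" | cut -d: -f4-)"
}
```

Called as parse_context 'system_u:system_r:nagios_t:s0', it prints user=system_u, role=system_r, type=nagios_t and level=s0 on separate lines.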

The source and target contexts are identical, and so it seems to me that the command should be allowed. But let’s try audit2allow and see what that tells us:

# audit2allow -i /var/log/audit/audit.log


#============= nagios_t ==============
allow nagios_t initrc_var_run_t:file { lock open read write };
allow nagios_t self:capability chown;

It is unclear to me how broad the first rule is: does it allow the nagios type (nagios_t) access to all initrc_var_run_t files? If so, that’s probably too broad. As the man page warns:

Care must be exercised while acting on the output of  this  utility  to
ensure  that  the  operations  being  permitted  do not pose a security
threat. Often it is better to define new domains and/or types, or  make
other structural changes to narrowly allow an optimal set of operations
to succeed, as opposed to  blindly  implementing  the  sometimes  broad
changes  recommended  by this utility.

That’s fairly terrifying. Although if the alternative is disabling SELinux completely, an overly broad SELinux policy is not the worst thing in the world.

So audit2allow provided a couple rules. Now what? Fortunately the audit2why and audit2allow man pages both include details on how to incorporate the rules into your SELinux policy. First, generate a new type enforcement policy:

# audit2allow -i /var/log/audit/audit.log --module local > local.te

This includes some extra information in addition to the default output:

# cat local.te

module local 1.0;

require {
        type nagios_t;
        type initrc_var_run_t;
        class capability chown;
        class file { lock open read write };
}

#============= nagios_t ==============
allow nagios_t initrc_var_run_t:file { lock open read write };
allow nagios_t self:capability chown;
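
One caveat: audit2allow builds rules from every denial in the input it is given, so pointing it at the whole audit log could pull in rules for unrelated processes. A possible way to narrow the input, sketched here with a made-up function name, is to pass along only the AVC lines logged for the icinga command:

```shell
# Hypothetical helper: print only the AVC denial lines logged for one
# command name (the comm="..." field) from an audit log.
avc_for_comm() {
    log=$1
    comm=$2
    grep 'type=AVC' "$log" | grep -F "comm=\"$comm\""
}
```

The filtered lines can then be piped in, since audit2allow reads standard input when no -i is given: avc_for_comm /var/log/audit/audit.log icinga | audit2allow --module local > local.te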

Next the man page says:

# SELinux provides a policy devel environment under
# /usr/share/selinux/devel including all of the shipped
# interface files.
# You can create a te file and compile it by executing

$ make -f /usr/share/selinux/devel/Makefile local.pp

However, my system had no /usr/share/selinux/devel directory:

# ls /usr/share/selinux/
packages  targeted

I needed to install the policycoreutils-devel package (and dependencies):

# yum install policycoreutils-devel

Now compile the policy file to a binary:

# make -f /usr/share/selinux/devel/Makefile local.pp
Compiling targeted local module
/usr/bin/checkmodule:  loading policy configuration from tmp/local.tmp
/usr/bin/checkmodule:  policy configuration loaded
/usr/bin/checkmodule:  writing binary representation (version 17) to tmp/local.mod
Creating targeted local.pp policy package
rm tmp/local.mod.fc tmp/local.mod

Now install it using the semodule command:

# semodule -i local.pp

Did that solve the problem?

# systemctl start icinga
# systemctl status icinga
● icinga.service - LSB: start and stop Icinga monitoring daemon
   Loaded: loaded (/etc/rc.d/init.d/icinga; bad; vendor preset: disabled)
   Active: active (running) since Tue 2017-11-14 22:35:23 EST; 6s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 2661 ExecStop=/etc/rc.d/init.d/icinga stop (code=exited, status=0/SUCCESS)
  Process: 3838 ExecStart=/etc/rc.d/init.d/icinga start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/icinga.service
           └─3850 /usr/bin/icinga -d /etc/icinga/icinga.cfg

Nov 14 22:35:23 localhost.localdomain systemd[1]: Starting LSB: start and sto...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Running configuration che...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Icinga with PID  not runn...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Starting icinga: Starting...
Nov 14 22:35:23 localhost.localdomain systemd[1]: Started LSB: start and stop...
Nov 14 22:35:23 localhost.localdomain icinga[3850]: Finished daemonizing... (...
Nov 14 22:35:23 localhost.localdomain icinga[3850]: Event loop started...
Hint: Some lines were ellipsized, use -l to show in full.

It worked! The permissions issues were resolved without resorting to disabling SELinux.

There is still more I need to understand about SELinux, but it’s a start.

Additional reading:
CentOS: SELinux Policy Overview

ipa-server-upgrade: IPv6 stack is enabled in the kernel but there is no interface that has ::1 address assigned

I applied the latest CentOS updates, as usual. It included a kernel update, so I rebooted the system:

$ sudo yum update -y
$ sudo reboot

After reboot, ipactl showed that FreeIPA was not running:

$ sudo ipactl status
Directory Service: STOPPED
Directory Service must be running in order to obtain status of other services
ipa: INFO: The ipactl command was successful

I tried to start it:

$ sudo ipactl start
Upgrade required: please run ipa-server-upgrade command
Aborting ipactl

I tried running ipa-server-upgrade:

$ sudo ipa-server-upgrade
IPv6 stack is enabled in the kernel but there is no interface that has ::1 address assigned. Add ::1 address resolution to 'lo' interface. You might need to enable IPv6 on the interface 'lo' in sysctl.conf.
The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information

I had previously disabled IPv6 in /etc/sysctl.conf and removed the ::1 entry from /etc/hosts.

I added the localhost entry back to /etc/hosts:

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

I removed the statements disabling IPv6 from /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

I rebooted for good measure, but even after reboot ipa-server-upgrade produced the same error. Indeed, IPv6 is not enabled:

$ ping6 ::1
connect: No route to host
$ ping6 localhost
connect: No route to host
$ sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1

That makes sense. Merely removing the lines setting IPv6 to disabled didn’t actually do anything to re-enable it.

$ sudo sysctl net.ipv6.conf.all.disable_ipv6=0
net.ipv6.conf.all.disable_ipv6 = 0
$ sudo sysctl net.ipv6.conf.lo.disable_ipv6=0
net.ipv6.conf.lo.disable_ipv6 = 0

After that change, ping6 ::1 and ping6 localhost worked as expected. I left IPv6 disabled on the default interface, but noticed in ifconfig that eth0 had picked up an IPv6 address, so I disabled that:

$ sudo sysctl net.ipv6.conf.eth0.disable_ipv6=1

I also added that same line to /etc/sysctl.conf.
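
Since removing a line from /etc/sysctl.conf told me nothing about the value actually in effect, it helps to look at what the kernel currently has for every interface. A minimal sketch, with a function name I made up, reading the same values that sysctl reports from /proc/sys:

```shell
# Hypothetical helper: print "interface=value" for every disable_ipv6
# setting under a sysctl tree. The directory argument defaults to the
# live kernel settings.
show_disable_ipv6() {
    base=${1:-/proc/sys/net/ipv6/conf}
    for f in "$base"/*/disable_ipv6; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        printf '%s=%s\n' "$iface" "$(cat "$f")"
    done
}
```

Running show_disable_ipv6 with no arguments lists all, default, lo and each real interface, where 1 means IPv6 is disabled.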

I ran the upgrade again:

$ sudo ipa-server-upgrade
Upgrading IPA:. Estimated time: 1 minute 30 seconds
...
...
...
The IPA services were upgraded
The ipa-server-upgrade command was successful

And started FreeIPA:

$ sudo ipactl start
Starting Directory Service
Starting krb5kdc Service
Starting kadmin Service
Starting httpd Service
Starting ipa-custodia Service
Starting ntpd Service
Starting pki-tomcatd Service
Starting ipa-otpd Service
ipa: INFO: The ipactl command was successful

Success! And apparently disabling IPv6 is not the best idea.

FreeIPA connection check passes, but then fails during install

One of my FreeIPA servers is on a VM that’s too small and I’ve been having problems with it. I should have known that anything that runs Java and Tomcat should have double the processing power, double the memory, and double the drive space of whatever I think it should have. Rather than merely adjust the VM settings though, I thought I would spin up a new VM with better specs and create a new replica. Should be easy, right?

I created a new CentOS 7 VM, trinculo.osric.net, and installed ipa-server 4.5.0:

$ sudo yum install ipa-server

I checked the connection from the replica target to the master:

$ sudo ipa-replica-conncheck --master=ariel.osric.net

Likewise I checked the connection from the master to the replica target:

$ sudo ipa-replica-conncheck --replica=trinculo.osric.net

Everything was successful, so on the existing master I created the replica file:

$ sudo ipa-replica-prepare --ip-address=192.168.0.101 trinculo.osric.net

I copied that over to the replica target, but the replica installer indicated a failed connection check:

$ sudo ipa-replica-install /root/replica-info-trinculo.osric.net.gpg --ip-address=192.168.0.101
...
ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR    Connection check failed!
See /var/log/ipareplica-conncheck.log for more information.
If the check results are not valid it can be skipped with --skip-conncheck parameter.

A failed connection check when the connection checks passed?

Reset the iDRAC administrator password via ipmitool

In the previous post, I configured the iDRAC interface on a Dell server using ipmitool on CentOS. However, I ran into a problem, which I blame on poor user interface design:

When you log into the iDRAC web interface as root/calvin, it warns you that you are using the default username/password and prompts you to change the password. I did so by generating a random password in my password manager and pasting it into the password field.

The problem? The password can contain at most 20 characters, a limitation that is not obvious from the web interface. The password field on the iDRAC web interface truncates the password at 20 characters, and so I submitted a partial password. Then later, when I attempted to log in using the password saved in my password manager, it didn’t match. (For reasons that aren’t clear to me, submitting just the first 20 characters of the password saved in the password manager did not work either.)
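
A length check before pasting would have saved me the trouble. This is just a sketch: password_fits is a made-up name, and the default of 20 is the limit I ran into, which may differ on other iDRAC versions.

```shell
# Hypothetical pre-check: succeed only if the password fits in max_len
# characters (20 is the iDRAC limit described above; an assumption for
# other iDRAC versions).
password_fits() {
    pw=$1
    max_len=${2:-20}
    [ "${#pw}" -le "$max_len" ]
}
```

For example, password_fits "$candidate" || echo "too long for iDRAC" before saving the password anywhere.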

I figured I was stuck and would have to go to the data center, reboot the server, and boot into the Lifecycle Controller in order to reset the iDRAC password. But I thought I’d see what I could do via ipmitool first.

From Configuring DRAC with ipmitool and ipmitool Cheatsheet:

Reset BMC/DRAC to default:

$ sudo ipmitool mc reset cold

The command was successful, but that did not reset the password for me.

From Resetting the BMC:

…you can reset the BMC to factory defaults with IPMICFG or ipmitool. Be aware that this will wipe any existing settings on the BMC that you may have set from the web interface, but excludes network settings.

# ipmitool raw 0x3c 0x40

But that did not work for me, and produced an error code. I spent some time trying to determine what the various raw hex values for ipmi meant, but that was not productive.

Eventually though I did hit upon an ipmitool command that worked:

$ sudo ipmitool user list 1
ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
1                    true    false      false      NO ACCESS
2   superuser        true    true       true       ADMINISTRATOR
3                    true    false      false      NO ACCESS
etc.

The username I configured corresponds with ID 2, so then I used ipmitool to set the password for that user:

$ sudo ipmitool user set password 2

I was prompted to enter the password, which I was then able to use to log in to the iDRAC web interface.
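
Rather than reading the ID off the list by eye, it can be looked up by name. A sketch, assuming the user list layout shown above; ipmi_user_id is a name I made up for the example.

```shell
# Hypothetical helper: read "ipmitool user list" output on stdin and print
# the ID of the user with the given name. The header row is skipped.
# Rows with an empty name have "true"/"false" in the second field, so this
# assumes no user is literally named that.
ipmi_user_id() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}
```

Usage would look like id=$(sudo ipmitool user list 1 | ipmi_user_id superuser), followed by sudo ipmitool user set password "$id".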

Using ipmitool to configure Dell iDRAC

I have a number of Dell servers in a remote data center, so I wanted to configure the iDRAC interface in order to power on the systems remotely, get troubleshooting info for Dell, etc., without going to the data center myself. I’ve never configured iDRAC except through the Lifecycle Controller via a crash-cart on bootup. I thought that I would be spending all day in the data center getting everything configured, but when I mentioned this to another sysadmin he said, “Just use ipmitool.”

I had no idea such a tool existed!

First, I installed ipmitool (I’m using CentOS):

sudo yum install ipmitool

I found a helpful website: ipmitool Cheatsheet and Configuring DRAC from ipmitool

I was a little skeptical, but I read through (most of) the ipmitool man page to make sure I had a reasonable idea what the commands would do, and then I tried one. And immediately received an error message:

$ ipmitool lan set 1 ipsrc static
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

I checked and found that the path listed does exist:

$ ls /dev/ipmi*
/dev/ipmi0

Then it hit me: I need to be superuser, don’t I? That worked!

sudo ipmitool lan set 1 ipsrc static
sudo ipmitool lan set 1 ipaddr 192.168.100.1
sudo ipmitool lan set 1 netmask 255.255.255.0
sudo ipmitool lan set 1 defgw ipaddr 192.168.100.254
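
The settings can be read back with ipmitool lan print 1 before trying the browser. The helper below is a sketch for pulling one value out of that output; lan_field is a made-up name, and it assumes the output uses lines of the form "Field Name : value", which is how it looked on my systems.

```shell
# Hypothetical helper: read "ipmitool lan print" output on stdin and print
# the value of one field, e.g. "IP Address" or "Subnet Mask".
# Assumes lines of the form "Field Name        : value".
lan_field() {
    awk -F ' *: ' -v key="$1" '$1 == key { print $2; exit }'
}
```

For example: sudo ipmitool lan print 1 | lan_field 'IP Address'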

I was then able to connect to the IP address in a browser. (It warned me about an untrusted certificate, which I added as a permanent exception in the browser.)

The default username/password was root/calvin. I changed both the username and password right away. Even though I have the iDRAC interfaces on an RFC 1918 subnet and behind a firewall, why take the risk of keeping the default values?

As I discovered though, pay attention to the iDRAC password restrictions. Otherwise you may need to use ipmitool to reset the iDRAC admin password.

FreeIPA: Failed to start pki-tomcatd Service

After a recent CentOS update, FreeIPA 4.5 failed to start with the following error message:
Failed to start pki-tomcatd Service

What changed? The following were the 3 packages updated:

  • httpd.x86_64
  • httpd-tools.x86_64
  • mod_session.x86_64

I successfully restarted FreeIPA without the pki-tomcatd service:
$ sudo ipactl start --ignore-service-failure

But it’s not ideal to run it without the PKI service. What is going on? According to the log at /var/log/pki/pki-tomcat/ca/debug:

java.lang.Exception: Certificate auditSigningCert cert-pki-ca is invalid: Invalid certificate: (-8101) Certificate type not approved for application.

Which cert is that? Where is it? How did it get created? Didn’t FreeIPA create it? Why isn’t it valid? Why doesn’t it give me any additional info?

Eventually I found the certificate location (although I don’t recall how, likely a post on the FreeIPA mailing list):
/var/lib/pki/pki-tomcat/alias -> /etc/pki/pki-tomcat/alias

I ran certutil to find out more about the certificate:
$ certutil -L -d /etc/pki/pki-tomcat/alias
certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The certificate/key database is in an old, unsupported format.

That uninformative and misleading error message looked familiar to me. Indeed, I wrote a post about it 7 months ago:
certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The certificate/key database is in an old, unsupported format

With sudo, certutil could read the database, so I looked at the certificate in question:

$ sudo certutil -L -d /etc/pki/pki-tomcat/alias -n 'auditSigningCert cert-pki-ca'

The expiration date looked fine, which was the first thing I suspected.

I did note the following, which looked interesting:
Mozilla-CA-Policy: false (attribute missing)

But after reading about that at http://mozilla.github.io/ca-policy/ it looked like it shouldn’t be needed.

Fortunately, I have another working FreeIPA replica that I had not yet upgraded, so I compared the certificates on both systems:

On the IPA replica with errors:

$ sudo certutil -L -d /etc/pki/pki-tomcat/alias

Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI

caSigningCert cert-pki-ca                                    CTu,Cu,Cu
auditSigningCert cert-pki-ca                                 u,u,u
ocspSigningCert cert-pki-ca                                  u,u,u
Server-Cert cert-pki-ca                                      u,u,u
subsystemCert cert-pki-ca                                    u,u,u

On the working IPA replica:

$ sudo certutil -L -d /etc/pki/pki-tomcat/alias

Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI

caSigningCert cert-pki-ca                                    CTu,Cu,Cu
Server-Cert cert-pki-ca                                      u,u,u
auditSigningCert cert-pki-ca                                 u,u,Pu
ocspSigningCert cert-pki-ca                                  u,u,u
subsystemCert cert-pki-ca                                    u,u,u

Note the P trust attribute in the latter. What does it mean? From man certutil:

-t trustargs
           Specify the trust attributes to modify in an existing certificate
           or to apply to a certificate when creating it or adding it to a
           database. There are three available trust categories for each
           certificate, expressed in the order SSL, email, object signing for
           each trust setting. In each category position, use none, any, or
           all of the attribute codes:

           ·   p - Valid peer

           ·   P - Trusted peer (implies p)

           ·   c - Valid CA

           ·   C - Trusted CA (implies c)

           ·   T - trusted CA for client authentication (ssl server only)

I modified the trust attributes of the certificate accordingly:

$ sudo certutil -M -t ',,P' -d /etc/pki/pki-tomcat/alias -n 'auditSigningCert cert-pki-ca'
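
To confirm the change, or to compare two replicas without eyeballing the listings, the trust attributes for a single nickname can be pulled out of the certutil -L output. This is a sketch with a made-up function name; it treats the last whitespace-separated field on each line as the trust attributes, since the nicknames themselves contain spaces.

```shell
# Hypothetical helper: read "certutil -L" output on stdin and print the
# trust attributes listed for one nickname. Nicknames contain spaces, so
# the last whitespace-separated field is treated as the attributes.
cert_trust() {
    awk -v nick="$1" '
        NF >= 2 {
            trust = $NF
            sub(/[ \t]+[^ \t]+[ \t]*$/, "")   # strip the last field
            if ($0 == nick) print trust
        }'
}
```

For example: sudo certutil -L -d /etc/pki/pki-tomcat/alias | cert_trust 'auditSigningCert cert-pki-ca'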

I tried restarting FreeIPA again:

$ sudo ipactl restart
Stopping pki-tomcatd Service
Restarting Directory Service
Restarting krb5kdc Service
Restarting kadmin Service
Restarting httpd Service
Restarting ipa-custodia Service
Restarting ntpd Service
Restarting pki-tomcatd Service
Restarting ipa-otpd Service
ipa: INFO: The ipactl command was successful

It worked!

But why? What does the trust attribute for JAR/XPI mean? I don’t really know. I suppose it means that the Java code we’re running should trust the certificate. Since I didn’t have this problem when I upgraded the working replica, I’m guessing that I must have done something to change it (and break it) along the way. It likely had nothing to do with the CentOS updates I applied, but I just happened to run into the problem after restarting FreeIPA post-updates.