Nagios alert: CRITICAL: No response from NTP server

One of a pair of new hosts was causing the following Nagios alert today:

CRITICAL: No response from NTP server

Both of the new systems have the same configuration in theory, but based on the different results something clearly was overlooked.

I tried running check_ntp against each host from the Nagios host:

Host 1

$ check_ntp -H ephemeralbox1.osric.net -w 0.1 -c 0.2
NTP OK: Offset -0.02545583248 secs|offset=-0.025456s;0.100000;0.200000;

Host 2

$ check_ntp -H ephemeralbox2.osric.net -w 0.1 -c 0.2
CRITICAL: No response from NTP server

The iptables rules look the same on both. The hosts are all on the same LAN, so there’s no firewall in the way.

Both systems are running chronyd:

Host 1

[chris@ephemeralbox1 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running

Host 2

[chris@ephemeralbox2 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running

Both systems are listening on port 123:

Host 1

[chris@ephemeralbox1 ssh]$ sudo lsof -i :123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 3027 chrony 3u IPv4 1095448 0t0 UDP *:ntp

Host 2

[chris@ephemeralbox2 ssh]$ sudo lsof -i :123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 1241 chrony 3u IPv4 51276 0t0 UDP *:ntp

Finally, I found it, in the obvious place that perhaps I should have looked first. The /etc/chrony.conf file on Host 2 was missing the allow line for the Nagios host:

# Allow NTP client access from Nagios host
allow 192.168.100.100

And the first place I looked was iptables. Blame the firewall, after all. The configurations were both pushed to these systems via Ansible playbooks, but apparently I had not included the role that updates the chrony.conf file on the 2nd host. Looks like I need configuration management management!
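A quick check like the one below, run on both hosts, might have caught the difference before Nagios did. It is a minimal bash sketch, not part of the original playbooks. chrony_allows is a hypothetical helper name, and it only looks for a literal "allow <address>" line, so it will not notice an "allow all" or a subnet that also covers the Nagios host.

```shell
# Hypothetical helper: succeed if a chrony config has an "allow" line for
# exactly the given client address (no subnet or "allow all" handling).
chrony_allows() {
    local conf="$1" client="$2"
    # Comment lines start with "#", so their first field is never "allow".
    awk -v c="$client" '$1 == "allow" && $2 == c { found = 1 }
                        END { exit !found }' "$conf"
}

# Example:
#   chrony_allows /etc/chrony.conf 192.168.100.100 || echo "missing allow line"
```

After adding the line, chronyd needs to re-read its configuration, for example with sudo systemctl restart chronyd.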

yum Error: requested datatype primary not available

I ran into a new-to-me yum error earlier today:

$ yum --quiet check-updates
Error: requested datatype primary not available

Following the tips on Unix & Linux StackExchange: Error: requested datatype primary not available, I:

  • ran yum clean all
  • disabled repositories one at a time to identify the repo that was causing the error
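The second step can be scripted. This is a bash sketch under a couple of assumptions: you pass in the repo IDs yourself (for example from yum repolist), and, per the yum man page, check-update exits 0 when there are no updates, 100 when updates are available and 1 on error, so only exit code 1 is treated as a failure. find_bad_repo is a hypothetical name.

```shell
# Hypothetical helper: disable each repo in turn and print the ones whose
# removal makes "yum check-update" stop failing (exit status other than 1).
find_bad_repo() {
    local repo rc
    for repo in "$@"; do
        yum --quiet --disablerepo="$repo" check-update >/dev/null 2>&1
        rc=$?
        if [ "$rc" -ne 1 ]; then
            echo "$repo"
        fi
    done
}

# Example:
#   find_bad_repo base updates extras
```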

In my case, it turned out to be the extras repo. The following did not produce any errors:

$ yum --quiet --disablerepo=extras check-updates

What is wrong with the extras repo? It is defined in /etc/yum.repos.d/CentOS-Base.repo, so I took a look at what was there:

[extras]
name=CentOS-$releasever - Extras
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

None of that looked unusual (or had changed recently), so back to Google.

I tried excluding the specific mirror that was listed for the extras repo (http://mirrors.unifiedlayer.com/centos/7.4.1708/extras/x86_64/) by adding unifiedlayer.com to the exclude line in /etc/yum/pluginconf.d/fastestmirror.conf, as described in yum and fastestmirror plugin. Although yum appeared to pick a different mirror, it still gave me the same error.

It turns out, the mirror in question was “poisoned” (rerouted) by my DNS servers, as it had been identified (possibly erroneously) as malicious. As such, the domain still resolved but the path to the CentOS repository did not exist.
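One way to spot a mirror like this is to ask each mirror in the cached list for repodata/repomd.xml, the index file that tells yum where the "primary" metadata lives, and check that it actually lists a primary entry. A block page can return HTTP 200 for any path, so the sketch greps the content instead of trusting the status code. It assumes one URL per line in the mirrorlist file; check_mirrors is a hypothetical name.

```shell
# Hypothetical helper: report OK/BAD for each mirror URL in a yum mirrorlist,
# depending on whether its repomd.xml advertises primary metadata.
check_mirrors() {
    local list="$1" url
    while IFS= read -r url; do
        case "$url" in ''|'#'*) continue ;; esac   # skip blanks and comments
        if curl -fsS --max-time 10 "${url%/}/repodata/repomd.xml" 2>/dev/null |
            grep -q 'type="primary"'; then
            echo "OK  $url"
        else
            echo "BAD $url"
        fi
    done < "$list"
}

# Example:
#   check_mirrors /var/cache/yum/x86_64/7/extras/mirrorlist.txt
```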

I didn’t think that excluding the domain in fastestmirror.conf was having the intended effect, and yum was still trying to contact the bad mirror. I took the following steps, which resolved the error, although I can’t say I entirely understand why:

$ sudo yum makecache

This still produced the error.

I removed the bad entry from:

/var/cache/yum/x86_64/7/extras/mirrorlist.txt

Then I ran makecache again:

$ sudo yum makecache

No error this time! I tried running check-update:

$ yum check-update

No error!
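The manual fix can be captured in a small bash function for next time. This is a sketch, not a yum feature: drop_mirror is a hypothetical name, it matches the host as a fixed string, and it keeps a .bak copy of the original file. Since the cache lives under /var/cache/yum, it needs to run as root, followed by yum makecache.

```shell
# Hypothetical helper: remove every line mentioning a host from a yum
# mirrorlist cache file, keeping the original as FILE.bak.
drop_mirror() {
    local list="$1" host="$2"
    cp -p "$list" "$list.bak" || return 1
    # -F: treat the host as a fixed string, so the dots are literal
    grep -vF -- "$host" "$list.bak" > "$list"
}

# Example (as root):
#   drop_mirror /var/cache/yum/x86_64/7/extras/mirrorlist.txt unifiedlayer.com
#   yum makecache
```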

Shouldn’t yum clean all have eliminated the bad cache value in /var/cache/yum/x86_64/7/extras/mirrorlist.txt?

Cache invalidation, one of the hard problems. At least I have steps to take if I run into this problem again.

Using nc (netcat) to make an HTTP request

I must have had some reason for wanting to do this, although I can’t think of why right now. curl is an excellent tool for ad hoc HTTP requests.

On a server running Apache 2.4.6, first I tried:

# nc 127.0.0.1 80
GET / HTTP/1.1

That returned an HTTP/1.1 400 Bad Request error.

Next I tried:

# printf "GET /index.html HTTP/1.1\r\n\r\n" | nc 127.0.0.1 80

That also returned an HTTP/1.1 400 Bad Request error.

I decided to take a look at what curl was sending, since that was working:

# curl -v http://127.0.0.1
* About to connect() to 127.0.0.1 port 80 (#0)
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1
> Accept: */*
...

I put the same headers (with a modified User-Agent) into my printf statement:

# printf "GET /index.html HTTP/1.1\r\nUser-Agent: nc/0.0.1\r\nHost: 127.0.0.1\r\nAccept: */*\r\n\r\n" | nc 127.0.0.1 80
HTTP/1.1 200 OK
Date: Sun, 28 Jan 2018 23:11:04 GMT
Server: Apache/2.4.6 (CentOS) PHP/5.4.16
Last-Modified: Sun, 28 Jan 2018 20:10:37 GMT
ETag: "78-563dbb912bfe0"
Accept-Ranges: bytes
Content-Length: 120
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html>
<head>
<title>well that worked</title>
</head>
<body>
<h1>apache is running</h1>
</body>
</html>

That worked!

I eliminated the User-Agent and Accept headers and it still worked, so the missing Host header was the cause of my problems. I swear I’ve done this before without a Host header, though.
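Since Host is the header that matters, a tiny bash helper can build the request for nc. This is a sketch with a hypothetical name, http_get_request. The Connection: close header is an addition not used above; it asks the server to close the connection after responding, so nc exits instead of waiting on a keep-alive connection.

```shell
# Hypothetical helper: print a minimal HTTP/1.1 GET request with CRLF line
# endings, a Host header, and a blank line terminating the headers.
http_get_request() {
    local host="$1" path="${2:-/}"
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' \
        "$path" "$host"
}

# Example:
#   http_get_request 127.0.0.1 /index.html | nc 127.0.0.1 80
```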

I looked up the HTTP/1.1 specification, and as described in section 5.2 of RFC 2616:

1. If Request-URI is an absoluteURI, the host is part of the Request-URI. Any Host header field value in the request MUST be ignored.

2. If the Request-URI is not an absoluteURI, and the request includes a Host header field, the host is determined by the Host header field value.

3. If the host as determined by rule 1 or 2 is not a valid host on the server, the response MUST be a 400 (Bad Request) error message.

Recipients of an HTTP/1.0 request that lacks a Host header field MAY attempt to use heuristics (e.g., examination of the URI path for something unique to a particular host) in order to determine what exact resource is being requested.
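To make the three rules concrete, here is a bash sketch that applies them to a Request-URI and a Host header value (empty when the header is absent). It is an illustration of the quoted text, not how Apache implements it. VALID_HOSTS is a hypothetical stand-in for a server's virtual host list, the port is stripped naively (so IPv6 literals are not handled), and a request with neither an absoluteURI nor a Host header is treated as a 400, as HTTP/1.1 requires.

```shell
# Hypothetical list of hosts this "server" answers for.
VALID_HOSTS="127.0.0.1 localhost"

# Apply RFC 2616 section 5.2 rules 1-3; print the host or "400 Bad Request".
resolve_request_host() {
    local uri="$1" host_header="$2" host
    case "$uri" in
        http://*|https://*)
            # Rule 1: absoluteURI, so the host comes from the URI.
            host="${uri#*://}"
            host="${host%%/*}"
            ;;
        *)
            # Rule 2: otherwise use the Host header (empty if absent).
            host="$host_header"
            ;;
    esac
    host="${host%%:*}"   # drop any :port before comparing
    # Rule 3: a host that is not valid on this server gets a 400.
    case " $VALID_HOSTS " in
        *" $host "*) echo "$host"; return 0 ;;
    esac
    echo "400 Bad Request"
    return 1
}
```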

I could not get it to work with an absoluteURI, even using the example in the RFC. However I did find that I could ignore the Host header if I specified HTTP/1.0:

# printf "GET / HTTP/1.0\r\n\r\n" | nc 127.0.0.1 80

I also found that Apache didn’t care what the Host header was when using HTTP/1.1, just so long as something was there:

# printf "GET / HTTP/1.1\r\nHost: z\r\n\r\n" | nc 127.0.0.1 80

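That’s a little odd. I did not specify a ServerName in my Apache config, but even after I specified ServerName 127.0.0.1:80 in /etc/httpd/conf/httpd.conf and restarted Apache, it still required the Host header and it still didn’t care what the content of the Host header was (so long as it was not empty). Section 14.23 of the same RFC accounts for the first part: a client MUST include a Host header in every HTTP/1.1 request, and a server MUST respond with 400 (Bad Request) to any HTTP/1.1 request that lacks one. That would likely also explain why my absoluteURI attempts failed.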

Using jshint with Travis CI

I thought I’d give Travis CI a try. It’s a tool that hooks into GitHub and runs automated tests on your code every time you push a commit. I found a straightforward tutorial that basically said I’d need 2 files in addition to my existing code:

  1. .travis.yml
  2. package.json

Simple! I configured Travis CI to run JSHint (a JavaScript code linter, similar to JSLint) on the JavaScript files in my Simple Steganography project.

I pushed a commit and discovered that my files did not pass JSHint, although I thought they should. My files were previously configured for JSLint, and JSHint can read JSLint directives. At the top of my JavaScript files I had the directives:

/*jslint
bitwise, browser
*/

These indicate that bitwise operators should be allowed, and to assume the code is running in a web browser.

JSHint (via Travis-CI) reported the following errors:

js/decode.js: line 3, col 1, Bad option value.
js/decode.js: line 3, col 1, Bad option value.

The JSHint docs on inline configuration indicate that the formatting should be option: Boolean, so I reformatted the configuration directives:

/*jslint
bitwise: false,
browser: true
*/

JSHint (via Travis-CI) reported several errors regarding bitwise operators:

js/encode.js: line 179, col 37, Unexpected use of '&'.
js/encode.js: line 179, col 42, Unexpected use of '^'.
js/encode.js: line 179, col 53, Unexpected use of '&'.
js/encode.js: line 180, col 51, Unexpected use of '|'.
js/encode.js: line 182, col 51, Unexpected use of '&'.

I changed the JSHint configuration directives to:

/*jslint
bitwise: true,
browser: true
*/

This worked, and my JSHint tests passed! I added a build status image to my GitHub repo:
GitHub build status for Simple Steganography project

The page describing the JSHint Options definitely leads me to believe that enabling the bitwise option would enforce bitwise errors. This is the opposite of the behavior I’m seeing. The problem is either with the documentation, the behavior, or my reading comprehension! I opened a GitHub issue on the JSHint project describing what I experienced.

SELinux, audit2why, audit2allow, and policy files

I’m no expert on SELinux, but I cringe whenever I read an online tutorial that includes the step Disable SELinux.

I ran into such a problem recently when I was installing Icinga. The service failed to start because of permissions issues creating the process ID (PID) file. One site suggested disabling SELinux, but I thought it was time to learn to update SELinux’s Type Enforcement (TE) policies instead.

First, I needed the audit2why tool, to explain what was being blocked and why:

# yum -q provides audit2why
policycoreutils-python-2.5-17.1.el7.x86_64 : SELinux policy core python
                                           : utilities
Repo        : base
Matched from:
Filename    : /usr/bin/audit2allow

I installed the policycoreutils-python package (and dependencies):

# yum install policycoreutils-python

I then ran audit2why against the audit log:

# audit2why -i /var/log/audit/audit.log
type=AVC msg=audit(1510711476.690:132): avc:  denied  { chown } for  pid=2459 comm="icinga" capability=0  scontext=system_u:system_r:nagios_t:s0 tcontext=system_u:system_r:nagios_t:s0 tclass=capability

        Was caused by:
                Missing type enforcement (TE) allow rule.

                You can use audit2allow to generate a loadable module to allow this access.

type=AVC msg=audit(1510711476.724:134): avc:  denied  { read write } for  pid=2465 comm="icinga" name="icinga.pid" dev="tmpfs" ino=19128 scontext=system_u:system_r:nagios_t:s0 tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file

        Was caused by:
                Missing type enforcement (TE) allow rule.

                You can use audit2allow to generate a loadable module to allow this access.

That’s still a little opaque. It’s not entirely clear to me why chown was blocked, for example. Look at the following specifics:

scontext=system_u:system_r:nagios_t:s0
tcontext=system_u:system_r:nagios_t:s0

To help decode that:

  • scontext = Source Context
  • tcontext = Target Context
  • _u:_r:_t:s# = user:role:type:security level
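Splitting a context into those four fields is easy to script. The bash sketch below (split_context is a hypothetical name) keeps everything after the third colon together as the level, because MLS/MCS levels such as s0-s0:c0.c1023 contain colons themselves.

```shell
# Hypothetical helper: print the user, role, type and level of an SELinux
# context. read assigns the remainder (colons included) to the last variable.
split_context() {
    local ctx="$1" user role type level
    IFS=: read -r user role type level <<< "$ctx"
    printf 'user=%s role=%s type=%s level=%s\n' "$user" "$role" "$type" "$level"
}

# Example:
#   split_context system_u:system_r:nagios_t:s0
```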

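The source and target contexts are identical, and so it seems to me that the command should be allowed. (In hindsight, matching contexts are expected for a capability check: the target is the process’s own domain, which audit2allow writes as self.) But let’s try audit2allow and see what that tells us: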

# audit2allow -i /var/log/audit/audit.log


#============= nagios_t ==============
allow nagios_t initrc_var_run_t:file { lock open read write };
allow nagios_t self:capability chown;

It is unclear to me how broad the first rule is: does it allow the nagios type (nagios_t) access to all initrc_var_run_t files? If so, that’s probably too broad. As the man page warns:

Care must be exercised while acting on the output of  this  utility  to
ensure  that  the  operations  being  permitted  do not pose a security
threat. Often it is better to define new domains and/or types, or  make
other structural changes to narrowly allow an optimal set of operations
to succeed, as opposed to  blindly  implementing  the  sometimes  broad
changes  recommended  by this utility.

That’s fairly terrifying. Although if the alternative is disabling SELinux completely, an overly broad SELinux policy is not the worst thing in the world.

So audit2allow provided a couple of rules. Now what? Fortunately, the audit2why and audit2allow man pages both include details on how to incorporate the rules into your SELinux policy. First, generate a new type enforcement policy:

# audit2allow -i /var/log/audit/audit.log --module local > local.te

This includes some extra information in addition to the default output:

# cat local.te

module local 1.0;

require {
        type nagios_t;
        type initrc_var_run_t;
        class capability chown;
        class file { lock open read write };
}

#============= nagios_t ==============
allow nagios_t initrc_var_run_t:file { lock open read write };
allow nagios_t self:capability chown;

Next the man page says:

# SELinux provides a policy devel environment under
# /usr/share/selinux/devel including all of the shipped
# interface files.
# You can create a te file and compile it by executing

$ make -f /usr/share/selinux/devel/Makefile local.pp

However, my system had no /usr/share/selinux/devel directory:

# ls /usr/share/selinux/
packages  targeted

I needed to install the policycoreutils-devel package (and dependencies):

# yum install policycoreutils-devel

Now compile the policy file to a binary:

# make -f /usr/share/selinux/devel/Makefile local.pp
Compiling targeted local module
/usr/bin/checkmodule:  loading policy configuration from tmp/local.tmp
/usr/bin/checkmodule:  policy configuration loaded
/usr/bin/checkmodule:  writing binary representation (version 17) to tmp/local.mod
Creating targeted local.pp policy package
rm tmp/local.mod.fc tmp/local.mod

Now install it using the semodule command:

# semodule -i local.pp

Did that solve the problem?

# systemctl start icinga
# systemctl status icinga
● icinga.service - LSB: start and stop Icinga monitoring daemon
   Loaded: loaded (/etc/rc.d/init.d/icinga; bad; vendor preset: disabled)
   Active: active (running) since Tue 2017-11-14 22:35:23 EST; 6s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 2661 ExecStop=/etc/rc.d/init.d/icinga stop (code=exited, status=0/SUCCESS)
  Process: 3838 ExecStart=/etc/rc.d/init.d/icinga start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/icinga.service
           └─3850 /usr/bin/icinga -d /etc/icinga/icinga.cfg

Nov 14 22:35:23 localhost.localdomain systemd[1]: Starting LSB: start and sto...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Running configuration che...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Icinga with PID  not runn...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Starting icinga: Starting...
Nov 14 22:35:23 localhost.localdomain systemd[1]: Started LSB: start and stop...
Nov 14 22:35:23 localhost.localdomain icinga[3850]: Finished daemonizing... (...
Nov 14 22:35:23 localhost.localdomain icinga[3850]: Event loop started...
Hint: Some lines were ellipsized, use -l to show in full.

It worked! The permissions issues were resolved without resorting to disabling SELinux.

There is still more I need to understand about SELinux, but it’s a start.
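For next time, the generate and compile steps can be wrapped in one bash function. This is a sketch, assuming policycoreutils-python and policycoreutils-devel are installed and that it runs as root in a scratch directory; build_module is a hypothetical name. Given the man page warning, it deliberately stops before semodule -i so the generated .te file can be reviewed first.

```shell
# Hypothetical helper: turn logged AVC denials into NAME.te, then compile
# NAME.pp with the SELinux devel Makefile. Loading is left as a manual step.
build_module() {
    local name="${1:-local}" log="${2:-/var/log/audit/audit.log}"
    # Generate type enforcement source from the denials in the audit log
    audit2allow -i "$log" --module "$name" > "$name.te" || return 1
    # Compile it into a loadable policy package
    make -f /usr/share/selinux/devel/Makefile "$name.pp"
}

# Example (as root):
#   build_module local && less local.te && semodule -i local.pp
```

audit2allow also has a -M option that produces both the .te and .pp files in one step; the longer route above keeps each stage visible.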

Additional reading:
CentOS: SELinux Policy Overview

ipa-server-upgrade: IPv6 stack is enabled in the kernel but there is no interface that has ::1 address assigned

I applied the latest CentOS updates, as usual. It included a kernel update, so I rebooted the system:

$ sudo yum update -y
$ sudo reboot

After reboot, ipactl showed that FreeIPA was not running:

$ sudo ipactl status
Directory Service: STOPPED
Directory Service must be running in order to obtain status of other services
ipa: INFO: The ipactl command was successful

I tried to start it:

$ sudo ipactl start
Upgrade required: please run ipa-server-upgrade command
Aborting ipactl

I tried running ipa-server-upgrade:

$ sudo ipa-server-upgrade
IPv6 stack is enabled in the kernel but there is no interface that has ::1 address assigned. Add ::1 address resolution to 'lo' interface. You might need to enable IPv6 on the interface 'lo' in sysctl.conf.
The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information

I had previously disabled IPv6 in /etc/sysctl.conf and removed the ::1 entry from /etc/hosts.

I added the localhost entry back to /etc/hosts:

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

I removed the statements disabling IPv6 from /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

I rebooted for good measure, but even after reboot ipa-server-upgrade produced the same error. Indeed, IPv6 is not enabled:

$ ping6 ::1
connect: No route to host
$ ping6 localhost
connect: No route to host
$ sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1

That makes sense. Merely removing the lines setting IPv6 to disabled didn’t actually do anything to re-enable it.
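The runtime values live under /proc/sys, so it is easy to see what the kernel is actually using for each interface (1 means IPv6 is disabled), independent of what /etc/sysctl.conf says. A bash sketch; ipv6_status is a hypothetical name, and its optional argument is a root prefix that exists only so it can be tested against a fake directory tree.

```shell
# Hypothetical helper: print disable_ipv6 for every interface the kernel knows.
ipv6_status() {
    local root="${1:-}" f iface
    for f in "$root"/proc/sys/net/ipv6/conf/*/disable_ipv6; do
        [ -e "$f" ] || continue          # glob matched nothing
        iface="${f%/disable_ipv6}"       # .../conf/lo
        iface="${iface##*/}"             # lo
        printf '%s disable_ipv6=%s\n' "$iface" "$(cat "$f")"
    done
}

# Example:
#   ipv6_status
```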

$ sudo sysctl net.ipv6.conf.all.disable_ipv6=0
net.ipv6.conf.all.disable_ipv6 = 0
$ sudo sysctl net.ipv6.conf.lo.disable_ipv6=0
net.ipv6.conf.lo.disable_ipv6 = 0

After that change, ping6 ::1 and ping6 localhost worked as expected. I left the default setting (net.ipv6.conf.default) alone, but noticed in ifconfig that eth0 had picked up an IPv6 address (writing to the all setting applies the value to every existing interface), so I disabled that:

$ sudo sysctl net.ipv6.conf.eth0.disable_ipv6=1

I also added that same line to /etc/sysctl.conf.
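An alternative to editing /etc/sysctl.conf is a drop-in file under /etc/sysctl.d/. This bash sketch writes the three runtime settings used above in that order (all first, because setting all propagates to every interface, then eth0). The function name write_ipv6_sysctl and the file name 90-ipv6.conf are hypothetical choices. Files are applied in lexical order, and on CentOS 7 /etc/sysctl.conf is typically linked as 99-sysctl.conf, so any disable lines left there would still win.

```shell
# Hypothetical helper: persist loopback IPv6 on, eth0 IPv6 off.
write_ipv6_sysctl() {
    local out="${1:-/etc/sysctl.d/90-ipv6.conf}"
    cat > "$out" <<'EOF'
# FreeIPA needs ::1 on lo; order matters because "all" propagates
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.eth0.disable_ipv6 = 1
EOF
}

# Example (as root):
#   write_ipv6_sysctl && sysctl -p /etc/sysctl.d/90-ipv6.conf
```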

I ran the upgrade again:

$ sudo ipa-server-upgrade
Upgrading IPA:. Estimated time: 1 minute 30 seconds
...
...
...
The IPA services were upgraded
The ipa-server-upgrade command was successful

And started FreeIPA:

$ sudo ipactl start
Starting Directory Service
Starting krb5kdc Service
Starting kadmin Service
Starting httpd Service
Starting ipa-custodia Service
Starting ntpd Service
Starting pki-tomcatd Service
Starting ipa-otpd Service
ipa: INFO: The ipactl command was successful

Success! And apparently disabling IPv6 is not the best idea.

FreeIPA connection check passes, but then fails during install

One of my FreeIPA servers is on a VM that’s too small and I’ve been having problems with it. I should have known that anything that runs Java and Tomcat should have double the processing power, double the memory, and double the drive space of whatever I think it should have. Rather than merely adjust the VM settings though, I thought I would spin up a new VM with better specs and create a new replica. Should be easy, right?

I created a new CentOS 7 VM, trinculo.osric.net, and installed ipa-server 4.5.0:

$ sudo yum install ipa-server

I checked the connection from the replica target to the master:

$ sudo ipa-replica-conncheck --master=ariel.osric.net

Likewise I checked the connection from the master to the replica target:

$ sudo ipa-replica-conncheck --replica=trinculo.osric.net

Everything was successful, so on the existing master I created the replica file:

$ sudo ipa-replica-prepare --ip-address=192.168.0.101 trinculo.osric.net

I copied that over to the replica target, but the replica installer indicated a failed connection check:

$ sudo ipa-replica-install /root/replica-info-trinculo.osric.net.gpg --ip-address=192.168.0.101
...
ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR    Connection check failed!
See /var/log/ipareplica-conncheck.log for more information.
If the check results are not valid it can be skipped with --skip-conncheck parameter.

A failed connection check when the connection checks passed?

Reset the iDRAC administrator password via ipmitool

In the previous post, I configured the iDRAC interface on a Dell server using ipmitool on CentOS. However, I ran into a problem, which I blame on poor user interface design:

When you log into the iDRAC web interface as root/calvin, it warns you that you are using the default username/password and prompts you to change the password. I did so by generating a random password in my password manager and pasting it into the password field.

The problem? The password can contain at most 20 characters, a limitation that is not obvious from the web interface. The password field on the iDRAC web interface truncates the password at 20 characters, so I submitted a partial password. Later, when I attempted to log in using the password saved in my password manager, it didn’t match. (For reasons that aren’t clear to me, submitting just the first 20 characters of the saved password did not work either.)
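A trivial length check before pasting a generated password would have avoided this. A bash sketch with a hypothetical name; it only enforces the 20-character limit described here, not any other iDRAC password rules.

```shell
# Hypothetical helper: fail (with a message) if a password exceeds 20 chars.
check_idrac_password() {
    local pw="$1" max=20
    if [ "${#pw}" -gt "$max" ]; then
        echo "password is ${#pw} characters; iDRAC accepts at most $max" >&2
        return 1
    fi
}
```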

I figured I was stuck and would have to go to the data center, reboot the server, and boot into the Lifecycle Controller in order to reset the iDRAC password. But I thought I’d see what I could do via ipmitool first.

From Configuring DRAC with ipmitool and ipmitool Cheatsheet:

Reset BMC/DRAC to default:

$ sudo ipmitool mc reset cold

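The command was successful, but that did not reset the password for me. In hindsight that makes sense: mc reset cold restarts the BMC rather than restoring its factory settings.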

From Resetting the BMC:

…you can reset the BMC to factory defaults with IPMICFG or ipmitool. Be aware that this will wipe any existing settings on the BMC that you may have set from the web interface, but excludes network settings.

# ipmitool raw 0x3c 0x40

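But that did not work for me, and produced an error code. (IPMICFG is a Supermicro utility, so that raw command may well be specific to Supermicro BMCs rather than Dell’s iDRAC.) I spent some time trying to determine what the various raw hex values for IPMI meant, but that was not productive.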

Eventually though I did hit upon an ipmitool command that worked:

$ sudo ipmitool user list 1
ID  Name	     Callin  Link Auth	IPMI Msg   Channel Priv Limit
1                    true    false      false      NO ACCESS
2   superuser        true    true       true       ADMINISTRATOR
3                    true    false      false      NO ACCESS
etc.

The username I configured corresponds with ID 2, so then I used ipmitool to set the password for that user:

$ sudo ipmitool user set password 2

I was prompted to enter the password, which I was then able to use to log in to the iDRAC web interface.
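The two steps can be combined so the ID is looked up by name. This bash sketch (reset_bmc_password is a hypothetical name) assumes the user list column layout shown above, where the ID is the first field and the name the second; rows with an empty name never match. It must run as root, and ipmitool prompts for the new password as before.

```shell
# Hypothetical helper: find a user's ID on an IPMI channel by name, then
# reset that user's password interactively.
reset_bmc_password() {
    local name="$1" channel="${2:-1}" id
    id=$(ipmitool user list "$channel" |
        awk -v n="$name" 'NR > 1 && $2 == n { print $1; exit }')
    if [ -z "$id" ]; then
        echo "no user named $name on channel $channel" >&2
        return 1
    fi
    ipmitool user set password "$id"
}

# Example (as root):
#   reset_bmc_password superuser
```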

Using ipmitool to configure Dell iDRAC

I have a number of Dell servers in a remote data center, so I wanted to configure the iDRAC interface in order to power on the systems remotely, get troubleshooting info for Dell, etc., without going to the data center myself. I’ve never configured iDRAC except through the Lifecycle Controller via a crash-cart on bootup. I thought that I would be spending all day in the data center getting everything configured, but when I mentioned this to another sysadmin he said, “Just use ipmitool.”

I had no idea such a tool existed!

First, I installed ipmitool (I’m using CentOS):

sudo yum install ipmitool

I found some helpful websites: ipmitool Cheatsheet and Configuring DRAC from ipmitool.

I was a little skeptical, but I read through (most) of the ipmitool man page to make sure I had a reasonable idea what the commands would do, and then I tried one. And immediately received an error message:

$ ipmitool lan set 1 ipsrc static
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

I checked and found that one of the listed paths does exist:

$ ls /dev/ipmi*
/dev/ipmi0

Then it hit me: I need to be superuser, don’t I? That worked!

sudo ipmitool lan set 1 ipsrc static
sudo ipmitool lan set 1 ipaddr 192.168.100.1
sudo ipmitool lan set 1 netmask 255.255.255.0
sudo ipmitool lan set 1 defgw ipaddr 192.168.100.254
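For several servers, the same four commands can take the addresses as parameters and finish with ipmitool lan print to show what the BMC ended up with. A bash sketch; configure_bmc_lan is a hypothetical name, each step runs only if the previous one succeeded, and it must run as root.

```shell
# Hypothetical helper: set a static address, netmask and gateway on a BMC
# LAN channel, then print the resulting configuration.
configure_bmc_lan() {
    local ch="$1" ip="$2" mask="$3" gw="$4"
    ipmitool lan set "$ch" ipsrc static &&
    ipmitool lan set "$ch" ipaddr "$ip" &&
    ipmitool lan set "$ch" netmask "$mask" &&
    ipmitool lan set "$ch" defgw ipaddr "$gw" &&
    ipmitool lan print "$ch"
}

# Example (as root):
#   configure_bmc_lan 1 192.168.100.1 255.255.255.0 192.168.100.254
```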

I was then able to connect to the IP address in a browser (it warned me about an untrusted certificate, so I added a permanent exception in the browser).

The default username/password was root/calvin. I changed both the username and password right away. Even though I have the iDRAC interfaces on an RFC 1918 subnet and behind a firewall, why take the risk of keeping the default values?

As I discovered though, pay attention to the iDRAC password restrictions. Otherwise you may need to use ipmitool to reset the iDRAC admin password.

Ansible conditional check failed

I wanted to add a check to one of my Ansible roles so that an application source would be copied and the source recompiled only if no current version existed or if the existing version did not match the expected version:

- name: Check to see if app is installed and the expected version
  command: /usr/local/app/bin/app --version
  register: version_check
  ignore_errors: True
  changed_when: "version_check.rc != 0 or {{ target_version }} not in version_check.stdout"

- name: include app install
  include: tasks/install.yml
  when: "version_check.rc != 0 or {{ target_version }} not in version_check.stdout"

I defined the target version in my role’s defaults/main.yml:

---
target_version: "2.5.2"
...

The first time I ran it, I encountered an error:

fatal: [trinculo.osric.net]: FAILED! => {"failed": true, "msg": "The conditional check 'version_check.rc != 0 or {{ target_version }} not in version_check.stdout' failed. The error was: error while evaluating conditional (version_check.rc != 0 or {{ target_version }} not in version_check.stdout): Unable to look up a name or access an attribute in template string ({% if version_check.rc != 0 or 2.5.2 not in version_check.stdout %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': coercing to Unicode: need string or buffer, StrictUndefined found"}

It’s a little unclear what is wrong, so I figured it was likely an issue with quotes or a lack of parentheses.

First I tried parentheses:

changed_when: "version_check.rc != 0 or ({{ target_version }} not in version_check.stdout)"

No luck. Next I tried quoting the variable and negating the membership test with !, which also failed:

changed_when: "version_check.rc != 0 or !('{{ target_version }}' in version_check.stdout)"

You know, trying to google and or not or or or is is tricky. Even if you add terms like Boolean logic or propositional calculus.

I tried to break it down into smaller parts:

changed_when: "version_check.rc != 0"

That worked.

changed_when: "!('{{ target_version }}' in version_check.stdout)"

A different error appeared:

template error while templating string: unexpected char u'!'

OK, that’s getting somewhere! Try a variation:

changed_when: "'{{ target_version }}' not in version_check.stdout"

It worked! But with a warning:

[WARNING]: when statements should not include jinja2 templating delimiters
such as {{ }} or {% %}. Found: ('{{ target_version }}' not in version_check.stdout)

Next try:

changed_when: "target_version not in version_check.stdout"

That worked, and without any warnings. I put the or back in:

changed_when: "version_check.rc != 0 or target_version not in version_check.stdout"

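That worked! It was the Jinja2 delimiters the whole time. Ansible already evaluates the value of changed_when (like when) as a raw Jinja2 expression, so the delimiters were redundant. They were also harmful here: in the original version the substituted value was unquoted, so, as the error message shows, the expression became 2.5.2 not in version_check.stdout, which Jinja2 cannot read as the string "2.5.2" (hence "Unable to look up a name or access an attribute"). The single condition that succeeded with a warning had the variable in quotes. It was an important reminder: error messages aren’t perfect.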
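For comparison, here is the same "missing or wrong version" gate outside Ansible, as a bash sketch. needs_install is a hypothetical name, and it assumes the binary prints its version on stdout when run with --version, as the task above does. Like Jinja2’s in, it is a substring test, so a target of 2.5.2 would also match a reported version of 2.5.20.

```shell
# Hypothetical helper: succeed (meaning "install needed") if the binary
# fails to run or its --version output does not contain the target string.
needs_install() {
    local bin="$1" target="$2" out
    # Equivalent of version_check.rc != 0
    out=$("$bin" --version 2>/dev/null) || return 0
    # Equivalent of target_version not in version_check.stdout
    case "$out" in
        *"$target"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example:
#   needs_install /usr/local/app/bin/app 2.5.2 && echo "install needed"
```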