Installing Ansible role dependencies

I have a monolithic Ansible playbook that contains dozens of different roles, all bundled into the same Git repository. Some of the roles are more generically useful than others, so I thought I would do some refactoring.

I decided to move the role that installs and configures fail2ban to its own repository, and then call that new/refactored role as a dependency in my now-slightly-less-monolithic role.

Of course, I had no idea what I was doing.

Using Ansible to check version before install or upgrade

One thing that I do frequently with an Ansible role is check to see if software is already installed and at the desired version. I do this for several related reasons:

  1. To avoid taking extra time and doing extra work
  2. To make the role idempotent (changes are only made if changes are needed)
  3. So that the play recap summary lists accurate results

I’m thinking particularly of software that needs to be unpacked, configured, compiled, and installed (rather than .rpm or .deb packages). In this example, I’ll be installing the fictional widgetizer software.

First I add a couple variables to the defaults/main.yml file for the role:

path_to_widgetizer: /usr/local/bin/widgetizer
widgetizer_target_version: 1.2

Next I add a task to see if the installed binary already exists:

- name: check for existing widgetizer install
  stat:
    path: "{{ path_to_widgetizer }}"
  register: result_a
  tags: widgetizer

Then, if widgetizer is installed, I check which version is installed:

- name: check widgetizer version
  command: "{{ path_to_widgetizer }} --version"
  register: result_b
  when: "result_a.stat.exists"
  changed_when: False
  failed_when: False
  tags: widgetizer

Two things to note in the above:

  • The command task normally reports changed: true, so specify changed_when: False to prevent this.
  • Although this task should only run if widgetizer is present, we don’t want the task (and therefore the entire playbook) to fail if it is not present. Specify failed_when: false to prevent this. (I could also specify ignore_errors: true, which would report the error but would not prevent the rest of the playbook from running.)

Now I can check the registered variables to determine if widgetizer needs to be installed or upgraded:

- name: install/upgrade widgetizer, if needed
  include: tasks/install.yml
  when: "not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout"
  tags: widgetizer

However, when I ran my playbook I received an error:

$ ansible-playbook -i hosts site.yaml --limit localhost --tags widgetizer


fatal: [localhost]: FAILED! => {"failed": true, "msg": "The conditional check 'not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout' failed. The error was: Unexpected templating type error occurred on ({% if not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout %} True {% else %} False {% endif %}): coercing to Unicode: need string or buffer, float found\n\nThe error appears to have been in '/home/chris/projectz/roles/widgetizer/tasks/install.yml': line 3, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: copy widgetizer source\n  ^ here\n"}

The key piece of information to note in that error message is:

need string or buffer, float found

We’ve supplied widgetizer_target_version as 1.2 (a floating point number), but Python/jinja2 wants a string to search for in result_b.stdout.

There are at least 2 ways to fix this:

  • Enclose the value in quotes to specify widgetizer_target_version as a string in the variable definition, e.g. widgetizer_target_version: "1.2"
  • Convert widgetizer_target_version to a string in the when statement, e.g. widgetizer_target_version|string not in result_b.stdout

After making either of those changes, the playbook runs successfully and correctly includes or ignores the install.yml file as appropriate.
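The type mismatch is easy to reproduce outside Ansible. Here is a minimal Python sketch, not what Ansible runs internally: an unquoted 1.2 in YAML arrives as a float, and Python's in operator refuses a float on the left when the right side is a string. The function name needs_install and the sample version strings are made up for illustration.

```python
def needs_install(installed, target_version, version_output):
    """Mirror the role's `when` condition, casting the target to a string."""
    if not installed or target_version is None:
        return True
    # str() plays the role of the Jinja2 `|string` filter here.
    return str(target_version) not in version_output


# An unquoted YAML value of 1.2 is a float; the bare membership test fails.
try:
    1.2 in "widgetizer 1.2"
    raised = False
except TypeError:
    raised = True

print(raised)
print(needs_install(True, 1.2, "widgetizer 1.2"))
print(needs_install(True, 1.2, "widgetizer 1.1"))
```

One caveat with any substring test like this: "1.2" is also found inside "1.20" or "11.2", so a stricter comparison may be worth it if the version numbers can collide that way.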

Ansible unarchive module error: path does not exist

I was working on deploying files to a host via Ansible’s unarchive module when I ran into an error message:

path /tmp/datafiles/ does not exist

Here’s the relevant portion of my Ansible role’s tasks/main.yml:

- name: copy datafiles
  unarchive:
    src: datafiles.tar.gz
    dest: /tmp
    owner: root
    group: datauser

Here’s the full result of running that task:

TASK [datafiles : copy datafiles] *******************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "path /tmp/datafiles/ does not exist", "path": "/tmp/datafiles/", "state": "absent"}

The error message confused me. The datafiles directory shouldn’t need to exist!

The problem was completely unrelated to the error message. I had specified a group, datauser, that did not exist on the target host. Once I removed the group parameter, the task ran without error. (Another option would be to ensure that the specified group exists on the target host.)
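A quick way to rule this out before running the play is to ask the target host whether the group exists. Here is a minimal Python sketch using the standard-library grp module (Unix only). It has to run on the target host itself, and group_exists is just a helper name I made up.

```python
import grp


def group_exists(name):
    """Return True if the named group exists on this host."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


# "datauser" is the group from the failing task; substitute your own.
print(group_exists("datauser"))
```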

Analyzing text to find common terms using Python and NLTK

I just recently started playing with the Python NLTK (Natural Language ToolKit) to analyze text. The book Natural Language Processing with Python is available online and is very helpful if you’re just getting started.

At the beginning of the book the examples cover importing and analyzing text (primarily books) that you import from nltk (Getting Started with NLTK). It includes texts like Moby-Dick and Sense and Sensibility.

But you will probably want to analyze a source of your own. For example, I had text from a series of tweets debating political issues. The third chapter (Accessing Text from the Web and from Disk) has the answers:

First you need to turn raw text into tokens:

tokens = word_tokenize(raw)

Next turn your tokens into NLTK text:

text = nltk.Text(tokens)

Now you can treat it like the book examples in chapter 1.

I was analyzing a number of tweets. One of the things I wanted to do was find common words in the tweets, to see if there were particular keywords that were common.

I was using the Python interpreter for my tests, and I did run into a couple errors with word_tokenize and later FreqDist, such as:

>>> fdist1 = FreqDist(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'FreqDist' is not defined

You can address this by importing the specific libraries:

>>> from nltk import FreqDist

Here are the commands, in order, that I ran to produce my list of common words — in this case, I was looking for words that appeared at least 3 times and that were at least 5 characters long:

>>> import nltk
>>> from nltk import word_tokenize
>>> from nltk import FreqDist

>>> with open("corpus-twitter", "r") as myfile:
...     raw = myfile.read().decode("utf8")

>>> tokens = word_tokenize(raw)
>>> text = nltk.Text(tokens)

>>> fdist = FreqDist(text)
>>> sorted(w for w in set(text) if len(w) >= 5 and fdist[w] >= 3)

[u'Americans', u'Detroit', u'Please', u'TaxReform', u'Thanks', u'There', u'Trump', u'about', u'against', u'always', u'anyone', u'argument', u'because', u'being', u'believe', u'context', u'could', u'debate', u'defend', u'diluted', u'dollars', u'enough', u'every', u'going', u'happened', u'heard', u'human', u'ideas', u'immigration', u'indefensible', u'logic', u'never', u'opinion', u'people', u'point', u'pragmatic', u'problem', u'problems', u'proposed', u'public', u'question', u'really', u'restricting', u'right', u'saying', u'school', u'scope', u'serious', u'should', u'solution', u'still', u'talking', u'their', u'there', u'think', u'thinking', u'thread', u'times', u'truth', u'trying', u'tweet', u'understand', u'until', u'welfare', u'where', u'world', u'would', u'wrong', u'years', u'yesterday']

It turns out the results weren’t as interesting as I’d hoped. A few interesting items–Detroit for example–but most of the words aren’t surprising given I was looking at tweets around political debate. Perhaps with a larger corpus there would be more stand-out words.
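For comparison, the same filter can be approximated with only the Python standard library. This is a rough sketch, not a replacement for NLTK: a simple regular expression stands in for word_tokenize, so the tokens will differ for punctuation, hashtags, and URLs. The function name common_words and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def common_words(raw, min_length=5, min_count=3):
    """Return sorted words at least min_length long seen at least min_count times."""
    # A simple regex stands in for nltk.word_tokenize; it keeps letters and apostrophes only.
    tokens = re.findall(r"[A-Za-z']+", raw)
    counts = Counter(tokens)
    return sorted(w for w, n in counts.items() if len(w) >= min_length and n >= min_count)


sample = "debate debate debate about about about the the the truth truth Trump"
print(common_words(sample))
```

Like the NLTK version above, this is case-sensitive, which is why both There and there show up separately in my results.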

Nagios alert: CRITICAL: No response from NTP server

One of a pair of new hosts was causing the following Nagios alert today:

CRITICAL: No response from NTP server

Both of the new systems have the same configuration in theory, but based on the different results something clearly was overlooked.

I tried running NTP from the Nagios host:

Host 1

$ check_ntp -H -w 0.1 -c 0.2
NTP OK: Offset -0.02545583248 secs|offset=-0.025456s;0.100000;0.200000;

Host 2

$ check_ntp -H -w 0.1 -c 0.2
CRITICAL: No response from NTP server

The iptables rules look the same on both. The hosts are all on the same LAN, so there’s no firewall in the way.

Both systems are running chronyd:

Host 1

[chris@ephemeralbox1 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'

Host 2

[chris@ephemeralbox2 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'

Both systems are listening on port 123:

Host 1

[chris@ephemeralbox1 ssh]$ sudo lsof -i :123
chronyd 3027 chrony 3u IPv4 1095448 0t0 UDP *:ntp

Host 2

[chris@ephemeralbox2 ssh]$ sudo lsof -i :123
chronyd 1241 chrony 3u IPv4 51276 0t0 UDP *:ntp

Finally, I found it. In the obvious place that perhaps I should have looked first. The /etc/chrony.conf file on Host 2 was missing the allow line for the Nagios host:

# Allow NTP client access from Nagios host

And the first place I looked was iptables. Blame the firewall, after all. The configurations were both pushed to these systems via Ansible playbooks, but apparently I had not included the role that updates the chrony.conf file on the 2nd host. Looks like I need configuration management management!
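A small check like the following could have caught the difference sooner. It is a minimal Python sketch that scans chrony.conf-style text for allow lines and tests whether a client address is covered. It only handles the "allow <address-or-CIDR>" form; a bare allow, hostnames, and deny lines are ignored. The 192.0.2.x addresses and the server lines are placeholders, not values from my hosts.

```python
import ipaddress


def allowed_networks(conf_text):
    """Collect the networks named on `allow` lines of a chrony.conf-style text."""
    networks = []
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2 and fields[0] == "allow":
            try:
                networks.append(ipaddress.ip_network(fields[1], strict=False))
            except ValueError:
                # Hostnames and chrony's shorthand forms are skipped in this sketch.
                continue
    return networks


def is_allowed(conf_text, client):
    """Return True if the client address falls inside any collected network."""
    address = ipaddress.ip_address(client)
    return any(address in net for net in allowed_networks(conf_text))


# Hypothetical configs: the first has an allow line, the second does not.
conf_host1 = "server pool.example iburst\n# Allow NTP client access\nallow 192.0.2.10\n"
conf_host2 = "server pool.example iburst\n"
print(is_allowed(conf_host1, "192.0.2.10"), is_allowed(conf_host2, "192.0.2.10"))
```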

yum Error: requested datatype primary not available

I ran into a new-to-me yum error earlier today:

$ yum --quiet check-updates
Error: requested datatype primary not available

Following the tips on Unix & Linux StackExchange: Error: requested datatype primary not available, I:

  • ran yum clean all
  • disabled repositories one at a time to identify the repo that was causing the error

In my case, it turned out to be the extras repo. The following did not produce any errors:

$ yum --quiet --disablerepo=extras check-updates

What is wrong with the extras repo? It is defined in /etc/yum.repos.d/CentOS-Base.repo, so I took a look at what was there:

name=CentOS-$releasever - Extras

None of that looked unusual (or had changed recently), so back to Google.

I tried excluding the specific mirror that was listed for the extras repo by adding it to the exclude line in /etc/yum/pluginconf.d/fastestmirror.conf, as described in yum and fastestmirror plugin. Although yum appeared to pick a different mirror, it still gave me the same error.

It turns out, the mirror in question was “poisoned” (rerouted) by my DNS servers, as it had been identified (possibly erroneously) as malicious. As such, the domain still resolved but the path to the CentOS repository did not exist.

I didn’t think that excluding the domain in fastestmirror.conf was having the intended effect, and yum was still trying to contact the bad mirror. I took the following steps, which resolved the error, although I can’t say I entirely understand why:

$ sudo yum makecache

This still produced the error.

I removed the bad entry from:


Then I ran makecache again:

$ sudo yum makecache

No error this time! I tried running check-update:

$ yum check-update

No error!

Shouldn’t yum clean all have eliminated the bad cache value in /var/cache/yum/x86_64/7/extras/mirrorlist.txt?

Cache invalidation, one of the hard problems. At least I have steps to take if I run into this problem again.

Using nc (netcat) to make an HTTP request

I must have had some reason for wanting to do this, although I can’t think of why right now. curl is an excellent tool for ad hoc HTTP requests.

On a server running Apache 2.4.6, first I tried:

# nc 80
GET / HTTP/1.1

Which returned an HTTP/1.1 400 Bad Request error.

Next I tried:

# printf "GET /index.html HTTP/1.1\r\n\r\n" | nc 80

Which also returned an HTTP/1.1 400 Bad Request error.

I decided to take a look at what curl was sending, since that was working:

# curl -v
* About to connect() to port 80 (#0)
* Trying
* Connected to ( port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host:
> Accept: */*

I put the same headers (with a modified User-Agent) into my printf statement:

# printf "GET /index.html HTTP/1.1\r\nUser-Agent: nc/0.0.1\r\nHost:\r\nAccept: */*\r\n\r\n" | nc 80
HTTP/1.1 200 OK
Date: Sun, 28 Jan 2018 23:11:04 GMT
Server: Apache/2.4.6 (CentOS) PHP/5.4.16
Last-Modified: Sun, 28 Jan 2018 20:10:37 GMT
ETag: "78-563dbb912bfe0"
Accept-Ranges: bytes
Content-Length: 120
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<title>well that worked</title>
<h1>apache is running</h1>

That worked!

I eliminated the User-Agent and Accept headers and it still worked, so the missing Host header was the cause of my problems. I swear I’ve done this before without a Host header though.

I looked up the HTTP/1.1 specification, and as described in section 5.2 of RFC 2616:

1. If Request-URI is an absoluteURI, the host is part of the Request-URI. Any Host header field value in the request MUST be ignored.

2. If the Request-URI is not an absoluteURI, and the request includes a Host header field, the host is determined by the Host header field value.

3. If the host as determined by rule 1 or 2 is not a valid host on the server, the response MUST be a 400 (Bad Request) error message.

Recipients of an HTTP/1.0 request that lacks a Host header field MAY attempt to use heuristics (e.g., examination of the URI path for something unique to a particular host) in order to determine what exact resource is being requested.

I could not get it to work with an absoluteURI, even using the example in the RFC. However, I did find that I could omit the Host header if I specified HTTP/1.0:

# printf "GET / HTTP/1.0\r\n\r\n" | nc 80

I also found that Apache didn’t care what the Host header was when using HTTP/1.1, just so long as something was there:

# printf "GET / HTTP/1.1\r\nHost: z\r\n\r\n" | nc 80

That’s a little odd. I did not specify a ServerName in my Apache config, but even after I specified ServerName in /etc/httpd/conf/httpd.conf and restarted Apache, it still required the Host header and it still didn’t care what the content of the Host header was (so long as it was not empty).
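The same experiment can be scripted. Here is a minimal Python sketch using only the socket module: build_request always adds a Host header for HTTP/1.1 and leaves it out for HTTP/1.0. The function names and www.example.com are placeholders of mine, and the Connection: close header is my addition so the read loop ends when the server finishes responding.

```python
import socket


def build_request(host, path="/", version="1.1"):
    """Build a minimal GET request; HTTP/1.1 gets the Host header it requires."""
    lines = ["GET {} HTTP/{}".format(path, version)]
    if version == "1.1":
        lines.append("Host: {}".format(host))
        # Ask the server to close the connection so recv() eventually returns b"".
        lines.append("Connection: close")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host, port=80, path="/"):
    """Send the request and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_request("www.example.com"))
```

Calling fetch with a real hostname sends the request and returns the status line, headers, and body as raw bytes, much like the nc output above.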

Using jshint with Travis CI

I thought I’d give Travis CI a try. It’s a tool that hooks into GitHub and runs automated tests on your code every time you push a commit. I found a straightforward tutorial that basically said I’d need 2 files in addition to my existing code:

  1. .travis.yml
  2. package.json

Simple! I configured Travis CI to run JSHint (a Javascript code linter, similar to JSLint) on the Javascript files in my Simple Steganography project.

I pushed a commit and discovered my files did not pass JSHint. However, I thought they should have passed. My files were previously configured for JSLint, and JSHint can read JSLint directives. At the top of my Javascript files I had the directives:

bitwise, browser

These indicate that bitwise operators should be allowed, and to assume the code is running in a web browser.

JSHint (via Travis-CI) reported the following errors:

js/decode.js: line 3, col 1, Bad option value.
js/decode.js: line 3, col 1, Bad option value.

The JSHint docs on inline configuration indicate that the formatting should be option: Boolean, so I reformatted the configuration directives:

bitwise: false,
browser: true

JSHint (via Travis-CI) reported several errors regarding bitwise operators:

js/encode.js: line 179, col 37, Unexpected use of '&'.
js/encode.js: line 179, col 42, Unexpected use of '^'.
js/encode.js: line 179, col 53, Unexpected use of '&'.
js/encode.js: line 180, col 51, Unexpected use of '|'.
js/encode.js: line 182, col 51, Unexpected use of '&'.

I changed the JSHint configuration directives to:

bitwise: true,
browser: true

This worked, and my JSHint tests passed! I added a build status image to my GitHub repo:
GitHub build status for Simple Steganography project

The page describing the JSHint Options definitely leads me to believe that enabling the bitwise option would enforce bitwise errors. This is the opposite of the behavior I’m seeing. The problem is either with the documentation, the behavior, or my reading comprehension! I opened a GitHub issue on the JSHint project describing what I experienced.

SELinux, audit2why, audit2allow, and policy files

I’m no expert on SELinux, but I cringe whenever I read an online tutorial that includes the step Disable SELinux.

I ran into such a problem recently when I was installing Icinga. The service failed to start because of permissions issues creating the process ID (PID) file. One site suggested disabling SELinux, but I thought it was time to learn to update SELinux’s Type Enforcement (TE) policies instead.

First, I needed the audit2why tool, to explain what was being blocked and why:

# yum -q provides audit2why
policycoreutils-python-2.5-17.1.el7.x86_64 : SELinux policy core python
                                           : utilities
Repo        : base
Matched from:
Filename    : /usr/bin/audit2allow

I installed the policycoreutils-python package (and dependencies):

# yum install policycoreutils-python

I then ran audit2why against the audit log:

# audit2why -i /var/log/audit/audit.log
type=AVC msg=audit(1510711476.690:132): avc:  denied  { chown } for  pid=2459 comm="icinga" capability=0  scontext=system_u:system_r:nagios_t:s0 tcontext=system_u:system_r:nagios_t:s0 tclass=capability

        Was caused by:
                Missing type enforcement (TE) allow rule.

                You can use audit2allow to generate a loadable module to allow this access.

type=AVC msg=audit(1510711476.724:134): avc:  denied  { read write } for  pid=2465 comm="icinga" name="" dev="tmpfs" ino=19128 scontext=system_u:system_r:nagios_t:s0 tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file

        Was caused by:
                Missing type enforcement (TE) allow rule.

                You can use audit2allow to generate a loadable module to allow this access.

That’s still a little opaque. It’s not entirely clear to me why chown was blocked, for example. Look at the following specifics:


To help decode that:

  • scontext = Source Context
  • tcontext = Target Context
  • _u:_r:_t:s# = user:role:type:security level
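To make the fields listed above easier to compare, here is a small Python sketch that splits a context string into its four parts. The contexts are the ones from the audit log lines above; the Context tuple and parse_context are names I made up.

```python
from collections import namedtuple

Context = namedtuple("Context", "user role type level")


def parse_context(text):
    """Split an SELinux context string into its four fields."""
    # maxsplit=3 keeps any colons inside the level (e.g. s0:c0.c1023) together.
    return Context(*text.split(":", 3))


source = parse_context("system_u:system_r:nagios_t:s0")
target = parse_context("system_u:object_r:initrc_var_run_t:s0")
print(source.type, target.type)
```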

The source and target contexts are identical, and so it seems to me that the command should be allowed. But let’s try audit2allow and see what that tells us:

# audit2allow -i /var/log/audit/audit.log

#============= nagios_t ==============
allow nagios_t initrc_var_run_t:file { lock open read write };
allow nagios_t self:capability chown;

It is unclear to me how broad the first rule is: does it allow the nagios type (nagios_t) access to all initrc_var_run_t files? If so, that’s probably too broad. As the man page warns:

Care must be exercised while acting on the output of  this  utility  to
ensure  that  the  operations  being  permitted  do not pose a security
threat. Often it is better to define new domains and/or types, or  make
other structural changes to narrowly allow an optimal set of operations
to succeed, as opposed to  blindly  implementing  the  sometimes  broad
changes  recommended  by this utility.

That’s fairly terrifying. Although if the alternative is disabling SELinux completely, an overly broad SELinux policy is not the worst thing in the world.

So audit2allow provided a couple rules. Now what? Fortunately the audit2why and audit2allow man pages both include details on how to incorporate the rules into your SELinux policy. First, generate a new type enforcement policy:

# audit2allow -i /var/log/audit/audit.log --module local > local.te

This includes some extra information in addition to the default output:

# cat local.te

module local 1.0;

require {
        type nagios_t;
        type initrc_var_run_t;
        class capability chown;
        class file { lock open read write };
}

#============= nagios_t ==============
allow nagios_t initrc_var_run_t:file { lock open read write };
allow nagios_t self:capability chown;

Next the man page says:

# SELinux provides a policy devel environment under
# /usr/share/selinux/devel including all of the shipped
# interface files.
# You can create a te file and compile it by executing

$ make -f /usr/share/selinux/devel/Makefile local.pp

However, my system had no /usr/share/selinux/devel directory:

# ls /usr/share/selinux/
packages  targeted

I needed to install the policycoreutils-devel package (and dependencies):

# yum install policycoreutils-devel

Now compile the policy file to a binary:

# make -f /usr/share/selinux/devel/Makefile local.pp
Compiling targeted local module
/usr/bin/checkmodule:  loading policy configuration from tmp/local.tmp
/usr/bin/checkmodule:  policy configuration loaded
/usr/bin/checkmodule:  writing binary representation (version 17) to tmp/local.mod
Creating targeted local.pp policy package
rm tmp/local.mod.fc tmp/local.mod

Now install it using the semodule command:

# semodule -i local.pp

Did that solve the problem?

# systemctl start icinga
# systemctl status icinga
● icinga.service - LSB: start and stop Icinga monitoring daemon
   Loaded: loaded (/etc/rc.d/init.d/icinga; bad; vendor preset: disabled)
   Active: active (running) since Tue 2017-11-14 22:35:23 EST; 6s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 2661 ExecStop=/etc/rc.d/init.d/icinga stop (code=exited, status=0/SUCCESS)
  Process: 3838 ExecStart=/etc/rc.d/init.d/icinga start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/icinga.service
           └─3850 /usr/bin/icinga -d /etc/icinga/icinga.cfg

Nov 14 22:35:23 localhost.localdomain systemd[1]: Starting LSB: start and sto...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Running configuration che...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Icinga with PID  not runn...
Nov 14 22:35:23 localhost.localdomain icinga[3838]: Starting icinga: Starting...
Nov 14 22:35:23 localhost.localdomain systemd[1]: Started LSB: start and stop...
Nov 14 22:35:23 localhost.localdomain icinga[3850]: Finished daemonizing... (...
Nov 14 22:35:23 localhost.localdomain icinga[3850]: Event loop started...
Hint: Some lines were ellipsized, use -l to show in full.

It worked! The permissions issues were resolved without resorting to disabling SELinux.

There is still more I need to understand about SELinux, but it’s a start.

Additional reading:
CentOS: SELinux Policy Overview

ipa-server-upgrade: IPv6 stack is enabled in the kernel but there is no interface that has ::1 address assigned

I applied the latest CentOS updates, as usual. It included a kernel update, so I rebooted the system:

$ sudo yum update -y
$ sudo reboot

After reboot, ipactl showed that FreeIPA was not running:

$ sudo ipactl status
Directory Service: STOPPED
Directory Service must be running in order to obtain status of other services
ipa: INFO: The ipactl command was successful

I tried to start it:

$ sudo ipactl start
Upgrade required: please run ipa-server-upgrade command
Aborting ipactl

I tried running ipa-server-upgrade:

$ sudo ipa-server-upgrade
IPv6 stack is enabled in the kernel but there is no interface that has ::1 address assigned. Add ::1 address resolution to 'lo' interface. You might need to enable IPv6 on the interface 'lo' in sysctl.conf.
The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information

I had previously disabled IPv6 in /etc/sysctl.conf and removed the ::1 entry from /etc/hosts.

I added the localhost entry back to /etc/hosts:

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

I removed the statements disabling IPv6 from /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

I rebooted for good measure, but even after reboot ipa-server-upgrade produced the same error. Indeed, IPv6 is not enabled:

$ ping6 ::1
connect: No route to host
$ ping6 localhost
connect: No route to host
$ sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1

That makes sense. Merely removing the lines setting IPv6 to disabled didn’t actually do anything to re-enable it.

$ sudo sysctl net.ipv6.conf.all.disable_ipv6=0
net.ipv6.conf.all.disable_ipv6 = 0
$ sudo sysctl net.ipv6.conf.lo.disable_ipv6=0
net.ipv6.conf.lo.disable_ipv6 = 0

After that change, ping6 ::1 and ping6 localhost worked as expected. I left IPv6 disabled on the default interface, but noticed in ifconfig that eth0 had picked up an IPv6 address, so I disabled that:

$ sudo sysctl net.ipv6.conf.eth0.disable_ipv6=1

I also added that same line to /etc/sysctl.conf.
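To double-check which interfaces still have IPv6 turned off, the "key = value" lines that sysctl prints can be parsed with a few lines of Python. This is a minimal sketch; the function names are mine, and the sample text is illustrative rather than captured from my system.

```python
def parse_sysctl(text):
    """Turn `key = value` lines, as printed by sysctl, into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def ipv6_disabled_on(settings):
    """List the interface names whose disable_ipv6 flag is set to 1."""
    prefix, suffix = "net.ipv6.conf.", ".disable_ipv6"
    return sorted(
        key[len(prefix):-len(suffix)]
        for key, value in settings.items()
        if key.startswith(prefix) and key.endswith(suffix) and value == "1"
    )


# Illustrative sample: IPv6 enabled on all and lo, disabled on default and eth0.
output = """net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.eth0.disable_ipv6 = 1"""
print(ipv6_disabled_on(parse_sysctl(output)))
```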

I ran the upgrade again:

$ sudo ipa-server-upgrade
Upgrading IPA:. Estimated time: 1 minute 30 seconds
The IPA services were upgraded
The ipa-server-upgrade command was successful

And started FreeIPA:

$ sudo ipactl start
Starting Directory Service
Starting krb5kdc Service
Starting kadmin Service
Starting httpd Service
Starting ipa-custodia Service
Starting ntpd Service
Starting pki-tomcatd Service
Starting ipa-otpd Service
ipa: INFO: The ipactl command was successful

Success! And apparently disabling IPv6 is not the best idea.