Installing Ansible role dependencies

I have a monolithic Ansible playbook that contains dozens of different roles, all bundled into the same Git repository. Some of the roles are more generically useful than others, so I thought I would do some refactoring.

I decided to move the role that installs and configures fail2ban to its own repository, and then call that new/refactored role as a dependency in my now-slightly-less-monolithic role.

Of course, I had no idea what I was doing.
Continue reading Installing Ansible role dependencies

Using Ansible to check version before install or upgrade

One thing that I do frequently with an Ansible role is check to see if software is already installed and at the desired version. I do this for several related reasons:

  1. To avoid taking extra time and doing extra work
  2. To make the role idempotent (changes are only made if changes are needed)
  3. So that the play recap summary lists accurate results

I’m thinking particularly of software that needs to be unpacked, configured, compiled, and installed (rather than .rpm or .deb packages). In this example, I’ll be installing the fictional widgetizer software.

First I add a couple variables to the defaults/main.yml file for the role:

---
path_to_widgetizer: /usr/local/bin/widgetizer
widgetizer_target_version: 1.2
...

Next I add a task to see if the installed binary already exists:

- name: check for existing widgetizer install
  stat:
    path: "{{ path_to_widgetizer }}"
  register: result_a
  tags: widgetizer

Then, if widgetizer is installed, I check which version is installed:

- name: check widgetizer version
  command: "{{ path_to_widgetizer }} --version"
  register: result_b
  when: "result_a.stat.exists"
  changed_when: False
  failed_when: False
  tags: widgetizer

2 things to note in the above:

  • The command task normally reports changed: true, so specify changed_when: False to prevent this.
  • Although this task should only run if widgetizer is present, we don’t want the task (and therefore the entire playbook) to fail if it is not present. Specify failed_when: false to prevent this. (I could also specify ignore_errors: true, which would report the error but would not prevent the rest of the playbook from running.)

Now I can check the registered variables to determine if widgetizer needs to be installed or upgraded:

- name: install/upgrade widgetizer, if needed
  include: tasks/install.yml
  when: "not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout"
  tags: widgetizer

However, when I ran my playbook I received an error:

$ ansible-playbook -i hosts site.yaml --limit localhost --tags widgetizer

...

fatal: [localhost]: FAILED! => {"failed": true, "msg": "The conditional check 'not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout' failed. The error was: Unexpected templating type error occurred on ({% if not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout %} True {% else %} False {% endif %}): coercing to Unicode: need string or buffer, float found\n\nThe error appears to have been in '/home/chris/projectz/roles/widgetizer/tasks/install.yml': line 3, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: copy widgetizer source\n  ^ here\n"}

The key piece of information to note in that error message is:

need string or buffer, float found

We’ve supplied widgetizer_target_version as 1.2 (a floating point number), but Python/jinja2 wants a string to search for in result_b.stdout.

There are at least 2 ways to fix this:

  • Enclose the value in quotes to specify widgetizer_target_version as a string in the variable definition, e.g. widgetizer_target_version: "1.2"
  • Convert widgetizer_target_version to a string in the when statement, e.g. widgetizer_target_version|string not in result_b.stdout
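For example, here is the quoted-value fix applied to defaults/main.yml:

---
path_to_widgetizer: /usr/local/bin/widgetizer
widgetizer_target_version: "1.2"
...

Or, equivalently, the string filter applied in the when statement:

- name: install/upgrade widgetizer, if needed
  include: tasks/install.yml
  when: "not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version|string not in result_b.stdout"
  tags: widgetizer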

After making either of those changes, the playbook runs successfully and correctly includes or ignores the install.yml file as appropriate.

Ansible unarchive module error: path does not exist

I was working on deploying files to a host via Ansible’s unarchive module when I ran into an error message:

path /tmp/datafiles/ does not exist

Here’s the relevant portion of my Ansible role’s tasks/main.yml:

- name: copy datafiles
  unarchive:
    src: datafiles.tar.gz
    dest: /tmp
    owner: root
    group: datauser

Here’s the full result of running that task:

TASK [datafiles : copy datafiles] *******************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "path /tmp/datafiles/ does not exist", "path": "/tmp/datafiles/", "state": "absent"}

The error message confused me. The datafiles directory shouldn’t need to exist!

The problem was completely unrelated to the error message. I had specified a group, datauser, that did not exist on the target host. Once I removed the group parameter, the task ran without error. (Another option would be to ensure that the specified group exists on the target host.)
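For that second option, a task like the following (a minimal sketch using Ansible’s group module; datauser matches the example above) could run before the unarchive task:

- name: ensure the datauser group exists
  group:
    name: datauser
    state: present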

Ansible conditional check failed

I wanted to add a check to one of my Ansible roles so that the application source would be copied and recompiled only if no current version existed or if the existing version did not match the expected version:

- name: Check to see if app is installed and the expected version
  command: /usr/local/app/bin/app --version
  register: version_check
  ignore_errors: True
  changed_when: "version_check.rc != 0 or {{ target_version }} not in version_check.stdout"

- name: include app install
  include: tasks/install.yml
  when: "version_check.rc != 0 or {{ target_version }} not in version_check.stdout"

I defined the target version in my role’s defaults/main.yml:

---
target_version: "2.5.2"
...

The first time I ran it, I encountered an error:

fatal: [trinculo.osric.net]: FAILED! => {"failed": true, "msg": "The conditional check 'version_check.rc != 0 or {{ target_version }} not in version_check.stdout' failed. The error was: error while evaluating conditional (version_check.rc != 0 or {{ target_version }} not in version_check.stdout): Unable to look up a name or access an attribute in template string ({% if version_check.rc != 0 or 2.5.2 not in version_check.stdout %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': coercing to Unicode: need string or buffer, StrictUndefined found"}

It’s a little unclear what is wrong, so I figured it was likely an issue with quotes or a lack of parentheses.

First I tried parentheses:

changed_when: "version_check.rc != 0 or ({{ target_version }} not in version_check.stdout)"

No luck. Next I tried negating the condition and quoting the templated value:

changed_when: "version_check.rc != 0 or !('{{ target_version }}' in version_check.stdout)"

That didn’t work either. (You know, trying to google terms like and, or, not, and is is tricky, even if you add terms like Boolean logic or propositional calculus.)

I tried to break it down into smaller parts:

changed_when: "version_check.rc != 0"

That worked.

changed_when: "!('{{ target_version }}' in version_check.stdout)"

A different error appeared:

template error while templating string: unexpected char u'!'

OK, that’s getting somewhere! Try a variation:

changed_when: "'{{ target_version }}' not in version_check.stdout"

It worked! But with a warning:

[WARNING]: when statements should not include jinja2 templating delimiters
such as {{ }} or {% %}. Found: ('{{ target_version }}' not in version_check.stdout)

Next try:

changed_when: "target_version not in version_check.stdout"

That worked, and without any warnings. I put the or back in:

changed_when: "version_check.rc != 0 or target_version not in version_check.stdout"

That worked! It was the jinja2 delimiters the whole time. The value of changed_when (like when) is already evaluated as a jinja2 expression, so wrapping the variable in {{ }} was redundant. With the delimiters, a single propositional statement succeeded (with a warning), but the expression failed as soon as the logical disjunction was added. It was an important reminder: error messages aren’t perfect.
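For reference, here is the pair of tasks with the same fix applied to both conditionals:

- name: Check to see if app is installed and the expected version
  command: /usr/local/app/bin/app --version
  register: version_check
  ignore_errors: True
  changed_when: "version_check.rc != 0 or target_version not in version_check.stdout"

- name: include app install
  include: tasks/install.yml
  when: "version_check.rc != 0 or target_version not in version_check.stdout"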

Nagios check_disk returns DISK CRITICAL – /sys/kernel/config is not accessible: Permission denied

I enabled Nagios checks for free disk space on a group of servers today, and was hit with alerts containing the following error message:
DISK CRITICAL - /sys/kernel/config is not accessible: Permission denied

If you are looking for a solution, skip to the end. Some of my mistakes before finding the solution may be interesting though!

Continue reading Nagios check_disk returns DISK CRITICAL – /sys/kernel/config is not accessible: Permission denied

Ansible: [Errno 2] No such file or directory

I tried running a command on several remote servers at once via Ansible:

$ ansible -a 'rpcinfo -p' centos

This returned a series of errors:

ariel.osric.net | FAILED | rc=2 >>
[Errno 2] No such file or directory

caliban.osric.net | FAILED | rc=2 >>
[Errno 2] No such file or directory

trinculo.osric.net | FAILED | rc=2 >>
[Errno 2] No such file or directory

I also received an error when I tried running it via ssh:

$ ssh ariel.osric.net 'rpcinfo -p'
bash: rpcinfo: command not found

I can run it interactively on a specific host:

$ ssh ariel.osric.net
$ rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper

The problem is that the user profile isn’t loaded when running via Ansible or a non-interactive ssh session, so rpcinfo isn’t found in the PATH. The next step is to identify the full path:

$ ssh ariel.osric.net
$ whereis rpcinfo
rpcinfo: /usr/sbin/rpcinfo /usr/share/man/man8/rpcinfo.8.gz

(/usr/sbin is added to the path via /etc/profile)

Once I specified the full path, it worked:

$ ansible -a '/usr/sbin/rpcinfo -p' centos

ariel.osric.net | SUCCESS | rc=0 >>
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper

Etc.
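Alternatively (a variation I’m sketching here, not something shown above), the shell module can extend the PATH inline instead of hard-coding the full path:

$ ansible -m shell -a 'PATH=$PATH:/usr/sbin rpcinfo -p' centos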

Ansible meta action: “ERROR! conflicting action statements”

The initial problem I was trying to solve had to do with a reboot role. Although it was the last role listed, once it ran the connection would be broken, and then none of the notified handlers from previous roles would run.

Under normal circumstances, handlers are run after all the roles. One idea I had was to have the reboot role contain a trivial task, which would notify a handler containing the actual reboot command. Presumably, as the last handler notified, it would run last.
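A minimal sketch of how that idea might look (hypothetical file contents; the task and handler names are mine):

# roles/reboot/tasks/main.yml
- name: trigger the reboot handler
  command: /bin/true   # command tasks report changed by default, so the notify fires
  notify: reboot host

# roles/reboot/handlers/main.yml
- name: reboot host
  command: shutdown -r now "Ansible says: Time for a reboot"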

While I was looking at ways to do this, I discovered Ansible meta actions:
Ansible Documentation: Meta Module

The flush_handlers action looked like just what I needed:

    meta: flush_handlers

I tried adding that to my reboot task so that it would run all of the previously notified handlers:

- name: Rebooting ...
  command: shutdown -r now "Ansible says: Time for a reboot"
  meta: flush_handlers

This produced an error:
ERROR! conflicting action statements

As I looked at the documentation a little more closely, I saw that the examples have meta as a separate task, not part of an existing task. I missed that at first glance because of the lack of line spacing in the examples, e.g. compare:

- template:
    src: new.j2
    dest: /etc/config.txt
  notify: myhandler
- name: force all notified handlers to run at this point, not waiting for normal sync points
  meta: flush_handlers

to

- template:
    src: new.j2
    dest: /etc/config.txt
  notify: myhandler

- name: force all notified handlers to run at this point, not waiting for normal sync points
  meta: flush_handlers

I updated my reboot task accordingly:

- name: Force handlers to run before rebooting
  meta: flush_handlers

- name: Rebooting ...
  command: shutdown -r now "Ansible says: Time for a reboot"

I tested the playbook with the revised role and confirmed that all notified handlers from previous roles ran before the systems rebooted.