Kitchen CI – using the Vagrant driver

I’d previously been using the Docker driver with Kitchen CI and kitchen-ansible to test my Ansible playbooks. I really like using Kitchen CI. Test-driven infrastructure development! Regression testing! It’s great.

There were several reasons I decided to switch from the Docker driver to Vagrant. My target hosts are all either VMs or bare metal servers, so Vagrant VMs more closely resemble that environment. In particular, there are a couple areas where Docker containers don’t perform well for this purpose:

  • Configuring and testing SELinux settings
  • Configuring and testing systemd services

Continue reading Kitchen CI – using the Vagrant driver

AttributeError: module ‘paramiko’ has no attribute ‘SSHClient’

I have a simple Python 3 script (I’m running Python 3.6.1, compiled from source) that does the following 3 things:

  1. Connects to remote server(s) and uses scp to get files.
  2. Processes the files.
  3. Connects to another remote server and uploads the processed files.

I moved this script from a soon-to-be-retired CentOS 6 host to a CentOS 7 host, and when I ran it there I received the following error message:

AttributeError: module 'paramiko' has no attribute 'SSHClient'

The line number specified in the stack trace led me to:

ssh = paramiko.SSHClient()

First things first: Google the error. Someone else has seen this before. Sure enough, on StackOverflow I found Paramiko: Module object has no attribute error ‘SSHClient’

But no, that’s not the problem I’m having.

I tried to create the simplest possible script that would reproduce the error:

#!/usr/bin/python3

import paramiko

def main():
        ssh = paramiko.SSHClient()

if __name__ == "__main__":
        main()

I ran it as a super-user and received no errors:

$ sudo /usr/bin/python3 test.py

I ran it as myself, though, and it reproduced the error message. Maybe it was a permissions issue?

$ ls -l /usr/local/lib/python3.6/site-packages/
total 1168
...
drwxr-x---.  3 root root   4096 May 29 14:25 paramiko
...

Oh! From that you can see that unless you are the root user or a member of the root group, you can’t even read the files within the paramiko directory.
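Why an AttributeError rather than an ImportError, though? My understanding is that in Python 3, a package directory whose contents can’t be read still imports, but as an empty namespace package. Here’s a minimal sketch of that behavior using a made-up fakepkg directory (no paramiko required):

```python
import os
import sys
import tempfile

# An importable directory with no __init__.py and no visible modules becomes
# an empty namespace package in Python 3, much like a package directory whose
# permissions prevent reading its contents.
tmp = tempfile.mkdtemp()
os.mkdir(os.path.join(tmp, "fakepkg"))
sys.path.insert(0, tmp)

import fakepkg  # succeeds, but as an empty namespace package

try:
    fakepkg.SSHClient
except AttributeError as err:
    print(err)  # module 'fakepkg' has no attribute 'SSHClient'
```

Which matches the error from my script, even though paramiko was "installed".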

What’s the default umask on the system?

$ umask
0027
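The arithmetic checks out: new directories are created with mode 0o777 & ~umask, and regular files with 0o666 & ~umask. A quick sanity check in Python:

```python
umask = 0o027

# Directories: 0o777 & ~umask
print(oct(0o777 & ~umask))  # 0o750, i.e. drwxr-x---, as seen on the paramiko dir
# Regular files: 0o666 & ~umask
print(oct(0o666 & ~umask))  # 0o640, i.e. -rw-r-----
```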

That explains it. Now, to fix it. I could probably just run:

$ sudo chmod -R 0755 /usr/local/lib/python3.6/site-packages/*

That should have been the end of that, problem solved! But in my case, installation of the pip modules had been handled by Ansible. I needed to fix the Ansible tasks to account for restrictive umask settings on future deployments. See the umask parameter in the documentation for the Ansible pip module. I updated the task:

- name: Install specified python requirements
  pip:
    executable: /usr/local/bin/pip3
    requirements: /tmp/pip3-packages
    umask: 0022

Running the playbook with that task, I received an error:

fatal: [trinculo.osric.net]: FAILED! => {"changed": false, "details": "invalid literal for int() with base 8: '18'", "msg": "umask must be an octal integer"}

Another helpful StackOverflow post suggested the value needed to be in quotes:

- name: Install specified python requirements
  pip:
    executable: /usr/local/bin/pip3
    requirements: /tmp/pip3-packages
    umask: "0022"

Now the playbook runs without error, but it doesn’t change the existing permissions. The task does nothing, since the Python pip modules are already installed. To really test the playbook, I need to clear out the existing modules first.

Warning: this breaks things!

$ sudo rm -rf /usr/local/lib/python3.6/site-packages/*

I tried running the playbook again and received this error message:

stderr: Traceback (most recent call last):\n  File "/usr/local/bin/pip3", line 7, in <module>\n    from pip import main\nModuleNotFoundError: No module named 'pip'\n

I wasn’t able to run pip at all:

$ pip3
Traceback (most recent call last):
  File "/usr/local/bin/pip3", line 7, in <module>
    from pip import main
ModuleNotFoundError: No module named 'pip'

Clearly, I had deleted something important! I reinstalled Python from the gzipped source tarball for Python 3.6.1 (newer versions available from Python source releases) and then everything worked as expected.
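In hindsight, a lighter-weight recovery might have been ensurepip, which ships with Python 3.4+ and can bootstrap pip without a full rebuild from source (I haven’t verified it against this exact breakage):

```python
# ensurepip is part of the standard library (Python 3.4+). Running
#   python3 -m ensurepip
# reinstalls the bundled pip. From code, you can at least confirm it is there:
import ensurepip

print(ensurepip.version())  # version of the pip bundled with this interpreter
```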

Installing Ansible role dependencies

I have a monolithic Ansible playbook that contains dozens of different roles, all bundled into the same Git repository. Some of the roles are more generically useful than others, so I thought I would do some refactoring.

I decided to move the role that installs and configures fail2ban to its own repository, and then call that new/refactored role as a dependency in my now-slightly-less-monolithic role.

Of course, I had no idea what I was doing.
Continue reading Installing Ansible role dependencies

Using Ansible to check version before install or upgrade

One thing that I do frequently with an Ansible role is check to see if software is already installed and at the desired version. I do this for several related reasons:

  1. To avoid taking extra time and doing extra work
  2. To make the role idempotent (changes are only made if changes are needed)
  3. So that the play recap summary lists accurate results

I’m thinking particularly of software that needs to be unpacked, configured, compiled, and installed (rather than .rpm or .deb packages). In this example, I’ll be installing the fictional widgetizer software.
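In plain Python, the logic I’m after looks roughly like this (widgetizer and its path are fictional, as noted above):

```python
import os
import subprocess

def needs_install(path, target_version):
    """Return True if the binary is missing or reports a different version."""
    if not os.path.exists(path):  # the stat check
        return True
    result = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    # Cast to str in case the version was defined as a number:
    return str(target_version) not in result.stdout

print(needs_install("/usr/local/bin/widgetizer", 1.2))
```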

First I add a couple variables to the defaults/main.yml file for the role:

---
path_to_widgetizer: /usr/local/bin/widgetizer
widgetizer_target_version: 1.2
...

Next I add a task to see if the installed binary already exists:

- name: check for existing widgetizer install
  stat:
    path: "{{ path_to_widgetizer }}"
  register: result_a
  tags: widgetizer

Then, if widgetizer is installed, I check which version is installed:

- name: check widgetizer version
  command: "{{ path_to_widgetizer }} --version"
  register: result_b
  when: "result_a.stat.exists"
  changed_when: False
  failed_when: False
  tags: widgetizer

2 things to note in the above:

  • The command task normally reports changed: true, so specify changed_when: False to prevent this.
  • Although this task should only run if widgetizer is present, we don’t want the task (and therefore the entire playbook) to fail if it is not present. Specify failed_when: false to prevent this. (I could also specify ignore_errors: true, which would report the error but would not prevent the rest of the playbook from running.)

Now I can check the registered variables to determine if widgetizer needs to be installed or upgraded:

- name: install/upgrade widgetizer, if needed
  include: tasks/install.yml
  when: "not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout"
  tags: widgetizer

However, when I ran my playbook I received an error:

$ ansible-playbook -i hosts site.yaml --limit localhost --tags widgetizer

...

fatal: [localhost]: FAILED! => {"failed": true, "msg": "The conditional check 'not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout' failed. The error was: Unexpected templating type error occurred on ({% if not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout %} True {% else %} False {% endif %}): coercing to Unicode: need string or buffer, float found\n\nThe error appears to have been in '/home/chris/projectz/roles/widgetizer/tasks/install.yml': line 3, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: copy widgetizer source\n  ^ here\n"}

The key piece of information to note in that error message is:

need string or buffer, float found

We’ve supplied widgetizer_target_version as 1.2 (a floating point number), but Python/jinja2 wants a string to search for in result_b.stdout.
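The same mismatch is easy to reproduce in plain Python (the "widgetizer 1.2.0" output string here is made up for illustration):

```python
version = 1.2  # an unquoted 1.2 in YAML is parsed as a float

try:
    version in "widgetizer 1.2.0"
except TypeError as err:
    print(err)  # the 'in' operator needs a string on the left-hand side

# Quoting the value in YAML, or casting with |string in jinja2, fixes it:
print(str(version) in "widgetizer 1.2.0")  # True
```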

There are at least 2 ways to fix this:

  • Enclose the value in quotes to specify widgetizer_target_version as a string in the variable definition, e.g. widgetizer_target_version: "1.2"
  • Convert widgetizer_target_version to a string in the when statement, e.g. widgetizer_target_version|string not in result_b.stdout

After making either of those changes, the playbook runs successfully and correctly includes or ignores the install.yml file as appropriate.

Ansible unarchive module error: path does not exist

I was working on deploying files to a host via Ansible’s unarchive module when I ran into an error message:

path /tmp/datafiles/ does not exist

Here’s the relevant portion of my Ansible role’s task/main.yml:

- name: copy datafiles
  unarchive:
    src: datafiles.tar.gz
    dest: /tmp
    owner: root
    group: datauser

Here’s the full result of running that task:

TASK [datafiles : copy datafiles] *******************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "path /tmp/datafiles/ does not exist", "path": "/tmp/datafiles/", "state": "absent"}

The error message confused me. The datafiles directory shouldn’t need to exist!

The problem was completely unrelated to the error message. I had specified a group, datauser, that did not exist on the target host. Once I removed the group parameter, the task ran without error. (Another option would be to ensure that the specified group exists on the target host.)
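Incidentally, a quick way to check whether a group exists on a (Unix) host, outside of Ansible, is the standard grp module; datauser here is the group from the task above:

```python
import grp

def group_exists(name):
    """Return True if the named group exists on this host (Unix only)."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False

print(group_exists("root"))     # True on typical Linux hosts
print(group_exists("datauser"))
```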

Ansible conditional check failed

I wanted to add a check to one of my Ansible roles so that an application source would be copied and the source recompiled only if no current version existed or if the existing version did not match the expected version:

- name: Check to see if app is installed and the expected version
  command: /usr/local/app/bin/app --version
  register: version_check
  ignore_errors: True
  changed_when: "version_check.rc != 0 or {{ target_version }} not in version_check.stdout"

- name: include app install
  include: tasks/install.yml
  when: "version_check.rc != 0 or {{ target_version }} not in version_check.stdout"

I defined the target version in my role’s defaults/main.yml:

---
target_version: "2.5.2"
...

The first time I ran it, I encountered an error:

fatal: [trinculo.osric.net]: FAILED! => {"failed": true, "msg": "The conditional check 'version_check.rc != 0 or {{ target_version }} not in version_check.stdout' failed. The error was: error while evaluating conditional (version_check.rc != 0 or {{ target_version }} not in version_check.stdout): Unable to look up a name or access an attribute in template string ({% if version_check.rc != 0 or 2.5.2 not in version_check.stdout %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': coercing to Unicode: need string or buffer, StrictUndefined found"}

It’s a little unclear what is wrong, so I figured it was likely an issue with quotes or a lack of parentheses.

First I tried parentheses:

changed_when: "version_check.rc != 0 or ({{ target_version }} not in version_check.stdout)"

No luck.

changed_when: "version_check.rc != 0 or !('{{ target_version }}' in version_check.stdout)"

You know, trying to google and or not or or or is is tricky. Even if you add terms like Boolean logic or propositional calculus.

I tried to break it down into smaller parts:

changed_when: "version_check.rc != 0"

That worked.

changed_when: "!('{{ target_version }}' in version_check.stdout)"

A different error appeared:

template error while templating string: unexpected char u'!'

OK, that’s getting somewhere! Try a variation:

changed_when: "'{{ target_version }}' not in version_check.stdout"

It worked! But with a warning:

[WARNING]: when statements should not include jinja2 templating delimiters
such as {{ }} or {% %}. Found: ('{{ target_version }}' not in version_check.stdout)

Next try:

changed_when: "target_version not in version_check.stdout"

That worked, and without any warnings. I put the or back in:

changed_when: "version_check.rc != 0 or target_version not in version_check.stdout"

That worked! It was the jinja2 delimiters the whole time. Apparently the value of the changed_when key is already interpreted as a jinja2 expression, so the delimiters were redundant. Even though it succeeded (with a warning) in a single propositional statement, it failed when the logical disjunction was added. It was an important reminder: error messages aren’t perfect.

Nagios check_disk returns DISK CRITICAL – /sys/kernel/config is not accessible: Permission denied

I enabled Nagios checks for free disk space on a group of servers today, and was hit with alerts containing the following error message:
DISK CRITICAL - /sys/kernel/config is not accessible: Permission denied

If you are looking for a solution, skip to the end. Some of my mistakes before finding the solution may be interesting though!

Continue reading Nagios check_disk returns DISK CRITICAL – /sys/kernel/config is not accessible: Permission denied

Ansible: [Errno 2] No such file or directory

I tried running a command on several remote servers at once via Ansible:

$ ansible -a 'rpcinfo -p' centos

Which returned a series of errors:

ariel.osric.net | FAILED | rc=2 >>
[Errno 2] No such file or directory

caliban.osric.net | FAILED | rc=2 >>
[Errno 2] No such file or directory

trinculo.osric.net | FAILED | rc=2 >>
[Errno 2] No such file or directory

I also received an error when I tried running it via ssh:

$ ssh ariel.osric.net 'rpcinfo -p'
bash: rpcinfo: command not found

I can run it interactively on a specific host:

$ ssh ariel.osric.net
$ rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper

The problem is that the user profile isn’t loaded in a non-interactive ssh session, which is how Ansible connects, so rpcinfo isn’t found in the PATH. The next step is to identify the full path:

$ ssh ariel.osric.net
$ whereis rpcinfo
rpcinfo: /usr/sbin/rpcinfo /usr/share/man/man8/rpcinfo.8.gz

(/usr/sbin is added to the path via /etc/profile)
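You can see the same PATH-resolution behavior with Python’s shutil.which (using ls here, since rpcinfo may not be installed locally):

```python
import shutil

# A lookup fails when the directory containing the binary is not on the
# search path, and succeeds once it is:
print(shutil.which("ls", path="/nonexistent"))   # None
print(shutil.which("ls", path="/bin:/usr/bin"))  # e.g. /bin/ls
```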

Once I specified the full path, it worked:

$ ansible -a '/usr/sbin/rpcinfo -p' centos

ariel.osric.net | SUCCESS | rc=0 >>
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper

Etc.

Ansible meta action: “ERROR! conflicting action statements”

The initial problem I was trying to solve had to do with a reboot role. Although it was the last role listed, once it ran the connection would be broken and then none of the notified handlers from previous roles would run.

Under normal circumstances, handlers run after all the roles. One idea I had was to have the reboot role contain a trivial task, which would notify a handler containing the actual reboot task. Presumably, as the last handler notified, it would run last.

While I was looking at ways to do this, I discovered Ansible meta actions:
Ansible Documentation: Meta Module

The flush_handlers action looked like just what I needed:

    meta: flush_handlers

I tried adding that to my reboot task so that it would run all of the previously notified handlers:

- name: Rebooting ...
  command: shutdown -r now "Ansible says: Time for a reboot"
  meta: flush_handlers

This produced an error:
ERROR! conflicting action statements

As I looked at the documentation a little more closely, I saw that the examples have meta as a separate task, not part of an existing task. I missed that at first glance because of the lack of line spacing in the examples, e.g. compare:

- template:
    src: new.j2
    dest: /etc/config.txt
  notify: myhandler
- name: force all notified handlers to run at this point, not waiting for normal sync points
  meta: flush_handlers

to

- template:
    src: new.j2
    dest: /etc/config.txt
  notify: myhandler

- name: force all notified handlers to run at this point, not waiting for normal sync points
  meta: flush_handlers

I updated my reboot task accordingly:

- name: Force handlers to run before rebooting
  meta: flush_handlers

- name: Rebooting ...
  command: shutdown -r now "Ansible says: Time for a reboot"

I tested the playbook with the revised role and confirmed that all notified handlers from previous roles ran before the systems rebooted.