The Case of the Mysterious Git Repo

I’m migrating scripts from one CentOS host to another. (The old one is CentOS 6 and the new one is CentOS 7.) One of the items I want to move is a script that’s part of a git repo.

I know it’s a git repo because it has a .git directory. The directory and files are owned by a local user, luser1.

Where did this git repo come from?

You can call git config -l and it will tell you things about the repo, including the remote.origin.url. Now I can tell that it came from my GitHub Enterprise (GHE) instance.

But it doesn’t tell you who created it. And luser1 doesn’t exist on my GitHub Enterprise instance.

I tried to clone it on the new host, just to see what would happen:

$ sudo -u luser1 git clone git@github.osric.net:team1/the-sauce.git

Result:

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Somehow luser1 is able to interact with this repo on the old host, even though the user doesn’t exist in GHE.

Fortunately the local repo has some history:

$ head -n1 .git/logs/HEAD

Result:

0000000000000000000000000000000000000000 ab509345a16d987dac10987c4bc2df0c0dfb3ed9 luser1 luser1 <luser1@oldhost.osric.net> 1505633110 -0500   clone: from git@github.osric.net:team1/the-sauce.git

Right there there it is, all the info I wanted! Except that’s the same user, the user that doesn’t even exist in GitHub Enterprise.

Or does it? I went looking at some of the other users in my GitHub Enterprise instance. There were a few system accounts, and when I checked some of them had several SSH keys associated with the accounts. One of the SSH key fingerprints even said luser1.

I wanted to compare that key fingerprint to the key in /home/luser1/.ssh/id_rsa.pub. I can never remember how to do that, but StackOverflow had the answer on How do I find my RSA key fingerprint?

$ ssh-keygen -lf /home/luser1/.ssh/id_rsa.pub

Result:

2048 bb:fc:1c:03:17:5f:67:4f:1f:0b:50:5a:9f:f9:30:e5 /home/luser1/.ssh/id_rsa.pub (RSA)

The fingerprint matches! The SSH key was added to a completely different username in GitHub Enterprise which does have access to the repo.

That works, but why would someone do that? Convenience, I assume, but it wasn’t obvious when I needed to reverse-engineer it!

Python Flask and VirtualBox networking

I had been using the Python socket module to create a very basic client-server for testing purposes, but soon I wanted to have something slightly more standard, like an HTTP server. I decided to try the Python Flask framework.

First I set up a Flask server on a CentOS 7 Linux VM running on VirtualBox:

# yum install python-pip
# pip install Flask
# mkdir flask-server && cd flask-server

I created the hello.py file as described on the Flask homepage:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

Likewise, I started running Flask:

# FLASK_APP=hello.py flask run
 * Serving Flask app "hello"
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Then I set up port forwarding in VirtualBox on my desktop host so that I could communicate with the virtual machine, using the following settings:

Name: flask
Protocol: TCP
Host IP: 127.0.0.1
Host Port: 9500
Guest IP: 10.0.2.16
Guest Port: 5000

VirtualBox port forwarding rules
VirtualBox port forwarding rules

I tested it in a browser (Firefox) on my desktop at http://127.0.0.1:9500/

No connection. Firefox endlessly tries to load the file.

I tried from the local machine itself:

# curl http://localhost:5000/
Hello World!

I tried running tcpdump to see what the network traffic to that port looked like:

# tcpdump -n -i enp0s3 port 5000
...
14:54:11.938625 IP 10.0.2.2.63923 > 10.0.2.16.commplex-main: Flags [S], seq 3067208705, win 65535, options [mss 1460], length 0
...

Over and over I saw the same SYN packet from the client host, but the server never replied with a SYN-ACK.

I also noted that the local port was labeled commplex-main. This label is from /etc/services:

# grep commplex /etc/services
commplex-main   5000/tcp                #
commplex-main   5000/udp                #
commplex-link   5001/tcp                #
commplex-link   5001/udp                #

I don’t know what commplex-main is, but since I’m not running anything else on port 5000 other than Flask, it shouldn’t matter.

It turned out there were 2 separate problems:

  1. Flask was listening only on localhost
  2. firewalld was blocking the requests from external hosts

To fix the first, run Flask with the host flag:

# FLASK_APP=hello.py flask run --host=0.0.0.0
 * Serving Flask app "hello"
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

(This is mentioned in the Flask Quickstart guide, under Externally Visible Server.)

You can also specify an alternative port, e.g.:

# FLASK_APP=hello.py flask run --host=0.0.0.0 --port=56789
 * Serving Flask app "hello"
 * Running on http://0.0.0.0:56789/ (Press CTRL+C to quit)

To fix the latter temporarily, I disabled firewalld:

systemctl stop firewalld
systemctl disable firewalld

Obviously, if you are dealing with a machine connected directly to the Internet, this would be a terrible solution. You’d want to add rules allowing only the hosts and ports from which you expect to receive connections. But for testing communications between my desktop and a virtual host running on it, this seemed like a quick solution.

After those 2 changes, I was able to load the sample “hello” Flask app in a browser:

The text "Hello World!" loaded in Firefox
The text “Hello World!” loaded in Firefox

Make a CVS project read-only

In the previous post, Converting a CVS project to a Git repository, I describe using cvs2git to convert a CVS project to a git repository. After I made the conversion, I wanted to make the CVS project read-only.

There’s probably no reason to keep the CVS project around (the history is in the git repo, and I have backups of the CVS project), but it felt like the right thing to do. The blog post Read-only CVS access for only certain projects was extremely helpful to accomplish this.

The key component is the CVSROOT/commitinfo file within your CVS repository. Like any other project in CVS, you need to check this out to make changes:

cvs co CVSROOT
cd CVSROOT && vi commitinfo

You specify a regular expression and a script to run before committing data to a project matching that regular expression. If the script exits with a non-zero exit code (indicating an error), the commit is aborted. For initial testing, I used false (or /bin/false) for the script component, which does nothing and returns an exit code of 1.

I had some problems with this, in part because I was not sure what the project string would look like. I tried a few things:

  • ^/testrepo/.* false (didn’t work)
  • ^testrepo/.* false didn’t work
  • ^t.* false worked, but would match other projects as well

Eventually I switched to using the read-only-project.sh example from the aforementioned blog post, which printed out the values of the project path and the filenames to be committed.

From there I could see that the project path:

  • Does not include an initial slash
  • Does not include a trailing slash
  • May include additional slashes if the project contains subdirectories

The same script suggests including the following in commitinfo:

^projectname/.* /path/to/script "%p" %s

That regular expression does not work — it would match a file at projectname/subdir1/file1 but not projectname/file1.

And what do the “%p” and %s mean? From C.3.4 Commitinfo:

Currently, if no format strings are specified, a default string of ` %r/%p %{s}’ will be appended to the command line template before replacement is performed, but this feature is deprecated.

I found another document, C.3.1 The common syntax, which describes the format strings.

  • p – the name of the directory being operated on within the repository.
  • {s} – the file name(s), in curly braces because it is a list

The same page includes a sample regular expression that solves the problem I was having:

^module\(/\|$\)

Finally, here is what I added to CVSROOT/commitinfo:

^testrepo\(/\|$\) /usr/local/script/read-only-project.sh

Note that this script needs to exist on the same machine as the CVS repository (which may or may not be the same machine as your checked-out copy).

Converting a CVS project to a Git repository

Why do I still have projects in CVS in 2018?

  1. I inherited them
  2. Inertia

Fortunately, the cvs2svn project includes cvs2git. The instructions included are good, but here are a few things I ran into that may be useful:

You need the actual CVS repo, not a checked out copy. If you run cvs2git on a checked-out copy, you will get an error message like:

ERROR: No RCS files found under 'projectname'

I found that mentioned on svn2git fails “ERROR: No RCS files found under…”. A comment there mentions getting a tarball of your project from Sourceforge, but if you aren’t working with a Sourceforge project, make your own tarball:

tar -cf cvs.tar.gz /path/to/CVS

I created a tarball because I am not running cvs2git on the same machine as my actual CVS repo. cvs2git is non-destructive, and I have backups in case something goes wrong, but I didn’t feel like taking any risks (or testing my restore procedures) at that moment.

I ended up running cvs2git on a Fedora VM. First, install CVS:

sudo dnf install cvs

Install cvs2svn:

wget http://cvs2svn.tigris.org/files/documents/1462/49543/cvs2svn-2.5.0.tar.gz
tar -xf cvs2svn-2.5.0.tar.gz
cd cvs2svn-2.5.0
make install

Create the blob and dump files (you’ll import these into git shortly):

cvs2git --blobfile=/tmp/gitblob.dat --dumpfile=/tmp/gitdump.dat /path/to/specific/cvs/project

Create a bare git repository:

git init --bare reponame
cd reponame

Import the blob and dump files into the git repository:

cat /tmp/gitblob.dat /tmp/gitdump.dat | git fast-import

Now the CVS project is a git repository! Great, but how do I put a bare repo on GitHub or a GitHub Enterprise instance? The article Moving a repository from GitHub.com to GitHub Enterprise was helpful:

git remote add origin git@[hostname]:[owner]/[repo-name].git
git push origin--mirror

(It’s still a bare repo locally, so if you want to check it out you can clone it out to another destination folder, or rm -rf the local repo and clone it.)

The last thing I wanted to do: make the current CVS project read-only. That turned out to be more confusing than I expected, so I’ve turned that into a separate post, Make a CVS project read-only.

Installing Ansible role dependencies

I have a monolithic Ansible playbook that contains dozens of different roles, all bundled into the same Git repository. Some of the roles are more generically useful than others, so I thought I would do some refactoring.

I decided to move the role that installs and configures fail2ban to its own repository, and then call that new/refactored role as a dependency in my now-slightly-less-monolithic role.

Of course, I had no idea what I was doing.
Continue reading Installing Ansible role dependencies

Using Ansible to check version before install or upgrade

One thing that I do frequently with an Ansible role is check to see if software is already installed and at the desired version. I do this for several related reasons:

  1. To avoid taking extra time and doing extra work
  2. To make the role idempotent (changes are only made if changes are needed)
  3. So that the play recap summary lists accurate results

I’m thinking particularly of software that needs to be unpacked, configured, compiled, and installed (rather than .rpm or .deb packages). In this example, I’ll be installing the fictional widgetizer software.

First I add a couple variables to the defaults/main.yml file for the role:

---
path_to_widgetizer: /usr/local/bin/widgetizer
widgetizer_target_version: 1.2
...

Next I add a task to see if the installed binary already exists:

- name: check for existing widgetizer install
  stat:
    path: "{{ path_to_widgetizer }}"
  register: result_a
  tags: widgetizer

Then, if widgetizer is installed, I check which version is installed:

- name: check widgetizer version
  command: "{{ path_to_widgetizer }} --version"
  register: result_b
  when: "result_a.stat.exists"
  changed_when: False
  failed_when: False
  tags: widgetizer

2 things to note in the above:

  • The command task normally reports changed: true, so specify changed_when: False to prevent this.
  • Although this task should only run if widgetizer is present, we don’t want the task (and therefore the entire playbook) to fail if it is not present. Specify failed_when: false to prevent this. (I could also specify ignore_errors: true, which would report the error but would not prevent the rest of the playbook from running.)

Now I can check the registered variables to determine if widgetizer needs to be installed or upgraded:

- name: install/upgrade widgetizer, if needed
  include: tasks/install.yml
  when: "not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout"
  tags: widgetizer

However, when I ran my playbook I received an error:

$ ansible-playbook -i hosts site.yaml --limit localhost --tags widgetizer

...

fatal: [localhost]: FAILED! => {"failed": true, "msg": "The conditional check 'not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout' failed. The error was: Unexpected templating type error occurred on ({% if not result_a.stat.exists or widgetizer_target_version is not defined or widgetizer_target_version not in result_b.stdout %} True {% else %} False {% endif %}): coercing to Unicode: need string or buffer, float found\n\nThe error appears to have been in '/home/chris/projectz/roles/widgetizer/tasks/install.yml': line 3, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: copy widgetizer source\n  ^ here\n"}

The key piece of information to note in that error message is:

need string or buffer, float found

We’ve supplied widgetizer_target_version as 1.2 (a floating point number), but Python/jinja2 wants a string to search for in result_b.stdout.

There are at least 2 ways to fix this:

  • Enclose the value in quotes to specify widgetizer_target_version as a string in the variable definition, e.g. widgetizer_target_version: "1.2"
  • Convert widgetizer_target_version to a string in the when statement, e.g. widgetizer_target_version|string not in result_b.stdout

After making either of those changes, the playbook runs successfully and correctly includes or ignores the install.yml file as appropriate.

Ansible unarchive module error: path does not exist

I was working on deploying files to a host via Ansible’s unarchive module when I ran into an error message:

path /tmp/datafiles/ does not exist

Here’s the relevant portion of my Ansible role’s task/main.yml:

- name: copy datafiles
  unarchive:
    src: datafiles.tar.gz
    dest: /tmp
    owner: root
    group: datauser

Here’s the full result of running that task:

TASK [datafiles : copy datafiles] *******************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "path /tmp/datafiles/ does not exist", "path": "/tmp/datafiles/", "state": "absent"}

The error message confused me. The datafiles directory shouldn’t need to exist!

The problem was completely unrelated to the error message. I had specified a group, datauser, that did not exist on the target host. Once I removed the group parameter, the task ran without error. (Another option would be to ensure that the specified group exists on the target host.)

Analyzing text to find common terms using Python and NLTK

I just recently started playing with the Python NLTK (Natural Language ToolKit) to analyze text. The book Natural Language Processing with Python is available online and is very helpful if you’re just getting started.

At the beginning of the book the examples cover importing and analyzing text (primarily books) that you import from nltk (Getting Started with NLTK). It includes texts like Moby-Dick and Sense and Sensibility.

But you will probably want to analyze a source of your own. For example, I had text from a series of tweets debating political issues. The third chapter (Accessing Text from the Web and from Disk) has the answers:

First you need to turn raw text into tokens:

tokens = word_tokenize(raw)

Next turn your tokens into NLTK text:

text = nltk.Text(tokens)

Now you can treat it like the book examples in chapter 1.

I was analyzing a number number of tweets. One of the things I wanted to do was find common words in the tweets, to see if there were particular keywords that were common.

I was using the Python interpreter for my tests, and I did run into a couple errors with word_tokenize and later FreqDist, such as:

>>> fdist1 = FreqDist(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'FreqDist' is not defined

You can address this by importing the specific libraries:

>>> from nltk import FreqDist

Here are the commands, in order, that I ran to produce my list of common words — in this case, I was looking for words that appeared at least 3 times and that were at least 5 characters long:

>>> import nltk
>>> from nltk import word_tokenize
>>> from nltk import FreqDist

>>> with open("corpus-twitter", "r") as myfile:
...     raw = myfile.read().decode("utf8")

>>> tokens = word_tokenize(raw)
>>> text = nltk.Text(tokens)

>>> fdist = FreqDist(text)
>>> sorted(w for w in set(text) if len(w) >= 5 and fdist[w] >= 3)

[u'Americans', u'Detroit', u'Please', u'TaxReform', u'Thanks', u'There', u'Trump', u'about', u'against', u'always', u'anyone', u'argument', u'because', u'being', u'believe', u'context', u'could', u'debate', u'defend', u'diluted', u'dollars', u'enough', u'every', u'going', u'happened', u'heard', u'human', u'ideas', u'immigration', u'indefensible', u'logic', u'never', u'opinion', u'people', u'point', u'pragmatic', u'problem', u'problems', u'proposed', u'public', u'question', u'really', u'restricting', u'right', u'saying', u'school', u'scope', u'serious', u'should', u'solution', u'still', u'talking', u'their', u'there', u'think', u'thinking', u'thread', u'times', u'truth', u'trying', u'tweet', u'understand', u'until', u'welfare', u'where', u'world', u'would', u'wrong', u'years', u'yesterday']

It turns out the results weren’t as interesting as I’d hoped. A few interesting items–Detroit for example–but most of the words aren’t surprising given I was looking at tweets around political debate. Perhaps with a larger corpus there would be more stand-out words.

Nagios alert: CRITICAL: No response from NTP server

One of a pair of new hosts was causing the following Nagios alert today:

CRITICAL: No response from NTP server

Both of the new systems have the same configuration in theory, but based on the different results something clearly was overlooked.

I tried running NTP from the Nagios host:

Host 1

$ check_ntp -H ephemeralbox1.osric.net -w 0.1 -c 0.2
NTP OK: Offset -0.02545583248 secs|offset=-0.025456s;0.100000;0.200000;

Host 2

$ check_ntp -H ephemeralbox2.osric.net -w 0.1 -c 0.2
CRITICAL: No response from NTP server

The iptables rules look the same on both. The hosts are all on the same LAN, so there’s no firewall in the way.

Both systems are running chronyd:

Host 1

[chris@ephemeralbox1 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running

Host 2

[chris@ephemeralbox2 ssh]$ systemctl show chronyd | egrep '(ActiveState|SubState)'
ActiveState=active
SubState=running

Both systems are listening on port 123:

Host 1

[chris@ephemeralbox1 ssh]$ sudo lsof -i :123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 3027 chrony 3u IPv4 1095448 0t0 UDP *:ntp

Host 2

[chris@ephemeralbox2 ssh]$ sudo lsof -i :123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 1241 chrony 3u IPv4 51276 0t0 UDP *:ntp

Finally, I found it. In the obvious place that perhaps I should have looked first. The /etc/chrony.conf file on Host 2 was missing the allow line for the Nagios host:

# Allow NTP client access from Nagios host
allow 192.168.100.100

And the first place I looked was iptables. Blame the firewall, after all. The configurations were both pushed to these systems via Ansible playbooks, but apparently I had not included the role that updates the chrony.conf file on the 2nd host. Looks like I need configuration management management!

yum Error: requested datatype primary not available

I ran into a new-to-me yum error earlier today:

$ yum --quiet check-updates
Error: requested datatype primary not available

Following the tips on Unix & Linux StackExchange: Error: requested datatype primary not available, I:

  • ran yum clean all
  • disabled repositories one at a time to identify the repo that was causing the error

In my case, it turned out to be the extras repo. The following did not produce any errors:

$ yum --quiet --disablerepo=extras check-updates

What is wrong with the extras repo? It is defined in /etc/yum.repos.d/CentOS-Base.repo, so I took a look at what was there:

[extras]
name=CentOS-$releasever - Extras
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

None of that looked unusual (or had changed recently), so back to Google.

I tried excluding the specific mirror that was listed for the extras repo (http://mirrors.unifiedlayer.com/centos/7.4.1708/extras/x86_64/) by adding unifiedlayer.com to the exclude line in /etc/yum/pluginconf.d/fastestmirror.conf, as described in yum and fastestmirror plugin. Although yum appeared to pick a different mirror it still gave me the same error.

It turns out, the mirror in question was “poisoned” (rerouted) by my DNS servers, as it had been identified (possibly erroneously) as malicious. As such, the domain still resolved but the path to the CentOS repository did not exist.

I didn’t think that excluding the domain in fastestmirror.conf was having the intended effect, and yum was still trying to contact the bad mirror. I took the following steps, which resolved the error, although I can’t say I entirely understand why:

$ sudo yum makecache

This still produced the error.

I removed the bad entry from:

/var/cache/yum/x86_64/7/extras/mirrorlist.txt

Then I ran makecache again:

$ sudo yum makecache

No error this time! I tried running check-update:

$ yum check-update

No error!

Shouldn’t yum clean all have eliminated the bad cache value in /var/cache/yum/x86_64/7/extras/mirrorlist.txt?

Cache invalidation, one of the hard problems. At least I have steps to take if I run into this problem again.