Minor improvements to legacy Perl code

We’re always working with code we didn’t write: you’ll spend far more time reading code you didn’t write (or don’t remember writing) than you will spend writing new code.

Today I looked at an example Perl script that used 45 lines of code to pull the company associated with an OUI (Organizationally Unique Identifier) from a text file, given a MAC address.

I thought I could do slightly better.

find_mac_co.sh:

#!/bin/sh
# Strip separators from the MAC address and keep the first six hex digits (the OUI)
OUI=$(echo "$1" | sed 's/[^A-Fa-f0-9]//g' | cut -c1-6)
# Case-insensitive match against the OUI database (IGNORECASE is a gawk extension)
awk -F "\t" -v IGNORECASE=1 -v OUI="$OUI" '$0 ~ OUI { print $3 }' ouidb.tsv
exit 0
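For this to work, ouidb.tsv must store each OUI without separators and keep the company name in the third tab-separated column. A row would look something like this (the exact fields here are illustrative):

7CAB60	Apple	Apple, Inc.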

Example run:

$ sh find_mac_co.sh 7c:ab:60:ff:ff:ff
Apple, Inc.

There’s probably a way to make the Perl version shorter too, but I’m more familiar with Bash and standard shell commands.

The biggest problem with this script is that it relies on an up-to-date list of OUIs. An even better way is to query an API:

find_mac_co_api.sh:

#!/bin/sh
MACADDRESS="$1"
curl "https://api.maclookup.app/v2/macs/$MACADDRESS/company/name"
exit 0

Example run:

$ sh find_mac_co_api.sh 7c:ab:60:ff:ff:ff
Apple, Inc.

curl basic auth using base64 encoded credentials

I was trying to access password-protected files via HTTPS using curl. The site required basic auth. For a demo, I created this example:

https://osric.com/chris/demo/admin/
Username: admin
Password: 123456

It’s trivial to access this interactively via curl:

$ curl -u admin https://osric.com/chris/demo/admin/
Enter host password for user 'admin':

Or programmatically by providing the credentials in the URL:

$ curl https://admin:123456@osric.com/chris/demo/admin/

Or by providing a base64-encoded username:password pair in an Authorization header:

$ curl -H "Authorization: Basic $(echo -n admin:123456 | base64)" https://osric.com/chris/demo/admin/

(Note that echo appends a trailing newline character by default, which we do not want included in the base64-encoded value. The -n flag tells echo to omit the trailing newline.)
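You can see the difference by encoding the credentials both ways:

$ echo admin:123456 | base64
YWRtaW46MTIzNDU2Cg==
$ echo -n admin:123456 | base64
YWRtaW46MTIzNDU2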

But I was manipulating files with a Bash script that was being stored in a Git repository, and I didn’t want to store the credentials in the repository. So I stored the credentials in a separate file:

$ echo -n 'admin:123456' > ~/admin-credentials
$ chmod 0600 ~/admin-credentials

Now I can read the credentials from the file:

$ curl -H "Authorization: Basic $(base64 < ~/admin-credentials)" https://osric.com/chris/demo/admin/

I ran into a problem when I tried to update the credentials file with vi (or vim). Vi automatically appends an end-of-line (EOL) character to the last line of the file, which is not apparent to the user. The base64-encoded value then includes the EOL character, and the above command would supply invalid credentials.

To eliminate this in vi, use the following vi commands:

:set binary
:set noeol
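After saving, you can verify that no EOL character crept in by checking the byte count; admin:123456 is exactly 12 bytes:

$ wc -c < ~/admin-credentials
12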

Alternately, just overwrite the file with the updated credentials:

$ echo -n 'admin:123456' > ~/admin-credentials
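Another option is to strip any trailing newline at encoding time, so the exact contents of the file matter less:

$ curl -H "Authorization: Basic $(tr -d '\n' < ~/admin-credentials | base64)" https://osric.com/chris/demo/admin/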

check_http returns 403 Forbidden on fresh Nagios installation

I recently installed a Nagios server on a new CentOS 7 virtual machine (on VirtualBox).

One of the default checks included upon installation is a check on localhost to confirm that the HTTP server is responding. (First I had to install the check_http plugin, see previous post.) The Nagios web interface reports a warning for this check:

HTTP WARNING: HTTP/1.1 403 Forbidden - 5261 bytes in 0.001 second response time

This is unexpected, since I can request the same page in a browser, which returns the Apache Welcome page.

When I run the check manually I get the same result, as expected:

# /usr/lib64/nagios/plugins/check_http -H localhost
HTTP WARNING: HTTP/1.1 403 Forbidden - 5261 bytes in 0.001 second response time |time=0.000907s;;;0.000000 size=5261B;;;0

I checked with curl:

# curl http://localhost

This returns the HTML source of the Apache Welcome page. It looks like it is working, right? But looking at the headers returned by the Apache server also shows 403 Forbidden:

# curl -I http://localhost
HTTP/1.1 403 Forbidden
...

The Apache Welcome page gives some hints about this behavior:

Are you the Administrator?

You should add your website content to the directory /var/www/html/.

To prevent this page from ever being used, follow the instructions in the file /etc/httpd/conf.d/welcome.conf.

The /etc/httpd/conf.d/welcome.conf file begins with the following comments and directive:

#
# This configuration file enables the default "Welcome" page if there
# is no default index page present for the root URL.  To disable the
# Welcome page, comment out all the lines below.
#
# NOTE: if this file is removed, it will be restored on upgrades.
#
<LocationMatch "^/+$">
    Options -Indexes
    ErrorDocument 403 /.noindex.html
</LocationMatch>

This configuration specifies that when there is no index page for the document root, Apache serves the Welcome page as an error document with a 403 HTTP status code, which is exactly what check_http is reporting.
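The fix is simply to provide an index page, which can be as minimal as you like (the content below is just a placeholder):

# echo '<html><body>Hello</body></html>' > /var/www/html/index.html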

Once I added a basic HTML file at /var/www/html/index.html, Nagios returned a success message:

HTTP OK: HTTP/1.1 200 OK - 549 bytes in 0.001 second response time

Canvas enrollments.csv and add_sis_stickiness

I have an enrollments.csv file for Instructure’s Canvas LMS, and I want all of the enrollments in it to “stick”, that is, to survive a batch-mode SIS import. These users are primarily course designers with no official standing in the class; they are not in our database and therefore are not included in the regular updates to enrollments.

According to the Canvas documentation for SIS imports:

add_sis_stickiness – Boolean

This option, if present, will process all changes as if they were UI changes. This means that “stickiness” will be added to changed fields. This option is only processed if ‘override_sis_stickiness’ is also provided.

Source: https://canvas.instructure.com/doc/api/sis_imports.html#method.sis_imports_api.create

However, experience tells me otherwise. An inquiry to Instructure’s support confirms that add_sis_stickiness does not apply to enrollments. Enrollments added this way will be deleted following the next enrollments batch import.

The choices to preserve these course designer enrollments are basically to add each one manually using the web UI, or add them via the API. Either option will make the enrollments “stick.”

I opted to use the API. Since I already had a formatted input file, I wrote a short Bash script (with the help of several man pages and a couple of Stack Overflow pages) that reads the CSV and processes each row, adding the enrollment via the API:

headerrow=1
while IFS= read -r row; do
    if [ "$headerrow" -eq 0 ]
    then
        # get the SIS course ID
        cid="$(echo "$row" | cut -d',' -f1)"
        # get the SIS user ID
        uid="$(echo "$row" | cut -d',' -f2)"
        # get the role / enrollment type
        type="$(echo "$row" | cut -d',' -f3)"
        # reformat the enrollment type: capitalize the first letter and append
        # "Enrollment", e.g. "designer" becomes "DesignerEnrollment"
        tid="$(echo "$type" | cut -c1 | tr '[:lower:]' '[:upper:]')$(echo "$type" | cut -c2-)Enrollment"
        echo "course is $cid"
        echo "user is $uid"
        echo "type is $tid"
        result="$(curl "https://[yourcanvassite].instructure.com/api/v1/courses/sis_course_id:$cid/enrollments" -H 'Authorization: Bearer [REDACTED]' -X POST -F "enrollment[type]=$tid" -F "enrollment[user_id]=sis_user_id:$uid" -F 'enrollment[enrollment_state]=active' -F 'enrollment[notify]=false')"
        echo "$result"
    fi
    headerrow=0
done <enrollments.csv
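For reference, the first rows of enrollments.csv look something like this (the course and user IDs here are invented):

$ head -2 enrollments.csv
course_id,user_id,role
2017FA-ENGL101,jdoe,designer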

Using a lightweight web server to debug requests

I’ve been working a lot with the Canvas API lately. One task was to add communication channels (e-mail addresses) to user accounts. I was able to add them successfully following the documentation’s cURL example using the skip_confirmation flag:

curl 'https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels' \
-H "Authorization: Bearer [...]" \
-d 'communication_channel[address]=chris@osric.com' \
-d 'communication_channel[type]=email' \
-d 'skip_confirmation=1'

However, when I ran the ColdFusion script I wrote, the user received notification of the addition even though the skip_confirmation flag was set:

<cfhttp url="https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels" method="post">
<cfhttpparam type="header" name="Authorization" value="Bearer [...]" />
<cfhttpparam type="formfield" name="communication_channel[address]" value="chris@osric.com" />
<cfhttpparam type="formfield" name="communication_channel[type]" value="email" />
<cfhttpparam type="formfield" name="skip_confirmation" value="1" />
</cfhttp>

Why didn’t the latter work as expected?

I needed to be able to tell what was different about the two requests, but it would be difficult to capture those outbound requests. Tools like Fiddler and Wireshark could help, but I was sending one request from my local machine and another from a remote web server.

My idea for capturing the request data was to send the request to a server under my control. I grabbed a Python echo server and modified it slightly to print the data it received. Then, instead of sending the requests to the Canvas API, I sent them to the echo server.
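(If you don't have an echo server handy, netcat can do much the same job; as a sketch, the following listens on a port and prints whatever arrives, though some netcat versions want the port as -p 50000, and curl will hang waiting for a response that never comes, so Ctrl-C when you've seen the request:)

$ nc -l 50000

Here are the results of the two requests: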

cURL
POST / HTTP/1.1
User-Agent: curl/7.24.0 (i686-pc-cygwin) libcurl/7.24.0 OpenSSL/0.9.8t zlib/1.2.5 libidn/1.22 libssh2/1.3.0
Host: osric.com:50000
Accept: */*
Authorization: Bearer [...]
Content-Length: 100
Content-Type: application/x-www-form-urlencoded

communication_channel[address]=chris@osric.com&communication_channel[type]=email&skip_confirmation=1

ColdFusion
POST / HTTP/1.1
User-Agent: ColdFusion
Content-Type: application/x-www-form-urlencoded
Connection: close
Authorization: Bearer [...]
Content-Length: 118
Host: osric.com:50000

communication%5Fchannel%5Baddress%5D=chris%40osric%2Ecom&communication%5Fchannel%5Btype%5D=email&skip%5Fconfirmation=1

I noted the different content lengths for the same data, and upon closer inspection, ColdFusion was URL-encoding additional characters (the underscores, brackets, @ sign, and period). I added the encoded attribute to the cfhttpparam tags with a value of “no”, and then the request bodies were identical:

<cfhttp url="https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels" method="post">
<cfhttpparam type="header" name="Authorization" value="Bearer [...]" />
<cfhttpparam type="formfield" encoded="no" name="communication_channel[address]" value="chris@osric.com" />
<cfhttpparam type="formfield" encoded="no" name="communication_channel[type]" value="email" />
<cfhttpparam type="formfield" encoded="no" name="skip_confirmation" value="1" />
</cfhttp>

Not surprisingly, that solved the problem.

Monitoring web server status with a shell script

Recently, my VPS (Virtual Private Server) ran into some issues where it exceeded the maximum amount of RAM allotted under my subscription. When this happens, the web server software shuts down and does not restart until I manually restart it.

This is bad. I’m not always visiting my own web site, so it could be down for days without me knowing. Although I really need to identify what is using all the RAM, in the meantime I’ll settle for a monitoring system that will notify me when the server is down.

#!/bin/bash
if curl -s --head http://osric.com/ | grep "200 OK" > /dev/null
then
    echo "The HTTP server on osric.com is up!" > /dev/null
else
    echo "The HTTP server on osric.com is down!"
fi

cURL will let you retrieve a URL via the command line, and it provides more options than Wget for a single URL. In this case, I used the -s (silent) switch to eliminate the status/progress output, and the --head switch to retrieve only the document headers. The headers are then piped to grep, which searches for the string “200 OK” (the HTTP status message for a successful request).
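Assuming the server is healthy, the first line of those headers is what grep matches:

$ curl -s --head http://osric.com/ | head -1
HTTP/1.1 200 OK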

I send the result of that to /dev/null so that the output doesn’t appear on the screen.

If grep does find “200 OK”, then I send a success message to /dev/null. This is largely unnecessary, but it is nice to leave in for testing the script in the successful case; just remove the > /dev/null. If grep doesn’t find “200 OK”, then there is a problem. That doesn’t necessarily mean the web server is down, but it definitely indicates a problem that needs to be identified.

I added a call to this script to a crontab to run every 5 minutes. If there is no output, nothing happens. If there is output, the output is sent to me via e-mail, which, assuming I am checking my e-mail religiously, should reduce server downtime.
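The crontab entry looks something like this (the script path here is hypothetical):

*/5 * * * * /home/chris/bin/check_osric.sh

Cron e-mails any output to the crontab's owner by default, so the script itself doesn't need to do any mail handling.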