Python Flask, escaping HTML strings, and the Markup class

As in the previous post, I had created a simple web app using Python Flask to use as a teaching tool. The purpose was to demonstrate SQL injection and XSS (cross-site scripting) vulnerabilities and how to remediate them.

In this case, the remediation step for XSS (escaping output) tripped me up. I tried this:

return '<p>You searched for: ' + escape(user_input) + '</p>'

I expected it to escape only the user_input variable, but instead it escaped all the HTML, returning this:

&lt;p&gt;You searched for: &lt;script&gt;alert(1)&lt;/script&gt;&lt;/p&gt;

Continue reading Python Flask, escaping HTML strings, and the Markup class

Python, tuples, sequences, and parameterized SQL queries

I recently developed a teaching tool using the Python Flask framework to demonstrate SQL injection and XSS (cross-site scripting) vulnerabilities and how to remediate them.

The remediation step for SQL injection tripped me up though when I received the following error message:

sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 4 supplied.

Continue reading Python, tuples, sequences, and parameterized SQL queries

Running a Python Flask application in a Docker container

I’ve played with Docker containers but haven’t really done anything, useful or otherwise, with them. I decided to create a Docker image that includes a web-based chatbot. You can find the Git repository for this (including the finished Dockerfile) at https://github.com/cherdt/docker-nltk-chatbot

Continue reading Running a Python Flask application in a Docker container

AttributeError: module ‘paramiko’ has no attribute ‘SSHClient’

I have a simple Python 3 script (I’m running Python 3.6.1, compiled from source) that does the following 3 things:

  1. Connects to remote server(s) and using scp to get files
  2. Processes the files.
  3. Connects to another remote server and uploads the processed files.

I moved this script from a soon-to-be-retired CentOS 6 host to a CentOS 7 host, and when I ran it there I received the following error message:

AttributeError: module 'paramiko' has no attribute 'SSHClient'

The line number specified in the stack trace led me to:

ssh = paramiko.SSHClient()

First things first: Google the error. Someone else has seen this before. Sure enough, on StackOverflow I found Paramiko: Module object has no attribute error ‘SSHClient’

But no, that’s not the problem I’m having.

I tried to create the the simplest possible script that would reproduce the error:

#!/usr/bin/python3

import paramiko

def main():
        ssh = paramiko.SSHClient()

if __name__ == "__main__":
        main()

I ran it as a super-user and received no errors:

$ sudo /usr/bin/python3 test.py

I ran it as myself, though, and it reproduced the error message. Maybe something with the user permissions?

$ ls -l /usr/local/lib/python3.6/site-packages/
total 1168
...
drwxr-x---.  3 root root   4096 May 29 14:25 paramiko
...

Oh! From that you can see that unless you are the root user, or a member of the root group, there’s no way you can even see the files within the paramiko directory.

What’s the default umask on the system?

$ umask
0027

That explains it. Now, to fix it. I could probably just run:

$ sudo chmod -R 0755 /usr/local/lib/python3.6/site-packages/*

That should have been the end of that, problem solved! But in my case, installation of the pip modules had been handled by Ansible. I needed to fix the Ansible tasks to account for restrictive umask settings on future deployments. See the umask parameter in the documentation for the Ansible pip module. I updated the task:

- name: Install specified python requirements
  pip:
    executable: /usr/local/bin/pip3
    requirements: /tmp/pip3-packages
    umask: 0022

Running the playbook with that task, I received an error:

fatal: [trinculo.osric.net]: FAILED! => {"changed": false, "details": "invalid literal for int() with base 8: '18'", "msg": "umask must be an octal integer"}

Another helpful StackOverflow post suggested the value needed to be in quotes:

- name: Install specified python requirements
  pip:
    executable: /usr/local/bin/pip3
    requirements: /tmp/pip3-packages
    umask: "0022"

Now the playbook runs without error, but it doesn’t change the existing permissions. The task does nothing, since the Python pip modules are already installed. To really test the playbook, I need to clear out the existing modules first.

Warning: this breaks things!:

$ sudo rm -rf /usr/local/lib/python3.6/site-packages/*

I tried running the playbook again and received this error message:

stderr: Traceback (most recent call last):\n  File "/usr/local/bin/pip3", line 7, in <module>\n    from pip import main\nModuleNotFoundError: No module named 'pip'\n

I wasn’t able to run pip at all:

$ pip3
Traceback (most recent call last):
  File "/usr/local/bin/pip3", line 7, in <module>
    from pip import main
ModuleNotFoundError: No module named 'pip'

Clearly, I had deleted something important! I reinstalled Python from the gzipped source tarball for Python 3.6.1 (newer versions available from Python source releases) and then everything worked as expected.

Python Flask and VirtualBox networking

I had been using the Python socket module to create a very basic client-server for testing purposes, but soon I wanted to have something slightly more standard, like an HTTP server. I decided to try the Python Flask framework.

First I set up a Flask server on a CentOS 7 Linux VM running on VirtualBox:

# yum install python-pip
# pip install Flask
# mkdir flask-server && cd flask-server

I created the hello.py file as described on the Flask homepage:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

Likewise, I started running Flask:

# FLASK_APP=hello.py flask run
 * Serving Flask app "hello"
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Then I set up port forwarding in VirtualBox on my desktop host so that I could communicate with the virtual machine, using the following settings:

Name: flask
Protocol: TCP
Host IP: 127.0.0.1
Host Port: 9500
Guest IP: 10.0.2.16
Guest Port: 5000

VirtualBox port forwarding rules
VirtualBox port forwarding rules

I tested it in a browser (Firefox) on my desktop at http://127.0.0.1:9500/

No connection. Firefox endlessly tries to load the file.

I tried from the local machine itself:

# curl http://localhost:5000/
Hello World!

I tried running tcpdump to see what the network traffic to that port looked like:

# tcpdump -n -i enp0s3 port 5000
...
14:54:11.938625 IP 10.0.2.2.63923 > 10.0.2.16.commplex-main: Flags [S], seq 3067208705, win 65535, options [mss 1460], length 0
...

Over and over I saw the same SYN packet from the client host, but the server never replied with a SYN-ACK.

I also noted that the local port was labeled commplex-main. This label is from /etc/services:

# grep commplex /etc/services
commplex-main   5000/tcp                #
commplex-main   5000/udp                #
commplex-link   5001/tcp                #
commplex-link   5001/udp                #

I don’t know what commplex-main is, but since I’m not running anything else on port 5000 other than Flask, it shouldn’t matter.

It turned out there were 2 separate problems:

  1. Flask was listening only on localhost
  2. firewalld was blocking the requests from external hosts

To fix the first, run Flask with the host flag:

# FLASK_APP=hello.py flask run --host=0.0.0.0
 * Serving Flask app "hello"
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

(This is mentioned in the Flask Quickstart guide, under Externally Visible Server.)

You can also specify an alternative port, e.g.:

# FLASK_APP=hello.py flask run --host=0.0.0.0 --port=56789
 * Serving Flask app "hello"
 * Running on http://0.0.0.0:56789/ (Press CTRL+C to quit)

To fix the latter temporarily, I disabled firewalld:

systemctl stop firewalld
systemctl disable firewalld

Obviously, if you are dealing with a machine connected directly to the Internet, this would be a terrible solution. You’d want to add rules allowing only the hosts and ports from which you expect to receive connections. But for testing communications between my desktop and a virtual host running on it, this seemed like a quick solution.

After those 2 changes, I was able to load the sample “hello” Flask app in a browser:

The text "Hello World!" loaded in Firefox
The text “Hello World!” loaded in Firefox

Analyzing text to find common terms using Python and NLTK

I just recently started playing with the Python NLTK (Natural Language ToolKit) to analyze text. The book Natural Language Processing with Python is available online and is very helpful if you’re just getting started.

At the beginning of the book the examples cover importing and analyzing text (primarily books) that you import from nltk (Getting Started with NLTK). It includes texts like Moby-Dick and Sense and Sensibility.

But you will probably want to analyze a source of your own. For example, I had text from a series of tweets debating political issues. The third chapter (Accessing Text from the Web and from Disk) has the answers:

First you need to turn raw text into tokens:

tokens = word_tokenize(raw)

Next turn your tokens into NLTK text:

text = nltk.Text(tokens)

Now you can treat it like the book examples in chapter 1.

I was analyzing a number number of tweets. One of the things I wanted to do was find common words in the tweets, to see if there were particular keywords that were common.

I was using the Python interpreter for my tests, and I did run into a couple errors with word_tokenize and later FreqDist, such as:

>>> fdist1 = FreqDist(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'FreqDist' is not defined

You can address this by importing the specific libraries:

>>> from nltk import FreqDist

Here are the commands, in order, that I ran to produce my list of common words — in this case, I was looking for words that appeared at least 3 times and that were at least 5 characters long:

>>> import nltk
>>> from nltk import word_tokenize
>>> from nltk import FreqDist

>>> with open("corpus-twitter", "r") as myfile:
...     raw = myfile.read().decode("utf8")

>>> tokens = word_tokenize(raw)
>>> text = nltk.Text(tokens)

>>> fdist = FreqDist(text)
>>> sorted(w for w in set(text) if len(w) >= 5 and fdist[w] >= 3)

[u'Americans', u'Detroit', u'Please', u'TaxReform', u'Thanks', u'There', u'Trump', u'about', u'against', u'always', u'anyone', u'argument', u'because', u'being', u'believe', u'context', u'could', u'debate', u'defend', u'diluted', u'dollars', u'enough', u'every', u'going', u'happened', u'heard', u'human', u'ideas', u'immigration', u'indefensible', u'logic', u'never', u'opinion', u'people', u'point', u'pragmatic', u'problem', u'problems', u'proposed', u'public', u'question', u'really', u'restricting', u'right', u'saying', u'school', u'scope', u'serious', u'should', u'solution', u'still', u'talking', u'their', u'there', u'think', u'thinking', u'thread', u'times', u'truth', u'trying', u'tweet', u'understand', u'until', u'welfare', u'where', u'world', u'would', u'wrong', u'years', u'yesterday']

It turns out the results weren’t as interesting as I’d hoped. A few interesting items–Detroit for example–but most of the words aren’t surprising given I was looking at tweets around political debate. Perhaps with a larger corpus there would be more stand-out words.

Using a lightweight web server to debug requests

I’ve been working a lot with the Canvas API lately. One task was to add communication channels (e-mail addresses) to user accounts. I was able to add them successfully following the documentation’s cURL example using the skip_confirmation flag:

curl 'https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels'
-H "Authorization: Bearer [...]"
-d 'communication_channel[address]=chris@osric.com'
-d 'communication_channel[type]=email'
-d 'skip_confirmation=1'

However, when I ran the ColdFusion script I wrote, the user received notification of the addition even though the skip_confirmation flag was set:

<cfhttp url="https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels" method="post">
<cfhttpparam type="header" name="Authorization" value="Bearer [...]" />
<cfhttpparam type="formfield" name="communication_channel[address]" value="chris@osric.com" />
<cfhttpparam type="formfield" name="communication_channel[type]" value="email" />
<cfhttpparam type="formfield" name="skip_confirmation" value="1" />
</cfhttp>

Why didn’t the latter work as expected?

I needed to be able to tell what was different about the 2 requests, but it would be difficult to capture that outbound requests. Tools like Fiddler and WireShark could help, but I was sending one request from my local machine and another from a remote web server.

My idea for capturing the request data was to send the request to a server under my control. I grabbed a Python echo server and modified it slightly to print the data received. Then, instead of sending the requests to the Canvas API I sent them to the echo server. Here are the results of the 2 requests:

cURL
POST / HTTP/1.1
User-Agent: curl/7.24.0 (i686-pc-cygwin) libcurl/7.24.0 OpenSSL/0.9.8t zlib/1.2.5 libidn/1.22 libssh2/1.3.0
Host: osric.com:50000
Accept: */*
Authorization: Bearer [...]
Content-Length: 100
Content-Type: application/x-www-form-urlencoded

communication_channel[address]=chris@osric.com&communications_channel[type]=email&skip_confirmation=1

ColdFusion
POST / HTTP/1.1
User-Agent: ColdFusion
Content-Type: application/x-www-form-urlencoded
Connection: close
Authorization: Bearer [...]
Content-Length: 118
Host: osric.com:50000

communication%5Fchannel%5Baddress%5D=chris%40osric%2Ecom&communications%5Fchannel%5Btype%5D=email&skip%5Fconfirmation=1

I noted the different content lengths for the same data, and upon closer inspection, ColdFusion was URL-encoding characters. I added the encoded attribute to the cfhttpparam tags with a value of “no”, and then the request bodies were identical:

<cfhttp url="https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels" method="post">
<cfhttpparam type="header" name="Authorization" value="Bearer [...]" />
<cfhttpparam type="formfield" encoded="no" name="communication_channel[address]" value="chris@osric.com" />
<cfhttpparam type="formfield" encoded="no" name="communication_channel[type]" value="email" />
<cfhttpparam type="formfield" encoded="no" name="skip_confirmation" value="1" />
</cfhttp>

Not surprisingly that solved the problem.

Using FFmpeg to programmatically slice and splice video

My wife has a research project in which she needs to analyze brief (8-second) segments of hundreds of much longer videos. My goal was to take the videos (~30 minutes each) and cut out only the relevant sections and splice them together, including a static marker between each segment. This should allow her and her colleagues to analyze the videos quickly and using precise time-points (instead of using a slider in a video player to locate and estimate time-points). I’ve posted my notes from this process below for my own reference, and in case it should prove useful to anyone else.

To my knowledge, the best tool for the job is FFmpeg, an open source video tool. Continue reading Using FFmpeg to programmatically slice and splice video

Twitter Status IDs and Direct Message IDs

twitter-birdI recently created a Magic Eight Ball twitter-bot as a demo. Written in Python using the python-twitter API wrapper, it runs every 2 minutes and polls twitter for new replies (status updates containing @osric8ball) and direct messages (DMs) to osric8ball. If there are any, it replies with a random 8-Ball response.

Every status update and DM has an associated numeric ID. Initially, I stored the highest ID in a log file and used that when I polled twitter (i.e. “retrieve all replies and DMs with ID > highest ID”). However, I discovered that status updates and DMs apparently are stored in separate tables on twitter’s backend, as they have a separate set of IDs. Since the highest status ID was an order of magnitude larger than the highest DM ID, my bot completely ignored all DMs. This was not entirely obvious at first, as the IDs looked very similar, other than an extra digit: 2950029179 and 273876291.

My fix for this was to store both the highest status update ID and the highest DM ID is separate log files.

Another interesting twist: you have to be a follower of a user in order to send that user a DM. Continue reading Twitter Status IDs and Direct Message IDs