DIY Gist Chatbots

[This was originally posted at the now-defunct impractical.bot on 23 Feb 2019]

I created a tool that will allow anyone to experiment with NLTK (Natural Language Toolkit) chatbots without writing any Python code. The repository for the backend code is available on GitHub: Docker NLTK chatbot.

I plan to expand on this idea, but it is usable now. In order to create your own bot:

  • Create a GitHub account
  • Create a “gist” or fork my demo gist: Greetings Bot Source
  • Customize the name, match, and replies elements
  • Note your username and the unique ID of your gist (a hash value, a 32-character string of letters and numbers)
  • Visit http://osric.com/chat/user/hash, replacing user with your GitHub username and hash with the unique ID of your gist. For an example, see Greetings Bot.

You can now interact with your custom bot, or share the link with your friends!

One more thing: if you update your gist, you’ll need to let the site know to update the code. Just click the “Reload Source” link on the chat page.

Modifying a packet capture with Scapy

My motivation was to start from a known good packet capture, for example, a DNS request and reply, and modify that request to create something interesting: an example to examine in Wireshark, or positive and negative test cases for an IDS software (Snort, Suricata).

I haven’t done much with Scapy before, but it seemed like the right tool for the task. My planned steps were as follows:

  1. Take pcap (packet capture)
  2. Import pcap via scapy
  3. Modify pcap
  4. Export pcap
  5. View pcap in Wireshark

Continue reading Modifying a packet capture with Scapy

Python Flask, escaping HTML strings, and the Markup class

As in the previous post, I had created a simple web app using Python Flask to use as a teaching tool. The purpose was to demonstrate SQL injection and XSS (cross-site scripting) vulnerabilities and how to remediate them.

In this case, the remediation step for XSS (escaping output) tripped me up. I tried this:

return '<p>You searched for: ' + escape(user_input) + '</p>'

I expected it to escape only the user_input variable, but instead it escaped all the HTML, returning this:

&lt;p&gt;You searched for: &lt;script&gt;alert(1)&lt;/script&gt;&lt;/p&gt;

Continue reading Python Flask, escaping HTML strings, and the Markup class

Python, tuples, sequences, and parameterized SQL queries

I recently developed a teaching tool using the Python Flask framework to demonstrate SQL injection and XSS (cross-site scripting) vulnerabilities and how to remediate them.

The remediation step for SQL injection tripped me up though when I received the following error message:

sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 4 supplied.

Continue reading Python, tuples, sequences, and parameterized SQL queries

AttributeError: module ‘paramiko’ has no attribute ‘SSHClient’

I have a simple Python 3 script (I’m running Python 3.6.1, compiled from source) that does the following 3 things:

  1. Connects to remote server(s) and using scp to get files
  2. Processes the files.
  3. Connects to another remote server and uploads the processed files.

I moved this script from a soon-to-be-retired CentOS 6 host to a CentOS 7 host, and when I ran it there I received the following error message:

AttributeError: module 'paramiko' has no attribute 'SSHClient'

The line number specified in the stack trace led me to:

ssh = paramiko.SSHClient()

First things first: Google the error. Someone else has seen this before. Sure enough, on StackOverflow I found Paramiko: Module object has no attribute error ‘SSHClient’

But no, that’s not the problem I’m having.

I tried to create the the simplest possible script that would reproduce the error:

#!/usr/bin/python3

import paramiko

def main():
        ssh = paramiko.SSHClient()

if __name__ == "__main__":
        main()

I ran it as a super-user and received no errors:

$ sudo /usr/bin/python3 test.py

I ran it as myself, though, and it reproduced the error message. Maybe something with the user permissions?

$ ls -l /usr/local/lib/python3.6/site-packages/
total 1168
...
drwxr-x---.  3 root root   4096 May 29 14:25 paramiko
...

Oh! From that you can see that unless you are the root user, or a member of the root group, there’s no way you can even see the files within the paramiko directory.

What’s the default umask on the system?

$ umask
0027

That explains it. Now, to fix it. I could probably just run:

$ sudo chmod -R 0755 /usr/local/lib/python3.6/site-packages/*

That should have been the end of that, problem solved! But in my case, installation of the pip modules had been handled by Ansible. I needed to fix the Ansible tasks to account for restrictive umask settings on future deployments. See the umask parameter in the documentation for the Ansible pip module. I updated the task:

- name: Install specified python requirements
  pip:
    executable: /usr/local/bin/pip3
    requirements: /tmp/pip3-packages
    umask: 0022

Running the playbook with that task, I received an error:

fatal: [trinculo.osric.net]: FAILED! => {"changed": false, "details": "invalid literal for int() with base 8: '18'", "msg": "umask must be an octal integer"}

Another helpful StackOverflow post suggested the value needed to be in quotes:

- name: Install specified python requirements
  pip:
    executable: /usr/local/bin/pip3
    requirements: /tmp/pip3-packages
    umask: "0022"

Now the playbook runs without error, but it doesn’t change the existing permissions. The task does nothing, since the Python pip modules are already installed. To really test the playbook, I need to clear out the existing modules first.

Warning: this breaks things!:

$ sudo rm -rf /usr/local/lib/python3.6/site-packages/*

I tried running the playbook again and received this error message:

stderr: Traceback (most recent call last):\n  File "/usr/local/bin/pip3", line 7, in <module>\n    from pip import main\nModuleNotFoundError: No module named 'pip'\n

I wasn’t able to run pip at all:

$ pip3
Traceback (most recent call last):
  File "/usr/local/bin/pip3", line 7, in <module>
    from pip import main
ModuleNotFoundError: No module named 'pip'

Clearly, I had deleted something important! I reinstalled Python from the gzipped source tarball for Python 3.6.1 (newer versions available from Python source releases) and then everything worked as expected.

Python Flask and VirtualBox networking

I had been using the Python socket module to create a very basic client-server for testing purposes, but soon I wanted to have something slightly more standard, like an HTTP server. I decided to try the Python Flask framework.

First I set up a Flask server on a CentOS 7 Linux VM running on VirtualBox:

# yum install python-pip
# pip install Flask
# mkdir flask-server && cd flask-server

I created the hello.py file as described on the Flask homepage:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

Likewise, I started running Flask:

# FLASK_APP=hello.py flask run
 * Serving Flask app "hello"
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Then I set up port forwarding in VirtualBox on my desktop host so that I could communicate with the virtual machine, using the following settings:

Name: flask
Protocol: TCP
Host IP: 127.0.0.1
Host Port: 9500
Guest IP: 10.0.2.16
Guest Port: 5000

VirtualBox port forwarding rules
VirtualBox port forwarding rules

I tested it in a browser (Firefox) on my desktop at http://127.0.0.1:9500/

No connection. Firefox endlessly tries to load the file.

I tried from the local machine itself:

# curl http://localhost:5000/
Hello World!

I tried running tcpdump to see what the network traffic to that port looked like:

# tcpdump -n -i enp0s3 port 5000
...
14:54:11.938625 IP 10.0.2.2.63923 > 10.0.2.16.commplex-main: Flags [S], seq 3067208705, win 65535, options [mss 1460], length 0
...

Over and over I saw the same SYN packet from the client host, but the server never replied with a SYN-ACK.

I also noted that the local port was labeled commplex-main. This label is from /etc/services:

# grep commplex /etc/services
commplex-main   5000/tcp                #
commplex-main   5000/udp                #
commplex-link   5001/tcp                #
commplex-link   5001/udp                #

I don’t know what commplex-main is, but since I’m not running anything else on port 5000 other than Flask, it shouldn’t matter.

It turned out there were 2 separate problems:

  1. Flask was listening only on localhost
  2. firewalld was blocking the requests from external hosts

To fix the first, run Flask with the host flag:

# FLASK_APP=hello.py flask run --host=0.0.0.0
 * Serving Flask app "hello"
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

(This is mentioned in the Flask Quickstart guide, under Externally Visible Server.)

You can also specify an alternative port, e.g.:

# FLASK_APP=hello.py flask run --host=0.0.0.0 --port=56789
 * Serving Flask app "hello"
 * Running on http://0.0.0.0:56789/ (Press CTRL+C to quit)

To fix the latter temporarily, I disabled firewalld:

systemctl stop firewalld
systemctl disable firewalld

Obviously, if you are dealing with a machine connected directly to the Internet, this would be a terrible solution. You’d want to add rules allowing only the hosts and ports from which you expect to receive connections. But for testing communications between my desktop and a virtual host running on it, this seemed like a quick solution.

After those 2 changes, I was able to load the sample “hello” Flask app in a browser:

The text "Hello World!" loaded in Firefox
The text “Hello World!” loaded in Firefox

Analyzing text to find common terms using Python and NLTK

I just recently started playing with the Python NLTK (Natural Language ToolKit) to analyze text. The book Natural Language Processing with Python is available online and is very helpful if you’re just getting started.

At the beginning of the book the examples cover importing and analyzing text (primarily books) that you import from nltk (Getting Started with NLTK). It includes texts like Moby-Dick and Sense and Sensibility.

But you will probably want to analyze a source of your own. For example, I had text from a series of tweets debating political issues. The third chapter (Accessing Text from the Web and from Disk) has the answers:

First you need to turn raw text into tokens:

tokens = word_tokenize(raw)

Next turn your tokens into NLTK text:

text = nltk.Text(tokens)

Now you can treat it like the book examples in chapter 1.

I was analyzing a number number of tweets. One of the things I wanted to do was find common words in the tweets, to see if there were particular keywords that were common.

I was using the Python interpreter for my tests, and I did run into a couple errors with word_tokenize and later FreqDist, such as:

>>> fdist1 = FreqDist(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'FreqDist' is not defined

You can address this by importing the specific libraries:

>>> from nltk import FreqDist

Here are the commands, in order, that I ran to produce my list of common words — in this case, I was looking for words that appeared at least 3 times and that were at least 5 characters long:

>>> import nltk
>>> from nltk import word_tokenize
>>> from nltk import FreqDist

>>> with open("corpus-twitter", "r") as myfile:
...     raw = myfile.read().decode("utf8")

>>> tokens = word_tokenize(raw)
>>> text = nltk.Text(tokens)

>>> fdist = FreqDist(text)
>>> sorted(w for w in set(text) if len(w) >= 5 and fdist[w] >= 3)

[u'Americans', u'Detroit', u'Please', u'TaxReform', u'Thanks', u'There', u'Trump', u'about', u'against', u'always', u'anyone', u'argument', u'because', u'being', u'believe', u'context', u'could', u'debate', u'defend', u'diluted', u'dollars', u'enough', u'every', u'going', u'happened', u'heard', u'human', u'ideas', u'immigration', u'indefensible', u'logic', u'never', u'opinion', u'people', u'point', u'pragmatic', u'problem', u'problems', u'proposed', u'public', u'question', u'really', u'restricting', u'right', u'saying', u'school', u'scope', u'serious', u'should', u'solution', u'still', u'talking', u'their', u'there', u'think', u'thinking', u'thread', u'times', u'truth', u'trying', u'tweet', u'understand', u'until', u'welfare', u'where', u'world', u'would', u'wrong', u'years', u'yesterday']

It turns out the results weren’t as interesting as I’d hoped. A few interesting items–Detroit for example–but most of the words aren’t surprising given I was looking at tweets around political debate. Perhaps with a larger corpus there would be more stand-out words.

Twitter Status IDs and Direct Message IDs

twitter-birdI recently created a Magic Eight Ball twitter-bot as a demo. Written in Python using the python-twitter API wrapper, it runs every 2 minutes and polls twitter for new replies (status updates containing @osric8ball) and direct messages (DMs) to osric8ball. If there are any, it replies with a random 8-Ball response.

Every status update and DM has an associated numeric ID. Initially, I stored the highest ID in a log file and used that when I polled twitter (i.e. “retrieve all replies and DMs with ID > highest ID”). However, I discovered that status updates and DMs apparently are stored in separate tables on twitter’s backend, as they have a separate set of IDs. Since the highest status ID was an order of magnitude larger than the highest DM ID, my bot completely ignored all DMs. This was not entirely obvious at first, as the IDs looked very similar, other than an extra digit: 2950029179 and 273876291.

My fix for this was to store both the highest status update ID and the highest DM ID is separate log files.

Another interesting twist: you have to be a follower of a user in order to send that user a DM. Continue reading Twitter Status IDs and Direct Message IDs