Python Flask, escaping HTML strings, and the Markup class

As described in the previous post, I had created a simple Python Flask web app to use as a teaching tool. The purpose was to demonstrate SQL injection and XSS (cross-site scripting) vulnerabilities and how to remediate them.

In this case, the remediation step for XSS (escaping output) tripped me up. I tried this:

return '<p>You searched for: ' + escape(user_input) + '</p>'

With user_input set to <script>alert(1)</script>, I expected only user_input to be escaped, but instead all of the HTML was escaped, returning this:

&lt;p&gt;You searched for: &lt;script&gt;alert(1)&lt;/script&gt;&lt;/p&gt;


(Just want possible solutions? Scroll to the bottom. Otherwise, on to the….)

Details

The reason is that flask.escape() returns a Markup object, not a plain str.

Both Markup and escape are imported into Flask from Jinja2:

from jinja2 import escape
from jinja2 import Markup

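Jinja2, in turn, imports them from the MarkupSafe package. (Newer Jinja2 releases, 3.1 and later, no longer re-export these names, and Flask has deprecated its own re-exports, so current code should import escape and Markup from markupsafe directly.)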

Markup is a subclass of text_type, which is essentially unicode on Python 2 and str on Python 3.

The Markup class defines __add__ and __radd__ methods, which control what happens when a Markup object is used with the + operator (see Emulating numeric types in the Python data model documentation). Here, those methods check whether the other operand is a string or has an __html__ method, and if so, escape it and return the combined result as a new Markup object:

def __add__(self, other):
    if isinstance(other, string_types) or hasattr(other, "__html__"):
        return self.__class__(super(Markup, self).__add__(self.escape(other)))
    return NotImplemented

def __radd__(self, other):
    if hasattr(other, "__html__") or isinstance(other, string_types):
        return self.escape(other).__add__(self)
    return NotImplemented

(From the source code at src/markupsafe/__init__.py)
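One step the listing doesn't show is why '<p>...' + escape(...) reaches Markup.__radd__ at all, since str already knows how to concatenate with any str (and Markup is a str). Python's rule is that when the right operand's type is a subclass of the left operand's type and provides its own reflected method, that reflected method is tried first. The following is a minimal sketch of the mechanism, assuming a hypothetical ToyMarkup class built on the standard library's html.escape; it is a simplified illustration, not MarkupSafe's actual implementation (which, for example, encodes quotes differently).

```python
import html


class ToyMarkup(str):
    """Hypothetical, simplified stand-in for markupsafe.Markup (not the real class)."""

    @classmethod
    def escape(cls, value):
        # Objects that can render themselves as HTML are treated as already safe.
        if hasattr(value, "__html__"):
            return cls(value.__html__())
        # Everything else is converted to str and HTML-escaped.
        return cls(html.escape(str(value)))

    def __html__(self):
        return str(self)

    def __add__(self, other):
        # ToyMarkup + other: escape the *other* operand and keep the result "safe".
        if isinstance(other, str) or hasattr(other, "__html__"):
            return self.__class__(str.__add__(self, self.escape(other)))
        return NotImplemented

    def __radd__(self, other):
        # other + ToyMarkup: Python calls this before str's own concatenation
        # because ToyMarkup is a str subclass that defines the reflected method.
        if isinstance(other, str) or hasattr(other, "__html__"):
            return self.escape(other).__add__(self)
        return NotImplemented


user_input = "<script>alert(1)</script>"

# Reproduces the surprise: both plain-str literals get escaped as well.
broken = "<p>You searched for: " + ToyMarkup.escape(user_input) + "</p>"
print(broken)
# &lt;p&gt;You searched for: &lt;script&gt;alert(1)&lt;/script&gt;&lt;/p&gt;

# Wrapping the trusted HTML in ToyMarkup marks it as safe, so only user_input is escaped.
fixed = ToyMarkup("<p>You searched for: ") + ToyMarkup.escape(user_input) + ToyMarkup("</p>")
print(fixed)
# <p>You searched for: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The first result matches the escaped output from the original Flask code, which suggests the reflected-method priority, not anything Flask-specific, is what pulls the plain strings into escaping.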

Surprising Results?

At first, that seemed to me to violate the Principle of Least Astonishment. But I realized I didn't know what Python would do if I created my own string type and added a plain-old string to an instance of it. I decided to try it, using collections.UserString (which wraps a str rather than subclassing it):

>>> import collections
>>> class MyStringType(collections.UserString):
...     pass
... 
>>> my_string1 = MyStringType("my test string")
>>> string1 = "plain-old string"
>>> cat_string1 = my_string1 + string1
>>> cat_string1
'my test stringplain-old string'
>>> type(my_string1)
<class '__main__.MyStringType'>
>>> type(string1)
<class 'str'>
>>> type(cat_string1)
<class '__main__.MyStringType'>

Interesting: the result of adding an instance of a custom string type and a plain-old string is an instance of the custom type! It turns out that collections.UserString implements __add__ and __radd__ methods similar to the ones we saw in Markup:

def __add__(self, other):
    if isinstance(other, UserString):
        return self.__class__(self.data + other.data)
    elif isinstance(other, str):
        return self.__class__(self.data + other)
    return self.__class__(self.data + str(other))

def __radd__(self, other):
    if isinstance(other, str):
        return self.__class__(other + self.data)
    return self.__class__(str(other) + self.data)

(From the source code at cpython/Lib/collections/__init__.py)
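For contrast, here is the same experiment with a genuine subclass of str that does not override __add__ or __radd__, alongside a UserString subclass like the one in the REPL session. The class names MyStr and MyUserString are chosen here for illustration. Because str's built-in concatenation always produces an exact str, the subclass type is lost in both operand orders unless the class overrides the operator methods, as UserString and Markup both do.

```python
from collections import UserString


class MyStr(str):
    """A true str subclass that does not override __add__ or __radd__."""


class MyUserString(UserString):
    """A UserString subclass (UserString itself is not a str subclass)."""


# str's built-in concatenation returns an exact str,
# so the subclass type is dropped in both operand orders.
a = MyStr("my test string") + "plain-old string"
b = "plain-old string" + MyStr("my test string")
print(type(a).__name__, type(b).__name__)  # str str

# UserString's __add__/__radd__ rebuild the result with self.__class__.
c = MyUserString("my test string") + "plain-old string"
d = "plain-old string" + MyUserString("my test string")
print(type(c).__name__, type(d).__name__)  # MyUserString MyUserString
```

So the "sticky" type is a deliberate design choice in those classes rather than default behavior for string subclasses.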

Solutions

There are several ways to combine the escaped user input (a Markup object) with trusted HTML without the trusted HTML being escaped as well. The following list is by no means exhaustive:

Solution 1: str.format

>>> from flask import escape
>>> user_input = '<script>alert(1);</script>'
>>> html_output = '<p>You searched for: {}</p>'.format(escape(user_input))
>>> html_output
'<p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>'

Solution 2: printf-style string formatting

>>> from flask import escape
>>> user_input = '<script>alert(1);</script>'
>>> html_output = '<p>You searched for: %s</p>' % escape(user_input)
>>> html_output
'<p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>'

Solution 3: convert the Markup object to a str

>>> from flask import escape
>>> user_input = '<script>alert(1);</script>'
>>> html_output = '<p>You searched for: ' + str(escape(user_input)) + '</p>'
>>> html_output
'<p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>'

Solution 4: wrap the trusted HTML in Markup objects

>>> from flask import escape, Markup
>>> user_input = '<script>alert(1);</script>'
>>> html_output = Markup('<p>You searched for: ') + escape(user_input) + Markup('</p>')
>>> html_output
Markup('<p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>')

(The Markup constructor trusts the text passed to it and does not escape it, so only use it for HTML you control.)
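If a plain str is all you need, another option outside Flask and MarkupSafe is the standard library's html.escape, which returns a str rather than a Markup object, so ordinary concatenation leaves the surrounding HTML alone. This is a swapped-in alternative rather than one of the Flask-specific approaches; note that it encodes quotes differently from MarkupSafe (for example &quot; rather than &#34;), and because the result is an ordinary str, nothing marks it as safe or unsafe for later code.

```python
import html

user_input = "<script>alert(1);</script>"

# html.escape returns a plain str (quote=True by default also escapes " and '),
# so concatenating it with trusted HTML does not re-escape that HTML.
escaped = html.escape(user_input)
html_output = "<p>You searched for: " + escaped + "</p>"
print(html_output)
# <p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>
```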

In practice, a more likely solution would be to use Jinja2 templates. Jinja2 does not escape values automatically by default, but Flask turns autoescaping on for templates ending in .html, .htm, .xml, and .xhtml, and for strings passed to render_template_string():

Flask configures Jinja2 to automatically escape all values unless explicitly told otherwise. This should rule out all XSS problems caused in templates….

(from Security Considerations — Flask Documentation (1.1.x): Cross-Site Scripting)
