As in the previous post, I had created a simple web app using Python Flask to use as a teaching tool. The purpose was to demonstrate SQL injection and XSS (cross-site scripting) vulnerabilities and how to remediate them.
In this case, the remediation step for XSS (escaping output) tripped me up. I tried this:
return '<p>You searched for: ' + escape(user_input) + '</p>'
I expected it to escape only the user_input variable, but instead it escaped all the HTML, returning this:
&lt;p&gt;You searched for: &lt;script&gt;alert(1)&lt;/script&gt;&lt;/p&gt;
(Just want possible solutions? Scroll to the bottom. Otherwise, on to the….)
Details
The reason for this is that flask.escape() returns a Markup object, not a plain string.
Both Markup and escape are imported into Flask from Jinja2:
from jinja2 import escape
from jinja2 import Markup
Both of these, in turn, come from the MarkupSafe package (the markupsafe module).
Markup is a subclass of text_type (which is essentially either unicode or str, depending on whether you are using Python 2 or Python 3).
The Markup class contains __add__ and __radd__ methods that handle the behavior when we apply the + operator (see Emulating numeric types). In this case, those methods check whether the other operand is a string or provides an __html__ method, and if so, escape it and return the combined result as a Markup object:
def __add__(self, other):
    if isinstance(other, string_types) or hasattr(other, "__html__"):
        return self.__class__(super(Markup, self).__add__(self.escape(other)))
    return NotImplemented

def __radd__(self, other):
    if hasattr(other, "__html__") or isinstance(other, string_types):
        return self.escape(other).__add__(self)
    return NotImplemented
(From the source code at src/markupsafe/__init__.py)
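To see the mechanism in isolation, here is a minimal sketch using only the standard library. SafeText is a hypothetical, simplified stand-in for Markup (it is not MarkupSafe's actual implementation, and it uses html.escape rather than MarkupSafe's own escaping), but its __add__ and __radd__ follow the same pattern as the methods quoted above:

```python
import html


class SafeText(str):
    """Hypothetical, simplified stand-in for markupsafe.Markup."""

    @classmethod
    def escape(cls, value):
        # Text that is already SafeText is trusted and passed through as-is.
        if isinstance(value, cls):
            return value
        return cls(html.escape(str(value)))

    def __add__(self, other):
        # SafeText + str: escape the right-hand operand, keep the subclass.
        if isinstance(other, str):
            return self.__class__(str.__add__(self, self.escape(other)))
        return NotImplemented

    def __radd__(self, other):
        # str + SafeText: escape the left-hand operand, then concatenate.
        if isinstance(other, str):
            return self.escape(other).__add__(self)
        return NotImplemented


user_input = "<script>alert(1)</script>"
result = "<p>You searched for: " + SafeText.escape(user_input) + "</p>"

print(type(result).__name__)  # SafeText
print(result)
# &lt;p&gt;You searched for: &lt;script&gt;alert(1)&lt;/script&gt;&lt;/p&gt;
```

Because SafeText is a subclass of str and defines its own __radd__, Python tries that reflected method before str.__add__, which is why even the plain string on the left-hand side ends up escaped.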
Surprising Results?
At first that seemed to me to violate the Principle Of Least Astonishment. But I realized I didn’t know what Python would do if I created my own string class and added a plain-old string to an object of that class. I decided to try it with a subclass of collections.UserString:
>>> import collections
>>> class MyStringType(collections.UserString):
...     pass
...
>>> my_string1 = MyStringType("my test string")
>>> string1 = "plain-old string"
>>> cat_string1 = my_string1 + string1
>>> cat_string1
'my test stringplain-old string'
>>> type(my_string1)
<class '__main__.MyStringType'>
>>> type(string1)
<class 'str'>
>>> type(cat_string1)
<class '__main__.MyStringType'>
Interesting, the result of adding an object of my UserString subclass and a plain-old string is an object of the subclass! (Strictly speaking, collections.UserString wraps a str rather than inheriting from it.) It turns out, the collections.UserString class implements __add__ and __radd__ methods similar to what we saw in Markup:
def __add__(self, other):
    if isinstance(other, UserString):
        return self.__class__(self.data + other.data)
    elif isinstance(other, str):
        return self.__class__(self.data + other)
    return self.__class__(self.data + str(other))

def __radd__(self, other):
    if isinstance(other, str):
        return self.__class__(other + self.data)
    return self.__class__(str(other) + self.data)
(From the source code at cpython/Lib/collections/__init__.py)
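For comparison, here is a small sketch of what happens with a class that inherits directly from str and overrides nothing. The class names PlainSubclass and WrappedString are made up for this illustration:

```python
import collections


class PlainSubclass(str):
    """A direct subclass of str that overrides nothing."""


class WrappedString(collections.UserString):
    """A UserString subclass, like the one in the experiment above."""


from_subclass = PlainSubclass("my test string") + "plain-old string"
from_wrapper = WrappedString("my test string") + "plain-old string"

print(type(from_subclass).__name__)         # str
print(type(from_wrapper).__name__)          # WrappedString
print(isinstance(WrappedString("x"), str))  # False
```

A bare str subclass falls back to str.__add__, which returns a plain str. The result only keeps the custom type when the class defines its own __add__ and __radd__, as both UserString and Markup do.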
Solutions
There are several different ways to combine the escaped string (a Markup object) with trusted HTML without escaping the trusted HTML as well. The following is by no means an exhaustive list:
Solution 1: str.format
>>> from flask import escape
>>> user_input = '<script>alert(1);</script>'
>>> html_output = '<p>You searched for: {}</p>'.format(escape(user_input))
>>> html_output
'<p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>'
Solution 2: printf-style string formatting
>>> from flask import escape
>>> user_input = '<script>alert(1);</script>'
>>> html_output = '<p>You searched for: %s</p>' % escape(user_input)
>>> html_output
'<p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>'
Solution 3: cast the Markup object to a str object:
>>> from flask import escape
>>> user_input = '<script>alert(1);</script>'
>>> html_output = '<p>You searched for: ' + str(escape(user_input)) + '</p>'
>>> html_output
'<p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>'
Solution 4: Create a Markup object for the trusted HTML:
>>> from flask import escape, Markup
>>> user_input = '<script>alert(1);</script>'
>>> html_output = Markup('<p>You searched for: ') + escape(user_input) + Markup('</p>')
>>> html_output
Markup('<p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>')
(A Markup object by default trusts the text passed to it on instantiation and does not escape it.)
A more likely solution, in practice, would be to use Jinja2 templates. Although Jinja2 does not escape values automatically by default, Flask configures it to do so:
Flask configures Jinja2 to automatically escape all values unless explicitly told otherwise. This should rule out all XSS problems caused in templates….
(from Security Considerations — Flask Documentation (1.1.x): Cross-Site Scripting)
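As a rough sketch of the template approach, assuming Flask is installed: the /search route, the q parameter, and the inline template are invented for this example, and a real app would more likely keep the template in its own file and call render_template.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical inline template; {{ query }} is autoescaped by Flask's Jinja2 setup.
TEMPLATE = "<p>You searched for: {{ query }}</p>"


@app.route("/search")
def search():
    # The trusted HTML lives in the template; only the user's value is escaped.
    return render_template_string(TEMPLATE, query=request.args.get("q", ""))


# Exercise the route with Flask's built-in test client instead of a live server.
with app.test_client() as client:
    response = client.get("/search", query_string={"q": "<script>alert(1);</script>"})
    body = response.get_data(as_text=True)

print(body)
# <p>You searched for: &lt;script&gt;alert(1);&lt;/script&gt;</p>
```

Here the paragraph tags come through untouched while the user-supplied value is escaped, which is the result the string concatenation at the top of the post was meant to produce.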