{"id":3137,"date":"2019-09-21T18:42:05","date_gmt":"2019-09-21T23:42:05","guid":{"rendered":"http:\/\/osric.com\/chris\/accidental-developer\/?p=3137"},"modified":"2019-09-21T18:42:24","modified_gmt":"2019-09-21T23:42:24","slug":"python-flask-escaping-html-strings-the-markup-class","status":"publish","type":"post","link":"https:\/\/osric.com\/chris\/accidental-developer\/2019\/09\/python-flask-escaping-html-strings-the-markup-class\/","title":{"rendered":"Python Flask, escaping HTML strings, and the Markup class"},"content":{"rendered":"<p>As in the previous post, I had created a simple web app using Python Flask to use as a teaching tool. The purpose was to demonstrate SQL injection and XSS (cross-site scripting) vulnerabilities and how to remediate them.<\/p>\n<p>In this case, the remediation step for XSS (escaping output) tripped me up. I tried this:<\/p>\n<pre><code>return '&lt;p&gt;You searched for: ' + escape(user_input) + '&lt;\/p&gt;'<\/code><\/pre>\n<p>I expected it to escape only the <code>user_input<\/code> variable, but instead it escaped all the HTML, returning this:<\/p>\n<pre><code>&amp;lt;p&amp;gt;You searched for: &amp;lt;script&amp;gt;alert(1)&amp;lt;\/script&amp;gt;&amp;lt;\/p&amp;gt;<\/code><\/pre>\n<p><!--more--><br \/>\n(Just want possible solutions? Scroll to the bottom. Otherwise, on to the&#8230;.)<\/p>\n<p><strong>Details<\/strong><\/p>\n<p>The reason is that <a href=\"https:\/\/flask.palletsprojects.com\/en\/1.1.x\/api\/#flask.escape\">flask.escape()<\/a> returns a <code>Markup<\/code> object, not a plain string.<\/p>\n<p>Both <code>Markup<\/code> and <code>escape<\/code> are imported into Flask from <a href=\"https:\/\/jinja.palletsprojects.com\/en\/2.10.x\/api\/#jinja2.Markup\">Jinja2<\/a>:<\/p>\n<pre><code>from jinja2 import escape\r\nfrom jinja2 import Markup<\/code><\/pre>\n<p>Jinja2, in turn, imports both from the <a href=\"https:\/\/github.com\/pallets\/markupsafe\">MarkupSafe<\/a> package. 
<\/p>\n<p><code>Markup<\/code> is a subclass of <code>text_type<\/code> (which is essentially <code>unicode<\/code> on Python 2 and <code>str<\/code> on Python 3).<\/p>\n<p>The <code>Markup<\/code> class defines <code>__add__<\/code> and <code>__radd__<\/code> methods, which determine what happens when the <code>+<\/code> operator is applied to a <code>Markup<\/code> object (see <a href=\"https:\/\/docs.python.org\/3\/reference\/datamodel.html#emulating-numeric-types\">Emulating numeric types<\/a>). In this case, those methods check whether the other operand is a string or has an <code>__html__<\/code> method and, if so, pass it through <code>escape<\/code> and return the combined result as a <code>Markup<\/code> object:<\/p>\n<pre><code>def __add__(self, other):\r\n    if isinstance(other, string_types) or hasattr(other, \"__html__\"):\r\n        return self.__class__(super(Markup, self).__add__(self.escape(other)))\r\n    return NotImplemented\r\n\r\ndef __radd__(self, other):\r\n    if hasattr(other, \"__html__\") or isinstance(other, string_types):\r\n        return self.escape(other).__add__(self)\r\n    return NotImplemented<\/code><\/pre>\n<p>(From the source code at <a href=\"https:\/\/github.com\/pallets\/markupsafe\/blob\/master\/src\/markupsafe\/__init__.py\">src\/markupsafe\/__init__.py<\/a>)<\/p>\n<p><strong>Surprising Results?<\/strong><\/p>\n<p>At first that seemed to me to violate the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Principle_of_least_astonishment\">Principle Of Least Astonishment<\/a>. But I realized I didn&#8217;t know what Python would do if I created a custom string class and added a plain-old string to an object of that class. I decided to try it with <code>collections.UserString<\/code>:<\/p>\n<pre><code>&gt;&gt;&gt; import collections\r\n&gt;&gt;&gt; class MyStringType(collections.UserString):\r\n...     pass\r\n... 
\r\n&gt;&gt;&gt; my_string1 = MyStringType(\"my test string\")\r\n&gt;&gt;&gt; string1 = \"plain-old string\"\r\n&gt;&gt;&gt; cat_string1 = my_string1 + string1\r\n&gt;&gt;&gt; cat_string1\r\n'my test stringplain-old string'\r\n&gt;&gt;&gt; type(my_string1)\r\n&lt;class '__main__.MyStringType'&gt;\r\n&gt;&gt;&gt; type(string1)\r\n&lt;class 'str'&gt;\r\n&gt;&gt;&gt; type(cat_string1)\r\n&lt;class '__main__.MyStringType'&gt;<\/code><\/pre>\n<p>Interesting: adding a plain-old string to an object of my <code>UserString<\/code> subclass produces an object of the subclass! (Note that <code>collections.UserString<\/code> is a wrapper around <code>str<\/code> rather than a true subclass; a direct subclass of <code>str<\/code> that does not override <code>__add__<\/code> would return a plain <code>str<\/code> here.) It turns out that <code>collections.UserString<\/code> implements <code>__add__<\/code> and <code>__radd__<\/code> methods similar to what we saw in <code>Markup<\/code>:<\/p>\n<pre><code>def __add__(self, other):\r\n    if isinstance(other, UserString):\r\n        return self.__class__(self.data + other.data)\r\n    elif isinstance(other, str):\r\n        return self.__class__(self.data + other)\r\n    return self.__class__(self.data + str(other))\r\ndef __radd__(self, other):\r\n    if isinstance(other, str):\r\n        return self.__class__(other + self.data)\r\n    return self.__class__(str(other) + self.data)<\/code><\/pre>\n<p>(From the source code at <a href=\"https:\/\/github.com\/python\/cpython\/blob\/master\/Lib\/collections\/__init__.py\">cpython\/Lib\/collections\/__init__.py<\/a>)<\/p>\n<p><strong>Solutions<\/strong><\/p>\n<p>There are several ways to combine the escaped string (a <code>Markup<\/code> object) with trusted HTML without escaping the trusted HTML as well. 
The following is by no means an exhaustive list:<\/p>\n<p><strong>Solution 1: str.format<\/strong><\/p>\n<pre><code>&gt;&gt;&gt; from flask import escape\r\n&gt;&gt;&gt; user_input = '&lt;script&gt;alert(1);&lt;\/script&gt;'\r\n&gt;&gt;&gt; html_output = '&lt;p&gt;You searched for: {}&lt;\/p&gt;'.format(escape(user_input))\r\n&gt;&gt;&gt; html_output\r\n'&lt;p&gt;You searched for: &amp;lt;script&amp;gt;alert(1);&amp;lt;\/script&amp;gt;&lt;\/p&gt;'<\/code><\/pre>\n<p><strong>Solution 2: printf-style string formatting<\/strong><\/p>\n<pre><code>&gt;&gt;&gt; from flask import escape\r\n&gt;&gt;&gt; user_input = '&lt;script&gt;alert(1);&lt;\/script&gt;'\r\n&gt;&gt;&gt; html_output = '&lt;p&gt;You searched for: %s&lt;\/p&gt;' % escape(user_input)\r\n&gt;&gt;&gt; html_output\r\n'&lt;p&gt;You searched for: &amp;lt;script&amp;gt;alert(1);&amp;lt;\/script&amp;gt;&lt;\/p&gt;'<\/code><\/pre>\n<p><strong>Solution 3: cast the <code>Markup<\/code> object to a <code>str<\/code> object:<\/strong><\/p>\n<pre><code>&gt;&gt;&gt; from flask import escape\r\n&gt;&gt;&gt; user_input = '&lt;script&gt;alert(1);&lt;\/script&gt;'\r\n&gt;&gt;&gt; html_output = '&lt;p&gt;You searched for: ' + str(escape(user_input)) + '&lt;\/p&gt;'\r\n&gt;&gt;&gt; html_output\r\n'&lt;p&gt;You searched for: &amp;lt;script&amp;gt;alert(1);&amp;lt;\/script&amp;gt;&lt;\/p&gt;'<\/code><\/pre>\n<p><strong>Solution 4: Create a <code>Markup<\/code> object for the trusted HTML:<\/strong><\/p>\n<pre><code>&gt;&gt;&gt; from flask import escape, Markup\r\n&gt;&gt;&gt; user_input = '&lt;script&gt;alert(1);&lt;\/script&gt;'\r\n&gt;&gt;&gt; html_output = Markup('&lt;p&gt;You searched for: ') + escape(user_input) + Markup('&lt;\/p&gt;')\r\n&gt;&gt;&gt; html_output\r\nMarkup('&lt;p&gt;You searched for: &amp;lt;script&amp;gt;alert(1);&amp;lt;\/script&amp;gt;&lt;\/p&gt;')<\/code><\/pre>\n<p>(A <code>Markup<\/code> object by default trusts the text passed to it on instantiation and does not escape it.)<\/p>\n<p>A more likely 
solution, in practice, would be to use Jinja2 templates. Jinja2 does not enable autoescaping by default, but Flask configures it to do so:<\/p>\n<blockquote><p>Flask configures Jinja2 to automatically escape all values unless explicitly told otherwise. This should rule out all XSS problems caused in templates&#8230;.<\/p><\/blockquote>\n<p>(from <a href=\"https:\/\/flask.palletsprojects.com\/en\/1.1.x\/security\/#cross-site-scripting-xss\">Security Considerations &#8212; Flask Documentation (1.1.x): Cross-Site Scripting<\/a>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As in the previous post, I had created a simple web app using Python Flask to use as a teaching tool. The purpose was to demonstrate SQL injection and XSS (cross-site scripting) vulnerabilities and how to remediate them. In this case, the remediation step for XSS (escaping output) tripped me up. I tried this: return &hellip; <a href=\"https:\/\/osric.com\/chris\/accidental-developer\/2019\/09\/python-flask-escaping-html-strings-the-markup-class\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Python Flask, escaping HTML strings, and the Markup 
class<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[86],"tags":[492,469,358,72],"class_list":["post-3137","post","type-post","status-publish","format-standard","hentry","category-python","tag-flask","tag-jinja2","tag-python","tag-xss"],"_links":{"self":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/3137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/comments?post=3137"}],"version-history":[{"count":7,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/3137\/revisions"}],"predecessor-version":[{"id":3144,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/3137\/revisions\/3144"}],"wp:attachment":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/media?parent=3137"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/categories?post=3137"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/tags?post=3137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}