| 1 | +<!DOCTYPE qhelp PUBLIC |
| 2 | + "-//Semmle//qhelp//EN" |
| 3 | + "qhelp.dtd"> |
| 4 | +<qhelp> |
| 5 | + |
| 6 | +<overview> |
| 7 | +<p> |
| 8 | +It is possible to match some single HTML tags using regular expressions (parsing general HTML using |
| 9 | +regular expressions is impossible). However, if the regular expression is not written carefully, an |
| 10 | +attacker might be able to circumvent it, which can lead to cross-site scripting or other security issues. |
| 11 | +</p> |
| 12 | +<p> |
| 13 | +Some of these mistakes arise because browsers have very forgiving HTML parsers and |
| 14 | +will often render invalid HTML containing syntax errors. |
| 15 | +Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors. |
| 16 | +</p> |
| 17 | +</overview> |
| 18 | + |
| 19 | +<recommendation> |
| 20 | +<p> |
| 21 | +Use a well-tested sanitization or parser library if at all possible. These libraries are much more |
| 22 | +likely to handle corner cases correctly than a custom implementation. |
| 23 | +</p> |
| 24 | +</recommendation> |
| 25 | + |
| 26 | +<example> |
| 27 | +<p> |
| 28 | +The following example attempts to filter out all <code><script></code> tags. |
| 29 | +</p> |
| 30 | + |
| 31 | +<sample src="examples/BadTagFilter.rb" /> |
| 32 | + |
| 33 | +<p> |
| 34 | +The above sanitizer does not filter out all <code><script></code> tags. |
| 35 | +Browsers accept not only <code></script></code> as a script end tag, but also tags such as <code></script foo="bar"></code>, even though the latter is a parse error. |
| 36 | +This means that an attack string such as <code><script>alert(1)</script foo="bar"></code> will not be filtered by |
| 37 | +the function, and <code>alert(1)</code> will be executed by a browser if the string is rendered as HTML. |
| 38 | +</p> |
| 39 | + |
| 40 | +<p> |
| 41 | +Other corner cases include HTML comments that end with <code>--!></code> |
| 42 | +and HTML tag names that contain uppercase characters. |
| 43 | +</p> |
| 44 | +</example> |
| 45 | + |
| 46 | +<references> |
| 47 | +<li>Securitum: <a href="https://research.securitum.com/the-curious-case-of-copy-paste/">The Curious Case of Copy & Paste</a>.</li> |
| 48 | +<li>stackoverflow.com: <a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454">You can't parse [X]HTML with regex</a>.</li> |
| 49 | +<li>HTML Standard: <a href="https://html.spec.whatwg.org/multipage/parsing.html#comment-end-bang-state">Comment end bang state</a>.</li> |
| 50 | +<li>stackoverflow.com: <a href="https://stackoverflow.com/questions/25559999/why-arent-browsers-strict-about-html">Why aren't browsers strict about HTML?</a>.</li> |
| 51 | +</references> |
| 52 | +</qhelp> |
| 53 | + |
| 54 | + |
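The sample file `examples/BadTagFilter.rb` is not shown in this diff, so the following is only a minimal sketch of the kind of filter the help text describes, not the file's actual contents. The function name `filter_script_tags` and the exact regular expression are assumptions made for illustration. The sketch shows how a filter that expects a literal `</script>` end tag leaves the attack string from the help text untouched, and how an uppercase tag name slips through a case-sensitive pattern.

```ruby
# Hypothetical stand-in for the filter described in the help text.
# It removes <script ...>...</script> blocks, repeating until the
# string stops changing so that nested payloads are also removed.
def filter_script_tags(html)
  previous = nil
  until previous == html
    previous = html
    # Weakness 1: the end tag must be exactly "</script>".
    # Weakness 2: the match is case-sensitive.
    html = html.gsub(/<script[^>]*>.*?<\/script>/m, "")
  end
  html
end

benign = "<p>hi</p><script>alert(1)</script>"
attack = '<script>alert(1)</script foo="bar">'
upper  = "<SCRIPT>alert(1)</SCRIPT>"

puts filter_script_tags(benign)  # the script block is removed
puts filter_script_tags(attack)  # returned unchanged: end tag has an attribute
puts filter_script_tags(upper)   # returned unchanged: tag name is uppercase
```

Patching the pattern for these two cases would still leave other corner cases open, which is why the recommendation above favors a well-tested sanitization or parser library over a custom regular expression.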