1 | 1 | <!DOCTYPE qhelp PUBLIC
2 | | -"-//Semmle//qhelp//EN" |
3 | | -"qhelp.dtd"> |
| 2 | + "-//Semmle//qhelp//EN" |
| 3 | + "qhelp.dtd"> |
4 | 4 | <qhelp>
5 | 5 |
6 | | - <include src="IncompleteSanitization.qhelp" /> |
| 6 | +<overview> |
| 7 | +<p> |
| 8 | +Sanitizing untrusted input is a common technique for preventing injection attacks and other security |
| 9 | +vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the |
| 10 | +regular expression matches multiple consecutive characters, a single replacement pass |
| 11 | +can leave the unsafe text in the output, because removing one match can join the surrounding characters into a new match. |
| 12 | +</p> |
| 13 | +<p> |
| 14 | +Attackers can exploit this issue by crafting inputs that, when sanitized with an ineffective regular |
| 15 | +expression, still contain malicious code or content. This can lead to code execution, data exposure, |
| 16 | +or other vulnerabilities. |
| 17 | +</p> |
| 18 | +</overview> |
7 | 19 |
| 20 | +<recommendation> |
| 21 | +<p> |
| 22 | +To prevent this issue, it is highly recommended to use a well-tested sanitization library whenever |
| 23 | +possible. These libraries are more likely to handle corner cases and ensure effective sanitization. |
| 24 | +</p> |
| 25 | + |
| 26 | +<p> |
| 27 | +If a library is not an option, you can consider alternative strategies to fix the issue. For example, |
| 28 | +applying the regular expression replacement repeatedly until no more replacements can be performed, or rewriting the regular |
| 29 | +expression to match single characters instead of the entire unsafe text. |
| 30 | +</p> |
| 31 | +</recommendation> |
| 32 | + |
| 33 | +<example> |
| 34 | +<p> |
| 35 | +Consider the following JavaScript code that aims to remove all HTML comment start and end tags: |
| 36 | +</p> |
| 37 | + |
| 38 | +<sample language="javascript"> |
| 39 | +str.replace(/<!--|--!?>/g, ""); |
| 40 | +</sample> |
| 41 | + |
| 42 | +<p> |
| 43 | +Given the input string "<!<!---- comment ---->>", the output will be "<!-- comment -->", |
| 44 | +which still contains an HTML comment. |
| 45 | +</p> |
| 46 | + |
| 47 | +<p> |
| 48 | +One possible fix for this issue is to apply the regular expression replacement repeatedly until no |
| 49 | +more replacements can be performed. This ensures that the unsafe text does not reappear in the sanitized input, because the loop |
| 50 | +only stops once the pattern no longer matches: |
| 51 | +</p> |
| 52 | + |
| 53 | +<sample language="javascript"> |
| 54 | +function removeHtmlComments(input) { |
| 55 | + let previous; |
| 56 | + do { |
| 57 | + previous = input; |
| 58 | + input = input.replace(/<!--|--!?>/g, ""); |
| 59 | + } while (input !== previous); |
| 60 | + return input; |
| 61 | +} |
| 62 | +</sample> |
| 63 | +</example> |
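To see why one pass is not enough, it can help to record the string after every pass. The following is only an illustrative sketch: `tracePasses` and the sample input are hypothetical names and values chosen for this note, and only the regular expression is taken from the example above.

```javascript
// Illustrative sketch: `tracePasses` and the sample input are hypothetical;
// only the regular expression comes from the example above.
const COMMENT_DELIMITERS = /<!--|--!?>/g;

// Returns the input followed by the result of each replacement pass,
// stopping once a pass no longer changes the string.
function tracePasses(input) {
  const steps = [input];
  for (;;) {
    const current = steps[steps.length - 1];
    const next = current.replace(COMMENT_DELIMITERS, "");
    if (next === current) {
      return steps;
    }
    steps.push(next);
  }
}

const steps = tracePasses("<!<!---- comment ---->>");
console.log(steps);
// steps[1] is the result of a single pass and still contains a comment;
// the last element is the fixed point with no delimiters left.
```

The second element is what a single `replace` call would return, and the last element is what the repeat-until-stable loop returns.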
| 64 | + |
| 65 | +<example> |
| 66 | +<p> |
| 67 | +Another example is the following regular expression intended to remove script tags: |
| 68 | +</p> |
| 69 | + |
| 70 | +<sample language="javascript"> |
| 71 | +str.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/g, ""); |
| 72 | +</sample> |
| 73 | + |
| 74 | +<p> |
| 75 | +If the input string is "<scrip<script>is removed</script>t>alert(123)</script>", |
| 76 | +the output will be "<script>alert(123)</script>", which still contains a script tag. |
| 77 | +</p> |
| 78 | +<p> |
| 79 | +A fix for this issue is to rewrite the regular expression to match single characters |
| 80 | +("<" and ">") instead of the entire unsafe text. This simplifies the sanitization process |
| 81 | +and ensures that all potentially unsafe characters are removed: |
| 82 | +</p> |
| 83 | +<sample language="javascript"> |
| 84 | +function removeAllHtmlTags(input) { |
| 85 | + return input.replace(/<|>/g, ""); |
| 86 | +} |
| 87 | +</sample> |
| 88 | +<p> |
| 89 | +Another potential fix is to use the popular <code>sanitize-html</code> npm library. |
| 90 | +It keeps most of the safe HTML tags while removing all unsafe tags and attributes. |
| 91 | +</p> |
| 92 | +<sample language="javascript"> |
| 93 | +const sanitizeHtml = require("sanitize-html"); |
| 94 | +function removeAllHtmlTags(input) { |
| 95 | + return sanitizeHtml(input); |
| 96 | +} |
| 97 | +</sample> |
| 98 | + |
| 99 | +</example> |
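The difference between the tag-level and the character-level replacement can be checked side by side. This is a minimal sketch under the assumption that removing every angle bracket is acceptable for the application. The variable names are hypothetical; the regular expressions and the input string are the ones from the example above.

```javascript
// Illustrative sketch: variable names are hypothetical; the patterns and
// the input string are the ones used in the example above.
const SCRIPT_ELEMENT = /<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/g;
const ANGLE_BRACKETS = /<|>/g;

const crafted = "<scrip<script>is removed</script>t>alert(123)</script>";

// Removing the inner element joins "<scrip" and "t>" into a new script tag.
const afterTagRegex = crafted.replace(SCRIPT_ELEMENT, "");

// Removing single characters cannot create a new match, because deleting
// characters never produces a "<" or ">" that was not already in the input.
const afterCharRegex = crafted.replace(ANGLE_BRACKETS, "");

console.log(afterTagRegex);
console.log(/[<>]/.test(afterCharRegex)); // false: no angle brackets remain
```

The character-level version is lossy (legitimate uses of the angle brackets are removed as well), which is the trade-off for not needing repeated passes.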
| 100 | + |
| 101 | +<example> |
| 102 | +<p> |
| 103 | +Lastly, consider a path sanitizer using the regular expression <code>/\.\.\//</code>: |
| 104 | +</p> |
| 105 | + |
| 106 | +<sample language="javascript"> |
| 107 | +str.replace(/\.\.\//g, ""); |
| 108 | +</sample> |
| 109 | + |
| 110 | +<p> |
| 111 | +The regular expression attempts to strip out all occurrences of <code>../</code> from <code>str</code>. |
| 112 | +This will not work as expected: for the string <code>....//</code>, for example, it will remove the single |
| 113 | +occurrence of <code>../</code> in the middle, but the remainder of the string then becomes |
| 114 | +<code>../</code>, which is another instance of the substring we were trying to remove. |
| 115 | +</p> |
| 116 | + |
| 117 | +<p> |
| 118 | +A possible fix for this issue is to use the "sanitize-filename" npm library. |
| 119 | +This library is designed to turn a string into a safe file name, which includes removing path separators, |
| 120 | +so it is suitable when the input should name a single file rather than a nested path: |
| 121 | +</p> |
| 122 | + |
| 123 | +<sample language="javascript"> |
| 124 | +const sanitize = require("sanitize-filename"); |
| 125 | + |
| 126 | +function sanitizePath(input) { |
| 127 | + return sanitize(input); |
| 128 | +} |
| 129 | +</sample> |
| 130 | + |
| 131 | +</example> |
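If a library is not an option, the repeat-until-stable strategy from the recommendation also applies to the path pattern. The sketch below is an assumption-laden illustration: `replaceUntilStable` and the sample path are hypothetical, and stripping `../` on its own is not a complete defense against path traversal.

```javascript
// Illustrative sketch: `replaceUntilStable` and the sample path are
// hypothetical; only the regular expression comes from the example above.

// Applies the replacement until the string stops changing. With an empty
// replacement the string shrinks on every changing pass, so the loop ends.
function replaceUntilStable(input, pattern, replacement) {
  let previous;
  do {
    previous = input;
    input = input.replace(pattern, replacement);
  } while (input !== previous);
  return input;
}

const PARENT_SEGMENT = /\.\.\//g;
const craftedPath = "....//....//etc/passwd";

// A single pass leaves "../" sequences that were assembled by the removal.
const singlePass = craftedPath.replace(PARENT_SEGMENT, "");

// Repeating the pass removes those as well.
const stable = replaceUntilStable(craftedPath, PARENT_SEGMENT, "");

console.log(singlePass);
console.log(stable);
```

A replacement string that can itself contain the pattern would make this loop run forever, so the helper is only meant for removals like the one shown.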
| 132 | + |
| 133 | +<references> |
| 134 | +<li>OWASP Top 10: <a href="https://www.owasp.org/index.php/Top_10-2017_A1-Injection">A1 Injection</a>.</li> |
| 135 | +<li>Stack Overflow: <a href="https://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression">Removing all script tags from HTML with JS regular expression</a>.</li> |
| 136 | +</references> |
8 | 137 | </qhelp>