Potential fix for code scanning alert no. 3: Incomplete URL substring sanitization#15
Potential fix for code scanning alert no. 3: Incomplete URL substring sanitization#15BetterMint merged 1 commit intomainfrom
Conversation
… sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
| def handle_starttag(self, tag, attrs): | ||
| for attr in attrs: | ||
| if attr[0] in ('href', 'src'): | ||
| if attr[1].startswith("http") and "://example.com" in attr[1]: |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix the substring-based check of "://example.com" in the parsing logic, replace it with code that parses the URL and accurately checks the hostname. Specifically, use Python's standard library urllib.parse.urlparse to extract the hostname of the URL in attr[1], then compare the hostname (case-insensitively) to example.com (optionally allowing for subdomains, based on requirements). This avoids errors due to the substring appearing outside the actual hostname.
Required file changes:
- In
ExampleComLinkParser.handle_starttag, replace the conditionattr[1].startswith("http") and "://example.com" in attr[1]with:- Parse
attr[1]withurlparse - Check if
parsed.hostname(lowercased) is exactlyexample.com(or startswith/endswith checks if subdomains are desired).
- Parse
Implementation additions:
- Import
urllib.parse.urlparseat the top of the file, if not present.
| @@ -1,6 +1,7 @@ | ||
| import json | ||
| from unittest import mock | ||
| from html.parser import HTMLParser | ||
| from urllib.parse import urlparse | ||
| from examples.contrib.webscanner_helper.urlinjection import HTMLInjection | ||
| from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator | ||
| from examples.contrib.webscanner_helper.urlinjection import logger | ||
| @@ -46,8 +47,12 @@ | ||
| def handle_starttag(self, tag, attrs): | ||
| for attr in attrs: | ||
| if attr[0] in ('href', 'src'): | ||
| if attr[1].startswith("http") and "://example.com" in attr[1]: | ||
| self.example_com_links.append(attr[1]) | ||
| try: | ||
| parsed = urlparse(attr[1]) | ||
| if parsed.scheme in ("http", "https") and parsed.hostname and parsed.hostname.lower() == "example.com": | ||
| self.example_com_links.append(attr[1]) | ||
| except Exception: | ||
| pass | ||
|
|
||
| parser = ExampleComLinkParser() | ||
| parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content)) |
|
|
||
| parser = ExampleComLinkParser() | ||
| parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content)) | ||
| assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links) |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix the issue, you should parse each URL using Python's urllib.parse.urlparse function and then inspect the .hostname attribute of the resulting object. Your test should then confirm that the hostname of each found URL is exactly example.com or, if the intention is to allow subdomains, that it ends with .example.com. This prevents various hostname tricks, such as example.com.attacker.com or example.com@evil.com, which would otherwise match a naive prefix check.
Specifically, in file examples/contrib/webscanner_helper/test_urlinjection.py, locate line 54:
assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links)and replace it with:
from urllib.parse import urlparse
assert any(urlparse(url).hostname == "example.com" for url in parser.example_com_links)Also, ensure from urllib.parse import urlparse is imported at the top of the file (after existing imports).
| @@ -1,6 +1,7 @@ | ||
| import json | ||
| from unittest import mock | ||
| from html.parser import HTMLParser | ||
| from urllib.parse import urlparse | ||
| from examples.contrib.webscanner_helper.urlinjection import HTMLInjection | ||
| from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator | ||
| from examples.contrib.webscanner_helper.urlinjection import logger | ||
| @@ -51,7 +52,7 @@ | ||
|
|
||
| parser = ExampleComLinkParser() | ||
| parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content)) | ||
| assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links) | ||
| assert any(urlparse(url).hostname == "example.com" for url in parser.example_com_links) | ||
|
|
||
| def test_inject_insert_body(self): | ||
| html_injection = HTMLInjection(insert=True) |
|
|
||
| parser = ExampleComLinkParser() | ||
| parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content)) | ||
| assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links) |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix this problem, the test should parse each URL using urllib.parse, extract the hostname, and check if it matches exactly example.com instead of relying on substring or prefix matches. This means replacing the startswith checks in line 54 with logic using urllib.parse.urlparse(url).hostname == "example.com". You will need to import the urllib.parse module. Only examples/contrib/webscanner_helper/test_urlinjection.py, line 54, needs changes, plus the required import.
| @@ -1,6 +1,7 @@ | ||
| import json | ||
| from unittest import mock | ||
| from html.parser import HTMLParser | ||
| import urllib.parse | ||
| from examples.contrib.webscanner_helper.urlinjection import HTMLInjection | ||
| from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator | ||
| from examples.contrib.webscanner_helper.urlinjection import logger | ||
| @@ -51,7 +52,7 @@ | ||
|
|
||
| parser = ExampleComLinkParser() | ||
| parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content)) | ||
| assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links) | ||
| assert any(urllib.parse.urlparse(url).hostname == "example.com" for url in parser.example_com_links) | ||
|
|
||
| def test_inject_insert_body(self): | ||
| html_injection = HTMLInjection(insert=True) |
Potential fix for https://github.com/BetterMint/BetterMITM/security/code-scanning/3
The test should not rely on a simple substring match, but rather should parse and analyze the output to ensure that "example.com" is present in the intended, safe context. If the output is HTML that should contain a link to "example.com", the test should parse the HTML and check for the presence of an
href(or similar) attribute referencing it (using, for example,html.parserfrom the Python standard library orBeautifulSoup).Concretely:
assert "example.com" in str(f.response.content), use an HTML parser to extract all links/URLs from the response content and check that at least one is to "example.com" (and, optionally, NOT as a substring of some other, unrelated domain).html.parser(with some code) or more robustly withBeautifulSoup(which is commonly available in test environments); since the file hasn't imported BeautifulSoup, use the built-inhtml.parser.Required changes:
HTMLParserfromhtml.parser.f.response.contentand collects all URLs containing "example.com" in, e.g.,hrefattributes.Suggested fixes powered by Copilot Autofix. Review carefully before merging.