Skip to content

Potential fix for code scanning alert no. 3: Incomplete URL substring sanitization#15

Merged
BetterMint merged 1 commit intomainfrom
alert-autofix-3
Dec 15, 2025
Merged

Potential fix for code scanning alert no. 3: Incomplete URL substring sanitization#15
BetterMint merged 1 commit intomainfrom
alert-autofix-3

Conversation

@BetterMint
Copy link
Owner

Potential fix for https://github.com/BetterMint/BetterMITM/security/code-scanning/3

The test should not rely on a simple substring match, but rather should parse and analyze the output to ensure that "example.com" is present in the intended, safe context. If the output is HTML that should contain a link to "example.com", the test should parse the HTML and check for the presence of an href (or similar) attribute referencing it (using, for example, html.parser from the Python standard library or BeautifulSoup).

Concretely:

  • In the affected assertion, instead of assert "example.com" in str(f.response.content), use an HTML parser to extract all links/URLs from the response content and check that at least one is to "example.com" (and, optionally, NOT as a substring of some other, unrelated domain).
  • This can be done using the standard library's html.parser (with some code) or more robustly with BeautifulSoup (which is commonly available in test environments); since the file hasn't imported BeautifulSoup, use the built-in html.parser.

Required changes:

  • Add an import of HTMLParser from html.parser.
  • Define a small helper class/function within the test method or at the module level, which parses f.response.content and collects all URLs containing "example.com" in, e.g., href attributes.
  • Replace the substring assertion with one that asserts the correct and intended presence of "example.com".

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

… sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
def handle_starttag(self, tag, attrs):
for attr in attrs:
if attr[0] in ('href', 'src'):
if attr[1].startswith("http") and "://example.com" in attr[1]:

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High test

The string
://example.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 3 months ago

To fix the substring-based check of "://example.com" in the parsing logic, replace it with code that parses the URL and accurately checks the hostname. Specifically, use Python's standard library urllib.parse.urlparse to extract the hostname of the URL in attr[1], then compare the hostname (case-insensitively) to example.com (optionally allowing for subdomains, based on requirements). This avoids errors due to the substring appearing outside the actual hostname.
Required file changes:

  • In ExampleComLinkParser.handle_starttag, replace the condition attr[1].startswith("http") and "://example.com" in attr[1] with:
    • Parse attr[1] with urlparse
    • Check if parsed.hostname (lowercased) is exactly example.com (or startswith/endswith checks if subdomains are desired).

Implementation additions:

  • Import urllib.parse.urlparse at the top of the file, if not present.

Suggested changeset 1
examples/contrib/webscanner_helper/test_urlinjection.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/examples/contrib/webscanner_helper/test_urlinjection.py b/examples/contrib/webscanner_helper/test_urlinjection.py
--- a/examples/contrib/webscanner_helper/test_urlinjection.py
+++ b/examples/contrib/webscanner_helper/test_urlinjection.py
@@ -1,6 +1,7 @@
 import json
 from unittest import mock
 from html.parser import HTMLParser
+from urllib.parse import urlparse
 from examples.contrib.webscanner_helper.urlinjection import HTMLInjection
 from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator
 from examples.contrib.webscanner_helper.urlinjection import logger
@@ -46,8 +47,12 @@
             def handle_starttag(self, tag, attrs):
                 for attr in attrs:
                     if attr[0] in ('href', 'src'):
-                        if attr[1].startswith("http") and "://example.com" in attr[1]:
-                            self.example_com_links.append(attr[1])
+                        try:
+                            parsed = urlparse(attr[1])
+                            if parsed.scheme in ("http", "https") and parsed.hostname and parsed.hostname.lower() == "example.com":
+                                self.example_com_links.append(attr[1])
+                        except Exception:
+                            pass
 
         parser = ExampleComLinkParser()
         parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content))
EOF
@@ -1,6 +1,7 @@
import json
from unittest import mock
from html.parser import HTMLParser
from urllib.parse import urlparse
from examples.contrib.webscanner_helper.urlinjection import HTMLInjection
from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator
from examples.contrib.webscanner_helper.urlinjection import logger
@@ -46,8 +47,12 @@
def handle_starttag(self, tag, attrs):
for attr in attrs:
if attr[0] in ('href', 'src'):
if attr[1].startswith("http") and "://example.com" in attr[1]:
self.example_com_links.append(attr[1])
try:
parsed = urlparse(attr[1])
if parsed.scheme in ("http", "https") and parsed.hostname and parsed.hostname.lower() == "example.com":
self.example_com_links.append(attr[1])
except Exception:
pass

parser = ExampleComLinkParser()
parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content))
Copilot is powered by AI and may make mistakes. Always verify output.

parser = ExampleComLinkParser()
parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content))
assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links)

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High test

The string
http://example.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 3 months ago

To fix the issue, you should parse each URL using Python's urllib.parse.urlparse function and then inspect the .hostname attribute of the resulting object. Your test should then confirm that the hostname of each found URL is exactly example.com or, if the intention is to allow subdomains, that it ends with .example.com. This prevents various hostname tricks, such as example.com.attacker.com or example.com@evil.com, which would otherwise match a naive prefix check.

Specifically, in file examples/contrib/webscanner_helper/test_urlinjection.py, locate line 54:

assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links)

and replace it with:

from urllib.parse import urlparse
assert any(urlparse(url).hostname == "example.com" for url in parser.example_com_links)

Also, ensure from urllib.parse import urlparse is imported at the top of the file (after existing imports).


Suggested changeset 1
examples/contrib/webscanner_helper/test_urlinjection.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/examples/contrib/webscanner_helper/test_urlinjection.py b/examples/contrib/webscanner_helper/test_urlinjection.py
--- a/examples/contrib/webscanner_helper/test_urlinjection.py
+++ b/examples/contrib/webscanner_helper/test_urlinjection.py
@@ -1,6 +1,7 @@
 import json
 from unittest import mock
 from html.parser import HTMLParser
+from urllib.parse import urlparse
 from examples.contrib.webscanner_helper.urlinjection import HTMLInjection
 from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator
 from examples.contrib.webscanner_helper.urlinjection import logger
@@ -51,7 +52,7 @@
 
         parser = ExampleComLinkParser()
         parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content))
-        assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links)
+        assert any(urlparse(url).hostname == "example.com" for url in parser.example_com_links)
 
     def test_inject_insert_body(self):
         html_injection = HTMLInjection(insert=True)
EOF
@@ -1,6 +1,7 @@
import json
from unittest import mock
from html.parser import HTMLParser
from urllib.parse import urlparse
from examples.contrib.webscanner_helper.urlinjection import HTMLInjection
from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator
from examples.contrib.webscanner_helper.urlinjection import logger
@@ -51,7 +52,7 @@

parser = ExampleComLinkParser()
parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content))
assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links)
assert any(urlparse(url).hostname == "example.com" for url in parser.example_com_links)

def test_inject_insert_body(self):
html_injection = HTMLInjection(insert=True)
Copilot is powered by AI and may make mistakes. Always verify output.

parser = ExampleComLinkParser()
parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content))
assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links)

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High test

The string
https://example.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 3 months ago

To fix this problem, the test should parse each URL using urllib.parse, extract the hostname, and check if it matches exactly example.com instead of relying on substring or prefix matches. This means replacing the startswith checks in line 54 with logic using urllib.parse.urlparse(url).hostname == "example.com". You will need to import the urllib.parse module. Only examples/contrib/webscanner_helper/test_urlinjection.py, line 54, needs changes, plus the required import.


Suggested changeset 1
examples/contrib/webscanner_helper/test_urlinjection.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/examples/contrib/webscanner_helper/test_urlinjection.py b/examples/contrib/webscanner_helper/test_urlinjection.py
--- a/examples/contrib/webscanner_helper/test_urlinjection.py
+++ b/examples/contrib/webscanner_helper/test_urlinjection.py
@@ -1,6 +1,7 @@
 import json
 from unittest import mock
 from html.parser import HTMLParser
+import urllib.parse
 from examples.contrib.webscanner_helper.urlinjection import HTMLInjection
 from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator
 from examples.contrib.webscanner_helper.urlinjection import logger
@@ -51,7 +52,7 @@
 
         parser = ExampleComLinkParser()
         parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content))
-        assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links)
+        assert any(urllib.parse.urlparse(url).hostname == "example.com" for url in parser.example_com_links)
 
     def test_inject_insert_body(self):
         html_injection = HTMLInjection(insert=True)
EOF
@@ -1,6 +1,7 @@
import json
from unittest import mock
from html.parser import HTMLParser
import urllib.parse
from examples.contrib.webscanner_helper.urlinjection import HTMLInjection
from examples.contrib.webscanner_helper.urlinjection import InjectionGenerator
from examples.contrib.webscanner_helper.urlinjection import logger
@@ -51,7 +52,7 @@

parser = ExampleComLinkParser()
parser.feed(f.response.text if hasattr(f.response, "text") else str(f.response.content))
assert any(url.startswith("http://example.com") or url.startswith("https://example.com") for url in parser.example_com_links)
assert any(urllib.parse.urlparse(url).hostname == "example.com" for url in parser.example_com_links)

def test_inject_insert_body(self):
html_injection = HTMLInjection(insert=True)
Copilot is powered by AI and may make mistakes. Always verify output.
@BetterMint BetterMint marked this pull request as ready for review December 15, 2025 01:39
@BetterMint BetterMint merged commit d10a380 into main Dec 15, 2025
11 of 46 checks passed
@BetterMint BetterMint deleted the alert-autofix-3 branch December 15, 2025 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant