@chinyeungli
Contributor

Issue: #1763

Create a pipeline for Maven package

@chinyeungli requested a review from tdruez on November 13, 2025 at 10:40
Signed-off-by: Chin Yeung Li <[email protected]>
input_source_url = input_source.get("download_url", "")

parsed_url = urlparse(input_source_url)
if input_source_url and parsed_url.netloc.endswith("maven.org"):

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization (severity: High)

The string "maven.org" may be at an arbitrary position in the sanitized URL.
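
To make the alert concrete, here is a minimal sketch of why the suffix check is too permissive. The function name suffix_check and both URLs are hypothetical and used only for illustration; only the endswith("maven.org") test on netloc comes from the flagged code.

```python
from urllib.parse import urlparse


def suffix_check(url: str) -> bool:
    """Reproduce the flagged test: accept any netloc ending with 'maven.org'."""
    return urlparse(url).netloc.endswith("maven.org")


# Hypothetical URLs, for illustration only.
legit = "https://repo1.maven.org/maven2/org/example/demo/1.0/demo-1.0.jar"
lookalike = "https://evil-maven.org/maven2/org/example/demo/1.0/demo-1.0.jar"

print(suffix_check(legit))      # True
print(suffix_check(lookalike))  # True: an unrelated domain also passes
```

Because the suffix "maven.org" is not anchored at a dot boundary, any domain whose name merely ends in those characters is accepted.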

Copilot Autofix

AI 3 days ago

To fix the problem, validate the URL host strictly: parse the URL and check that its hostname exactly matches an allowlist of permitted Maven repository hosts (such as repo1.maven.org or search.maven.org). The current suffix check is too loose because a hypothetical host such as evil-maven.org also ends with "maven.org", and parsed_url.netloc can additionally contain a port or user information. Make this change in the relevant block of the get_pom_url_list function (lines 594–601). Compare case-insensitively and only against the hostname portion of the URL, using parsed_url.hostname rather than parsed_url.netloc. Replace .endswith("maven.org") with an exact match against the known hosts (for example, using in with a set).

In terms of implementation:

  • Define a set of allowed hostnames (e.g., {"repo1.maven.org", "search.maven.org"}).
  • Use parsed_url.hostname for the comparison.
  • Do a strict membership test (in allowed_hosts), rather than a substring or suffix check.
  • No additional imports are needed.
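
The steps above can be sketched as a small standalone helper. This is only an illustration under stated assumptions: the name is_allowed_maven_url and the constant ALLOWED_MAVEN_HOSTS are hypothetical, and the two hosts are the ones named in the suggestion, which may not cover every Maven source the pipeline needs.

```python
from urllib.parse import urlparse

# Hosts taken from the Autofix suggestion; completeness is an assumption.
ALLOWED_MAVEN_HOSTS = frozenset({"repo1.maven.org", "search.maven.org"})


def is_allowed_maven_url(url: str) -> bool:
    """Return True only if the URL's hostname exactly matches an allowed host."""
    if not url:
        return False
    # .hostname drops any port and user information and is already lowercased.
    hostname = urlparse(url).hostname
    return hostname is not None and hostname in ALLOWED_MAVEN_HOSTS


print(is_allowed_maven_url("https://repo1.maven.org/maven2/demo.jar"))
print(is_allowed_maven_url("https://evil-maven.org/maven2/demo.jar"))
```

Since urlparse already lowercases the hostname attribute, the extra .lower() call in the suggested patch is harmless but not strictly required.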

Edit only the code shown in file scanpipe/pipes/resolve.py, lines 575–613.

Suggested changeset 1
scanpipe/pipes/resolve.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scanpipe/pipes/resolve.py b/scanpipe/pipes/resolve.py
--- a/scanpipe/pipes/resolve.py
+++ b/scanpipe/pipes/resolve.py
@@ -592,7 +592,8 @@
         input_source_url = input_source.get("download_url", "")
 
         parsed_url = urlparse(input_source_url)
-        if input_source_url and parsed_url.netloc.endswith("maven.org"):
+        allowed_hosts = {"repo1.maven.org", "search.maven.org"}
+        if input_source_url and parsed_url.hostname and parsed_url.hostname.lower() in allowed_hosts:
             base_url = input_source_url.rsplit("/", 1)[0]
             pom_url = (
                 base_url + "/" + "-".join(base_url.rstrip("/").split("/")[-2:]) + ".pom"
EOF
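
For readers unfamiliar with the URL manipulation in the context lines of the patch, here is a worked sketch of how the POM URL is derived from a download URL. The wrapper function derive_pom_url is hypothetical; only the rsplit and join expressions come from the diff, and what the real function does with pom_url afterwards is not shown in this excerpt.

```python
def derive_pom_url(download_url: str) -> str:
    """Build the sibling .pom URL using the expressions shown in the diff."""
    # Drop the file name, keeping the ".../<artifact>/<version>" directory.
    base_url = download_url.rsplit("/", 1)[0]
    # The last two path segments are the artifact id and the version.
    artifact_and_version = "-".join(base_url.rstrip("/").split("/")[-2:])
    return base_url + "/" + artifact_and_version + ".pom"


url = (
    "https://repo1.maven.org/maven2/org/apache/commons/"
    "commons-lang3/3.12.0/commons-lang3-3.12.0.jar"
)
print(derive_pom_url(url))
```

The derivation relies on the standard Maven repository layout, where an artifact's files live under a group/artifact/version directory and are named artifact-version with an extension.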
