Skip to content

Conversation

@orange-guo
Copy link

@orange-guo orange-guo commented Sep 24, 2025

User description

Branch names with special characters like '#' caused broken URLs because they weren't encoded correctly. This often happens when branch names come from GitLab issues (e.g., 'project#456'), since the hash symbol is mistakenly seen as a URL fragment.

This change uses urllib.parse.quote() to encode the branch name component when generating links. This ensures special characters are correctly handled (e.g., '#test' becomes '%23test'), making the URLs robust.

The fix has been applied to the link generation logic for the following providers:

  • GitLab
  • GitHub
  • Gitea
  • Bitbucket
  • Bitbucket Server

PR Type

Bug fix


Description

  • URL-encode branch names in git provider links

  • Fix broken URLs with special characters like '#'

  • Apply fix across GitLab, GitHub, Gitea, Bitbucket providers

  • Use urllib.parse.quote() for proper encoding


Diagram Walkthrough

flowchart LR
  A["Branch names with special chars"] --> B["urllib.parse.quote()"]
  B --> C["Encoded URLs"]
  C --> D["GitLab links"]
  C --> E["GitHub links"] 
  C --> F["Gitea links"]
  C --> G["Bitbucket links"]
Loading

File Walkthrough

Relevant files
Bug fix
bitbucket_provider.py
Add URL encoding for Bitbucket branch names                           

pr_agent/git_providers/bitbucket_provider.py

  • Import quote from urllib.parse
  • Apply URL encoding to branch name in get_canonical_url_parts()
+2/-2     
bitbucket_server_provider.py
Add URL encoding for Bitbucket Server branch names             

pr_agent/git_providers/bitbucket_server_provider.py

  • Import quote from urllib.parse
  • Apply URL encoding to branch name in query parameter
+2/-2     
gitea_provider.py
Add URL encoding for Gitea branch names                                   

pr_agent/git_providers/gitea_provider.py

  • Import quote from urllib.parse
  • Apply URL encoding to branch names in get_line_link() method
  • Fix encoding for all three link generation scenarios
+4/-4     
github_provider.py
Add URL encoding for GitHub branch names                                 

pr_agent/git_providers/github_provider.py

  • Import quote from urllib.parse
  • Apply URL encoding to branch name in get_canonical_url_parts()
+2/-2     
gitlab_provider.py
Add URL encoding for GitLab branch names                                 

pr_agent/git_providers/gitlab_provider.py

  • Import quote from urllib.parse
  • Apply URL encoding to branch names in get_canonical_url_parts()
  • Fix encoding in get_line_link() and
    generate_link_to_relevant_line_number()
+6/-6     

Branch names with special characters like '#' caused broken URLs because they weren't encoded correctly. This often happens when branch names come from GitLab issues (e.g., 'project#456'), since the hash symbol is mistakenly seen as a URL fragment.

This change uses `urllib.parse.quote()` to encode the branch name component when generating links. This ensures special characters are correctly handled (e.g., '#test' becomes '%23test'), making the URLs robust.

The fix has been applied to the link generation logic for the following providers:
- GitLab
- GitHub
- Gitea
- Bitbucket
- Bitbucket Server
@qodo-merge-for-open-source
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

456 - Partially compliant

Compliant requirements:

Non-compliant requirements:

  • Investigate and resolve "POST / HTTP/1.1 405 Method Not Allowed" when running as a GitHub App.
  • Ensure webhook endpoint/method configuration works correctly for GitHub App deployments.
  • Provide a fix so the app responds with the correct HTTP method handling on the root or configured path.
  • Validate that deployment no longer returns 405 for valid GitHub webhook requests.

Requires further human verification:

  • Confirm via deployment logs and webhook delivery dashboard that 405 no longer occurs.
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
✅ No TODO sections
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible None branch

If desired_branch is None (e.g., when repo_git_url is passed without a branch), quoting None will raise; ensure desired_branch is always set or defaulted before quote().

    desired_branch = self.get_repo_default_branch()
    parsed_pr_url = urlparse(self.pr_url)
    scheme_and_netloc = parsed_pr_url.scheme + "://" + parsed_pr_url.netloc
    workspace_name, project_name = (self.workspace_slug, self.repo_slug)
prefix = f"{scheme_and_netloc}/{workspace_name}/{project_name}/src/{quote(desired_branch, safe='')}"
suffix = "" #None
return (prefix, suffix)
Unused imports

quote_plus is imported but not used; consider removing to keep imports clean.

import difflib
import re

from packaging.version import parse as parse_version
from typing import Optional, Tuple
from urllib.parse import quote_plus, urlparse, quote

from atlassian.bitbucket import Bitbucket
from requests.exceptions import HTTPError
import shlex
URL encoding of path segments

Only branch names are encoded; if repo_path can contain URL-unsafe chars (e.g., nested groups with spaces), consider ensuring repo_path is already canonical/encoded elsewhere.

def get_canonical_url_parts(self, repo_git_url:str=None, desired_branch:str=None) -> Tuple[str, str]:
    repo_path = ""
    if not repo_git_url and not self.pr_url:
        get_logger().error("Cannot get canonical URL parts: missing either context PR URL or a repo GIT URL")
        return ("", "")
    if not repo_git_url: #Use PR url as context
        repo_path = self._get_project_path_from_pr_or_issue_url(self.pr_url)
        try:
            desired_branch = self.gl.projects.get(self.id_project).default_branch
        except Exception as e:
            get_logger().exception(f"Cannot get PR: {self.pr_url} default branch. Tried project ID: {self.id_project}")
            return ("", "")
    else: #Use repo git url
        repo_path = repo_git_url.split('.git')[0].split('.com/')[-1]
    prefix = f"{self.gitlab_url}/{repo_path}/-/blob/{quote(desired_branch, safe='')}"
    suffix = "?ref_type=heads"  # gitlab cloud adds this suffix. gitlab server does not, but it is harmless.
    return (prefix, suffix)

@qodo-merge-for-open-source
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
URL encoding might break branch-name paths

The use of quote(branch_name, safe='') incorrectly encodes slashes in branch
names, which will break URLs. Removing safe='' will fix this by using the
default behavior of quote, which preserves slashes.

Examples:

pr_agent/git_providers/gitea_provider.py [509-513]
            link = f"{self.base_url}/{self.owner}/{self.repo}/src/branch/{quote(self.get_pr_branch(), safe='')}/{relevant_file}"
        elif relevant_line_end:
            link = f"{self.base_url}/{self.owner}/{self.repo}/src/branch/{quote(self.get_pr_branch(), safe='')}/{relevant_file}#L{relevant_line_start}-L{relevant_line_end}"
        else:
            link = f"{self.base_url}/{self.owner}/{self.repo}/src/branch/{quote(self.get_pr_branch(), safe='')}/{relevant_file}#L{relevant_line_start}"
pr_agent/git_providers/github_provider.py [141]
        prefix = f"{scheme_and_netloc}/{owner}/{repo}/blob/{quote(desired_branch, safe='')}"

Solution Walkthrough:

Before:

from urllib.parse import quote

def get_line_link(branch_name, file_path):
    # Slashes in branch_name are encoded, e.g., 'feature/foo' -> 'feature%2Ffoo'
    # This breaks the URL path structure for many git providers.
    encoded_branch = quote(branch_name, safe='')
    return f"https://gitserver.com/repo/src/branch/{encoded_branch}/{file_path}"

def get_canonical_url_parts(branch_name):
    encoded_branch = quote(branch_name, safe='')
    prefix = f"https://gitserver.com/repo/blob/{encoded_branch}"
    return (prefix, "")

After:

from urllib.parse import quote

def get_line_link(branch_name, file_path):
    # Slashes are not encoded, preserving path structure.
    # Other special characters like '#' are still encoded.
    encoded_branch = quote(branch_name) # Uses default safe='/'
    return f"https://gitserver.com/repo/src/branch/{encoded_branch}/{file_path}"

def get_canonical_url_parts(branch_name):
    encoded_branch = quote(branch_name) # Uses default safe='/'
    prefix = f"https://gitserver.com/repo/blob/{encoded_branch}"
    return (prefix, "")
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical flaw where encoding slashes in branch names with quote(..., safe='') will break URLs for branches with path-like names (e.g., feature/name), which is a common convention.

High
Learned
best practice
Guard missing branch names

Guard against None or empty desired_branch before quoting to avoid generating
malformed URLs. Provide a safe default or return early with logging.

pr_agent/git_providers/bitbucket_provider.py [118]

+if not desired_branch:
+    get_logger().error("Missing desired_branch for canonical URL")
+    return ("", "")
 prefix = f"{scheme_and_netloc}/{workspace_name}/{project_name}/src/{quote(desired_branch, safe='')}"
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Add defensive null checks for nested data and optional parameters when constructing URLs (Pattern 3).

Low
Validate branch before quoting

Validate desired_branch before quoting and building the query to prevent empty
or None values producing invalid links.

pr_agent/git_providers/bitbucket_server_provider.py [84]

+if not desired_branch:
+    get_logger().error("Missing desired_branch for Bitbucket Server canonical URL")
+    return ("", "")
 suffix = f"?at=refs%2Fheads%2F{quote(desired_branch, safe='')}"
  • Apply / Chat
Suggestion importance[1-10]: 5

__

Why:
Relevant best practice - Add defensive null checks for nested data and optional parameters when constructing URLs (Pattern 3).

Low
  • More
  • Author self-review: I have reviewed the PR code suggestions, and addressed the relevant ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant