Member

@sirosen sirosen commented Aug 4, 2025

Resolves #2223

The `-r` and `-c` path normalization logic did not check whether the
requested requirements or constraint files were remote files being
loaded, e.g., via HTTPS. As a result, a usage like
`-c https://example.com/constraints.txt` was incorrectly converted to
a path and normalized, stripping one of the two slashes that
separate the scheme from the netloc.

A regression test is added which reproduces the bug, and the gap in
detection is closed by checking explicitly whether inputs can parse as
URIs. Any given URI (including file URIs) will be exempted from the
path rewrites.

The check is implemented by calling `urllib.parse.urlparse()` and
checking the `scheme` attribute of the parse result.
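
To make the failure mode concrete, here is a minimal sketch of what path normalization does to a URL, and how a scheme check can tell the two kinds of input apart. It uses `posixpath.normpath` as a stand-in for the normalization step (an assumption; the actual pip-tools code path is not shown in this PR text), and `has_scheme` is a hypothetical helper name.

```python
import posixpath
import urllib.parse

url = "https://example.com/constraints.txt"

# Path normalization collapses the repeated slash after the scheme.
normalized = posixpath.normpath(url)
print(normalized)  # https:/example.com/constraints.txt


def has_scheme(value: str) -> bool:
    """Hypothetical helper: True if the value parses with a non-empty URI scheme."""
    return urllib.parse.urlparse(value).scheme != ""


print(has_scheme(url))  # True
print(has_scheme("constraints.txt"))  # False
```

Inputs for which `has_scheme` returns `True` would be passed through untouched instead of being normalized as paths.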

Contributor checklist
  • Included tests for the changes.
  • A change note is created in changelog.d/ (see changelog.d/README.md for instructions) or the PR text says "no changelog needed".
Maintainer checklist
  • If no changelog is needed, apply the skip-changelog label.
  • Assign the PR to an existing or new milestone for the target version (following Semantic Versioning).

@sirosen sirosen added this to the 7.5.1 milestone Aug 4, 2025
@sirosen sirosen force-pushed the fix-url-normalization branch from 13e8cb4 to 748d229 Compare August 4, 2025 23:09
@sirosen sirosen force-pushed the fix-url-normalization branch from 748d229 to 710e43a Compare August 4, 2025 23:12
Member Author

sirosen commented Aug 4, 2025

I didn't think this fix all the way through for the Windows context. `urllib.parse.urlparse` isn't going to cut it, since `C:/foo` parses to a "url" with a scheme of "c". I think the rest of the framework for the fix is right, but more sophisticated detection is needed.
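
The ambiguity is easy to reproduce. A short sketch, using only the standard library, that prints the scheme `urlparse` reports for a Windows-style path, a real URL, and a plain relative path:

```python
import urllib.parse

samples = [
    "C:/foo",
    "https://example.com/constraints.txt",
    "requirements.txt",
]

# The drive letter is parsed as a one-letter scheme and lowercased.
for value in samples:
    scheme = urllib.parse.urlparse(value).scheme
    print(f"{value!r} -> scheme {scheme!r}")
```

A check of the form "scheme is non-empty" therefore treats `C:/foo` as a URL, which is why it is not sufficient on Windows.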

This fixes handling of Windows paths where the drive-letter can be
misinterpreted as a URI scheme.

Matching the `pip` implementation details in this case is the simplest
way to handle the inherent ambiguity.
@sirosen sirosen force-pushed the fix-url-normalization branch from e0aa450 to 56b7497 Compare August 5, 2025 22:15
Member Author

sirosen commented Aug 5, 2025

Thanks to @ichard26 for a useful pointer when I asked in the PyPA Discord about how pip treats these URIs!

Specifically:
https://github.com/pypa/pip/blob/20b39ec104b94181c01731192d03cbe0d1f9f3a7/src/pip/_internal/req/req_file.py#L564-L591

I've updated the PR to simply mirror that set of known schemes here, which nicely solves the problem of `C:/foo/requirements.txt` looking like a URL with `scheme = "c"`.
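
A minimal sketch of the allowlist approach described above. The set of schemes here (`http`, `https`, `file`) is an assumption about what the linked pip code accepts, and `KNOWN_URL_SCHEMES` and `is_url` are hypothetical names, not necessarily the ones used in this PR.

```python
import urllib.parse

# Assumption: the schemes pip treats as URLs when loading requirements files.
KNOWN_URL_SCHEMES = frozenset({"http", "https", "file"})


def is_url(value: str) -> bool:
    """Hypothetical helper: True only when the parsed scheme is a known URL scheme."""
    return urllib.parse.urlsplit(value).scheme in KNOWN_URL_SCHEMES


print(is_url("https://example.com/constraints.txt"))  # True
print(is_url("file:///tmp/requirements.txt"))  # True
print(is_url("C:/foo/requirements.txt"))  # False
print(is_url("requirements.txt"))  # False
```

Because a drive letter such as `c` is never in the allowlist, Windows paths fall through to the normal path handling.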

@webknjaz webknjaz moved this to 🧐 @webknjaz's review queue 📋 in 📅 Procrastinating in public Aug 7, 2025
Member

@webknjaz webknjaz left a comment

LGTM, though, I'm including a few minor/optional suggestions.


The test is performed by trying a URL parse and reading the scheme.
"""
scheme = urllib.parse.urlsplit(value).scheme
Member

Are there any cases where urlsplit() would raise an exception?

Member Author

I don't believe there are any (reasonably reachable) cases. pip's usage doesn't take any effort to sanitize the inputs or check for errors. I looked at the definition and tried some strange inputs and didn't find any error cases (including obvious things like "") either by reading/deduction or by experimentation.
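
A small probe along the lines of the experimentation described above. `scheme_or_error` is a hypothetical helper, and the inputs are illustrative only. The one error case I am aware of is a host with an unbalanced square bracket, for which `urlsplit()` raises `ValueError`; inputs without a `//` netloc part, such as local paths, do not reach that check.

```python
import urllib.parse


def scheme_or_error(value: str) -> str:
    """Return the parsed scheme, or the exception name if parsing raises."""
    try:
        return urllib.parse.urlsplit(value).scheme
    except ValueError as exc:
        return f"<{type(exc).__name__}>"


probes = ["", "requirements.txt", "C:\\foo\\requirements.txt", "http://[::1"]
for value in probes:
    print(f"{value!r} -> {scheme_or_error(value)!r}")
```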

Comment on lines +4085 to +4093
def fake_url_get(url):
    response = mock.Mock()
    response.reason = "Ok"
    response.status_code = 200
    response.url = url
    response.text = "small-fake-a==0.2"
    return response

mock_get = mock.Mock(side_effect=fake_url_get)
Member

Normally, I'd reach the mocking API via mocker (pytest-mock). But it looks like it's not in the deps so we can postpone that.

Member Author

Yeah, I'm not very attached to this manner of mocking. It's not ideal but I don't think it's too terribly fragile or awkward.

I'd like to see if we can use the `responses` library, since pip is using a modified `requests` session object for its requests. My experiences with `responses` have been excellent, so I think that if it works, that would be great.
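
For readers unfamiliar with the pattern in the reviewed excerpt, here is a standalone sketch of how a `side_effect` mock behaves, assuming `mock` refers to `unittest.mock`. How the mock is patched into pip's session is not shown in the excerpt, so that part is left out.

```python
from unittest import mock


def fake_url_get(url):
    # Build a response-like object that echoes the requested URL.
    response = mock.Mock()
    response.reason = "Ok"
    response.status_code = 200
    response.url = url
    response.text = "small-fake-a==0.2"
    return response


# Each call to mock_get is delegated to fake_url_get and also recorded.
mock_get = mock.Mock(side_effect=fake_url_get)

response = mock_get("https://example.com/constraints.txt")
print(response.url, response.text)

# The recorded call shows the URL reached the getter with both slashes intact.
mock_get.assert_called_once_with("https://example.com/constraints.txt")
```

Asserting on the recorded call is one way for a regression test to show that the URL was not rewritten before the request was made.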

Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>
@sirosen sirosen added this pull request to the merge queue Aug 13, 2025
Merged via the queue into jazzband:main with commit 78f18e1 Aug 13, 2025
43 checks passed
@github-project-automation github-project-automation bot moved this from 🧐 @webknjaz's review queue 📋 to 🌈 Done 🦄 in 📅 Procrastinating in public Aug 13, 2025
jayaddison added a commit to openculinary/pip-tools that referenced this pull request Aug 13, 2025
Successfully merging this pull request may close these issues.

pip-tools 7.5.0 strips a slash from constraint URLs
2 participants