-
-
Notifications
You must be signed in to change notification settings - Fork 629
Bugfix: path normalization applied to URLs #2224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
13e8cb4
to
748d229
Compare
The `-r` and `-c` path normalization logic did not check if the requested requirements or constraint files were remote files being loaded, e.g., via HTTPS. As a result, a usage like `-c https://example.com/constraints.txt` incorrectly got converted to a path and normalized, stripping one of the two slashes which separates the scheme from the netloc. A regression test is added which reproduces the bug, and the gap in detection is closed by checking explicitly if inputs can parse as URIs. Any given URI (including file URIs) will be exempted from the path rewrites. The check is implemented by calling `urllib.parse.urlparse()` and checking the `scheme` attribute of the parse result.
748d229
to
710e43a
Compare
This fix was not thought all the way through by me for the Windows context. |
This fixes handling of Windows paths where the drive-letter can be misinterpreted as a URI scheme. Matching the `pip` implementation details in this case is the simplest way to handle the inherent ambiguity.
e0aa450
to
56b7497
Compare
Thanks to @ichard26 for a useful pointer when I asked in pypa discord about how Specifically: I've updated to simply mirror that set of known schemes here, which nicely solves the problem of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though, I'm including a few minor/optional suggestions.
|
||
The test is performed by trying a URL parse and reading the scheme. | ||
""" | ||
scheme = urllib.parse.urlsplit(value).scheme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any cases where urlsplit()
would raise an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there are any (reasonably reachable) cases. pip
's usage doesn't take any effort to sanitize the inputs or check for errors. I looked at the definition and tried some strange inputs and didn't find any error cases (including obvious things like ""
) either by reading/deduction or by experimentation.
def fake_url_get(url): | ||
response = mock.Mock() | ||
response.reason = "Ok" | ||
response.status_code = 200 | ||
response.url = url | ||
response.text = "small-fake-a==0.2" | ||
return response | ||
|
||
mock_get = mock.Mock(side_effect=fake_url_get) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, I'd reach the mocking API via mocker
(pytest-mock
). But it looks like it's not in the deps so we can postpone that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not very attached to this manner of mocking. It's not ideal but I don't think it's too terribly fragile or awkward.
I'd like to see if we can use responses
, since pip
is using a modified requests
session object for its requests. My experiences with responses
have been excellent, so I think that if it works, that would be great.
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>
Resolves #2223
The
-r
and-c
path normalization logic did not check if therequested requirements or constraint files were remote files being
loaded, e.g., via HTTPS. As a result, a usage like
-c https://example.com/constraints.txt
incorrectly got converted toa path and normalized, stripping one of the two slashes which
separates the scheme from the netloc.
A regression test is added which reproduces the bug, and the gap in
detection is closed by checking explicitly if inputs can parse as
URIs. Any given URI (including file URIs) will be exempted from the
path rewrites.
The check is implemented by calling
urllib.parse.urlparse()
andchecking the
scheme
attribute of the parse result.Contributor checklist
changelog.d/
(seechangelog.d/README.md
for instructions) or the PR text says "no changelog needed".Maintainer checklist
skip-changelog
label.