- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
          GH-123599: url2pathname(): handle authority section in file URL
          #126844
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… on POSIX Adjust `urllib.request.url2pathname()` to parse the URL authority and path with `urlsplit()` on POSIX. If the authority is empty or resolves to the current host, it is ignored and the URL path is used as the pathname. If not, we raise `URLError`.
url2pathname(): handle non-empty authority section on POSIXurl2pathname(): handle non-empty authority section on POSIX
      url2pathname(): handle non-empty authority section on POSIXurl2pathname(): handle authority section on POSIX
      url2pathname(): handle authority section on POSIXurl2pathname(): handle authority section in file URL
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two reviews for the price of one!
Co-authored-by: Adam Turner <[email protected]>
| Quick note on timing for this PR and #125866. If possible I'd like to land this PR in time for 3.14 beta 1, in ~6 weeks time. It would mean we restrict backwards-incompatible changes to 3.14, with none planned for 3.15. I think that would be better for users who might be affected by the changes - e.g. folks who previously wrapped the  I'll wait until 3.15 before I look at #125866 so I don't overload 3.14. That change will be 100% backwards-compatible. Happy to re-think if anyone is unhappy with this plan. Cheers | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some changes may be not so innocent:
- url2pathname()now performs network requests (and hang for a time).
- url2pathname()now can raise URLError.
- The result of gethostbyname_ex()andgethostname()is cached. Previously, you could reset this by settingFileHandler.names = None.
- There may be a difference in handling authorities with port. It is not covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some docs comments.
Co-authored-by: Bénédikt Tran <[email protected]>
| Thank you for the reviews! 
 That seems to be needed per RFC 8089 section 3 (ref) and section 2 less explicitly. And it only comes up if the hostname isn't empty or "localhost" Even so, perhaps we should put the new behaviour behind an argument like  
 I think this is preferable to returning a nonsense local path, e.g.  
 Fixed! 
 I've added the test cases you suggested. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
A
| Thank you both :) | 
| 
 | 
      
        
              This comment was marked as resolved.
        
        
      
    
  This comment was marked as resolved.
…RL (python#126844) In `urllib.request.url2pathname()`, if the authority resolves to the current host, discard it. If an authority is present but resolves somewhere else, then on Windows we return a UNC path (as before), and on other platforms we raise `URLError`. Affects `pathlib.Path.from_uri()` in the same way. Co-authored-by: Adam Turner <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>
In
urllib.request.url2pathname(), if the authority resolves to the current host, discard it. If an authority is present but resolves somewhere else, then on Windows we return a UNC path (as before), and on other platforms we raiseURLError.Affects
pathlib.Path.from_uri()in the same way.I'm indebted to Eryk Sun for his analysis.
Path.from_uri()doesn't work if the URI contains host component #123599