Spurious slash added to my base_url, causing 404 errors #2663
Replies: 2 comments 2 replies
Confirmed that Is there any reason that httpx doesn't simply delegate to |
Because as you mention... "urljoin()'s semantics are pretty unintuitive to me" The code comment in Lines 378 to 388 in df5dbc0 Either we need to...
There might be some more useful context if you dig into the code history I recall that the previous behaviour was "use |
This is not good, because this URL is clearly not what I told the library to target. I told it to target 'db.php' and suddenly, as if by magic, it's targeting 'db.php/', which doesn't exist.
My actual code is async, but same result. When I run this on my real target, I get 404 errors naturally:
(And yes, I confirmed that when I discard base_url and just pass the URL whole to every get(), I do in fact get a proper response from the DB API -- the 404 is due exclusively to the spurious slash added by the Client.)
I struggle to imagine that this is anything other than a bug. I only ever query this one simple URL, and I don't want the backend trying to think it's smarter than I am at basic string processing.
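For readers without the original snippet, here is a minimal sketch that models the symptom using only the standard library. It assumes the client forces a trailing slash onto the base path and then appends the relative path; merge_like_client is a hypothetical helper written for illustration, not httpx's API, and it ignores query strings and fragments.

```python
from urllib.parse import urlsplit, urlunsplit


def merge_like_client(base_url: str, url: str) -> str:
    """Hypothetical model of the reported behavior, not httpx's real code.

    Assumption: the base path always gets a trailing slash, and the
    relative path is appended to it with any leading slash removed.
    """
    parts = urlsplit(base_url)
    base_path = parts.path if parts.path.endswith("/") else parts.path + "/"
    merged_path = base_path + url.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, merged_path, "", ""))


base = "https://example.com/db.php"  # placeholder host, not the real target

print(merge_like_client(base, ""))       # https://example.com/db.php/
print(merge_like_client(base, "query"))  # https://example.com/db.php/query
```

Under that assumption, an empty url can never reproduce the base URL exactly, which matches the 404 described above.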
My use case, as you might guess, is only and entirely interacting with this DB API. My URL will never, ever change unless the remote server API changes. (I'm trying to build a Python interface to the API and scripts atop that to interact with it. In principle, I want to enable users to have dozens or hundreds of queries in parallel, all to this one and only URL.)
In fact at first I was even a little bit surprised that get() required url at all, when I thought I should be able to set Client(base_url=blah) once and forget it forever more, but I got over that. But then I was terribly surprised to find that my base_url was essentially worthless regardless, as I am effectively forced to just pass the actual, proper URL in every get(), quite defeating the purpose of base_url as far as I can see.
And certainly the documentation doesn't mention any sort of backend processing of URLs. I read it multiple times before even writing code, and a couple more times since. What I see there:
All of this documentation leads me to believe that the final URL will merely be base_url + url, a simple string concatenation, with no attempt to second guess my / placements, but alas it appears the documentation misleads me.
A brief search did point me to #846, which seems fairly related, and that points to #1139. urljoin()'s semantics are pretty unintuitive to me (I put my slashes where I mean dammit, and I don't want any other code to implicitly add them willy nilly), but in any case the behavior I see here is compatible neither with the docs nor with urljoin(). And #1139 was apparently closed for "the docs appear adequate", which, well, no the heck they aren't. (To be fair, even urljoin()'s docs are abysmal, I'm not actually sure what its semantics are supposed to be either, those docs don't actually say what the function does... incredible.)
I'd be willing to make a pull request for the httpx docs on base_url, if that meets with maintainer approval, but frankly I think some behavior needs to change here as well. (I was inclined to directly file an issue, but the repo directs that issues cannot be filed without first filing a discussion.)
But hey what do I know. This is after all essentially my first foray into programmatic HTTP requests, so I'm very much a noob in this field... a very surprised noob
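To make the two candidate semantics concrete, here is a small sketch comparing plain string concatenation with the standard library's urllib.parse.urljoin(). The host and paths are placeholders chosen for illustration.

```python
from urllib.parse import urljoin

base = "https://example.com/api/db.php"  # placeholder URL

# Plain concatenation: the slashes stay exactly where the caller put them.
concatenated = base + "query"

# urljoin() treats the last path segment of the base as a "file" and
# replaces it when the reference is relative.
joined = urljoin(base, "query")

# With a trailing slash on the base, the last segment is kept.
joined_with_slash = urljoin(base + "/", "query")

print(concatenated)       # https://example.com/api/db.phpquery
print(joined)             # https://example.com/api/query
print(joined_with_slash)  # https://example.com/api/db.php/query
```

None of the three results is obviously what a newcomer expects, which is why the docs need to say which rule applies.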
Edit: I see now also #843, and I can say that had that been accepted, it would have saved me the bother of writing this discussion post... also the relevant RFC absolutely should be mentioned, as that too would have saved me some trouble. I see now that that's what urljoin() is supposed to implement, although again, the urljoin() docs should mention the RFC as much as the httpx docs. The relevant section of the RFC: https://www.rfc-editor.org/rfc/rfc3986#section-5.4.1
Actually, reading that now, that specifies that an empty string appended to an existing URL preserves the URL as-is, so the httpx behavior breaks the specification (presuming that it's intended to meet the specification, which the docs don't say one way or the other).
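The empty-reference case can be checked against the standard library. This sketch runs a few of the "normal examples" from RFC 3986 section 5.4.1 through urllib.parse.urljoin(); the base and expected values come from the RFC, and the last check uses a placeholder URL.

```python
from urllib.parse import urljoin

# Base URI used by the examples in RFC 3986, section 5.4.1.
RFC_BASE = "http://a/b/c/d;p?q"

# Relative reference -> expected target, as listed in the RFC.
rfc_examples = {
    "": "http://a/b/c/d;p?q",   # empty reference: the base is preserved
    "g": "http://a/b/c/g",
    "g/": "http://a/b/c/g/",
    "/g": "http://a/g",
    "../g": "http://a/b/g",
}

for reference, expected in rfc_examples.items():
    result = urljoin(RFC_BASE, reference)
    print(repr(reference), "->", result, "OK" if result == expected else "MISMATCH")

# Applied to a URL shaped like the one in this post (placeholder host):
print(urljoin("https://example.com/db.php", ""))  # https://example.com/db.php
```

So under RFC 3986 resolution, an empty reference leaves 'db.php' untouched, with no trailing slash added.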