-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
gh-86809: Add support for HTTP Range header in HTTPServer #118949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I didn't use it, but LGTM. Except I would prefer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I've just reviewed at the same time you marked some comments as outdated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments (I won't be available afterwards but I may comment tomorrow)
|
Thanks for the suggestions, I've made some changes based on them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aaaand some final comments I think. Sorry to break down my reviews like that but I usually comment incrementally but I don't want people to be frustrated by my nitpicking (also, I don't see the point of reviewing something that could change after addressing other comments).
No worries, and thanks a lot for the detailed suggestions for my code! Your suggestions are very helpful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't have more comments to add (but as always, it's easy to miss something obvious when you've reviewd the same thing multiple times).
|
@arhadthedev Please review if possible. |
|
Ping! (I hope it's ok to ping every 6 months or so, especially if it seems to be so nearly done :) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I've got more insights on how CPython's workflow works, I'd like you to address the following changes.
Misc/NEWS.d/next/Library/2024-05-12-00-15-44.gh-issue-86809._5vdGa.rst
Outdated
Show resolved
Hide resolved
Additional work is needed (work that I wasn't aware of in May...)
… change function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed the What's New entry where using ~ would only show send_error and not BaseHTTPRequestHandler.send_error.
I'll let @vadmium decide when to merge after you've addressed that nit.
Co-authored-by: Bénédikt Tran <[email protected]>
|
@vadmium |
|
Oh... I suddenly realized that Python 3.14 has already reached beta freeze, but this PR hasn't been merged yet, should I change the target version number to 3.15? |
|
Yes, I'm sorry I forgot about it. |
|
No worries, I didn't notice that either. Let me change it tomorrow. |
|
Sorry for the thread bump: |
Since we forgot to merge it in before the 3.14 feature freeze, you'll only see it in Python 3.15 at the earliest (a little over a year from now).
Sorry I haven't changed it yet. I see that there are still a lot of TODOs in the 3.15 whatsnew file, and I'm worried that there will be conflicts soon in the future if I change it now. Maybe I'll wait for the 3.15 whatsnew file to have a little bit more content. |
|
Don't worry about the todos. Just add a new http (or http.client I don't remember where the code is) section under improved modules (it there isn't oe already). |
|
Are you still working on this one? |
Sorry for the delay, I went traveling for a while, and recently I've been busy with my full-time job. It looks like I just need to resolve some merge conflicts and migrate some documents, and I should be able to finish it in the next few days. |
|
@picnixz I think I've finished the docs work. Could you please take another look at it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but this PR always slips under my radar.
- Can you resolve the conflicts?
- Let's say I were to ask you to support multi-range. How much of code change do you expect? We first decided to only support single-range but maybe we should also support multi-range in order to have a complete "support" and to be sure that this interface will actually be compatible when supporting multi-range.
- At the same time, we had another PR about adding headers in the CLI (for supporting CORS) in #135057 and we still discuss the API. Since the above PR actually changes the signature, I'd like to synchronize the discussion on the parameter names for the additional headers. extra_header for
send_errorlooks good but the problem is that, if we were to have a "default" extra headers mapping, then we would be annoyed because of that. We should also have the same "defaults" and expected types to make both API consistent.
For that reason, I'd like to apologize as I will need to decide whether this one or the other PR will take precedence in the merge order.
On a side-note, I wonder if using range as a named parameter is actually a good idea... it kinda shadows a built-in that could be useful but I don't have a better name for it...
| return None | ||
|
|
||
| self.send_response(HTTPStatus.OK) | ||
| if self._range: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we have multi-range, we'll need to do "multiple" passes so I think we can have a method that takes the range we're trying to parse first and then iterate over the parsed ranges:
self._ranges = self.parse_ranges()
...
for i, r in enumerate(self._ranges):
self._ranges[i] = self._adjust_range(r, fs)
...
self.send_response(HTTPStatus.PARTIAL_CONTENT)
# and write the correct multi-range here
| start, end = range | ||
| length = end - start + 1 | ||
| source.seek(start) | ||
| while length > 0: | ||
| buf = source.read(min(length, shutil.COPY_BUFSIZE)) | ||
| if not buf: | ||
| raise EOFError('File shrank after size was checked') | ||
| length -= len(buf) | ||
| outputfile.write(buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we have a multi-range, this will be called for each of the range we constructed. So I think this part will need to be a private method:
def _copyfile(self, out, src, size):
...
and it will be used as
for start, end in ranges:
src.seek(start)
self._copyfile(self, out, src, end - start + 1)
However, for multi-ranges, we'll likely need to rename range into ranges and accept an iterable of "ranges". As such, it might be better to already accept such iterables and reject those that are not of length 1 for now.
Alternatively, and this could be perhaps better, we can implement copyfile only for a single range, and be responsible to call it for all range items for multi-range.
| if f: | ||
| try: | ||
| self.copyfile(f, self.wfile) | ||
| self.copyfile(f, self.wfile, range=self._range) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a multi-range, we would do:
for r in self._ranges:
self.copyfile(f, self.wfile, range=r)|
IMHO this module is not designed to be a complete http server. If a client is complex enough to request multiple ranges, it probably needs to handle that the server doesn't support it. |
Solved
In addition to the parse range header and response sending components you mentioned, we also need to change the response Content-Type to multipart/byteranges, specify a boundary, and use that boundary to separate multiple parts within the response. Moreover, to thoroughly and rigorously test this behavior, we may need numerous test cases covering various edge cases. Overall, I estimate the amount of changes will need to be doubled. I agree with @imba-tjd's point that this may be beyond the scope of this PR and maybe even beyond SimpleHttpServer.
I quickly took a look at the PR you mentioned, and it appears to add some additional custom headers at the class level. I think the The simplest approach is to sequentially send |
|
I came up with code to better handle multiple ranges. I just extended to regex to allow it: IMHO the server should accept multiple ranges. but it does not need to actually "handle" multiple ranges. it could just fallback to combining the ranges into 1 big range (as long as it's still valid). I understand that this is a "simple" http server. but it should at least try to follow the specifications where it can right? Simple and basic are different things |
What do you mean by |
|
I dont think the spec allows it. but |
|
There is this code from the I have not tried to see how this works, and this version does not use regex. But, it's at least validated against the web specs |
Add support for the HTTP
Rangeheader to SimpleHTTPServer to solve #86809