@lyc8503
Contributor

@lyc8503 lyc8503 commented May 11, 2024

Add support for the HTTP Range header to SimpleHTTPServer to solve #86809
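For context, a single-range request carries a header such as `Range: bytes=0-499`, and a server that honors it answers with 206 Partial Content. The following is only a minimal sketch of parsing one such header; `parse_single_range` is a hypothetical helper written for illustration and is not the code in this PR.

```python
import re

# Hypothetical helper for illustration; not the PR's implementation.
_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)$")


def parse_single_range(header):
    """Return (start, end), using None for an omitted bound.

    Returns None when the header is malformed, uses another unit,
    or asks for more than one range.
    """
    m = _RANGE_RE.match(header.strip())
    if m is None:
        return None
    first, last = m.groups()
    if not first and not last:
        return None  # "bytes=-" names no range at all
    start = int(first) if first else None
    end = int(last) if last else None
    if start is not None and end is not None and start > end:
        return None  # first position after last position is invalid
    return start, end
```

Here `bytes=500-` yields an open-ended range and `bytes=-200` a suffix range (the last 200 bytes); both still need to be resolved against the file size before serving.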

@imba-tjd
Contributor

imba-tjd commented May 23, 2024

I didn't use it, but LGTM. Except I would prefer parse_range over get_range.

@lyc8503 lyc8503 requested a review from picnixz May 23, 2024 14:30
Member

@picnixz picnixz left a comment

(I've just reviewed at the same time you marked some comments as outdated)

@lyc8503 lyc8503 requested a review from picnixz May 23, 2024 15:36
Member

@picnixz picnixz left a comment

A few comments (I won't be available afterwards but I may comment tomorrow)

@lyc8503
Contributor Author

lyc8503 commented May 23, 2024

Thanks for the suggestions, I've made some changes based on them.

@lyc8503 lyc8503 requested a review from picnixz May 23, 2024 16:37
Member

@picnixz picnixz left a comment

Aaaand some final comments I think. Sorry to break down my reviews like that but I usually comment incrementally but I don't want people to be frustrated by my nitpicking (also, I don't see the point of reviewing something that could change after addressing other comments).

@lyc8503
Contributor Author

lyc8503 commented May 24, 2024

> Aaaand some final comments I think. Sorry to break down my reviews like that but I usually comment incrementally but I don't want people to be frustrated by my nitpicking (also, I don't see the point of reviewing something that could change after addressing other comments).

No worries, and thanks a lot for the detailed suggestions for my code! Your suggestions are very helpful.

@lyc8503 lyc8503 requested a review from picnixz May 24, 2024 07:40
picnixz previously approved these changes May 24, 2024
Member

@picnixz picnixz left a comment

I personally don't have more comments to add (but as always, it's easy to miss something obvious when you've reviewed the same thing multiple times).

@hondogitsune

hondogitsune commented Jun 28, 2024

@arhadthedev
@JelleZijlstra
This would close a nearly 4-year-old issue and enhance CPython with one of the most basic features of HTTP servers: range requests.

Please review if possible.

@JCash

JCash commented Dec 8, 2024

Ping! (I hope it's ok to ping every 6 months or so, especially if it seems to be so nearly done :) )

Member

@picnixz picnixz left a comment

Now that I have more insight into how CPython's workflow works, I'd like you to address the following changes.

@picnixz picnixz dismissed their stale review December 8, 2024 23:37

Additional work is needed (work that I wasn't aware of in May...)

@ghost

ghost commented Dec 15, 2024

All commit authors signed the Contributor License Agreement.

Member

@picnixz picnixz left a comment

Sorry, I missed the What's New entry where using ~ would only show send_error and not BaseHTTPRequestHandler.send_error.

I'll let @vadmium decide when to merge after you've addressed that nit.

@spacesynth

spacesynth commented Feb 6, 2025

@vadmium
Please forgive me for pinging, you are probably swamped with work! If you have time to look at this <200 LoC PR at any point, I'd be giggling in glee!

@lyc8503
Contributor Author

lyc8503 commented May 9, 2025

Oh... I suddenly realized that Python 3.14 has already reached beta freeze, but this PR hasn't been merged yet. Should I change the target version number to 3.15?

@picnixz
Member

picnixz commented May 9, 2025

Yes, I'm sorry I forgot about it.

@lyc8503
Contributor Author

lyc8503 commented May 9, 2025

No worries, I didn't notice that either. Let me change it tomorrow.

@picnixz picnixz self-requested a review May 26, 2025 23:22
@hondogitsune

hondogitsune commented May 27, 2025

Sorry for the thread bump:
What future Python version is being targeted at the moment for the release of this good feature?

@lyc8503
Contributor Author

lyc8503 commented May 28, 2025

> What future Python version is being targeted at the moment for the release of this good feature?

Since we forgot to merge it in before the 3.14 feature freeze, you'll only see it in Python 3.15 at the earliest (a little over a year from now).

> Let me change it tomorrow.

Sorry I haven't changed it yet. I see that there are still a lot of TODOs in the 3.15 whatsnew file, and I'm worried that changing it now will soon lead to conflicts. Maybe I'll wait for the 3.15 whatsnew file to have a little bit more content.

@picnixz
Member

picnixz commented May 30, 2025

Don't worry about the todos. Just add a new http (or http.client, I don't remember where the code is) section under improved modules (if there isn't one already).

@picnixz
Member

picnixz commented Aug 2, 2025

Are you still working on this one?

@picnixz picnixz removed their request for review August 2, 2025 11:31
@lyc8503
Contributor Author

lyc8503 commented Aug 2, 2025

> Are you still working on this one?

Sorry for the delay, I went traveling for a while, and recently I've been busy with my full-time job. It looks like I just need to resolve some merge conflicts and migrate some documents, and I should be able to finish it in the next few days.

@lyc8503 lyc8503 requested a review from AA-Turner as a code owner August 7, 2025 02:46
@lyc8503
Contributor Author

lyc8503 commented Aug 7, 2025

@picnixz I think I've finished the docs work. Could you please take another look at it?

Member

@picnixz picnixz left a comment

Sorry but this PR always slips under my radar.

  • Can you resolve the conflicts?
  • Let's say I were to ask you to support multi-range. How much code change do you expect? We first decided to only support single-range, but maybe we should also support multi-range in order to have complete "support" and to be sure that this interface will actually be compatible when multi-range is added.
  • At the same time, we had another PR about adding headers in the CLI (for supporting CORS) in #135057 and we are still discussing the API. Since the above PR actually changes the signature, I'd like to synchronize the discussion on the parameter names for the additional headers. extra_headers for send_error looks good, but if we were to have a "default" extra headers mapping, that would become awkward. We should also have the same "defaults" and expected types to make both APIs consistent.

For that reason, I'd like to apologize as I will need to decide whether this one or the other PR will take precedence in the merge order.


On a side note, I wonder if using range as a named parameter is actually a good idea... it kinda shadows a built-in that could be useful, but I don't have a better name for it...

return None

self.send_response(HTTPStatus.OK)
if self._range:
Member

When we have multi-range, we'll need to do "multiple" passes so I think we can have a method that takes the range we're trying to parse first and then iterate over the parsed ranges:

self._ranges = self.parse_ranges()
...
for i, r in enumerate(self._ranges):
    self._ranges[i] = self._adjust_range(r, fs)
...
self.send_response(HTTPStatus.PARTIAL_CONTENT)
# and write the correct multi-range here
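The adjustment step in the snippet above could look roughly like the following. This is only a sketch under assumptions: `adjust_range` is a hypothetical stand-in for `_adjust_range`, and it assumes each parsed range is a `(start, end)` pair where `None` marks an omitted bound.

```python
def adjust_range(rng, file_size):
    """Resolve a parsed (start, end) pair against file_size.

    Hypothetical stand-in for the _adjust_range mentioned above.
    Returns inclusive (start, end) offsets, or None if unsatisfiable.
    """
    start, end = rng
    if start is None:
        # Suffix range "bytes=-N": the last N bytes of the file.
        if not end or file_size == 0:
            return None
        return max(file_size - end, 0), file_size - 1
    if start >= file_size:
        return None  # would be answered with 416 Range Not Satisfiable
    if end is None or end >= file_size:
        end = file_size - 1  # clamp open-ended or overlong ranges
    return start, end


def adjust_ranges(ranges, file_size):
    """Apply adjust_range to every parsed range, keeping their order."""
    return [adjust_range(r, file_size) for r in ranges]
```

With a single-range implementation the list simply has one element, which keeps the calling code the same if multi-range is added later.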

Comment on lines +955 to +963
start, end = range
length = end - start + 1
source.seek(start)
while length > 0:
    buf = source.read(min(length, shutil.COPY_BUFSIZE))
    if not buf:
        raise EOFError('File shrank after size was checked')
    length -= len(buf)
    outputfile.write(buf)
Member

Once we have a multi-range, this will be called for each of the ranges we constructed. So I think this part will need to be a private method:

def _copyfile(self, out, src, size):
    ...

and it will be used as

for start, end in ranges:
    src.seek(start)
    self._copyfile(out, src, end - start + 1)

However, for multi-ranges, we'll likely need to rename range to ranges and accept an iterable of "ranges". As such, it might be better to already accept such iterables and reject those that are not of length 1 for now.


Alternatively, and this could perhaps be better, we can implement copyfile only for a single range and make the caller responsible for calling it once per range item in the multi-range case.
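A standalone sketch of the second option, where the copy routine handles exactly one inclusive range and the caller loops. The name `copy_range` and the in-memory files are assumptions made for illustration; the loop body mirrors the lines quoted from the diff above.

```python
import io
import shutil


def copy_range(src, out, start, end):
    """Copy bytes start..end (inclusive) from src to out."""
    remaining = end - start + 1
    src.seek(start)
    while remaining > 0:
        buf = src.read(min(remaining, shutil.COPY_BUFSIZE))
        if not buf:
            raise EOFError("File shrank after size was checked")
        out.write(buf)
        remaining -= len(buf)


# The caller owns the iteration, so multi-range needs no new copy logic.
src = io.BytesIO(b"0123456789")
out = io.BytesIO()
for start, end in [(2, 4), (7, 9)]:
    copy_range(src, out, start, end)
```

A real multi-range response would also have to write part headers and boundaries between the calls, which is why keeping the copy step single-range leaves that concern with the caller.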

if f:
    try:
-       self.copyfile(f, self.wfile)
+       self.copyfile(f, self.wfile, range=self._range)
Member

If we have a multi-range, we would do:

for r in self._ranges:
    self.copyfile(f, self.wfile, range=r)

@imba-tjd
Contributor

IMHO this module is not designed to be a complete HTTP server. If a client is complex enough to request multiple ranges, it should probably be able to cope with a server that doesn't support them.

@lyc8503
Contributor Author

lyc8503 commented Sep 25, 2025

> Can you resolve the conflicts?

Solved

> Let's say I were to ask you to support multi-range. How much code change do you expect?

In addition to the parse range header and response sending components you mentioned, we also need to change the response Content-Type to multipart/byteranges, specify a boundary, and use that boundary to separate multiple parts within the response. Moreover, to thoroughly and rigorously test this behavior, we may need numerous test cases covering various edge cases.

Overall, I estimate the amount of changes would roughly double. I agree with @imba-tjd's point that this may be beyond the scope of this PR and maybe even beyond SimpleHTTPServer.
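To give a sense of the extra work described above, here is a rough sketch of assembling a multipart/byteranges body. `build_multipart_byteranges` is a hypothetical helper, the boundary is supplied by the caller (a real server would generate an unpredictable one), and the response itself would additionally need `Content-Type: multipart/byteranges; boundary=...` in its headers.

```python
import io


def build_multipart_byteranges(data, ranges, content_type, boundary):
    """Build a multipart/byteranges body for inclusive (start, end) ranges.

    Hypothetical helper for illustration; it holds everything in memory,
    which a file server would avoid by streaming each part.
    """
    total = len(data)
    body = io.BytesIO()
    for start, end in ranges:
        # Each part: delimiter line, part headers, blank line, payload.
        body.write(f"--{boundary}\r\n".encode("ascii"))
        body.write(f"Content-Type: {content_type}\r\n".encode("ascii"))
        body.write(
            f"Content-Range: bytes {start}-{end}/{total}\r\n\r\n".encode("ascii")
        )
        body.write(data[start:end + 1])
        body.write(b"\r\n")
    # Closing delimiter carries two trailing hyphens.
    body.write(f"--{boundary}--\r\n".encode("ascii"))
    return body.getvalue()
```

The Content-Length of such a response covers the delimiters and part headers as well as the payload bytes, so it has to be computed from the full body rather than from the range sizes alone.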

> We should also have the same "defaults" and expected types to make both APIs consistent.

I quickly took a look at the PR you mentioned, and it appears to add some additional custom headers at the class level. I think the extra_headers parameter for send_error is still necessary, as we only need to send these headers additionally during this specific error.

The simplest approach is to sequentially send extra_headers and response_headers within send_error. I have to admit this does appear to be a bit messy though. Do you have better ideas on how to handle both the extra headers in error scenarios and custom headers at the class level?
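One possible way to make the sequential approach less messy is to merge the two header sources before sending, letting the per-call headers win over class-level ones with the same name. This is a sketch only: `combine_headers` is a hypothetical helper, both arguments are assumed to be lists of `(name, value)` pairs, and neither PR defines this behavior.

```python
def combine_headers(extra_headers, response_headers):
    """Per-call extra headers first, then class-level headers not already set.

    Header names are compared case-insensitively.
    """
    combined = list(extra_headers)
    seen = {name.lower() for name, _ in extra_headers}
    for name, value in response_headers:
        if name.lower() not in seen:
            combined.append((name, value))
    return combined


# Example: a 416 response reports the current size as "bytes */<size>".
file_size = 1000
extra = [("Content-Range", f"bytes */{file_size}")]
defaults = [("Access-Control-Allow-Origin", "*"), ("content-range", "bytes */0")]
headers = combine_headers(extra, defaults)
```

Whether an override rule like this is desirable, or whether duplicates should simply be sent as given, is exactly the kind of detail the two PRs would need to agree on.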

@sakgoyal

sakgoyal commented Sep 25, 2025

I came up with code to better handle multiple ranges. I just extended the regex to allow it:
https://gist.github.com/sakgoyal/813339e39bd7b33338975976790f84d0

IMHO the server should accept multiple ranges, but it does not need to actually "handle" them. It could just fall back to combining the ranges into one big range (as long as it's still valid).

I understand that this is a "simple" HTTP server, but it should at least try to follow the specifications where it can, right?

Simple and basic are different things

basic: minimal features
simple: easy to use

@lyc8503
Contributor Author

lyc8503 commented Sep 25, 2025

> I think the server should accept multiple ranges. but it does not need to actually "handle" multiple ranges. it could just fallback to combining the ranges into 1 big range (as long as it's still valid). that way the server will accept multi-range, but not have to handle the complexities of multiple ranges.

What do you mean by combining the ranges into 1 big range? When a client explicitly requests multiple parts (e.g. 200-999, 2000-2499, 9500- in your gist), how should they be merged?

@sakgoyal

sakgoyal commented Sep 25, 2025

I don't think the spec allows it, but:
200-999, 2000-2499, 9500- -> 200-
200-999, 2000-2499 -> 200-2499
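The two mappings above amount to taking the smallest start and the largest end, with any open-ended range making the result open-ended. A minimal sketch, assuming ranges are `(start, end)` pairs with `end=None` for an open end and no suffix ranges; `bounding_range` is a hypothetical name:

```python
def bounding_range(ranges):
    """Collapse (start, end) pairs into one covering range.

    Assumes every start is an int; end=None means "to the end of the file".
    """
    start = min(s for s, _ in ranges)
    if any(e is None for _, e in ranges):
        return start, None
    return start, max(e for _, e in ranges)
```

Note that the merged range includes bytes the client never asked for (1000-1999 in the second example), which is the concern raised in the question above.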

@sakgoyal

There is this code from the wpt repository:
https://github.com/web-platform-tests/wpt/blob/master/tools/wptserve/wptserve/ranges.py

I have not tried to see how this works, and this version does not use regex. But it's at least validated against the web specs.
