-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Implement --uploaded-prior-to
to filter packages by upload-time
#13520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3ce5dcf
to
d5adbda
Compare
I've made some changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my first review pass. I'll need to think about the larger design more, but I like where this is going. Thank you for working on this!
upload_time: datetime.datetime | None | ||
if upload_time_data := file_data.get("upload-time"): | ||
upload_time = parse_iso_datetime(upload_time_data) | ||
else: | ||
upload_time = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that the JSON version of the Simple API is used in the vast majority of installs, but we should probably support this feature with the HTML Simple API if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature doesn't exist in the HTML version of the API, it only exists as addition JSON fields in the spec: https://peps.python.org/pep-0700/#specification
For the HTML version of the API, there is no change from version 1.0. For the JSON version of the API, the following changes are made:
IMO this was a short sighted choice by the spec authors, as it has prevented certain simple index libraries from supporting this feature. But we can't change the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, thanks for the information. /me wonders if it'd be useful to propose a v1.1 of the HTML spec to bring it to feature parity with the JSON API, but I don't have the time for that, haha.
("2023-01-01T00:00:00", (2023, 1, 1, 0, 0, 0)), | ||
("2023-12-31T23:59:59", (2023, 12, 31, 23, 59, 59)), | ||
# Test date only (will be extended to midnight) | ||
("2023-01-01", (2023, 1, 1, 0, 0, 0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but this means if I specify --exclude-newer-than 2025-08-14
(aka today), then pip will exclude candidates published on August 14, right? That behaves opposite to what I'd expect by the name --exclude-newer-than <day>
-> exclude anything published after this day.
Should this not be extended to (2023, 1, 2, 0, 0, 0)
or (2023, 1, 1, 23, 59, 59)
...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The datetime 2025-08-14 is equivalent so 2025-08-14 00:00:00 in the same way the float 1 is equivalent to the float 1.000.
To me your suggestion is the same as saying "not larger than 1" and allowing 1.1 because it is less than 2.
I agree though on the boundary "exclude newer than 2025-08-14" should allow 2025-08-14 00:00:00, I'll take a look at that.
Happy to accept a better phrase than "exclude newer than", but no one seems to have been confused by uv's "exclude newer"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me your suggestion is the same as saying "not larger than 1" and allowing 1.1 because it is less than 2.
This is personal, but I don't read --exclude-newer-than
like that. I read it like "ignore any packages published after <date/time>". exclude
means ignore and than
is exclusive (as is, it won't ignore... double negatives are hard).
The difference with uv's --exclude-newer
is that the word "than" is missing, thus it's ambiguous whether it's inclusive or exclusive. I do agree that uv's option name looks too much like a boolean flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my explanation doesn't make sense. I agree with @ofek's opinion on the inclusive/exclusive question: #12717 (comment)
For better or worse, I do think it's important that we match uv in semantics. Otherwise, the two tools will function subtly differently which is bound to become a footgun. I guess that means I am most in favour of --uploaded-before
. Fun 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good suggestion or strong option on the naming or semantics of the opinion, and am open to alternatives.
But I am strongly against mutating the user input or changing the logic to anything other than package datetime <= user datetime
or exlusive <
. It might not be semantically clear at first but it is logically simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to think on the additional feedback given and any feedback over the next couple of days and hold off from changing the name in this PR for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking hard about this and I am still against this inclusive "date" comparison approach, one problem that demonstrates its complexity is time zones:
- If I do implicit type conversion and parse the user input as a date and then naively take the date part of the upload-time then this creates wrong values, as the user date and the UTC date will not exactly align unless the user happens to be in UTC
- If I follow uv's example and add 1 day to the user datetime and then do a comparison with the upload-time that's precarious because, IMO, Python's datetime library design is not easy to understand where as uv gets to follow jiff, which is a lot more intuitive
- Which leaves me following @pfmoore's suggestion where I tack on
23:59:59
to the user input (reading the spec I think to be technically correct it would be23:59:59.999999
) and I can not bring myself to mutate user input in that way, for a few reasons, but the biggest one is having this kind of "smart" comparison (like Python package versions and pre-releases) creates hard to follow logical situations
I am therefore going to use an exclusive comparison, and then this issue of date vs. datetime goes away. Everyone agrees that "uploaded prior to 2025-08-14" is the same as "uploaded prior to 2025-08-14 00:00:00".
Though I appreciate that:
- Everyone else seems to disagree with my viewpoint about inclusive operators
- This breaks semantics with uv's option
If this causes this PR to be rejected that's fine, I won't block another PR if someone wants to follow the inclusive comparison using the semantics everyone else seems to agree on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, does this mean you will change the option name to --uploaded-prior-to
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If we're going to have different semantics a different name seems better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, does this mean you will change the option name to
--uploaded-prior-to
?
Yes, sorry, meant to do it last night but ran out of time to work on stuff, commit landed now.
Co-authored-by: Richard Si <[email protected]>
eccebcd
to
6ff91a4
Compare
9f528c7
to
c7877f2
Compare
c7877f2
to
841ae12
Compare
1b46347
to
1e44c45
Compare
--uploaded-prior-to
to filter packages by upload-time
1e44c45
to
e592e95
Compare
I believe I have addressed all outstanding review items, I also:
|
Closes #6257
Supplants #12717
Changes from the previous PR:
upload-before
->exclude-newer-than
uploaded-prior-to
Uses UTC by default so that is reproducible across environments