-
-
Notifications
You must be signed in to change notification settings - Fork 33.3k
gh-88786: Update uuid.uuid1() in uuid module to meet RFC4122
#124571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I think this feature may be merged as soon as possible, or before release, because this will cause many systems that use uuid1 to generate uuids that may not be unique. |
|
@rruuaanng the issue has been open for some time, and 3.13 is in rc2 and the final is due next week according to PEP 719 so I wouldn't expect this to be expedited |
Okay, Maybe it doesn't need to be handled immediately, but I think this PR is worth merging to resolve this lingering issue. |
Regarding the documentation, perhaps no changes are needed since users won't notice that the UUID has changed. |
|
Note that RFC 4122/9562 does NOT require the clock sequence to be incremented (emphasis mine):
RFC 4122 originally said something similar:
The Python implementation decided not to increment the clock sequence, probably to avoid a stable storage (as specified by the comment) and to keep a bit of unpredictability. As such, I'm not sure that this change should be done (this specific behaviour has been there for years). If you want to change this behaviour, tests must be added and this would only be a feature rather than a bug fix IMO. |
Oh, it seems I missed updating the current clock sequence with the result of the previous increment. |
|
And I don't think we need to add tests for this because there are already tests for uuid1. My changes just make it more random rather than changing the result format entirely. |
|
We still need tests. A change of implementation needs to be tested. I'm also worried about the fact that the counter is never reset. The counter should be reset at some point when you change the timestamp. (And I'm sorry but nothing is more random. It's actually more predictable since the clock sequence is incremented deterministically when possible). |
Hmmm, Maybe you're right, I'm thinking. |
But without making changes, it simply increments the timestamp, which makes no sense. timestamp = _last_timestamp + 1 |
|
@picnixz I have make change, please review! |
|
However, I remain skeptical about testing this, since there seems to be a check for uuid1 already. And my changes pass the existing test. The existing test includes a check on the duplication of large quantities of UUID. |
Without this change, there is a random clock sequence if you don't specify one so it's not an issue. The implementation IS compliant, it's just that it does not use all the suggestions from the RFC (some lines below, there is a
Yes but this doesn't check anything new. You MUST check that the counter is correctly incremented. In addition, it must be initialized to a RANDOM value (this is what the RFC states):
I will add a DO-NOT-MERGE label until all the interrogations have been answered. Note that regenerating the clock sequence for every non-specified clock sequence is fine and does not require all this. So I'm not convinced at all. |
For the suggestions mentioned above, I will make the following improvements:
|
There is a simple way to do it, namely mocking the interface. But honestly, I think all this change is really not needed and I am really not convinced by that. There is no issue with the current generation of the clock sequence. Even when you use the same timestamp, you will have different clock sequences so you will not have uniqueness issues (well, the entropy is only 14 bits so you can have collisions, but with a low probablity). Finally, UUIDv1 are historic and UUIDv6 should be used as a replacement now that RFC 9562 has been accepted. I do have a PR that implements v6 and I think v1 should not be used anymore nowawadays. If people are concerned about the uniqueness, then they should generate their own clock sequence (hence the reason why we allow them to pass the clock sequence explicitly). |
I agree with your point of view. Perhaps I just want to put an end to this legacy issue, as Jacob mentioned, this problem has been open for a long time. |
|
I almost forgot, I need to respond to your first point again.
Regarding the above, As mentioned in the RFC.
It's not that a new one is generated when the current clock sequence is empty, but rather that the current clock sequence being passed should be modified when the previous one does not exist. Regarding the tests, I think you are right. I am working on writing the relevant tests. Maybe you can help me remove the tags once they are completed. |
|
Although I’m reluctant, it seems that the existing So I think it seems that there is no need to add corresponding tests, because the original one does not implement this approach. |
Then this is an issue. The coverage is therefore insufficient. You can complete this test if you want but checking the clock sequence increment must be done in a test.
Not having tests does not mean that new behaviour should not be tested.
The RFC suggests to either increment the value or to always randomly generate a new clock sequence. The current implementation chooses this second alternative and thus is compliant with the standards.
I must disagree (again). The only issue I see with the current implementation is that the probabilty of having a collision within the same timestamp on the clock sequence is not that low (50% of chance of having a collision if you generate 128 UUIDs with the same timestamp). The implementation follows one recommendation, maybe not what the original author expected, yet it is a correct implementation. One could always reduce the possibility of a UUIDv1 collision when generating many UUIDs within the same timestamp but once the clock sequence reaches Example:
Since only the 14 lowest bits of the clock sequence are kept when generating the UUID, once So, the only reason why we would (but not necessarily should) change the current implementation is to avoid such issue. But UUIDv1 is historical and honestly no more recommended for production (UUIDv6 being the one being promoted (see #120650) or v4 if you want pure randomness). As such, I don't see the point of improving such legacy component (and for now, no one ever complained on this specific matter). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really want tests. In addition, you need to take care of the existing code L694-696 which explicitly says that we do not use a stable storage.
| # if there is a previous clock sequence, then increment | ||
| # otherwise, the clock sequence will be regenerated | ||
| if _last_clock_seq is not None: | ||
| clock_seq = _last_clock_seq + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not replace this one if clock_seq has been given by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you're right about this. If the clock_seq goes past the max value for a long, it'll reset to 0. This means there's a chance it could clash with a UUID that was made when it was 0 (though for the clock to roll back, it would have to go back to the exact same time as that initial 0 moment).
|
I will close this PR. Thank @picnixz for your attention to this incident during this period. |
Uh oh!
There was an error while loading. Please reload this page.