gh-88786: Update `uuid.uuid1()` in `uuid` module to meet RFC4122 #124571

rruuaanng · 2024-09-26T02:51:42Z

Issue: Force UUIDv1 to use a clock sequence storage (RFC 4122) #88786

rruuaanng · 2024-09-26T03:12:46Z

I think this feature may be merged as soon as possible, or before release, because this will cause many systems that use uuid1 to generate uuids that may not be unique.

JacobCoffee · 2024-09-26T03:40:55Z

@rruuaanng the issue has been open for some time, and 3.13 is in rc2 and the final is due next week according to PEP 719 so I wouldn't expect this to be expedited

rruuaanng · 2024-09-26T04:14:21Z

@rruuaanng the issue has been open for some time, and 3.13 is in rc2 and the final is due next week according to PEP 719 so I wouldn't expect this to be expedited

Okay, Maybe it doesn't need to be handled immediately, but I think this PR is worth merging to resolve this lingering issue.

rruuaanng · 2024-09-26T04:15:22Z

would we want an entry for .. versionchanged in Doc/library/uuid.rst? Some might rely on past behavior maybe and look to the docs to see changes

Regarding the documentation, perhaps no changes are needed since users won't notice that the UUID has changed.

picnixz · 2024-09-26T09:49:19Z

Note that RFC 4122/9562 does NOT require the clock sequence to be incremented (emphasis mine):

If the previous value of the clock sequence is known, it MAY be incremented; otherwise it SHOULD be set to a random or high-quality pseudorandom value.

RFC 4122 originally said something similar:

If the previous value of the clock sequence is known, it can just be incremented; otherwise it should be set to a random or high-quality pseudo-random value.

The Python implementation decided not to increment the clock sequence, probably to avoid a stable storage (as specified by the comment) and to keep a bit of unpredictability. As such, I'm not sure that this change should be done (this specific behaviour has been there for years).

If you want to change this behaviour, tests must be added and this would only be a feature rather than a bug fix IMO.

Lib/uuid.py

rruuaanng · 2024-09-26T10:55:17Z

want

Oh, it seems I missed updating the current clock sequence with the result of the previous increment.

rruuaanng · 2024-09-26T10:57:40Z

And I don't think we need to add tests for this because there are already tests for uuid1. My changes just make it more random rather than changing the result format entirely.

picnixz · 2024-09-26T11:01:43Z

We still need tests. A change of implementation needs to be tested. I'm also worried about the fact that the counter is never reset. The counter should be reset at some point when you change the timestamp.

(And I'm sorry but nothing is more random. It's actually more predictable since the clock sequence is incremented deterministically when possible).

rruuaanng · 2024-09-26T11:06:25Z

We still need tests. A change of implementation needs to be tested. I'm also worried about the fact that the counter is never reset. The counter should be reset at some point when you change the timestamp.

(And I'm sorry but nothing is more random. It's actually more predictable since the clock sequence is incremented deterministically when possible).

Hmmm, Maybe you're right, I'm thinking.

rruuaanng · 2024-09-26T11:19:22Z

Note that RFC 4122/9562 does NOT require the clock sequence to be incremented (emphasis mine):

If the previous value of the clock sequence is known, it MAY be incremented; otherwise it SHOULD be set to a random or high-quality pseudorandom value.

RFC 4122 originally said something similar:

If the previous value of the clock sequence is known, it can just be incremented; otherwise it should be set to a random or high-quality pseudo-random value.

The Python implementation decided not to increment the clock sequence, probably to avoid a stable storage (as specified by the comment) and to keep a bit of unpredictability. As such, I'm not sure that this change should be done (this specific behaviour has been there for years).

If you want to change this behaviour, tests must be added and this would only be a feature rather than a bug fix IMO.

But without making changes, it simply increments the timestamp, which makes no sense.

timestamp = _last_timestamp + 1

rruuaanng · 2024-09-26T12:39:24Z

@picnixz I have make change, please review!

rruuaanng · 2024-09-26T12:47:57Z

However, I remain skeptical about testing this, since there seems to be a check for uuid1 already. And my changes pass the existing test. The existing test includes a check on the duplication of large quantities of UUID.

picnixz · 2024-09-26T13:38:43Z

But without making changes, it simply increments the timestamp, which makes no sense.

Without this change, there is a random clock sequence if you don't specify one so it's not an issue. The implementation IS compliant, it's just that it does not use all the suggestions from the RFC (some lines below, there is a if clock_seq is None: clock_seq = random.getrandbits(14) so it is ok).

However, I remain skeptical about testing this, since there seems to be a check for uuid1 already

Yes but this doesn't check anything new. You MUST check that the counter is correctly incremented. In addition, it must be initialized to a RANDOM value (this is what the RFC states):

The clock sequence MUST be originally (i.e., once in the lifetime of a system) initialized to a random number to minimize the correlation across systems.

I will add a DO-NOT-MERGE label until all the interrogations have been answered. Note that regenerating the clock sequence for every non-specified clock sequence is fine and does not require all this. So I'm not convinced at all.

rruuaanng · 2024-09-26T14:47:54Z

But without making changes, it simply increments the timestamp, which makes no sense.

Without this change, there is a random clock sequence if you don't specify one so it's not an issue. The implementation IS compliant, it's just that it does not use all the suggestions from the RFC (some lines below, there is a if clock_seq is None: clock_seq = random.getrandbits(14) so it is ok).

However, I remain skeptical about testing this, since there seems to be a check for uuid1 already

Yes but this doesn't check anything new. You MUST check that the counter is correctly incremented. In addition, it must be initialized to a RANDOM value (this is what the RFC states):

The clock sequence MUST be originally (i.e., once in the lifetime of a system) initialized to a random number to minimize the correlation across systems.

I will add a DO-NOT-MERGE label until all the interrogations have been answered. Note that regenerating the clock sequence for every non-specified clock sequence is fine and does not require all this. So I'm not convinced at all.

For the suggestions mentioned above, I will make the following improvements:

Add a check to increment the clock sequence.
Check if the clock sequence is a random number (but I can't prove if the clock sequence is random, you know why).

picnixz · 2024-09-26T14:54:24Z

Check if the clock sequence is a random number (but I can't prove if the clock sequence is random, you know why).

There is a simple way to do it, namely mocking the interface. But honestly, I think all this change is really not needed and I am really not convinced by that. There is no issue with the current generation of the clock sequence. Even when you use the same timestamp, you will have different clock sequences so you will not have uniqueness issues (well, the entropy is only 14 bits so you can have collisions, but with a low probablity).

Finally, UUIDv1 are historic and UUIDv6 should be used as a replacement now that RFC 9562 has been accepted. I do have a PR that implements v6 and I think v1 should not be used anymore nowawadays. If people are concerned about the uniqueness, then they should generate their own clock sequence (hence the reason why we allow them to pass the clock sequence explicitly).

rruuaanng · 2024-09-26T15:03:30Z

Check if the clock sequence is a random number (but I can't prove if the clock sequence is random, you know why).

There is a simple way to do it, namely mocking the interface. But honestly, I think all this change is really not needed and I am really not convinced by that. There is no issue with the current generation of the clock sequence. Even when you use the same timestamp, you will have different clock sequences so you will not have uniqueness issues (well, the entropy is only 14 bits so you can have collisions, but with a low probablity).

Finally, UUIDv1 are historic and UUIDv6 should be used as a replacement now that RFC 9562 has been accepted. I do have a PR that implements v6 and I think v1 should not be used anymore nowawadays. If people are concerned about the uniqueness, then they should generate their own clock sequence (hence the reason why we allow them to pass the clock sequence explicitly).

I agree with your point of view. Perhaps I just want to put an end to this legacy issue, as Jacob mentioned, this problem has been open for a long time.

rruuaanng · 2024-09-27T03:01:49Z

I almost forgot, I need to respond to your first point again.

Without this change, there is a random clock sequence if you don't specify one so it's not an issue. The implementation IS compliant, it's just that it does not use all the suggestions from the RFC (some lines below, there is a if clock_seq is None: clock_seq = random.getrandbits(14) so it is ok).

Regarding the above, As mentioned in the RFC.

If the previous value of the clock sequence is known, it can just be incremented; otherwise it should be set to a random or high-quality pseudo-random value.

It's not that a new one is generated when the current clock sequence is empty, but rather that the current clock sequence being passed should be modified when the previous one does not exist.

Regarding the tests, I think you are right. I am working on writing the relevant tests. Maybe you can help me remove the tags once they are completed.

rruuaanng · 2024-09-27T04:06:42Z

Although I’m reluctant, it seems that the existing test_uuid1 does not check timestamp = _last_timestamp + 1 whether the original timestamp is incrementing (which I found to be difficult during implementation).

So I think it seems that there is no need to add corresponding tests, because the original one does not implement this approach.
And when the rollback is triggered, if the auto-increment does not run, it may be a problem with the interpreter.

picnixz · 2024-09-27T09:12:45Z

Although I’m reluctant, it seems that the existing test_uuid1 does not check timestamp = _last_timestamp + 1 whether the original timestamp is incrementing (which I found to be difficult during implementation).

Then this is an issue. The coverage is therefore insufficient. You can complete this test if you want but checking the clock sequence increment must be done in a test.

So I think it seems that there is no need to add corresponding tests, because the original one does not implement this approach.

Not having tests does not mean that new behaviour should not be tested.

It's not that a new one is generated when the current clock sequence is empty, but rather that the current clock sequence being passed should be modified when the previous one does not exist.

The RFC suggests to either increment the value or to always randomly generate a new clock sequence. The current implementation chooses this second alternative and thus is compliant with the standards.

Perhaps I just want to put an end to this legacy issue, as Jacob mentioned, this problem has been open for a long time.

I must disagree (again). The only issue I see with the current implementation is that the probabilty of having a collision within the same timestamp on the clock sequence is not that low (50% of chance of having a collision if you generate 128 UUIDs with the same timestamp).

The implementation follows one recommendation, maybe not what the original author expected, yet it is a correct implementation. One could always reduce the possibility of a UUIDv1 collision when generating many UUIDs within the same timestamp but once the clock sequence reaches $2^{14}$, you will have an overflow on the clock sequence and you will get the an already-used clock sequence, independently of the timestamp value.

Example:

last_clock_seq = 0
Timestamp = 0
Generate $N_1$ UUIDs: your clock sequence will be last_clock_seq, ..., last_clock_seq + N_1.
Timestamp = 1
Generate $N_2$ UUIDs: your clock sequence will be last_clock_seq + N_1 + 1, ..., last_clock_seq + N_1 + N_2.

Since only the 14 lowest bits of the clock sequence are kept when generating the UUID, once $N_1 + N_2 > 2^{14}$, you will have the same clock sequences for UUIDs with different timestamps but they will be entirely predictable. You do NOT want predictability in general.
Similarly, if $N_1 > 2^{14}$, you'll also generate UUIDs with the same timestamp and with the same clock sequence. This is the reason why you should re-randomize last_clock_seq if the timestamp changed.

So, the only reason why we would (but not necessarily should) change the current implementation is to avoid such issue. But UUIDv1 is historical and honestly no more recommended for production (UUIDv6 being the one being promoted (see #120650) or v4 if you want pure randomness). As such, I don't see the point of improving such legacy component (and for now, no one ever complained on this specific matter).

picnixz

I really want tests. In addition, you need to take care of the existing code L694-696 which explicitly says that we do not use a stable storage.

picnixz · 2024-09-27T09:13:18Z

Lib/uuid.py

+        # if there is a previous clock sequence, then increment
+        # otherwise, the clock sequence will be regenerated
+        if _last_clock_seq is not None:
+            clock_seq = _last_clock_seq + 1


You should not replace this one if clock_seq has been given by the user.

Maybe you're right about this. If the clock_seq goes past the max value for a long, it'll reset to 0. This means there's a chance it could clash with a UUID that was made when it was 0 (though for the clock to roll back, it would have to go back to the exact same time as that initial 0 moment).

rruuaanng · 2024-09-27T09:20:32Z

I will close this PR. Thank @picnixz for your attention to this incident during this period.

Update uuid1 in uuid module to meet RFC4122

4fa0463

bedevere-app bot added the awaiting review label Sep 26, 2024

bedevere-app bot mentioned this pull request Sep 26, 2024

Force UUIDv1 to use a clock sequence storage (RFC 4122) #88786

Closed

rruuaanng added 3 commits September 26, 2024 10:52

Delete my bug tag

ee5b3ec

Remove old code

4550483

Added comments

d302f62

This comment was marked as resolved.

Sign in to view

picnixz requested changes Sep 26, 2024

View reviewed changes

Lib/uuid.py Outdated Show resolved Hide resolved

Lib/uuid.py Outdated Show resolved Hide resolved

bedevere-app bot added awaiting core review and removed awaiting review labels Sep 26, 2024

rruuaanng added 2 commits September 26, 2024 19:11

Change clock back block

63f8cde

Add '.'

3cf280e

picnixz added the DO-NOT-MERGE label Sep 26, 2024

rruuaanng requested a review from picnixz September 27, 2024 03:04

rruuaanng requested a review from JacobCoffee September 27, 2024 04:17

picnixz requested changes Sep 27, 2024

View reviewed changes

rruuaanng closed this Sep 27, 2024

rruuaanng reopened this Sep 27, 2024

rruuaanng closed this Sep 27, 2024

rruuaanng deleted the gh88786 branch September 27, 2024 09:24

Uh oh!

gh-88786: Update uuid.uuid1() in uuid module to meet RFC4122 #124571

gh-88786: Update uuid.uuid1() in uuid module to meet RFC4122 #124571

Uh oh!

Conversation

rruuaanng commented Sep 26, 2024 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rruuaanng commented Sep 26, 2024

Uh oh!

JacobCoffee commented Sep 26, 2024

Uh oh!

This comment was marked as resolved.

Uh oh!

rruuaanng commented Sep 26, 2024

Uh oh!

rruuaanng commented Sep 26, 2024

Uh oh!

picnixz commented Sep 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rruuaanng commented Sep 26, 2024

Uh oh!

rruuaanng commented Sep 26, 2024

Uh oh!

picnixz commented Sep 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rruuaanng commented Sep 26, 2024

Uh oh!

rruuaanng commented Sep 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rruuaanng commented Sep 26, 2024

Uh oh!

rruuaanng commented Sep 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

picnixz commented Sep 26, 2024

Uh oh!

rruuaanng commented Sep 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

picnixz commented Sep 26, 2024

Uh oh!

rruuaanng commented Sep 26, 2024

Uh oh!

rruuaanng commented Sep 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rruuaanng commented Sep 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

picnixz commented Sep 27, 2024

Uh oh!

picnixz left a comment

Choose a reason for hiding this comment

Uh oh!

picnixz Sep 27, 2024

Choose a reason for hiding this comment

Uh oh!

rruuaanng Sep 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rruuaanng commented Sep 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gh-88786: Update `uuid.uuid1()` in `uuid` module to meet RFC4122 #124571

gh-88786: Update `uuid.uuid1()` in `uuid` module to meet RFC4122 #124571

rruuaanng commented Sep 26, 2024 •

edited by bedevere-app bot

Loading

picnixz commented Sep 26, 2024 •

edited

Loading

picnixz commented Sep 26, 2024 •

edited

Loading

rruuaanng commented Sep 26, 2024 •

edited

Loading

rruuaanng commented Sep 26, 2024 •

edited

Loading

rruuaanng commented Sep 26, 2024 •

edited

Loading

rruuaanng commented Sep 27, 2024 •

edited

Loading

rruuaanng commented Sep 27, 2024 •

edited

Loading

rruuaanng Sep 27, 2024 •

edited

Loading

rruuaanng commented Sep 27, 2024 •

edited

Loading