Fix uniqid() performances #18232
If available, use the uuidgen() system call instead of looping on gettimeofday(); this improves performance considerably.
        
          
ext/standard/uniqid.c (outdated)

    ZEND_PARSE_PARAMETERS_END();

    #ifdef __NetBSD__
    struct uuid uuid;
I am a bit confused here. uuidgen is available on FreeBSD/NetBSD (other OSes have different APIs), which you detect at configure time. Why is the NetBSD-specific code path needed? We should fall into the code above, I think?
Again, I do not really get the code duplication. But I am not sure it is worth the effort, to be honest, considering later comments.
How can this even be executed? It's in the #elif section. Basically:
#ifdef HAVE_UUIDGEN
...
#elif HAVE_GETTIMEOFDAY
...
#ifdef HAVE_UUIDGEN
That doesn't make any sense to me. Am I missing anything?
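To illustrate the reviewer's point, here is a minimal sketch of the same preprocessor structure. The function name `selected_path` and the two `#define` lines are assumptions made for illustration (they stand in for what configure would set); none of this is taken from the patch itself.

```c
/* Assumption for illustration: pretend configure detected both features. */
#define HAVE_UUIDGEN 1
#define HAVE_GETTIMEOFDAY 1

/* Hypothetical helper: returns the name of the branch the preprocessor kept. */
static const char *selected_path(void)
{
#ifdef HAVE_UUIDGEN
    return "uuidgen";
#elif HAVE_GETTIMEOFDAY
# ifdef HAVE_UUIDGEN
    /* Dead code: this #elif is only evaluated when HAVE_UUIDGEN is undefined,
     * so the nested #ifdef HAVE_UUIDGEN can never be true. */
    return "nested uuidgen";
# else
    return "gettimeofday";
# endif
#else
    return "none";
#endif
}
```

With both macros defined the function returns "uuidgen"; with only HAVE_GETTIMEOFDAY defined it returns "gettimeofday". There is no configuration in which the nested block is compiled.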
Sorry, I committed it a bit too early; I should have removed that section, which was used in earlier tests. It is fixed now.
Now that I remember, CPython used to use native calls for UUID generation, went back and forth fixing undesired behavior differences, and ultimately settled on a single implementation in Python.
before adding the uuidgen() test in configure
Honestly, this function is so horrible that I wonder if we can't just replace the implementation with something entirely different but sane, given that deprecating it didn't pass.
As a weaker constraint that is also guaranteed by the time-based guarantee: The output is monotonically increasing, which is convenient for databases, because new values will not be distributed all over the place.

As I've also explained in my deprecation proposal (https://wiki.php.net/rfc/deprecations_php_8_4#deprecate_uniqid): Alternatives are already available that are better in every regard (except for “succinctness” perhaps) and changing the implementation will result in breaking changes for at least some of the users. Deprecating the function without intent of removal would have been the least impactful solution. Trying to make small incremental changes to
Besides the general remarks regarding the state of uniqid(), there are also issues with the proposed implementation.
In any case, I'm requesting that an RFC be written to change the uniqid() implementation, due to the breaking changes with regard to documented behavior.
        
          
ext/standard/uniqid.c (outdated)

    if (more_entropy) {
        n[1] &= 0xffffff;
        uniqid = strpprintf(0, "%s%08x%06x.%08x", prefix, n[0], n[1], n[2]);
    } else {
This output format is incompatible with the non-uuidgen output. It also leaks parts of the host's MAC address and doesn't actually provide “more entropy”, since the MAC address is fixed on a single node.
It is actually compatible, although I do not understand how gettimeofday() plus more entropy produces 14+8 characters, where I would expect 13+8 from reading the code. Check for yourself:
$ php -r 'printf("%s\n", uniqid("", 1));'
67ee7f958a5b34.94453710
For the MAC address, I understand it is not included in UUID anymore:
$ uuidgen
bf0ff2ae-cf7a-46b1-bd2e-c3577268e311
$ uuidgen
39638058-e887-41a5-97d1-6451b0622763
But I can understand that it may be platform dependent, and we must avoid using it in case it could contain the MAC.
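One possible explanation for the 14+8 shape, offered as a sketch rather than a quote from the source: as far as I know, the gettimeofday() path prints 8 hex digits of seconds, 5 hex digits of microseconds, and then a floating-point seed in the range 0 to 10 with 8 decimals, so the character before the dot is the integer part of the seed and not a 14th hex digit. The helper name `format_uniqid` and the sample field values below are assumptions chosen to reproduce the output shown above.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the assumed format "%08x%05x%.8f":
 * 8 hex digits (seconds) + 5 hex digits (microseconds) + seed as d.dddddddd. */
static int format_uniqid(char *buf, size_t len,
                         unsigned sec, unsigned usec, double seed)
{
    return snprintf(buf, len, "%08x%05x%.8f", sec, usec, seed);
}
```

Called with sec = 0x67ee7f95, usec = 0x8a5b3 and seed = 4.94453710, this yields 67ee7f958a5b34.94453710: 13 time-derived hex characters followed by a 10-character seed. The "%s%08x%06x.%08x" format in the patch has the same visual shape but a different meaning for the character before the dot.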
on some platforms. Reuse the more_entropy from the gettimeofday() version instead. Refactor to avoid code duplication.
Some numbers to show how slow the original implementation can be, here on a NetBSD 10.0 amd64 Xen domU. First the original version, second the uuidgen() flavor. Note that this is 100 iterations versus 1M.
$ time php -r 'for ($i = 0; $i < 100; $i++) uniqid();'
$ time php -r 'for ($i = 0; $i < 1000000; $i++) uniqid();'
My understanding is that Xen (historically?) had a particularly slow clock implementation. For Linux I believe this has been fixed since. I'm sympathetic to  From your email address I'm seeing you are a NetBSD committer. Is NetBSD actually performing a syscall for each call to  As I said before, changing  The
Indeed it is not, but it could help. There are many applications relying on uniqid(), and if you search the web, you find other people reporting performance problems with uniqid() on various setups. For the NetBSD case, that was tracked down to a Xen VM with multiple processors using the clockinterrupt timecounter. In that setup, around 100k calls to gettimeofday() return the same time before it increases. This is certainly a bug that deserves to be fixed.
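To make the cost concrete, here is a rough sketch of the busy-wait pattern being discussed: keep calling gettimeofday() until the reading differs from the first one, and count the calls. The function name `spins_until_clock_advances` and the `max_spins` cap are assumptions added for illustration so the loop cannot hang; this is not the PHP source.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: counts gettimeofday() calls until the reported time
 * differs from the first reading, giving up after max_spins calls. */
static long spins_until_clock_advances(long max_spins)
{
    struct timeval start, now;
    long spins = 0;

    gettimeofday(&start, NULL);
    do {
        gettimeofday(&now, NULL);
        spins++;
    } while (spins < max_spins &&
             now.tv_sec == start.tv_sec && now.tv_usec == start.tv_usec);

    return spins;
}
```

On a clock with microsecond resolution the count stays small. On a setup like the one described above, where roughly 100k consecutive readings are identical, every uniqid() call would pay for that many iterations.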
misbehaves, while keeping the documented behavior of returning a time based value. Instead of looping on gettimeofday until the system time changes, call it once, and if time has not changed, return the previous value +1us.
I pushed an updated approach, which I think honors the documented behavior.
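A minimal sketch of the approach described in the commit message (read the clock once, and if it has not advanced, return the previous value plus 1 µs). The names `tv_after`, `step_time` and `unique_now` are hypothetical, and the static state is not thread-safe; the actual patch may differ.

```c
#include <stddef.h>
#include <sys/time.h>

/* Returns nonzero when a is strictly later than b. */
static int tv_after(struct timeval a, struct timeval b)
{
    return a.tv_sec > b.tv_sec ||
           (a.tv_sec == b.tv_sec && a.tv_usec > b.tv_usec);
}

/* If the clock moved forward, use it; otherwise bump the previous value
 * by one microsecond, carrying into seconds when needed. */
static struct timeval step_time(struct timeval prev, struct timeval now)
{
    if (tv_after(now, prev)) {
        return now;
    }
    prev.tv_usec += 1;
    if (prev.tv_usec >= 1000000) {
        prev.tv_sec += 1;
        prev.tv_usec = 0;
    }
    return prev;
}

/* Single gettimeofday() call per invocation, no loop. Not thread-safe. */
static struct timeval unique_now(void)
{
    static struct timeval prev_tv;
    struct timeval now;

    gettimeofday(&now, NULL);
    prev_tv = step_time(prev_tv, now);
    return prev_tv;
}
```

Two back-to-back calls to `unique_now()` always return strictly increasing values, even when the underlying clock returns the same reading for both.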
Thank you. That certainly is better, but I'm still not sure if I like it for the aforementioned reasons. It would definitely make uniqid() even more predictable when knowing that the system is running Xen / some slow clock implementation, making it even worse for cases where it is incorrectly used for values that need to be unpredictable.
        
          
ext/standard/uniqid.c (outdated)

    if (tv.tv_sec <= prev_tv.tv_sec ||
        (tv.tv_sec == prev_tv.tv_sec && tv.tv_usec <= prev_tv.tv_usec)) {
Suggested change:

    - if (tv.tv_sec <= prev_tv.tv_sec ||
    -     (tv.tv_sec == prev_tv.tv_sec && tv.tv_usec <= prev_tv.tv_usec)) {
    + if (tv.tv_sec < prev_tv.tv_sec ||
    +     (tv.tv_sec == prev_tv.tv_sec && tv.tv_usec <= prev_tv.tv_usec)) {
Otherwise the second branch will never be taken.
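The difference between the two conditions can be checked with a small sketch. The function names are hypothetical; the expressions are copied from the two versions of the condition in this review comment.

```c
#include <sys/time.h>

/* Condition as first written: when the seconds are equal, the first operand
 * is already true, so the microsecond comparison is never consulted. */
static int not_advanced_original(struct timeval tv, struct timeval prev)
{
    return tv.tv_sec <= prev.tv_sec ||
           (tv.tv_sec == prev.tv_sec && tv.tv_usec <= prev.tv_usec);
}

/* Suggested condition: equal seconds fall through to the microsecond check. */
static int not_advanced_suggested(struct timeval tv, struct timeval prev)
{
    return tv.tv_sec < prev.tv_sec ||
           (tv.tv_sec == prev.tv_sec && tv.tv_usec <= prev.tv_usec);
}
```

With prev = 10 s + 5 µs and tv = 10 s + 9 µs, the original condition reports that time has not advanced, although it has moved by 4 µs, while the suggested one correctly reports that it has.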
In cases where PHP is running faster than the clock, the output is already predictable.