Fix SORT_REGULAR with new transitive comparison functions #20517
base: master
Girgias
left a comment
Various comments and questions and this needs a rebase as I refactored the sorting code to remove a bunch of duplication.
static int php_array_hash_compare_transitive(zval *zv1, zval *zv2) /* {{{ */
{
	return php_array_compare_transitive(zv1, zv2);
}
/* }}} */
Ditto
Kept this one so we can pass a compare_func_t to zend_hash_compare().
php_array_compare_transitive() doesn’t match that signature, so we still need this tiny adapter.
My previous comment is no longer valid: this wrapper can be removed. However, I measured a regression in my benchmarks after removing it, so I decided to keep it in place. I should probably add a comment to the function explaining why.
How are you benchmarking? Because I don't really see why it would regress?
I built a benchmark that evaluates all comparison ops, sort family, and all the array functions that use SORT_REGULAR -- 142 ops in total. I have a dedicated FreeBSD server in my office that I use for benchmarking. Wall-clock CV stays below 0.1%, so the deltas are solid.
When I remove the wrapper and pass php_array_compare_transitive() directly to zend_hash_compare(), 66/142 ops get slower (Time-Weighted ΔMedian%: 0.03%).
With the wrapper in place I see 60/142 ops get slower (Time-Weighted ΔMedian%: -1.20%).
My best guess is that the wrapper keeps php_array_compare_transitive() inlinable at its other call sites; once its address escapes through zend_hash_compare() the compiler stops inlining it into the rest of array.c.
Is there another way to keep those call sites inlined while still satisfying the desire to avoid the wrapper?
}
/* }}} */

static int php_array_compare_transitive(zval *op1, zval *op2)
I don't understand why you need this? I thought the transitivity issue was only about numeric strings and numeric values?
I kept php_array_compare_transitive() because once we decide an array needs the "regular" fallback (e.g. it mixes numeric strings and numbers, or it contains arrays/objects whose elements do), we have to apply that stricter comparison recursively. zend_compare() would hit the same non‑transitive behavior when it dives into nested arrays or object properties, so this helper still wraps the recursive walk and reuses the numeric‑string handling (plus the enum ordering) at each level.
Right... this is annoying, to say the least. Please expand the comment to explain that what you are overriding is enum, array, and object comparison. You should probably also add a comment near zend_compare() noting that changes made there need to be applied here as well.
Or maybe just check whether the values are enums, arrays, or objects, and defer to zend_compare() for everything else to prevent duplication.
We need the full php_array_compare_transitive() matrix because the non-transitivity isn't limited to enums/arrays/objects. The original bug repros with scalars, so if we just short-circuited to zend_compare() for "everything else," we'd immediately fall back into the non-transitive ordering we're trying to fix. That's why the helper mirrors zend_compare() and overrides the handful of cases that can become non-transitive.
I can definitely expand the comment to spell that out and add a "keep in sync with zend_compare()" note near the helper, but we can't simply defer to zend_compare() for the scalar cases without reintroducing the bug.
@Girgias thank you for taking the time to provide the careful review! Looks like I was able to capture your sorting code refactor when I created this new branch. I'll push a fresh commit with what I addressed from your code comments. Thanks again for the help!
f6e9f05 to 374a660
Girgias
left a comment
Please only do the fix for the transitivity.
Optimizations can be decided later, but currently it just pollutes the PR and makes it harder to review and merge.
374a660 to 2ff1700
@Girgias yes, I clearly got a bit carried away haha. I decided to reimplement and force push a clean commit. Sorry for the mess I made of this PR. I have a bag full of optimizations we can save for a follow-up PR. One worth calling out would be to split
/* Mirrors zend_std_compare_objects(), but recurses via php_array_compare_transitive()
 * so nested properties obey SORT_REGULAR's transitive ordering. */
static int php_array_compare_transitive_objects(zval *o1, zval *o2) /* {{{ */
What I think might make more sense is to create a zend_std_compare_objects_ex() function that takes a function pointer for the property table comparison, if the rest is identical.
Hopefully the compiler will then inline the behaviour properly in zend_std_compare_objects(), so it should be equivalent. It took me quite a while to understand what the point of this function is.
I just gave it a try and benchmarked it. I saw a small, almost negligible regression: Time-Weighted ΔMedian% rose by ~0.9 percentage points (from -1.20% to -0.31%).
I tried a different idea. I created a zend_object_compare_kind enum in zend_object_handlers.h and added zend_std_compare_objects_ex(), so the standard object comparator can flip between zend_compare() and a transitive variant (zend_compare_transitive()) without going through a function-pointer callback.
To make that transitive mode reusable everywhere, I moved the SORT_REGULAR compare logic into Zend itself (zend_compare_transitive(), plus zend_compare_symbol_tables_transitive() and the enum-aware helpers).
This design showed a negligible difference (within measurement noise) in my benchmarks compared to the current implementation.
I'm happy to push another commit with this change if you'd like to see it.
- Add zend_compare_{long,double}_to_string_ex() plus
zendi_smart_strcmp_ex() so SORT_REGULAR can invoke transitive-aware
scalar comparisons without touching zend_compare()
- Introduce php_array_compare_transitive() (pared-down zend_compare())
and php_array_compare_transitive_objects() (mirrors
zend_std_compare_objects()) so arrays, objects, and enums recurse with
transitive ordering
- Route the public sort APIs and array_unique() through
php_array_sort_regular() so PHP_SORT_REGULAR always uses the new
comparator
- Add regression tests: phpGH-20262 (array_unique with enums/objects/nested
arrays) plus SORT_REGULAR consistency tests for sort()/ksort() on
numeric-string edge cases
Fixes: phpGH-20262
- Make every php_get_*_compare_func{,_reverse,_unstable} return the
*_regular variants so the public sort APIs no longer need
php_array_sort_regular()
- Drop php_array_sort_regular() and the old key/data compare impl
helpers now that their logic lives in the generated *_regular
comparators
- Have array_unique() fetch its unstable comparator exclusively through
php_get_data_compare_func_unstable(), matching the rest of the sort
entry points
- Compare backed enums via their stored backing values so SORT_REGULAR’s common path no longer fetches and compares case names; unit enums still fall back to case-name ordering, with object handles as the deterministic tie-breaker
- Add ext/standard/tests/array/sort/sort_enum_stability.phpt to ensure both unit and backed enums produce the same sorted order regardless of access order
2ff1700 to 9026917
- Remove DEFINE_SORT_VARIANTS_USING macro layer
- Inline the implementation directly in DEFINE_SORT_VARIANTS
- Move enum helper functions after DEFINE_SORT_VARIANTS usage
…helper
- Replace php_array_apply_sort with php_sort that handles parameter parsing
- Consolidate duplicate parameter parsing code across asort, arsort, sort, rsort, krsort, and ksort
- Each sort function now simply calls php_sort with appropriate compare function and renumber flag
Apply IEEE 754 totalOrder predicate for NaN handling in transitive SORT_REGULAR comparisons. This provides a consistent, deterministic ordering where NaN values sort after +INF but before non-numeric strings: -INF < finite numbers < +INF < NaN < non-numeric strings
Summary
Fixes #20262 by making SORT_REGULAR fall back to a fully transitive comparator whenever loose comparison semantics would otherwise be non-transitive (numeric strings vs ints/floats, enums, nested arrays/objects). This keeps duplicates grouped so array_unique() and the sort family behave consistently.
Highlights
- php_array_compare_transitive()/php_array_compare_transitive_objects() and wire all SORT_REGULAR compare dispatchers (php_get_*_compare_func{,_reverse,_unstable}) to the generated *_regular comparators, keeping array_unique()/diff/intersect on the same transitive path.
- array_unique() (scalars, objects, nested arrays), sort()/ksort() numeric-string edge cases, and enum ordering stability.
Performance Impact
Status: Unoptimized Implementation
This initial implementation prioritizes correctness and transitivity to fix the underlying stability issues. It is not yet optimized, relying on a generalized dispatch mechanism to ensure the logic holds up under review.
As a result, there are known regressions in specific comparison operations due to the overhead of the new dispatch logic:
- ksort/krsort: ~10% regression (general overhead).
However, even in this unoptimized state, the new architecture yields significant wins in common scenarios:
Roadmap:
Once the transitive comparison logic is approved, I will submit a follow-up PR with finely tuned optimizations. These changes are expected to eliminate the current regressions and dramatically improve performance across all remaining operations.