fix: Further improve performance of the UTF-8 string comparison logic #2182
This PR both improves performance and simplifies the UTF-8 string comparison logic. It addresses prior performance regressions by introducing a more optimized algorithm for string ordering.
The semantics of the UTF-8 string comparison logic were originally fixed by #1967, but that fix caused a material performance degradation, which was then improved by #2021. Performance was presumably still sub-optimal, however; this PR brings the speed back close to the original and, serendipitously, simplifies the algorithm too.
This PR effectively ports the following two PRs from the firebase-android-sdk repository:
Highlights
- The `compareUtf8Strings()` method in `Order.java` has been rewritten to improve performance and simplify its logic. The new algorithm leverages the relationship between UTF-8 and UTF-16 representations for more efficient string comparison, avoiding costly byte string conversions.
- The use of `codePointAt` and `ByteString.copyFromUtf8` for non-ASCII characters has been replaced with a more straightforward character-by-character comparison that handles surrogate pairs explicitly.
- `OrderTest.java` has been updated to improve the robustness of the `compareUtf8Strings()` test. Instead of asserting exact return values, it now checks the `signum` (sign) of the comparison result, which is a more appropriate way to validate comparator behavior.
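
For readers who want to see the idea behind the highlights, here is a minimal sketch of a UTF-16-based comparison that yields UTF-8 byte order. It is not the code from this PR: the class name `Utf8CompareSketch` and the method body are assumptions written for illustration. It relies on a known property of the two encodings: UTF-16 code unit order matches UTF-8 byte order except that surrogates (which encode code points at or above U+10000) must sort after every non-surrogate code unit.

```java
// Illustrative sketch only; the class name and body are hypothetical and
// are not taken from Order.java.
final class Utf8CompareSketch {

    // Compares two strings in the order their UTF-8 encodings would sort,
    // without materializing any byte arrays.
    static int compareUtf8Strings(String left, String right) {
        int length = Math.min(left.length(), right.length());
        for (int i = 0; i < length; i++) {
            char l = left.charAt(i);
            char r = right.charAt(i);
            if (l != r) {
                boolean leftSurrogate = Character.isSurrogate(l);
                boolean rightSurrogate = Character.isSurrogate(r);
                if (leftSurrogate == rightSurrogate) {
                    // Both surrogates or both non-surrogates: code unit
                    // order already agrees with UTF-8 byte order.
                    return Character.compare(l, r);
                }
                // Exactly one side is a surrogate, so it belongs to a
                // supplementary code point, which sorts last in UTF-8.
                return leftSurrogate ? 1 : -1;
            }
        }
        // One string is a prefix of the other; the shorter sorts first.
        return Integer.compare(left.length(), right.length());
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, three UTF-8 bytes
        String supplementary = "\uD83D\uDE00"; // U+1F600, four UTF-8 bytes

        // Check only the sign, as a comparator's magnitude is unspecified.
        System.out.println(Integer.signum(compareUtf8Strings(bmp, supplementary))); // -1
        // Plain UTF-16 ordering disagrees for this pair.
        System.out.println(Integer.signum(bmp.compareTo(supplementary)));           // 1
    }
}
```

Checking `Integer.signum(...)` rather than an exact value mirrors the testing approach described in the last highlight: a comparator only promises a negative, zero, or positive result, so a test that pins the magnitude would break under an equally correct implementation.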