Skip to content

fix: Further improve performance of the UTF-8 string comparison logic #2182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 7, 2025

Conversation

dconeybe
Copy link
Contributor

@dconeybe dconeybe commented Jul 7, 2025

This PR both improves performance and simplifies the UTF-8 string comparison logic. It addresses prior performance regressions by introducing a more optimized algorithm for string ordering.

The semantics of the UTF-8 string comparison logic were originally fixed by #1967, but this fix caused a material performance degradation, which was then improved by #2021. The performance was, however, presumably still sub-optimal, and this PR further improves the speed back to close to its original speed and, serendipitously, simplifies the algorithm too.

This PR effectively ports the following two PRs from the firebase-android-sdk repository:

Highlights

  • Performance Optimization: The compareUtf8Strings() method in Order.java has been rewritten to improve performance and simplify its logic. The new algorithm leverages the relationship between UTF-8 and UTF-16 representations for more efficient string comparison, avoiding costly byte string conversions.
  • Algorithm Simplification: The previous logic involving codePointAt and ByteString.copyFromUtf8 for non-ASCII characters has been replaced with a more straightforward character-by-character comparison that intelligently handles surrogate pairs.
  • Test Robustness: The OrderTest.java has been updated to improve the robustness of the compareUtf8Strings() test. Instead of asserting exact return values, it now checks the signum (sign) of the comparison result, which is a more appropriate way to validate comparator behavior.

The semantics of this logic were originally fixed by #1967, but this fix
caused a material performance degradation, which was then improved by #2021.
The performance was, however, still suboptimal, and this PR further improves the
speed back to close to its original speed and, serendipitously, simplifies the
algorithm too.

This commit effectively ports the following two PRs from the
firebase-android-sdk repository:

- firebase/firebase-android-sdk#7098
- firebase/firebase-android-sdk#7109
@dconeybe dconeybe marked this pull request as ready for review July 7, 2025 20:21
@dconeybe dconeybe requested a review from a team as a code owner July 7, 2025 20:21
@dconeybe dconeybe merged commit 869a381 into main Jul 7, 2025
36 of 37 checks passed
@dconeybe dconeybe deleted the dconeybe/Utf8StringComparePerformanceFix branch July 7, 2025 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: firestore Issues related to the googleapis/java-firestore API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants