Skip to content

Implement stateful surpasses() for until()#7745

Merged
Manishearth merged 12 commits intounicode-org:mainfrom
poulsbo:stateful-surpasses
Mar 13, 2026
Merged

Implement stateful surpasses() for until()#7745
Manishearth merged 12 commits intounicode-org:mainfrom
poulsbo:stateful-surpasses

Conversation

@poulsbo
Copy link
Contributor

@poulsbo poulsbo commented Mar 6, 2026

Improve performance of until() by introducing a new, stateful implementation of surpasses() that minimizes recalculation in field iteration loops.

Changelog

icu_calendar: Speed up until year and month field handling by 75% on average by optimizing surpasses calculation

@poulsbo
Copy link
Contributor Author

poulsbo commented Mar 6, 2026

Note: Please see PR #7739 for detailed discussion of this implementation, around here.

@poulsbo poulsbo force-pushed the stateful-surpasses branch from 885f70e to a5e6c00 Compare March 6, 2026 20:04
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting on review until Shane's suggestion in #7739 (comment) is implemented

@robertbastian
Copy link
Member

what are the performance numbers here with #7739 as the baseline?

@poulsbo
Copy link
Contributor Author

poulsbo commented Mar 9, 2026

what are the performance numbers here with #7739 as the baseline?

These numbers are for the current, overly complicated implementation. I filtered out the week/day stuff which isn't relevant since those are handled by the (unchanged) fast RD path.

Benchmark Results:
+--------+-----------------------+---------------+-----------+-----------+------------+----------+-----------+----------+
| Field  | Calendar              | Distance      | Orig Time |  New Time |    Abs Chg |    % Chg | Status    | Outliers |
+--------+-----------------------+---------------+-----------+-----------+------------+----------+-----------+----------+
| months | buddhist              | far_past      | 678.19 ns | 224.50 ns | -453.69 ns | -66.897% | Improved  |      11% |
| months | chinese_cached        | far_past      |  29.71 ms |   4.12 ms |  -25.59 ms | -86.143% | Improved  |       6% |
| months | coptic                | far_past      |   9.13 µs |  70.83 ns |   -9.06 µs | -99.224% | Improved  |       8% |
| months | ethiopic              | far_past      |   9.06 µs |  69.82 ns |   -8.99 µs | -99.229% | Improved  |       9% |
| months | gregorian             | far_past      | 766.32 ns | 225.65 ns | -540.67 ns | -70.554% | Improved  |       9% |
| months | hebrew                | far_past      |   2.45 ms |  52.78 µs |   -2.40 ms | -97.848% | Improved  |       6% |
| months | indian                | far_past      | 695.12 ns | 204.92 ns | -490.20 ns | -70.52 % | Improved  |       4% |
| months | islamic/civil         | far_past      |   1.64 µs | 479.98 ns |   -1.16 µs | -70.779% | Improved  |       4% |
| months | islamic/observational | far_past      | 317.91 µs |  73.99 µs | -243.92 µs | -76.725% | Improved  |       4% |
| months | islamic/tabular       | far_past      |   1.63 µs | 479.51 ns |   -1.15 µs | -70.58 % | Improved  |       9% |
| months | islamic/ummalqura     | far_past      | 317.88 µs |  73.75 µs | -244.13 µs | -76.8  % | Improved  |      10% |
| months | iso                   | far_past      | 676.54 ns | 224.57 ns | -451.97 ns | -66.806% | Improved  |      10% |
| months | julian                | far_past      | 326.32 ns |  88.89 ns | -237.43 ns | -72.759% | Improved  |       6% |
| years  | buddhist              | far_past      | 761.78 ns | 159.47 ns | -602.31 ns | -79.066% | Improved  |       9% |
| years  | buddhist              | very_far_past | 456.37 ns | 108.58 ns | -347.79 ns | -76.208% | Improved  |      12% |
| years  | chinese_cached        | far_past      |   7.49 µs |   1.02 µs |   -6.47 µs | -86.373% | Improved  |       6% |
| years  | chinese_cached        | very_far_past |   4.79 µs | 833.12 ns |   -3.96 µs | -82.623% | Improved  |       5% |
| years  | coptic                | far_past      | 243.36 ns |  78.62 ns | -164.74 ns | -67.694% | Improved  |       7% |
| years  | coptic                | very_far_past | 746.46 ns | 181.95 ns | -564.51 ns | -75.625% | Improved  |      10% |
| years  | ethiopic              | far_past      | 245.53 ns |  76.88 ns | -168.65 ns | -68.689% | Improved  |       7% |
| years  | ethiopic              | very_far_past | 752.07 ns | 181.09 ns | -570.98 ns | -75.921% | Improved  |       4% |
| years  | gregorian             | far_past      | 578.06 ns | 171.22 ns | -406.84 ns | -70.38 % | Improved  |      16% |
| years  | gregorian             | very_far_past | 331.79 ns | 107.84 ns | -223.95 ns | -67.498% | Improved  |       6% |
| years  | hebrew                | far_past      |   3.05 µs | 531.00 ns |   -2.52 µs | -82.58 % | Improved  |       7% |
| years  | hebrew                | very_far_past |   2.51 µs | 494.80 ns |   -2.01 µs | -80.263% | Improved  |       8% |
| years  | indian                | far_past      | 672.62 ns | 135.85 ns | -536.77 ns | -79.803% | Improved  |      10% |
| years  | indian                | very_far_past | 394.19 ns |  82.68 ns | -311.51 ns | -79.025% | Improved  |       9% |
| years  | islamic/civil         | far_past      |   2.31 µs | 469.63 ns |   -1.84 µs | -79.704% | Improved  |      10% |
| years  | islamic/civil         | very_far_past | 703.45 ns | 235.17 ns | -468.28 ns | -66.569% | Improved  |      12% |
| years  | islamic/observational | far_past      |   2.29 µs | 436.13 ns |   -1.85 µs | -80.917% | Improved  |       8% |
| years  | islamic/observational | very_far_past | 658.50 ns | 227.19 ns | -431.31 ns | -65.499% | Improved  |      10% |
| years  | islamic/tabular       | far_past      |   2.32 µs | 473.96 ns |   -1.85 µs | -79.569% | Improved  |       3% |
| years  | islamic/tabular       | very_far_past | 662.29 ns | 234.88 ns | -427.41 ns | -64.535% | Improved  |       6% |
| years  | islamic/ummalqura     | far_past      |   2.34 µs | 437.19 ns |   -1.91 µs | -81.335% | Improved  |       6% |
| years  | islamic/ummalqura     | very_far_past | 627.01 ns | 227.51 ns | -399.50 ns | -63.715% | Improved  |       7% |
| years  | iso                   | far_past      | 140.87 ns | 158.40 ns |   17.53 ns |  12.446% | Regressed |      14% |
| years  | iso                   | very_far_past | 446.46 ns | 108.54 ns | -337.92 ns | -75.689% | Improved  |      13% |
| years  | julian                | far_past      | 381.51 ns |  88.04 ns | -293.47 ns | -76.923% | Improved  |       8% |
| years  | julian                | very_far_past | 835.89 ns | 158.41 ns | -677.48 ns | -81.049% | Improved  |       5% |
+--------+-----------------------+---------------+-----------+-----------+------------+----------+-----------+----------+

Summary by calendar:
+-----------------------+-----------+----------+-----------+-------+----------+-------+
| Calendar              | Avg % Chg | Improved | Regressed | Noise | Status   | Count |
+-----------------------+-----------+----------+-----------+-------+----------+-------+
| buddhist              | -74.057 % |        3 |         0 |     0 | Improved |     3 |
| chinese_cached        | -85.0463% |        3 |         0 |     0 | Improved |     3 |
| coptic                | -80.8477% |        3 |         0 |     0 | Improved |     3 |
| ethiopic              | -81.2797% |        3 |         0 |     0 | Improved |     3 |
| gregorian             | -69.4773% |        3 |         0 |     0 | Improved |     3 |
| hebrew                | -86.897 % |        3 |         0 |     0 | Improved |     3 |
| indian                | -76.4493% |        3 |         0 |     0 | Improved |     3 |
| islamic/civil         | -72.3507% |        3 |         0 |     0 | Improved |     3 |
| islamic/observational | -74.3803% |        3 |         0 |     0 | Improved |     3 |
| islamic/tabular       | -71.5613% |        3 |         0 |     0 | Improved |     3 |
| islamic/ummalqura     | -73.95  % |        3 |         0 |     0 | Improved |     3 |
| iso                   | -43.3497% |        2 |         1 |     0 | Mixed    |     3 |
| julian                | -76.9103% |        3 |         0 |     0 | Improved |     3 |
+-----------------------+-----------+----------+-----------+-------+----------+-------+

Summary by field:
+--------+-----------+----------+-----------+-------+----------+-------+
| Field  | Avg % Chg | Improved | Regressed | Noise | Status   | Count |
+--------+-----------+----------+-----------+-------+----------+-------+
| months | -78.8357% |       13 |         0 |     0 | Improved |    13 |
| years  | -72.1079% |       25 |         1 |     0 | Mixed    |    26 |
+--------+-----------+----------+-----------+-------+----------+-------+

Summary by distance:
+---------------+-----------+----------+-----------+-------+----------+-------+
| Distance      | Avg % Chg | Improved | Regressed | Noise | Status   | Count |
+---------------+-----------+----------+-----------+-------+----------+-------+
| far_past      | -74.825 % |       25 |         1 |     0 | Mixed    |    26 |
| very_far_past | -73.4015% |       13 |         0 |     0 | Improved |    13 |
+---------------+-----------+----------+-----------+-------+----------+-------+

@poulsbo
Copy link
Contributor Author

poulsbo commented Mar 10, 2026

I forgot to mention, I reverted the surpasses() implementation in 8f7d2f7 back to its original state. Only until() uses the new SurpassesChecker.

This is not finished, but submitted now to see if this approach looks good to folks. TODO next:

  • Add independent test coverage for surpasses()
  • Refactor surpasses() to use SurpassesChecker

@poulsbo poulsbo force-pushed the stateful-surpasses branch 3 times, most recently from c8e9585 to e5e08cc Compare March 11, 2026 13:46
@poulsbo
Copy link
Contributor Author

poulsbo commented Mar 11, 2026

@sffc (or anyone else for whom this has an obvious answer):

After this change, surpasses() has no internal callers. It also has no public API surface. It was only being called by until(), which now does all NonISODateSurpasses checks via the stateful SurpassesChecker.

Should the original (non-stateful) implementation just be deleted?

@Manishearth
Copy link
Member

It might be useful to have it stick around so we can prove that it has the same behavior, but aside from that it seems ok to delete. I haven't looked at the code yet, let's wait for Shane to look.

@poulsbo
Copy link
Contributor Author

poulsbo commented Mar 11, 2026

@Manishearth Just saw PR #7753, nice. I will merge and retest and I will be done with this, pending review feedback.

poulsbo added 4 commits March 12, 2026 12:19
…eSurpasses spec

This is not ready to go, but I wanted to get some feedback.

Most of the following will not make sense until one has seen the code.

- The priority here is to stick to the spec, and avoid recalculation.
  There are some clumsy aspects, like the need for the caller to
  invoke the checker in a specific way:

  1. Large fields to small fields.
  2. Before moving from field A to field B, make a final
     call with the computed value of field A, to set the internal
     state correctly.
  3. Caller maintains their own y/m/w/d.

  I tried to work around unicode-org#2 and unicode-org#3, but couldn't come up with
  anything clean. It seems like the current approach is a good
  compromise.

- For the debug assertions, clippy complained that I was calling
  a mut function. This is actually ok, because the object
  is reset to a new state immediately afterwards. The workaround
  to the clippy warning is expensive, but should be ok because it's
  just for debug. I tried other approaches but didn't get anything
  else to work. Adding an "allow" directive didn't work because
  debug_assert! is a macro.

- There is now no in-library usage of surpasses(). All the testing
  for that was via until() as far as I can tell. The work item is
  to add test coverage for surpasses(), and once that looks good,
  I will refactor surpasses() to use the stateful checker, to
  optimize performance and eliminate duplicate code.
@poulsbo poulsbo force-pushed the stateful-surpasses branch from 8ebf5ed to 2cb4975 Compare March 12, 2026 17:03
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is very close to what I had in mind!

Comment on lines +1162 to +1169
let surpasses_years = ArithmeticDate::<C>::compare_surpasses_lexicographic(
self.sign,
self.y0,
base_month,
self.parts.day(),
self.cal_date_2,
self.cal,
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This doesn't use the results of the m0 calculations, and it implements step 4 in the spec. Move this up to the comment about step 4, and move constrain and m0 to step 5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 7dc6469

// ordinal month 6), a lexicographical-only check fails to detect that a later
// day in Adar I actually surpasses an earlier day in Adar. This ordinal check
// ensures the year counter correctly stays at 0 instead of over-incrementing to 1.
surpasses_years || self.surpasses_months(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. It correctly justifies why you need to do this in terms of calendar logic. If you prefer, the following explanation is shorter and also correct.

If the lexicographic check returns false, the spec continues to the ordinal check, which is implemented in surpasses_months.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 7dc6469

self.end_of_month.day
};

surpasses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: add a comment like

// We do not need to perform any other checks because step 8 of the spec guarantees an early return if weeks and days are 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 7dc6469

self.cal_date_2,
);

// 9. Let endOfMonth be BalanceNonISODate(calendar, monthsAdded.[[Year]], monthsAdded.[[Month]] + 1, 0).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion (optional): consider moving steps 9-12 to a function set_month which you call up above in the until impl only once when you need it. It seems like you can get a little bit more performance if you avoid calling new_balanced (which is expensive) until you need it for weeks and days.

You could also move step 5 into a new fn set_year but that's less impactful because we need to run that code each time for reasons explained in the 8-line comment you wrote.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 7dc6469

I didn't do the same in set_year because, not only is it less impactful, it's also a little messier. We'd need another state variable for base_month, or we'd have to copy-paste the code from surpasses_year which is a little ugly.

poulsbo and others added 4 commits March 13, 2026 11:36
Co-authored-by: Shane F. Carr <shane@unicode.org>
- Move some code around to more closely match the spec

- Improve performance of month iteration by minimizing calls to `new_balanced`

- Add explicit `set_<field>` methods to `SurpassesChecker`, to be clearer
  semantically. This is a place to "freeze" the internal state of the checker
  for a given field value. These usually inline to call `surpasses_<field>`, but in
  the case of months, they do the expensive `new_balanced` call. Future optimizations
  can move more stuff into `set_<field>`.
@poulsbo
Copy link
Contributor Author

poulsbo commented Mar 13, 2026

Can I get a quick review on f58a9da (CHANGELOG.md)

UPDATE: Never mind, it's supposed to go in the PR description, let me fix that

@poulsbo
Copy link
Contributor Author

poulsbo commented Mar 13, 2026

@sffc Could you push the merge (squash) button? Looks like it's good to go.

@Manishearth Manishearth merged commit 7244442 into unicode-org:main Mar 13, 2026
35 of 36 checks passed
@poulsbo poulsbo deleted the stateful-surpasses branch March 13, 2026 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants