Improve the performance of ArithmeticDate.until()#7739
Improve the performance of ArithmeticDate.until()#7739Manishearth merged 3 commits intounicode-org:mainfrom
ArithmeticDate.until()#7739Conversation
Improve the performnace of the calendar `until()` function. - Tighten the estimation of the number of months between two dates in order to reduce the number of incremental `surpasses()` tests. This is done by introducing a new DateFieldsResolver function, `min_months_from()`, which the Coptic, Ethiopic, and Hebrew calendar implementations customize. - Eliminate the `surpasses()` loop for the final days delta calculation. Instead compute an intermediate date, which is the start date plus the years, months, and weeks computed so far. Subtract the Rata Die of the intermediate date from that of the target date to get the days field of the DateDuration. - Fix minor typos in function names and comments. Performance, as measured using criterion, is improved for Years and Months intervals. Improvements on the order of 20% to 80%, or 100s of ns to >1 ms, are shown, for base times of 500 ns to 5 ms. For Weeks and Days, which are already computed efficiently using RD subtraction, criterion measures a slight slowdown (on the order of 1 ns, for a 15-20 ns operation). This is most likely due to a change in compiler inlining, optimization, register usage, etc., since the code for Days/Weeks has not changed.
|
Output of Many regressions are in the 1-2 ns range. For example, this seems to be 11.7 ns -> 13.2 ns, delta = +1.5 ns: Code for Here's the full output. |
robertbastian
left a comment
There was a problem hiding this comment.
I'm curious, how is the Hebrew, Coptic, and Ethiopian performance without custom min_months_from implementations? You get 80% performance improvements for calendars where the min_months_from computation didn't, so is that actually pulling its weight?
| // (7y+1)/19 is the number of leap years before year y. Compute the | ||
| // number of leap years in the start year, up to but not including the | ||
| // end year. | ||
| let count_leaps = |y: i64| (7 * y + 1).div_euclid(19); |
There was a problem hiding this comment.
this probably doesn't need to be micro-optimised, but:
this does two euclidean integer divisions, which is going to be branchy and expensive. try to rewrite this in terms of normal divisions. you know that the divisor is positive, so all the div_euclid does is subtract 1 if y is negative. but since you're computing a lower bound, you can just subtract 1 unconditionally and call it a day.
having done that, you might be able to go from two divisions to one by distributing over the subtraction correctly.
There was a problem hiding this comment.
but since you're computing a lower bound, you can just subtract 1 unconditionally and call it a day.
That makes this O(n) again: having min_months be a lower bound that is always close to the actual months value is part of the goal. As long as min_months is guaranteed to be at most a constant number of months off from months, the month search routine is constant time.
There was a problem hiding this comment.
However, I think this math can be far simpler: we just need to compute y * 235 / 19 (235 months in a 19 year metonic cycle)
This math produces a good lower bound: it's 12.38 months per year. At the "edge" of the calculation this will always be lower than the actual number of leap years when rounded down: the hebrew calendar has at least one leap year in every 3 year span anyway.
There was a problem hiding this comment.
That makes this O(n) again
1 is a constant though? this is only called once, so subtracting one too much is one extra loop execution
There was a problem hiding this comment.
I actually started with y * 235 / 19 but then I confused myself by reading Reingold (7y + 1 mod 19 < 7 == is_leap_year). Let me try going back to that.
| // (`self`` + YMW) and the target date (`other`). | ||
| #[allow(clippy::expect_used)] // added() cannot fail: years/months/weeks validated above | ||
| let from = self | ||
| .added( |
There was a problem hiding this comment.
it might be worth inlining added and removing unnecessary code, such as all the error paths that you say can't happen
There was a problem hiding this comment.
Attempted in commit 659278f. Created a copy of added() without the bounds checking, and assuming Constrains behavior.
Pro: No expect() required, because written to be infallible.
Con: Duplication of the added() logic.
I couldn't think of a clean way to not have the code duplicated. I guess as long as the NonISODateAdd algorithm is stable it might not be an issue, but it is a code smell.
There was a problem hiding this comment.
The algorithm isn't stable yet.
| // Now that we have `years`, `months`, and `weeks`, we can compute | ||
| // `days` directly by subtracting RD values of the intermediate date | ||
| // (`self`` + YMW) and the target date (`other`). | ||
| #[allow(clippy::expect_used)] // added() cannot fail: years/months/weeks validated above |
There was a problem hiding this comment.
added() has quite a few error cases, I'm not immediately convinced that this unwrap is safe
Manishearth
left a comment
There was a problem hiding this comment.
We shouldn't be doing the RataDie calculation: that was an approach that was considered and rejected.
The month stuff is fine.
I'm surprised your benches take an hour to run. Not ethat you can filter to specific benches by writing a filter after the --, but I was able to run the whole suite in ~10min IIRC.
| candidate_days += sign; | ||
| } | ||
|
|
||
| // Now that we have `years`, `months`, and `weeks`, we can compute |
There was a problem hiding this comment.
issue: I'm not 100% convinced this works
added() is not intended to be the inverse of until; and verifying which cases this is the case for is quite tricky. In particular, Constrain/Reject behavior makes this hard to reason about.
We considered doing this early on and decided against it. As documented in #7682 the primary plan for optimizing this part of the code is to optimize surpasses(): note that at the moment this code won't be called more than 31 times.
We can also reduce that with a binary search, which I was originally going t oimplement, and @robertbastian has some code for.
There was a problem hiding this comment.
I agree -- for fields larger than Day, or if Overflow::Reject is used.
But for the Day field, everything "left" winds up in Day. Running a loop to find !surpasses(d) && surpasses(d+1), is functionally the same as doing a RD subtraction.
Stated another way, once we are at the bottom of until(), Year/Month/Week values are frozen. All we're doing is nailing down the Day field. This is now a linear space problem, and the RD subtraction is mathematically the same as doing the loop. added() is appropriate here not because it is the inverse of until(), but because it uses the same new_balanced and regulated day logic as surpasses().
Incidentally, I first tried to compute both Week and Day using RD subtraction. This failed spectacularly, breaking a bunch of tests. When I investigated why, I saw that RD subtraction would never work for Week/Day because of the Constrain logic for fields > Day. No such problem when dealing only with Day.
Note that all the date_arithmetic_snapshot.rs tests are passing, across all calendars. Invoked via cargo insta test -p icu_calendar --all-features --review.
There was a problem hiding this comment.
Incidentally, I first tried to compute both Week and Day using RD subtraction. This failed spectacularly, breaking a bunch of tests. When I investigated why, I saw that RD subtraction would never work for Week/Day because of the Constrain logic for fields > Day. No such problem when dealing only with Day.
FWIW I got this to work.
There was a problem hiding this comment.
Commit f3800d1: Reverted to follow the CalendarDateUntil spec in computing the Day field.
| // (7y+1)/19 is the number of leap years before year y. Compute the | ||
| // number of leap years in the start year, up to but not including the | ||
| // end year. | ||
| let count_leaps = |y: i64| (7 * y + 1).div_euclid(19); |
There was a problem hiding this comment.
but since you're computing a lower bound, you can just subtract 1 unconditionally and call it a day.
That makes this O(n) again: having min_months be a lower bound that is always close to the actual months value is part of the goal. As long as min_months is guaranteed to be at most a constant number of months off from months, the month search routine is constant time.
| // (7y+1)/19 is the number of leap years before year y. Compute the | ||
| // number of leap years in the start year, up to but not including the | ||
| // end year. | ||
| let count_leaps = |y: i64| (7 * y + 1).div_euclid(19); |
There was a problem hiding this comment.
However, I think this math can be far simpler: we just need to compute y * 235 / 19 (235 months in a 19 year metonic cycle)
This math produces a good lower bound: it's 12.38 months per year. At the "edge" of the calculation this will always be lower than the actual number of leap years when rounded down: the hebrew calendar has at least one leap year in every 3 year span anyway.
…m until(), removing unnecessary checks
| /// | ||
| /// For internal use by `until()` where the operation is known to succeed. | ||
| /// Follows `Overflow::Constrain` logic. | ||
| fn added_without_checks(&self, duration: DateDuration, cal: &C) -> UncheckedArithmeticDate<C> { |
There was a problem hiding this comment.
I still don't think we should do this.
I'm highly worried about diverging too far from the spec in terms of algorithm here. The arithmetic algorithms are tricky and matching the spec has helped find many issues both here and in the spec.
I left a comment here with next steps: #7077 (comment)
There was a problem hiding this comment.
Commit f3800d1 reverts until() to follow the CalendarDateUntil spec in computing the Day field.
… min-month calculations for Hebrew and Chinese calendars
c8e240e to
431b775
Compare
|
Commit 431b775 adds stateful The Disclaimer: I generated the following summary and markdown table with gemini CLI. I spot checked them; they seem accurate. Raw benchmark logs attached below. YearAll benchmarks showed a statistically significant improvement.
Summary of Results
MonthAll calendar types show significant performance improvements for the Benchmark Results: until/months/calendar (far_past)
Summary of Results
|
e84be6e to
431b775
Compare
Manishearth
left a comment
There was a problem hiding this comment.
@sffc, is this what you were envisioning for surpasses()? This is rather complicated, I was hoping there was a simpler way to cache the intermediates.
|
Actually, @poulsbo , sorry to make you run benchmarks again, but can you split out the month stuff and the surpasses stuff into two PRs? We can land the month stuff easily, but it will be useful to have separate numbers on the surpasses stuff after the month stuff lands. |
|
I have separate numbers on the surpasses stuff vs just the month stuff. As I recall, surpasses is where the big wins are, but let me post in the actual data. |
|
(My main goal is to separate out the PRs so we can land the month stuff) |
|
Makes sense, I'll separate the |
sffc
left a comment
There was a problem hiding this comment.
Good start. This is roughly the shape I had envisioned. I think we can make it cleaner and more closely follow the spec.
| fn min_months_from_inner(_start: Self::YearInfo, years: i64) -> i64 { | ||
| // A lunisolar leap month is inserted at least every 3 years. Reingold denies | ||
| // that the Chinese calendar determines leap years using the 19-year Metonic cycle. | ||
| 12 * years + (years / 3) |
There was a problem hiding this comment.
Thought: This is much better, but we could in principle be even tighter. The 19-year bound is too small, but there should be something between that and what you have here. It would be nice to leave open a low-priority issue to investigate and implement the theoretical minimal bound.
There was a problem hiding this comment.
Yeah, I was working on that math earlier but did not finish it.
| // There are 7 leap years in every 19-year Metonic cycle. | ||
| 235 * years / 19 |
There was a problem hiding this comment.
Suggestion: Explain why 235 * years / 19 is a lower bound. I understand that it is the average. and you are flooring the value. We end up with:
| # of years | min months |
|---|---|
| 1 | 12 |
| 2 | 24 |
| 3 | 37 (+1 leap month) |
| 4 | 49 |
| 5 | 61 |
| 6 | 74 (+1 leap month) |
| 7 | 86 |
| 8 | 98 |
| 9 | 111 (+1 leap month) |
| 10 | 123 |
| 11 | 136 (+1 leap month) |
| 12 | 148 |
| 13 | 160 |
| 14 | 173 (+1 leap month) |
| 15 | 185 |
| 16 | 197 |
| 17 | 210 (+1 leap month) |
| 18 | 222 |
| 19 | 235 (+1 leap month) |
Note: the Hebrew metonic cycle has leap years in 3, 6, 8, 11, 14, 17, and 19, i.e., leap year gaps of +3, +3, +2, +3, +3, +3, +2.
I think your formula is the theoretical minimum bound. I think, no matter where I start in the metonic cycle, the number of months I get from your formula is always less than or equal to the actual number of months.
For example, if I start in year 3 month M01 and add 5 years, I need to cross over 2 leap months, one in year 3 and one in year 6. Your formula agrees.
The "worst case" is I think month M06 in year 8 to year 18, a span that has only 3 leap months. That is a 10-year difference, and your formula correctly predicts that there are a minimum of 3 leap months during that span.
Please try to summarize why your bound is the theoretical minimum bound in a concise and compelling way.
There was a problem hiding this comment.
I do have the start of the hustification written in #7739 (comment)
There was a problem hiding this comment.
(I know this PR is closed/merged, but this seems like the right place for this comment.)
My original plan was to use the start year parameter to tell exactly where we were in the cycle, using the
| // 5. Let m0 be MonthCodeToOrdinal(calendar, y0, ! ConstrainMonthCode(calendar, y0, parts.[[MonthCode]], constrain)). | ||
| let m0 = cal | ||
| .ordinal_from_month(y0, base_month, constrain) | ||
| .unwrap_or(1); |
There was a problem hiding this comment.
Nit: Keep the debug assertion we had before if possible.
| // 3. Let y0 be parts.[[Year]] + years. | ||
| let y0 = cal.year_info_from_extended(parts.year().to_extended_year() + years as i32); | ||
| let base_month = cal.month_from_ordinal(parts.year(), parts.month()); | ||
| if Self::compare_surpasses_lexicographic(sign, y0, base_month, parts.day(), cal_date_2, cal) |
There was a problem hiding this comment.
Question: Why did you move compare_surpasses_lexicographic below the calculation of m0? You don't need m0 as input to that function. Please try to match the spec as closely as possible.
| let years = i64::from(duration.years) * sign_mul; | ||
| let months = i64::from(duration.months) * sign_mul; | ||
|
|
||
| let month_checker = self.surpasses_month_checker(other, years, sign, cal); |
There was a problem hiding this comment.
Suggestion: you can also create a years checker. It can run the "lexicographic" surpasses only.
| } | ||
|
|
||
| /// Prepares a stateful checker for week and day iteration in `surpasses()`. | ||
| fn surpasses_week_day_checker<'a>( |
There was a problem hiding this comment.
Issue: you shouldn't need this fn, because there should always be a call to surpasses_month_checker before you need the week/day checker.
| /// By saving intermediary computations based on a fixed year, | ||
| /// only the computations relating to the month are done. The week | ||
| /// and day are expected to be zero. | ||
| struct SurpassesMonthChecker<'a, C: DateFieldsResolver> { |
There was a problem hiding this comment.
Suggestion: I think you can fairly easily write this as a single SurpassesChecker, and it will be easier to reason about. For example, your call site should look more like:
let mut checker = self.surpasses_checker();
while !checker.surpasses_years(y) { y += 1 };
while !checker.surpasses_months(m) { m += 1 };
while !checker.surpasses_weeks(w) { w += 1 };
while !checker.surpasses_days(w, d) { d += 1 };
return (y, m, w, d)When surpasses_years returns false, nothing happens, but as soon as it returns true, the checker saves the intermediate results and prepares for a call to surpasses_months. Rinse and repeat for surpasses_months. You can add a debug assertion that surpasses_years and surpasses_months may never return true multiple times.
There was a problem hiding this comment.
Ah, that would be much cleaner.
For some reason when I was reading through surpasses(), I anchored early on the idea that I would need two separate checker objects, one to handle months, one to handle weeks/days, because the algorithm seemed to break down that way.
But it would be less noisy to have a single checker. Let me try to structure it that way. I'll address your other comments at the same time.
885f70e to
f3800d1
Compare
|
I pulled the stateful |
|
Followup #7746 |
robertbastian
left a comment
There was a problem hiding this comment.
So what are the performance numbers of the the month optimsiation that was actually merged here?
| // months corresponding to min_years. For solar calendars, this is 12 * min_years. | ||
| // For the Hebrew calendar, a leap month is added for 7 out of 19 years. East Asian | ||
| // Calendars do not provide a specialized implementation of `min_months_from()` | ||
| // because it would be too expensive to calculate; they default to 12 * min_years. |
There was a problem hiding this comment.
nit: don't really have to list out all the different implementations of min_months_from_inner, this will just become stale
| fn min_months_from_inner(_start: Self::YearInfo, years: i64) -> i64 { | ||
| // A lunisolar leap month is inserted at least every 3 years. Reingold denies | ||
| // that the Chinese calendar determines leap years using the 19-year Metonic cycle. | ||
| 12 * years + (years / 3) |
There was a problem hiding this comment.
there are a two issues with this:
EastAsianTraditionalis generic inR: Rules, and the rules can do whatever they want. We either need to add a requirement that the rules insert a leap month at least every three years, or delegate this logic to theRulesimplementation- Even for our concrete calendars, I'm not sure that this is correct. Both
ChinaandKoreause three different rules (1900-1912, 1912-2050/2100, simple approximation otherwise), so this needs to be shown to hold for all three (and for the transitions). For the hardcoded data we can write a test that this holds, but for the simple approximation I would like to see some kind of reasoning.
Improving docs from #7739 --------- Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
Improving docs from unicode-org#7739 --------- Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
…an (#7753) This is one way of resolving (some of?) @robertbastian's concerns from #7739 --------- Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
Progress on #7077
Improve the performnace of the calendar
until()function.Tighten the estimation of the number of months between two dates in order to reduce the number of incremental
surpasses()tests. This is done by introducing a new DateFieldsResolver function,min_months_from(), which the Coptic, Ethiopic, and Hebrew calendar implementations customize.Eliminate the
surpasses()loop for the final days delta calculation. Instead compute an intermediate date, which is the start date plus the years, months, and weeks computed so far. Subtract the Rata Die of the intermediate date from that of the target date to get the days field of the DateDuration.Fix minor typos in function names and comments.
Performance, as measured using criterion, is improved for Years and Months intervals. Improvements on the order of 20% to 80%, or 100s of ns to >1 ms, are shown, for base times of 500 ns to 5 ms.
For Weeks and Days, which are already computed efficiently using RD subtraction, criterion measures a slight slowdown (on the order of 1 ns, for a 15-20 ns operation). This is most likely due to a change in compiler inlining, optimization, register usage, etc., since the code for Days/Weeks has not changed.
Changelog: N/A
(covered by other PRs)