-
Notifications
You must be signed in to change notification settings - Fork 225
Use simple approximation for LunarChinese #7006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. |
4d87b46
to
00dabee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial feedback. I want to do a more thorough review before this lands
if !(1900..2100).contains(&case.iso_year) { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't pass outside the hardcoded range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then update the tests to either be in-range or use the new dates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll wait until I have an approval before I delete test cases that I might have to restore later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed these tests because consistency with ICU is no longer a goal, and we have tests against the actual ground truth
e8fcb02
to
77cf05e
Compare
Dangi data: https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e. Please ignore the data for the year 1899 and 2050, it is incomplete. |
I'm going to figure out a home for the scraper and scraped data, but for now we can just cite KASI. |
The code and data used for fetching this will be pushed up to a separate (private) Unicode repo once we have one. You can find the cleaned up source data in https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e. I'm imagining that post-1950 data will change or be removed with #7006 The initial motivation here was to fix the apparent ground truth mismatch found in https://github.com/unicode-org/icu4x/pull/7007/files#r2393049682. Turns out it was a different problem, and it has been fixed in #7013. We may potentially need the same discussion as #6970 about whether we care about these pre-1912 dates, since that's the only time this diverges.
077ed47
to
acf1c4a
Compare
LunarChineseYearData::simple( | ||
// Future reference time is probably UTC+9 | ||
day_fraction_to_ms!(9 / 24), | ||
// This is required for continuity with the hardcoded data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these years start on the same day with both methods:
2092, 2093, 2094, 2095, 2096, 2097, 2098, 2103, 2104, 2106, 2107, 2108, 2109, 2110
crucially, not 2101, which is why we need a correction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like adjusting the new moon because it skews not only this year but all future years. I want us to at least try to approximate GB/T, with an average error near zero, even if local errors are several hours one way or the other. Can we just project Reingold two more years before cutting over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, that's why I posted this list
we're not going to get an average error of zero, because the mean lunar cycle varies considerable over centuries. this method does not really align with gbt, even without lunar correction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should pick a new moon that is close to the mean of the synodic cycle. If the moon was at the outer extreme of the ellipse on the date you picked in January 2000, then we'll be carrying that error forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: We should run Reingold's new moon code for several hundred years and pick the new moon instant that minimizes the error. This should be an easy simulation to write, and it makes the approximation less arbitrary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't consider minimising an error to be a goal here, at least not in this PR
if !(1900..2100).contains(&case.iso_year) { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
new_moon_correction: Milliseconds, | ||
related_iso: i32, | ||
) -> LunarChineseYearData { | ||
fn periodic_duration_on_or_before( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: docs
base_moment: LocalMoment, | ||
duration: Milliseconds, | ||
) -> LocalMoment { | ||
let num_periods = ((rata_die - base_moment.rata_die + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: what's going on with this +1 / -1 ?
For functions like this either the math should be explained so that it's clear what it does, or the function should be documented so I can verify the math, doing neither leaves me guessing. Ideally both are documented.
My understanding is: this function is attempting to find the closest whole period from base_moment
of period duration
looking back from rata_die
. The +/- 1 handles the edge cases, but I'd like a comment explanation explaining how.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially wrote this formula before @robertbastian refactored it. The +1 / -1 is to get us to the final millisecond of the day. We're given a RD, but the function returns "on or before", so we add 1 to the RD and then subtract 1 from the millisecond. We do the -1 because if the new moon occurs exactly at midnight, it belongs to the following day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk, I can remove this, we don't have a requirement of millisecond precision here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need the +1 / -1. @Manishearth was just asking why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well it wouldn't be the end of the world if we didn't move the new moon at the millisecond of exactly midnight to next day. lunar times are +-minutes from reality anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either works, if we're being "inaccurate" there I'd still prefer a comment saying where we diverge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diverge from what? this whole algorithm is hand waving
pub(crate) fn md_from_rd(self, rd: RataDie) -> (u8, u8) { | ||
debug_assert!( | ||
rd < self.next_new_year() || !WELL_BEHAVED_ASTRONOMICAL_RANGE.contains(&rd), | ||
rd < self.next_new_year(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: I'm worried about hitting these again in V8: can we write a fuzzer for these?
Both RD-to-chinese and chinese-from-fields.
It's pretty easy, use cargo-fuzz
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we have a test at the extreme values, right? I don't see what you'd be worried about, there's not floating point math that can degenerate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR I'd slightly prefer the well-behaved guard be kept around, and removing them is a separate change.
But if you feel confident about the assertions I guess it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't using any logic from calendrical_calculations
anymore, so that crate should not define how the assertions here behave
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general our calendar code has been full of broken invariants, and I'm not convinced this new code doesn't have that problem. The astronomical range isn't just a calendrical_calculations
concept, it's an ICU4X one too, defined in calendrical_calculations
because both need them. calendrical_calculations
already defines two such constants, one for islamic, and one for chinese. ICU4X could internally define its own for Pingqi.
So I don't think "calendrical_calculations
is no longer used so we no longer need to gate the assertions" is a valid reason. The question is whether we should be applying a valid astronomical range (well, a valid "approximation calculation" range) to invariants here at all. I'm okay with the answer being no, but I'm wary about these debug assertions, I already spent a significant amount of time playing whack-a-mole with the earlier ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be useful to be able to reproduce the issues V8 is having in our repo, so that we can eventually confidently remove these. right now it's unclear if they are needed or not
// 2000-01-06T18:14 https://aa.usno.navy.mil/calculated/moon/phases?date=2000-01-01&nump=1&format=t | ||
const UTC_NEW_MOON: LocalMoment = LocalMoment { | ||
rata_die: calendrical_calculations::gregorian::fixed_from_gregorian(2000, 1, 6), | ||
local_milliseconds: ((18 * 60) + 14) * 60 * 1000, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: pick a new moon that is as close to the mean new moon as possible, i.e., one where the forces that cause the synodic month length to change are in equilibrium. Here's a good post on the subject: https://astronomy.stackexchange.com/questions/55048/what-causes-the-variation-in-the-length-of-the-synodic-month-besides-the-eccent
(does not block the PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the mean synodic month length we use is for the year 2000, which I why I'm using this new moon
/// | ||
/// Stays anchored in the Gregorian calendar, even as the Gregorian calendar drifts | ||
/// from the seasons in the distant future and distant past. | ||
fn simple(utc_offset: Milliseconds, related_iso: i32) -> LunarChineseYearData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion (optional): Make a version of this function where the new moon and solar term calculations are pluggable, and then assert that when you plug in the Reingold ones, they match HKO. This would increase our confidence that this approximation is a principled approximation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my remaining comments are stylistic or suitable for a follow-up
#5778
Replaces #6995