Skip to content

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Sep 30, 2025

#5778

Replaces #6995

@robertbastian robertbastian requested review from a team, Manishearth and sffc as code owners September 30, 2025 19:42
Copy link

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

@robertbastian robertbastian force-pushed the pinqi2 branch 3 times, most recently from 4d87b46 to 00dabee Compare September 30, 2025 20:00
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial feedback. I want to do a more thorough review before this lands

Comment on lines 1808 to 1835
if !(1900..2100).contains(&case.iso_year) {
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't pass outside the hardcoded range

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then update the tests to either be in-range or use the new dates?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait until I have an approval before I delete test cases that I might have to restore later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed these tests because consistency with ICU is no longer a goal, and we have tests against the actual ground truth

@Manishearth
Copy link
Member

Dangi data: https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e. Please ignore the data for the year 1899 and 2050, it is incomplete.

@Manishearth
Copy link
Member

I'm going to figure out a home for the scraper and scraped data, but for now we can just cite KASI.

robertbastian pushed a commit that referenced this pull request Oct 1, 2025
The code and data used for fetching this will be pushed up to a separate
(private) Unicode repo once we have one. You can find the cleaned up
source data in
https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e.

I'm imagining that post-1950 data will change or be removed with
#7006



The initial motivation here was to fix the apparent ground truth
mismatch found in
https://github.com/unicode-org/icu4x/pull/7007/files#r2393049682. Turns
out it was a different problem, and it has been fixed in
#7013.

We may potentially need the same discussion as #6970 about whether we
care about these pre-1912 dates, since that's the only time this
diverges.
@robertbastian robertbastian marked this pull request as draft October 1, 2025 10:32
@robertbastian robertbastian force-pushed the pinqi2 branch 7 times, most recently from 077ed47 to acf1c4a Compare October 1, 2025 16:52
@robertbastian robertbastian marked this pull request as ready for review October 1, 2025 17:03
LunarChineseYearData::simple(
// Future reference time is probably UTC+9
day_fraction_to_ms!(9 / 24),
// This is required for continuity with the hardcoded data
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these years start on the same day with both methods:

2092, 2093, 2094, 2095, 2096, 2097, 2098, 2103, 2104, 2106, 2107, 2108, 2109, 2110

crucially, not 2101, which is why we need a correction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like adjusting the new moon because it skews not only this year but all future years. I want us to at least try to approximate GB/T, with an average error near zero, even if local errors are several hours one way or the other. Can we just project Reingold two more years before cutting over?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could, that's why I posted this list

we're not going to get an average error of zero, because the mean lunar cycle varies considerable over centuries. this method does not really align with gbt, even without lunar correction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should pick a new moon that is close to the mean of the synodic cycle. If the moon was at the outer extreme of the ellipse on the date you picked in January 2000, then we'll be carrying that error forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: We should run Reingold's new moon code for several hundred years and pick the new moon instant that minimizes the error. This should be an easy simulation to write, and it makes the approximation less arbitrary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't consider minimising an error to be a goal here, at least not in this PR

Comment on lines 1808 to 1835
if !(1900..2100).contains(&case.iso_year) {
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

new_moon_correction: Milliseconds,
related_iso: i32,
) -> LunarChineseYearData {
fn periodic_duration_on_or_before(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: docs

base_moment: LocalMoment,
duration: Milliseconds,
) -> LocalMoment {
let num_periods = ((rata_die - base_moment.rata_die + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what's going on with this +1 / -1 ?

For functions like this either the math should be explained so that it's clear what it does, or the function should be documented so I can verify the math, doing neither leaves me guessing. Ideally both are documented.

My understanding is: this function is attempting to find the closest whole period from base_moment of period duration looking back from rata_die. The +/- 1 handles the edge cases, but I'd like a comment explanation explaining how.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially wrote this formula before @robertbastian refactored it. The +1 / -1 is to get us to the final millisecond of the day. We're given a RD, but the function returns "on or before", so we add 1 to the RD and then subtract 1 from the millisecond. We do the -1 because if the new moon occurs exactly at midnight, it belongs to the following day.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk, I can remove this, we don't have a requirement of millisecond precision here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the +1 / -1. @Manishearth was just asking why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well it wouldn't be the end of the world if we didn't move the new moon at the millisecond of exactly midnight to next day. lunar times are +-minutes from reality anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either works, if we're being "inaccurate" there I'd still prefer a comment saying where we diverge.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diverge from what? this whole algorithm is hand waving

pub(crate) fn md_from_rd(self, rd: RataDie) -> (u8, u8) {
debug_assert!(
rd < self.next_new_year() || !WELL_BEHAVED_ASTRONOMICAL_RANGE.contains(&rd),
rd < self.next_new_year(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: I'm worried about hitting these again in V8: can we write a fuzzer for these?

Both RD-to-chinese and chinese-from-fields.

It's pretty easy, use cargo-fuzz.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we have a test at the extreme values, right? I don't see what you'd be worried about, there's not floating point math that can degenerate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR I'd slightly prefer the well-behaved guard be kept around, and removing them is a separate change.

But if you feel confident about the assertions I guess it's fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't using any logic from calendrical_calculations anymore, so that crate should not define how the assertions here behave

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general our calendar code has been full of broken invariants, and I'm not convinced this new code doesn't have that problem. The astronomical range isn't just a calendrical_calculations concept, it's an ICU4X one too, defined in calendrical_calculations because both need them. calendrical_calculations already defines two such constants, one for islamic, and one for chinese. ICU4X could internally define its own for Pingqi.

So I don't think "calendrical_calculations is no longer used so we no longer need to gate the assertions" is a valid reason. The question is whether we should be applying a valid astronomical range (well, a valid "approximation calculation" range) to invariants here at all. I'm okay with the answer being no, but I'm wary about these debug assertions, I already spent a significant amount of time playing whack-a-mole with the earlier ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be useful to be able to reproduce the issues V8 is having in our repo, so that we can eventually confidently remove these. right now it's unclear if they are needed or not

@robertbastian robertbastian requested a review from sffc October 6, 2025 14:25
Comment on lines 1099 to 1103
// 2000-01-06T18:14 https://aa.usno.navy.mil/calculated/moon/phases?date=2000-01-01&nump=1&format=t
const UTC_NEW_MOON: LocalMoment = LocalMoment {
rata_die: calendrical_calculations::gregorian::fixed_from_gregorian(2000, 1, 6),
local_milliseconds: ((18 * 60) + 14) * 60 * 1000,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: pick a new moon that is as close to the mean new moon as possible, i.e., one where the forces that cause the synodic month length to change are in equilibrium. Here's a good post on the subject: https://astronomy.stackexchange.com/questions/55048/what-causes-the-variation-in-the-length-of-the-synodic-month-besides-the-eccent

(does not block the PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mean synodic month length we use is for the year 2000, which I why I'm using this new moon

///
/// Stays anchored in the Gregorian calendar, even as the Gregorian calendar drifts
/// from the seasons in the distant future and distant past.
fn simple(utc_offset: Milliseconds, related_iso: i32) -> LunarChineseYearData {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion (optional): Make a version of this function where the new moon and solar term calculations are pluggable, and then assert that when you plug in the Reingold ones, they match HKO. This would increase our confidence that this approximation is a principled approximation.

@robertbastian robertbastian requested a review from sffc October 8, 2025 11:17
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my remaining comments are stylistic or suitable for a follow-up

@robertbastian robertbastian merged commit b31494d into unicode-org:main Oct 8, 2025
31 checks passed
@robertbastian robertbastian deleted the pinqi2 branch October 8, 2025 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants