RFC 9557 : Add support for parsing ISO date time with zone-id #130054
💚 CLA has been signed
61c74ba to 5f32531
…601Parser
We can now parse ISO date-time strings such as:
`2029-05-15T17:14:56.123456789-08:00[America/Los_Angeles]`
or
`2031-12-03T10:15:30.123456789+01:00[Europe/Paris]`
or with a "short-id" (like `NST` for `Pacific/Auckland`):
`2025-06-26T12:01:48.211+12:00[NST]`
Such as: `2031-12-03T10:15:30.123456789Z[UTC]`
Pinging @elastic/es-core-infra (Team:Core/Infra)
            }
        }
        pos++; // read the + or -
        if (str.charAt(pos) == '[' && str.charAt(len - 1) == ']') {
I think this could lead to some malformed strings being accepted, like `2022-07-08T00:14:07+[Europe/London]`, where the sign is followed directly by the bracketed zone with no offset digits in between.
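One way to rule out that case is to require at least one digit after the sign before looking for a bracketed suffix. The following is only a minimal sketch of such a guard; `OffsetSignGuard` and `offsetHasDigits` are hypothetical names for illustration and are not part of the PR or of the existing parser.

```java
final class OffsetSignGuard {
    // Hypothetical helper (not from the PR): returns true only when the
    // character right after the sign at signPos is an ASCII digit.
    static boolean offsetHasDigits(String str, int signPos) {
        int next = signPos + 1;
        if (next >= str.length()) {
            return false; // the string ends right after the sign
        }
        char c = str.charAt(next);
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        String malformed = "2022-07-08T00:14:07+[Europe/London]";
        String wellFormed = "2022-07-08T00:14:07+01:00[Europe/London]";
        System.out.println(offsetHasDigits(malformed, malformed.indexOf('+')));   // false
        System.out.println(offsetHasDigits(wellFormed, wellFormed.indexOf('+'))); // true
    }
}
```

With a check like this in place, the `'['` test would only be reached once an actual offset has been read.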
            zoneId = ZoneId.SHORT_IDS.getOrDefault(zoneId, zoneId);
            return ZoneId.of(zoneId);
        } catch (DateTimeException e) {
            return null;
What about unknown suffixes, or suffixes representing other features such as calendars (for example, `2022-07-08T00:14:07+01:00[knort=blargel]`)?
I think that in that case we would want to ignore the suffix and keep the parsed offset. Here, instead, I think we end up discarding both.
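A minimal sketch of that fallback, assuming the offset has already been parsed into a `ZoneOffset` before the suffix is examined. `SuffixFallback` and `zoneOrOffset` are hypothetical names used only for illustration; the only real APIs involved are `ZoneId.SHORT_IDS`, `ZoneId.of` and `DateTimeException` from `java.time`.

```java
import java.time.DateTimeException;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class SuffixFallback {
    // Hypothetical helper (not from the PR): try to read the bracketed suffix
    // as a zone id; if it is not one, keep the offset that was already parsed
    // instead of returning null.
    static ZoneId zoneOrOffset(String suffix, ZoneOffset parsedOffset) {
        try {
            String id = ZoneId.SHORT_IDS.getOrDefault(suffix, suffix);
            return ZoneId.of(id);
        } catch (DateTimeException e) {
            // Unknown or non-zone suffix such as "knort=blargel": ignore it.
            return parsedOffset;
        }
    }

    public static void main(String[] args) {
        ZoneOffset plusOne = ZoneOffset.ofHours(1);
        System.out.println(zoneOrOffset("knort=blargel", plusOne)); // +01:00
        System.out.println(zoneOrOffset("Europe/London", plusOne)); // Europe/London
    }
}
```

This only covers suffixes that fail to resolve as a zone. What to do when a valid zone disagrees with the offset is a separate policy question.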
Thank you for your interest in Elasticsearch. We have taken a first look at your changes, and at first sight they seem to go in a different direction from what we intended for this parser: to be efficient, to avoid expensive parsing when it is not needed, and still to handle most or all of the valid formats. I'm not an expert in ISO date-time, so I'll ask @thecoop to take a second look at this and confirm or dispute my opinion.
When the time-offset and the Time Zone Information are inconsistent, this PR favours the Time Zone Information, even in cases where it is incorrect or not a time zone id at all. The original code favours the time-offset when it is present, and limits the parsing of a zone id to cases where this is not an issue (i.e. when there is no time-offset). I think this is the correct approach. According to the RFC:
Even when the time-offset and Time Zone Information are both present, and the Time Zone Information is marked as critical, according to the RFC
We can still say "we always favour the time-offset": programmed behavior is a legal way of addressing inconsistencies, and I think that in most of our use cases we prefer making a decision to rejecting the timestamp, since rejection may have ripple effects (e.g. breaking on previously ingested documents, or rejecting documents that were accepted before, which would be a breaking change).
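To illustrate what "always favour the time-offset" could mean in `java.time` terms, here is a small sketch under the assumption that the local date-time, the offset and the zone have already been parsed separately. `FavourOffset` and `resolve` are hypothetical names; `ZonedDateTime.ofInstant(LocalDateTime, ZoneOffset, ZoneId)` is the JDK method that derives the instant from the local date-time plus offset and uses the zone only to render it.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class FavourOffset {
    // Hypothetical helper (not from the PR): the instant is fixed by the local
    // date-time and the offset; the zone never changes which instant is meant.
    static ZonedDateTime resolve(LocalDateTime local, ZoneOffset offset, ZoneId zone) {
        return ZonedDateTime.ofInstant(local, offset, zone);
    }

    public static void main(String[] args) {
        // Inconsistent input: +00:00 offset, but Europe/London is at +01:00 in July.
        LocalDateTime local = LocalDateTime.of(2022, 7, 8, 0, 14, 7);
        ZonedDateTime zdt = resolve(local, ZoneOffset.UTC, ZoneId.of("Europe/London"));
        System.out.println(zdt.toInstant()); // 2022-07-08T00:14:07Z
        System.out.println(zdt);             // 2022-07-08T01:14:07+01:00[Europe/London]
    }
}
```

The instant stays the one given by the offset, so a timestamp with a disagreeing zone is still accepted and maps to the same point in time as it would without the suffix.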
Many thanks for your contribution. Unfortunately this is not a change we should make to Elasticsearch. Elasticsearch already supports parsing date-time strings with zone ids by defining a custom formatter using the
We can now parse RFC 9557 ISO date-time strings such as:
`2029-05-15T17:14:56.123456789-08:00[America/Los_Angeles]`
or
`2031-12-03T10:15:30.123456789+01:00:00[Europe/Paris]`
or with a "short-id" (like `NST` for `Pacific/Auckland`):
`2025-06-26T12:01:48.211+12:00[NST]`