Skip to content

Conversation

@jycr
Copy link

@jycr jycr commented Jun 25, 2025

We can now parse RFC 9557 ISO date-time such as :

We can now parse ISO date-time such as :

2029-05-15T17:14:56.123456789-08:00[America/Los_Angeles]

or

2031-12-03T10:15:30.123456789+01:00:00[Europe/Paris]

or with "short-id" (like NST for Pacific/Auckland)

2025-06-26T12:01:48.211+12:00[NST]

@jycr jycr requested a review from a team as a code owner June 25, 2025 23:27
@cla-checker-service
Copy link

cla-checker-service bot commented Jun 25, 2025

💚 CLA has been signed

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team v9.2.0 labels Jun 25, 2025
@jycr jycr force-pushed the main branch 2 times, most recently from 61c74ba to 5f32531 Compare June 26, 2025 00:04
jycr added 2 commits June 28, 2025 13:27
…601Parser

We can now parse ISO date-time such as :

2029-05-15T17:14:56.123456789-08:00[America/Los_Angeles]

or

2031-12-03T10:15:30.123456789+01:00[Europe/Paris]

or with "short-id" (like `NST` for `Pacific/Auckland`)

2025-06-26T12:01:48.211+12:00[NST]
Such as: `2031-12-03T10:15:30.123456789Z[UTC]`
@PeteGillinElastic PeteGillinElastic added :Core/Infra/Core Core issues without another label and removed needs:triage Requires assignment of a team area label labels Jul 11, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jul 11, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

}
}
pos++; // read the + or -
if (str.charAt(pos) == '[' && str.charAt(len - 1) == ']') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could lead to some malformed strings being parsed, like 2022-07-08T00:14:07+[Europe/London]

zoneId = ZoneId.SHORT_IDS.getOrDefault(zoneId, zoneId);
return ZoneId.of(zoneId);
} catch (DateTimeException e) {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about unknown suffixes, or suffixes representing features like e.g. calendars? (for example, 2022-07-08T00:14:07+01:00[knort=blargel]?)
I think that in that case we would want to ignore that, and keep the parsed offset. Here instead I think we end up ignoring both.

@ldematte ldematte self-assigned this Jul 12, 2025
@ldematte
Copy link
Contributor

ldematte commented Jul 12, 2025

Thank you for your interest in Elasticsearch.

We have taken a first look at your changes and at first sight this seems to be going in a different direction from what we intended with this parser (which is to be efficient, avoid expensive parsing when not needed, and still be able to handle most/all of the valid formats).

I'm not an expert in ISO date time, so I'll ask @thecoop to take a second look at this and confirm or dispute my opinion.

In case of inconsistencies between time-offset and Time Zone Information, this PR favours the Time Zone Information, even in cases where that is incorrect or not a time zone id at all.

The original code favours the time-offset when this is present, and limit the parsing of a zone id when this is not an issue (i.e. when there is no time-offset). I think this is the correct approach. According to the RFC:

Acting on the inconsistency may involve rejecting the timestamp or resolving the inconsistency via additional information, such as user input and/or programmed behavior.

Even when the time-offset and Time Zone Information are both present, and the Time Zone Information is marked as critical, according to the RFC

In case of an inconsistency between time-offset and time zone suffix, if the critical flag is used on the time zone suffix, an application MUST act on the inconsistency

We can still say "we always favour the time-offset": programmed behavior is a legal way of addressing inconsistencies, and I think that in most of our use cases we prefer to make a decision than rejecting the timestamp, as rejection may have ripple effects (e.g. break on previously ingested documents, or reject documents that were accepted before, which would be a breaking change)

@thecoop
Copy link
Member

thecoop commented Jul 14, 2025

Many thanks for your contribution. Unfortunately this is not a change we should do to Elasticsearch. The ISO8601Parser class is only used for built-in parsers, as defined in the DateFormatters class. These parsers have specific defined formats for the strings that are accepted by those parsers, and changing those formats can have dramatic and unintended consequences for anything using those parsers to parse or reject date-time strings.

Elasticsearch already supports parsing date-time strings using zone ids by defining a custom formatter using the V format specification. It will not be as performant as using the built-in formats, but if the performance is critical, I suggest pre-processing the data into a format acceptable to the predefined Elasticsearch formats before ingestion.

@thecoop thecoop closed this Jul 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Core/Infra/Core Core issues without another label external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Core/Infra Meta label for core/infra team v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants