Support parsing all DateTimes that could be represented using a 64 bit integer with millisecond resolution #18

Drvi · 2025-03-01T09:35:41Z

Support parsing all DateTimes that could be represented using a 64 bit integer with millisecond resolution, i.e. all values between -292277024-05-15T16:47:04.192 to 292277025-08-17T07:12:55.807
- Additionally, all valid timestamps with in the range [-2147483648-01-01T00:00:00.000, -292277024-05-15T16:47:04.193] will be clamped to the minimal representable DateTime, -292277024-05-15T16:47:04.192, and all valid timestamps with in the range [292277025-08-17T07:12:55.808, 2147483647-12-31T23:59:59.999]
  will be clamped to the maximal representable DateTime, 292277025-08-17T07:12:55.807.
Bump the cache github action since v2 is not unsupported anymore
Bump Julia compat to 1.10 since it's the new LTS and we got reinterpret-related errors in earlier versions.
I have refactored the code to avoid unnecessary validation, which helped with the failing allocation tests on 1.11 at least and I've added a bunch of tests.

Drvi · 2025-03-01T13:41:41Z

I don't know whats up with the failing allocations tests -- they pass for me locally
EDIT: refactored the code speculatively, which helped on 1.11
EDIT2: Switching to @ballocated fixed the remaining alloc test failures on Julia nightly

…t integer with millisecond resolution

…d remove unnecessary validation logic

nickrobinson251

LGTM

Couple questions/suggestions but i think this looks good -- thanks for the easy to follow tests

src/type_parsers/datetime_parser.jl

nickrobinson251 · 2025-03-10T10:26:59Z

src/type_parsers/datetime_parser.jl

-    (tz, pos, _, code) = Parsers.tryparsenext(Dates.DatePart{'Z'}(3, false), buf, pos, len, b, code)
-    return tz, pos, code
+    (tz, pos, _, code_tz) = Parsers.tryparsenext(Dates.DatePart{'Z'}(3, false), buf, pos, len, b, code)
+    return tz, pos, Parsers.invalid(code_tz) ? code : code_tz


sorry, don't think i fully understood the changes in this function -- which tests show this new behaviour? is it

res = Parsers.xparse(ChunkedCSV.GuessDateTime, string(ChunkedCSV.MIN_DATETIME, "Z")) @test res.val == ChunkedCSV.MIN_DATETIME @test Parsers.ok(res.code)

should we / do we have a test with an actual invalid timezone to check we still end up with the invalid code?

Ok, this is tricky to explain, because I'm trying to fulfill the unwritten contract between Parsers.typeparser, which we're implementing here, and the other layers in Parsers.jl. Basically, the type parser is must consume all valid bytes and stop once it encounters a byte that doesn't "belong" to the type it's parsing. Sometimes that byte is a delimiter or a whitespace in which case you shouldn't mark it as INVALID and let the other layers handle it, but sometimes certain structure is dictated by the parsing grammar, like if you have a float with a scientific notation exponent, then a number or sign must follow the e, otherwise this is an invalid float. Except, you are not supposed to rely on knowing what the delimiter or quote is, since these should only be handled by the other layers in Parsers.jl. And since we must consume all bytes that belong to the type, we must attempt to parse the timezone string, just in case it is there, and in case it is not there, we must handle it gracefully. So this is what I'm doing here. I need to handle both the case when there is a timezone, e.g. when we're here:

2024-01-01 00:00:00.000Z, ^

and the other case

2024-01-01 00:00:00.000, ^

Since the timezone is optional. If the timezone is just wrong:

2024-01-01 00:00:00.000!, ^

We'll say -- well, this byte doesn't seem to belong to us, so we won't skip past it (and let the other layers in Parsers.jl handle it and mark it as invalid) but we also don't say this value is invalid. The other layers in Parsers.jl don't understand what a valid timezone is or isn't, but they understand there must be no more non-space characters after we consumed all the bytes for the type, so it will mark the parsing as invalid because of that.

So this code path is exercised in all test cases that have a time component (i.e. not just the date part), since in all of those, we'll reach this function. I've explicitly added the test set "parsing in context" to make sure we always return the expected code in these cases.

Co-authored-by: Nick Robinson <[email protected]>

Drvi marked this pull request as ready for review March 5, 2025 12:51

Drvi requested review from NHDaly and nickrobinson251 March 6, 2025 15:53

Drvi added 9 commits March 10, 2025 10:47

Support parsing all DateTimes that could be represented using a 64 bi…

8b288d8

…t integer with millisecond resolution

experiment

0813342

actions/cache@v2 -> actions/cache@v4

9a37d3f

.

345c94c

.

df8ba4e

Support very short dates

4e45aba

Clamp timestamps with 32 bit year

165c592

Refactor datetime parsing to better align with other Parser layers an…

7655fed

…d remove unnecessary validation logic

Use BenchmarkTools to test allocations

9f0f3c6

Drvi force-pushed the td-negative-dates branch from 219dc6a to 9f0f3c6 Compare March 10, 2025 09:48

Drvi mentioned this pull request Mar 10, 2025

Distinguish between missing and errored values in TaskResultBuffer #19

Merged

nickrobinson251 approved these changes Mar 10, 2025

View reviewed changes

Drvi and others added 2 commits March 10, 2025 13:22

Apply suggestions from code review

4ad81ef

Co-authored-by: Nick Robinson <[email protected]>

More comments and tweaks

608eed5

Drvi merged commit fa00066 into main Mar 10, 2025
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support parsing all DateTimes that could be represented using a 64 bit integer with millisecond resolution #18

Support parsing all DateTimes that could be represented using a 64 bit integer with millisecond resolution #18

Uh oh!

Drvi commented Mar 1, 2025 •

edited

Loading

Uh oh!

Drvi commented Mar 1, 2025 •

edited

Loading

Uh oh!

nickrobinson251 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nickrobinson251 Mar 10, 2025

Uh oh!

Drvi Mar 10, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Support parsing all DateTimes that could be represented using a 64 bit integer with millisecond resolution #18

Support parsing all DateTimes that could be represented using a 64 bit integer with millisecond resolution #18

Uh oh!

Conversation

Drvi commented Mar 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Drvi commented Mar 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nickrobinson251 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nickrobinson251 Mar 10, 2025

Choose a reason for hiding this comment

Uh oh!

Drvi Mar 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Drvi commented Mar 1, 2025 •

edited

Loading

Drvi commented Mar 1, 2025 •

edited

Loading

Drvi Mar 10, 2025 •

edited

Loading