Support parsing all DateTimes that could be represented using a 64 bit integer with millisecond resolution #18
I don't know what's up with the failing allocations tests; they pass for me locally.
…d remove unnecessary validation logic
nickrobinson251 left a comment
LGTM
A couple of questions/suggestions, but I think this looks good. Thanks for the easy-to-follow tests!
```diff
-(tz, pos, _, code) = Parsers.tryparsenext(Dates.DatePart{'Z'}(3, false), buf, pos, len, b, code)
-return tz, pos, code
+(tz, pos, _, code_tz) = Parsers.tryparsenext(Dates.DatePart{'Z'}(3, false), buf, pos, len, b, code)
+return tz, pos, Parsers.invalid(code_tz) ? code : code_tz
```
Sorry, I don't think I fully understood the changes in this function. Which tests show this new behaviour? Is it this one?
```julia
res = Parsers.xparse(ChunkedCSV.GuessDateTime, string(ChunkedCSV.MIN_DATETIME, "Z"))
@test res.val == ChunkedCSV.MIN_DATETIME
@test Parsers.ok(res.code)
```
Should we have (or do we already have) a test with an actual invalid timezone, to check that we still end up with the invalid code?
OK, this is tricky to explain, because I'm trying to fulfill the unwritten contract between Parsers.typeparser, which we're implementing here, and the other layers in Parsers.jl. Basically, the type parser must consume all valid bytes and stop once it encounters a byte that doesn't "belong" to the type it's parsing.

Sometimes that byte is a delimiter or whitespace. In that case you shouldn't mark the value as INVALID; you let the other layers handle it. But sometimes the parsing grammar dictates a certain structure. For example, if a float has a scientific-notation exponent, a number or sign must follow the `e`, otherwise the float is invalid. At the same time, you are not supposed to rely on knowing what the delimiter or quote character is, since only the other layers in Parsers.jl should handle those.

Because we must consume all bytes that belong to the type, we have to attempt to parse the timezone string in case it is there, and handle its absence gracefully. That is what this code does. It needs to handle the case when there is a timezone, e.g. when we're here:
```
2024-01-01 00:00:00.000Z,
                       ^
```
and the case when there isn't one, since the timezone is optional:

```
2024-01-01 00:00:00.000,
                       ^
```

Then there's the case where the timezone is just wrong:
```
2024-01-01 00:00:00.000!,
                       ^
```
Here we say: this byte doesn't seem to belong to us, so we won't skip past it. We let the other layers in Parsers.jl handle it and mark it as invalid, and we don't mark the value as invalid ourselves. The other layers in Parsers.jl don't understand what a valid timezone is. They do understand that no non-space characters may follow once we've consumed all the bytes for the type, so they mark the parse as invalid for that reason.
So this code path is exercised by every test case that has a time component (i.e., not just a date part), since all of those reach this function. I've explicitly added the "parsing in context" test set to make sure we always return the expected code in these cases.
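The "optional suffix" contract described above can be illustrated with a minimal, self-contained sketch in plain Julia. It does not use Parsers.jl. The `OK`/`INVALID` flags and the `trytz`, `parse_optional_tz`, and `outer_check` functions are hypothetical stand-ins. The timezone parser here only accepts a literal `Z`, and `outer_check` only roughly approximates what the outer Parsers.jl layers do.

```julia
# Hypothetical status flags standing in for Parsers.jl return codes
# (these are NOT the real Parsers.jl constants).
const OK      = 0x0001
const INVALID = 0x0002

isinvalid(code::UInt16) = (code & INVALID) != 0

# Hypothetical timezone sub-parser that only accepts a literal 'Z'.
# On failure it reports INVALID and does not advance `pos`.
function trytz(buf::AbstractVector{UInt8}, pos::Int)
    if pos <= length(buf) && buf[pos] == UInt8('Z')
        return "Z", pos + 1, OK
    end
    return nothing, pos, INVALID
end

# Optional-suffix pattern (mirrors `Parsers.invalid(code_tz) ? code : code_tz`):
# if the timezone is missing or malformed, keep the code from the date/time
# parse instead of propagating INVALID, and leave `pos` on the unknown byte.
function parse_optional_tz(buf::AbstractVector{UInt8}, pos::Int, code::UInt16)
    tz, pos, code_tz = trytz(buf, pos)
    return tz, pos, isinvalid(code_tz) ? code : code_tz
end

# Rough, hypothetical stand-in for the outer layers: after the type parser
# returns, only spaces may appear before the delimiter; anything else is INVALID.
function outer_check(buf::AbstractVector{UInt8}, pos::Int, code::UInt16, delim::UInt8)
    while pos <= length(buf) && buf[pos] == UInt8(' ')
        pos += 1
    end
    if pos <= length(buf) && buf[pos] != delim
        code |= INVALID
    end
    return code
end

for s in ("2024-01-01 00:00:00.000Z,", "2024-01-01 00:00:00.000,", "2024-01-01 00:00:00.000!,")
    buf = Vector{UInt8}(s)
    # Byte 24 (1-based) is the first byte after the milliseconds.
    tz, pos, code = parse_optional_tz(buf, 24, OK)
    code = outer_check(buf, pos, code, UInt8(','))
    println(repr(s), " => tz = ", repr(tz), ", invalid = ", isinvalid(code))
end
```

In this sketch only the third input ends up invalid. The outer check flags it, not the timezone parser. This matches the division of responsibility described in the comment.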
Co-authored-by: Nick Robinson <[email protected]>
DateTimes from `-292277024-05-15T16:47:04.192` to `292277025-08-17T07:12:55.807` are supported. All valid timestamps within the range `[-2147483648-01-01T00:00:00.000, -292277024-05-15T16:47:04.191]` will be clamped to the minimal representable DateTime, `-292277024-05-15T16:47:04.192`. All valid timestamps within the range `[292277025-08-17T07:12:55.808, 2147483647-12-31T23:59:59.999]` will be clamped to the maximal representable DateTime, `292277025-08-17T07:12:55.807`.
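The clamping rule above can be sketched as follows. The sketch assumes the timestamp is first assembled from already-validated components in a wider integer type (Int128 here) and then clamped to the Int64 millisecond range. `rata_die` and `clamped_datetime` are hypothetical helpers, not ChunkedCSV's implementation. `rata_die` uses the proleptic Gregorian day count in which 0001-01-01 is day 1, the same convention Julia's `Dates` uses for `DateTime` values.

```julia
using Dates

# Days before the 1st of each month in a March-based year
# (January and February count as months 11 and 12 of the previous year).
const SHIFTED_MONTH_DAYS = (306, 337, 0, 31, 61, 92, 122, 153, 184, 214, 245, 275)

# Proleptic Gregorian day number with 0001-01-01 == 1. Int128 keeps
# years anywhere in the Int32 range from overflowing.
function rata_die(y::Integer, m::Integer, d::Integer)
    z = Int128(m < 3 ? y - 1 : y)
    return d + SHIFTED_MONTH_DAYS[m] + 365z + fld(z, 4) - fld(z, 100) + fld(z, 400) - 306
end

# Hypothetical helper: build a DateTime from (assumed already validated)
# components, clamping anything outside the Int64 millisecond range to
# the nearest representable extreme.
function clamped_datetime(y, m, d, H=0, M=0, S=0, ms=0)
    total = (rata_die(y, m, d) * 86_400 + H * 3_600 + M * 60 + S) * 1_000 + ms
    clamped = clamp(total, Int128(typemin(Int64)), Int128(typemax(Int64)))
    return DateTime(Dates.UTM(Int64(clamped)))
end

println(clamped_datetime(2024, 1, 1, 12, 30, 15, 250))         # in range, unchanged
println(clamped_datetime(2147483647, 12, 31, 23, 59, 59, 999))  # clamped to the maximum
println(clamped_datetime(-2147483648, 1, 1))                    # clamped to the minimum
```

Doing the arithmetic in Int128 means out-of-range inputs can be detected and clamped instead of silently wrapping around in Int64.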