[Variant][Shredding] Support typed_access for timestamp_micro/timestamp_nano#8401
Conversation
984fbcb to
0748c0e
Compare
e003775 to
aa9f8a9
Compare
scovich
left a comment
There was a problem hiding this comment.
LGTM, other than a panic that needs to be removed.
Your call whether to deal with timestamp precision widening in this PR or as a follow-on effort (maybe it depends on whether there's any controversy about the approach)
| ) | ||
| } | ||
| // Variant timestamp only support time unit with microsecond or nanosecond precision | ||
| _ => panic!( |
There was a problem hiding this comment.
Can we return an error instead of panicking?
Also, AFAIK all the other timestamp precisions can be widened losslessly to microsecond; should we consider doing that instead of blowing up?
There was a problem hiding this comment.
yes, let's not panic here
ANother potential approach is to return a not yet implemented error -- maybe by matching the supported types explicitly, so like
DataType::Timestamp(TimeUnit::Microsecond, Some(_)) => {
generic_conversion_single_value!(
TimestampMicrosecondType,
as_primitive,
|v| DateTime::from_timestamp_micros(v).unwrap(),
typed_value,
index
)
}instead of
DataType::Timestamp(timeunit, tz) => {
match (timeunit, tz) {
(TimeUnit::Microsecond, Some(_)) => {
generic_conversion_single_value!(
TimestampMicrosecondType,
as_primitive,
|v| DateTime::from_timestamp_micros(v).unwrap(),
typed_value,
index
)
}There was a problem hiding this comment.
I'm always a fan of less nesting, when possible...
There was a problem hiding this comment.
Thanks for pointing this out. Yes, TimeUnit::Second and TimeUnit::Millisecond can convert to TimeUnit::MicroSecond losslessly. Using panic here because this value is located in typed_value of the Variant, and from the spec of Variant, there are only Micro and Nano units for Timestamp(with/without timezone), I assumed that this is an error(this could not happen?), I can add the second and Millisecond units if this could happen.
There was a problem hiding this comment.
Using
panichere because this value is located intyped_valueof theVariant, and from the spec ofVariant, there are onlyMicroandNanounits forTimestamp(with/without timezone), I assumed that this is an error(this could not happen?)
Good point! There are definitely unit tests (parquet/tests/variant_integration.rs) that expect non-variant types in typed_value columns to produce an error (cases 127 and 137, specifically). So we should not widen on the read path, but rather reject.
Widening would seem to make sense on the shredding write path, tho?
There was a problem hiding this comment.
See also #8435, which I noticed while trying to add struct support
There was a problem hiding this comment.
IIUC, we need to widen to TimeUnit::MicroSecond when writing to a variant, as TimeUnit::MicroSecond is the minimum precision we support
There was a problem hiding this comment.
Exactly. We have flexibility on the write path to decide whether/how we widen or convert values that don't exactly fit the variant type system, but readers expect the resulting types to always be valid variant shredding types.
| ) | ||
| } | ||
| // Variant timestamp only support time unit with microsecond or nanosecond precision | ||
| _ => panic!( |
There was a problem hiding this comment.
yes, let's not panic here
ANother potential approach is to return a not yet implemented error -- maybe by matching the supported types explicitly, so like
DataType::Timestamp(TimeUnit::Microsecond, Some(_)) => {
generic_conversion_single_value!(
TimestampMicrosecondType,
as_primitive,
|v| DateTime::from_timestamp_micros(v).unwrap(),
typed_value,
index
)
}instead of
DataType::Timestamp(timeunit, tz) => {
match (timeunit, tz) {
(TimeUnit::Microsecond, Some(_)) => {
generic_conversion_single_value!(
TimestampMicrosecondType,
as_primitive,
|v| DateTime::from_timestamp_micros(v).unwrap(),
typed_value,
index
)
}| @@ -119,23 +106,10 @@ variant_test_case!(30); | |||
| variant_test_case!(31); | |||
| // https://github.com/apache/arrow-rs/issues/8334 | |||
| variant_test_case!(32, "Unsupported typed_value type: Time64(Microsecond)"); | |||
| // https://github.com/apache/arrow-rs/issues/8331 | |||
|
|
958e8b0 to
3019a2e
Compare
| ]) | ||
| }); | ||
|
|
||
| partially_shredded_variant_array_gen!( |
There was a problem hiding this comment.
thank you for reducing the boilerplate
|
I merged up from main to resolve a conflict |
|
@klion26 -- this PR seems to have accumulated conflicts -- I apologize for that. Can you please resolve the conflicts so I can merge it in? |
69919eb to
bb4c3eb
Compare
|
@alamb I've rebased on main and resolved the conflicts. Please take another look, thanks. |
Which issue does this PR close?
Timestamp(Microsecond, _)andTimestamp(Nanosecond, _)#8331 .Rationale for this change
Timestamp(Micro, _)andTimestamp(Nano, -)What changes are included in this PR?
Arrayimpl forVariantArrayandShreddedVariantFieldArray#8392), rebase the master in the last commitTimestamp(Micro, _)andTimestamp(Nano, _)Timestamp(Micro, _)andTimestamp(Nano, _)Are these changes tested?
Covered by existing and added tests
Are there any user-facing changes?
No