Commit 3969680

fix: Arrow timestamps are in an _unknown_ timezone (#3504)
The current logic assumes that a timezone-naive vortex.timestamp is in UTC. However, Arrow [specifically says it is in an unknown timezone](https://github.com/apache/arrow/blob/main/format/Schema.fbs#L304-L322):

> Timestamps with an unset / empty timezone
> -----------------------------------------
>
> If a Timestamp column has no timezone value, its epoch is
> 1970-01-01 00:00:00 (January 1st 1970, midnight) in an *unknown* timezone.
>
> Therefore, timestamp values without a timezone cannot be meaningfully
> interpreted as physical points in time, but only as calendar / clock
> indications ("wall clock time") in an unspecified timezone.
>
> For example, the timestamp value 0 with an empty timezone string
> corresponds to "January 1st 1970, 00h00" in an unknown timezone: there
> is not enough information to interpret it as a well-defined physical
> point in time.
>
> One consequence is that timestamp values without a timezone cannot
> be reliably compared or ordered, since they may have different points of
> reference. In particular, it is *not* possible to interpret an unset
> or empty timezone as the same as "UTC".
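To make the quoted point concrete, here is a minimal std-only sketch. The function `instant_if_offset` is hypothetical (it is not part of Vortex, Arrow, or jiff) and simply shows what physical instant a zone-naive value would denote *if* its unknown zone had a given fixed UTC offset.

```rust
/// Interpret a zone-naive value (seconds since the wall-clock epoch
/// 1970-01-01 00:00:00) as a physical instant in seconds since the UTC
/// epoch, under an ASSUMED fixed UTC offset for the unknown zone.
/// Hypothetical helper for illustration only.
pub fn instant_if_offset(naive_secs: i64, utc_offset_secs: i64) -> i64 {
    // A wall clock that runs ahead of UTC reaches a given reading earlier,
    // so the offset is subtracted to recover the instant.
    naive_secs - utc_offset_secs
}

/// Compare two naive values as instants under two assumed offsets.
/// Returns true when `a` happens strictly before `b`.
pub fn happens_before(a: (i64, i64), b: (i64, i64)) -> bool {
    instant_if_offset(a.0, a.1) < instant_if_offset(b.0, b.1)
}
```

Under these assumptions the naive value 0 denotes an instant 9 hours before the UTC epoch if the zone were UTC+09:00, and 5 hours after it if the zone were UTC-05:00. Likewise, a naive value of 3600 at UTC+09:00 precedes a naive value of 0 at UTC-05:00 even though 3600 > 0, which is why naive values cannot be ordered as instants.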
This bug most prominently appears in the Vortex CLI browser, which shows the incorrect timezone for the min/max statistics of a timestamp column:

```
╭───────────────────────────────────────────────────────────────────
│ File Layout │ Segments
│╭Layout Info───────────────────────────────────────────────────────
││Kind: vortex.flat
││Row Count: 1
││Schema: {max=ext(vortex.timestamp, i64, ExtMetadata([1, 0, 0]))?,
││Children: 0
││Segment data size: 1.32 kB
││FlatBuffer Size: 1.32 kB
││
││
│╰──────────────────────────────────────────────────────────────────
│╭Array Info────────────────────────────────────────────────────────
││chunk max max_is
││0 2025-06-11T02:24:37.054366Z false
││
││
││
│╰──────────────────────────────────────────────────────────────────
╰───────────────────────────────────────────────────────────────────
```

After this change, the year, month, day, hour, etc. are unchanged, but the timezone is correctly reported as unknown.

```
╭───────────────────────────────────────────────────────────────────
│ File Layout │ Segments
│╭Layout Info───────────────────────────────────────────────────────
││Kind: vortex.flat
││Row Count: 1
││Schema: {max=ext(vortex.timestamp, i64, ExtMetadata([1, 0, 0]))?,
││Children: 0
││Segment data size: 1.32 kB
││FlatBuffer Size: 1.32 kB
││
││
│╰──────────────────────────────────────────────────────────────────
│╭Array Info────────────────────────────────────────────────────────
││chunk max max_is
││0 2025-06-11T02:24:37.054366 false
```

Signed-off-by: Daniel King <[email protected]>
1 parent 5931d55 commit 3969680

2 files changed, +7 -7 lines changed


vortex-dtype/src/datetime/temporal.rs

Lines changed: 6 additions & 6 deletions

@@ -1,7 +1,7 @@
 use std::fmt::Display;
 use std::sync::{Arc, LazyLock};
 
-use jiff::civil::{Date, Time};
+use jiff::civil::{Date, DateTime, Time};
 use jiff::{Timestamp, Zoned};
 use vortex_error::{VortexError, VortexResult, vortex_bail, vortex_err, vortex_panic};
 
@@ -39,8 +39,8 @@ pub enum TemporalJiff {
     Time(Time),
     /// A date value.
     Date(Date),
-    /// A timestamp value.
-    Timestamp(Timestamp),
+    /// A zone-naive timestamp value.
+    Unzoned(DateTime),
     /// A zoned timestamp value.
     Zoned(Zoned),
 }
@@ -50,7 +50,7 @@ impl Display for TemporalJiff {
         match self {
             TemporalJiff::Time(t) => write!(f, "{t}"),
             TemporalJiff::Date(d) => write!(f, "{d}"),
-            TemporalJiff::Timestamp(ts) => write!(f, "{ts}"),
+            TemporalJiff::Unzoned(dt) => write!(f, "{dt}"),
             TemporalJiff::Zoned(z) => write!(f, "{z}"),
         }
     }
@@ -97,8 +97,8 @@ impl TemporalMetadata {
             TemporalMetadata::Timestamp(TimeUnit::D, _) => {
                 vortex_bail!("Invalid TimeUnit TimeUnit::D for TemporalMetadata::Timestamp")
             }
-            TemporalMetadata::Timestamp(unit, None) => Ok(TemporalJiff::Timestamp(
-                Timestamp::UNIX_EPOCH.checked_add(unit.to_jiff_span(v)?)?,
+            TemporalMetadata::Timestamp(unit, None) => Ok(TemporalJiff::Unzoned(
+                DateTime::new(1970, 1, 1, 0, 0, 0, 0)?.checked_add(unit.to_jiff_span(v)?)?,
             )),
             TemporalMetadata::Timestamp(unit, Some(tz)) => Ok(TemporalJiff::Zoned(
                 Timestamp::UNIX_EPOCH
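
The `Timestamp(unit, None)` branch above now adds the stored value to the civil date-time 1970-01-01T00:00:00 rather than to a UTC instant. As a rough illustration of what that arithmetic amounts to, the following std-only sketch converts a microsecond count into wall-clock fields and prints them with no zone suffix. It does not use jiff; `CivilDateTime`, `civil_from_days`, and `civil_from_epoch_micros` are hypothetical names, the day-to-date conversion is the commonly used days-from-civil algorithm for the proleptic Gregorian calendar, and the fractional-second formatting (always six digits when nonzero) is an assumption that may differ from jiff's output.

```rust
use std::fmt;

/// Wall-clock fields with no timezone attached (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct CivilDateTime {
    pub year: i64,
    pub month: u32,
    pub day: u32,
    pub hour: u32,
    pub minute: u32,
    pub second: u32,
    pub micros: u32,
}

/// Convert days since 1970-01-01 into (year, month, day) in the
/// proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month_i64 = if mp < 10 { mp + 3 } else { mp - 9 };
    let month = month_i64 as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Add `micros` microseconds to the wall-clock epoch 1970-01-01T00:00:00.
/// No timezone is assumed, so the result is not a physical instant.
pub fn civil_from_epoch_micros(micros: i64) -> CivilDateTime {
    let secs = micros.div_euclid(1_000_000);
    let sub = micros.rem_euclid(1_000_000) as u32;
    let days = secs.div_euclid(86_400);
    let sod = secs.rem_euclid(86_400) as u32; // second of day
    let (year, month, day) = civil_from_days(days);
    CivilDateTime {
        year,
        month,
        day,
        hour: sod / 3_600,
        minute: sod % 3_600 / 60,
        second: sod % 60,
        micros: sub,
    }
}

impl fmt::Display for CivilDateTime {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}",
            self.year, self.month, self.day, self.hour, self.minute, self.second
        )?;
        if self.micros != 0 {
            write!(f, ".{:06}", self.micros)?;
        }
        // Deliberately no trailing "Z": the zone is unknown, not UTC.
        Ok(())
    }
}
```

With this sketch, `civil_from_epoch_micros(266_710_000_000).to_string()` yields `1970-01-04T02:05:10`, the same wall-clock string the updated test in `vortex-scalar/src/display.rs` expects.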

vortex-scalar/src/display.rs

Lines changed: 1 addition & 1 deletion

@@ -267,7 +267,7 @@ mod tests {
                     )))
                 )
             ),
-            "1970-01-04T02:05:10Z"
+            "1970-01-04T02:05:10"
         );
     }
 