Commit 3969680
fix: Arrow timestamps are in an _unknown_ timezone (#3504)
The current logic assumes that a timezone-naive `vortex.timestamp` is in UTC.
However, Arrow [specifically says it is in an unknown
timezone](https://github.com/apache/arrow/blob/main/format/Schema.fbs#L304-L322):
> Timestamps with an unset / empty timezone
> -----------------------------------------
>
> If a Timestamp column has no timezone value, its epoch is
> 1970-01-01 00:00:00 (January 1st 1970, midnight) in an *unknown*
> timezone.
>
> Therefore, timestamp values without a timezone cannot be meaningfully
> interpreted as physical points in time, but only as calendar / clock
> indications ("wall clock time") in an unspecified timezone.
>
> For example, the timestamp value 0 with an empty timezone string
> corresponds to "January 1st 1970, 00h00" in an unknown timezone: there
> is not enough information to interpret it as a well-defined physical
> point in time.
>
> One consequence is that timestamp values without a timezone cannot
> be reliably compared or ordered, since they may have different points of
> reference. In particular, it is *not* possible to interpret an unset
> or empty timezone as the same as "UTC".
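The ordering hazard in the last quoted paragraph can be shown with plain arithmetic. The sketch below assumes fixed UTC offsets (real zones can change offset, e.g. for daylight saving), and `naive_to_utc_micros` is a hypothetical helper, not part of Vortex or Arrow. It reinterprets a naive microsecond value as a UTC instant under an assumed offset.

```rust
/// Hypothetical helper (not a Vortex or Arrow API): interpret a
/// timezone-naive wall-clock value, in microseconds since a local
/// 1970-01-01 00:00 epoch, as a UTC instant under an ASSUMED fixed offset.
fn naive_to_utc_micros(wall_micros: i64, assumed_offset_secs: i64) -> i64 {
    // Local time = UTC + offset, therefore UTC = local - offset.
    wall_micros - assumed_offset_secs * 1_000_000
}
```

For example, a raw value of `3_600_000_000` (01:00 wall clock) recorded in Tokyo (UTC+9) is larger than a raw `0` recorded in UTC, yet it corresponds to an earlier physical instant (16:00 UTC on the previous day). Without the timezone, nothing in the stored value distinguishes these cases.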
This bug is most visible in the Vortex CLI browser, which labels the min/max
statistics of a timezone-naive timestamp column as UTC (note the trailing `Z`):
```
╭───────────────────────────────────────────────────────────────────
│ File Layout │ Segments
│╭Layout Info───────────────────────────────────────────────────────
││Kind: vortex.flat
││Row Count: 1
││Schema: {max=ext(vortex.timestamp, i64, ExtMetadata([1, 0, 0]))?,
││Children: 0
││Segment data size: 1.32 kB
││FlatBuffer Size: 1.32 kB
││
││
│╰──────────────────────────────────────────────────────────────────
│╭Array Info────────────────────────────────────────────────────────
││chunk max max_is
││0 2025-06-11T02:24:37.054366Z false
││
││
││
│╰──────────────────────────────────────────────────────────────────
╰───────────────────────────────────────────────────────────────────
```
After this change, the year, month, day, hour, etc. are unchanged, but the
timezone is correctly reported as unknown (there is no trailing `Z`).
```
╭───────────────────────────────────────────────────────────────────
│ File Layout │ Segments
│╭Layout Info───────────────────────────────────────────────────────
││Kind: vortex.flat
││Row Count: 1
││Schema: {max=ext(vortex.timestamp, i64, ExtMetadata([1, 0, 0]))?,
││Children: 0
││Segment data size: 1.32 kB
││FlatBuffer Size: 1.32 kB
││
││
│╰──────────────────────────────────────────────────────────────────
│╭Array Info────────────────────────────────────────────────────────
││chunk max max_is
││0 2025-06-11T02:24:37.054366 false
```
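The display rule can be sketched as follows, assuming values are microseconds since the epoch and using only `std`. `format_timestamp_micros` and `civil_from_days` are hypothetical helpers, not Vortex functions; the date conversion follows Howard Hinnant's `civil_from_days` algorithm. A missing timezone yields a bare wall-clock string, and only an explicit UTC timezone gets the `Z` suffix. Other named zones would need a timezone database, so this sketch returns `None` for them.

```rust
/// Days since 1970-01-01 -> proleptic Gregorian (year, month, day),
/// after Howard Hinnant's `civil_from_days`.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras (floor division)
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year from March 1
    let mp = (5 * doy + 2) / 153; // month index counted from March, [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let y = yoe + era * 400 + i64::from(m <= 2); // Jan/Feb belong to the next year
    (y, m, d)
}

/// Hypothetical helper: render microseconds since the epoch. `tz` mirrors
/// Arrow's optional timezone field on the Timestamp type.
fn format_timestamp_micros(micros: i64, tz: Option<&str>) -> Option<String> {
    const MICROS_PER_DAY: i64 = 86_400_000_000;
    let (y, m, d) = civil_from_days(micros.div_euclid(MICROS_PER_DAY));
    let micros_of_day = micros.rem_euclid(MICROS_PER_DAY);
    let secs = micros_of_day / 1_000_000;
    let wall = format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}.{:06}",
        y, m, d, secs / 3600, (secs % 3600) / 60, secs % 60, micros_of_day % 1_000_000
    );
    match tz {
        // No timezone: the epoch is in an unknown zone, so this is only a
        // wall-clock reading. Appending "Z" would wrongly claim UTC.
        None => Some(wall),
        // Explicit UTC: the value is a physical instant, and "Z" is accurate.
        Some("UTC") | Some("+00:00") => Some(format!("{wall}Z")),
        // Other zones need a timezone database to convert; out of scope here.
        Some(_) => None,
    }
}
```

With this rule, the statistic from the example above renders as `2025-06-11T02:24:37.054366` when the column has no timezone. The same stored value would render with a trailing `Z` only if the column's timezone were UTC.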
Signed-off-by: Daniel King

Parent commit: 5931d55
2 files changed (+7, -7):
- vortex-dtype/src/datetime
- vortex-scalar/src
Changed lines (each hunk replaces lines one-for-one, so old and new line numbers match):
- First diff: line 4; lines 42-43; line 53; lines 100-101
- Second diff: line 270