|
| 1 | +--- |
| 2 | +layout: docu |
| 3 | +title: Timestamp Issues |
| 4 | +--- |
| 5 | + |
| 6 | +## Timestamp With Time Zone Promotion Casts |
| 7 | + |
| 8 | +Working with time zones in SQL can be quite confusing at times. |
| 9 | +For example, when filtering to a date range, one might try the following query: |
| 10 | + |
| 11 | +```sql |
| 12 | +SET timezone = 'America/Los_Angeles'; |
| 13 | + |
| 14 | +CREATE TABLE times AS |
| 15 | + FROM range('2025-08-30'::TIMESTAMPTZ, '2025-08-31'::TIMESTAMPTZ, INTERVAL 1 HOUR) tbl(t); |
| 16 | + |
| 17 | +FROM times WHERE t <= '2025-08-30'; |
| 18 | +``` |
| 19 | + |
| 20 | +```text |
| 21 | +┌──────────────────────────┐ |
| 22 | +│ t │ |
| 23 | +│ timestamp with time zone │ |
| 24 | +├──────────────────────────┤ |
| 25 | +│ 2025-08-30 00:00:00-07 │ |
| 26 | +└──────────────────────────┘ |
| 27 | +``` |
| 28 | + |
| 29 | +But if you change to another time zone, the results of the query change: |
| 30 | + |
| 31 | +```sql |
| 32 | +SET timezone = 'HST'; |
| 33 | +FROM times WHERE t <= '2025-08-30'; |
| 34 | +``` |
| 35 | + |
| 36 | +```text |
| 37 | +┌──────────────────────────┐ |
| 38 | +│ t │ |
| 39 | +│ timestamp with time zone │ |
| 40 | +├──────────────────────────┤ |
| 41 | +│ 2025-08-29 21:00:00-10 │ |
| 42 | +│ 2025-08-29 22:00:00-10 │ |
| 43 | +│ 2025-08-29 23:00:00-10 │ |
| 44 | +│ 2025-08-30 00:00:00-10 │ |
| 45 | +└──────────────────────────┘ |
| 46 | +``` |
| 47 | + |
| 48 | +Or worse: |
| 49 | + |
| 50 | +```sql |
| 51 | +SET timezone = 'America/New_York'; |
| 52 | +FROM times WHERE t <= '2025-08-30'; |
| 53 | +``` |
| 54 | + |
| 55 | +```text |
| 56 | +┌──────────────────────────┐ |
| 57 | +│ t │ |
| 58 | +│ timestamp with time zone │ |
| 59 | +├──────────────────────────┤ |
| 60 | +│ 0 rows │ |
| 61 | +└──────────────────────────┘ |
| 62 | +``` |
| 63 | + |
| 64 | +These confusing results are due to the SQL casting rules from `DATE` to `TIMESTAMP WITH TIME ZONE`. |
| 65 | +This cast is required to promote the date to midnight _in the current time zone_. |
| 66 | + |
| 67 | +In general, unless you need to use the current time zone for display (or |
| 68 | +[other temporal binning]({% post_url 2022-01-06-time-zones %}) operations) |
| 69 | +you should use plain `TIMESTAMP`s for temporal data. |
| 70 | +This will avoid confusing issues such as this, and the arithmetic operations are generally faster. |
| 71 | + |
| 72 | +## Time Zone Performance |
| 73 | + |
| 74 | +DuckDB uses the _International Components for Unicode_ time library for |
| 75 | +[time zone support]({% post_url 2022-01-06-time-zones %}). |
| 76 | +This library has a number of advantages, including support for daylight savings time past 2037. |
| 77 | +(Note: Pandas gives incorrect results past that year). |
| 78 | + |
| 79 | +The downside of using ICU is that it is not highly performant. |
| 80 | +One workaround for this is to create a calendar table for the timestamps being modeled. |
| 81 | +For example, if the application is modeling electrical supply and demand out to 2100 at hourly resolution, |
| 82 | +one can create the calendar table like so: |
| 83 | + |
| 84 | +```sql |
| 85 | +SET timezone = 'Europe/Netherlands'; |
| 86 | + |
| 87 | +CREATE OR REPLACE TABLE hourly AS |
| 88 | + SELECT |
| 89 | + ts, |
| 90 | + year::SMALLINT AS year, |
| 91 | + month::TINYINT AS month, |
| 92 | + day::TINYINT AS day, |
| 93 | + hour::TINYINT AS hour, |
| 94 | + FROM ( |
| 95 | + SELECT ts, unnest(date_part(['year', 'month', 'day', 'hour',], ts)) |
| 96 | + FROM generate_series( |
| 97 | + '2020-01-01'::DATE::TIMESTAMPTZ, |
| 98 | + '2100-01-01'::DATE::TIMESTAMPTZ, |
| 99 | + INTERVAL 1 HOUR) tbl(ts) |
| 100 | + ) parts; |
| 101 | +``` |
| 102 | + |
| 103 | +You can then join this ~700K row table against any timestamp column |
| 104 | +to quickly obtain the temporal bin values for the time zone in question. |
| 105 | +The inner casts are not required, but result in a smaller table |
| 106 | +because `date_part` returns 64 bit integers for all parts. |
| 107 | + |
| 108 | +Notice that we can extract _all_ of the parts with a single call to `date_part`. |
| 109 | +This part list version of the function is faster than extracting the parts one by one |
| 110 | +because the underlying binning computation computes all parts, |
| 111 | +so picking out the ones in the list is avoids duplicate calls to the slow ICU function. |
| 112 | + |
| 113 | +Also notice that we are leveraging the `DATE` cast rules from the previous section |
| 114 | +to bound the calendar to the model domain. |
| 115 | + |
| 116 | +## Half Open Intervals |
| 117 | + |
| 118 | +Another subtle problem in using SQL for temporal analytics is the `BETWEEN` operator. |
| 119 | +Temporal analytics almost always uses |
| 120 | +[half-open binning intervals](https://www.cs.arizona.edu/~rts/tdbbook.pdf) |
| 121 | +to avoid overlaps at the ends. |
| 122 | +Unfortunately, the `BETWEEN` operator is a closed-closed interval: |
| 123 | + |
| 124 | +```sql |
| 125 | +x BETWEEN begin AND end |
| 126 | +-- expands to |
| 127 | +begin <= x AND x <= end |
| 128 | +-- not |
| 129 | +begin <= x AND x < end |
| 130 | + |
| 131 | +``` |
| 132 | + |
| 133 | +To avoid this problem, make sure you are explicit about comparison boundaries instead of using `BETWEEN`. |
0 commit comments