Skip to content

Commit 9a8547f

Browse files
authored
Merge pull request #5736 from hawkfish/timestanp-guide
Internal #5748: Streaming Ignore Nulls
2 parents 8ceb436 + d0b024b commit 9a8547f

File tree

6 files changed

+411
-0
lines changed

6 files changed

+411
-0
lines changed

_data/menu_docs_12.json

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1479,6 +1479,10 @@
14791479
{
14801480
"page": "query and query_table Functions",
14811481
"url": "query_and_query_table_functions"
1482+
},
1483+
{
1484+
"page": "Timestamp Issues",
1485+
"url": "timestamps"
14821486
}
14831487
]
14841488
},

_data/menu_docs_preview.json

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1513,6 +1513,10 @@
15131513
{
15141514
"page": "query and query_table Functions",
15151515
"url": "query_and_query_table_functions"
1516+
},
1517+
{
1518+
"page": "Timestamp Issues",
1519+
"url": "timestamps"
15161520
}
15171521
]
15181522
},

_data/menu_docs_stable.json

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1513,6 +1513,10 @@
15131513
{
15141514
"page": "query and query_table Functions",
15151515
"url": "query_and_query_table_functions"
1516+
},
1517+
{
1518+
"page": "Timestamp Issues",
1519+
"url": "timestamps"
15161520
}
15171521
]
15181522
},
Lines changed: 133 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,133 @@
1+
---
2+
layout: docu
3+
title: Timestamp Issues
4+
---
5+
6+
## Timestamp With Time Zone Promotion Casts
7+
8+
Working with time zones in SQL can be quite confusing at times.
9+
For example, when filtering to a date range, one might try the following query:
10+
11+
```sql
12+
SET timezone = 'America/Los_Angeles';
13+
14+
CREATE TABLE times AS
15+
FROM range('2025-08-30'::TIMESTAMPTZ, '2025-08-31'::TIMESTAMPTZ, INTERVAL 1 HOUR) tbl(t);
16+
17+
FROM times WHERE t <= '2025-08-30';
18+
```
19+
20+
```text
21+
┌──────────────────────────┐
22+
│ t │
23+
│ timestamp with time zone │
24+
├──────────────────────────┤
25+
│ 2025-08-30 00:00:00-07 │
26+
└──────────────────────────┘
27+
```
28+
29+
But if you change to another time zone, the results of the query change:
30+
31+
```sql
32+
SET timezone = 'HST';
33+
FROM times WHERE t <= '2025-08-30';
34+
```
35+
36+
```text
37+
┌──────────────────────────┐
38+
│ t │
39+
│ timestamp with time zone │
40+
├──────────────────────────┤
41+
│ 2025-08-29 21:00:00-10 │
42+
│ 2025-08-29 22:00:00-10 │
43+
│ 2025-08-29 23:00:00-10 │
44+
│ 2025-08-30 00:00:00-10 │
45+
└──────────────────────────┘
46+
```
47+
48+
Or worse:
49+
50+
```sql
51+
SET timezone = 'America/New_York';
52+
FROM times WHERE t <= '2025-08-30';
53+
```
54+
55+
```text
56+
┌──────────────────────────┐
57+
│ t │
58+
│ timestamp with time zone │
59+
├──────────────────────────┤
60+
│ 0 rows │
61+
└──────────────────────────┘
62+
```
63+
64+
These confusing results are due to the SQL casting rules from `DATE` to `TIMESTAMP WITH TIME ZONE`.
65+
This cast is required to promote the date to midnight _in the current time zone_.
66+
67+
In general, unless you need to use the current time zone for display (or
68+
[other temporal binning]({% post_url 2022-01-06-time-zones %}) operations)
69+
you should use plain `TIMESTAMP`s for temporal data.
70+
This will avoid confusing issues such as this, and the arithmetic operations are generally faster.
71+
72+
## Time Zone Performance
73+
74+
DuckDB uses the _International Components for Unicode_ time library for
75+
[time zone support]({% post_url 2022-01-06-time-zones %}).
76+
This library has a number of advantages, including support for daylight savings time past 2037.
77+
(Note: Pandas gives incorrect results past that year).
78+
79+
The downside of using ICU is that it is not highly performant.
80+
One workaround for this is to create a calendar table for the timestamps being modeled.
81+
For example, if the application is modeling electrical supply and demand out to 2100 at hourly resolution,
82+
one can create the calendar table like so:
83+
84+
```sql
85+
SET timezone = 'Europe/Netherlands';
86+
87+
CREATE OR REPLACE TABLE hourly AS
88+
SELECT
89+
ts,
90+
year::SMALLINT AS year,
91+
month::TINYINT AS month,
92+
day::TINYINT AS day,
93+
hour::TINYINT AS hour,
94+
FROM (
95+
SELECT ts, unnest(date_part(['year', 'month', 'day', 'hour',], ts))
96+
FROM generate_series(
97+
'2020-01-01'::DATE::TIMESTAMPTZ,
98+
'2100-01-01'::DATE::TIMESTAMPTZ,
99+
INTERVAL 1 HOUR) tbl(ts)
100+
) parts;
101+
```
102+
103+
You can then join this ~700K row table against any timestamp column
104+
to quickly obtain the temporal bin values for the time zone in question.
105+
The inner casts are not required, but result in a smaller table
106+
because `date_part` returns 64 bit integers for all parts.
107+
108+
Notice that we can extract _all_ of the parts with a single call to `date_part`.
109+
This part list version of the function is faster than extracting the parts one by one
110+
because the underlying binning computation computes all parts,
111+
so picking out the ones in the list is avoids duplicate calls to the slow ICU function.
112+
113+
Also notice that we are leveraging the `DATE` cast rules from the previous section
114+
to bound the calendar to the model domain.
115+
116+
## Half Open Intervals
117+
118+
Another subtle problem in using SQL for temporal analytics is the `BETWEEN` operator.
119+
Temporal analytics almost always uses
120+
[half-open binning intervals](https://www.cs.arizona.edu/~rts/tdbbook.pdf)
121+
to avoid overlaps at the ends.
122+
Unfortunately, the `BETWEEN` operator is a closed-closed interval:
123+
124+
```sql
125+
x BETWEEN begin AND end
126+
-- expands to
127+
begin <= x AND x <= end
128+
-- not
129+
begin <= x AND x < end
130+
131+
```
132+
133+
To avoid this problem, make sure you are explicit about comparison boundaries instead of using `BETWEEN`.
Lines changed: 133 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,133 @@
1+
---
2+
layout: docu
3+
title: Timestamp Issues
4+
---
5+
6+
## Timestamp With Time Zone Promotion Casts
7+
8+
Working with time zones in SQL can be quite confusing at times.
9+
For example, when filtering to a date range, one might try the following query:
10+
11+
```sql
12+
SET timezone = 'America/Los_Angeles';
13+
14+
CREATE TABLE times AS
15+
FROM range('2025-08-30'::TIMESTAMPTZ, '2025-08-31'::TIMESTAMPTZ, INTERVAL 1 HOUR) tbl(t);
16+
17+
FROM times WHERE t <= '2025-08-30';
18+
```
19+
20+
```text
21+
┌──────────────────────────┐
22+
│ t │
23+
│ timestamp with time zone │
24+
├──────────────────────────┤
25+
│ 2025-08-30 00:00:00-07 │
26+
└──────────────────────────┘
27+
```
28+
29+
But if you change to another time zone, the results of the query change:
30+
31+
```sql
32+
SET timezone = 'HST';
33+
FROM times WHERE t <= '2025-08-30';
34+
```
35+
36+
```text
37+
┌──────────────────────────┐
38+
│ t │
39+
│ timestamp with time zone │
40+
├──────────────────────────┤
41+
│ 2025-08-29 21:00:00-10 │
42+
│ 2025-08-29 22:00:00-10 │
43+
│ 2025-08-29 23:00:00-10 │
44+
│ 2025-08-30 00:00:00-10 │
45+
└──────────────────────────┘
46+
```
47+
48+
Or worse:
49+
50+
```sql
51+
SET timezone = 'America/New_York';
52+
FROM times WHERE t <= '2025-08-30';
53+
```
54+
55+
```text
56+
┌──────────────────────────┐
57+
│ t │
58+
│ timestamp with time zone │
59+
├──────────────────────────┤
60+
│ 0 rows │
61+
└──────────────────────────┘
62+
```
63+
64+
These confusing results are due to the SQL casting rules from `DATE` to `TIMESTAMP WITH TIME ZONE`.
65+
This cast is required to promote the date to midnight _in the current time zone_.
66+
67+
In general, unless you need to use the current time zone for display (or
68+
[other temporal binning]({% post_url 2022-01-06-time-zones %}) operations)
69+
you should use plain `TIMESTAMP`s for temporal data.
70+
This will avoid confusing issues such as this, and the arithmetic operations are generally faster.
71+
72+
## Time Zone Performance
73+
74+
DuckDB uses the _International Components for Unicode_ time library for
75+
[time zone support]({% post_url 2022-01-06-time-zones %}).
76+
This library has a number of advantages, including support for daylight savings time past 2037.
77+
(Note: Pandas gives incorrect results past that year).
78+
79+
The downside of using ICU is that it is not highly performant.
80+
One workaround for this is to create a calendar table for the timestamps being modeled.
81+
For example, if the application is modeling electrical supply and demand out to 2100 at hourly resolution,
82+
one can create the calendar table like so:
83+
84+
```sql
85+
SET timezone = 'Europe/Netherlands';
86+
87+
CREATE OR REPLACE TABLE hourly AS
88+
SELECT
89+
ts,
90+
year::SMALLINT AS year,
91+
month::TINYINT AS month,
92+
day::TINYINT AS day,
93+
hour::TINYINT AS hour,
94+
FROM (
95+
SELECT ts, unnest(date_part(['year', 'month', 'day', 'hour',], ts))
96+
FROM generate_series(
97+
'2020-01-01'::DATE::TIMESTAMPTZ,
98+
'2100-01-01'::DATE::TIMESTAMPTZ,
99+
INTERVAL 1 HOUR) tbl(ts)
100+
) parts;
101+
```
102+
103+
You can then join this ~700K row table against any timestamp column
104+
to quickly obtain the temporal bin values for the time zone in question.
105+
The inner casts are not required, but result in a smaller table
106+
because `date_part` returns 64 bit integers for all parts.
107+
108+
Notice that we can extract _all_ of the parts with a single call to `date_part`.
109+
This part list version of the function is faster than extracting the parts one by one
110+
because the underlying binning computation computes all parts,
111+
so picking out the ones in the list is avoids duplicate calls to the slow ICU function.
112+
113+
Also notice that we are leveraging the `DATE` cast rules from the previous section
114+
to bound the calendar to the model domain.
115+
116+
## Half Open Intervals
117+
118+
Another subtle problem in using SQL for temporal analytics is the `BETWEEN` operator.
119+
Temporal analytics almost always uses
120+
[half-open binning intervals](https://www.cs.arizona.edu/~rts/tdbbook.pdf)
121+
to avoid overlaps at the ends.
122+
Unfortunately, the `BETWEEN` operator is a closed-closed interval:
123+
124+
```sql
125+
x BETWEEN begin AND end
126+
-- expands to
127+
begin <= x AND x <= end
128+
-- not
129+
begin <= x AND x < end
130+
131+
```
132+
133+
To avoid this problem, make sure you are explicit about comparison boundaries instead of using `BETWEEN`.

0 commit comments

Comments
 (0)