Commit 3a57e07

mneedham and Blargian authored
Time series use case (#3502)
* Time series use case
* make the spell check happy
* make linter happy
* add intro
* add works to exclude list
* sub heading tags
* more heading ids
* link moved
* Update docs/use-cases/time-series/analysis-functions.md
  Co-authored-by: Shaun Struwig <[email protected]>
* Update docs/use-cases/time-series/basic-operations.md
  Co-authored-by: Shaun Struwig <[email protected]>
* Update docs/use-cases/time-series/date-time-data-types.md
  Co-authored-by: Shaun Struwig <[email protected]>
* Update docs/use-cases/time-series/index.md
  Co-authored-by: Shaun Struwig <[email protected]>
* Update docs/use-cases/time-series/query-performance.md
  Co-authored-by: Shaun Struwig <[email protected]>
* Update docs/use-cases/time-series/basic-operations.md
  Co-authored-by: Shaun Struwig <[email protected]>
* Update docs/use-cases/time-series/basic-operations.md
  Co-authored-by: Shaun Struwig <[email protected]>
* Update docs/use-cases/time-series/basic-operations.md
  Co-authored-by: Shaun Struwig <[email protected]>
* Update docs/use-cases/time-series/storage-efficiency.md

---------

Co-authored-by: Shaun Struwig <[email protected]>
1 parent d7d6818 commit 3a57e07

9 files changed: +1078 -1 lines changed

docs/use-cases/index.md

Lines changed: 2 additions & 1 deletion
@@ -10,4 +10,5 @@ In this section of the docs you can find our use case guides.
 
 | Page | Description |
 |-----------------------------------------|---------------------------------------------------------------------|
-| [Observability](observability/index.md) | Use case guide on how to setup and use ClickHouse for Observability |
+| [Observability](observability/index.md) | Use case guide on how to setup and use ClickHouse for Observability |
+| [Time-Series](time-series/index.md) | Use case guide on how to setup and use ClickHouse for time-series |
Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
---
title: 'Analysis functions - Time-series'
sidebar_label: 'Analysis functions'
description: 'Functions for analyzing time-series data in ClickHouse.'
slug: /use-cases/time-series/analysis-functions
keywords: ['time-series']
---

# Time-Series analysis functions

Time series analysis in ClickHouse can be performed using standard SQL aggregation and window functions.
When working with time series data, you'll typically encounter three main types of metrics:

* Counter metrics that monotonically increase over time (like page views or total events)
* Gauge metrics that represent point-in-time measurements that can go up and down (like CPU usage or temperature)
* Histograms that sample observations and count them in buckets (like request durations or response sizes)

Common analysis patterns for these metrics include comparing values between periods, calculating cumulative totals, determining rates of change, and analyzing distributions.
These can all be achieved through combinations of aggregations, window functions like `sum() OVER`, and specialized functions like `histogram()`.

## Period-over-period changes {#time-series-period-over-period-changes}

When analyzing time series data, we often need to understand how values change between time periods.
This is essential for both gauge and counter metrics.
The [`lagInFrame`](/docs/sql-reference/window-functions/lagInFrame) window function lets us access the previous period's value to calculate these changes.

The following query demonstrates this by calculating day-over-day changes in views for "Weird Al" Yankovic's Wikipedia page.
The `trend` column shows whether traffic increased (positive values) or decreased (negative values) compared to the previous day, helping identify unusual spikes or drops in activity.
The first day has no predecessor, so `p` falls back to 0 and `trend` equals that day's total.

```sql
SELECT
    toDate(time) AS day,
    sum(hits) AS h,
    -- previous day's total; ordering the window by day makes "previous" well defined
    lagInFrame(h) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS p,
    h - p AS trend
FROM wikistat
WHERE path = '"Weird_Al"_Yankovic'
GROUP BY ALL
ORDER BY day
LIMIT 10;
```

```text
┌────────day─┬────h─┬────p─┬─trend─┐
│ 2015-05-01 │ 3934 │    0 │  3934 │
│ 2015-05-02 │ 3411 │ 3934 │  -523 │
│ 2015-05-03 │ 3195 │ 3411 │  -216 │
│ 2015-05-04 │ 3076 │ 3195 │  -119 │
│ 2015-05-05 │ 3450 │ 3076 │   374 │
│ 2015-05-06 │ 3053 │ 3450 │  -397 │
│ 2015-05-07 │ 2890 │ 3053 │  -163 │
│ 2015-05-08 │ 3898 │ 2890 │  1008 │
│ 2015-05-09 │ 3092 │ 3898 │  -806 │
│ 2015-05-10 │ 3508 │ 3092 │   416 │
└────────────┴──────┴──────┴───────┘
```

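An absolute difference is often easier to compare across pages when expressed as a percentage. The following is a minimal sketch rather than part of the original guide: it feeds the first three daily totals from the output above through an inline `arrayJoin` so that it runs without the `wikistat` table. The `pct_change` column and the choice to return `NULL` when `p` is 0 are assumptions made for illustration.

```sql
SELECT
    day,
    h,
    -- previous row's value within the day-ordered window (0 when there is none)
    lagInFrame(h) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS p,
    -- percentage change relative to the previous day; NULL when there is nothing to compare against
    if(p = 0, NULL, round(100 * (h - p) / p, 1)) AS pct_change
FROM
(
    -- inline sample data: (day, daily total) pairs copied from the output above
    SELECT
        toDate(t.1) AS day,
        t.2 AS h
    FROM
    (
        SELECT arrayJoin([('2015-05-01', 3934), ('2015-05-02', 3411), ('2015-05-03', 3195)]) AS t
    )
)
ORDER BY day;
```

With these three rows, `pct_change` should come out as `NULL`, roughly -13.3, and roughly -6.3. Treating `p = 0` as "no previous value" also hides days that genuinely had zero views, so a real query may need a different guard.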
## Cumulative values {#time-series-cumulative-values}

Counter metrics naturally accumulate over time.
To analyze this cumulative growth, we can calculate running totals using window functions.

In the following query, the `sum() OVER` clause creates a running total, while the `bar()` function provides a visual representation of the growth.

```sql
SELECT
    toDate(time) AS day,
    sum(hits) AS h,
    -- running total from the first day up to and including the current day
    sum(h) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS c,
    -- text bar scaled to the range 0..50000, at most 25 characters wide
    bar(c, 0, 50000, 25) AS b
FROM wikistat
WHERE path = '"Weird_Al"_Yankovic'
GROUP BY ALL
ORDER BY day
LIMIT 10;
```

```text
┌────────day─┬────h─┬─────c─┬─b─────────────────┐
│ 2015-05-01 │ 3934 │  3934 │ █▉                │
│ 2015-05-02 │ 3411 │  7345 │ ███▋              │
│ 2015-05-03 │ 3195 │ 10540 │ █████▎            │
│ 2015-05-04 │ 3076 │ 13616 │ ██████▊           │
│ 2015-05-05 │ 3450 │ 17066 │ ████████▌         │
│ 2015-05-06 │ 3053 │ 20119 │ ██████████        │
│ 2015-05-07 │ 2890 │ 23009 │ ███████████▌      │
│ 2015-05-08 │ 3898 │ 26907 │ █████████████▍    │
│ 2015-05-09 │ 3092 │ 29999 │ ██████████████▉   │
│ 2015-05-10 │ 3508 │ 33507 │ ████████████████▊ │
└────────────┴──────┴───────┴───────────────────┘
```

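A running total does not have to span the whole result. As a hedged sketch, assuming the same `wikistat` table, adding `PARTITION BY` to the window restarts the total at each month boundary. The `month` and `month_to_date` names are introduced here for illustration and are not part of the original guide.

```sql
SELECT
    toDate(time) AS day,
    toStartOfMonth(day) AS month,
    sum(hits) AS h,
    -- running total that resets whenever the month changes
    sum(h) OVER (
        PARTITION BY month
        ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS month_to_date
FROM wikistat
WHERE path = '"Weird_Al"_Yankovic'
GROUP BY ALL
ORDER BY day;
```

The same pattern applies to any reset boundary, for example `toStartOfWeek(day)` for week-to-date totals.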
## Rate calculations {#time-series-rate-calculations}

When analyzing time series data, it's often useful to understand the rate of events per unit of time.
The following query calculates the rate of page views per second by dividing hourly totals by the number of seconds in an hour (3600).
The visual bar helps identify peak hours of activity.

```sql
SELECT
    toStartOfHour(time) AS time,
    sum(hits) AS h,
    -- hourly total divided by the 3600 seconds in an hour
    round(h / (60 * 60), 2) AS rate,
    bar(rate * 10, 0, max(rate * 10) OVER (), 25) AS b
FROM wikistat
WHERE path = '"Weird_Al"_Yankovic'
GROUP BY time
LIMIT 10;
```

```text
┌────────────────time─┬───h─┬─rate─┬─b─────┐
│ 2015-07-01 01:00:00 │ 143 │ 0.04 │ █▊    │
│ 2015-07-01 02:00:00 │ 170 │ 0.05 │ ██▏   │
│ 2015-07-01 03:00:00 │ 148 │ 0.04 │ █▊    │
│ 2015-07-01 04:00:00 │ 190 │ 0.05 │ ██▏   │
│ 2015-07-01 05:00:00 │ 253 │ 0.07 │ ███▏  │
│ 2015-07-01 06:00:00 │ 233 │ 0.06 │ ██▋   │
│ 2015-07-01 07:00:00 │ 359 │  0.1 │ ████▍ │
│ 2015-07-01 08:00:00 │ 190 │ 0.05 │ ██▏   │
│ 2015-07-01 09:00:00 │ 121 │ 0.03 │ █▎    │
│ 2015-07-01 10:00:00 │  70 │ 0.02 │ ▉     │
└─────────────────────┴─────┴──────┴───────┘
```

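The rate arithmetic can be checked in isolation. This small sketch, which is illustrative rather than taken from the original guide, applies the same division to three hourly totals from the output above. The `per_minute` column is an added assumption showing how the divisor changes with the target unit.

```sql
SELECT
    -- three hourly totals copied from the output above
    arrayJoin([143, 170, 359]) AS h,
    -- views per second: hourly total / 3600
    round(h / (60 * 60), 2) AS rate,
    -- views per minute: hourly total / 60
    round(h / 60, 2) AS per_minute;
```

The `rate` values should match the table above (0.04, 0.05, and 0.1). Because the values are rounded to two decimals, very low-traffic hours can collapse to 0, so a coarser unit such as views per minute may be easier to read.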
## Histograms {#time-series-histograms}

A popular use case for time series data is to build histograms based on tracked events.
Suppose we want to understand how pages are distributed by their total hits, including only pages with more than 10,000 hits.
The `histogram()` function builds an adaptive histogram with at most the requested number of bins (10 here):

```sql
SELECT
    histogram(10)(hits) AS hist
FROM
(
    SELECT
        path,
        sum(hits) AS hits
    FROM wikistat
    WHERE date(time) = '2015-06-15'
    GROUP BY path
    HAVING hits > 10000
)
FORMAT Vertical;
```

149+
```text
150+
Row 1:
151+
──────
152+
hist: [(10033,23224.55065359477,60.625),(23224.55065359477,37855.38888888889,15.625),(37855.38888888889,52913.5,3.5),(52913.5,69438,1.25),(69438,83102.16666666666,1.25),(83102.16666666666,94267.66666666666,2.5),(94267.66666666666,116778,1.25),(116778,186175.75,1.125),(186175.75,946963.25,1.75),(946963.25,1655250,1.125)]
153+
```
154+
155+
We can then use [`arrayJoin()`](/docs/sql-reference/functions/array-join) to massage the data and `bar()` to visualize it:
156+
157+
```sql
WITH histogram(10)(hits) AS hist
SELECT
    round(arrayJoin(hist).1) AS lowerBound,
    round(arrayJoin(hist).2) AS upperBound,
    arrayJoin(hist).3 AS count,
    bar(count, 0, max(count) OVER (), 20) AS b
FROM
(
    SELECT
        path,
        sum(hits) AS hits
    FROM wikistat
    WHERE date(time) = '2015-06-15'
    GROUP BY path
    HAVING hits > 10000
);
```

```text
┌─lowerBound─┬─upperBound─┬──count─┬─b────────────────────┐
│      10033 │      19886 │ 53.375 │ ████████████████████ │
│      19886 │      31515 │ 18.625 │ ██████▉              │
│      31515 │      43518 │  6.375 │ ██▍                  │
│      43518 │      55647 │  1.625 │ ▌                    │
│      55647 │      73602 │  1.375 │ ▌                    │
│      73602 │      92880 │   3.25 │ █▏                   │
│      92880 │     116778 │  1.375 │ ▌                    │
│     116778 │     186176 │  1.125 │ ▍                    │
│     186176 │     946963 │   1.75 │ ▋                    │
│     946963 │    1655250 │  1.125 │ ▍                    │
└────────────┴────────────┴────────┴──────────────────────┘
```
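
Unpacking the histogram does not require calling `arrayJoin()` three times. As an alternative sketch, assuming the same `wikistat` table, the `ARRAY JOIN` clause expands the array once and exposes each tuple under an alias. The `bucket` and `height` names are hypothetical and chosen here for illustration.

```sql
SELECT
    round(bucket.1) AS lowerBound,
    round(bucket.2) AS upperBound,
    bucket.3 AS height
FROM
(
    -- one row holding the whole histogram as an array of tuples
    SELECT histogram(10)(hits) AS hist
    FROM
    (
        SELECT
            path,
            sum(hits) AS hits
        FROM wikistat
        WHERE date(time) = '2015-06-15'
        GROUP BY path
        HAVING hits > 10000
    )
)
-- expand the array so that each (lower, upper, height) tuple becomes its own row
ARRAY JOIN hist AS bucket;
```

Because `histogram()` is adaptive, bin boundaries can differ slightly between runs, which is why the two outputs above do not line up exactly.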
