| 1 | +--- |
| 2 | +title: 'Analysis functions - Time-series' |
| 3 | +sidebar_label: 'Analysis functions' |
| 4 | +description: 'Functions for analyzing time-series data in ClickHouse.' |
| 5 | +slug: /use-cases/time-series/analysis-functions |
| 6 | +keywords: ['time-series'] |
| 7 | +--- |
| 8 | + |
| 9 | +# Time-Series analysis functions |
| 10 | + |
| 11 | +Time series analysis in ClickHouse can be performed using standard SQL aggregation and window functions. |
| 12 | +When working with time series data, you'll typically encounter three main types of metrics: |
| 13 | + |
| 14 | +* Counter metrics that monotonically increase over time (like page views or total events) |
| 15 | +* Gauge metrics that represent point-in-time measurements that can go up and down (like CPU usage or temperature) |
| 16 | +* Histograms that sample observations and count them in buckets (like request durations or response sizes) |
| 17 | + |
| 18 | +Common analysis patterns for these metrics include comparing values between periods, calculating cumulative totals, determining rates of change, and analyzing distributions. |
| 19 | +These can all be achieved through combinations of aggregations, window functions like `sum() OVER`, and specialized functions like `histogram()`. |
| 20 | + |
| 21 | +## Period-over-period changes {#time-series-period-over-period-changes} |
| 22 | + |
| 23 | +When analyzing time series data, we often need to understand how values change between time periods. |
| 24 | +This is essential for both gauge and counter metrics. |
| 25 | +The [`lagInFrame`](/docs/sql-reference/window-functions/lagInFrame) window function lets us access the previous period's value to calculate these changes. |
| 26 | + |
| 27 | +The following query demonstrates this by calculating day-over-day changes in views for "Weird Al" Yankovic's Wikipedia page. |
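| 28 | +The trend column shows whether traffic increased (positive values) or decreased (negative values) compared to the previous day, helping identify unusual spikes or drops in activity. The first day has no previous row, so `lagInFrame` returns the default value `0` and the trend equals that day's total. |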
| 29 | + |
| 30 | +```sql |
| 31 | +SELECT |
| 32 | + toDate(time) AS day, |
| 33 | + sum(hits) AS h, |
| 34 | + lagInFrame(h) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS p, |
| 35 | + h - p AS trend |
| 36 | +FROM wikistat |
| 37 | +WHERE path = '"Weird_Al"_Yankovic' |
| 38 | +GROUP BY ALL |
| 39 | +ORDER BY day LIMIT 10; |
| 40 | +``` |
| 41 | + |
| 42 | +```text |
| 43 | +┌────────day─┬────h─┬────p─┬─trend─┐ |
| 44 | +│ 2015-05-01 │ 3934 │ 0 │ 3934 │ |
| 45 | +│ 2015-05-02 │ 3411 │ 3934 │ -523 │ |
| 46 | +│ 2015-05-03 │ 3195 │ 3411 │ -216 │ |
| 47 | +│ 2015-05-04 │ 3076 │ 3195 │ -119 │ |
| 48 | +│ 2015-05-05 │ 3450 │ 3076 │ 374 │ |
| 49 | +│ 2015-05-06 │ 3053 │ 3450 │ -397 │ |
| 50 | +│ 2015-05-07 │ 2890 │ 3053 │ -163 │ |
| 51 | +│ 2015-05-08 │ 3898 │ 2890 │ 1008 │ |
| 52 | +│ 2015-05-09 │ 3092 │ 3898 │ -806 │ |
| 53 | +│ 2015-05-10 │ 3508 │ 3092 │ 416 │ |
| 54 | +└────────────┴──────┴──────┴───────┘ |
| 55 | +``` |
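
Absolute differences are hard to compare across pages with very different traffic, so a relative change is often more useful. The following is a minimal sketch under the same assumptions as the query above (the `wikistat` table and its `time`, `path`, and `hits` columns); the `pct_change` alias is introduced here for illustration only. `nullIf` turns the default `0` on the first row into `NULL`, which avoids a division by zero.

```sql
SELECT
    toDate(time) AS day,
    sum(hits) AS h,
    -- Previous day's total; the first row falls back to the default value 0
    lagInFrame(h) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS p,
    -- Percentage change versus the previous day; NULL when there is no previous day
    round(100 * (h - p) / nullIf(p, 0), 1) AS pct_change
FROM wikistat
WHERE path = '"Weird_Al"_Yankovic'
GROUP BY day
ORDER BY day
LIMIT 10;
```

For the second row in the output above, the drop from 3934 to 3411 views works out to roughly -13.3%.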
| 56 | + |
| 57 | +## Cumulative values {#time-series-cumulative-values} |
| 58 | + |
| 59 | +Counter metrics naturally accumulate over time. |
| 60 | +To analyze this cumulative growth, we can calculate running totals using window functions. |
| 61 | + |
| 62 | +In the following query, the `sum() OVER` clause creates a running total, while the `bar()` function provides a visual representation of the growth. |
| 63 | + |
| 64 | +```sql |
| 65 | +SELECT |
| 66 | + toDate(time) AS day, |
| 67 | + sum(hits) AS h, |
| 68 | + sum(h) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS c, |
| 69 | + bar(c, 0, 50000, 25) AS b |
| 70 | +FROM wikistat |
| 71 | +WHERE path = '"Weird_Al"_Yankovic' |
| 72 | +GROUP BY ALL |
| 73 | +ORDER BY day |
| 74 | +LIMIT 10; |
| 75 | +``` |
| 76 | + |
| 77 | +```text |
| 78 | +┌────────day─┬────h─┬─────c─┬─b─────────────────┐ |
| 79 | +│ 2015-05-01 │ 3934 │ 3934 │ █▉ │ |
| 80 | +│ 2015-05-02 │ 3411 │ 7345 │ ███▋ │ |
| 81 | +│ 2015-05-03 │ 3195 │ 10540 │ █████▎ │ |
| 82 | +│ 2015-05-04 │ 3076 │ 13616 │ ██████▊ │ |
| 83 | +│ 2015-05-05 │ 3450 │ 17066 │ ████████▌ │ |
| 84 | +│ 2015-05-06 │ 3053 │ 20119 │ ██████████ │ |
| 85 | +│ 2015-05-07 │ 2890 │ 23009 │ ███████████▌ │ |
| 86 | +│ 2015-05-08 │ 3898 │ 26907 │ █████████████▍ │ |
| 87 | +│ 2015-05-09 │ 3092 │ 29999 │ ██████████████▉ │ |
| 88 | +│ 2015-05-10 │ 3508 │ 33507 │ ████████████████▊ │ |
| 89 | +└────────────┴──────┴───────┴───────────────────┘ |
| 90 | +``` |
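
A running total does not have to span the whole series. As a hedged sketch, adding `PARTITION BY` to the window restarts the total at the beginning of each month; the `month_to_date` alias is hypothetical, and the daily aggregation is placed in a subquery only to keep the window definition easy to read.

```sql
SELECT
    day,
    h,
    -- Running total that restarts on the first day of every month
    sum(h) OVER (
        PARTITION BY toStartOfMonth(day)
        ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS month_to_date
FROM
(
    -- Daily totals for the page
    SELECT
        toDate(time) AS day,
        sum(hits) AS h
    FROM wikistat
    WHERE path = '"Weird_Al"_Yankovic'
    GROUP BY day
)
ORDER BY day;
```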
| 91 | + |
| 92 | +## Rate calculations {#time-series-rate-calculations} |
| 93 | + |
| 94 | +When analyzing time series data, it's often useful to understand the rate of events per unit of time. |
| 95 | +This query calculates the rate of page views per second by dividing hourly totals by the number of seconds in an hour (3600). |
| 96 | +The visual bar helps identify peak hours of activity. |
| 97 | + |
| 98 | + |
| 99 | +```sql |
| 100 | +SELECT |
| 101 | + toStartOfHour(time) AS time, |
| 102 | + sum(hits) AS h, |
| 103 | + round(h / (60 * 60), 2) AS rate, |
| 104 | + bar(rate * 10, 0, max(rate * 10) OVER (), 25) AS b |
| 105 | +FROM wikistat |
| 106 | +WHERE path = '"Weird_Al"_Yankovic' |
| 107 | +GROUP BY time |
| 108 | +LIMIT 10; |
| 109 | +``` |
| 110 | + |
| 111 | + |
| 112 | +```text |
| 113 | +┌────────────────time─┬───h─┬─rate─┬─b─────┐ |
| 114 | +│ 2015-07-01 01:00:00 │ 143 │ 0.04 │ █▊ │ |
| 115 | +│ 2015-07-01 02:00:00 │ 170 │ 0.05 │ ██▏ │ |
| 116 | +│ 2015-07-01 03:00:00 │ 148 │ 0.04 │ █▊ │ |
| 117 | +│ 2015-07-01 04:00:00 │ 190 │ 0.05 │ ██▏ │ |
| 118 | +│ 2015-07-01 05:00:00 │ 253 │ 0.07 │ ███▏ │ |
| 119 | +│ 2015-07-01 06:00:00 │ 233 │ 0.06 │ ██▋ │ |
| 120 | +│ 2015-07-01 07:00:00 │ 359 │ 0.1 │ ████▍ │ |
| 121 | +│ 2015-07-01 08:00:00 │ 190 │ 0.05 │ ██▏ │ |
| 122 | +│ 2015-07-01 09:00:00 │ 121 │ 0.03 │ █▎ │ |
| 123 | +│ 2015-07-01 10:00:00 │ 70 │ 0.02 │ ▉ │ |
| 124 | +└─────────────────────┴─────┴──────┴───────┘ |
| 125 | +``` |
| 126 | + |
| 127 | +## Histograms {#time-series-histograms} |
| 128 | + |
| 129 | +A popular use case for time series data is to build histograms based on tracked events. |
| 130 | +Suppose we want to understand how pages are distributed by total hits, including only pages with more than 10,000 hits. |
| 131 | +We can use the `histogram()` function, which takes the maximum number of bins as a parameter, to automatically generate an adaptive histogram: |
| 132 | + |
| 133 | +```sql |
| 134 | +SELECT |
| 135 | + histogram(10)(hits) AS hist |
| 136 | +FROM |
| 137 | +( |
| 138 | + SELECT |
| 139 | + path, |
| 140 | + sum(hits) AS hits |
| 141 | + FROM wikistat |
| 142 | + WHERE date(time) = '2015-06-15' |
| 143 | + GROUP BY path |
| 144 | + HAVING hits > 10000 |
| 145 | +) |
| 146 | +FORMAT Vertical; |
| 147 | +``` |
| 148 | + |
| 149 | +```text |
| 150 | +Row 1: |
| 151 | +────── |
| 152 | +hist: [(10033,23224.55065359477,60.625),(23224.55065359477,37855.38888888889,15.625),(37855.38888888889,52913.5,3.5),(52913.5,69438,1.25),(69438,83102.16666666666,1.25),(83102.16666666666,94267.66666666666,2.5),(94267.66666666666,116778,1.25),(116778,186175.75,1.125),(186175.75,946963.25,1.75),(946963.25,1655250,1.125)] |
| 153 | +``` |
| 154 | + |
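| 155 | +Because `histogram()` does not guarantee precise results, the bin boundaries and heights may differ from one run to the next. We can then use [`arrayJoin()`](/docs/sql-reference/functions/array-join) to expand the array into one row per bin and `bar()` to visualize it: |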
| 156 | + |
| 157 | + |
| 158 | +```sql |
| 159 | +WITH histogram(10)(hits) AS hist |
| 160 | +SELECT |
| 161 | + round(arrayJoin(hist).1) AS lowerBound, |
| 162 | + round(arrayJoin(hist).2) AS upperBound, |
| 163 | + arrayJoin(hist).3 AS count, |
| 164 | + bar(count, 0, max(count) OVER (), 20) AS b |
| 165 | +FROM |
| 166 | +( |
| 167 | + SELECT |
| 168 | + path, |
| 169 | + sum(hits) AS hits |
| 170 | + FROM wikistat |
| 171 | + WHERE date(time) = '2015-06-15' |
| 172 | + GROUP BY path |
| 173 | + HAVING hits > 10000 |
| 174 | +); |
| 175 | +``` |
| 176 | + |
| 177 | +```text |
| 178 | +┌─lowerBound─┬─upperBound─┬──count─┬─b────────────────────┐ |
| 179 | +│ 10033 │ 19886 │ 53.375 │ ████████████████████ │ |
| 180 | +│ 19886 │ 31515 │ 18.625 │ ██████▉ │ |
| 181 | +│ 31515 │ 43518 │ 6.375 │ ██▍ │ |
| 182 | +│ 43518 │ 55647 │ 1.625 │ ▌ │ |
| 183 | +│ 55647 │ 73602 │ 1.375 │ ▌ │ |
| 184 | +│ 73602 │ 92880 │ 3.25 │ █▏ │ |
| 185 | +│ 92880 │ 116778 │ 1.375 │ ▌ │ |
| 186 | +│ 116778 │ 186176 │ 1.125 │ ▍ │ |
| 187 | +│ 186176 │ 946963 │ 1.75 │ ▋ │ |
| 188 | +│ 946963 │ 1655250 │ 1.125 │ ▍ │ |
| 189 | +└────────────┴────────────┴────────┴──────────────────────┘ |
| 190 | +``` |
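
The same result can probably be written with the `ARRAY JOIN` clause, which unpacks the array once instead of calling `arrayJoin()` for each tuple element. This is a sketch rather than the documented approach; the `bucket` alias is hypothetical, and it assumes the same `wikistat` table as the queries above.

```sql
SELECT
    round(bucket.1) AS lowerBound,
    round(bucket.2) AS upperBound,
    bucket.3 AS count,
    bar(count, 0, max(count) OVER (), 20) AS b
FROM
(
    -- One row holding the array of (lower, upper, height) tuples
    SELECT histogram(10)(hits) AS hist
    FROM
    (
        SELECT
            path,
            sum(hits) AS hits
        FROM wikistat
        WHERE date(time) = '2015-06-15'
        GROUP BY path
        HAVING hits > 10000
    )
)
-- Expand the array into one row per bin
ARRAY JOIN hist AS bucket;
```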