Commit 0d68d1e

author
Alex Katsman
committed
Add docs for server overload
1 parent 1df1380 commit 0d68d1e

1 file changed

+34
-0
lines changed

@@ -0,0 +1,34 @@
---
description: 'Controlling behavior on server CPU overload.'
slug: /operations/settings/server-overload
title: 'Server overload'
---

# Server overload

## Overview {#overview}

A server can become overloaded for various reasons. To determine the current CPU overload, the
ClickHouse server calculates the ratio of CPU wait time (`OSCPUWaitMicroseconds` metric) to busy time
(`OSCPUVirtualTimeMicroseconds` metric). When this ratio exceeds a certain threshold,
it makes sense to discard some queries or even drop connection requests to avoid increasing the load further.

The server setting `os_cpu_busy_time_threshold` controls the minimum busy time required to consider the CPU
to be doing useful work. If the current value of the `OSCPUVirtualTimeMicroseconds` metric is below this value,
the CPU overload is assumed to be 0.
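
The calculation described above can be sketched roughly as follows. This is only an illustration, not the server's actual implementation: the function name `cpu_overload` and its parameter names are hypothetical, and returning 0 for a non-positive busy time is an assumption made here to avoid division by zero.

```python
def cpu_overload(wait_us: float, busy_us: float, busy_time_threshold_us: float) -> float:
    """Illustrative overload ratio: CPU wait time divided by CPU busy time.

    wait_us stands in for the OSCPUWaitMicroseconds value, busy_us for the
    OSCPUVirtualTimeMicroseconds value, and busy_time_threshold_us for the
    os_cpu_busy_time_threshold setting (all names here are hypothetical).
    """
    # Below the busy-time threshold the CPU is not considered to be doing
    # useful work, so the overload is assumed to be 0.
    if busy_us < busy_time_threshold_us:
        return 0.0
    # Assumption: guard against division by zero when the threshold is 0.
    if busy_us <= 0:
        return 0.0
    return wait_us / busy_us


if __name__ == "__main__":
    print(cpu_overload(4_000_000, 1_000_000, 500_000))
    print(cpu_overload(4_000_000, 100_000, 500_000))
```

With these sample inputs, the first call sees wait time four times the busy time, while the second falls below the busy-time threshold and is treated as no overload.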

## Rejecting queries

Query rejection is controlled by the query-level settings `min_os_cpu_wait_time_ratio_to_throw` and
`max_os_cpu_wait_time_ratio_to_throw`. If both settings are set and `min_os_cpu_wait_time_ratio_to_throw` is less
than `max_os_cpu_wait_time_ratio_to_throw`, then a query is rejected with the `SERVER_OVERLOADED` error
with some probability if the overload ratio is at least `min_os_cpu_wait_time_ratio_to_throw`. The probability
is determined by linear interpolation between the min and max ratios. For example, if `min_os_cpu_wait_time_ratio_to_throw = 2`,
`max_os_cpu_wait_time_ratio_to_throw = 6`, and `cpu_overload = 4`, then the query is rejected with a probability of `0.5`.
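
The linear interpolation can be illustrated with a short sketch. The function names are hypothetical and this is not the server's code. Two behaviors are assumptions rather than statements from this page: the probability is clamped to 1 once the overload reaches the max ratio, and no query is rejected when the min ratio is not less than the max ratio.

```python
import random


def rejection_probability(cpu_overload: float, min_ratio: float, max_ratio: float) -> float:
    """Illustrative probability of rejecting a query.

    min_ratio and max_ratio stand in for min_os_cpu_wait_time_ratio_to_throw
    and max_os_cpu_wait_time_ratio_to_throw.
    """
    # Assumption: rejection is only active when min_ratio < max_ratio.
    if min_ratio >= max_ratio:
        return 0.0
    # Below the min ratio no query is rejected.
    if cpu_overload < min_ratio:
        return 0.0
    # Linear interpolation between the min and max ratios,
    # clamped to 1 at or above max_ratio (assumption).
    return min(1.0, (cpu_overload - min_ratio) / (max_ratio - min_ratio))


def should_reject(cpu_overload: float, min_ratio: float, max_ratio: float) -> bool:
    """Draw a random number to decide whether this query would be rejected."""
    return random.random() < rejection_probability(cpu_overload, min_ratio, max_ratio)


if __name__ == "__main__":
    # The example from the text: min = 2, max = 6, cpu_overload = 4.
    print(rejection_probability(4, 2, 6))
```

For the values from the example above, `(4 - 2) / (6 - 2)` gives the stated probability of `0.5`.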

## Dropping connections

Dropping connections is controlled by the server-level settings `min_os_cpu_wait_time_ratio_to_drop_connection` and
`max_os_cpu_wait_time_ratio_to_drop_connection`. These settings can be changed without a server restart. The idea behind
them is similar to that of rejecting queries. The only difference is that when the server is overloaded,
the connection attempt is rejected on the server side.
