Commit 530a260

Merge pull request ceph#59077 from zdover23/wip-doc-2024-08-07-cephfs-cache-configuration-cache-pressure
doc/cephfs: add cache pressure information Reviewed-by: Anthony D'Atri <[email protected]>
2 parents c88b7d3 + bf26274 commit 530a260

1 file changed

+68
-0
lines changed

doc/cephfs/cache-configuration.rst

Lines changed: 68 additions & 0 deletions
@@ -209,3 +209,71 @@ cache. The limit is configured via:
It is not recommended to set this value above 5M but it may be helpful with
some workloads.


Dealing with "clients failing to respond to cache pressure" messages
--------------------------------------------------------------------

Every second (or every interval set by the ``mds_cache_trim_interval``
configuration parameter), the MDS runs the "cache trim" procedure. One of the
steps of this procedure is "recall client state". During this step, the MDS
checks every client (session) to determine whether it needs to recall caps.
If any of the following is true, then the MDS needs to recall caps:

1. the cache is full (the ``mds_cache_memory_limit`` has been exceeded) and
   needs some inodes to be released
2. the client exceeds ``mds_max_caps_per_client`` (1M by default)
3. the client is inactive

To determine whether a client (a session) is inactive, the session's
``cache_liveness`` parameter is checked and compared with the value::

   (num_caps >> mds_session_cache_liveness_magnitude)

where ``mds_session_cache_liveness_magnitude`` is a configuration parameter
(``10`` by default). If ``cache_liveness`` is smaller than this calculated
value, the session is considered inactive and the MDS sends a "recall caps"
request for all cached caps (the actual recall value is
``num_caps - mds_min_caps_per_client``, where ``mds_min_caps_per_client`` is
100 by default).
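
The checks described above can be sketched in Python as follows. This is only
an illustration of the logic as this section describes it, not the MDS
implementation: the function names ``should_recall`` and ``recall_count`` are
hypothetical, and the numeric defaults are taken from the text (the exact
value behind "1M" is an assumption).

```python
def should_recall(cache_bytes, cache_limit_bytes, num_caps, cache_liveness,
                  max_caps_per_client=1_000_000, liveness_magnitude=10):
    """Return True if any of the three recall conditions holds for a session.

    Hypothetical helper: the defaults mirror the values quoted in the text
    ("1M" is assumed to mean 1,000,000 here).
    """
    cache_full = cache_bytes > cache_limit_bytes        # condition 1
    over_cap_limit = num_caps > max_caps_per_client     # condition 2
    # condition 3: liveness below (num_caps >> magnitude) means "inactive"
    inactive = cache_liveness < (num_caps >> liveness_magnitude)
    return cache_full or over_cap_limit or inactive


def recall_count(num_caps, min_caps_per_client=100):
    """Caps requested back from an inactive session (never negative)."""
    return max(num_caps - min_caps_per_client, 0)
```

For example, a session holding 20,000 caps has an inactivity cutoff of
``20000 >> 10``, which is 19, so a ``cache_liveness`` below 19 marks it as
inactive and the recall request asks for 19,900 caps.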

Under certain circumstances, many "recall caps" requests can be sent so quickly
that the "mon warning limit" is exceeded and the "clients failing to respond to
cache pressure" message is triggered. If the client does not release the caps
fast enough, the MDS repeats the "recall caps" request one second later, and
keeps repeating it every second. The "total" counter of "recall caps" for the
session therefore keeps growing and eventually exceeds the "mon warning limit".

A throttling mechanism, controlled by the ``mds_recall_max_decay_threshold``
parameter (126K by default), can reduce the rate at which the "recall caps"
counter grows, but it is sometimes not enough. If altering the
``mds_recall_max_decay_threshold`` value does not sufficiently slow the
counter's growth, decrease ``mds_recall_max_caps`` incrementally until the
"clients failing to respond to cache pressure" messages no longer appear in
the logs.
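
As a rough guide to how far ``mds_recall_max_caps`` might need to be lowered,
the arithmetic can be sketched as below. This assumes, as the example scenario
in this section does, that one "recall caps" request is sent per second and
that the per-session counter approximates the caps recalled during the last 60
seconds; the real counter may behave differently. Both function names are
hypothetical, and "K" is taken to mean 1000.

```python
def worst_case_recall_total(max_caps, window_seconds=60):
    """Upper bound on the counter if max_caps is recalled every second."""
    return max_caps * window_seconds


def largest_max_caps_below(threshold, window_seconds=60):
    """Largest per-second recall that cannot exceed threshold in one window."""
    return threshold // window_seconds
```

Under these assumptions, the default of 30k gives a worst case of 1,800,000
per minute, 3K gives 180,000, and a warning threshold of 128,000 could not be
exceeded at all with a value of 2133 or less.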

Example Scenario
~~~~~~~~~~~~~~~~

Here is an example. A client has 20k caps cached. At some point the MDS
decides that the client is inactive (because the session's ``cache_liveness``
value is low) and starts asking the client to release caps down to the
``mds_min_caps_per_client`` value (100 by default). To do this, every second
it sends recall_caps asking the client to release
``num_caps - mds_min_caps_per_client`` caps (but not more than
``mds_recall_max_caps``, which is 30k by default). The client starts to
release caps, but at a rate of (for example) only 100 caps per second.

So in the first second the MDS sends recall_caps = 20k - 100, in the second
second recall_caps = (20k - 100) - 100, in the third second recall_caps =
(20k - 200) - 100, and so on. Every time the MDS sends recall_caps, it updates
the session's recall_caps value, which tracks how many caps were recalled in
the last minute. The counter therefore grows quickly and eventually exceeds
``mds_recall_warning_threshold`` (128K by default), at which point Ceph starts
to report the "failing to respond to cache pressure" warning in its status.

Now suppose that ``mds_recall_max_caps`` is set to 3K. In this situation the
MDS sends at most 3K recall_caps per second, and the maximum value that the
session's recall_caps value can reach (if the MDS sends 3K every second for at
least one minute) is 60 * 3K = 180K. This means that it is still possible to
reach ``mds_recall_warning_threshold``, but only if a client does not
"respond" for a long time, which in practice is usually not the case.
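
The scenario can be replayed with a small simulation. This is a simplified
sketch, not the MDS algorithm: it assumes one recall per second, models the
session counter as a plain sum over the last 60 seconds, and takes "K" to mean
1000. The function name ``seconds_until_warning`` and its parameters are
hypothetical.

```python
from collections import deque


def seconds_until_warning(num_caps, release_per_sec, max_recall,
                          min_caps=100, warning_threshold=128_000,
                          window=60, horizon=600):
    """Return the first second at which the windowed recall total exceeds
    warning_threshold, or None if that never happens within horizon seconds.
    """
    recent = deque(maxlen=window)  # recalls sent during the last `window` s
    for second in range(1, horizon + 1):
        # Ask for everything above min_caps, capped by max_recall.
        recall = min(max(num_caps - min_caps, 0), max_recall)
        recent.append(recall)
        if sum(recent) > warning_threshold:
            return second
        # The client releases some caps before the next tick.
        num_caps = max(num_caps - min(release_per_sec, recall), min_caps)
    return None


slow_default = seconds_until_warning(20_000, 100, 30_000)
slow_tuned = seconds_until_warning(20_000, 100, 3_000)
responsive_tuned = seconds_until_warning(20_000, 3_000, 3_000)
```

In this model a client releasing only 100 caps per second crosses the
threshold after 7 seconds with the 30k default and after 43 seconds with a 3K
limit, while a client that releases caps as fast as they are recalled never
crosses it.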
