It is not recommended to set this value above 5M but it may be helpful with
some workloads.


Dealing with "clients failing to respond to cache pressure" messages
--------------------------------------------------------------------

Every second (or every interval set by the ``mds_cache_trim_interval``
configuration parameter), the MDS runs the "cache trim" procedure. One of the
steps of this procedure is "recall client state". During this step, the MDS
checks every client (session) to determine whether it needs to recall caps.
If any of the following are true, then the MDS needs to recall caps:

1. the cache is full (the ``mds_cache_memory_limit`` has been exceeded) and
   needs some inodes to be released
2. the client exceeds ``mds_max_caps_per_client`` (1M by default)
3. the client is inactive

To determine whether a client (a session) is inactive, the session's
``cache_liveness`` parameter is checked and compared with the value::

   (num_caps >> mds_session_cache_liveness_magnitude)

where ``mds_session_cache_liveness_magnitude`` is a config param (``10`` by
default). If ``cache_liveness`` is smaller than this calculated value, the
session is considered inactive and the MDS sends a "recall caps" request for
all cached caps (the actual recall value is
``num_caps - mds_min_caps_per_client``, where ``mds_min_caps_per_client`` is
``100`` by default).
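
The checks described above can be sketched in Python. This is only an
illustration of the arithmetic, not the MDS implementation: the function names
are hypothetical, ``cache_liveness`` is treated as a plain number, and "1M" is
assumed to mean ``1 << 20``.

```python
def is_session_inactive(cache_liveness, num_caps, liveness_magnitude=10):
    """Liveness check from the text: compare cache_liveness with
    num_caps >> mds_session_cache_liveness_magnitude."""
    return cache_liveness < (num_caps >> liveness_magnitude)


def needs_recall(cache_full, num_caps, cache_liveness,
                 max_caps_per_client=1 << 20,  # assumption: "1M" == 2**20
                 liveness_magnitude=10):
    """True if any of the three recall conditions holds."""
    return (
        cache_full
        or num_caps > max_caps_per_client
        or is_session_inactive(cache_liveness, num_caps, liveness_magnitude)
    )


def recall_amount(num_caps, min_caps_per_client=100):
    """Number of caps requested back from an inactive session."""
    return max(num_caps - min_caps_per_client, 0)


# A session holding 20,000 caps is compared against 20000 >> 10 == 19.
print(is_session_inactive(cache_liveness=5, num_caps=20_000))   # True
print(is_session_inactive(cache_liveness=19, num_caps=20_000))  # False
print(recall_amount(20_000))                                    # 19900
```

One consequence of the shift, assuming ``cache_liveness`` is never negative: a
session holding fewer than 1024 caps yields a comparison value of ``0`` with
the default magnitude, so this check alone cannot mark it as inactive.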

Under certain circumstances, "recall caps" requests can be sent so quickly
that the "mon warning limit" is exceeded, which triggers the "clients failing
to respond to cache pressure" message. If the client does not release the
caps fast enough, the MDS repeats the "recall caps" request one second later,
and keeps repeating it every second until the caps are released. Each request
adds to the session's "recall caps" counter, which keeps growing and
eventually exceeds the "mon warning limit".

A throttling mechanism, controlled by the ``mds_recall_max_decay_threshold``
parameter (126K by default), is available for reducing the rate of "recall
caps" counter growth, but it is sometimes not enough. If altering the
``mds_recall_max_decay_threshold`` value does not sufficiently reduce the rate
of the "recall caps" counter's growth, decrease ``mds_recall_max_caps``
incrementally until the "clients failing to respond to cache pressure"
messages no longer appear in the logs.

Example Scenario
~~~~~~~~~~~~~~~~

Suppose that a client has 20k caps cached. At some point, the MDS decides that
the client is inactive (because the session's ``cache_liveness`` value is
low) and starts asking the client to release caps down to the
``mds_min_caps_per_client`` value (100 by default). To do this, every second
it sends a "recall caps" request asking the client to release
``num_caps - mds_min_caps_per_client`` caps (but no more than
``mds_recall_max_caps``, which is 30k by default). The client starts releasing
caps, but at a rate of (for example) only 100 caps per second.

So in the first second, the MDS sends ``recall_caps = 20k - 100``; in the
second second, ``recall_caps = (20k - 100) - 100``; in the third second,
``recall_caps = (20k - 200) - 100``; and so on. Every time it sends
"recall caps", it updates the session's ``recall_caps`` counter, which tracks
how many caps have been recalled in the last minute. This counter grows
quickly and eventually exceeds ``mds_recall_warning_threshold`` (128K by
default), at which point Ceph starts to report the "failing to respond to
cache pressure" warning in its status.

Now suppose that ``mds_recall_max_caps`` is set to 3K. In the same situation,
the MDS recalls only 3K caps per second, and the maximum value that the
session's ``recall_caps`` counter can reach (if the MDS sends 3K every second
for at least one minute) is 60 * 3K = 180K. This means that it is still
possible to reach ``mds_recall_warning_threshold``, but only if a client does
not respond for a long time, which experiments suggest is usually not the
case.
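
The scenario above can be reproduced with a small simulation. This is a
simplified model under stated assumptions, not the MDS code: the counter is
modelled as a plain sum over a 60-second sliding window (as the text describes
it), "128K" is assumed to mean ``128 * 1024``, and the function and parameter
names are hypothetical.

```python
from collections import deque

K = 1024  # assumption: "K" in the quoted defaults means 1024


def seconds_until_warning(num_caps, recall_max_caps, release_rate=100,
                          min_caps_per_client=100,
                          warning_threshold=128 * K,
                          window_seconds=60, max_seconds=600):
    """Return the second in which the windowed recall counter first exceeds
    the warning threshold, or None if that never happens in max_seconds."""
    window = deque(maxlen=window_seconds)  # recalls sent in the last minute
    for second in range(1, max_seconds + 1):
        # The MDS asks for everything above the minimum, capped per request.
        recall = min(max(num_caps - min_caps_per_client, 0), recall_max_caps)
        window.append(recall)
        if sum(window) > warning_threshold:
            return second
        # The client releases only release_rate caps before the next request.
        num_caps = max(num_caps - release_rate, min_caps_per_client)
    return None


print(seconds_until_warning(20_000, recall_max_caps=30_000))  # 7
print(seconds_until_warning(20_000, recall_max_caps=3_000))   # 44
print(seconds_until_warning(20_000, recall_max_caps=3_000,
                            release_rate=3_000))              # None
```

In this model, the default cap of 30k lets the counter cross the threshold in
the seventh second, lowering the cap to 3K delays that to the 44th second for
a client that releases only 100 caps per second, and a client that keeps up
with the 3K requests never triggers the warning.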