
Commit ff1134f

Merge pull request ceph#60794 from dparmar18/wip-68571

doc/cephfs: document purge queue and its perf counters

Reviewed-by: Venky Shankar <[email protected]>
Reviewed-by: Anthony D'Atri <[email protected]>

2 parents 8489eee + ae92773

File tree

2 files changed: +107 -0 lines changed


doc/cephfs/index.rst

Lines changed: 1 addition & 0 deletions

@@ -93,6 +93,7 @@ Administration
     CephFS Top Utility <cephfs-top>
     Scheduled Snapshots <snap-schedule>
     CephFS Snapshot Mirroring <cephfs-mirroring>
+    Purge Queue <purge-queue>

 .. raw:: html

doc/cephfs/purge-queue.rst

Lines changed: 106 additions & 0 deletions

@@ -0,0 +1,106 @@

============
Purge Queue
============

The MDS maintains a data structure known as the **Purge Queue**, which is
responsible for managing and executing the parallel deletion of files.
There is one purge queue for every MDS rank. Purge queues consist of purge
items, which contain only nominal information from the inodes, such as the
size and the layout (all other metadata is discarded, making purge items
independent of all metadata structures).

Deletion process
================

When a client requests deletion of a directory (say ``rm -rf``):

- The MDS queues the files and subdirectories (purge items) in the purge
  queue journal.
- The inodes are deleted in the background in small, manageable chunks.
- The MDS instructs the underlying OSDs to clean up the associated objects
  in the data pool.
- The journal is updated.

.. note:: If clients delete files more quickly than the purge queue can
          process them, then data pool usage might increase substantially over
          time. In extreme scenarios, the purge queue backlog can become so
          large that it slows capacity reclamation, and the Linux ``du``
          command for CephFS might report results inconsistent with the usage
          of the CephFS data pool.

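One rough way to spot such a backlog is to compare the recursive space that
CephFS reports for the mounted tree with the data pool usage. This is a
minimal sketch, assuming a CephFS mount at ``/mnt/cephfs`` (the mount point is
a placeholder)::

   $ getfattr -n ceph.dir.rbytes /mnt/cephfs   # recursive bytes as CephFS sees them
   $ ceph df                                   # compare with the data pool's usage

If the ``ceph.dir.rbytes`` value stays well below the data pool usage for a
long time after a large recursive delete, the purge queue is probably still
working through its backlog.
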
There are a few tunable configs that the MDS uses internally to throttle purge
queue processing:

.. confval:: filer_max_purge_ops
.. confval:: mds_max_purge_files
.. confval:: mds_max_purge_ops
.. confval:: mds_max_purge_ops_per_pg

Generally, the defaults are adequate for most clusters. However, on very large
clusters, if the need arises, for example when ``pq_item_in_journal`` (the
count of items pending deletion) reaches a very large figure, these settings
can be tuned to 4-5 times their default values as a starting point; further
increases should be driven by observed need.

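Before changing anything, it can help to confirm the values currently in
effect. This is a minimal sketch; ``mds.a`` is a placeholder daemon name::

   $ ceph config get mds mds_max_purge_ops
   $ ceph config show mds.a | grep purge
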
Start with the simplest of these settings, ``filer_max_purge_ops``, which
should help reclaim space more quickly::

   $ ceph config set mds filer_max_purge_ops 40

Increasing ``filer_max_purge_ops`` should be enough for most clusters, but if
it is not, move on to tuning the other settings::

   $ ceph config set mds mds_max_purge_files 256
   $ ceph config set mds mds_max_purge_ops 32768
   $ ceph config set mds mds_max_purge_ops_per_pg 2

.. note:: Setting these values will not immediately break anything, but
          because they control how many delete operations are issued to the
          underlying RADOS cluster, excessively high values can eat into
          overall cluster performance.

.. note:: The purge queue does not automatically tune its work limits to match
          the amount of outstanding work. It is therefore advised to make a
          conscious decision when tuning these settings, based on the cluster
          size and workload.

Examining purge queue perf counters
===================================

When analysing MDS perf dumps, the purge queue statistics look like::

    "purge_queue": {
        "pq_executing_ops": 56655,
        "pq_executing_ops_high_water": 65350,
        "pq_executing": 1,
        "pq_executing_high_water": 3,
        "pq_executed": 25,
        "pq_item_in_journal": 6567004
    }

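One way to obtain this output is to ask the MDS daemon for a perf dump, either
over the network or via its admin socket. This is a minimal sketch; the daemon
name ``mds.a`` and the ``purge_queue`` section filter are illustrative, and the
exact invocation may vary by release::

   $ ceph tell mds.a perf dump purge_queue
   # or, on the host running the MDS:
   $ ceph daemon mds.a perf dump purge_queue
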
Let us understand what each of these means:

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - Name
     - Description
   * - pq_executing_ops
     - Purge queue operations in flight
   * - pq_executing_ops_high_water
     - Maximum number of executing purge operations recorded
   * - pq_executing
     - Purge queue files being deleted
   * - pq_executing_high_water
     - Maximum number of executing file purges
   * - pq_executed
     - Purge queue files deleted
   * - pq_item_in_journal
     - Purge items (files) left in journal

.. note:: ``pq_executing`` and ``pq_executing_ops`` might look similar, but
          there is a small nuance: ``pq_executing`` tracks the number of files
          in the purge queue, while ``pq_executing_ops`` is the count of RADOS
          objects across all of the files in the purge queue.
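          For example, assuming the default 4 MiB object size, purging a
          single 400 MiB file would contribute ``1`` to ``pq_executing`` and
          roughly ``100`` to ``pq_executing_ops``.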
