============
Purge Queue
============

MDS maintains a data structure known as the **Purge Queue** which is
responsible for managing and executing the parallel deletion of files.
There is a purge queue for every MDS rank. A purge queue consists of purge
items, which retain only nominal information from the inodes, such as the size
and the layout (i.e. all other unneeded metadata is discarded, making the
purge queue independent of all metadata structures).

Deletion process
================

When a client requests deletion of a directory (say ``rm -rf``):

- The MDS queues the files and subdirectories (purge items) in the purge
  queue journal.
- The inodes are then deleted in the background, in small and manageable
  chunks.
- The MDS instructs the underlying OSDs to clean up the associated objects
  in the data pool.
- The purge queue journal is updated as items are processed.

.. note:: If files are deleted more quickly than the purge queue can
          process them, data pool usage might increase substantially over
          time. In extreme scenarios, the purge queue backlog can become so
          large that it slows down capacity reclamation, and the Linux ``du``
          command on a CephFS mount might report usage inconsistent with the
          usage of the CephFS data pool.
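
One way to gauge whether a purge backlog is holding back space reclamation is
to compare the space accounted at the file system level with the usage of the
data pool. The following is a rough sketch, not an exact procedure; the mount
point ``/mnt/cephfs`` and the data pool name ``cephfs_data`` are assumptions
and should be replaced with your own::

    # Recursive byte count as accounted by CephFS (assumed mount point)
    $ getfattr -n ceph.dir.rbytes --only-values /mnt/cephfs

    # Usage of the (assumed) data pool as seen by RADOS
    $ ceph df | grep cephfs_data

If the pool keeps reporting far more usage than the file system level long
after a large deletion, the purge queue is most likely still draining.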

There are a few tunable configs that the MDS uses internally to throttle
purge queue processing:

.. confval:: filer_max_purge_ops
.. confval:: mds_max_purge_files
.. confval:: mds_max_purge_ops
.. confval:: mds_max_purge_ops_per_pg

Generally, the defaults are adequate for most clusters. However, on very
large clusters, if the need arises, such as ``pq_item_in_journal`` (the
counter of items pending deletion) reaching a very large figure, the configs
can be tuned to 4-5 times their default values as a starting point, with
further increases driven by observed need.
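
Before changing anything, it can be helpful to record the current values of
these configs and the size of the backlog. A minimal sketch; the daemon name
``mds.0`` is an assumption, and the ``purge_queue`` argument to ``perf dump``
merely restricts the output to that section::

    # Current values of the purge throttles
    $ ceph config get mds filer_max_purge_ops
    $ ceph config get mds mds_max_purge_files
    $ ceph config get mds mds_max_purge_ops
    $ ceph config get mds mds_max_purge_ops_per_pg

    # Size of the purge backlog on one MDS (see the perf counters below)
    $ ceph tell mds.0 perf dump purge_queue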

Start with the simplest config, ``filer_max_purge_ops``, which should help
reclaim the space more quickly::

    $ ceph config set mds filer_max_purge_ops 40
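
To confirm the change, the value stored in the configuration database can be
read back, and (assuming the daemon name ``mds.0``) the value seen by the
running daemon can be checked as well::

    $ ceph config get mds filer_max_purge_ops
    $ ceph config show mds.0 filer_max_purge_ops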

Increasing ``filer_max_purge_ops`` should suffice for most clusters, but if
it does not, move ahead with tuning the other configs::

    $ ceph config set mds mds_max_purge_files 256
    $ ceph config set mds mds_max_purge_ops 32768
    $ ceph config set mds mds_max_purge_ops_per_pg 2
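
Keep in mind that ``mds_max_purge_ops_per_pg`` is, broadly speaking, scaled by
the number of PGs in the data pool to derive an overall cap, so the same value
behaves very differently on small and large pools. The PG count can be checked
as follows (the pool name ``cephfs_data`` is an assumption)::

    $ ceph osd pool get cephfs_data pg_num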

.. note:: Setting these values won't immediately break anything; they only
          control how many delete operations are issued to the underlying
          RADOS cluster. However, they might eat into cluster performance if
          the values are set staggeringly high.

.. note:: The purge queue does not auto-tune its work limits based on the
          amount of work outstanding, so it is advised to make a conscious
          decision when tuning the configs, based on the cluster size and
          workload.

Examining purge queue perf counters
===================================

When analysing MDS perf dumps, the purge queue statistics look like::

    "purge_queue": {
        "pq_executing_ops": 56655,
        "pq_executing_ops_high_water": 65350,
        "pq_executing": 1,
        "pq_executing_high_water": 3,
        "pq_executed": 25,
        "pq_item_in_journal": 6567004
    }
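
Such a dump can be obtained from a running MDS, for example (the daemon name
``mds.0`` and the placeholder ``<name>`` below are assumptions; the trailing
``purge_queue`` argument restricts the output to this section)::

    # Via the cluster, from any node with a client keyring
    $ ceph tell mds.0 perf dump purge_queue

    # Or via the admin socket, on the node hosting the MDS daemon
    $ ceph daemon mds.<name> perf dump purge_queue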

Let us understand what each of these means:

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - Name
     - Description
   * - pq_executing_ops
     - Purge queue operations in flight
   * - pq_executing_ops_high_water
     - Maximum number of executing purge operations recorded
   * - pq_executing
     - Purge queue files being deleted
   * - pq_executing_high_water
     - Maximum number of executing file purges
   * - pq_executed
     - Purge queue files deleted
   * - pq_item_in_journal
     - Purge items (files) left in journal

.. note:: ``pq_executing`` and ``pq_executing_ops`` might look similar, but
          there is a small nuance: ``pq_executing`` tracks the number of files
          in the purge queue, while ``pq_executing_ops`` is the count of RADOS
          objects across all the files in the purge queue.
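
To watch the backlog drain while tuning, ``pq_item_in_journal`` can be sampled
periodically. A rough sketch, assuming the daemon name ``mds.0`` and that
``jq`` and ``watch`` are available::

    $ watch -n 60 "ceph tell mds.0 perf dump purge_queue | jq .purge_queue.pq_item_in_journal"

A steadily decreasing value means the queue is catching up; a value that keeps
growing indicates that deletions are still outpacing the purge rate.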