Commit 6b1bbaf

Merge pull request ceph#61441 from ivancich/wip-dynamic-resharding-min-shards
rgw: allow per-bucket minimum number of shards

Reviewed-by: Casey Bodley <[email protected]>
2 parents 452e71d + a9b00cf commit 6b1bbaf

File tree

13 files changed: 212 additions & 67 deletions

doc/radosgw/dynamicresharding.rst

Lines changed: 39 additions & 17 deletions
@@ -6,20 +6,20 @@ RGW Dynamic Bucket Index Resharding
 
 .. versionadded:: Luminous
 
-A large bucket index can lead to performance problems, which can
-be addressed by sharding bucket indexes.
-Until Luminous, changing the number of bucket shards (resharding)
-needed to be done offline, with RGW services disabled.
-Since the Luminous release Ceph has supported online bucket resharding.
+A bucket index object with too many entries can lead to performance
+problems. This can be addressed by resharding bucket indexes. Until
+Luminous, changing the number of bucket shards (resharding) could only
+be done offline, with RGW services disabled. Since the Luminous
+release Ceph has supported online bucket resharding.
 
 Each bucket index shard can handle its entries efficiently up until
-reaching a certain threshold. If this threshold is
-exceeded the system can suffer from performance issues. The dynamic
-resharding feature detects this situation and automatically increases
-the number of shards used by a bucket's index, resulting in a
-reduction of the number of entries in each shard. This
-process is transparent to the user. Writes to the target bucket
-are blocked (but reads are not) briefly during resharding process.
+reaching a certain threshold number. If this threshold is exceeded the
+system can suffer from performance issues. The dynamic resharding
+feature detects this situation and automatically increases the number
+of shards used by a bucket's index, resulting in a reduction of the
+number of entries in each shard. This process is transparent to the
+user. Writes to the target bucket can be blocked briefly during the
+resharding process, but reads are not.
 
 By default dynamic bucket index resharding can only increase the
 number of bucket index shards to 1999, although this upper-bound is a
@@ -29,10 +29,16 @@ spread the number of entries across the bucket index
 shards more evenly.
 
 Detection of resharding opportunities runs as a background process
-that periodically
-scans all buckets. A bucket that requires resharding is added to
-a queue. A thread runs in the background and processes the queueued
-resharding tasks, one at a time and in order.
+that periodically scans all buckets. A bucket that requires resharding
+is added to a queue. A thread runs in the background and processes the
+queued resharding tasks one at a time.
+
+Starting with Tentacle, dynamic resharding has the ability to reduce
+the number of shards. Once the condition allowing reduction is noted,
+there is a time delay before it is actually executed, in case the
+number of objects increases in the near future. The goal of the delay
+is to avoid thrashing, where resharding keeps getting re-invoked on
+buckets whose number of objects fluctuates.
 
 Multisite
 =========
@@ -48,6 +54,8 @@ Configuration
 .. confval:: rgw_dynamic_resharding
 .. confval:: rgw_max_objs_per_shard
 .. confval:: rgw_max_dynamic_shards
+.. confval:: rgw_dynamic_resharding_may_reduce
+.. confval:: rgw_dynamic_resharding_reduction_wait
 .. confval:: rgw_reshard_bucket_lock_duration
 .. confval:: rgw_reshard_thread_interval
 .. confval:: rgw_reshard_num_logs
@@ -136,7 +144,7 @@ For example, the output at each dynamic resharding stage is shown below:
 Cancel pending bucket resharding
 --------------------------------
 
-Note: Bucket resharding operations cannot be cancelled while executing. ::
+Note: Bucket resharding tasks cannot be cancelled once they start executing. ::
 
   # radosgw-admin reshard cancel --bucket <bucket_name>
 
@@ -158,6 +166,20 @@ since the former is prime. A variety of web sites have lists of prime
 numbers; search for "list of prime numbers" with your favorite
 search engine to locate some web sites.
 
+Setting a bucket's minimum number of shards
+-------------------------------------------
+
+::
+
+   # radosgw-admin bucket set-min-shards --bucket <bucket_name> --num-shards <min number of shards>
+
+Since dynamic resharding can now reduce the number of shards,
+administrators may want to prevent the number of shards from becoming
+too low, for example if they expect the number of objects to increase
+in the future. This command allows administrators to set a per-bucket
+minimum. This does not, however, prevent administrators from manually
+resharding to a lower number of shards.
+
 Troubleshooting
 ===============
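Taken together, the documentation above describes a selection rule: compute a shard count from the object count, round it up to a prime when preferred, then clamp it between the configured bounds and the new per-bucket layout minimum. Below is a minimal standalone sketch of that rule. The prime table is truncated for brevity, and `preferred_shards` is a hypothetical condensation of `RGWBucketReshard::calculate_preferred_shards`, not the real signature:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Truncated copy of RGW's reshard prime table, for illustration only.
const std::initializer_list<uint16_t> reshard_primes = {
  2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61,
  67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131
};

// Smallest listed prime >= shard_count, or 0 if the table has none.
inline uint32_t get_prime_shards_greater_or_equal(uint32_t shard_count) {
  for (uint16_t p : reshard_primes) {
    if (p >= shard_count) return p;
  }
  return 0;
}

// Mirrors the commit's nearest_prime(): fall back to the input when
// the prime table is exhausted.
inline uint32_t nearest_prime(uint32_t shard_count) {
  const uint32_t p = get_prime_shards_greater_or_equal(shard_count);
  return p ? p : shard_count;
}

// Hypothetical condensation of the selection rule: prime-round first,
// then clamp by the dynamic minimum, the per-bucket layout minimum
// (new in this commit), and the dynamic maximum.
inline uint32_t preferred_shards(uint32_t calculated,
                                 uint32_t min_dynamic_shards,
                                 uint32_t min_layout_shards,
                                 uint32_t max_dynamic_shards) {
  calculated = nearest_prime(calculated);
  return std::min(max_dynamic_shards,
                  std::max({calculated, min_dynamic_shards, min_layout_shards}));
}
```

With a per-bucket minimum of 11, a computed count of 2 is still raised to 11; without one, 24 rounds up to the next prime, 29.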

src/rgw/driver/rados/rgw_bucket.cc

Lines changed: 1 addition & 0 deletions
@@ -2815,6 +2815,7 @@ void init_default_bucket_layout(CephContext *cct, rgw::BucketLayout& layout,
 
   if (shards) {
     layout.current_index.layout.normal.num_shards = *shards;
+    layout.current_index.layout.normal.min_num_shards = *shards;
   } else if (cct->_conf->rgw_override_bucket_index_max_shards > 0) {
     layout.current_index.layout.normal.num_shards =
       cct->_conf->rgw_override_bucket_index_max_shards;

src/rgw/driver/rados/rgw_rados.cc

Lines changed: 6 additions & 1 deletion
@@ -10823,6 +10823,7 @@ int RGWRados::cls_bucket_head_async(const DoutPrefixProvider *dpp, const RGWBuck
 void RGWRados::calculate_preferred_shards(const DoutPrefixProvider* dpp,
                                           const uint64_t num_objs,
                                           const uint32_t num_source_shards,
+                                          const uint32_t min_layout_shards,
                                           bool& need_resharding,
                                           uint32_t* suggested_num_shards)
 {
@@ -10834,6 +10835,7 @@ void RGWRados::calculate_preferred_shards(const DoutPrefixProvider* dpp,
 
   RGWBucketReshard::calculate_preferred_shards(dpp,
                                                max_dynamic_shards,
+                                               min_layout_shards,
                                                max_objs_per_shard,
                                                is_multisite,
                                                num_objs,
@@ -10867,8 +10869,11 @@ int RGWRados::check_bucket_shards(const RGWBucketInfo& bucket_info,
   uint32_t suggested_num_shards = 0;
   const uint32_t num_source_shards =
     rgw::current_num_shards(bucket_info.layout);
+  const uint32_t min_layout_shards =
+    rgw::current_min_layout_shards(bucket_info.layout);
 
-  calculate_preferred_shards(dpp, num_objs, num_source_shards,
+  calculate_preferred_shards(dpp, num_objs,
+                             num_source_shards, min_layout_shards,
                              need_resharding, &suggested_num_shards);
   if (! need_resharding) {
     return 0;

src/rgw/driver/rados/rgw_rados.h

Lines changed: 1 addition & 0 deletions
@@ -1630,6 +1630,7 @@ int restore_obj_from_cloud(RGWLCCloudTierCtx& tier_ctx,
   void calculate_preferred_shards(const DoutPrefixProvider* dpp,
                                   const uint64_t num_objs,
                                   const uint32_t current_shard_count,
+                                  const uint32_t min_layout_shards,
                                   bool& need_resharding,
                                   uint32_t* suggested_num_shard_count = nullptr);
 
src/rgw/driver/rados/rgw_reshard.cc

Lines changed: 17 additions & 18 deletions
@@ -69,23 +69,13 @@ const std::initializer_list<uint16_t> RGWBucketReshard::reshard_primes = {
 };
 
 
-uint32_t RGWBucketReshard::get_prime_shard_count(
-  uint32_t shard_count,
-  uint32_t max_dynamic_shards,
-  uint32_t min_dynamic_shards)
-{
+uint32_t RGWBucketReshard::nearest_prime(uint32_t shard_count) {
   uint32_t prime_shard_count =
     get_prime_shards_greater_or_equal(shard_count);
 
   // if we cannot find a larger prime number, then just use what was
   // passed in
-  if (! prime_shard_count) {
-    prime_shard_count = shard_count;
-  }
-
-  // keep within min/max bounds
-  return std::min(max_dynamic_shards,
-                  std::max(min_dynamic_shards, prime_shard_count));
+  return prime_shard_count ? prime_shard_count : shard_count;
 }
 
 
@@ -96,6 +86,7 @@ uint32_t RGWBucketReshard::get_prime_shard_count(
 void RGWBucketReshard::calculate_preferred_shards(
   const DoutPrefixProvider* dpp,
   const uint32_t max_dynamic_shards,
+  const uint32_t min_layout_shards,
   const uint64_t max_objs_per_shard,
   const bool is_multisite,
   const uint64_t num_objs,
@@ -139,10 +130,13 @@ void RGWBucketReshard::calculate_preferred_shards(
   }
 
   if (prefer_prime) {
-    calculated_num_shards = get_prime_shard_count(
-      calculated_num_shards, max_dynamic_shards, min_dynamic_shards);
+    calculated_num_shards = nearest_prime(calculated_num_shards);
   }
 
+  calculated_num_shards =
+    std::min(max_dynamic_shards,
+             std::max({ calculated_num_shards, min_dynamic_shards, min_layout_shards }));
+
   ldpp_dout(dpp, 20) << __func__ << ": reshard " << verb <<
     " suggested; current average (objects/shard) is " <<
     float(num_objs) / current_num_shards << ", which is not within " <<
@@ -461,6 +455,7 @@ static int init_target_layout(rgw::sal::RadosStore* store,
   rgw::bucket_index_layout_generation target;
   target.layout.type = rgw::BucketIndexType::Normal;
   target.layout.normal.num_shards = new_num_shards;
+  target.layout.normal.min_num_shards = current.layout.normal.min_num_shards;
   target.gen = current.gen + 1;
 
   if (bucket_info.reshard_status == cls_rgw_reshard_status::IN_PROGRESS) {
@@ -1256,7 +1251,7 @@ int RGWBucketReshard::do_reshard(const rgw::bucket_index_layout_generation& curr
   // block the client op and complete the resharding
   ceph_assert(bucket_info.layout.resharding == rgw::BucketReshardState::InProgress);
   ret = reshard_process(current, max_op_entries, target_shards_mgr, verbose_json_out, out,
-                       formatter, bucket_info.layout.resharding, dpp, y);
+                        formatter, bucket_info.layout.resharding, dpp, y);
   if (ret < 0) {
     ldpp_dout(dpp, 0) << __func__ << ": failed in progress state of reshard ret = " << ret << dendl;
     return ret;
@@ -1637,6 +1632,9 @@ int RGWReshard::process_entry(const cls_rgw_reshard_entry& entry,
   ret = store->getRados()->get_bucket_stats(dpp, bucket_info,
                                             bucket_info.layout.current_index,
                                             -1, nullptr, nullptr, stats, nullptr, nullptr);
+  if (ret < 0) {
+    return clean_up("unable to access buckets current stats");
+  }
 
   // determine current number of bucket entries across shards
   uint64_t num_entries = 0;
@@ -1645,15 +1643,17 @@ int RGWReshard::process_entry(const cls_rgw_reshard_entry& entry,
   }
 
   const uint32_t current_shard_count =
-    rgw::num_shards(bucket_info.get_current_index().layout.normal);
+    rgw::current_num_shards(bucket_info.layout);
+  const uint32_t min_layout_shards =
+    rgw::current_min_layout_shards(bucket_info.layout);
 
   bool needs_resharding { false };
   uint32_t suggested_shard_count { 0 };
   // calling this rados function determines various rados values
   // needed to perform the calculation before calling
   // calculating_preferred_shards() in this class
   store->getRados()->calculate_preferred_shards(
-    dpp, num_entries, current_shard_count,
+    dpp, num_entries, current_shard_count, min_layout_shards,
     needs_resharding, &suggested_shard_count);
 
   // if we no longer need resharding or currently need to expand
@@ -1711,7 +1711,6 @@ int RGWReshard::process_entry(const cls_rgw_reshard_entry& entry,
   }
 
   // all checkes passed; we can reshard...
-
   RGWBucketReshard br(store, bucket_info, bucket_attrs, nullptr);
 
   ReshardFaultInjector f; // no fault injected
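The decision that `RGWReshard::process_entry` delegates to `calculate_preferred_shards` — sum the entries across all index shards, then decide whether to expand or (since this commit) reduce — can be distilled as follows. This is a hypothetical condensation, not RGW's actual API: the 0.5 reduction ratio here is an assumption for illustration, while the real triggers are governed by `rgw_max_objs_per_shard` and the new reduction confvals:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical distillation of the expand/reduce decision; field and
// function names are illustrative, not RGW's.
struct ReshardDecision {
  bool need_resharding = false;
  bool is_reduction = false;
};

inline ReshardDecision check_shards(const std::vector<uint64_t>& per_shard_entries,
                                    uint64_t max_objs_per_shard) {
  if (per_shard_entries.empty()) return {};
  // determine current number of bucket entries across shards
  const uint64_t num_entries = std::accumulate(
      per_shard_entries.begin(), per_shard_entries.end(), uint64_t{0});
  const double avg = double(num_entries) / per_shard_entries.size();
  if (avg > double(max_objs_per_shard)) {
    return {true, false};   // over-full on average: expand
  }
  if (avg < 0.5 * double(max_objs_per_shard)) {  // assumed ratio
    return {true, true};    // well under-filled: candidate for reduction
  }
  return {};
}
```

In the real code a reduction candidate is additionally queued behind the `rgw_dynamic_resharding_reduction_wait` delay rather than acted on immediately, to avoid thrashing on buckets whose object counts fluctuate.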

src/rgw/driver/rados/rgw_reshard.h

Lines changed: 3 additions & 5 deletions
@@ -160,14 +160,12 @@ class RGWBucketReshard {
     }
   }
 
-  // returns a preferred number of shards given a calculated number of
-  // shards based on max_dynamic_shards and the list of prime values
-  static uint32_t get_prime_shard_count(uint32_t suggested_shards,
-                                        uint32_t max_dynamic_shards,
-                                        uint32_t min_dynamic_shards);
+  // returns a preferred number of shards as a prime value
+  static uint32_t nearest_prime(uint32_t suggested_shards);
 
   static void calculate_preferred_shards(const DoutPrefixProvider* dpp,
                                          const uint32_t max_dynamic_shards,
+                                         const uint32_t min_layout_shards,
                                          const uint64_t max_objs_per_shard,
                                          const bool is_multisite,
                                          const uint64_t num_objs,

src/rgw/radosgw-admin/radosgw-admin.cc

Lines changed: 49 additions & 2 deletions
@@ -1,10 +1,9 @@
-
 // -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
 // vim: ts=8 sw=2 smarttab ft=cpp
 
 /*
  * Copyright (C) 2025 IBM
- */
+ */
 
 #include <cerrno>
 #include <string>
@@ -169,6 +168,7 @@ void usage()
   cout << "  bucket check unlinked      check for object versions that are not visible in a bucket listing \n";
   cout << "  bucket chown               link bucket to specified user and update its object ACLs\n";
   cout << "  bucket reshard             reshard bucket\n";
+  cout << "  bucket set-min-shards      set the minimum number of shards that dynamic resharding will consider for a bucket\n";
   cout << "  bucket rewrite             rewrite all objects in the specified bucket\n";
   cout << "  bucket sync checkpoint     poll a bucket's sync status until it catches up to its remote\n";
   cout << "  bucket sync disable        disable bucket sync\n";
@@ -699,6 +699,7 @@ enum class OPT {
   BUCKET_RM,
   BUCKET_REWRITE,
   BUCKET_RESHARD,
+  BUCKET_SET_MIN_SHARDS,
   BUCKET_CHOWN,
   BUCKET_RADOS_LIST,
   BUCKET_SHARD_OBJECTS,
@@ -937,6 +938,7 @@ static SimpleCmd::Commands all_cmds = {
   { "bucket rm", OPT::BUCKET_RM },
   { "bucket rewrite", OPT::BUCKET_REWRITE },
   { "bucket reshard", OPT::BUCKET_RESHARD },
+  { "bucket set-min-shards", OPT::BUCKET_SET_MIN_SHARDS },
   { "bucket chown", OPT::BUCKET_CHOWN },
   { "bucket radoslist", OPT::BUCKET_RADOS_LIST },
   { "bucket rados list", OPT::BUCKET_RADOS_LIST },
@@ -8803,6 +8805,51 @@ int main(int argc, const char **argv)
     }
   } // OPT_RESHARD_CANCEL
 
+  if (opt_cmd == OPT::BUCKET_SET_MIN_SHARDS) {
+    if (bucket_name.empty()) {
+      cerr << "ERROR: bucket not specified" << std::endl;
+      return -EINVAL;
+    }
+
+    if (!num_shards_specified) {
+      cerr << "ERROR: --num-shards not specified" << std::endl;
+      return -EINVAL;
+    }
+
+    if (num_shards < 1) {
+      cerr << "ERROR: --num-shards must be at least 1" << std::endl;
+      return -EINVAL;
+    }
+
+    int ret = init_bucket(tenant, bucket_name, bucket_id, &bucket);
+    if (ret < 0) {
+      return -ret;
+    }
+    auto& bucket_info = bucket->get_info();
+
+    const rgw::BucketIndexType type =
+      bucket_info.layout.current_index.layout.type;
+    if (type != rgw::BucketIndexType::Normal) {
+      cerr << "ERROR: the bucket's layout is type " << type <<
+        " instead of type " << rgw::BucketIndexType::Normal <<
+        " and therefore does not have a "
+        "minimum number of shards that can be altered" << std::endl;
+      return EINVAL;
+    }
+
+    uint32_t& min_num_shards =
+      bucket_info.layout.current_index.layout.normal.min_num_shards;
+    min_num_shards = num_shards;
+
+    ret = bucket->put_info(dpp(), false, real_time(), null_yield);
+    if (ret < 0) {
+      cerr << "ERROR: failed writing bucket instance info: " << cpp_strerror(-ret) << std::endl;
+      return -ret;
+    }
+
+    return 0;
+  } // SET_MIN_SHARDS
+
   if (opt_cmd == OPT::OBJECT_UNLINK) {
     int ret = init_bucket(tenant, bucket_name, bucket_id, &bucket);
     if (ret < 0) {

src/rgw/rgw_bucket_layout.cc

Lines changed: 10 additions & 2 deletions
@@ -81,29 +81,37 @@ void decode_json_obj(BucketHashType& t, JSONObj *obj)
 // bucket_index_normal_layout
 void encode(const bucket_index_normal_layout& l, bufferlist& bl, uint64_t f)
 {
-  ENCODE_START(1, 1, bl);
+  ENCODE_START(2, 1, bl);
   encode(l.num_shards, bl);
   encode(l.hash_type, bl);
+  encode(l.min_num_shards, bl);
   ENCODE_FINISH(bl);
 }
 void decode(bucket_index_normal_layout& l, bufferlist::const_iterator& bl)
 {
-  DECODE_START(1, bl);
+  DECODE_START(2, bl);
   decode(l.num_shards, bl);
   decode(l.hash_type, bl);
+  if (struct_v >= 2) {
+    decode(l.min_num_shards, bl);
+  }
   DECODE_FINISH(bl);
 }
 void encode_json_impl(const char *name, const bucket_index_normal_layout& l, ceph::Formatter *f)
 {
   f->open_object_section(name);
   encode_json("num_shards", l.num_shards, f);
   encode_json("hash_type", l.hash_type, f);
+  encode_json("min_num_shards", l.min_num_shards, f);
   f->close_section();
 }
 void decode_json_obj(bucket_index_normal_layout& l, JSONObj *obj)
 {
   JSONDecoder::decode_json("num_shards", l.num_shards, obj);
   JSONDecoder::decode_json("hash_type", l.hash_type, obj);
+
+  // if not set in json, set to default value of 1
+  JSONDecoder::decode_json("min_num_shards", l.min_num_shards, obj, 1);
 }
 
 // bucket_index_layout
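The `ENCODE_START(2, 1, ...)` / `struct_v >= 2` pair above is Ceph's standard versioned-encoding idiom: new fields are appended and read only when the encoded version says they are present. A toy re-implementation (not Ceph's bufferlist machinery; the byte layout and type names here are invented) shows why old v1 encodings still decode cleanly, with `min_num_shards` falling back to its default:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for bucket_index_normal_layout; hash_type is simplified
// to an integer for this sketch.
struct NormalLayout {
  uint32_t num_shards = 1;
  uint32_t hash_type = 0;
  uint32_t min_num_shards = 1;   // new in v2; defaults to 1
};

inline void put_u32(std::vector<uint8_t>& bl, uint32_t v) {
  for (int i = 0; i < 4; ++i) bl.push_back(uint8_t(v >> (8 * i)));
}
inline uint32_t get_u32(const std::vector<uint8_t>& bl, size_t& pos) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= uint32_t(bl[pos++]) << (8 * i);
  return v;
}

// Writes a version header first, like ENCODE_START(2, 1, bl), then the
// fields; the v2-only field is emitted only for struct_v >= 2.
inline std::vector<uint8_t> encode(const NormalLayout& l, uint32_t struct_v) {
  std::vector<uint8_t> bl;
  put_u32(bl, struct_v);
  put_u32(bl, l.num_shards);
  put_u32(bl, l.hash_type);
  if (struct_v >= 2) put_u32(bl, l.min_num_shards);
  return bl;
}

// Reads the version header and gates the new field on it, so a v1
// encoding leaves min_num_shards at its in-memory default.
inline NormalLayout decode(const std::vector<uint8_t>& bl) {
  size_t pos = 0;
  NormalLayout l;
  const uint32_t struct_v = get_u32(bl, pos);
  l.num_shards = get_u32(bl, pos);
  l.hash_type = get_u32(bl, pos);
  if (struct_v >= 2) l.min_num_shards = get_u32(bl, pos);
  return l;
}
```

Decoding a v1 blob yields `min_num_shards == 1`, matching the JSON path's explicit default of 1; a v2 blob round-trips the stored value.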
