Skip to content

Commit 7646e14

Browse files
committed
Merge 'cql3: Introduce RF-rack-valid keyspaces' from Dawid Mędrek
This PR is an introductory step towards enforcing RF-rack-valid keyspaces in Scylla. The scope of changes: * defining RF-rack-valid keyspaces, * introducing a configuration option enforcing RF-rack-valid keyspaces, * restricting the CREATE and ALTER KEYSPACE statements so that they never lead to RF-rack invalid keyspaces, * during the initialization of a node, it verifies that all existing keyspaces are RF-rack-valid. If not, the initialization fails. We provide tests verifying that the changes behave as intended. --- Note that there are a number of things that still need to be implemented. That includes, for instance, restricting topology operations too. --- Implementation strategy (going beyond the scope of this PR): 1. Introduce the new configuration option `rf_rack_valid_keyspaces`. 2. Start enforcing RF-rack-validity in keyspaces if the option is enabled. 3. Adjust the tests: in the tree and out of it. Explicitly enable the option in all tests. 4. Once the tests have been adjusted, change the default value of the option to enabled. 5. Stop explicitly enabling the option in tests. 6. Get rid of the option. --- Fixes scylladb#20356 Fixes scylladb#23276 Fixes scylladb#23300 --- Backport: this is part of the requirements for releasing 2025.1. Closes scylladb#23138 * github.com:scylladb/scylladb: main: Refuse to start node when RF-rack-invalid keyspace exists cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces db/config: Introduce RF-rack-valid keyspaces
2 parents 0d14177 + 0e04a6f commit 7646e14

File tree

16 files changed

+518
-5
lines changed

16 files changed

+518
-5
lines changed

conf/scylla.yaml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -837,3 +837,6 @@ maintenance_socket: ignore
837837
# Note that creating keyspaces with tablets enabled or disabled is irreversible.
838838
# The `tablets` option cannot be changed using `ALTER KEYSPACE`.
839839
enable_tablets: true
840+
841+
# Enforce RF-rack-valid keyspaces.
842+
rf_rack_valid_keyspaces: false

cql3/statements/alter_keyspace_statement.cc

Lines changed: 34 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
#include <seastar/core/on_internal_error.hh>
1414
#include <stdexcept>
1515
#include "alter_keyspace_statement.hh"
16+
#include "locator/tablets.hh"
1617
#include "prepared_statement.hh"
1718
#include "service/migration_manager.hh"
1819
#include "service/storage_proxy.hh"
@@ -25,6 +26,7 @@
2526
#include "create_keyspace_statement.hh"
2627
#include "gms/feature_service.hh"
2728
#include "replica/database.hh"
29+
#include "db/config.hh"
2830

2931
using namespace std::string_literals;
3032

@@ -194,9 +196,9 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
194196
event::schema_change::target_type target_type = event::schema_change::target_type::KEYSPACE;
195197
auto ks = qp.db().find_keyspace(_name);
196198
auto ks_md = ks.metadata();
197-
const auto& tm = *qp.proxy().get_token_metadata_ptr();
199+
const auto tmptr = qp.proxy().get_token_metadata_ptr();
198200
const auto& feat = qp.proxy().features();
199-
auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, tm, feat);
201+
auto ks_md_update = _attrs->as_ks_metadata_update(ks_md, *tmptr, feat);
200202
std::vector<mutation> muts;
201203
std::vector<sstring> warnings;
202204
auto old_ks_options = get_old_options_flattened(ks);
@@ -265,6 +267,36 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
265267
muts.insert(muts.begin(), schema_mutations.begin(), schema_mutations.end());
266268
}
267269

270+
// If `rf_rack_valid_keyspaces` is enabled, it's forbidden to perform a schema change that
271+
// would lead to an RF-rack-valid keyspace. Verify that this change does not.
272+
// For more context, see: scylladb/scylladb#23071.
273+
if (qp.db().get_config().rf_rack_valid_keyspaces()) {
274+
auto rs = locator::abstract_replication_strategy::create_replication_strategy(
275+
ks_md_update->strategy_name(),
276+
locator::replication_strategy_params(ks_md_update->strategy_options(), ks_md_update->initial_tablets()));
277+
278+
try {
279+
// There are two things to note here:
280+
// 1. We hold a group0_guard, so it's correct to check this here.
281+
// The topology or schema cannot change while we're performing this query.
282+
// 2. The replication strategy we use here does NOT represent the actual state
283+
// we will arrive at after applying the schema change. For instance, if the user
284+
// did not specify the RF for some of the DCs, it's equal to 0 in the replication
285+
// strategy we pass to this function, while in reality that means that the RF
286+
// will NOT change. That is not a problem:
287+
// - RF=0 is valid for all DCs, so it won't trigger an exception on its own,
288+
// - the keyspace must've been RF-rack-valid before this change. We check that
289+
// condition for all keyspaces at startup.
290+
// The second hyphen is not really true because currently topological changes can
291+
// disturb it (see scylladb/scylladb#23345), but we ignore that.
292+
locator::assert_rf_rack_valid_keyspace(_name, tmptr, *rs);
293+
} catch (const std::exception& e) {
294+
// There's no guarantee what the type of the exception will be, so we need to
295+
// wrap it manually here in a type that can be passed to the user.
296+
throw exceptions::invalid_request_exception(e.what());
297+
}
298+
}
299+
268300
auto ret = ::make_shared<event::schema_change>(
269301
event::schema_change::change_type::UPDATED,
270302
target_type,

cql3/statements/create_keyspace_statement.cc

Lines changed: 19 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,8 @@
1111
#include <seastar/core/coroutine.hh>
1212
#include "cql3/statements/create_keyspace_statement.hh"
1313
#include "cql3/statements/ks_prop_defs.hh"
14+
#include "exceptions/exceptions.hh"
15+
#include "locator/tablets.hh"
1416
#include "prepared_statement.hh"
1517
#include "data_dictionary/data_dictionary.hh"
1618
#include "data_dictionary/keyspace_metadata.hh"
@@ -90,14 +92,14 @@ void create_keyspace_statement::validate(query_processor& qp, const service::cli
9092

9193
future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector<mutation>, cql3::cql_warnings_vec>> create_keyspace_statement::prepare_schema_mutations(query_processor& qp, const query_options&, api::timestamp_type ts) const {
9294
using namespace cql_transport;
93-
const auto& tm = *qp.proxy().get_token_metadata_ptr();
95+
const auto tmptr = qp.proxy().get_token_metadata_ptr();
9496
const auto& feat = qp.proxy().features();
9597
const auto& cfg = qp.db().get_config();
9698
std::vector<mutation> m;
9799
std::vector<sstring> warnings;
98100

99101
try {
100-
auto ksm = _attrs->as_ks_metadata(_name, tm, feat, cfg);
102+
auto ksm = _attrs->as_ks_metadata(_name, *tmptr, feat, cfg);
101103
m = service::prepare_new_keyspace_announcement(qp.db().real_database(), ksm, ts);
102104
// If the new keyspace uses tablets, as long as there are features
103105
// which aren't supported by tablets we want to warn the user that
@@ -119,6 +121,21 @@ future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, std::vector
119121
warnings.push_back("Keyspace `initial` tablets option is deprecated. Use per-table tablet options instead.");
120122
}
121123
}
124+
125+
// If `rf_rack_valid_keyspaces` is enabled, it's forbidden to create an RF-rack-invalid keyspace.
126+
// Verify that it's RF-rack-valid.
127+
// For more context, see: scylladb/scylladb#23071.
128+
if (cfg.rf_rack_valid_keyspaces()) {
129+
try {
130+
// We hold a group0_guard, so it's correct to check this here.
131+
// The topology or schema cannot change while we're performing this query.
132+
locator::assert_rf_rack_valid_keyspace(_name, tmptr, *rs);
133+
} catch (const std::exception& e) {
134+
// There's no guarantee what the type of the exception will be, so we need to
135+
// wrap it manually here in a type that can be passed to the user.
136+
throw exceptions::invalid_request_exception(e.what());
137+
}
138+
}
122139
} catch (const exceptions::already_exists_exception& e) {
123140
if (!_if_not_exists) {
124141
co_return coroutine::exception(std::current_exception());

db/config.cc

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1382,6 +1382,9 @@ db::config::config(std::shared_ptr<db::extensions> exts)
13821382
, disk_space_monitor_high_polling_interval_in_seconds(this, "disk_space_monitor_high_polling_interval_in_seconds", value_status::Used, 1, "Disk-space polling interval at or above polling threshold")
13831383
, disk_space_monitor_polling_interval_threshold(this, "disk_space_monitor_polling_interval_threshold", value_status::Used, 0.9, "Disk-space polling threshold. Polling interval is increased when disk utilization is greater than or equal to this threshold")
13841384
, enable_create_table_with_compact_storage(this, "enable_create_table_with_compact_storage", liveness::LiveUpdate, value_status::Used, false, "Enable the deprecated feature of CREATE TABLE WITH COMPACT STORAGE. This feature will eventually be removed in a future version.")
1385+
, rf_rack_valid_keyspaces(this, "rf_rack_valid_keyspaces", liveness::MustRestart, value_status::Used, false,
1386+
"Enforce RF-rack-valid keyspaces. Additionally, if there are existing RF-rack-invalid "
1387+
"keyspaces, attempting to start a node with this option ON will fail.")
13851388
, default_log_level(this, "default_log_level", value_status::Used, seastar::log_level::info, "Default log level for log messages")
13861389
, logger_log_level(this, "logger_log_level", value_status::Used, {}, "Map of logger name to log level. Valid log levels are 'error', 'warn', 'info', 'debug' and 'trace'")
13871390
, log_to_stdout(this, "log_to_stdout", value_status::Used, true, "Send log output to stdout")

db/config.hh

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -546,6 +546,8 @@ public:
546546

547547
named_value<bool> enable_create_table_with_compact_storage;
548548

549+
named_value<bool> rf_rack_valid_keyspaces;
550+
549551
static const sstring default_tls_priority;
550552
private:
551553
template<typename T>

docs/architecture/tablets.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -140,6 +140,12 @@ You can create a keyspace with tablets enabled with the ``tablets = {'enabled':
140140
Limitations and Unsupported Features
141141
--------------------------------------
142142

143+
.. warning::
144+
145+
If a keyspace has tablets enabled, it must remain :term:`RF-rack-valid <RF-rack-valid keyspace>`
146+
throughout its lifetime. Failing to keep that invariant satisfied may result in data inconsistencies,
147+
performance problems, or other issues.
148+
143149
The following ScyllaDB features are not supported if a keyspace has tablets
144150
enabled:
145151

docs/architecture/zero-token-nodes.rst

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,3 +21,8 @@ Note that:
2121
* Zero-token nodes never store replicated data, so running ``nodetool rebuild``,
2222
``nodetool repair``, and ``nodetool cleanup`` can be skipped as it does not
2323
affect zero-token nodes.
24+
* Racks consisting solely of zero-token nodes are not taken into consideration
25+
when deciding whether a keyspace is :term:`RF-rack-valid <RF-rack-valid keyspace>`.
26+
However, an RF-rack-valid keyspace must have the replication factor equal to 0
27+
in an :doc:`arbiter DC </operating-scylla/procedures/cluster-management/arbiter-dc>`.
28+
Otherwise, it is RF-rack-invalid.

docs/cql/ddl.rst

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -224,7 +224,8 @@ By default, a keyspace is created with tablets enabled. You can use the ``tablet
224224
to opt out a keyspace from tablets-based distribution.
225225

226226
You may want to opt out if you plan to use features that are not supported for keyspaces
227-
with tablets enabled. See :ref:`Limitations and Unsupported Features <tablets-limitations>`
227+
with tablets enabled. Keyspaces using tablets must also remain :term:`RF-rack-valid <RF-rack-valid keyspace>`
228+
throughout their lifetime. See :ref:`Limitations and Unsupported Features <tablets-limitations>`
228229
for details.
229230

230231
**The ``initial`` sub-option (deprecated)**
@@ -300,6 +301,7 @@ Modifying a keyspace with tablets enabled is possible and doesn't require any sp
300301
- If there's any other ongoing global topology operation, executing the ``ALTER`` statement will fail (with an explicit and specific error) and needs to be repeated.
301302
- The ``ALTER`` statement may take longer than the regular query timeout, and even if it times out, it will continue to execute in the background.
302303
- The replication strategy cannot be modified, as keyspaces with tablets only support ``NetworkTopologyStrategy``.
304+
- The ``ALTER`` statement will fail if it would make the keyspace :term:`RF-rack-invalid <RF-rack-valid keyspace>`.
303305

304306
.. _drop-keyspace-statement:
305307

docs/reference/glossary.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -127,6 +127,12 @@ Glossary
127127
RBNO is enabled by default for a subset node operations.
128128
See :doc:`Repair Based Node Operations </operating-scylla/procedures/cluster-management/repair-based-node-operation>` for details.
129129

130+
RF-rack-valid keyspace
131+
A keyspace with :doc:`tablets </architecture/tablets>` enabled is RF-rack-valid if all of its data centers
132+
have the :term:`Replication Factor (RF) <Replication Factor (RF)>` of 0, 1, or the number of racks in that data center.
133+
134+
Keyspaces with tablets disabled are always deemed RF-rack-valid, even if they do not satisfy the aforementioned condition.
135+
130136
Shard
131137
Each ScyllaDB node is internally split into *shards*, an independent thread bound to a dedicated core.
132138
Each shard of data is allotted CPU, RAM, persistent storage, and networking resources which it uses as efficiently as possible.

locator/tablets.cc

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
77
*/
88

9+
#include "locator/network_topology_strategy.hh"
910
#include "locator/tablet_replication_strategy.hh"
1011
#include "locator/tablets.hh"
1112
#include "locator/tablet_metadata_guard.hh"
@@ -1042,6 +1043,64 @@ void tablet_metadata_guard::subscribe() {
10421043
});
10431044
}
10441045

1046+
void assert_rf_rack_valid_keyspace(std::string_view ks, const token_metadata_ptr tmptr, const abstract_replication_strategy& ars) {
1047+
tablet_logger.debug("[assert_rf_rack_valid_keyspace]: Starting verifying that keyspace '{}' is RF-rack-valid", ks);
1048+
1049+
// Any keyspace that does NOT use tablets is RF-rack-valid.
1050+
if (!ars.uses_tablets()) {
1051+
tablet_logger.debug("[assert_rf_rack_valid_keyspace]: Keyspace '{}' has been verified to be RF-rack-valid (no tablets)", ks);
1052+
return;
1053+
}
1054+
1055+
// Tablets can only be used with NetworkTopologyStrategy.
1056+
SCYLLA_ASSERT(ars.get_type() == replication_strategy_type::network_topology);
1057+
const auto& nts = *static_cast<const network_topology_strategy*>(std::addressof(ars));
1058+
1059+
const auto& dc_rack_map = tmptr->get_topology().get_datacenter_racks();
1060+
1061+
for (const auto& dc : nts.get_datacenters()) {
1062+
if (!dc_rack_map.contains(dc)) {
1063+
on_internal_error(tablet_logger, seastar::format(
1064+
"Precondition violated: DC '{}' is part of the passed replication strategy, but it is not "
1065+
"known by the passed locator::token_metadata_ptr.", dc));
1066+
}
1067+
}
1068+
1069+
for (const auto& [dc, rack_map] : dc_rack_map) {
1070+
tablet_logger.debug("[assert_rf_rack_valid_keyspace]: Verifying for '{}' / '{}'", ks, dc);
1071+
1072+
size_t normal_rack_count = 0;
1073+
for (const auto& [_, rack_nodes] : rack_map) {
1074+
// We must ignore zero-token nodes because they don't take part in replication.
1075+
// Verify that this rack has at least one normal node.
1076+
const bool normal_rack = std::ranges::any_of(rack_nodes, [tmptr] (host_id host_id) {
1077+
return tmptr->is_normal_token_owner(host_id);
1078+
});
1079+
if (normal_rack) {
1080+
++normal_rack_count;
1081+
}
1082+
}
1083+
1084+
const size_t rf = nts.get_replication_factor(dc);
1085+
1086+
// We must not allow for a keyspace to become RF-rack-invalid. Any attempt at that must be rejected.
1087+
// For more context, see: scylladb/scylladb#23276.
1088+
const bool invalid_rf = rf != normal_rack_count && rf != 1 && rf != 0;
1089+
// Edge case: the DC in question is an arbiter DC and does NOT take part in replication.
1090+
// Any positive RF for that DC is invalid.
1091+
const bool invalid_arbiter_dc = normal_rack_count == 0 && rf > 0;
1092+
1093+
if (invalid_rf || invalid_arbiter_dc) {
1094+
throw std::invalid_argument(std::format(
1095+
"The option `rf_rack_valid_keyspaces` is enabled. It requires that all keyspaces are RF-rack-valid. "
1096+
"That condition is violated: keyspace '{}' doesn't satisfy it for DC '{}': RF={} vs. rack count={}.",
1097+
ks, std::string_view(dc), rf, normal_rack_count));
1098+
}
1099+
}
1100+
1101+
tablet_logger.debug("[assert_rf_rack_valid_keyspace]: Keyspace '{}' has been verified to be RF-rack-valid", ks);
1102+
}
1103+
10451104
}
10461105

10471106
auto fmt::formatter<locator::resize_decision_way>::format(const locator::resize_decision_way& way, fmt::format_context& ctx) const

0 commit comments

Comments
 (0)