Commit 571dd53

mon/NVMeofGwMap: add healthcheck warning NVMEOF_GATEWAY_DELETING
Add a warning when NVMeoF gateways are in the DELETING state. This happens when there are namespaces under the deleted gateway's ANA group ID. The gateways are removed completely after users manually move these namespaces to another load balancing group, or after a new gateway is deployed on that host.

Signed-off-by: Vallari Agrawal <[email protected]>
1 parent 6fd292f commit 571dd53

2 files changed (+22, -0 lines changed)

doc/rados/operations/health-checks.rst

Lines changed: 8 additions & 0 deletions
```diff
@@ -1665,6 +1665,14 @@ Some of the gateways are in the GW_UNAVAILABLE state. If a NVMeoF daemon has
 crashed, the daemon log file (found at ``/var/log/ceph/``) may contain
 troubleshooting information.
 
+NVMEOF_GATEWAY_DELETING
+_______________________
+
+Some of the gateways are in the GW_DELETING state. They will stay in this
+state until all the namespaces under the gateway's load balancing group are
+moved to another load balancing group ID. This is done automatically by the
+load balancing process. If this alert persists for a long time, there might
+be an issue with that process.
 
 Miscellaneous
 -------------
```
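The removal condition described in the new documentation can be modelled in a few lines. This is a simplified, hypothetical sketch, not the monitor's actual data structures: `GwState`, `Gateway`, and `try_finish_delete` are illustrative names only, and the model assumes that a gateway in the deleting state can be dropped only when no namespaces remain in its load balancing (ANA) group.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified gateway states; not Ceph's gw_availability_t.
enum class GwState { Available, Unavailable, Deleting };

struct Gateway {
  std::string id;
  int ana_group;   // the gateway's load balancing group ID
  GwState state;
};

// Hypothetical helper: returns true if the gateway could be removed from the map.
// A gateway marked Deleting stays in place while any namespace still belongs
// to its ANA group; once that group is empty, deletion can complete.
bool try_finish_delete(const Gateway& gw,
                       const std::map<int, int>& namespaces_per_group) {
  if (gw.state != GwState::Deleting) {
    return false;  // only gateways already marked for deletion are removed
  }
  auto it = namespaces_per_group.find(gw.ana_group);
  int remaining = (it == namespaces_per_group.end()) ? 0 : it->second;
  assert(remaining >= 0);  // a namespace count can never be negative
  return remaining == 0;
}
```

In this model, as long as `try_finish_delete` returns false for some gateway, that gateway would keep contributing to the NVMEOF_GATEWAY_DELETING warning.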

src/mon/NVMeofGwMap.cc

Lines changed: 14 additions & 0 deletions
```diff
@@ -899,6 +899,7 @@ void NVMeofGwMap::get_health_checks(health_check_map_t *checks) const
 {
   list<string> singleGatewayDetail;
   list<string> gatewayDownDetail;
+  list<string> gatewayInDeletingDetail;
   for (const auto& created_map_pair: created_gws) {
     const auto& group_key = created_map_pair.first;
     auto& group = group_key.second;
@@ -915,6 +916,10 @@ void NVMeofGwMap::get_health_checks(health_check_map_t *checks) const
         ostringstream ss;
         ss << "NVMeoF Gateway '" << gw_id << "' is unavailable." ;
         gatewayDownDetail.push_back(ss.str());
+      } else if (gw_created.availability == gw_availability_t::GW_DELETING) {
+        ostringstream ss;
+        ss << "NVMeoF Gateway '" << gw_id << "' is in deleting state." ;
+        gatewayInDeletingDetail.push_back(ss.str());
       }
     }
   }
@@ -934,6 +939,15 @@ void NVMeofGwMap::get_health_checks(health_check_map_t *checks) const
       ss.str(), gatewayDownDetail.size());
     d.detail.swap(gatewayDownDetail);
   }
+  if (!gatewayInDeletingDetail.empty()) {
+    ostringstream ss;
+    ss << gatewayInDeletingDetail.size() << " gateway(s) are in deleting state"
+      << "; namespaces are automatically balanced across remaining gateways, "
+      << "this should take a few minutes.";
+    auto& d = checks->add("NVMEOF_GATEWAY_DELETING", HEALTH_WARN,
+      ss.str(), gatewayInDeletingDetail.size());
+    d.detail.swap(gatewayInDeletingDetail);
+  }
 }
 
 int NVMeofGwMap::blocklist_gw(
```
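The pattern used in `get_health_checks()` is to accumulate one detail string per matching gateway, then raise a single aggregated check whose count equals the number of details. The sketch below reproduces that pattern outside the Ceph tree so it compiles on its own (C++17). `Availability`, `HealthCheck`, `HealthChecks`, and `add_deleting_check` are hypothetical stand-ins, not Ceph APIs; the summary string is shortened, and the severity (HEALTH_WARN in the real code) is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the monitor's types; the real code uses
// health_check_map_t, HEALTH_WARN, and gw_availability_t from the Ceph tree.
enum class Availability { Available, Unavailable, Deleting };

struct HealthCheck {
  std::string summary;
  std::size_t count = 0;
  std::list<std::string> detail;
};

using HealthChecks = std::map<std::string, HealthCheck>;

// Collect one detail line per deleting gateway, then emit a single
// aggregated check only if at least one gateway is deleting.
void add_deleting_check(
    const std::vector<std::pair<std::string, Availability>>& gateways,
    HealthChecks* checks) {
  assert(checks != nullptr);
  std::list<std::string> deleting_detail;
  for (const auto& [gw_id, availability] : gateways) {
    if (availability == Availability::Deleting) {
      std::ostringstream ss;
      ss << "NVMeoF Gateway '" << gw_id << "' is in deleting state.";
      deleting_detail.push_back(ss.str());
    }
  }
  if (deleting_detail.empty()) {
    return;  // no check is raised when nothing is being deleted
  }
  std::ostringstream ss;
  ss << deleting_detail.size() << " gateway(s) are in deleting state";
  HealthCheck& d = (*checks)["NVMEOF_GATEWAY_DELETING"];
  d.summary = ss.str();
  d.count = deleting_detail.size();
  d.detail.swap(deleting_detail);  // move the details without copying strings
}
```

Building the detail list first and swapping it into the check avoids copying the strings, mirroring the `d.detail.swap(...)` call in the diff. Unavailable gateways are ignored here because the real code reports them under a separate check.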
