Problem Description
In a stretched cluster configuration, both Namespaces and Gateways (GWs) are created with an associated location tag (for example, SiteA, SiteB).
When the location of a Gateway or a Namespace is modified, an automatic rebalancing process is triggered. This process is responsible for redistributing namespaces across ANA groups to maintain location-aware load balancing.
Rebalancing Limitation
A corner case arises when the last Gateway representing a specific location (e.g., SiteA) changes its location. In this scenario, the rebalancing process cannot select a suitable load-balancing (ANA) group for namespaces that are still tagged with the original location (SiteA). As a result:
These namespaces remain associated with their existing ANA group.
They effectively become homeless from a location-aware perspective.
The system state becomes non-obvious to the user.
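The corner case above can be illustrated with a minimal sketch. The `Gateway` and `Namespace` types and the `find_homeless_namespaces` function are hypothetical names used only for illustration; they are not the gateway's actual data model. The sketch assumes a namespace is "homeless" when no Gateway carries its location tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    gw_id: str
    anagrp_id: int
    location: str


@dataclass(frozen=True)
class Namespace:
    nsid: int
    anagrp_id: int
    location: str


def find_homeless_namespaces(gateways, namespaces):
    """Return namespaces whose location tag matches no Gateway location."""
    served_locations = {gw.location for gw in gateways}
    return [ns for ns in namespaces if ns.location not in served_locations]


# The last SiteA Gateway has been re-tagged to SiteB, so nothing serves SiteA.
gateways = [
    Gateway("gw-1", anagrp_id=1, location="SiteB"),
    Gateway("gw-2", anagrp_id=2, location="SiteB"),
]
namespaces = [
    Namespace(nsid=1, anagrp_id=1, location="SiteA"),
    Namespace(nsid=2, anagrp_id=2, location="SiteB"),
]

homeless = find_homeless_namespaces(gateways, namespaces)
print([ns.nsid for ns in homeless])
```

In this example namespace 1 stays in ANA group 1 even though no Gateway is tagged SiteA any more, which is exactly the state the current CLI does not reveal.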
Current CLI Limitation
The existing nvme-gw show CLI provides only aggregate namespace counts per Gateway, for example:
{
    "gw-id": "654abc50dd67",
    "anagrp-id": 3,
    "location": "SiteB",
    "admin-state": "ENABLED",
    "num-namespaces": 18,
    "performed-full-startup": 1,
    "availability": "AVAILABLE",
    "num-listeners": 3,
    "ana-states": "1: WAIT_BLOCKLIST_CMPL, 2: STANDBY, 3: ACTIVE"
}
This output does not expose how namespaces are distributed by location, which makes it difficult to:
Diagnose rebalance failures
Understand why namespaces remain attached to a given Gateway
Explain why certain administrative operations fail
Proposed Enhancement
Each Gateway has sufficient internal knowledge of its namespaces and their location tags. Using this information, the Gateway should expose a location-to-namespace count map.
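A minimal sketch of how such a map could be built is shown here. It assumes the Gateway can enumerate its namespaces as `(lb_group_id, location)` pairs; that input shape and the function name `namespaces_per_location` are assumptions made for illustration, not existing gateway code.

```python
from collections import Counter, defaultdict


def namespaces_per_location(namespaces):
    """Build {lb_group_id: {location: namespace_count}} from (lb_group_id, location) pairs."""
    counts = defaultdict(Counter)
    for lb_group_id, location in namespaces:
        counts[lb_group_id][location] += 1
    # Convert to plain dicts so the result serializes cleanly (e.g. to JSON).
    return {group: dict(by_location) for group, by_location in counts.items()}


# Hypothetical input: LB group 3 holds namespaces from two locations.
sample = [(3, "SiteB")] * 13 + [(3, "SiteA")] * 5 + [(1, "SiteC")] * 15
location_map = namespaces_per_location(sample)
print(location_map)
```

Keeping the counts keyed by LB group first makes it straightforward to print one block per group and to spot any group that holds namespaces from a location other than its owner's.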
Example
Suppose the Gateway that owns LB group 3 hosts 18 namespaces in total, distributed across two locations:
SiteA: 5 namespaces
SiteB: 13 namespaces
The new CLI would then print the following for all LB groups:
LBGroup 1:
    Native Location: SiteC
    Namespaces:
        Location    number-namespaces
        SiteC       15
LBGroup 2:
    Native Location: SiteD
    Namespaces:
        Location    number-namespaces
        SiteD       15
LBGroup 3:
    Native Location: SiteB (this is the location of the GW that owns the LB group)
    Namespaces:
        Location    number-namespaces
        SiteB       13
        SiteA       5
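A report in roughly this shape could be rendered as sketched here. The function `format_lb_group_report` and both of its inputs (a map from LB group to native location, and a map from LB group to per-location counts) are hypothetical; the exact column widths and ordering are a guess at the intended layout, not a specification.

```python
def format_lb_group_report(native_locations, counts):
    """Render one block per LB group, listing the native location's row first."""
    lines = []
    for group in sorted(counts):
        native = native_locations.get(group, "<unknown>")
        lines.append(f"LBGroup {group}:")
        lines.append(f"    Native Location: {native}")
        lines.append("    Namespaces:")
        lines.append(f"        {'Location':<12}number-namespaces")
        # Native location first, then the others by descending namespace count.
        ordered = sorted(
            counts[group].items(),
            key=lambda item: (item[0] != native, -item[1], item[0]),
        )
        for location, count in ordered:
            lines.append(f"        {location:<12}{count}")
    return "\n".join(lines)


report = format_lb_group_report({3: "SiteB"}, {3: {"SiteA": 5, "SiteB": 13}})
print(report)
```

Any row whose location differs from the group's native location then stands out as a candidate homeless or not-yet-rebalanced set of namespaces.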
The native location can be cached from the nvme-gw show output, as is already done for other GW commands; see the helper function get_ana_grp_location(self) defined in cephutils.
Benefits
Makes rebalance anomalies immediately visible
Clearly exposes homeless namespaces
Additional Use Case: Gateway Deletion
Another problematic scenario occurs when a Gateway that owns an ANA group containing namespaces from multiple locations receives a Delete Gateway command.
Current Behavior
The Gateway cannot be deleted while it hosts namespaces.
The user receives a generic failure with limited diagnostic information.
The reason for the failure is not obvious.
Proposed Behavior
The new CLI output allows the system to clearly explain why deletion is blocked:
The Gateway still hosts namespaces associated with locations that have no alternative Gateways.
These namespaces must either be moved to a different location or be deleted.
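The check behind such an explanation might look like the following sketch. It assumes the caller already has a map of Gateway IDs to locations and the per-location namespace counts for the Gateway being deleted; `deletion_blockers`, `explain_blocked_deletion`, and the message wording are hypothetical and only illustrate the idea.

```python
def deletion_blockers(gw_id, gateway_locations, namespace_counts):
    """Return {location: count} for hosted namespaces that no other Gateway's location covers."""
    remaining = {loc for other, loc in gateway_locations.items() if other != gw_id}
    return {
        loc: count
        for loc, count in namespace_counts.items()
        if count > 0 and loc not in remaining
    }


def explain_blocked_deletion(gw_id, gateway_locations, namespace_counts):
    """Build a diagnostic message, or return None if no location is left without a Gateway."""
    blockers = deletion_blockers(gw_id, gateway_locations, namespace_counts)
    if not blockers:
        return None
    details = ", ".join(f"{count} tagged {loc}" for loc, count in sorted(blockers.items()))
    return (
        f"Cannot delete gateway {gw_id}: namespaces with no alternative gateway "
        f"({details}); change their location or delete them."
    )


# Hypothetical cluster: gw-a hosts 13 SiteB and 5 SiteA namespaces, and no Gateway is tagged SiteA.
gateway_locations = {"gw-a": "SiteB", "gw-b": "SiteB", "gw-c": "SiteC"}
message = explain_blocked_deletion("gw-a", gateway_locations, {"SiteB": 13, "SiteA": 5})
print(message)
```

Here only the SiteA namespaces are reported, because another SiteB Gateway remains available for the SiteB ones.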