To add new gw cli namespace-location #1744

@leonidc

Problem Description

In a stretched cluster configuration, both Namespaces and Gateways (GWs) are created with an associated location tag (for example, SiteA, SiteB).

When the location of a Gateway or a Namespace is modified, an automatic rebalancing process is triggered. This process is responsible for redistributing namespaces across ANA groups to maintain location-aware load balancing.

Rebalancing Limitation

A corner case arises when the last Gateway representing a specific location (e.g., SiteA) changes its location. In this scenario, the rebalancing process cannot select a suitable load-balancing (ANA) group for namespaces that are still tagged with the original location (SiteA). As a result:

These namespaces remain associated with their existing ANA group.
They effectively become homeless from a location-aware perspective.
The system state becomes non-obvious to the user.
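
The corner case above can be illustrated with a minimal Python sketch. It assumes a simplified model in which each namespace and each Gateway carries a single location string; the function name and data layout are hypothetical and do not come from the gateway code base.

```python
from typing import Dict, Iterable, List, Set


def find_homeless_namespaces(
    namespace_locations: Dict[int, str],
    gateway_locations: Iterable[str],
) -> List[int]:
    """Return the IDs of namespaces whose location tag matches no Gateway.

    Hypothetical helper: the real rebalancing logic is not shown in this issue.
    """
    served: Set[str] = set(gateway_locations)
    return sorted(
        nsid for nsid, location in namespace_locations.items()
        if location not in served
    )


# Illustrative data: the last SiteA Gateway has just moved to SiteB.
namespaces = {1: "SiteA", 2: "SiteB", 3: "SiteA"}
gateways = ["SiteB", "SiteB"]
homeless = find_homeless_namespaces(namespaces, gateways)
print(homeless)
```

In this model, namespaces 1 and 3 keep their SiteA tag while no Gateway serves SiteA, which is the state that the current CLI does not make visible.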

Current CLI Limitation

The existing nvme-gw show CLI provides only aggregate namespace counts per Gateway, for example:

{
    "gw-id": "654abc50dd67",
    "anagrp-id": 3,
    "location": "SiteB",
    "admin-state": "ENABLED",
    "num-namespaces": 18,
    "performed-full-startup": 1,
    "availability": "AVAILABLE",
    "num-listeners": 3,
    "ana-states": "1: WAIT_BLOCKLIST_CMPL, 2: STANDBY, 3: ACTIVE"
}

This output does not expose how namespaces are distributed by location, which makes it difficult to:

Diagnose rebalance failures

Understand why namespaces remain attached to a given Gateway

Explain why certain administrative operations fail

Proposed Enhancement

Each Gateway has sufficient internal knowledge of its namespaces and their location tags. Using this information, the Gateway should expose a location-to-namespace count map.
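
One possible way to build such a map is sketched below in Python. The `Namespace` tuple and the `location_counts_by_group` function are assumptions made for illustration; the Gateway's actual internal data structures are not described in this issue.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple


class Namespace(NamedTuple):
    """Hypothetical, simplified view of a namespace record."""
    nsid: int
    lb_group: int
    location: str


def location_counts_by_group(
    namespaces: Iterable[Namespace],
) -> Dict[int, Dict[str, int]]:
    """Map each LB group ID to a {location: namespace count} dictionary."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for ns in namespaces:
        counts[ns.lb_group][ns.location] += 1
    # Convert to plain dicts so the result is easy to serialize.
    return {group: dict(per_location) for group, per_location in counts.items()}


# Illustrative data: LB group 3 hosts 13 SiteB and 5 SiteA namespaces.
sample = [Namespace(i, 3, "SiteB") for i in range(13)]
sample += [Namespace(100 + i, 3, "SiteA") for i in range(5)]
group_counts = location_counts_by_group(sample)
print(group_counts)
```

The per-group totals in this map sum to the `num-namespaces` value that `nvme-gw show` already reports, so the new output refines the existing count rather than replacing it.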

Example

If the Gateway that owns LB group 3 hosts 18 namespaces in total, distributed across two locations:

SiteA: 5 namespaces
SiteB: 13 namespaces

the new CLI will print the following for all LB groups:

LBGroup 1:
  Native Location: SiteC
  Namespaces:
    Location  number-namespaces
    SiteC     15

LBGroup 2:
  Native Location: SiteD
  Namespaces:
    Location  number-namespaces
    SiteD     15

LBGroup 3:
  Native Location: SiteB (the location of the Gateway that owns this LB group)
  Namespaces:
    Location  number-namespaces
    SiteB     13
    SiteA     5
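
A report in roughly this layout could be rendered from a location-to-count map as in the following Python sketch. The function name, its arguments, and the choice to list the native location first are assumptions for illustration, not the proposed implementation.

```python
from typing import Dict, List


def format_namespace_locations(
    native_locations: Dict[int, str],
    counts: Dict[int, Dict[str, int]],
) -> str:
    """Render one text section per LB group (hypothetical formatter)."""
    lines: List[str] = []
    for group in sorted(native_locations):
        native = native_locations[group]
        lines.append(f"LBGroup {group}:")
        lines.append(f"Native Location: {native}")
        lines.append("Namespaces:")
        lines.append("Location number-namespaces")
        per_location = counts.get(group, {})
        # List the native location first, then the rest alphabetically.
        ordered = sorted(per_location, key=lambda loc: (loc != native, loc))
        for loc in ordered:
            lines.append(f"{loc} {per_location[loc]}")
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip()


report = format_namespace_locations(
    {1: "SiteC", 3: "SiteB"},
    {1: {"SiteC": 15}, 3: {"SiteA": 5, "SiteB": 13}},
)
print(report)
```

Any row whose location differs from the group's native location points at namespaces that rebalancing could not move.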

The native location can be cached from the output of the nvme-gw show command, as is already done for other GW commands.
See the helper function get_ana_grp_location(self) defined in cephutils.

Benefits

Makes rebalance anomalies immediately visible

Clearly exposes homeless namespaces

Additional Use Case: Gateway Deletion

Another problematic scenario occurs when a Gateway that owns an ANA group containing namespaces from multiple locations receives a Delete Gateway command.

Current Behavior

The Gateway cannot be deleted while it hosts namespaces.
The user receives a generic failure with limited diagnostic information.
The reason for the failure is not obvious.

Proposed Behavior

The new CLI output allows the system to explain clearly why deletion is blocked:
the Gateway still hosts namespaces associated with locations that have no alternative Gateways.

Each of these namespaces must either be moved to another location or be deleted.
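
As a rough sketch of how such an explanation could be derived from the location-to-count map, consider the Python function below. Its name, the message text, and the rule that a location is "covered" when at least one other Gateway carries it are assumptions, not existing gateway behavior.

```python
from typing import Dict, Iterable, Optional


def explain_delete_block(
    hosted_counts: Dict[str, int],
    other_gateway_locations: Iterable[str],
) -> Optional[str]:
    """Return a diagnostic message, or None if no namespaces are stranded.

    hosted_counts: {location: namespace count} for the Gateway being deleted.
    other_gateway_locations: locations of the remaining Gateways.
    """
    available = set(other_gateway_locations)
    stranded = {
        loc: count
        for loc, count in hosted_counts.items()
        if count > 0 and loc not in available
    }
    if not stranded:
        return None
    details = ", ".join(f"{loc}: {count}" for loc, count in sorted(stranded.items()))
    return (
        "Cannot delete gateway: it still hosts namespaces from locations "
        f"with no alternative gateway ({details}). "
        "Change their location or delete them first."
    )


# Illustrative data: only SiteB has another Gateway available.
message = explain_delete_block({"SiteB": 13, "SiteA": 5}, ["SiteB"])
print(message)
```

With the sample data, only the SiteA namespaces are reported, which tells the user exactly which namespaces to relocate or delete before retrying.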
