vSphere 7 and 8 not detecting namespaces and devices to be used as VMFS datastores (resolved, Ceph version 19.2.3 squid (stable)) #1753

@zouhairnador

ESXi 8.0 and 7.0 fail to discover multiple NVMe-oF namespaces on Ceph (SPDK) because their ANA groups are in the "inaccessible" state.
The Issue: In the Ceph NVMe-oF gateway, adding multiple namespaces (NSIDs 2, 3, 4, etc.) automatically assigns them to new ANA groups (load-balancing groups). By default, the listener (the IP entry point) may mark only ANA group 1 as optimized. ESXi connects to the target successfully but discovers 0 devices (or only the first one) because the controller reports the paths to groups 2, 3, and 4 as inaccessible.
The Solution: You must manually align the listener's ANA groups with the ANA groups your namespaces use, setting each group's state to optimized through the SPDK RPC interface.


🛠 The "What Worked" Guide

  1. Verify the Disconnect (Container Side)
    Access your NVMe-oF gateway container and check the subsystem configuration. Confirm that your namespaces are assigned to incremental anagrpid values.
    Bash

    # Enter the container shell
    podman exec -it <container_id> /bin/sh

    # Check the subsystem
    /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_get_subsystems

    Result: Look for "anagrpid": 1, 2, 3, 4, etc. under the "namespaces" list.
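
With more than a handful of namespaces, picking the group IDs out of the JSON by eye gets tedious. Here is a minimal sketch, assuming grep and sort are available wherever you run it. The function name list_ana_groups is made up for illustration. It matches text rather than parsing JSON, so treat it as a quick check only:

```shell
#!/bin/sh
# Hypothetical helper: print each distinct anagrpid value found in
# nvmf_get_subsystems JSON read from stdin, one per line, sorted numerically.
list_ana_groups() {
    # First grep isolates every  "anagrpid": N  pair, second keeps only N.
    grep -o '"anagrpid"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
        | grep -o '[0-9][0-9]*$' \
        | sort -n -u
}
```

Inside the container it could be used as: /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_get_subsystems | list_ana_groups. Note that it lists groups across all subsystems in the output, not just one NQN.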

  2. Map the Namespaces (Positional Syntax)
    If namespaces are missing from the subsystem, add them. In this specific SPDK version, the nvmf_subsystem_add_ns command takes the NQN and the bdev name as positional arguments (no flags).
    Example using random IDs:
    Bash

    # Syntax: nvmf_subsystem_add_ns [NQN] [BDEV_NAME]
    /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_add_ns nqn.2016-06.io.spdk:target01 bdev_uuid_random_a1b2c3d4
    /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_add_ns nqn.2016-06.io.spdk:target01 bdev_uuid_random_e5f6g7h8
    /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_add_ns nqn.2016-06.io.spdk:target01 bdev_uuid_random_i9j0k1l2
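
To see which bdevs still need mapping before running add_ns, something like the following may help. It is a rough sketch: the helper name missing_bdevs is hypothetical, and it assumes the nvmf_get_subsystems output (saved to a file) lists each namespace with a "bdev_name" field, which you should confirm against your own output first.

```shell
#!/bin/sh
# Hypothetical helper: print the wanted bdev names that do NOT appear as a
# "bdev_name" value in a saved nvmf_get_subsystems JSON file.
# Usage: missing_bdevs subsystems.json bdev1 bdev2 ...
missing_bdevs() {
    json_file=$1
    shift
    for bdev in "$@"; do
        # Isolate every  "bdev_name": "..."  pair, then look for the exact
        # quoted name as a fixed string (-F), quietly (-q).
        if ! grep -o '"bdev_name"[[:space:]]*:[[:space:]]*"[^"]*"' "$json_file" \
            | grep -q -F "\"$bdev\""; then
            echo "$bdev"
        fi
    done
}
```

For example, after saving the output with /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_get_subsystems > /tmp/subsystems.json, calling missing_bdevs /tmp/subsystems.json bdev_uuid_random_a1b2c3d4 bdev_uuid_random_e5f6g7h8 prints only the names that are not yet mapped.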

  3. Open the "Gates" (Flag-based Syntax)
    The Listener requires Flag-based Syntax to update the ANA state. This is the crucial step to make the disks visible to ESXi. You must repeat this for every ANA Group ID used by your namespaces.
    Example using Gateway IP 10.10.10.50:
    Bash

    # Enable Group 1 (ensure it is not "inaccessible")
    /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a 10.10.10.50 -s 4420 -f ipv4 -g 1 nqn.2016-06.io.spdk:target01

    # Enable Groups 2, 3, and 4
    /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a 10.10.10.50 -s 4420 -f ipv4 -g 2 nqn.2016-06.io.spdk:target01
    /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a 10.10.10.50 -s 4420 -f ipv4 -g 3 nqn.2016-06.io.spdk:target01
    /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a 10.10.10.50 -s 4420 -f ipv4 -g 4 nqn.2016-06.io.spdk:target01
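
Since the same command is repeated once per group, a small loop can generate it. The sketch below only prints the commands (a dry run) so they can be reviewed first. The function name print_ana_commands and the RPC, NQN, ADDR, and PORT variables are illustrative, with defaults copied from the example values in this guide:

```shell
#!/bin/sh
# Illustrative defaults taken from the example values above; override as needed.
RPC="${RPC:-/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock}"
NQN="${NQN:-nqn.2016-06.io.spdk:target01}"
ADDR="${ADDR:-10.10.10.50}"
PORT="${PORT:-4420}"

# Print (do not execute) one set_ana_state command per ANA group ID argument.
print_ana_commands() {
    for grp in "$@"; do
        echo "$RPC nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a $ADDR -s $PORT -f ipv4 -g $grp $NQN"
    done
}

# Dry run for groups 1 through 4.
print_ana_commands 1 2 3 4
```

Once the printed lines look right, they can be pasted into the container shell or piped to sh there. Printing rather than executing keeps a typo in the NQN or address from touching a live listener.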

  4. Refresh ESXi
    Run these commands on the ESXi host to force a clean discovery of the now-optimized paths:
    Bash

    # 1. Disconnect the current session
    esxcli nvme fabrics disconnect -a vmhba64 -s nqn.2016-06.io.spdk:target01

    # 2. Reconnect to the gateway
    esxcli nvme fabrics connect -i 10.10.10.50 -p 4420 -a vmhba64 -s nqn.2016-06.io.spdk:target01

    # 3. List the devices
    esxcli nvme device list


💡 Expert Advice
• Command Syntax Inconsistency: Within the same spdk_rpc tool, nvmf_subsystem_add_ns takes its NQN and bdev name as positional arguments, while nvmf_subsystem_listener_set_ana_state requires flags (-n, -g, -a). Mixing these styles usually results in "Invalid Parameters" or "Unrecognized Arguments" errors.
• Persistent Config: Changes made via spdk_rpc inside the container are not persistent. Once verified, apply them via the ceph nvmeof CLI at the host level (e.g., ceph nvmeof subsystem listener add ... --ana-group 2) so they survive gateway restarts.
• Proactive Setup: When scaling NVMe-oF namespaces for ESXi, pre-configure your Listeners to support a range of ANA Groups (1-32) to avoid manual intervention every time a new volume is mapped.
