ESXi 8.0 and 7.0 Fail to Discover Multiple NVMe-oF Namespaces on Ceph (SPDK) due to ANA Group "Inaccessible" State.
The Issue: In the Ceph NVMe-oF gateway, adding multiple namespaces (NSIDs 2, 3, 4, etc.) automatically assigns them to new ANA Groups (Load Balancing groups). By default, the Listener (the IP entry point) may only mark ANA Group 1 as active/optimized. ESXi connects successfully to the target but discovers 0 devices (or only the first one) because the paths to Groups 2, 3, and 4 are reported as inaccessible by the controller.
The Solution: You must manually align the Listener's "Allowed Groups" with the "Namespace Groups" using the SPDK RPC interface to flip their state to optimized.
🛠 The "What Worked" Guide
1. Verify the Disconnect (Container Side)
Access your NVMe-oF gateway container and check the subsystem configuration. Confirm that your namespaces are assigned to incrementing anagrpid values.
Bash
# Enter the container shell
podman exec -it <container_id> /bin/sh
# Check the subsystem
/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_get_subsystems
Result: Look for "anagrpid" values of 1, 2, 3, 4, etc. under the "namespaces" list.
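To avoid scanning the JSON by eye, the group IDs can be pulled out with a small filter. This is only a sketch: list_ana_groups is a hypothetical helper name, and it assumes the RPC output contains fields of the form "anagrpid": <number> as described above. It uses only grep and sort, so it should also run in a minimal container shell.

```shell
# Print the distinct ANA group IDs found in nvmf_get_subsystems JSON on stdin.
# Assumption: group IDs appear as "anagrpid": <number> in the output.
list_ana_groups() {
    grep -o '"anagrpid"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
        | grep -o '[0-9][0-9]*$' \
        | sort -n -u
}

# Example (inside the gateway container):
# /usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_get_subsystems | list_ana_groups
```

Every ID printed here is a group that needs its listener state set in step 3.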
2. Map the Namespaces (Positional Syntax)
If namespaces are missing from the subsystem, add them. In this specific SPDK version, the add_ns command uses Positional Syntax (no flags).
Example using random IDs:
Bash
# Syntax: nvmf_subsystem_add_ns [NQN] [BDEV_NAME]
/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_add_ns nqn.2016-06.io.spdk:target01 bdev_uuid_random_a1b2c3d4
/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_add_ns nqn.2016-06.io.spdk:target01 bdev_uuid_random_e5f6g7h8
/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_add_ns nqn.2016-06.io.spdk:target01 bdev_uuid_random_i9j0k1l2
3. Open the "Gates" (Flag-based Syntax)
The Listener requires Flag-based Syntax to update the ANA state. This is the crucial step to make the disks visible to ESXi. You must repeat this for every ANA Group ID used by your namespaces.
Example using Gateway IP 10.10.10.50:
Bash
# Enable Group 1 (ensure it is not "inaccessible")
/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a 10.10.10.50 -s 4420 -f ipv4 -g 1 nqn.2016-06.io.spdk:target01
# Enable Groups 2, 3, and 4
/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a 10.10.10.50 -s 4420 -f ipv4 -g 2 nqn.2016-06.io.spdk:target01
/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a 10.10.10.50 -s 4420 -f ipv4 -g 3 nqn.2016-06.io.spdk:target01
/usr/local/bin/spdk_rpc -s /var/tmp/spdk.sock nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a 10.10.10.50 -s 4420 -f ipv4 -g 4 nqn.2016-06.io.spdk:target01
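Since the same command is repeated once per group, a small loop can generate it. The sketch below is an illustration, not part of the gateway tooling: set_groups_optimized, SPDK_RPC, SPDK_SOCK, and RUN are hypothetical names introduced here. By default it only prints the commands so they can be reviewed first; setting RUN=1 executes them. It assumes the NQN, address, and port contain no spaces.

```shell
# Paths taken from the commands above; override via environment if yours differ.
SPDK_RPC="${SPDK_RPC:-/usr/local/bin/spdk_rpc}"
SPDK_SOCK="${SPDK_SOCK:-/var/tmp/spdk.sock}"

# Usage: set_groups_optimized <NQN> <LISTENER_IP> <PORT> <GROUP_ID>...
# Prints one nvmf_subsystem_listener_set_ana_state call per group ID,
# or runs each call when RUN=1 is set in the environment.
set_groups_optimized() {
    nqn="$1"; addr="$2"; port="$3"
    shift 3
    for grp in "$@"; do
        cmd="$SPDK_RPC -s $SPDK_SOCK nvmf_subsystem_listener_set_ana_state -n optimized -t tcp -a $addr -s $port -f ipv4 -g $grp $nqn"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd          # relies on word splitting; values must not contain spaces
        else
            echo "$cmd"   # dry run: show what would be executed
        fi
    done
}

# Dry run for groups 1 to 4:
# set_groups_optimized nqn.2016-06.io.spdk:target01 10.10.10.50 4420 1 2 3 4
```

If seq is available in the container, a wider range can be passed as $(seq 1 32) in place of the explicit group list.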
4. Refresh ESXi
Run these commands on the ESXi host to force a clean discovery of the now-optimized paths:
Bash
# 1. Disconnect the current session
esxcli nvme fabrics disconnect -a vmhba64 -s nqn.2016-06.io.spdk:target01
# 2. Reconnect to the gateway
esxcli nvme fabrics connect -i 10.10.10.50 -p 4420 -a vmhba64 -s nqn.2016-06.io.spdk:target01
# 3. List the devices
esxcli nvme device list
💡 Expert Advice
• Command Syntax Inconsistency: Be aware that within the same spdk_rpc tool, nvmf_subsystem_add_ns often uses positional arguments, while nvmf_subsystem_listener_set_ana_state mandates flags (-n, -g, -a). Mixing these styles usually results in "Invalid Parameters" or "Unrecognized Arguments" errors.
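• Persistent Config: Commands run via spdk_rpc inside the container are volatile and are lost when the gateway restarts. Once verified, apply the same changes through the ceph nvmeof CLI at the host level (e.g., ceph nvmeof subsystem listener add ... --ana-group 2) so they survive gateway restarts. Treat that command as illustrative and check the CLI help for your Ceph release, since subcommands and flags may differ between versions.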
• Proactive Setup: When scaling NVMe-oF namespaces for ESXi, pre-configure your Listeners to support a range of ANA Groups (1-32) to avoid manual intervention every time a new volume is mapped.