@tvegas1 commented Oct 7, 2025

What?

When selecting the RoCE GID, search for a non-link-local IPv6 GID before falling back to a link-local IPv6 GID.
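The intended preference order can be sketched as a two-pass scan over a port's GID table. The first pass accepts only IPv6 GIDs outside fe80::/10, and the second pass falls back to link-local ones. This is a minimal sketch under stated assumptions, not the UCX implementation: the `gid_entry_t` type and the `make_entry()` and `pick_ipv6_gid()` helpers are hypothetical, a flat array stands in for the device's GID table, and IPv4-mapped (`::ffff:a.b.c.d`) and all-zero entries are simply treated as "not IPv6".

```c
#include <arpa/inet.h>   /* inet_pton */
#include <assert.h>
#include <netinet/in.h>  /* struct in6_addr, IN6_IS_ADDR_* macros */
#include <sys/socket.h>  /* AF_INET6 */

/* Hypothetical stand-in for one row of a port's GID table. */
typedef struct {
    int             index;        /* GID index, as UCX_IB_GID_INDEX would name it */
    int             roce_version; /* 1 or 2 */
    struct in6_addr addr;         /* the 16 GID bytes viewed as an IPv6 address */
} gid_entry_t;

/* Hypothetical helper: build a table row from a textual IPv6 address. */
static gid_entry_t make_entry(int index, int roce_version, const char *text)
{
    gid_entry_t e;
    int ok;

    e.index        = index;
    e.roce_version = roce_version;
    ok             = inet_pton(AF_INET6, text, &e.addr);
    assert(ok == 1); /* the illustration only uses valid addresses */
    (void)ok;
    return e;
}

/* An "IPv6 GID" here: not all-zero and not IPv4-mapped. */
static int gid_is_ipv6(const struct in6_addr *a)
{
    return !IN6_IS_ADDR_UNSPECIFIED(a) && !IN6_IS_ADDR_V4MAPPED(a);
}

/* Pass 0 accepts only non-link-local IPv6 GIDs of the requested RoCE
 * version; pass 1 also accepts link-local (fe80::/10) ones.
 * Returns the selected GID index, or -1 if no IPv6 GID matches. */
static int pick_ipv6_gid(const gid_entry_t *table, int count, int roce_version)
{
    for (int allow_ll = 0; allow_ll <= 1; ++allow_ll) {
        for (int i = 0; i < count; ++i) {
            const struct in6_addr *a = &table[i].addr;

            if ((table[i].roce_version != roce_version) || !gid_is_ipv6(a)) {
                continue;
            }
            if (!allow_ll && IN6_IS_ADDR_LINKLOCAL(a)) {
                continue; /* defer link-local GIDs to the second pass */
            }
            return table[i].index;
        }
    }
    return -1;
}
```

For example, with rows `make_entry(0, 1, "fe80::1")`, `make_entry(1, 2, "fe80::1")`, `make_entry(2, 1, "2001:db8::10")` and `make_entry(3, 2, "2001:db8::10")`, asking for RoCE v2 selects index 3 rather than the link-local index 1; if only the first two rows exist, the scan falls back to index 1.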

Why?

On some clusters, only gid_index=3 is usable. Setting UCX_IB_GID_INDEX=3 manually is cumbersome and error-prone.

How?

Relevant CLI: show_gids (lists the GID table of each device and port). Tested on two clusters:

UCX_LOG_LEVEL=debug ./rfs/bin/ucx_perftest -l -t ucp_put_bw | grep gid_index

# For all interfaces (mlx5_x/roce_railsx):
UCX_NET_DEVICES=mlx5_0:1 UCX_PROTO_INFO=y ./rfs/bin/ucx_perftest -l -t ucp_put_bw
UCX_NET_DEVICES=mlx5_0:1 UCX_PROTO_INFO=y ./rfs/bin/ucx_perftest -t ucp_put_bw remote-node

@brminich requested a review from amastbaum on October 17, 2025 14:17
{UCT_IB_DEVICE_ROCE_V1, AF_INET6}
struct {
uct_ib_roce_version_info_t info;
int skip_ll; /* link-local not allowed when true */

  1. Maybe invert it to simplify: "allow_ll" or "is_ll".
  2. I'd explicitly initialize this value to 0 where applicable (lines 1086, 1088, etc.).


fixed

}
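The two review suggestions on this struct (a positively named flag and explicit initialization of every entry) can be illustrated with a small priority table. This is a hypothetical sketch: `gid_prio_t`, `gid_prio_table`, `prio_accepts()` and `gid_rank()` are invented names that only loosely mirror the `uct_ib_roce_version_info_t` entries in the diff, the RoCE version is a plain int instead of the UCT enum, and the table ordering is illustrative rather than the order UCX actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/* Hypothetical priority entry: which RoCE version and address family to
 * try, and whether an IPv6 link-local GID may match this entry. */
typedef struct {
    int roce_version; /* 1 or 2 (stand-in for the UCT enum) */
    int addr_family;  /* AF_INET or AF_INET6 */
    int allow_ll;     /* nonzero: fe80::/10 GIDs are acceptable */
} gid_prio_t;

/* Every field is spelled out, so allow_ll = 0 is explicit.
 * Each non-link-local IPv6 entry precedes its link-local fallback. */
static const gid_prio_t gid_prio_table[] = {
    {2, AF_INET,  0},
    {2, AF_INET6, 0},
    {2, AF_INET6, 1},
    {1, AF_INET,  0},
    {1, AF_INET6, 0},
    {1, AF_INET6, 1},
};

/* Does a GID with these attributes satisfy one priority entry? */
static int prio_accepts(const gid_prio_t *p, int version, int family, int is_ll)
{
    if ((p->roce_version != version) || (p->addr_family != family)) {
        return 0;
    }
    /* Link-local only matters for IPv6; reject it unless allowed. */
    return !((family == AF_INET6) && is_ll && !p->allow_ll);
}

/* Position of the first accepting entry (lower is preferred), or -1. */
static int gid_rank(int version, int family, int is_ll)
{
    size_t n = sizeof(gid_prio_table) / sizeof(gid_prio_table[0]);

    assert((version == 1) || (version == 2));
    for (size_t i = 0; i < n; ++i) {
        if (prio_accepts(&gid_prio_table[i], version, family, is_ll)) {
            return (int)i;
        }
    }
    return -1;
}
```

Naming the flag `allow_ll` keeps the condition in `prio_accepts()` free of a double negative, which is the simplification the first review comment points at.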

static int uct_ib_gid_is_ipv6_ll(const union ibv_gid *gid)
{

Can this reuse IN6_IS_ADDR_LINKLOCAL?


fixed
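Reusing the standard macro amounts to viewing the 16 raw GID bytes as a `struct in6_addr` and letting `<netinet/in.h>` perform the fe80::/10 test. A minimal sketch, assuming only that the GID is available as 16 network-order bytes (as the `raw` member of `union ibv_gid` is); the helper name `gid_bytes_are_ipv6_ll()` is hypothetical, and `memcpy` is used instead of a pointer cast to avoid alignment and aliasing assumptions.

```c
#include <assert.h>
#include <netinet/in.h>  /* struct in6_addr, IN6_IS_ADDR_LINKLOCAL */
#include <stdint.h>
#include <string.h>      /* memcpy, NULL */

/* Hypothetical helper: raw points at 16 GID bytes in network byte order,
 * e.g. the raw member of union ibv_gid. Returns 1 for fe80::/10, else 0. */
static int gid_bytes_are_ipv6_ll(const uint8_t raw[16])
{
    struct in6_addr addr;

    assert(raw != NULL);
    memcpy(&addr, raw, sizeof(addr)); /* copy: no alignment/aliasing concerns */
    return IN6_IS_ADDR_LINKLOCAL(&addr) ? 1 : 0;
}
```

Because the macro masks the first 10 bits, addresses from fe80:: through febf:: count as link-local, while fec0:: (the deprecated site-local prefix) does not.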

amastbaum previously approved these changes Oct 23, 2025