@guy-ealey-morag
Contributor

@guy-ealey-morag guy-ealey-morag commented Jan 6, 2026

What?

Currently, when UCX_PROTO_INFO is enabled, it prints a large amount of output detailing every selected protocol, most of which the application never uses. This PR adds a new value, used, to that configuration variable, which displays only the protocols that were actually used, along with their usage counters.

Why?

Useful for debugging and diagnostics

How?

  1. When request tracing is enabled, we use the existing progress wrappers to count every call to the first stage.
  2. When request tracing is disabled, we install a wrapper on the first stage only and remove it once the maximal usage count is reached, to avoid degrading performance.
  3. This new approach actually removes one branch from the fast path.
  4. Works in debug and release modes.
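
Step 2 can be pictured as a self-removing wrapper around a function pointer. The C sketch below is only an illustration under assumptions: proto_stats_t, request_t, counting_wrapper, proto_stats_init and request_progress are hypothetical names, whereas the PR itself works on UCP's own first-stage progress functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the real UCP structures. */
typedef struct request request_t;
typedef int (*progress_func_t)(request_t *req);

typedef struct {
    progress_func_t orig_progress;   /* real first-stage progress function */
    progress_func_t progress;        /* function actually invoked by callers */
    unsigned        usage_count;     /* number of counted invocations */
    unsigned        usage_count_max; /* stop counting after this many calls */
} proto_stats_t;

struct request {
    proto_stats_t *proto;
};

/* Counting wrapper installed on the first stage only. Once the maximum is
 * reached it unhooks itself, so later calls go straight to the real
 * function and pay no counting cost. */
static int counting_wrapper(request_t *req)
{
    proto_stats_t *p = req->proto;

    if (++p->usage_count >= p->usage_count_max) {
        p->progress = p->orig_progress;
    }
    return p->orig_progress(req);
}

static void proto_stats_init(proto_stats_t *p, progress_func_t func,
                             unsigned count_max)
{
    assert(count_max > 0);
    p->orig_progress   = func;
    p->progress        = counting_wrapper;
    p->usage_count     = 0;
    p->usage_count_max = count_max;
}

/* Dispatch always goes through the current function pointer. */
static int request_progress(request_t *req)
{
    return req->proto->progress(req);
}

/* Example first-stage function that just reports success. */
static int dummy_first_stage(request_t *req)
{
    (void)req;
    return 0;
}
```

With a maximum of 3, only the first three calls pass through the wrapper; after that the pointer refers to the original function again, so the common path carries no extra work.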

Performance

  1. Compared to the master branch, with UCX_PROTO_USAGE_COUNT_MAX=255 the overhead is slightly lower (~10 ns), thanks to the branch removed from the fast path.
  2. With UCX_PROTO_USAGE_COUNT_MAX=inf, performance is comparable to the master branch.

Example UCX_PROTO_INFO=used output (usage count and message-size range in the first column):
+---------------------+--------------------------------------------------------------------------------------------------+
| perftest self cfg#0 | rendezvous data fetch(multi) into host memory from host                                          |
+---------------------+--------------------------------+-----------------------------------------------------------------+
| 0                 0 | no data fetch                  |                                                                 |
| 516383      1..6293 | copy from mapped remote memory | self/memory                                                     |
| 0         6294..inf | zero-copy read from remote     | 50% on rc_mlx5/mlx5_0:1/path0 and 50% on rc_mlx5/mlx5_2:1/path0 |
+---------------------+--------------------------------+-----------------------------------------------------------------+

This PR was originally created by @iyastreb (#10395), and then tested and finalized by me.

@guy-ealey-morag guy-ealey-morag changed the title from "Ucp/proto info used" to "UCP/PROTO: UCX_PROTO_INFO=used option" on Jan 6, 2026
@brminich brminich requested review from iyastreb and tvegas1 January 6, 2026 14:40
@guy-ealey-morag guy-ealey-morag force-pushed the ucp/proto-info-used branch 2 times, most recently from 7ad21c5 to ff2daa4 on January 7, 2026 13:27
tvegas1 previously approved these changes Jan 8, 2026
Contributor

@tvegas1 tvegas1 left a comment

up to you but I was thinking if perf was a thing, we could keep proto_usage_count_max checks, without necessarily exposing it as PROTO_USAGE_COUNT_MAX.

" 'y' : Print information for all protocols\n"
" 'n' : Do not print any protocol information\n"
" 'auto' : Print information when UCX_LOG_LEVEL is 'debug' or higher\n"
" 'used' : Print information for used protocols\n"
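
The four documented values map naturally onto an enum. This is a hypothetical sketch: proto_info_mode_t and parse_proto_info are illustrative names, only the exact strings from the help text are accepted, and the actual UCX configuration parser may behave differently (for example, by accepting other spellings).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum mirroring the documented UCX_PROTO_INFO values. */
typedef enum {
    PROTO_INFO_NO,      /* 'n'    : print nothing                      */
    PROTO_INFO_YES,     /* 'y'    : print all protocols                */
    PROTO_INFO_AUTO,    /* 'auto' : print all when log level >= debug  */
    PROTO_INFO_USED,    /* 'used' : print used protocols with counters */
    PROTO_INFO_INVALID  /* anything else                               */
} proto_info_mode_t;

/* Map a configuration string to a mode; exact, case-sensitive matches only. */
static proto_info_mode_t parse_proto_info(const char *value)
{
    static const struct {
        const char        *name;
        proto_info_mode_t  mode;
    } table[] = {
        {"n",    PROTO_INFO_NO},
        {"y",    PROTO_INFO_YES},
        {"auto", PROTO_INFO_AUTO},
        {"used", PROTO_INFO_USED},
    };
    size_t i;

    assert(value != NULL);
    for (i = 0; i < sizeof(table) / sizeof(table[0]); ++i) {
        if (strcmp(value, table[i].name) == 0) {
            return table[i].mode;
        }
    }
    return PROTO_INFO_INVALID;
}
```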
Contributor

maybe we need another var now? Otherwise, how can one ask for used protocols to be printed with/without the debug log?

Contributor Author

When set to used, the used-protocols table will be printed regardless of the log level.
But currently with auto it will print all protocols when the log level is debug or higher.
We could change the auto mode to print the used protocols when debug logs are enabled (instead of all protocols) if this is going to be the common use case.
What do you think?

Contributor

it seems to be a bit confusing to me. Imo we have to manage when we want to print it and the format of the output separately. I was thinking about another env var which would define the format of the output (with values full and used).
@iyastreb wdyt?

Contributor

Honestly I don't see a problem here. Use cases I see:

  1. No configs specified, mode is auto, prints nothing
  2. LOG_LEVEL=debug, mode is auto = print ALL PS tables
  3. LOG_LEVEL=debug, PROTO_INFO=used (explicit setting), prints only used PS

So LOG_LEVEL can only influence mode if it's not set explicitly (auto).
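
These precedence rules can be written as a small decision function. In this sketch the enums, the log-level ordering and resolve_proto_info are assumptions made for illustration; the point is only that the log level is consulted solely in auto mode.

```c
#include <assert.h>

/* Hypothetical types for illustration only. */
typedef enum {
    PROTO_INFO_NO,
    PROTO_INFO_YES,
    PROTO_INFO_AUTO,
    PROTO_INFO_USED
} proto_info_mode_t;

/* Assumed ordering: higher values mean more verbose logging. */
typedef enum {
    LOG_LEVEL_WARN,
    LOG_LEVEL_INFO,
    LOG_LEVEL_DEBUG,
    LOG_LEVEL_TRACE
} log_level_t;

typedef enum {
    PRINT_NOTHING,
    PRINT_ALL_TABLES,  /* every selected protocol, as before          */
    PRINT_USED_TABLES  /* only protocols that were used, with counts  */
} proto_info_output_t;

static proto_info_output_t
resolve_proto_info(proto_info_mode_t mode, log_level_t level)
{
    switch (mode) {
    case PROTO_INFO_YES:
        return PRINT_ALL_TABLES;
    case PROTO_INFO_USED:
        /* Explicit setting: the log level is ignored. */
        return PRINT_USED_TABLES;
    case PROTO_INFO_AUTO:
        /* Only auto mode consults the log level. */
        return (level >= LOG_LEVEL_DEBUG) ? PRINT_ALL_TABLES : PRINT_NOTHING;
    case PROTO_INFO_NO:
        return PRINT_NOTHING;
    }
    assert(0 && "unknown UCX_PROTO_INFO mode");
    return PRINT_NOTHING;
}
```

For example, resolve_proto_info(PROTO_INFO_USED, LOG_LEVEL_WARN) still yields PRINT_USED_TABLES, matching use case 3, while auto with the default (non-debug) level prints nothing, matching use case 1.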

Contributor

@brminich brminich Jan 13, 2026

how would I print just protocol tables with option used (or without it)?

Contributor

like today: UCX_PROTO_INFO=y to print all PS tables
UCX_PROTO_INFO=used to print actually used tables

Contributor

ok, let's use just one var, but I suggest defining it like this:
UCX_PROTO_INFO values:
y - prints tables in "used" format regardless of LOG_LEVEL
n - does not print tables
auto - print tables in "used" format when LOG_LEVEL is debug
all - print tables in the current format (i. e. print all)

@guy-ealey-morag, @iyastreb wdyt?

Contributor

The tricky part is that used PS tables are printed only at the end of the UCP worker lifetime (if it's terminated gracefully!). So it essentially changes the behavior UCX users are used to. For example, in sglang I use this feature, but I have to print PS tables on some event because the sglang app does not terminate gracefully. So I don't know what is better. I would keep it as is and improve it in the future.

Copy link
Contributor

ok

@guy-ealey-morag
Contributor Author

up to you but I was thinking if perf was a thing, we could keep proto_usage_count_max checks, without necessarily exposing it as PROTO_USAGE_COUNT_MAX.

Removing proto_usage_count_max results in simpler logic that also avoids another branch to check whether we reached the max, so I think it's a good compromise.

@guy-ealey-morag
Contributor Author

After further testing, I found that short active messages are not counted properly because they don't go through the wrapper like other protocols do.
Counting short active messages would require adding operations to the fast path, so I decided to leave it for now.
Instead, I modified the table printing code not to print a 0, to indicate that the usage is not counted.
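
One way to read this change, assuming "not printing a 0" means leaving the count cell empty for protocols that bypass the counting wrapper, is sketched below. format_usage_cell and its is_counted flag are hypothetical, not the actual UCX table code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: fill the usage-count cell of one row in a
 * used-protocols table. Protocols whose calls are not counted (such as
 * short active messages, which do not go through the counting wrapper)
 * get an empty cell, so the row is not mistaken for "used 0 times".
 */
static void format_usage_cell(char *buf, size_t size, bool is_counted,
                              unsigned long count)
{
    assert(buf != NULL && size > 0);
    if (!is_counted) {
        buf[0] = '\0';
        return;
    }
    snprintf(buf, size, "%lu", count);
}
```

The design choice here is that a real count of zero and "not tracked" stay distinguishable in the printed table.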

Contributor

@iyastreb iyastreb left a comment

LGTM

iyastreb previously approved these changes Jan 19, 2026
tvegas1 previously approved these changes Jan 19, 2026
@guy-ealey-morag guy-ealey-morag dismissed stale reviews from tvegas1 and iyastreb via c9337e0 January 20, 2026 10:05
@brminich brminich merged commit bf1294e into openucx:master Jan 22, 2026
147 checks passed

4 participants