-
Notifications
You must be signed in to change notification settings - Fork 517
UCP/PROTO: UCX_PROTO_INFO=used option #11100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7ad21c5 to
ff2daa4
Compare
tvegas1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up to you but I was thinking if perf was a thing, we could keep proto_usage_count_max checks, without necessarily exposing it as PROTO_USAGE_COUNT_MAX.
| " 'y' : Print information for all protocols\n" | ||
| " 'n' : Do not print any protocol information\n" | ||
| " 'auto' : Print information when UCX_LOG_LEVEL is 'debug' or higher\n" | ||
| " 'used' : Print information for used protocols\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we need another var now? Otherwise, how one can ask for used protocols to be printed with/without debug log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When set to used the used protocols table will be printed regardless of the log level.
But currently with auto it will print all protocols when the log level is debug or higher.
We could change the auto mode to print the used protocols when debug logs are enabled (instead of all protocols) if this is going to be the common use case.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to be a bit confusing to me. Imo we have to manage when we want to print it and the format of the output separately. I was thinking about another env var which would define the format of the output (with values full and used).
@iyastreb wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I don't see a problem here. Use cases I see:
- No configs specified, mode is auto, prints nothing
- LOG_LEVEL=debug, mode is auto = print ALL PS tables
- LOG_LEVEL=debug, PROTO_INFO=used (explicit setting), prints only used PS
So LOG_LEVEL can only influence mode if it's not set explicitly (auto).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would I print just protocol tables with option used (or without it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like today: UCX_PROTO_INFO=y to print all PS tables
UCX_PROTO_INFO=used to print actually used tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, lets use just one var, but I suggest to define it like this:
UCX_PROTO_INFO values:
y - prints tables in "used" format regardless of LOG_LEVEL
n - does not print tables
auto - print tables in "used" format when LOG_LEVEL is debug
all - print tables in the current format (i. e. print all)
@guy-ealey-morag, @iyastreb wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tricky part is that used PS tables are printed only at the end of the UCP worker lifetime (if it's terminated gracefully!). So essentially it changes behavior UCX users get used to. For example, in sglang I use this feature but I have to print PS tables on some event, because sglang app does not terminate gracefully. So I don't know what is better.. I would keep it as is, and improve in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Removing |
|
After further testing I found out that short active messages are not counted properly because they don't use the wrapper like other protocols. |
iyastreb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
45269ef to
de9c856
Compare
c9337e0
What?
Currently when
UCX_PROTO_INFOis enabled, it shows tons of output with details of all the selected protocols, most of which is never used by the application. The idea is to introduce a new option to the config variableused, to display only used protocols (with usage counters).Why?
Useful for debugging and diagnostics
How?
Performance
UCX_PROTO_USAGE_COUNT_MAX=255the overhead is slightly lower (~10ns) due to the removed extra branch from the fast pathUCX_PROTO_USAGE_COUNT_MAX=infthe performance is comparable to the master branch.This PR was originally created by @iyastreb (#10395), and then tested and finalized by me.