- 
                Notifications
    You must be signed in to change notification settings 
- Fork 928
btl/uct: allow connections to be formed using a separate memory domain #13195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
btl/uct: allow connections to be formed using a separate memory domain #13195
Conversation
It is possible that the current memory domain does not have an adequate transport for forming endpoint to endpoint connections. When this is the case the btl will fail to function. To support these situations this CL adds support for using an alternate transport (usually tcp) which can be used to make the endpoint connections. Signed-off-by: Nathan Hjelm <[email protected]>
|  | ||
| ret = mca_btl_uct_tl_progress(module->conn_tl, 0); | ||
|  | ||
| while (NULL | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you get one of the pending connection requests from a particular module, and then somehow you select a different module based on the matching name on the connection request and move forward with that ? This makes little sense to me, because as far as I see in the code module->pending_connection_reqs only contains requests related to this particular module.
The original code was correct, it went over all the component's modules and progressed each module pending connection requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code was correct when the tl's are in the same module. I did not want to put a different memory domain's TL into the module and opted with just saving a module that only has a connection tl. The side effect of this is that I have to look up which module is using the tl for the connection request. There are probably some corner cases this breaks (like systems having inconsistent memory domain naming) so it might make sense to change how connection tls are structured entirely (will require more code changes). The common cases will work though.
Let me look through the code a bit. I wrote this awhile ago to get around UD being plain broken (since fixed in UCT). I revived it because there are concerns with both rdmacm and UD that TCP resolves. Other option is to look at moving all connections to using an OOB communication method (not using UCT at all) but since ORTE went away I am not 100% sure if this is the best way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may completely re-work this because I need to also fix MCA variables for the BTL. Right now the standard ones do nothing but show what the BTL might support. They are not actually used. I have a fix in the works but it is a fundamental redesign of the TL discovery phase.
Will provide an update early next week.
| /** transport index that should be connected */ | ||
| int tl_index; | ||
|  | ||
| char module_name[16]; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why saving the name and using strcmp all the time instead of saving the pointer to the module itself ?
| } | ||
|  | ||
| for (int i = 0; list->list[i]; ++i) { | ||
| regex_t preg; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why regex ? You compare an official (MD or TL) name to the list of include, which should only contains correct names for these objects. Is this only for case insensitivity ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably overkill but I wanted to make the matching more powerful. Originally, when I first started working with these systems the names were based on where (pci bus number, etc) the HCA was connected. Example: rocep0s4 (now simplified to irdma0 but kept the regex matching). Regex is a simple way to get explicit partial matching (using .*).
It is possible that the current memory domain does not have an adequate transport for forming endpoint to endpoint connections. When this is the case the btl will fail to function. To support these situations this CL adds support for using an alternate transport (usually tcp) which can be used to make the endpoint connections.