- 
                Notifications
    You must be signed in to change notification settings 
- Fork 68
Revert "mtl/ofi: Increase priority." #1169
Conversation
This should never have gone into 2.0.0. The ofi provider's priority should never be higher than ob1 if verbs or sockets is the provider. This reverts commit 1b5637d. Fixes open-mpi/ompi#1676
| @yburette This should not have gone into 2.0.0. | 
| :bot:assign: @jsquyres | 
| @yburette @hppritcha @matcabral Do you guys know/remember why the priority was set so high? Shouldn't OFI ensure to set the priority high only if certain libfabric providers are being used? | 
| It was set to 30 on master along with psm2, psm, portals4, and mxm. The problem was we were supposed to leave it at 10 on 2.0.0. The commit that restored the master priority of ofi to be higher than ob1 was included in a cm priority update and wasn't supposed to be. It is probably a good thing we hit this problem now because it shows we should drop the priority on master until the priority can be based on available providers (we never want to choose sockets over ob1 for example). | 
| Test PASSed. | 
| I defer to @yburette . All that I can say though is that compared to other things, OFI MTL has been rock solid. We could make priority based on the underlying provider rather than a fixed parameter. Even there, the verbs provider in OFI is maturing really rapidly. | 
| @hppritcha Ideally thats what we want but we will need ofi/verbs + cm vs ob1/verbs performance numbers before letting ofi win. | 
| @hppritcha I don't think that the verbs support in libfabric is mature enough yet. Additionally, there's the issue of shared memory support in the MTL (which is nominally handled by PSm/psm2). Does that even exist yet for the verbs libfabric provider? I thought we had some kind of include / exclude MCA parameter for libfabric providers in the fourth MTL. Do those not exist? | 
| @hppritcha What's the default "include" setting for the providers -- should it default to something like  | 
| @jsquyres @hjelmn @hppritcha Sorry guys, just saw this discussion. I thought that the original idea was to have the higher priority if, and only if, the OFI provider was in the include list. Let me take a closer look. | 
| " (we never want to choose sockets over ob1 for example)." | 
| @hjelmn Another question: which OFI provider ends up being selected? | 
| For the user (iWarp) verbs is selected (it doesn't even appear to have a matching implementation and should not win vs ob1 anyway), if verbs is disabled then sockets wins (ACK!). | 
| @hjelmn What about when you disable both? | 
| How about changing from a default of excluding providers that we know we don't want to a default of including the providers that we know that we do want? It feels like the risk is lower for including known good providers. | 
| I'd go with the including known good providers. That way we don't need to keep track of all the multitude of "layered" providers that seem to be appearing... | 
| @hjelmn @matcabral @yburette I therefore think that this PR should be closed and/or changed to set the "includes" as the default.  What should be the value:  | 
| @jsquyres @hppritcha I see your point, and I agree that now that we have the "layered" providers, it's going to become increasingly more difficult to maintain this exclude list. I think the change should be fairly trivial -- i.e. include = "psm,psm2", exclude = NULL. @hppritcha Do you want me to include the gni provider by default as well? | 
| Whether or not to include gni is @hppritcha 's call. Since it is not on any Cray platform by default it is probably ok to include but I would like to see performance numbers vs ob1 before doing that. If ob1 looses in a performance comparison then that would be an indication that ob1 needs additional work. | 
| yes please include gni. | 
| @hjelmn @hppritcha @jsquyres | 
| @hjelmn @yburette Ok, I think we're in a good state on master. What exactly do you want on v2.x? Right now (before this PR is merged), here's the current state: 
 I believe that what we want is to leave all the priorities alone (i.e., ignore/close this PR), and get a v2.x PR of open-mpi/ompi@2f0cde7 (i.e., adjust the include/excludes for the OFI MTL). Right? | 
| @yburette Yes, that would be great. Thanks! | 
This should never have gone into 2.0.0. The ofi provider's priority should
never be higher than ob1 if verbs or sockets is the provider.
This reverts commit 1b5637d.
Fixes open-mpi/ompi#1676