bml/r2: always add btl progress function #1677
This commit changes the behavior of bml/r2 from conditionally registering btl progress functions to always registering progress functions. Any progress function belonging to a btl that is not yet in use is registered as low-priority. As soon as a proc is added that will make use of the btl, it is re-registered normally.

This works around an issue with some btls. In order to progress a first message from an unknown peer, both ugni and openib need to have their progress functions called. If either btl was not in use after the first call to add_procs, its progress callback was never invoked. This commit ensures the btl progress function is called at some point, but the number of progress callbacks is reduced from normal to keep overhead low when a btl is not used. The current ratio is 1 low-priority progress callback for every 8 calls to opal_progress().

Fixes open-mpi#1676

Signed-off-by: Nathan Hjelm <[email protected]>
| @larrystevenwise Please test. | 
| @bosilca Please also review -- this touches  | 
| @bharatpotnuri Please test (this is a possible solution for #1664). | 
| Huh, I referenced the wrong bug #. Will rebase it with the correct one later. | 
| @jsquyres  Thanks @hjelmn . | 
| @bosilca please review. Thanks. | 
| Test passed. | 
| Needed for 2.0.0. Merging and opening 2.0.0 PR. | 
| Has anyone evaluated the performance impact of this change? We should stop adding atomic operations and polluting the cache in our critical path. | 
| Ran some simple tests and it looked ok. I can do it without the atomics if that is preferred. I don't really care whether the low-priority calls are made on exactly 1 of every 8 calls. | 
| @bosilca Your requested changes will make it so that only one thread is ever in opal_progress(), correct? If that is the case we should plan to remove all atomics from opal_progress(). | 
| The current version ensures that only one thread is active in opal_progress. However, we might want to be more liberal with the resources, allowing multiple threads to concurrently drain the networks. | 