WeeklyTelcon_20151215
Geoff Paulsen edited this page Dec 15, 2015 · 5 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Jeff Squyres
- Edgar Gabriel
- Geoffroy Vallee
- Howard
- Joshua Ladd
- Nathan Hjelm
- Ralph
- Ryan Grant
- Todd Kordenbrock
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.2
- 1.10.2 - still 3 PRs waiting to go into 1.10.2
- Ibarrier issue assigned to Nathan. Found through the MPICH test suite.
- Jim Sharp reported it, Ralph cleaned it up and put it into 1.10, and handed it off to Jeff S.
- Integer Overflow - assigned to George. Ralph will ping him.
- In coll/allreduce - from Jeff Hammond's BigMPI work.
- Recasts to size_t to do the math, then casts back down to int.
- Nathan - should double-check the math, since it might still overflow.
- Should evaluate these codepaths a bit more carefully.
- PR is on master, but tagged for 1.10.2.
- Jeff S. will look at it today, and may then be able to PR it to 1.10.2.
- Nathan has one more: un-mmap a pointer belonging to OSHMEM.
- One-line change; he will bring it over soon.
- Subarray 1191 on master. Jeff hasn't been following it.
- Need to hand it off to George.
- Edgar - email about ROMIO / Lustre issue.
- Issue is fixed in OMPIO on master, but not on 1.10; 1.10 OMPIO is vastly out of sync with master.
- QUESTION: should we update OMPIO on 1.10?
- Some changes are in the framework code, but pulling it over would drag in a lot of other items.
- DECISION: Let's NOT update OMPIO on 1.10.x for now, encourage people to
- After these are done, an RC will be rolled later this week.
Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- PMIx is Howard's #1 blocker right now. We need to decide what we want to do.
- Putting off supporting external PMIx in the Release Candidate.
- Distros won't pick it up if we don't support it in 2.0.0
- What can we do with PMIx for the 2.0 RC?
- Ralph - it's relatively clean. Ralph will pull PMIx 1.1.2 from master over to the OMPI 2.0 branch later today.
- News and shlib version stuff.
- Howard will do News, and share with others to review.
- Addprocs == 0 discovery.
- Running out of resources in a different way.
- Only happens with per-peer queue pairs in openib.
- Thought we'd gotten rid of those years ago; there's no performance advantage.
- Not really a blocker then; the blocker would be ensuring that we got rid of non-SRQ mode.
- Nathan will review old email and code.
- Anywhere we have free_list_wait, we get into infinite loops.
- Debugger attachment is broken on master and the 2.x branch.
- Processes don't progress until the debugger attaches, but the debugger can't attach until the proctable has been created.
- Some debuggers provide a flag that they turn on when they're attached.
- So we added code so that rank 0 won't progress until it gets an RML message from mpirun saying that the debugger has attached to mpirun.
- With PMIx we removed that RML message, which we only needed for this.
- Ralph is proposing we use PMIx error handling to get this message.
- Problem is that the PMIx error handling code is in PMIx 1.2.0, but master / 2.0 is currently at 1.1.2.
- This is an issue tagged somewhere else. Might be on the Totalview side.
- QUESTION: SHOULD we update PMIx on the master / 2.0 branch to PMIx 1.2.0?
- Jeff's been using DDT, but we missed that we broke TotalView.
- TotalView still uses MPIR_being_debugged.
- There are two different ways to tell an MPI implementation that it's being debugged.
- Ralph will try to setup a time on Doodle to further discuss.
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- RFC: remove embedded libevent and hwloc
- Protip - you can add "Fixes: ISSUE#" to a PR; when the PR is merged, it will close the issue.
- Yes, it works across GitHub repos.
- Howard, When creating a downstream PR, they add a
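As an illustration of the protip above, a PR description might look like the fragment below. The repo path and issue number here are made up for the example; GitHub's closing keywords ("fixes", "closes", "resolves") accept the `owner/repo#number` form for cross-repository references.

```
Recast intermediate math in coll/allreduce to size_t

Avoids integer overflow on large counts; checks the result
fits in int before casting back down.

Fixes open-mpi/ompi#1234
```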
- Travis testing
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, HLRS, IBM