WeeklyTelcon_20160209
Geoff Paulsen edited this page Feb 9, 2016 · 10 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Jeff Squyres
- Geoff Paulsen
- Brad Benton
- Edgar Gabriel
- Howard Pritchard
- Joshua Ladd
- Nathan Hjelm
- Nysal Jan
- Ralph
- Ryan Grant
- Sylvain Jeaugey
- Todd Kordenbrock
- Yohann Burette
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3 - Targeting April, unless there is a need.
- Nathan will look at 0 byte send issue.
- SLURM issues reported on the dev list were already fixed in 1.10.2.
- verbs usNIC not built by default - wait for review by Howard.
- Fortran 08 - Jeff will take a look today.
- SLES 12 - was a fork/exec race condition before SIGCHLD detection. Fixed.
- Long running jobs (Linpack) are still having SIGCHLD issues.
- Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
- Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- Issue 1215 https://github.com/open-mpi/ompi/pull/1335: grpcomm errors
  - Ralph is unable to replicate. Didn't see it on Trinity or elsewhere at scale. Found where the problem is, but still trying to figure out why the solution isn't working. Ralph and Jeff are in the iterating phase.
- https://github.com/open-mpi/ompi/issues/1252: bad perf caused by openib
  - Only fails if openib finds valid procs, as soon as you call ibv_poll_cq on the 2nd socket. Still around 3 ms with openib intra-node.
  - Specific to Mellanox MOFED 3.0 Verbs.
- PR 927 - needs a review from Ralph
  - (the X / test failure was due to GitHub being down; it's a false failure)
- Issue 1299 - Nathan - hang in osc pt2pt.
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- Mellanox would like to get new hcoll entry points into 2.0
- RFC to set the add_procs_cutoff to 32. PR1340 *
- --host vs. --hostfile behavior (PR1344)
- The question is how many procs to run.
- Jeff would like this to be consistent with how oversubscription works, but with no -np it runs 1 proc.
- Two issues: how many slots, and how many processes.
- Change behavior so that if the user doesn't specify -np but DOES specify --host, we'll get 1 slot (and one process).
- Keep --hostfile behavior the same as today.
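
The proposed --host rule above can be sketched as a small decision function. This is only an illustration of the rule as recorded in these notes, not Open MPI code: `LaunchPlan` and `plan_for_host_option` are hypothetical names, and the branch for an explicit -np is an assumption, since the notes do not settle the slot count in that case. The --hostfile path is left out because it is meant to stay unchanged.

```python
from typing import NamedTuple, Optional


class LaunchPlan(NamedTuple):
    """Hypothetical result type: slots made available and processes launched."""
    slots: Optional[int]  # None means the notes do not say
    procs: int


def plan_for_host_option(np: Optional[int] = None) -> LaunchPlan:
    """Model the proposed behavior when --host is given on the command line.

    np is the value of -np, or None when the user did not pass -np.
    """
    if np is not None and np < 1:
        raise ValueError("-np must be a positive integer")
    if np is None:
        # Proposed in the notes: --host without -np gives 1 slot and 1 process.
        return LaunchPlan(slots=1, procs=1)
    # Assumption: an explicit -np decides the process count.
    # The notes leave the slot count open here, so it is reported as None.
    return LaunchPlan(slots=None, procs=np)


if __name__ == "__main__":
    print(plan_for_host_option())      # --host node01
    print(plan_for_host_option(np=4))  # --host node01 -np 4
```

Keeping slots and processes as separate fields mirrors the "two issues" called out above: how many slots exist and how many processes run are decided independently.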
- LANL
- Houston
- HLRS
- IBM
- LANL, Houston, HLRS, IBM
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel