Meeting 2017 07
9am US Central time July 11 - noon US Central time July 13, 2017
Cisco, Chicago (pretty much directly next to O'Hare airport, google maps link), 9501 Technology Blvd, West Office Center, Rosemont, Illinois 60018
We are in the "Midway" conference room, which is outside Cisco reception.
Meaning: you don't have to check in with reception or get a badge.
Just take the first hallway off to your left and Midway is clearly marked immediately on the left.
There are no registration fees to attend this meeting.
Please add your name to the wiki list below if you are coming to the meeting:
- Ralph Castain (Intel)
- Jeff Squyres (Cisco)
- Brice Goglin (Inria)
- Brian Barrett (AWS) [only Tuesday and Wednesday]
- Mohan Gandhi (AWS)
- Shinji Sumimoto (Fujitsu)
- Takahiro Kawashima (Fujitsu)
- Nathan Hjelm (LANL)
- Howard Prichard (LANL)
- George Bosilca (UTK) [at least partially] (I hope we get the good half)
- Edgar Gabriel (UH)
- Artem Polyakov (Mellanox)
- Matthew Dosanjh (SNL)
- Geoff Paulsen (IBM)
- Geoffroy Vallee (ORNL)
- If you sign up after this point, be sure to let Jeff Squyres know so that he can get you a guest badge and wifi access!
Attending Remotely:
- Josh Hursey - IBM (Available from 8:30am-5pm Central) (Added a ☎️ icon next to the items I'd like to call in for, if possible)
- David Bernholdt - ORNL (around other commitments)
-
Howard's bug scrub / issue roundup
-
UCX packaging in OMPI sources (Mellanox)
- Want this in OMPI v4.0
- Configuration prerequisites
- When we turn it on (check available fabrics, tcp should be available soon, then UCX can be always on)
- How new versions are updated
- Placement inside the sources: needs to be available for both MPI and SHMEM layers.
- INITIAL Got some push-back about adding more embedded packages. Will revisit tomorrow.
- Motivation:
- We see issues on the mailing list related to bad user experience with OMPI on Mellanox fabrics for both performance and stability.
- Definitely need UCX for OSHMEM
- Goal: improve the out-of-the-box (OOB) experience on IB stacks:
- by auto detecting UCX when available (as is done with SLURM autodetection).
- by using the internal version when it is not available (for IB networks).
-
Move the entire Open MPI web site behind a CDN?
- If so, we can remove the mirrors program
-
Investigate shared location for OMPI organization secrets/keys/passwords (e.g., LastPass? 1Password? ...?)
-
☎️ How to better track PRs across multiple release branches?
- E.g., ensure it has already been merged to master
- E.g., ensure that we merge at vX only when it has been merged at all desired versions < vX
- One possibility: should we always make an issue, and put a tag on it for each version that a given PR is merged against?
- Can this be automated via bot somehow?
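A bot enforcing the two rules above could start from a check like the one sketched here. This is only an illustration: the function names, the branch-name parsing, and the idea of passing in the set of branches where the change has already landed are all assumptions, not an existing Open MPI tool.

```python
from typing import List, Set, Tuple


def version_key(branch: str) -> Tuple[int, ...]:
    """Turn a release branch name such as 'v2.0.x' into a sortable tuple (2, 0)."""
    parts = branch.lstrip("v").split(".")
    return tuple(int(p) for p in parts if p.isdigit())


def merge_blockers(target: str, desired: List[str], merged: Set[str]) -> List[str]:
    """List the branches that must receive the change before `target` may.

    Rules taken from the agenda item (the data model is hypothetical):
      1. the change must already be on master;
      2. it must already be on every desired release branch older than `target`.
    """
    blockers = []
    if "master" not in merged:
        blockers.append("master")
    older = sorted(
        (b for b in desired if version_key(b) < version_key(target)),
        key=version_key,
    )
    blockers.extend(b for b in older if b not in merged)
    return blockers


if __name__ == "__main__":
    desired = ["v2.0.x", "v2.1.x", "v3.0.x"]
    # The change is on master and v2.0.x, and someone wants to merge it to v3.0.x.
    print(merge_blockers("v3.0.x", desired, {"master", "v2.0.x"}))
```

An empty list means the merge to the target branch is allowed. A real bot would still need to map PRs on different branches back to the same logical change, which is what the per-version issue tags suggested above would provide.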
-
☎️ Proposal for OMPI signed-off-by policy:
- Do not grandfather old commits
- If you cherry pick someone else's commit, you need to sign off
-
☎️ Threading model
-
☎️ Rankfile mapper: Ralph can no longer maintain it. Who will become the maintainer? (IBM volunteered)
-
☎️ Issue/old PR roundup, esp. for the v2.0.x and v2.1.x releases.
-
☎️ Signal forwarding
- Came up on the user list again, this time wanting a way to signal only child procs that call MPI_Init (and not any intermediate procs such as shell scripts)
- Ralph added an MCA param to either hit only direct children, or all descendants of those children - but not exactly what the user requested
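The gap between what the MCA param offers and what the user asked for can be shown as three different target sets over a process tree. The sketch below is purely illustrative: the tree, the PIDs, and the `called_mpi_init` set are made-up inputs, and nothing here reflects how ORTE actually tracks or signals processes.

```python
from collections import deque
from typing import Dict, List, Set


def direct_children(tree: Dict[int, List[int]], root: int) -> List[int]:
    """Processes launched directly by `root`."""
    return list(tree.get(root, []))


def all_descendants(tree: Dict[int, List[int]], root: int) -> List[int]:
    """Every process below `root`, in breadth-first order."""
    found = []
    queue = deque(tree.get(root, []))
    while queue:
        pid = queue.popleft()
        found.append(pid)
        queue.extend(tree.get(pid, []))
    return found


def mpi_procs_only(tree: Dict[int, List[int]], root: int,
                   called_mpi_init: Set[int]) -> List[int]:
    """What the user asked for: only descendants that called MPI_Init."""
    return [pid for pid in all_descendants(tree, root) if pid in called_mpi_init]


if __name__ == "__main__":
    # Hypothetical layout: launcher 1 starts shell scripts 10 and 20,
    # and each script starts one MPI process (11 and 21).
    tree = {1: [10, 20], 10: [11], 20: [21]}
    print(direct_children(tree, 1))           # the shell scripts only
    print(all_descendants(tree, 1))           # scripts and MPI processes
    print(mpi_procs_only(tree, 1, {11, 21}))  # MPI processes only
```

Neither of the first two sets equals the third when intermediate shell scripts are involved, which is why the existing param does not quite answer the request.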
-
☎️ Multithreaded Onesided - It's buggy, just fix bugs or refactor?
-
☎️ What is the plan for 4.0 and beyond regarding embedding of:
- hwloc v2 and v1
- Easy way to disable hwloc internals such as NVML from OMPI's configure?
- How to deal with hwloc 2.0 ABI break (2 components?)
- libevent v2.1 and v2.0
- pmix 2.0, 3.0, and 1.x
- One suggestion: should we make the `external` components higher priority than the embedded components? This might naturally start deprecating / phasing out the embedded versions.
-
Strict C99 stuff (e.g., pointer to constant)
- Per Paul Hargrove's discovery; adapted in PR https://github.com/open-mpi/ompi/pull/3813
- Note: there's non-C99 elsewhere in OMPI (i.e., if you enable "strict C99", OPAL fails to compile in at least a few places)
- Do we really care about strict C99?
-
Automate reduction of symbol name pollution?
- https://github.com/open-mpi/ompi/pull/3258
- some CI to ensure MPI not in OPAL layer?
-
SPI: Any updates / action items?
- (This is an open question)
-
Other pending PR's that require any discussion...?
- ...
-
☎️ CI:
- What can we do about the fragility of the Jenkins infrastructure?
- It seems like one or more of the CI's is broken every week due to lost connections or changed protocols, thereby blocking all commits.
- Other random CI updates
-
What do we do about Pathscale compiler support?
- ☎️ MAYBE THURSDAY/GEORGE Fujitsu Status
- ☎️ THURSDAY/GEORGE Plans for v4.0.x (recall: new datatype stuff on master is backwards incompatible with v3.0.x -- https://github.com/open-mpi/ompi/pull/3441)
- Remove MPI symbols removed in MPI-3.0
- Can we do this in a way that defaults to a compiler error (showing the exact file / line number of the removed symbol)?
- Is there any value in providing a non-default way to turn this into a warning, so that customers can make progress without changing their code? Many of these changes are straightforward; is this even worth the effort?
- Any other binary-incompatible changes we want to do for v4.0.x (ASAP)?
- (Artem) List of features for 4.0
- (Artem) What PMIx version is planned?
- ☎️ THURSDAY/GEORGE Remove CR from master before we branch for v4.0.x
- THURSDAY/SO GEORGE CAN BE HERE: Shall we link components against their native main library - e.g., ORTE components to libopen-rte?
- See https://github.com/open-mpi/ompi/issues/3705
- Required reading before the discussion: https://github.com/open-mpi/ompi/wiki/Linkers
- Remember: there is a workaround -- `--disable-dlopen` (i.e., cases 4 and 16 in the tables on that wiki). But that doesn't help if the OS/distro installs a "case 2" Open MPI by default.
- ☎️ THURSDAY/GEORGE PMIx working group meetings
- Network
- Tiered Storage
- OpenMP/MPI coordination
- Language bindings as apps begin using PMIx? (Ralph volunteers to do Fortran!)
- THURSDAY/GEORGE Old issue about BTL progress functions: https://github.com/open-mpi/ompi/issues/1695
-
☎️ [George & Nathan] IMB Unidir_Get with Vader issue - https://github.com/open-mpi/ompi/issues/3821
-
RESOLVED:
- @hjelmn to look at this in the immediate future
- This is a blocker for v3.0.0
- May also necessitate a release in v2.0.x and v2.1.x -- need to investigate further
-
☎️ Should we forward all `OMPI_` env vars from `mpirun` environments to started process environments?
- If so, should we also do so for `ORTE_` and `OPAL_` env vars?
- Or should we only forward `OMPI_MCA_` env vars?
- NOTE: current master forwards all `OMPI_` env vars
- Should we make a non-`OMPI_MCA_` prefix that we also forward, but something less than all of `OMPI_`? (E.g., `OMPI_FORWARD_`, or something better)
- What about non-OMPI MCA params (e.g., PMIX_MCA)?
- Just envars, or do we add a registration function for cmd line support (e.g., -pmca foo x)?
-
RESOLVED:
- Yes, we want to forward non-OMPI_MCA env vars.
- Ralph:
- Will make a PR that will enable components to register what env vars they want forwarded. At most, we will support a single `*` for a wildcard (not full regexps) -- e.g. `PSM2_*` -- for forwarding all names that match.
- Will probably be something like: a component that wants to register for this stuff will write something to a text file somewhere (e.g., write `PSM2_*` to a text file somewhere) that ORTE/PMIX/whatever will see later and do the forward. This makes it possible for `orterun` to forward whatever env vars it needs to, without having to open all their corresponding components (e.g., `orterun` doesn't know anything about PSM2 components, but can still forward PSM2 env vars).
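The single-`*` matching described in the resolution could look roughly like the sketch below. It is an illustration in Python, not the planned ORTE/PMIx code: the function names and the one-pattern-per-line file layout are assumptions, since the notes only say that components would write patterns such as `PSM2_*` to a text file somewhere.

```python
from typing import Dict, List


def matches(pattern: str, name: str) -> bool:
    """Match `name` against a pattern with at most one '*' (no regexps)."""
    if pattern.count("*") > 1:
        raise ValueError("at most one '*' wildcard is supported")
    if "*" not in pattern:
        return name == pattern
    prefix, suffix = pattern.split("*")
    # The length check stops the prefix and suffix from overlapping in `name`.
    return (len(name) >= len(prefix) + len(suffix)
            and name.startswith(prefix)
            and name.endswith(suffix))


def load_patterns(text: str) -> List[str]:
    """Parse registered patterns, one per line (hypothetical file format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_forwarded(environ: Dict[str, str], patterns: List[str]) -> Dict[str, str]:
    """Pick the env vars whose names match any registered pattern."""
    return {name: value for name, value in environ.items()
            if any(matches(p, name) for p in patterns)}


if __name__ == "__main__":
    registered = load_patterns("PSM2_*\nOMPI_MCA_*\n")
    env = {"PSM2_DEVICES": "self,shm", "HOME": "/home/user", "OMPI_MCA_btl": "tcp"}
    print(select_forwarded(env, registered))
```

Because the launcher only reads name patterns, it can forward `PSM2_*` variables without opening or knowing anything about the PSM2 components, which is the point made in the resolution above.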
-
MPI_File backing file location
- https://github.com/open-mpi/ompi/pull/3739
-
RESOLVED:
- Ralph and Edgar talked -- added a note to the above issue.
-
☎️ Release branch status:
- v1.10
- v2.0.x
- v2.x (i.e., v2.1.x)
- v3.0.x
-
RESOLVED:
- Talked through all of these -- basically the normal content of a Tuesday webex.
-
Release processes / Brian
-
RESOLVED:
- Coming soon: make nightly and release tarballs exactly the same
- AUTHORS: we should automate these updates. Brian will work on this.
- Should we keep the orgs in there? It's somewhat of a pain. And it's also a bit of a relic -- from before we did the "signed off by" stuff.
- Should we remove it from git and just auto-generate the file during `make dist`? Yes, this seems like a good idea.
- NEWS: this is a problem. Want to change this to only top-level / broad-strokes of features. Do not include individual bug fixes -- there will be a line in there saying "Here's the URL where all the Github fixed issues and PRs can be found for this release".
- Big change: RM's will not assemble NEWS. If a dev wants an item in NEWS, they need to PR it.
- Commit messages: we need to get better about "Reported by helpful user" in commit messages. If we're not going to cite people in NEWS any more, then we want to make sure to cite them in commit messages.
-
☎️ Can we get a NEWS decoration to commit messages on branches so that we know what to put in NEWS?
-
RESOLVED:
- This is now moot, per above.
-
☎️ Revisit this old discussion: should we continue cherry-picking from master to release branches?
- The Git Way is usually to merge from master to release branches
- (Artem) A few comments: my impression is that the Git way is the reverse (https://www.atlassian.com/git/tutorials/comparing-workflows#gitflow-workflow). It assumes the following types of branches:
- `develop` (persistent, where all new features go)
- `master` (persistent, where all the releases are, each marked with a tag)
- `feature` (temporary, branched from `develop`, merged back: for temporary work on a new feature)
- `hotfix` (temporary, branched from `master`, merged back: to fix post-release bugs). (!) Once the `hotfix` is merged to `master`, `master` is merged back to `develop`, not vice versa, to keep `develop` consistent with `master`.
- `release` (temporary, branched from `develop`, merged to `master`: to harden before the next release)
- Currently: a) our `master` = `develop`; b) we don't have a `master` equivalent; c) we keep `release` branches, which forces us to do cherry-picking, and we sometimes have problems with lost commits.
- This is not to say that we should follow this. One disadvantage I already see: it is not easy to support old releases, because release branches are eliminated once they have stabilized. Just something to keep in mind.
- This puts more emphasis on master to be more stable. But maybe with all of our new CI, master is more stable these days...?
- There are pros, cons, and differences: e.g., things wouldn't go on master unless we intend to merge them to release branches.
-
RESOLVED:
- Main proposal from Brian:
- Shorten time between branch and release, merge from master->release branch during that time (instead of cherry pick), and then cherry pick after release.
- There is some discussion still needed about exactly when we want to stop merging and start cherry picking: what about new features that come to master that aren't destined for that release?
- Brian will be posting a proposal about this
-
☎️ CI:
- Release process updates
- Where should Open MPI downloads be:
- OMPI web site (probably not)
- S3
- Github
-
RESOLVED:
- Leave the plans in place for all downloads going to S3 (not Github)
-
☎️ We now have options for merging PRs:
- Continue the way we do now (merge at current head)
- Rebase and merge (i.e., much more of linear history)
- Rebase and squash
-
RESOLVED:
- On master: ...
- Brian thinks rebase and merge is good
- Howard thinks merge @HEAD is good (i.e., what we do today)
- On release branches: continue merging @HEAD
-
(Artem) UCX/OSC component status update (ready for PR)
-
RESOLVED:
- Seems like a no-brainer: a vendor wants to commit a component that supports their hardware. Go for it.
- This will bring up the network selection discussions again, though. We'll need to figure those out.