WeeklyTelcon_20220920
- Dialup Info: (Do not post to public mailing list or public wiki)
- Brendan Cunningham (Cornelis Networks)
- Christoph Niethammer (HLRS)
- David Bernhold (ORNL)
- Geoffrey Paulsen (IBM)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart
- Josh Fisher (Cornelis Networks)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Tommy Janjusic (nVidia)
- William Zhang (AWS)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia)
- Aurelien Bouteiller (UTK)
- Austen Lauria (IBM)
- Brandon Yates (Intel)
- Brian Barrett (AWS)
- Charles Shereda (LLNL)
- Edgar Gabriel (UoH)
- Erik Zeiske
- George Bosilca (UTK)
- Hessam Mirsadeghi (UCX/nVidia)
- Jan (Sandia - ULT support in Open MPI)
- Jingyin Tang
- Josh Hursey (IBM)
- Marisa Roman (Cornelis Networks)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Cornelis Networks)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Sam Gutierrez (LLNL)10513
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia)
- Multiple weeks on CVE from nvidia.
- v4.1.5
- Schedule: targeting ~6 months (targeting October)
- No driver on schedule yet.
 
- 10583 - Potential CVE from a 4-year-old issue in libevent, but we might not need to do anything.
- Update: one company reported that its scanner didn't report anything.
- Waiting on confirmation that the patches to remove dead code were enough.
 
- An RC this week.
- Discuss MCA (https://github.com/open-mpi/ompi/pull/10793)
  - When you pass an MCA parameter to prterun, it has to figure out which MCA system the parameter is going to.
  - If you want to be sure, just say -omca, -prtemca, or -pmixmca.
  - Jeff and Brian came up with a solution that they're working on.
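The difference between the explicit options and a bare --mca can be pictured with a small sketch. This is not PRRTE's actual parsing code: the enum, the function name `classify_mca_option`, and the choice to accept both single- and double-dash spellings are assumptions made purely for illustration. It only shows why a prefixed option names its destination unambiguously, while a bare --mca leaves the launcher to work out where the parameter belongs.

```c
#include <string.h>

/* Hypothetical destinations for an MCA parameter given on the command line. */
typedef enum {
    MCA_DEST_OMPI,      /* -omca: Open MPI */
    MCA_DEST_PRRTE,     /* -prtemca: PRRTE */
    MCA_DEST_PMIX,      /* -pmixmca: PMIx */
    MCA_DEST_UNDECIDED, /* -mca: the launcher has to figure it out */
    MCA_DEST_NOT_MCA    /* not an MCA option at all */
} mca_dest_t;

/* Illustrative only: map an option token to the MCA system it names.
 * Leading dashes are skipped, so "-omca" and "--omca" are treated alike. */
static mca_dest_t classify_mca_option(const char *opt)
{
    while ('-' == *opt) {
        ++opt;
    }
    if (0 == strcmp(opt, "omca"))    return MCA_DEST_OMPI;
    if (0 == strcmp(opt, "prtemca")) return MCA_DEST_PRRTE;
    if (0 == strcmp(opt, "pmixmca")) return MCA_DEST_PMIX;
    if (0 == strcmp(opt, "mca"))     return MCA_DEST_UNDECIDED;
    return MCA_DEST_NOT_MCA;
}
```

Only the `MCA_DEST_UNDECIDED` case needs any further decision logic, which is the part the PR discussion is about.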
- Is this related to the submodule?
  - Unrelated to --mca: we share a lot of replicated M4 code between OMPI, PMIx, and PRRTE.
    - They have diverged in radical and subtle ways.
  - Last week, added another submodule pointer to OMPI.
  - Took a handful of M4 macros and combined them there.
  - More consolidation there over time.
  - For the most part this is behind the scenes, but you will need to run git submodule init.
  - The purpose is that it'll just be M4 files.
- --mca is how we've set OMPI MCA parameters in Open MPI.
  - Could PRRTE just "do the right thing" for --mca?
    - Agree that --mca is an Open MPI-specific option.
    - When PRRTE and PMIx split off, they prefixed their options.
    - They don't have ownership over MCA.
    - End of the day our docs can't change bec
- 10779 OPAL "core" library for internal usage
  - NEED to see if it made its way to v5.
  - The approach is to separate out pieces of OPAL into core and top.
  - All internal things, not exposed to the user.
  - Brian and George worked on it, and then Josh picked it up and PRed 10779.
  - Still in Draft because he wants to resolve any high-level issues.
  - As far as code layout, we could move some things around, but if we do this too much, we're worried about dropping history...
  - We'd have hundreds or thousands of
- Discuss mca_base_env_list (https://github.com/open-mpi/ompi/pull/10788)
  - Did google around, and this is documented: https://oar.imag.fr/wiki:passing_environment_variables_to_openmpi_nodes
    - Mentions that -x is deprecated?
  - Easy to fix Mellanox CI, but SHOULD we?
  - Let's remove the test, and add it to Issue 10698.
- Discuss remaining PRRTE CLI issues (https://github.com/open-mpi/ompi/issues/10698)
  - -N: document it, and raise an error if it conflicts with --map-by.
  - --show-progress: used to print the little ... on the terminal; now it doesn't do anything.
    - DOE may set this by default in MCA parameters (makes some users feel happy).
  - --display-topo: generally we've tried to be backwards compatible.
  - -v: version
  - -V: verbose
  - -s | --preload-binary: functionally it works, but with -n it gets messed up.
  - rankfile: NOT deprecating.
  - --mca is Open MPI's framework.
  - No gprtemca. Created by PRRTE, but do we continue to support --gpmixmca?
  - --test-suicide and others are all PRRTE daemon options, not exposed to users.
    - Passed to the PRRTE launcher.
- Posted Issue Open-MPI #10698 with about 13 issues that will need
- No longer trust the verbiage here, based on Ralph's comment.
  - Not recognized by mpirun, but cited in --help.
  - Some of these aren't possible??? and mpirun -> prterun (one-shot thing)
- Should mpirun be able to talk to an existing DVM???
  - Or is it always a one-shot thing?
  - If we have it talk to an existing DVM,
  - Use prte to start up prteds, and run pruns against that.
  - If you're using the MPI front-end and want to interact with a DVM, how should we tell users to do that?
    - What should they do?
    - Go through mpirun, or go through prun (with the ompi personality?)
  - Thomas can look and see if you can get everything you need.
  - There were some common things that were difficult when switching between the two.
  - Was there an option for this in v4.1?
    - Yes, but perhaps it wasn't working much.
  - Are there legacy command-line options that we should support or alias?
- Are we dropping DVM support for v5?
  - How did this work in v4?
    - Howard thought you fired up an orte something, and that would provide a command line.
    - Couldn't do all of this with mpirun; it was a two-stage process.
    - Had to start the DVM manually, and got back a URI.
    - But thought if you sourced this schizo and gave it a URI, it would do all of the right things.
  - Could add support if the user fired up the DVM using PRTE and got a URI back.
    - Don't have the ompi-dvm executable in v5, so this is already a deviation.
  - What do we do?
    - Support the same CLI options (and executables, etc.) as documented for v4.x.
    - Don't support it at all in v5, and if you want to do DVM things
    - Maybe something in the middle.
  - Does anyone care about DVM?
    - Can we run the ompi schizo / personality with vanilla prun?
    - Some people on the call DO care about DVM.
  - Early days of Sessions needed a DVM run (no longer needed in main/v5).
- Usually if customers are interested in doing this, they're willing to do a bit more work.
  - But if we want to get v5.0.0 out in the near future, it'd be more likely if we
  - Thomas gets a lot of use with mini-tasks, some of which are MPI parallel.
  - This is where DVM is useful, because they are slamming lots of serial and parallel jobs through in a short time.
  - If they can do this via prun and get the ompi schizo, the path doesn't matter.
  - Thomas will investigate proper options.
  - Could do a CLI interface for mpirun in a future version to have mpirun not call prterun.
    - Don't want to rush this.
- Schedule:
  - PMIx and PRRTE changes coming at end of August.
    - PMIx v3.2 released.
    - Try to have bugfixes PRed by the end of August, to give time to iterate and merge.
  - Still using Critical v5.0.x Issues (https://github.com/open-mpi/ompi/projects/3) yesterday
- Docs
  - mpirun --help is OUT OF DATE.
    - Have to do this relatively quickly, before PRRTE releases.
    - Austen, Geoff and Tomi will be
    - The REASON for this is that the mpirun command line is in PRRTE.
  - mpirun manpage needs to be rewritten.
    - Docs are online and can be updated asynchronously.
    - Jeff posted a PR to document runpath vs. rpath.
      - Our configure checks some linker flags, but there might be a default in the linker or in the system that really governs what happens.
- Symbol Pollution - need an issue posted.
  - OPAL_DECLSPEC - do we have docs on this?
    - No. The intent is: where do you want a symbol to be available?
      - If outside of your library, then use OPAL_DECLSPEC (like Windows DECLSPEC).
      - It says "I want you to export this symbol."
  - Need to clean up as much as possible.
  - From the Open MPI community's perspective, our ABI is just the MPI_ symbols.
  - Still unfortunate. We need to clean up as much as possible.
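Since there are no docs on the export macro yet, here is a minimal sketch of the general pattern the discussion describes. It is not how OPAL_DECLSPEC itself is defined: `MYLIB_DECLSPEC`, `mylib_clamped_add`, and `clamp_nonnegative` are hypothetical names for an imaginary library, and the macro body is just the common GCC/Clang and Windows idiom for marking a symbol as exported.

```c
/* MYLIB_DECLSPEC is a hypothetical stand-in for a macro like OPAL_DECLSPEC. */
#if defined(_WIN32)
#    define MYLIB_DECLSPEC __declspec(dllexport)
#elif defined(__GNUC__)
#    define MYLIB_DECLSPEC __attribute__((visibility("default")))
#else
#    define MYLIB_DECLSPEC
#endif

/* Internal helper: 'static' keeps it out of the library's exported symbols. */
static int clamp_nonnegative(int v)
{
    return v < 0 ? 0 : v;
}

/* Public entry point: explicitly marked "I want you to export this symbol". */
MYLIB_DECLSPEC int mylib_clamped_add(int a, int b)
{
    return clamp_nonnegative(a) + clamp_nonnegative(b);
}
```

With GCC or Clang, the attribute only makes a difference when the library is compiled with `-fvisibility=hidden`, so that unmarked symbols stay internal and only marked ones are visible outside the library.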
- Case of Qthreads, where they need a recursive lock.
- A configury problem was fixed.
 
- Just working on getting it ready for OMPI.
- Converting structures to OPAL objects.
- Also adding libcuda linking (instead of dlopen).
 
- William will test Jeff's PR [10763?] this week.
- In Jeff's roll-up of the docs
- Called out accelerator and show-load-errors
- Not sure what distros will want to do, since some of these accelerators are not open.
 
- Packager building Open MPI:
  - Example: say only 20% of nodes have accelerators, and the accelerator libraries are only installed on those nodes.
  - This is the problem behind why everything today is dlopened...
    - You get scary warnings about failing to open components on some nodes.
  - If you build accelerator components by default, they'll be part of libmpi.so.
  - But if you know you have a heterogeneous software/hardware environment (accelerators on only 20% of nodes):
    - Build accelerator components as SO components.
    - Can still run, but don't want scary warnings.
  - Packager builds accelerator components as SOs.
    - Put the SOs in a subpackage of Open MPI, and only that subpackage depends on the ACCELERATOR libs.
    - WON'T get scary messages, since the SOs are only on nodes that have these libs.
 
- Switching to builtin atomics:
  - 10613 - Preferred PR. GCC / Clang should have that.
  - Next step would be to refactor the atomics for post v5.0.
  - Waiting on Brian's review and CI fixes.
  - Joseph will post some additional info in the ticket.
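For readers unfamiliar with the term, the following is a minimal sketch of what compiler builtin atomics look like, assuming "builtin atomics" here refers to the `__atomic_*` builtins that GCC and Clang provide. The wrapper names `counter_fetch_add` and `counter_compare_exchange` are hypothetical and are not taken from OPAL or from PR 10613; the memory orders are chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Atomically add 'delta' to '*addr' and return the value it held before. */
static inline int32_t counter_fetch_add(int32_t *addr, int32_t delta)
{
    return __atomic_fetch_add(addr, delta, __ATOMIC_RELAXED);
}

/* Strong compare-and-swap: if '*addr' equals '*expected', store 'desired'
 * and return true; otherwise copy the current value into '*expected'
 * and return false. */
static inline bool counter_compare_exchange(int32_t *addr, int32_t *expected,
                                            int32_t desired)
{
    return __atomic_compare_exchange_n(addr, expected, desired,
                                       false, /* strong, not weak */
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```

The appeal of this style is that the compiler supplies the per-architecture implementation, so no hand-written assembly is needed for each platform.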
- Jenkins is currently messed up.  Brian is looking at it.
- New PRs will be stuck for a while.
 
- We're probably not getting together in person anytime soon.
- So we'll send around a Doodle to find a time to talk about our rules.
- The rules reflect the way we worked several years ago, but not really how we work right now.
 
- We're to review the admin steering committee in July (per our rules).
- We're to review the technical steering committee in July (per our rules).
- We should also review all the OMPI GitHub, Slack, and Coverity members during the month of July.
- Jeff will kick that off sometime this week or next week.
 
- In the call we mentioned this, but no real discussion.
- Wiki for face to face: https://github.com/open-mpi/ompi/wiki/Meeting-2022
- Might be better to do a half-day/day-long virtual working session.
  - Due to companies' travel policies, and for convenience.
  - Could do administrative tasks here too.
 
 