Meeting 2025 03 14

Jeff Squyres edited this page Mar 14, 2025 · 10 revisions

Summary of discussion with George/Ralph/Tommy/Jeff

This prefix thing kinda sucks (https://github.com/openpmix/prrte/pull/2154); it's getting complicated and Jeff fears it will be difficult to maintain over time. Is there a long-term path to get us out of this business?

Idea:

  • This particular problem comes down to installdirs.

    • We have installdirs for those who relocate installations (e.g., NVIDIA's Open MPI packaging).
    • At run time, we need to find plugins and show_help files.
      • Do we need to find anything else?
    • Can we solve this? I.e., can we find what we need at runtime via some other mechanism?
  • We still have problems of multiple levels in the stack (OMPI, PRTE, PMIX) re-using MCA things.

    • We've sorta solved that by replicating everything and using different prefixes in env variable names and the like.
    • But it still kinda sucks -- lots of corner cases come up. And code duplication.
    • We're not going to solve that problem today.

Let's look at the installdirs issue.

Short term / v5.0.x

  • Ralph's PR already merged to PRTE: we now have 4 prefixes (CLI params and env vars)

  • George will investigate: in OMPI's installdirs init:

    • If the user sets the env variable(s), use it (them)
    • If user didn't set env variable(s):
      • Make a dynamic-linker call (e.g., dladdr()) to find the filesystem path of the library containing opal_init (or whatever symbol makes sense)

      • Take dirname of that

      • Compare to installdirs libdir

      • If it's the same -- ok, we're done

      • If it's not the same:

        • Look at the old (configured) libdir: is it defined in terms of prefix? If so, see if prefix can be distilled from the new libdir.
        • Otherwise, assume prefix is one dir up from the new libdir
        • Set installdirs prefix to that value
      • This is good enough for OMPI v5.0.x / NVIDIA

      • Make sure to document this process in the RST docs somewhere
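
The decision steps above can be sketched roughly as follows. This is only an illustration of the logic in Python, not George's prototype: the real code would live in OMPI's C installdirs init, and the library path would come from the dynamic linker rather than being passed in. The function name `derive_prefix`, its parameters, and the use of `OPAL_PREFIX` as the single env variable consulted are assumptions made for the example.

```python
from pathlib import PurePosixPath


def derive_prefix(env, loaded_lib_path, configured_prefix, configured_libdir,
                  env_var="OPAL_PREFIX"):
    """Pick the install prefix to use at run time (hypothetical sketch).

    env               -- mapping of environment variables
    loaded_lib_path   -- filesystem path of the library containing opal_init,
                         as reported by the dynamic linker
    configured_prefix -- prefix recorded at configure time
    configured_libdir -- libdir recorded at configure time (the "old" libdir)
    """
    # Step 1: an explicit user setting always wins.
    override = env.get(env_var)
    if override:
        return override

    # Step 2: the "new" libdir is the dirname of the loaded library.
    new_libdir = PurePosixPath(loaded_lib_path).parent
    old_libdir = PurePosixPath(configured_libdir)
    old_prefix = PurePosixPath(configured_prefix)

    # Step 3: nothing was relocated, keep the configured prefix.
    if new_libdir == old_libdir:
        return str(old_prefix)

    # Step 4: if the old libdir was defined in terms of prefix
    # (e.g. <prefix>/lib), strip the same suffix off the new libdir.
    try:
        suffix = old_libdir.relative_to(old_prefix).parts
    except ValueError:
        suffix = ()  # old libdir was not under the old prefix
    if suffix and new_libdir.parts[-len(suffix):] == suffix:
        return str(PurePosixPath(*new_libdir.parts[:-len(suffix)]))

    # Step 5: otherwise assume prefix is one directory up from the new libdir.
    return str(new_libdir.parent)
```

For instance, calling `derive_prefix({}, "/mnt/new/lib/libopen-pal.so", "/opt/ompi", "/opt/ompi/lib")` models an installation configured under `/opt/ompi` and later moved to `/mnt/new`; the paths and library file name here are illustrative only.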

Can we get this to work with a small-ish patch? Assume yes. George will prototype.

Longer term / main/v6.0.x

  • Include everything that George did for v5.0.x

    • Perhaps get fancier trying to distill prefix from libdir (TBD)
    • Document in the RST whatever fanciness we do
  • installdirs currently has a bunch of dirs that nothing in the C code uses

    • Let's remove all the dirs that we are not using -- only keep the ones that we actually need.
      • Perhaps we only need libdir and help files dir...? (TBD)
    • If we remove things, we need to update documentation to remove all corresponding env variables / MCA params.
  • After removing the dirs that aren't necessary, we should stat() all remaining dirs in installdirs and complain if something doesn't exist

  • Can we slurp the text help files into C code somehow?

    • This would be one less thing we have to find at run time
      • ...and potentially one more entry we can remove from installdirs
    • Maybe run some (python?) script during make that converts the text files into C code that is then compiled.
      • Random note: clang v16 doesn't like multi-line C strings. Will need to be a little clever about how to encode the strings.
    • Will also need to update opal_show_help() to get text source from C variables instead of reading text files.
  • Open question: if the new prefix-setting mechanism works reliably, can we sunset the prefix-setting CLI/env var mechanisms?
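
As a rough idea of what the help-file conversion script could look like, here is a minimal Python sketch. It sidesteps the multi-line string concern by emitting one short C string literal per source line and relying on the compiler's concatenation of adjacent literals. The function names, the generated variable layout, and the choice to embed each file as a single `const char[]` are all assumptions for illustration; nothing here was decided in the meeting.

```python
def c_escape(line):
    """Escape one line of text so it is safe inside a C string literal."""
    out = []
    for ch in line:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "?":
            out.append("\\?")  # avoids accidental trigraphs
        elif ch == "\t":
            out.append("\\t")
        elif 32 <= ord(ch) < 127:
            out.append(ch)
        else:
            # Three-digit octal escapes are fixed width, so a digit that
            # follows cannot be absorbed into the escape sequence.
            out.extend("\\%03o" % b for b in ch.encode("utf-8"))
    return "".join(out)


def help_text_to_c(var_name, text):
    """Return C source defining var_name as a string holding `text`.

    Each input line becomes its own literal ending in an escaped newline;
    adjacent literals are concatenated by the C compiler.
    """
    literals = ['    "%s\\n"' % c_escape(line) for line in text.splitlines()]
    body = "\n".join(literals) if literals else '    ""'
    return "const char %s[] =\n%s;\n" % (var_name, body)
```

A make rule could run something like this over each help file and compile the output; opal_show_help() would then look up the generated variable instead of opening a file. How topics inside a help file map to variables is left open here.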
