Skip to content

mpi extension changes#207

Merged
samtkaplan merged 15 commits intomasterfrom
sam/mpi-as-ext
Sep 2, 2025
Merged

mpi extension changes#207
samtkaplan merged 15 commits intomasterfrom
sam/mpi-as-ext

Conversation

@samtkaplan
Copy link
Member

@samtkaplan samtkaplan commented Aug 28, 2025

Changes compared to #206:

  1. MPIDevitoExt -> MPIExt
  2. do not use Requires.jl
  3. remove dependence on MPIPreferences
  4. switch from [extras] section in Project.toml to test/Project.toml
  5. add test/LocalPreferences.toml to encourage use of the system mpi
  6. comment out MPI tests that were failing locally for me :(
  7. comment out devitopro (when running via the decoupler) tests that are failing.

TODO:

  • turn back on DevitoPro tests in ci.

@mloubout
Copy link
Contributor

What's the issue with the skipped mpi tests? They pass in my PR so not sure why they wouldn't now.

@samtkaplan
Copy link
Member Author

What's the issue with the skipped mpi tests? They pass in my PR so not sure why they wouldn't now.

I wish I knew. They failed (with PR #206) when I ran those tests locally.

@mloubout
Copy link
Contributor

You likely need to turn autopadding off locally. seems to create few issues with the mpi array. See
https://github.com/ChevronETC/Devito.jl/pull/206/files#diff-b803fcb7f17ed9235f1e5cb1fcd2f5d3b2838429d4368ae4c57ce4436577f03fR30

# 2024-08-15 JKW these two ABox tests are broken -- some kind of API change?
@testset "ABox Time Function" begin
# TODO - 2024-08-15 JKW these two ABox tests are broken -- some kind of API change?
@test_skip @testset "ABox Time Function" begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those pass for me, what error do you get?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ Info: running pro tests with the decoupler
Precompiling Devito...
   2923.0 ms  ✓ Devito
  1 dependency successfully precompiled in 3 seconds. 27 already precompiled.
Test Summary:         | Pass  Total  Time
ABox Expanding Source |    5      5  4.8s
Decoupling enabled with 2 workers
No override provided for `abox(time, n)`
Operator `Kernel` ran in 0.05 s
ABox Time Function: Error During Test at /home/cvx/.julia/dev/Devito/test/devitoprotests.jl:37
  Test threw exception
  Expression: (data(u))[:, :, 1] ≈ zeros(Float32, 5, 5)
  PyError (PyObject_GetAttrString) <class 'ValueError'>
  ValueError('Cannot access `_data_allocated` as unfinalized')
    File "/home/cvx/.conda/envs/conda_jl/lib/python3.10/site-packages/devitopro/types/enriched.py", line 354, in wrapper
      raise ValueError("Cannot access `%s` as unfinalized" % func.__name__)
  
  Stacktrace:
   [1] _getproperty(o::PyObject, s::String)
     @ PyCall ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:304
   [2] __getproperty
     @ ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:312 [inlined]
   [3] getproperty
     @ ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:317 [inlined]
   [4] data(x::TimeFunction{Float32, 3, Devito.DevitoMPIFalse})
     @ Devito ~/.julia/dev/Devito/src/Devito.jl:956
   [5] macro expansion
     @ /opt/julia/share/julia/stdlib/v1.11/Test/src/Test.jl:676 [inlined]
   [6] macro expansion
     @ ~/.julia/dev/Devito/test/devitoprotests.jl:37 [inlined]
   [7] macro expansion
     @ /opt/julia/share/julia/stdlib/v1.11/Test/src/Test.jl:1704 [inlined]
   [8] top-level scope
     @ ~/.julia/dev/Devito/test/devitoprotests.jl:22
ABox Time Function: Error During Test at /home/cvx/.julia/dev/Devito/test/devitoprotests.jl:38
  Test threw exception
  Expression: (data(u))[2:end - 1, 2:end - 1, 2] ≈ ones(Float32, 3, 3)
  PyError (PyObject_GetAttrString) <class 'ValueError'>
  ValueError('Cannot access `_data_allocated` as unfinalized')
    File "/home/cvx/.conda/envs/conda_jl/lib/python3.10/site-packages/devitopro/types/enriched.py", line 354, in wrapper
      raise ValueError("Cannot access `%s` as unfinalized" % func.__name__)
  
  Stacktrace:
   [1] _getproperty(o::PyObject, s::String)
     @ PyCall ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:304
   [2] __getproperty
     @ ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:312 [inlined]
   [3] getproperty
     @ ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:317 [inlined]
   [4] data(x::TimeFunction{Float32, 3, Devito.DevitoMPIFalse})
     @ Devito ~/.julia/dev/Devito/src/Devito.jl:956
   [5] macro expansion
     @ /opt/julia/share/julia/stdlib/v1.11/Test/src/Test.jl:676 [inlined]
   [6] macro expansion
     @ ~/.julia/dev/Devito/test/devitoprotests.jl:38 [inlined]
   [7] macro expansion
     @ /opt/julia/share/julia/stdlib/v1.11/Test/src/Test.jl:1704 [inlined]
   [8] top-level scope
     @ ~/.julia/dev/Devito/test/devitoprotests.jl:22
ABox Time Function: Error During Test at /home/cvx/.julia/dev/Devito/test/devitoprotests.jl:21
  Got exception outside of a @test
  PyError (PyObject_GetAttrString) <class 'ValueError'>
  ValueError('Cannot access `_data_allocated` as unfinalized')
    File "/home/cvx/.conda/envs/conda_jl/lib/python3.10/site-packages/devitopro/types/enriched.py", line 354, in wrapper
      raise ValueError("Cannot access `%s` as unfinalized" % func.__name__)
  
  Stacktrace:
    [1] _getproperty(o::PyObject, s::String)
      @ PyCall ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:304
    [2] __getproperty
      @ ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:312 [inlined]
    [3] getproperty
      @ ~/.julia/packages/PyCall/1gn3u/src/PyCall.jl:317 [inlined]
    [4] data(x::TimeFunction{Float32, 3, Devito.DevitoMPIFalse})
      @ Devito ~/.julia/dev/Devito/src/Devito.jl:956
    [5] macro expansion
      @ ~/.julia/dev/Devito/test/devitoprotests.jl:39 [inlined]
    [6] macro expansion
      @ /opt/julia/share/julia/stdlib/v1.11/Test/src/Test.jl:1704 [inlined]
    [7] top-level scope
      @ ~/.julia/dev/Devito/test/devitoprotests.jl:22
    [8] include(mod::Module, _path::String)
      @ Base ./Base.jl:557
    [9] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:323
   [10] _start()
      @ Base ./client.jl:531
Test Summary:      | Error  Total  Time
ABox Time Function |     3      3  4.9s
ERROR: LoadError: Some tests did not pass: 0 passed, 0 failed, 3 errored, 0 broken.
in expression starting at /home/cvx/.julia/dev/Devito/test/devitoprotests.jl:21
/home/cvx/.conda/envs/conda_jl/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[10707,1],0]
  Exit code:    1
--------------------------------------------------------------------------
ERROR: LoadError: failed process: Process(`mpiexec -n 1 julia --code-coverage devitoprotests.jl`, ProcessExited(1)) [1]

Stacktrace:
 [1] pipeline_error
   @ ./process.jl:598 [inlined]
 [2] run(::Cmd; wait::Bool)
   @ Base ./process.jl:513
 [3] run
   @ ./process.jl:510 [inlined]
 [4] #9
   @ ~/.julia/dev/Devito/test/runtests.jl:26 [inlined]
 [5] withenv(::var"#9#10", ::Pair{String, String}, ::Vararg{Pair{String, String}})
   @ Base ./env.jl:265
 [6] top-level scope
   @ ~/.julia/dev/Devito/test/runtests.jl:25
 [7] include(fname::String)
   @ Main ./sysimg.jl:38
 [8] top-level scope
   @ none:6
in expression starting at /home/cvx/.julia/dev/Devito/test/runtests.jl:20
ERROR: Package Devito errored during testing

src/Devito.jl Outdated
export nsimplify, origin, size_with_halo, simplify, solve, space_order, spacing, spacing_map
export step, subdomains, subs, thickness, value, value!

if !isdefined(Base, :get_extension)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the intention, but it won't work, that's why it needs that annoying Requires workaround. MPI is not a dependency, so this will error with a missing dependency if it tries to include it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to follow this: https://pkgdocs.julialang.org/v1/creating-packages/#Transition-from-normal-dependency-to-extension. I guess I need to add MPI to the [deps] section. Anyways, since Julia 1.10 is lts, not sure how much to worry about it. We only test for 1.10 and latest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I just noticed that Devito.jl seems to require Julia >= 1.9 anyways. So, I guess it is safe to remove that conditional entirely.

@samtkaplan
Copy link
Member Author

You likely need to turn autopadding off locally. seems to create few issues with the mpi array. See https://github.com/ChevronETC/Devito.jl/pull/206/files#diff-b803fcb7f17ed9235f1e5cb1fcd2f5d3b2838429d4368ae4c57ce4436577f03fR30

Thanks. Using that environment variable allowed me to turn back on a bunch of mpi tests.

[MPIPreferences]
__clear__ = ["preloads_env_switch"]
_format = "1.0"
abi = "OpenMPI"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised this works, the CI installs mpich not openmpi

Copy link
Member Author

@samtkaplan samtkaplan Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I was wondering about that too. I guess it is because it is using system MPI, so things like "abi" in the LocalPreferences.toml is informative but does not really contribute to the config. Anyways, is there a preferred mpi for devito?

devitoversion:
- 'main'
- 'devitopro'
# - 'devitopro'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats going on here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is related to credentials. let's chat offline about it.

env:
DEVITO_LANGUAGE: "openmp"
DEVITO_ARCH: "gcc"
OMP_NUM_THREADS: "2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we get the number of threads with a function call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the function call that you have in mind? Do you mean using the openmp c api?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github runners are fixed 4-core config, so that's bit overkill to go through the trouble of infering it
https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories

Copy link
Member

@jkwashbourne-oss jkwashbourne-oss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments about minor things like number of threads by default and why devitopro is comments out on line 19 in ci.yml. otherwise I am excited to test.

@samtkaplan samtkaplan merged commit 952677d into master Sep 2, 2025
7 checks passed
@samtkaplan samtkaplan deleted the sam/mpi-as-ext branch September 2, 2025 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants