Skip to content

Conversation

@mrava87
Copy link
Contributor

@mrava87 mrava87 commented Jul 1, 2025

This PR fixes a number issues that arose when trying to increase the number of ranks in the test suite to 9 or 16.

As a consquence, I have also decided to modify the GA build to run tests with 2,4,9 ranks instead of 2,3,4

@mrava87 mrava87 requested a review from rohanbabbar04 July 1, 2025 08:39
@mrava87
Copy link
Contributor Author

mrava87 commented Jul 1, 2025

@rohanbabbar04 no idea why the openmpi runs with N=9 stall, but those with intelmpi and mpich don't... any idea?

note that without the proposed changes some of the tests failed for N=9 for me locally... most are tests I added with the mask and add some clear if/else statement on the number of ranks and did not stop if a different one was passed, and another one was for the solvers due to some stochasticity in the random (if I run it independently, instead as part of the entire test suite, they were passing for N=9)

PS: in my local conda env, I used install_conda and got mpi4py installed with MPICH (which seems to be the default if not specifying anything... so I guess I can try openmpi and debug locally..

@mrava87
Copy link
Contributor Author

mrava87 commented Jul 2, 2025

This works locally, so not really sure what is happening in the CI...

image

@rohanbabbar04
Copy link
Collaborator

Hmm, let me give it a look

@mrava87
Copy link
Contributor Author

mrava87 commented Jul 10, 2025

Hmm, let me give it a look

@rohanbabbar04 did you manage to have some time to take a look at this?

@rohanbabbar04
Copy link
Collaborator

Hmm, let me give it a look

@rohanbabbar04 did you manage to have some time to take a look at this?

Oops, I missed this one. I will look at it during the weekend to find out why it is happening, and let you know

Copy link
Collaborator

@rohanbabbar04 rohanbabbar04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added a minor change to the OpenMPI execution by using ^openib to exclude InfiniBand, since GitHub Actions runners may stall/show error messages when running with higher ranks. I think we should add it since it will be required as well.

I will now merge this 🙂

@rohanbabbar04 rohanbabbar04 merged commit c6b3dfc into PyLops:main Jul 14, 2025
62 checks passed
@mrava87 mrava87 deleted the test-highsize branch July 14, 2025 07:07
@mrava87
Copy link
Contributor Author

mrava87 commented Jul 14, 2025

Great, thanks a lot @rohanbabbar04 for investigating this 😀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants