Conversation

@astroC86 (Contributor) commented Jun 2, 2025

depends on #139

@astroC86 changed the title from "Initial implementation of SUMMA like Matrix Mul #129" to "Initial implementation of SUMMA like Matrix Mul" on Jun 2, 2025
@mrava87 (Contributor) left a comment

@astroC86 good start!

As you will see, I left a number of comments; many are stylistic (I still have to review test_matrixmult), but in some cases I am not entirely sure I can follow the logic, especially as I am not sure whether this is meant to match the algorithm in the Appendix of your GSoC proposal...

In general, I think it would be important for you to add a well-written docstring to the SUMMAMatrixMult method, and comments to both the code and the example; after that, I will do another full review and we will hopefully be closer to a final version that we can bring into the pylops-mpi library 🤓

Finally, whilst I think that this algorithm is very interesting and worth having, we should make sure we understand whether this is really the SUMMA algorithm from https://www.netlib.org/lapack/lawnspdf/lawn96.pdf (I am not so sure, as there they block each matrix over both rows and columns) and either refer to a paper that implements your algorithm, or give it a name (not SUMMA) and explain it in quite some detail in the Notes section of the class docstring.

@mrava87 changed the title from "Initial implementation of SUMMA like Matrix Mul" to "Initial implementation of SUMMA like MatrixMult" on Jun 9, 2025
@mrava87 added the "enhancement" (New feature or request) label on Jun 9, 2025
@astroC86 force-pushed the astroC86-SUMMA branch 2 times, most recently from 12d73e4 to f22f5ec, on June 13, 2025 at 20:02
@mrava87 (Contributor) left a comment

@astroC86 thanks for the updates, the main class and example start to look much better.

I have pushed some code, mostly trying to clean things up a bit and make them more consistent with the rest of the codebase. Whilst unfortunately we have not yet been able to set up proper linting in pylops-mpi (we will eventually do it soon, following what we have in pylops), we try to keep the line length below 88 characters (some of your lines are very, very long)... Also, when I rebuilt the documentation, I noticed that some of your text, both in the docstring of MPIMatrixMult and in the example, was not rendering properly - I guess you haven't built the documentation yourself yet?

More importantly, I made a change towards the end of the example, as discussed offline, trying once again to be consistent with the rest of the codebase: basically, we want to check consistency with the serial code on rank 0 AFTER grouping the distributed array... Whilst doing this, I realized that your definition of the mask in x was wrong. I would like to ask you to modify the tests accordingly so that they have the same kind of checks. Note that this code won't run if you don't stack it on top of this PR (#139) - once that one is in, we can rebase.
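In other words, something along these lines (just a sketch: Aop, x_dist, A, X, and rank are illustrative placeholders rather than the exact names in the example; asarray() is the usual way to gather a pylops-mpi DistributedArray):

y_dist = Aop @ x_dist             # distributed forward pass
y = y_dist.asarray()              # gather the DistributedArray on every rank
if rank == 0:
    y_ref = (A @ X).ravel()       # serial reference, computed only on rank 0
    assert np.allclose(y, y_ref)  # check distributed result against serial one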

And our convention in PyLops and PyLops-MPI (see for example https://pylops.readthedocs.io/en/v2.2.0/gallery/plot_matrixmult.html#sphx-glr-gallery-plot-matrixmult-py) is to have operators that map inputs of size m to outputs of size n (so A is n x m), but you have decided to do the opposite... I understand that here neither of the two is strictly possible but, for consistency and to avoid confusion, I'd like to ask you to modify all of the code so that A is n x k and X is k x m; that way, n and m still appear in the right positions, although multiplied by k 😄
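To make the requested shape convention concrete, here is a plain NumPy sketch (sizes are made up purely for illustration):

import numpy as np

N, K, M = 6, 4, 3            # illustrative sizes only
A = np.random.rand(N, K)     # operator matrix: n x k
X = np.random.rand(K, M)     # input: k x m (size k*m when flattened)

Y = A @ X                    # forward: n x m, i.e. output of size n*m
Xadj = A.T @ Y               # adjoint maps back to k x m, i.e. size k*m
print(Y.shape, Xadj.shape)   # (6, 3) (4, 3)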

Finally, this one I did not change as I need your feedback. I noticed that if I swap `X_local = self._layer_comm.bcast(x_arr if self._group_id == self._layer_id else None, root=self._layer_id)` with `X_local = x_arr` I still get correct results... and looking back at your notes it somehow makes sense to me, as what you bcast (say B0|B1 from P0 to L0) is already present in the ranks you send it to (P1 already has B0|B1)... unless I am missing something?
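For reference, a standalone mpi4py sketch of what I mean (not the PR code; the communicator and array here are made up for illustration): if every rank in the communicator already holds the same block, the bcast simply returns what each rank already had, so skipping it gives identical results.

# run with: mpirun -n 2 python bcast_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD                 # stand-in for the layer communicator
rank = comm.Get_rank()

x_arr = np.arange(4.0)                # same block already present on every rank

x_bcast = comm.bcast(x_arr if rank == 0 else None, root=0)  # with bcast
x_local = x_arr                                             # without bcast

assert np.allclose(x_bcast, x_local)  # identical values on every rank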

@hongyx11 I think it is time for you to do a proper technical review of the approach and implementation.

@mrava87 (Contributor) left a comment

@astroC86 good stuff!

I agree with the last changes (see below: I only left one comment on the wider implications of having `if not is_active: exit(0)`) and I am going to merge this 😄

base_comm = MPI.COMM_WORLD
comm, rank, row_id, col_id, is_active = MPIMatrixMult.active_grid_comm(base_comm, N, M)
print(f"Process {base_comm.Get_rank()} is {"active" if is_active else "inactive"}")
if not is_active: exit(0)
@mrava87 (Contributor) commented

I think in real-life problems this would not fly, as we usually don't want to use operators in isolation but to combine them... this way I suspect some ranks will be killed and, if they are then used by other operators, the overall run will crash.

I am not so concerned, as in practice we will hardly end up in these edge cases; for now, if someone does this for just MatrixMult, things will still work, which is nice... if they do it for more complex operators that include MatrixMult, they will have to handle it themselves...

@astroC86 (Contributor, Author) commented Jul 2, 2025

We can get rid of the exit if you want; it shouldn't affect anything, since all operations from then on would use the new communicator, which includes only the active procs in the newly defined square grid, so the other procs are free to do what they want or just remain idle. In theory, we can come up with an example like:

base_comm = MPI.COMM_WORLD
comm, ... = MPIMatrixMult.active_grid_comm(base_comm, N, M)
...
# matrix mult here would use the new comm `comm`
...
# all procs have to reach this barrier before proceeding to perform the second op
base_comm.barrier()
# second op here
...

@mrava87 (Contributor) commented Jul 3, 2025

Not sure you get my point here... in PyLops-MPI we do not really care about applying two operators one after the other; that is not the purpose of the library. What we want/need is to be able to chain/stack atomic operators (like MatrixMult) to create more scientifically interesting ones and solve inverse problems. So what I really want to be able to do is something like:

# distributed matrix-multiplication operator
Aop = MPIMatrixMult(A_p, M, base_comm=comm, dtype=dtype_str)
...

# local first derivative on each rank, wrapped into a distributed block-diagonal operator
Dop = FirstDerivative(dims=(N, col_end_X - col_start_X), axis=0, dtype=np.float32)
DBop = MPIBlockDiag(ops=[Dop, ], base_comm=comm, mask=cols_id)
# chain the two operators and apply them to the distributed input
Op = DBop @ Aop
y1_dist = Op @ x_dist

I added this to the test and checked consistency with the serial version, and everything seems to work (at least for the case with all ranks active).

This was an oversight on my side, I should have asked you to check / checked that we could do this before merging... Not too bad, we can update the tests and examples now - I will try tonight to open a PR with what I have so far.

@astroC86 (Contributor, Author) commented

Oh ok, I see your concern! That is weird, I will investigate.

@mrava87 merged commit 6066ec3 into PyLops:main on Jul 2, 2025
61 checks passed
@mrava87 changed the title from "Initial implementation of SUMMA like MatrixMult" to "Initial implementation of row/column blocked MPIMatrixMult" on Jul 2, 2025