
Conversation

@askorikov

This PR adds support for the CUDA backend of ASTRA, including support for CuPy inputs (partially addressing #701). As you can see, the code becomes quite a bit messier, but that is because it has to cover several scenarios at the same time:

  1. CPU input + CPU execution
  2. CPU input + GPU execution (by setting either engine or projector_type to cuda)
  3. GPU input + GPU execution (including zero-copy GPU data exchange, which ASTRA currently only supports for 3D data, so we resort to a hack of representing the 2D geometry as 3D with a single slice; see the sketch after this list).
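
For reference, the single-slice trick from scenario 3 looks roughly like this in ASTRA terms (geometry construction only; the zero-copy linking of the CuPy buffers is omitted here):

import numpy as np
import astra

nx, ny, ndet = 128, 128, 192
angles = np.linspace(0.0, np.pi, 180, endpoint=False)

# native 2D parallel-beam setup (fine for the CPU path)
vol_geom_2d = astra.create_vol_geom(ny, nx)
proj_geom_2d = astra.create_proj_geom("parallel", 1.0, ndet, angles)

# the same setup expressed as 3D with a single slice / single detector row,
# so that GPU-linked (zero-copy) data objects can be used
vol_geom_3d = astra.create_vol_geom(ny, nx, 1)
proj_geom_3d = astra.create_proj_geom("parallel3d", 1.0, 1.0, 1, ndet, angles)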

With this, I have the following observations/questions:

  1. The adjoint of the CUDA backend of ASTRA is much less well matched to the forward operator, so the tests produce >10% relative error. We do expect the adjoint to be mismatched, but this error is still way too high; I will look into it on the ASTRA side. In the meantime, the tests with the CUDA backend are skipped.
  2. Do we need to assert that the input is on CPU for cpu engine and on GPU for cuda engine?
  3. In principle, this operator almost works with JAX on both CPU and GPU, except for making sure arrays end up on the appropriate device. What is currently the ambition for adding JAX support to PyLops operators?
  4. The default projector_type now (strip) favors accuracy at the expense of performance. For experimental data the accuracy difference is usually not noticeable, so the faster linear projector is more commonly used; it is also the only projector available in the CUDA backend (see the snippet after this list). Let me know which you think fits the expected audience better.
  5. ASTRA only supports the float32 dtype internally at the moment. How should we handle that?
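
For context on question 4, the choice boils down to which ASTRA projector gets created; a rough sketch with an arbitrary parallel-beam geometry:

import numpy as np
import astra

angles = np.linspace(0.0, np.pi, 180, endpoint=False)
vol_geom = astra.create_vol_geom(128, 128)
proj_geom = astra.create_proj_geom("parallel", 1.0, 192, angles)

proj_strip = astra.create_projector("strip", proj_geom, vol_geom)    # exact area weighting: accurate, slower (CPU)
proj_linear = astra.create_projector("linear", proj_geom, vol_geom)  # interpolating kernel: faster (CPU)
proj_cuda = astra.create_projector("cuda", proj_geom, vol_geom)      # the projector used by the 2D CUDA backend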

@askorikov force-pushed the update-astra-integration branch from 00fe264 to e3996ca on October 16, 2025 12:40
@mrava87 (Collaborator) commented on Oct 18, 2025

Thanks @askorikov!

Let me reply to the questions below and then I will do a more general review of the code.

This PR adds support for the CUDA backend of ASTRA, including support for CuPy inputs (partially addressing #701). As you can see, the code becomes quite a bit messier, but that is because it has to cover several scenarios at the same time:

  1. CPU input + CPU execution
  2. CPU input + GPU execution (by setting either engine or projector_type to cuda)
  3. GPU input + GPU execution (including zero-copy GPU data exchange, which ASTRA currently only supports for 3D data, so we resort to a hack of representing the 2D geometry as 3D with a single slice).

It is kind of expected that the code becomes a bit more complicated for operators that have to handle multiple backends when it is not a pure numpy/cupy switch, so in principle I have no problem with that.

So far PyLops' philosophy has been that we don't want operators to be magic black boxes, so we expect users to always provide NumPy arrays when the operator works on CPU and CuPy (or JAX) arrays when it works on GPU. In special cases where you cannot solve the entire problem on the GPU, we have an auxiliary operator called ToCupy that can be chained like any other PyLops operator and used to lift data in and out of the GPU as needed. An example for your case would be to split the CT operator into a stack of N CT operators, each with a portion of the angles, and put them into a VStack; this way, even if the entire dataset does not fit on the GPU, one can still apply each block on the GPU and move the partial data out before moving on to the next block (a rough sketch of this pattern is below). Now, if your GPU operator does fancier things (like streaming), I am happy to keep option 2, but if you just move everything to the GPU and then apply the operator, probably having options 1 and 3 is enough, and more in line with the rest of the PyLops operators?
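
To make this concrete, a rough sketch of that pattern, with pylops.MatrixMult standing in for a per-angle-block CT operator (in the real case each block would be the ASTRA-based operator built with its own subset of angles):

import numpy as np
import cupy as cp
import pylops

n, nblocks = 16, 4
rng = np.random.default_rng(0)

blocks = []
for _ in range(nblocks):
    A = cp.asarray(rng.standard_normal((n, n)), dtype="float32")
    Op = pylops.MatrixMult(A, dtype="float32")       # GPU operator for one block of angles
    Tin = pylops.ToCupy(Op.dims, dtype="float32")    # host -> device before the block
    Tout = pylops.ToCupy(Op.dimsd, dtype="float32")  # device -> host after the block
    blocks.append(Tout.H @ Op @ Tin)

Cop = pylops.VStack(blocks)            # full operator, applied block by block on the GPU
y = Cop @ np.ones(n, dtype="float32")  # x and y stay NumPy arrays on the host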

With this, I have the following observations/questions:

  1. The adjoint of the CUDA backend of ASTRA is much less well matched to the forward operator, so the tests produce >10% relative error. We do expect the adjoint to be mismatched, but this error is still way too high; I will look into it on the ASTRA side. In the meantime, the tests with the CUDA backend are skipped.

Well, in general we strive for forward-adjoint pairs that pass the dot test - quite tightly for fp64 (say atol=1e-6) and of course much less so for fp32... If you know that your operator isn't perfectly matched, we can still add it and write a test with the threshold you expect; I would prefer that to skipping (or, if you do skip it, I would at least like the tests to run both the forward and the adjoint somehow, to make sure they at least run 😄).
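
For instance, something along these lines - a sketch only, with pylops.Diagonal standing in for the CT operator and assuming the rtol/raiseerror keywords of pylops.utils.dottest:

import numpy as np
import pylops
from pylops.utils import dottest

Op = pylops.Diagonal(np.arange(1, 33, dtype="float32"))  # stand-in for the CT operator under test
dottest(Op, *Op.shape, rtol=1e-1, raiseerror=True)       # relaxed threshold instead of skipping the test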

  2. Do we need to assert that the input is on CPU for cpu engine and on GPU for cuda engine?

See above; in general we do not assert, but we have this as a convention. In other words, if one expects to pass a CuPy array, the other input parameters of the operator should also be passed as CuPy arrays (tiny example below) - though I don't think this really applies to CT2D, as you dispatch everything to the ASTRA operator...
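
The convention in code, using Diagonal as a small example (data and operator parameter both on the GPU):

import cupy as cp
import pylops

d = cp.ones(10, dtype="float32")
Dop = pylops.Diagonal(d)                # operator parameter is a CuPy array...
y = Dop @ cp.ones(10, dtype="float32")  # ...and so is the input; the output stays on the GPU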

  3. In principle, this operator almost works with JAX on both CPU and GPU, except for making sure arrays end up on the appropriate device. What is currently the ambition for adding JAX support to PyLops operators?

We do not aim at 100% coverage with JAX - see this table: https://pylops.readthedocs.io/en/latest/gpu.html#supported-operators. So if you can support it, great; if not, we just need to make sure the row for CT2D is updated accordingly.

  4. The default projector_type now (strip) favors accuracy at the expense of performance. For experimental data the accuracy difference is usually not noticeable, so the faster linear projector is more commonly used; it is also the only projector available in the CUDA backend. Let me know which you think fits the expected audience better.

Mmh I guess this was me making this choice... I trust your judgement of what you think is best. Maybe we can just follow the ASTRA default?

  5. ASTRA only supports the float32 dtype internally at the moment. How should we handle that?

I think this is fine. I would suggest we do something like this:

xdtype = x.dtype
...
# cast the float32 result from ASTRA back to the input dtype
y = y.astype(xdtype)
return y

so that when we chain operators and someone wants to use fp64, we don't break the chain by all of a sudden passing out fp32. We can add a note to the docs saying that even if you pass fp64, the internal operations are done in fp32 (see the small illustration below).
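
A tiny NumPy-only illustration of the idea (the scaling below just stands in for the float32 ASTRA projection):

import numpy as np

def forward(x):
    xdtype = x.dtype
    y = 2.0 * x.astype("float32")  # stand-in for the internal fp32 ASTRA call
    return y.astype(xdtype)        # cast back so chained operators keep their dtype

print(forward(np.ones(8)).dtype)   # float64 in, float64 out, fp32 internally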

@mrava87 (Collaborator) left a comment

Very nice addition!

Overall the code changes look great, just left a few minor suggestions 😄

@askorikov force-pushed the update-astra-integration branch from baa4d61 to 0a5666d on October 22, 2025 14:59
@askorikov marked this pull request as draft on October 22, 2025 15:05
* Separate basic functionality and adjointness tests, disable adjointness tests for CUDA backend of ASTRA
* Add tests for data that is not natively compatible with ASTRA
* Remove projector_type tests (too messy for the purpose, and CUDA backend doesn't support them at all)
* Use fixture pattern
* Probably related to float32 dtype used by ASTRA
2.2+ is needed for NumPy 2 support, 2.3+ for GPU data exchange support
@askorikov (Author)

A couple more questions:

  1. I've added casting of the output to the dtype of the input (potentially with a warning), but what are the semantics of the dtype argument in __init__? Does it need to force the dtype as well?
  2. I forgot that for GPU input we also require the array to be contiguous. I have now added this check, but while testing I discovered that the .dot method of LinearOperator already (implicitly) makes the input contiguous here, potentially at the expense of copying the array:
    x = x.ravel()
    Is this intended behavior? If so, I can remove the redundant check.
  3. I guess it would be nice to set engine="cuda" when using CuPy. Is checking pylops.utils.deps.cupy_enabled a good way to determine this default?
  4. Is JAX with CPU backend relevant, or do you expect mostly GPU?

@mrava87 (Collaborator) commented on Oct 24, 2025

A couple more questions:

  1. I've added casting of the output to the dtype of the input (potentially with a warning), but what are the semantics of the dtype argument in __init__? Does it need to force the dtype as well?

So dtype is there for a bit of a historical reason... when we started PyLops we were subclassing from scipy.sparse.linalg.LinearOperator, and there dtype was mandatory. In principle we use it to keep track of the overall dtype of an operator when we chain/combine multiple operators, e.g.:

import numpy as np
from pylops import Diagonal

d = np.ones(10)
D1 = Diagonal(d.astype(np.float32), dtype="float32")
D2 = Diagonal(d*2, dtype="float64")
D = D1 @ D2
print(D)
> <10x10 _ProductLinearOperator with dtype=float64>

but in practice we are not so strict, in the sense that the example below does not really respect the dtype of the operator...

d = np.ones(10, dtype="float32")
D1 = Diagonal(d, dtype="float32")
D2 = Diagonal(d*2, dtype="float32")
D = D1 @ D2

x = np.ones(10, dtype="float64")
y = D @ x

print(D, y.dtype)
> <10x10 _ProductLinearOperator with dtype=float32>
> dtype('float64')

I have always been tempted to make this stricter, but i) it would require a major version bump, and ii) it would require some conventions to be set (i.e., if the input and operator dtypes do not match, which one wins), or being very strict and raising errors whenever there is a mismatch... so I am not sure at this point 😉

  2. I forgot that for GPU input we also require the array to be contiguous. I have now added this check, but while testing I discovered that the .dot method of LinearOperator already (implicitly) makes the input contiguous here, potentially at the expense of copying the array:
    x = x.ravel()
    Is this intended behavior? If so, I can remove the redundant check.

Yes and no. dot is invoked if you do D @ x, but one could also call D.matvec(x) directly (we tend to do that inside solvers, for example, as it avoids a bunch of checks that make dot a bit slower than just invoking matvec/rmatvec directly). So if you really need contiguous arrays, I would suggest keeping your internal check 😄 - a minimal sketch of such a check is below.
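
A minimal sketch of such a check (hypothetical helper, CuPy input assumed):

import cupy as cp

def _ensure_contiguous(x):
    # matvec/rmatvec can be called directly, bypassing dot()'s ravel(),
    # so contiguity cannot be assumed; copy only when actually needed
    return x if x.flags.c_contiguous else cp.ascontiguousarray(x)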

  3. I guess it would be nice to set engine="cuda" when using CuPy. Is checking pylops.utils.deps.cupy_enabled a good way to determine this default?

You mean if a user does not set engine="cuda" but then passes a CuPy array? We tend to avoid outsmarting users; we expect users to be smart... so if they do something silly, I would rather an error be raised than have something changed under the hood for them. pylops.utils.deps.cupy_enabled (and similar flags) is something we only use to check whether a library is present and import it (or import some method) - see the snippet below - not for the check you want to do (if I understand correctly what you want to do).
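
For reference, the typical use of that flag is just to gate an optional import, roughly:

from pylops.utils import deps

if deps.cupy_enabled:
    import cupy as cp  # only imported when CuPy is actually installed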

  4. Is JAX with CPU backend relevant, or do you expect mostly GPU?

Yeah, why not... we see JAX as a replacement for the numpy/cupy pair, since JAX claims to be a library where you write the code once and run it on different hardware 😉

Hope this makes sense?
