Conversation

@sdvillal
Contributor

@sdvillal sdvillal commented Jan 5, 2026

EvoformerAttnBuilder has several problems that preclude compiling the extension in some scenarios (e.g., an isolated conda environment with its own CUDA toolchain, or a system without GPU hardware) and that break some standard DeepSpeed configuration of target compute capabilities.

Changes

  • Fix evoformer CUTLASS detection:

    • Allow skipping it, which is useful when CUTLASS is already set up correctly (e.g., in a conda environment with CUTLASS and the CUDA toolchain)
    • Fix the misleading use of the deprecated nvidia-cutlass PyPI package by actually using the bindings it provides, while discouraging this route since these bindings are not maintained anymore
  • Fix evoformer compilation when no GPU is present:

    • this is handled correctly and more generally by builder.compute_capability_args
    • allow cross-compilation on systems without a GPU
    • allow compilation against all available virtual architectures and binary outputs
    • see e.g., [REQUEST] build prebuilt wheels #5308
  • Make all these changes configurable and explicit through documented environment variables

Tested in all scenarios.
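For reference, a minimal sketch of the kind of translation the builder performs when turning a TORCH_CUDA_ARCH_LIST-style specification into nvcc -gencode flags (the function name here is illustrative; the actual logic lives in DeepSpeed's builder.compute_capability_args):

```python
def arch_flags(arch_list: str) -> list[str]:
    """Translate a TORCH_CUDA_ARCH_LIST-style string (e.g. "8.0;9.0+PTX")
    into nvcc -gencode flags: one cubin per listed architecture, plus an
    embedded PTX (virtual architecture) output when "+PTX" is requested."""
    flags = []
    for spec in arch_list.replace(";", " ").split():
        ptx = spec.endswith("+PTX")
        num = spec.removesuffix("+PTX").replace(".", "")
        # Binary (cubin) output for this exact architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            # PTX output, so newer GPUs can JIT-compile the kernel.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags
```

Driving the build through such a string is what makes cross-compilation on GPU-less systems possible: the architecture list comes from the environment rather than from probing the (absent) hardware.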

@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch from 50e371a to 643cac1 Compare January 5, 2026 10:29
@sdvillal sdvillal changed the title from "Fix Evoformer compilation when no GPU is present" to "Fix Evoformer compilation" Jan 5, 2026
@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch from 0fd060b to 1e71bdb Compare January 5, 2026 11:04
@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch 2 times, most recently from 4ea96ed to 8c9ef4c Compare January 17, 2026 14:19
@sdvillal sdvillal marked this pull request as ready for review January 17, 2026 14:32
@sdvillal sdvillal marked this pull request as draft January 17, 2026 15:29
@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch from 9c049d9 to a9701c2 Compare January 17, 2026 19:04
@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch from a9701c2 to f0c7b42 Compare January 17, 2026 19:05
@sdvillal sdvillal marked this pull request as ready for review January 17, 2026 19:07
@sdvillal
Contributor Author

Tested and ready for review @loadams @tjruwase

sdvillal added a commit to sdvillal/openfold-3 that referenced this pull request Jan 17, 2026
@tohtana
Collaborator

tohtana commented Jan 17, 2026

Hi @sdvillal
Thank you for your contribution! This looks good to me. Can you just fix the format? It seems I don't have write access to your branch.

By the way, test_DS4Sci_EvoformerAttention shows some mismatch. This is unrelated to your change, but have you experienced it?

Test Status
test_DS4Sci_EvoformerAttention[tensor_shape0-dtype1] (bfloat16, 256×256) PASSED
test_DS4Sci_EvoformerAttention[tensor_shape1-dtype0] (float16, 512×256) PASSED
test_DS4Sci_EvoformerAttention[tensor_shape1-dtype1] (bfloat16, 512×256) FAILED
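A plausible factor in the bfloat16-only failure is precision: bfloat16 carries only 8 mantissa bits, so even a correct kernel drifts from a float32 reference far more than float16 does. A hypothetical sketch of dtype-aware tolerance checking (the helper name and safety factor are illustrative, not the actual test code):

```python
import math

# Unit roundoff per format: bfloat16 keeps 8 significand bits
# (7 stored + implicit 1), float16 keeps 11, so bf16 rounding
# error is roughly 8x larger per operation.
EPS = {"float16": 2.0 ** -11, "bfloat16": 2.0 ** -8}

def close_enough(ref, out, dtype, safety=16.0):
    """Elementwise comparison with a dtype-dependent tolerance."""
    tol = safety * EPS[dtype]
    return all(
        math.isclose(r, o, rel_tol=tol, abs_tol=tol)
        for r, o in zip(ref, out)
    )
```

If the observed mismatch is only slightly beyond a tolerance tuned for float16, loosening the bfloat16 bound may be the right fix; a large mismatch would instead point at a genuine kernel bug on that architecture.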

@tohtana tohtana enabled auto-merge (squash) January 18, 2026 08:10
@tohtana tohtana merged commit 43125a7 into deepspeedai:master Jan 18, 2026
12 checks passed
@sdvillal
Contributor Author

Thanks a lot for the quick review and merge @tohtana!

I have fixed the formatting (sorry about that; one should read the contributing guidelines before contributing...).

I have not personally experienced the mismatch. We have been running on:

  • A100 GPUs, which the extension was developed for (a quick peek online [1, 2] suggests this is a Hopper problem?)
  • B300s, which we started using only very recently and where we have not yet seen trouble
  • only modern/latest dependencies (particularly, and perhaps relevantly, CUTLASS)

I could try to run the test a few times in this context and see if it happens for me, could that info be useful?

In any case, I feel the extension is showing its age; these GEMMs (and the extension generally) might need some love to make it worthwhile to use on Hopper and newer.
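To triage whether a failure is Hopper-specific, one can map the device's compute capability (as returned by torch.cuda.get_device_capability) to its architecture generation. A small illustrative helper, based on NVIDIA's published compute capability table (the grouping labels are mine):

```python
def cuda_generation(major: int, minor: int) -> str:
    """Map a CUDA compute capability, e.g. (9, 0) for H100,
    to a coarse architecture-generation label."""
    table = {
        7: "Volta/Turing",   # 7.0 Volta, 7.5 Turing
        8: "Ampere/Ada",     # 8.0/8.6 Ampere (A100), 8.9 Ada (L40S)
        9: "Hopper",         # 9.0 H100
        10: "Blackwell",     # 10.x B100/B200-class
    }
    return table.get(major, f"unknown (sm_{major}{minor})")
```

Notably, the L40S where the test passes is compute capability 8.9 (Ada), while the H100 where it fails is 9.0 (Hopper), consistent with a Hopper-specific issue.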

tohtana added a commit that referenced this pull request Jan 18, 2026

Signed-off-by: Santi Villalba <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
@tohtana
Collaborator

tohtana commented Jan 20, 2026

@sdvillal Yes, I encountered the issue with H100. The test doesn't throw an error with L40S on our CI.
At the moment we don't really have the bandwidth/expertise on our side to chase down and fix an H100/Hopper-specific issue, so I'd suggest we leave it as-is for now.
