
Fix Evoformer compilation#7760

Merged
tohtana merged 8 commits into deepspeedai:master from sdvillal:improve-evoformer-compilation
Jan 18, 2026
Conversation

@sdvillal
Contributor

@sdvillal sdvillal commented Jan 5, 2026

EvoformerAttnBuilder has some problems that preclude compiling the extension in several scenarios (e.g., an isolated conda environment with a CUDA toolchain, or a system without GPU hardware) and break some standard DeepSpeed configurations of target capabilities.

Changes

  • Fix evoformer CUTLASS detection:

    • Allow skipping it, which is useful when CUTLASS is already correctly set up (e.g., in a conda environment with CUTLASS and the CUDA toolchain)
    • Fix the misleading use of the deprecated nvidia-cutlass PyPI package by actually using its provided bindings, while discouraging this route since those bindings are no longer maintained
  • Fix evoformer compilation when no GPU is present:

    • this is handled correctly and more generally by builder.compute_capability_args
    • allow cross-compilation on systems without a GPU
    • allow compilation against all available virtual architectures and binary outputs
    • see e.g., [REQUEST] build prebuilt wheels #5308
  • Make all these changes configurable and explicit through documented environment variables

Tested in all scenarios.
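As a sketch of how such a build might be driven on a GPU-less machine, assuming DeepSpeed's existing DS_BUILD_* and CUTLASS_PATH environment-variable conventions (the specific paths and architecture list below are illustrative, not prescriptive):

```shell
# Illustrative cross-compilation setup on a system without a GPU.
# CUTLASS_PATH points at a pre-installed CUTLASS checkout so DeepSpeed's
# own detection/download step can be skipped; TORCH_CUDA_ARCH_LIST makes
# the target capabilities explicit instead of probing local hardware.
export CUTLASS_PATH=/opt/cutlass
export TORCH_CUDA_ARCH_LIST="8.0;9.0+PTX"

# Prebuild the evoformer attention op at install time.
DS_BUILD_EVOFORMER_ATTN=1 pip install deepspeed --no-cache-dir
```

This is a configuration fragment only; consult the builder's documented environment variables for the authoritative set of knobs.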

@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch from 50e371a to 643cac1 Compare January 5, 2026 10:29
@sdvillal sdvillal changed the title Fix Evoformer compilation when no GPU is present Fix Evoformer compilation Jan 5, 2026
@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch from 0fd060b to 1e71bdb Compare January 5, 2026 11:04
@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch 2 times, most recently from 4ea96ed to 8c9ef4c Compare January 17, 2026 14:19
@sdvillal sdvillal marked this pull request as ready for review January 17, 2026 14:32
@sdvillal sdvillal marked this pull request as draft January 17, 2026 15:29
@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch from 9c049d9 to a9701c2 Compare January 17, 2026 19:04
@sdvillal sdvillal force-pushed the improve-evoformer-compilation branch from a9701c2 to f0c7b42 Compare January 17, 2026 19:05
@sdvillal sdvillal marked this pull request as ready for review January 17, 2026 19:07
@sdvillal
Contributor Author

Tested and ready for review @loadams @tjruwase

sdvillal added a commit to sdvillal/openfold-3 that referenced this pull request Jan 17, 2026
@tohtana
Collaborator

tohtana commented Jan 17, 2026

Hi @sdvillal
Thank you for your contribution! This looks good to me. Can you just fix the format? It seems I don't have write access to your branch.

By the way, test_DS4Sci_EvoformerAttention shows some mismatch. This is unrelated to your change, but have you experienced it?

Test Status
test_DS4Sci_EvoformerAttention[tensor_shape0-dtype1] (bfloat16, 256×256) PASSED
test_DS4Sci_EvoformerAttention[tensor_shape1-dtype0] (float16, 512×256) PASSED
test_DS4Sci_EvoformerAttention[tensor_shape1-dtype1] (bfloat16, 512×256) FAILED

Signed-off-by: Santi Villalba <sdvillal@gmail.com>
@tohtana tohtana enabled auto-merge (squash) January 18, 2026 08:10
@tohtana tohtana merged commit 43125a7 into deepspeedai:master Jan 18, 2026
12 checks passed
@sdvillal
Contributor Author

Thanks a lot for the quick review and merge @tohtana!

I have fixed the formatting (sorry about that; one should read the contributing guidelines before contributing...).

I have not personally experienced the mismatch. We have been running on:

  • A100 GPUs, which the extension was developed for (a quick peek online [1, 2] seems to indicate this is a Hopper problem?)
  • only very recently have we also started using B300s, where we have not yet experienced trouble
  • only modern/latest dependencies (particularly, and perhaps relevantly, CUTLASS)

I could try to run the test a few times in this context and see if it happens for me, could that info be useful?

In any case, I feel the extension is showing its age; these GEMMs might need some attention, and more generally some work, to make the extension worthwhile on Hopper and newer.

tohtana added a commit that referenced this pull request Jan 18, 2026
`EvoformerAttnBuilder` has some problems which preclude compiling the extension in several scenarios (e.g., [isolated conda environment with cuda toolchain](aqlaboratory/openfold-3#34), lack of hardware in the system) and breaks some standard DeepSpeed configuration of target capabilities.

*Changes*

  - Fix evoformer CUTLASS detection:
    - Allow skipping it, useful when CUTLASS is already correctly set up (e.g., in a conda environment with CUTLASS and the CUDA toolchain)
    - Fix misleading use of the deprecated nvidia-cutlass PyPI package by actually using the provided bindings but discouraging this route, as [these bindings are not maintained anymore](NVIDIA/cutlass#2119)
  - Fix evoformer compilation when no GPU is present:
    - this is handled correctly and more generally by builder.compute_capability_args
    - allow cross-compilation on systems without a GPU
    - allow compilation against all available virtual architectures and binary outputs
    - see e.g., #5308
  - Make all these changes configurable and explicit through documented environment variables

Tested in all scenarios.

---------

Signed-off-by: Santi Villalba <sdvillal@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
@tohtana
Collaborator

tohtana commented Jan 20, 2026

@sdvillal Yes, I encountered the issue with H100. The test doesn't throw an error with L40S on our CI.
At the moment we don't really have the bandwidth/expertise on our side to chase down and fix an H100/Hopper-specific issue, so I'd suggest we leave it as-is for now.

phalani-paladugu pushed a commit to phalani-paladugu/DeepSpeed that referenced this pull request Jan 29, 2026
Flamefire added a commit to Flamefire/DeepSpeed that referenced this pull request Feb 20, 2026
`EvoformerAttnBuilder` returns instances of `Path` from `include_paths`
which then cause failures in `OpBuilder.builder` when passing them to
`strip_empty_entries` that calls `len` on them which isn't defined for
`Path` instances:
>   TypeError: object of type 'PosixPath' has no len()

Fixes deepspeedai#7760
@Flamefire
Contributor

After this it doesn't work at all for me anymore:

One issue without this PR is with e.g. test_DS4Sci_EvoformerAttention[tensor_shape0-dtype0] on A100: Depending on the TORCH_CUDA_ARCHLIST used at build time (with DS_BUILD_OPS) it fails with a cudaErrorInvalidDeviceFunction:

  • TORCH_CUDA_ARCHLIST=7.0;8.0 works
  • TORCH_CUDA_ARCHLIST=8.0;7.0 fails

We want to use the same installation also for older GPUs and this failure is very unexpected as the order shouldn't matter.

With this PR it doesn't compile anymore, because the include_paths function is supposed to return a list of strings, not Path instances. self.strip_empty_entries then fails calling len on them.

How did that work for you? I don't see how it could ever succeed.

Proposed fix for that in #7862
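The failure mode can be reproduced in a few lines. Note that strip_empty_entries below is a stand-in mirroring the described OpBuilder helper (filtering entries by nonzero len), not the actual DeepSpeed code, and the include path is illustrative:

```python
from pathlib import Path


def strip_empty_entries(args):
    # Stand-in for the OpBuilder helper: keeps only entries with a
    # nonzero len(). len() is undefined for Path, hence the TypeError.
    return [x for x in args if len(x) > 0]


include_paths = [Path("/opt/cutlass/include"), ""]

try:
    strip_empty_entries(include_paths)
except TypeError as e:
    print(e)  # object of type 'PosixPath' has no len()

# A fix along the lines of the proposal: have include_paths hand over
# plain strings instead of Path instances.
fixed = strip_empty_entries([str(p) for p in include_paths])
print(fixed)  # ['/opt/cutlass/include']
```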

Flamefire added a commit to Flamefire/DeepSpeed that referenced this pull request Feb 20, 2026
@sdvillal
Contributor Author

Thanks for the report and the fix @Flamefire, and apologies that this broke compilation for manually defined paths. I believe testing this case slipped my attention while getting the PR ready for review, and it does not seem to be covered by the CIs. I will give a hand with the PR, and I suggest we add a test to avoid this happening again.

Regarding the order issue, do you think it is also related to the changes in this PR?

@Flamefire
Contributor

> Thanks for the report and the fix @Flamefire, and apologies that this broke compilation for manually defined paths - I believe testing this case likely slipped my attention as I was getting the PR ready for review and it is not being tested by the CIs? I will give a hand with the PR, I suggest we propose a test to avoid this happening again.

The test does not seem to run on CI, and nothing on CI appears to build this kernel/op at all.
There is a CI workflow that can be triggered manually and would run this test, but it has been excluded recently.

> Regarding the order issue, do you think it is also related to the changes in this PR?

No, I have seen the issue since at least 0.14.5, well before this PR, and it persists on current master.

@sdvillal
Contributor Author

sdvillal commented Feb 20, 2026

As a workaround, would pointing to the relevant include dirs (e.g., using CPATH) and removing the deepspeed-specific configuration (i.e., setting CUTLASS_PATH=DS_IGNORE_CUTLASS_DETECTION) work for you?
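Concretely, the suggested workaround might look like the following sketch (the CUTLASS include path is illustrative; CUTLASS_PATH=DS_IGNORE_CUTLASS_DETECTION is the sentinel value mentioned above):

```shell
# Point the compiler at the CUTLASS headers directly via CPATH and tell
# DeepSpeed to skip its own CUTLASS detection, so include_paths never
# synthesizes the problematic Path entries.
export CPATH=/opt/cutlass/include:$CPATH
export CUTLASS_PATH=DS_IGNORE_CUTLASS_DETECTION

DS_BUILD_EVOFORMER_ATTN=1 pip install deepspeed
```

This is a configuration fragment only; it sidesteps the Path issue at build time but does not address the architecture-ordering problem discussed below.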

@Flamefire
Contributor

Workaround for what exactly? The Path issue?
Then yes, I guess setting CPATH and CUTLASS_PATH=DS_IGNORE_CUTLASS_DETECTION could avoid triggering the return of Path instances.

It doesn't help with the arch issues, of course.

@sdvillal
Contributor Author

I would recommend opening a separate issue for the ordering problem, if there is no one already, as it is unrelated to the changes in this PR.

@Flamefire
Contributor

Done: #7863

tohtana pushed a commit that referenced this pull request Feb 21, 2026
`EvoformerAttnBuilder` returns instances of `Path` from `include_paths`
which then cause failures in `OpBuilder.builder` when passing them to
`strip_empty_entries` that calls `len` on them which isn't defined for
`Path` instances:
>   TypeError: object of type 'PosixPath' has no len()

Fixes regression introduced in #7760

cc @sdvillal

Signed-off-by: Alexander Grund <alexander.grund@tu-dresden.de>