Fix Evoformer compilation#7760
Conversation
50e371a to
643cac1
Compare
0fd060b to
1e71bdb
Compare
4ea96ed to
8c9ef4c
Compare
9c049d9 to
a9701c2
Compare
Signed-off-by: Santi Villalba <sdvillal@gmail.com>
Signed-off-by: Santi Villalba <sdvillal@gmail.com>
…ystems Signed-off-by: Santi Villalba <sdvillal@gmail.com>
Signed-off-by: Santi Villalba <sdvillal@gmail.com>
Signed-off-by: Santi Villalba <sdvillal@gmail.com>
Signed-off-by: Santi Villalba <sdvillal@gmail.com>
a9701c2 to
f0c7b42
Compare
|
Hi @sdvillal By the way,
|
Signed-off-by: Santi Villalba <sdvillal@gmail.com>
|
Thanks a lot for the quick review and merge @tohtana! I have fixed formatting (sorry about it, one should read the contributing guidelines before contributing...). I have not personally experienced the mismatch. We have been running on:
I could try to run the test a few times in this context and see if it happens for me, could that info be useful? In any case, I feel the extension is showing its age and it might require some love to these GEMMs and generally to make it worthwhile to use on Hopper and newer. |
`EvoformerAttnBuilder` has some problems which preclude compiling the extension on several scenarios (e.g., [isolated conda environment with cuda toolchain](aqlaboratory/openfold-3#34), lack of hardware in the system) and breaks some standard DeepSpeed configuration of target capabilities. *Changes* - Fix evoformer CUTLASS detection: - Allow to skip it, useful when CUTLASS is already correctly setup (e.g., in a conda environment with CUTLASS and the CUDA toolchain) - Fix misleading use of deprecated nvidia-cutlass pypi package by actually using the provided bindings but discouraging this route as [these bindings are not maintained anymore](NVIDIA/cutlass#2119) - Fix evoformer compilation with no GPU is present: - this is taken care correctly and more generally by builder.compute_capability_args - allow for cross-compilation in systems without GPU - allows for compilation against all available virtual architectures and binary outputs - see e.g., #5308 - Make all these changes configurable and explicit through documented environment variables Tested in all scenarios. --------- Signed-off-by: Santi Villalba <sdvillal@gmail.com> Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
|
@sdvillal Yes, I encountered the issue with H100. The test doesn't throw an error with L40S on our CI. |
`EvoformerAttnBuilder` has some problems which preclude compiling the extension on several scenarios (e.g., [isolated conda environment with cuda toolchain](aqlaboratory/openfold-3#34), lack of hardware in the system) and breaks some standard DeepSpeed configuration of target capabilities. *Changes* - Fix evoformer CUTLASS detection: - Allow to skip it, useful when CUTLASS is already correctly setup (e.g., in a conda environment with CUTLASS and the CUDA toolchain) - Fix misleading use of deprecated nvidia-cutlass pypi package by actually using the provided bindings but discouraging this route as [these bindings are not maintained anymore](NVIDIA/cutlass#2119) - Fix evoformer compilation with no GPU is present: - this is taken care correctly and more generally by builder.compute_capability_args - allow for cross-compilation in systems without GPU - allows for compilation against all available virtual architectures and binary outputs - see e.g., deepspeedai#5308 - Make all these changes configurable and explicit through documented environment variables Tested in all scenarios. --------- Signed-off-by: Santi Villalba <sdvillal@gmail.com> Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com> Signed-off-by: Phalani Paladugu <mailofphalani@gmail.com>
`EvoformerAttnBuilder` returns instances of `Path` from `include_paths` which then cause failures in `OpBuilder.builder` when passing them to `strip_empty_entries` that calls `len` on them which isn't defined for `Path` instances: > TypeError: object of type 'PosixPath' has no len() Fixes deepspeedai#7760
|
After this it doesn't work at all for me anymore: One issue without this PR is with e.g.
We want to use the same installation also for older GPUs and this failure is very unexpected as the order shouldn't matter. With this PR it doesn't compile anymore because the How did that work for you? I don't see how it could ever succeed. Proposed fix for that in #7862 |
`EvoformerAttnBuilder` returns instances of `Path` from `include_paths` which then cause failures in `OpBuilder.builder` when passing them to `strip_empty_entries` that calls `len` on them which isn't defined for `Path` instances: > TypeError: object of type 'PosixPath' has no len() Fixes deepspeedai#7760 Signed-off-by: Alexander Grund <alexander.grund@tu-dresden.de>
|
Thanks for the report and the fix @Flamefire, and apologies that this broke compilation for manually defined paths - I believe testing this case likely slipped my attention as I was getting the PR ready for review and it is not being tested by the CIs? I will give a hand with the PR, I suggest we propose a test to avoid this happening again. Regarding the order issue, do you think it is also related to the changes in this PR? |
The test seems to not be run on CI and nothing on CI seems to build this kernel/op at all.
No, I noticed the issue since at least 0.14.5, way before this PR. But it persists on current master |
|
As a workaround, would pointing to the relevant include dirs (e.g., using CPATH) and removing deepspeed specific configuration (i.e., setting CUTLASS_PATH=DS_IGNORE_CUTLASS_DETECTION) work for you? |
|
Workaround for what exactly? This Path issue? Doesn't help for the arch issues of course |
|
I would recommend opening a separate issue for the ordering problem, if there is no one already, as it is unrelated to the changes in this PR. |
|
Done: #7863 |
`EvoformerAttnBuilder` returns instances of `Path` from `include_paths` which then cause failures in `OpBuilder.builder` when passing them to `strip_empty_entries` that calls `len` on them which isn't defined for `Path` instances: > TypeError: object of type 'PosixPath' has no len() Fixes regression introduced in #7760 cc @sdvillal Signed-off-by: Alexander Grund <alexander.grund@tu-dresden.de>
EvoformerAttnBuilderhas some problems which preclude compiling the extension on several scenarios (e.g., isolated conda environment with cuda toolchain, lack of hardware in the system) and breaks some standard DeepSpeed configuration of target capabilities.Changes
Fix evoformer CUTLASS detection:
Fix evoformer compilation with no GPU is present:
Make all these changes configurable and explicit through documented environment variables
Tested in all scenarios.