
Conversation

@ChrisRackauckas (Member) commented Aug 9, 2025

FluxML/Zygote.jl#1583 identified that some of the CUDA.jl minimum version bounds are incorrect. In particular, it's possible for the resolver to pick versions of GPUCompiler that don't match the requirements of the package.
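
For context, these lower bounds live in the `[compat]` section of the package's Project.toml. A minimal, hypothetical sketch of the failure mode (versions are illustrative, not CUDA.jl's actual bounds):

```toml
[compat]
# Lower bound too low: the resolver may legally pick a 0.26.x release
# that lacks APIs this package actually calls.
GPUCompiler = "0.26"
# Any 4.x release is allowed, including 4.0.0 itself.
Adapt = "4"
```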

@ChrisRackauckas (Member, Author) commented

The Downgrade.yml doesn't work quite right because, well, of course this package requires CUDA 😅 But this seems to be the right set of package versions. I can at least scope this to be an improvement, and downgrade CI could come later?
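
For reference, a minimal sketch of what such a Downgrade workflow typically looks like, assuming the julia-actions/julia-downgrade-compat action (which pins `[compat]` entries to their lower bounds before the test run); as noted above, it can't pass for CUDA.jl because hosted runners have no GPU:

```yaml
# Downgrade.yml -- minimal sketch; hosted runners lack a CUDA-capable GPU,
# so for CUDA.jl this job would need a self-hosted GPU runner instead.
name: Downgrade
on: [pull_request]
jobs:
  downgrade:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: '1.10'
      - uses: julia-actions/julia-downgrade-compat@v1  # pin compat to lower bounds
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
```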

@github-actions (bot) commented

CUDA.jl Benchmarks

| Benchmark suite | Current: 2f847f3 | Previous: c05359d | Ratio |
|---|---|---|---|
| latency/precompile | 43009370470.5 ns | 42922336650.5 ns | 1.00 |
| latency/ttfp | 7011870092 ns | 7015168424 ns | 1.00 |
| latency/import | 3572091762 ns | 3571269514 ns | 1.00 |
| integration/volumerhs | 9627594 ns | 9608723 ns | 1.00 |
| integration/byval/slices=1 | 147087 ns | 146920.5 ns | 1.00 |
| integration/byval/slices=3 | 426017 ns | 425845 ns | 1.00 |
| integration/byval/reference | 145114 ns | 145020 ns | 1.00 |
| integration/byval/slices=2 | 286435 ns | 286380 ns | 1.00 |
| integration/cudadevrt | 103547 ns | 103554 ns | 1.00 |
| kernel/indexing | 14145 ns | 14235 ns | 0.99 |
| kernel/indexing_checked | 14764 ns | 14711 ns | 1.00 |
| kernel/occupancy | 671.3630573248407 ns | 672.5506329113924 ns | 1.00 |
| kernel/launch | 2125.2 ns | 2270.3333333333335 ns | 0.94 |
| kernel/rand | 17437 ns | 14669 ns | 1.19 |
| array/reverse/1d | 19771 ns | 19682 ns | 1.00 |
| array/reverse/2d | 24668 ns | 23613.5 ns | 1.04 |
| array/reverse/1d_inplace | 10153.333333333334 ns | 10461 ns | 0.97 |
| array/reverse/2d_inplace | 12029 ns | 13212 ns | 0.91 |
| array/copy | 20668 ns | 20972 ns | 0.99 |
| array/iteration/findall/int | 157264 ns | 157808 ns | 1.00 |
| array/iteration/findall/bool | 139314.5 ns | 139837 ns | 1.00 |
| array/iteration/findfirst/int | 161899.5 ns | 164937 ns | 0.98 |
| array/iteration/findfirst/bool | 163430.5 ns | 165868 ns | 0.99 |
| array/iteration/scalar | 70556 ns | 73041 ns | 0.97 |
| array/iteration/logical | 212221.5 ns | 214850 ns | 0.99 |
| array/iteration/findmin/1d | 46278 ns | 46704 ns | 0.99 |
| array/iteration/findmin/2d | 96586 ns | 96962.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 44092.5 ns | 46033 ns | 0.96 |
| array/reductions/reduce/Int64/dims=1 | 55520.5 ns | 55193 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2 | 62743.5 ns | 62917 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 89021 ns | 88869 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88699 ns | 87079 ns | 1.02 |
| array/reductions/reduce/Float32/1d | 35221 ns | 34606 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1 | 51495 ns | 43875 ns | 1.17 |
| array/reductions/reduce/Float32/dims=2 | 59721.5 ns | 59705 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52335 ns | 52260 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70346 ns | 70051.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 44753 ns | 42671.5 ns | 1.05 |
| array/reductions/mapreduce/Int64/dims=1 | 48290 ns | 45980 ns | 1.05 |
| array/reductions/mapreduce/Int64/dims=2 | 62460.5 ns | 62143.5 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 89024 ns | 88812 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87076.5 ns | 86818 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 35034 ns | 34742 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1 | 41996 ns | 43090.5 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2 | 59859 ns | 60061 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52524 ns | 52528 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 70104 ns | 70191 ns | 1.00 |
| array/broadcast | 20048 ns | 20155 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 12892 ns | 11294 ns | 1.14 |
| array/copyto!/cpu_to_gpu | 217203 ns | 216503 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 285399 ns | 284237 ns | 1.00 |
| array/accumulate/Int64/1d | 125289.5 ns | 125529 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83781 ns | 84037 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158173.5 ns | 159166 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1711336.5 ns | 1720376 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 966771.5 ns | 968348 ns | 1.00 |
| array/accumulate/Float32/1d | 109881 ns | 109984 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 80990.5 ns | 81082 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147723.5 ns | 148760 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1619738 ns | 1629307.5 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 698792 ns | 701479 ns | 1.00 |
| array/construct | 1277.6 ns | 1287.2 ns | 0.99 |
| array/random/randn/Float32 | 44953.5 ns | 44176 ns | 1.02 |
| array/random/randn!/Float32 | 25288.5 ns | 24930 ns | 1.01 |
| array/random/rand!/Int64 | 27570 ns | 27547 ns | 1.00 |
| array/random/rand!/Float32 | 8640.333333333334 ns | 8724.666666666666 ns | 0.99 |
| array/random/rand/Int64 | 30128 ns | 30114 ns | 1.00 |
| array/random/rand/Float32 | 13079 ns | 13059 ns | 1.00 |
| array/permutedims/4d | 60556 ns | 60761 ns | 1.00 |
| array/permutedims/2d | 54347 ns | 54037 ns | 1.01 |
| array/permutedims/3d | 55391 ns | 54954 ns | 1.01 |
| array/sorting/1d | 2759605 ns | 2756544 ns | 1.00 |
| array/sorting/by | 3347300.5 ns | 3343249 ns | 1.00 |
| array/sorting/2d | 1082195.5 ns | 1080799 ns | 1.00 |
| cuda/synchronization/stream/auto | 1017 ns | 1040.3 ns | 0.98 |
| cuda/synchronization/stream/nonblocking | 7719 ns | 7220 ns | 1.07 |
| cuda/synchronization/stream/blocking | 829.4268292682926 ns | 802.3333333333334 ns | 1.03 |
| cuda/synchronization/context/auto | 1165.7 ns | 1203.5 ns | 0.97 |
| cuda/synchronization/context/nonblocking | 7199.1 ns | 7276.700000000001 ns | 0.99 |
| cuda/synchronization/context/blocking | 900.1730769230769 ns | 900.4347826086956 ns | 1.00 |

This comment was automatically generated by a workflow using github-action-benchmark.

@maleadt (Member) commented Aug 11, 2025

I don't see a Downgrade.yml being added here? And I'm not sure bluntly bumping all compats to the latest version is the way to go here; that'll just arbitrarily restrict compatibility with users' environments. What was the specific issue you ran into? I don't see anything being reported in FluxML/Zygote.jl#1583.

@ChrisRackauckas (Member, Author) commented Aug 11, 2025

The last commit (ff19390) deleted it, because CI can't pass on a machine that doesn't have CUDA.

Why do you call this "bluntly"? Before this PR, `using CUDA` fails with the minimum versions, as shown in the CI logs. Before:

https://github.com/JuliaGPU/CUDA.jl/actions/runs/16853250942/job/47742615017

After:

https://github.com/JuliaGPU/CUDA.jl/actions/runs/16853423430/job/47743015124

Can you be more specific about which minimum versions you think I got wrong? According to CI, the Project.toml previously allowed minimum versions that led to an unusable package, and it now picks versions for which that is no longer possible. Is there a specific package you think was pushed too far up?

@maleadt (Member) commented Aug 11, 2025

> Why do you call this "bluntly"? Before this PR, `using CUDA` fails with the minimum versions, as shown in the CI logs.

I didn't say all the minimum versions were correct (even though I expected most of them to be), but simply removing them all is pretty blunt, no?

> Can you be more specific about which minimum versions you think I got wrong?

Removing all the compat lower bounds and bumping them to the latest versions isn't "wrong"; it just unnecessarily restricts the installability of the package. So let's flip the question: which existing minimum bound is wrong? We can remove that one.
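
To make the trade-off concrete (hypothetical versions and package names throughout): raising a lower bound can make CUDA.jl uninstallable next to packages that cap the same dependency.

```toml
# CUDA.jl's Project.toml after a blunt bump (hypothetical):
[compat]
Adapt = "4.3"        # was "4.0"

# Some other package in the user's environment (hypothetical):
[compat]
Adapt = "4.0 - 4.2"  # caps Adapt below 4.3: no common version, resolution fails
```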

I do agree that a way to test these would be good; it's a shame that https://github.com/julia-actions/julia-downgrade-compat is GitHub Actions-specific.
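
Vendor-neutral, the core of what that action does is small; a hedged sketch of the rewrite step in plain Julia (the real action has more behavior, e.g. options for skipping certain packages):

```julia
using TOML  # stdlib

# Rewrite every [compat] entry in Project.toml to pin its declared lower
# bound, forcing the resolver onto the minimum versions for the test run.
project = TOML.parsefile("Project.toml")
for (pkg, spec) in get(project, "compat", Dict{String,Any}())
    pkg == "julia" && continue              # leave the Julia bound alone
    lower = strip(first(split(spec, ',')))  # "4.0, 5" -> "4.0"
    project["compat"][pkg] = "=" * lower    # equality pin, e.g. "=4.0"
end
open(io -> TOML.print(io, project), "Project.toml", "w")
```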

@ChrisRackauckas (Member, Author) commented

Starting by cutting out anything released more than six months ago and then refining based on CI results was a heuristic that reached a resolvable set while still being fairly conservative in many respects. Do you have a better system? I think this set is pretty clearly better, because unlike the previous one it actually resolves. Even if we did write a script that could fully brute-force it, I don't think we have the compute resources to do so right now.
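
A cheaper middle ground than full brute force: pin the direct dependencies at their proposed minima in a throwaway environment and see whether they resolve and precompile. A sketch, with an illustrative subset of the minima under discussion:

```julia
using Pkg

# Proposed lower bounds to validate (illustrative subset).
minima = ["Adapt" => v"4.0.0", "GPUCompiler" => v"0.27.1", "LLVM" => v"9.1.0"]

Pkg.activate(; temp = true)  # throwaway environment
Pkg.add([PackageSpec(name = n, version = v) for (n, v) in minima])
Pkg.precompile()             # an error here means the pinned set is not usable
```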

@maleadt (Member) commented Aug 11, 2025

> Do you have a better system?

Yes, use Resolver.jl, just like the GitHub Actions plug-in does. That gives me:

```
⌃ [621f4979] AbstractFFTs v0.4.0
⌃ [79e6a3ab] Adapt v4.0.0
⌃ [ab4f0b2a] BFloat16s v0.4.0
⌃ [fa961155] CEnum v0.2.0
  [1af6417a] CUDA_Runtime_Discovery v1.0.0
⌃ [a8cc5b0e] Crayons v4.0.0
⌃ [a93c6f00] DataFrames v1.4.0
⌃ [e2ba6199] ExprTools v0.1.0
⌃ [0c68f7d7] GPUArrays v11.2.1
⌃ [61eb1bfa] GPUCompiler v0.27.1
  [096a3bc2] GPUToolbox v0.3.0
  [63c18a36] KernelAbstractions v0.9.38
⌃ [929cbde3] LLVM v9.1.0
  [8b046642] LLVMLoopInfo v1.0.0
⌃ [5da4648a] NVTX v1.0.0
⌃ [21216c6a] Preferences v1.4.0
⌃ [08abe8d2] PrettyTables v2.1.0
⌃ [74087812] Random123 v1.2.0
⌃ [e6cf234a] RandomNumbers v1.5.3
⌃ [189a3867] Reexport v1.0.0
⌃ [ae029012] Requires v1.3.0
⌃ [90137ffa] StaticArrays v1.0.0
⌃ [10745b16] Statistics v1.10.0
  [d1e2174e] CUDA_Compiler_jll v0.2.0+0
  [4ee394cb] CUDA_Driver_jll v13.0.0+0
  [76a88914] CUDA_Runtime_jll v0.19.0+0
  [1e29f10c] demumble_jll v1.3.0+0
  [4af54fe1] LazyArtifacts
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [de0858da] Printf
  [9a3f8284] Random
  [2f01184e] SparseArrays v1.10.0
```

@maleadt (Member) commented Aug 11, 2025

I've created a new PR with a CI action as well.
