
@maleadt (Member) commented Aug 11, 2025

No description provided.

amontoison added a commit to MadNLP/MadNLP.jl that referenced this pull request Aug 11, 2025
CUDA.jl 5.8.3 supports CUDA toolkit v13.0, in which all of the legacy CUSOLVER API has been removed.
See JuliaGPU/CUDA.jl#2842
@amontoison force-pushed the tb/cuda_13_headers branch 2 times, most recently from a3f8168 to 14d8969 on August 16, 2025 05:26

codecov bot commented Aug 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.83%. Comparing base (c8c2142) to head (ec7a874).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2842       +/-   ##
===========================================
- Coverage   89.64%   74.83%   -14.82%     
===========================================
  Files         150      150               
  Lines       13229    13143       -86     
===========================================
- Hits        11859     9835     -2024     
- Misses       1370     3308     +1938     

@maleadt marked this pull request as ready for review September 1, 2025 10:51

github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

Benchmark suite | Current: 5f95b39 | Previous: 756ce54 | Ratio (Current / Previous; below 1.00 means faster)
latency/precompile 43831307498.5 ns 43117380449.5 ns 1.02
latency/ttfp 7094033713 ns 7014904934 ns 1.01
latency/import 3642236512 ns 3575439492 ns 1.02
integration/volumerhs 9610433.5 ns 9615802.5 ns 1.00
integration/byval/slices=1 147060 ns 147023 ns 1.00
integration/byval/slices=3 426048 ns 425923 ns 1.00
integration/byval/reference 145057 ns 145240 ns 1.00
integration/byval/slices=2 286519 ns 286571 ns 1.00
integration/cudadevrt 103588 ns 103642 ns 1.00
kernel/indexing 14142 ns 14460 ns 0.98
kernel/indexing_checked 14907 ns 15152 ns 0.98
kernel/occupancy 669.7341772151899 ns 668.3417721518987 ns 1.00
kernel/launch 2207.5555555555557 ns 2221.6666666666665 ns 0.99
kernel/rand 14871 ns 18586 ns 0.80
array/reverse/1d 19901.5 ns 20105 ns 0.99
array/reverse/2d 25094 ns 25222 ns 0.99
array/reverse/1d_inplace 10568 ns 10753 ns 0.98
array/reverse/2d_inplace 12379 ns 12440 ns 1.00
array/copy 20842 ns 21263 ns 0.98
array/iteration/findall/int 157210 ns 157788 ns 1.00
array/iteration/findall/bool 139030.5 ns 139822 ns 0.99
array/iteration/findfirst/int 157488 ns 165033 ns 0.95
array/iteration/findfirst/bool 158053 ns 167860 ns 0.94
array/iteration/scalar 71112 ns 74462 ns 0.96
array/iteration/logical 207092 ns 216613 ns 0.96
array/iteration/findmin/1d 46638 ns 47358 ns 0.98
array/iteration/findmin/2d 96764 ns 97030 ns 1.00
array/reductions/reduce/Int64/1d 45630 ns 44082.5 ns 1.04
array/reductions/reduce/Int64/dims=1 49248.5 ns 49423.5 ns 1.00
array/reductions/reduce/Int64/dims=2 63710.5 ns 62870 ns 1.01
array/reductions/reduce/Int64/dims=1L 89052 ns 89243 ns 1.00
array/reductions/reduce/Int64/dims=2L 89447.5 ns 88709 ns 1.01
array/reductions/reduce/Float32/1d 34252 ns 35342 ns 0.97
array/reductions/reduce/Float32/dims=1 41759.5 ns 52276 ns 0.80
array/reductions/reduce/Float32/dims=2 59534 ns 60280 ns 0.99
array/reductions/reduce/Float32/dims=1L 52407 ns 52693 ns 0.99
array/reductions/reduce/Float32/dims=2L 70314.5 ns 70553.5 ns 1.00
array/reductions/mapreduce/Int64/1d 45363 ns 44275 ns 1.02
array/reductions/mapreduce/Int64/dims=1 50614 ns 52605 ns 0.96
array/reductions/mapreduce/Int64/dims=2 62382.5 ns 61775 ns 1.01
array/reductions/mapreduce/Int64/dims=1L 89024 ns 89085 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 88006 ns 87203 ns 1.01
array/reductions/mapreduce/Float32/1d 34346 ns 35440 ns 0.97
array/reductions/mapreduce/Float32/dims=1 51585.5 ns 42076.5 ns 1.23
array/reductions/mapreduce/Float32/dims=2 59671 ns 60334 ns 0.99
array/reductions/mapreduce/Float32/dims=1L 52869 ns 53110.5 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 70678.5 ns 70458 ns 1.00
array/broadcast 20182 ns 20315 ns 0.99
array/copyto!/gpu_to_gpu 11290 ns 12955 ns 0.87
array/copyto!/cpu_to_gpu 214786 ns 217419 ns 0.99
array/copyto!/gpu_to_cpu 284493 ns 283111 ns 1.00
array/accumulate/Int64/1d 125305.5 ns 125814 ns 1.00
array/accumulate/Int64/dims=1 83877 ns 84926 ns 0.99
array/accumulate/Int64/dims=2 159257 ns 159079.5 ns 1.00
array/accumulate/Int64/dims=1L 1720678 ns 1719956 ns 1.00
array/accumulate/Int64/dims=2L 967972 ns 968078 ns 1.00
array/accumulate/Float32/1d 109759 ns 109898 ns 1.00
array/accumulate/Float32/dims=1 81027 ns 81238 ns 1.00
array/accumulate/Float32/dims=2 148485 ns 148942.5 ns 1.00
array/accumulate/Float32/dims=1L 1629583 ns 1628483 ns 1.00
array/accumulate/Float32/dims=2L 701228 ns 701941 ns 1.00
array/construct 1283.3 ns 1308.95 ns 0.98
array/random/randn/Float32 44309 ns 45219 ns 0.98
array/random/randn!/Float32 24914 ns 25287 ns 0.99
array/random/rand!/Int64 27544 ns 27594 ns 1.00
array/random/rand!/Float32 8810 ns 8898.333333333334 ns 0.99
array/random/rand/Int64 30125 ns 38444 ns 0.78
array/random/rand/Float32 13095 ns 13410 ns 0.98
array/permutedims/4d 60327 ns 60388 ns 1.00
array/permutedims/2d 54178.5 ns 54648 ns 0.99
array/permutedims/3d 54798 ns 55774.5 ns 0.98
array/sorting/1d 2756295 ns 2757370 ns 1.00
array/sorting/by 3342522 ns 3343262.5 ns 1.00
array/sorting/2d 1080089 ns 1080991 ns 1.00
cuda/synchronization/stream/auto 1023.3 ns 1045.4 ns 0.98
cuda/synchronization/stream/nonblocking 8331.4 ns 7545.299999999999 ns 1.10
cuda/synchronization/stream/blocking 805.7765957446809 ns 818.7 ns 0.98
cuda/synchronization/context/auto 1167.4 ns 1196.3 ns 0.98
cuda/synchronization/context/nonblocking 7158.8 ns 7154.1 ns 1.00
cuda/synchronization/context/blocking 891.9629629629629 ns 929.3684210526316 ns 0.96

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt (Member, Author) commented Sep 1, 2025

Passes CI locally, at least mostly, so we can go ahead with this.

@maleadt merged commit 0f915b0 into master Sep 1, 2025
1 of 3 checks passed
@maleadt deleted the tb/cuda_13_headers branch September 1, 2025 13:51
2 participants