Profiler: relax parsing of color flags in marker data. #3075

Merged: maleadt merged 1 commit into master from tb/cupti_color_relax on Apr 1, 2026

@maleadt commented Apr 1, 2026

Even though the flag enum supports explicitly indicating that the color is not set, I guess it's also valid to provide no color flag at all, although it seems weird to me that both should be supported.

In any case, I relaxed the parsing because @vchuravy actually reported encountering this:

┌ Error: Unexpected CUPTI marker color flag 0. Please file an issue.
└ @ CUDA.Profile ~/.julia/packages/CUDA/Il00B/src/profile.jl:596
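
For illustration only, the relaxed behavior described above could look roughly like the sketch below. This is not the diff from this PR: the function name `parse_marker_color` and the two flag constants are hypothetical stand-ins (their bit values are assumed to mirror CUPTI's "color none" and "color ARGB" marker flags), and the real code in `profile.jl` may be structured differently.

```julia
# Hypothetical flag bits; assumed to mirror CUPTI's marker color flags.
const MARKER_COLOR_NONE = UInt32(1) << 0   # color explicitly not set
const MARKER_COLOR_ARGB = UInt32(1) << 1   # color given as 32-bit ARGB

# Return the marker color as a UInt32 ARGB value, or `nothing` if no color is set.
function parse_marker_color(flags::Integer, color::Integer)
    if (flags & MARKER_COLOR_ARGB) != 0
        return UInt32(color)
    elseif flags == 0 || (flags & MARKER_COLOR_NONE) != 0
        # Relaxed parsing: a missing flag (0) is treated like the explicit
        # "no color" flag instead of being reported as an error.
        return nothing
    else
        # Anything else is still unexpected, but need not be fatal.
        @warn "Unexpected marker color flag $flags; ignoring color"
        return nothing
    end
end
```

With this sketch, `parse_marker_color(0, 0)` returns `nothing` rather than logging the error shown above, while a record carrying the ARGB flag still yields its color.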

@maleadt merged commit a79b516 into master on Apr 1, 2026
1 of 2 checks passed
@maleadt deleted the tb/cupti_color_relax branch on April 1, 2026, 14:07

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.42%. Comparing base (0b05451) to head (29a4d18).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3075   +/-   ##
=======================================
  Coverage   90.42%   90.42%           
=======================================
  Files         141      141           
  Lines       11993    11993           
=======================================
  Hits        10845    10845           
  Misses       1148     1148           

@github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 29a4d18 | Previous: 0b05451 | Ratio |
|---|---|---|---|
| latency/precompile | 4605232371 ns | 4612556023 ns | 1.00 |
| latency/ttfp | 4414186531.5 ns | 4408562143 ns | 1.00 |
| latency/import | 3826664571 ns | 3821839641 ns | 1.00 |
| integration/volumerhs | 9439705 ns | 9448669.5 ns | 1.00 |
| integration/byval/slices=1 | 146084 ns | 145839 ns | 1.00 |
| integration/byval/slices=3 | 423274 ns | 423425 ns | 1.00 |
| integration/byval/reference | 144230 ns | 144031 ns | 1.00 |
| integration/byval/slices=2 | 284858 ns | 284559 ns | 1.00 |
| integration/cudadevrt | 102869 ns | 102661 ns | 1.00 |
| kernel/indexing | 13635 ns | 13693 ns | 1.00 |
| kernel/indexing_checked | 14420.5 ns | 14309 ns | 1.01 |
| kernel/occupancy | 698.658064516129 ns | 663.8086419753087 ns | 1.05 |
| kernel/launch | 2208.4444444444443 ns | 2210.6666666666665 ns | 1.00 |
| kernel/rand | 14756 ns | 16005 ns | 0.92 |
| array/reverse/1d | 18787.5 ns | 18568 ns | 1.01 |
| array/reverse/2dL_inplace | 66216 ns | 66179 ns | 1.00 |
| array/reverse/1dL | 69404 ns | 69124 ns | 1.00 |
| array/reverse/2d | 21174 ns | 21081 ns | 1.00 |
| array/reverse/1d_inplace | 10516.333333333334 ns | 8671.333333333334 ns | 1.21 |
| array/reverse/2d_inplace | 10621.5 ns | 10405 ns | 1.02 |
| array/reverse/2dL | 73531 ns | 73273 ns | 1.00 |
| array/reverse/1dL_inplace | 66197 ns | 66158 ns | 1.00 |
| array/copy | 18997.5 ns | 18801 ns | 1.01 |
| array/iteration/findall/int | 150723.5 ns | 149677 ns | 1.01 |
| array/iteration/findall/bool | 133364.5 ns | 132815 ns | 1.00 |
| array/iteration/findfirst/int | 85154 ns | 84077 ns | 1.01 |
| array/iteration/findfirst/bool | 83075 ns | 81834 ns | 1.02 |
| array/iteration/scalar | 69897 ns | 67842 ns | 1.03 |
| array/iteration/logical | 204503.5 ns | 205668 ns | 0.99 |
| array/iteration/findmin/1d | 89275 ns | 88143.5 ns | 1.01 |
| array/iteration/findmin/2d | 117807 ns | 117602 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 44171 ns | 43879 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1 | 42699 ns | 42363.5 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2 | 59878 ns | 59729 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88101 ns | 87783 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 85199 ns | 84600.5 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 36297 ns | 35554 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1 | 40431.5 ns | 40471.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 57268 ns | 57348 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52335 ns | 52155 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70371 ns | 70051 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 44409 ns | 43658 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1 | 42847 ns | 42850 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 60052 ns | 60283 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 88214 ns | 87856 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 85258 ns | 84886 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 36210 ns | 35439 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=1 | 40252.5 ns | 49602 ns | 0.81 |
| array/reductions/mapreduce/Float32/dims=2 | 57304 ns | 57112 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52092 ns | 52066 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69839 ns | 70220.5 ns | 0.99 |
| array/broadcast | 20864 ns | 20835 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11655 ns | 11409 ns | 1.02 |
| array/copyto!/cpu_to_gpu | 218542 ns | 217482 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 284489 ns | 284547 ns | 1.00 |
| array/accumulate/Int64/1d | 118991 ns | 119536 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80432 ns | 80581.5 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 156582 ns | 156671 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1695513 ns | 1706437 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 962297 ns | 962104 ns | 1.00 |
| array/accumulate/Float32/1d | 102099 ns | 102057 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 78136.5 ns | 77830 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 144751.5 ns | 144868 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1587486 ns | 1586085 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 658744 ns | 661758 ns | 1.00 |
| array/construct | 1333.3 ns | 1349.1 ns | 0.99 |
| array/random/randn/Float32 | 38775 ns | 42622 ns | 0.91 |
| array/random/randn!/Float32 | 31534 ns | 31841 ns | 0.99 |
| array/random/rand!/Int64 | 34323 ns | 34412 ns | 1.00 |
| array/random/rand!/Float32 | 8716.666666666666 ns | 8601 ns | 1.01 |
| array/random/rand/Int64 | 34806 ns | 37538 ns | 0.93 |
| array/random/rand/Float32 | 13272.5 ns | 13018 ns | 1.02 |
| array/permutedims/4d | 52392.5 ns | 52619.5 ns | 1.00 |
| array/permutedims/2d | 52667 ns | 52940 ns | 0.99 |
| array/permutedims/3d | 52914 ns | 53310 ns | 0.99 |
| array/sorting/1d | 2736370 ns | 2735465.5 ns | 1.00 |
| array/sorting/by | 3305502.5 ns | 3315555 ns | 1.00 |
| array/sorting/2d | 1070338 ns | 1072718 ns | 1.00 |
| cuda/synchronization/stream/auto | 1059.7 ns | 1042.1 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 7939 ns | 7867.299999999999 ns | 1.01 |
| cuda/synchronization/stream/blocking | 873.95 ns | 828.4299065420561 ns | 1.05 |
| cuda/synchronization/context/auto | 1214.3 ns | 1198.8 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 7457.1 ns | 7295 ns | 1.02 |
| cuda/synchronization/context/blocking | 929.4324324324324 ns | 929.6410256410256 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
