Add Python benchmarks for the new CUDA DSL/JIT #9

@ashvardanian

Description

Now that CCCL v3 can be used for efficient parallel reductions in Python, it would be great to add a benchmark file, reduce_bench.py, with Pythonic JIT-compiled kernels for parallel reductions, showcasing how different hyperparameters affect the results.
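
As a starting point for discussion, here is a minimal sketch of how the sweep harness in reduce_bench.py could be organized. It is a CPU-only stand-in written against the standard library: `blocked_sum`, `make_reducer`, `bench`, and `sweep` are hypothetical names, and `block_size` is an assumed example of a hyperparameter. None of this is the CCCL API; the JIT-compiled GPU reducer would replace `make_reducer` once the kernel exists.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: anything that maps a sequence to a scalar.
# A JIT-compiled GPU reduction could be wrapped to match this signature.
Reducer = Callable[[Sequence[float]], float]


def blocked_sum(values: Sequence[float], block_size: int) -> float:
    """Two-stage sum: per-block partial sums, then a sum of the partials.

    CPU stand-in that mirrors the block/grid shape of a GPU reduction.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    partials = [
        sum(values[start:start + block_size])
        for start in range(0, len(values), block_size)
    ]
    return sum(partials)


def make_reducer(block_size: int) -> Reducer:
    """Bind one hyperparameter value to get a plain Reducer."""
    return lambda values: blocked_sum(values, block_size)


def bench(reducer: Reducer, values: Sequence[float],
          repeats: int = 5) -> Tuple[float, float]:
    """Return (result, best wall-clock seconds) over `repeats` runs."""
    best = float("inf")
    result = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        result = reducer(values)
        best = min(best, time.perf_counter() - start)
    return result, best


def sweep(sizes: Sequence[int], block_sizes: Sequence[int],
          repeats: int = 3) -> List[Dict[str, float]]:
    """Time every (input size, block size) pair and collect one row each."""
    rows: List[Dict[str, float]] = []
    for n in sizes:
        values = [1.0] * n  # known answer: the sum equals n
        for block_size in block_sizes:
            result, seconds = bench(make_reducer(block_size), values, repeats)
            rows.append({
                "n": n,
                "block_size": block_size,
                "result": result,
                "seconds": seconds,
            })
    return rows


if __name__ == "__main__":
    for row in sweep(sizes=[1 << 12, 1 << 16], block_sizes=[64, 256, 1024]):
        print(f"n={row['n']:>6} block={row['block_size']:>4} "
              f"time={row['seconds'] * 1e6:9.1f} us")
```

Keeping the reducer behind a plain callable means the same sweep can time a NumPy baseline, a hand-written kernel, and a CCCL-backed reduction side by side, and the known-answer input makes it cheap to check correctness on every run.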
