
Conversation

pbalcer (Owner) commented Feb 20, 2025

No description provided.

@github-actions

Compute Benchmarks level_zero_v2 run (with params: ):
https://github.com/pbalcer/llvm/actions/runs/13433790679

@github-actions

Benchmarks level_zero_v2 run ():
https://github.com/pbalcer/llvm/actions/runs/13433790679
Job status: success. Test status: success.

Summary

(Emphasized values are the best results)
No diffs to calculate performance change
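Once a baseline run exists, the report can show a per-benchmark relative change. A minimal sketch of how such a diff might be computed, assuming simple name-to-value mappings (the function and data names are hypothetical, not the actual CI tooling):

```python
# Minimal sketch of a relative-performance diff between two runs.
# All names here are hypothetical; the real CI scripts may differ.

def relative_change(baseline: dict[str, float], this_pr: dict[str, float]) -> dict[str, float]:
    """Return per-benchmark change as a fraction: +0.10 means a 10% higher value.

    Whether 'higher' is better depends on the unit (GB/s: higher is better,
    μs: lower is better), so the sign must be interpreted per metric.
    """
    diffs = {}
    for name, base in baseline.items():
        # Skip benchmarks missing from the PR run or with a zero baseline.
        if name in this_pr and base != 0:
            diffs[name] = (this_pr[name] - base) / base
    return diffs

baseline = {"api_overhead_benchmark_l0 SubmitKernel in order": 12.000}
this_pr = {"api_overhead_benchmark_l0 SubmitKernel in order": 11.596}
print(relative_change(baseline, this_pr))
```

Benchmarks present in only one of the two runs are simply omitted, which matches the "No diffs to calculate" behavior when the baseline is empty.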

Performance change in benchmark groups

Compute Benchmarks
Relative perf in group SubmitKernel (7)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_l0 SubmitKernel out of order | 12.167000 μs |
| api_overhead_benchmark_l0 SubmitKernel in order | 11.596000 μs |
| api_overhead_benchmark_sycl SubmitKernel out of order | 20.954000 μs |
| api_overhead_benchmark_sycl SubmitKernel in order | 21.729000 μs |
| api_overhead_benchmark_ur SubmitKernel out of order | 13.270000 μs |
| api_overhead_benchmark_ur SubmitKernel in order | 13.362000 μs |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion | 20.528000 μs |
Relative perf in group Other (17)

| Benchmark | This PR |
| --- | --- |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 203.116000 μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 87.775000 μs |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.074000 μs |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 2.848000 GB/s |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 1.859000 μs |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 1.351000 μs |
| miscellaneous_benchmark_sycl VectorSum | 860.370000 bw GB/s |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 | 5445.824000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 | 13828.721000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 | 17295.624000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 | 793.706000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 | 5946.110000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 | 6465.155000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 | 17336.962000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 | 800.720000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events | 28703.305000 μs |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events | 89292.465000 μs |
Relative perf in group SinKernelGraph (4)

| Benchmark | This PR |
| --- | --- |
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 | 72022.923000 μs |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 | 69351.378000 μs |
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 | 356598.780000 μs |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | 351253.763000 μs |

Relative perf in group SubmitGraph (1)

| Benchmark | This PR |
| --- | --- |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 | 664.076000 μs |

Relative perf in group ExecGraph (1)

| Benchmark | This PR |
| --- | --- |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 | 56302.234000 μs |

Relative perf in group SubmitKernel CPU count (3)

| Benchmark | This PR |
| --- | --- |
| api_overhead_benchmark_ur SubmitKernel out of order CPU count | 91838.000000 instr |
| api_overhead_benchmark_ur SubmitKernel in order CPU count | 91838.000000 instr |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count | 94321.000000 instr |
Velocity Bench
Relative perf in group Other (5)

| Benchmark | This PR |
| --- | --- |
| Velocity-Bench Hashtable | 363.114205 M keys/sec |
| Velocity-Bench Bitcracker | 35.454400 s |
| Velocity-Bench CudaSift | 201.646000 ms |
| Velocity-Bench QuickSilver | 116.820000 MMS/CTT |
| Velocity-Bench Sobel Filter | 608.306000 ms |
SYCL-Bench
Relative perf in group Other (53)

| Benchmark | This PR |
| --- | --- |
| Runtime_IndependentDAGTaskThroughput_SingleTask | 169.165000 ms |
| Runtime_IndependentDAGTaskThroughput_BasicParallelFor | 173.726000 ms |
| Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor | 176.032000 ms |
| Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor | 172.151000 ms |
| Runtime_DAGTaskThroughput_SingleTask | 1166.386000 ms |
| Runtime_DAGTaskThroughput_BasicParallelFor | 1216.151000 ms |
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | 1212.065000 ms |
| Runtime_DAGTaskThroughput_NDRangeParallelFor | 1182.922000 ms |
| MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous | 5.337000 ms |
| MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous | 5.215000 ms |
| MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous | 5.259000 ms |
| MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous | 5.117000 ms |
| MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous | 4.773000 ms |
| MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous | 4.898000 ms |
| MicroBench_HostDeviceBandwidth_1D_H2D_Strided | 4.823000 ms |
| MicroBench_HostDeviceBandwidth_2D_H2D_Strided | 5.064000 ms |
| MicroBench_HostDeviceBandwidth_3D_H2D_Strided | 5.244000 ms |
| MicroBench_HostDeviceBandwidth_1D_D2H_Strided | 5.321000 ms |
| MicroBench_HostDeviceBandwidth_2D_D2H_Strided | 5.168000 ms |
| MicroBench_HostDeviceBandwidth_3D_D2H_Strided | 5.180000 ms |
| MicroBench_LocalMem_int32_4096 | 29.845000 ms |
| MicroBench_LocalMem_fp32_4096 | 29.897000 ms |
| Pattern_Reduction_NDRange_int32 | 16.548000 ms |
| Pattern_Reduction_Hierarchical_int32 | 16.397000 ms |
| ScalarProduct_NDRange_int32 | 3.768000 ms |
| ScalarProduct_NDRange_int64 | 5.449000 ms |
| ScalarProduct_NDRange_fp32 | 3.752000 ms |
| ScalarProduct_Hierarchical_int32 | 10.527000 ms |
| ScalarProduct_Hierarchical_int64 | 11.523000 ms |
| ScalarProduct_Hierarchical_fp32 | 10.163000 ms |
| Pattern_SegmentedReduction_NDRange_int16 | 2.262000 ms |
| Pattern_SegmentedReduction_NDRange_int32 | 2.161000 ms |
| Pattern_SegmentedReduction_NDRange_int64 | 2.338000 ms |
| Pattern_SegmentedReduction_NDRange_fp32 | 2.158000 ms |
| Pattern_SegmentedReduction_Hierarchical_int16 | 11.808000 ms |
| Pattern_SegmentedReduction_Hierarchical_int32 | 11.588000 ms |
| Pattern_SegmentedReduction_Hierarchical_int64 | 11.781000 ms |
| Pattern_SegmentedReduction_Hierarchical_fp32 | 11.586000 ms |
| USM_Allocation_latency_fp32_device | 0.058000 ms |
| USM_Allocation_latency_fp32_host | 37.282000 ms |
| USM_Allocation_latency_fp32_shared | 0.061000 ms |
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | 1.478000 ms |
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | 1.019000 ms |
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | 1.749000 ms |
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | 1.169000 ms |
| VectorAddition_int32 | 1.471000 ms |
| VectorAddition_int64 | 3.060000 ms |
| VectorAddition_fp32 | 1.460000 ms |
| Polybench_2mm | 1.218000 ms |
| Polybench_3mm | 1.806000 ms |
| Polybench_Atax | 6.845000 ms |
| Kmeans_fp32 | 16.052000 ms |
| MolecularDynamics | 0.030000 ms |

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
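The `--csv --noHeaders` flags used throughout these commands make each benchmark binary emit bare CSV rows for collection. A small sketch of how such output could be parsed; the column layout (label, value, unit) is an assumption for illustration, not taken from compute-benchmarks:

```python
import csv
import io

# Hypothetical parser for '--csv --noHeaders' benchmark output.
# Assumed columns: benchmark label, measured value, unit.
def parse_rows(raw: str) -> list[tuple[str, float, str]]:
    rows = []
    for rec in csv.reader(io.StringIO(raw)):
        if len(rec) >= 3:
            rows.append((rec[0], float(rec[1]), rec[2]))
    return rows

sample = "SubmitKernel out of order,12.167,us\n"
print(parse_rows(sample))
```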

api_overhead_benchmark_l0 SubmitKernel in order

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Velocity-Bench Hashtable

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Velocity-Bench QuickSilver

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Environment Variables:

QS_DEVICE=GPU

Velocity-Bench Sobel Filter

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Runtime_IndependentDAGTaskThroughput_SingleTask

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_DAGTaskThroughput_SingleTask

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_BasicParallelFor

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_NDRangeParallelFor

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_LocalMem_int32_4096

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

MicroBench_LocalMem_fp32_4096

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Pattern_Reduction_NDRange_int32

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Pattern_Reduction_Hierarchical_int32

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

ScalarProduct_NDRange_int32

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_int64

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_fp32

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int32

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int64

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_fp32

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int16

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int32

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int64

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_fp32

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int16

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int32

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int64

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_fp32

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

USM_Allocation_latency_fp32_device

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_host

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_shared

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

VectorAddition_int32

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_int64

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_fp32

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Polybench_2mm

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Polybench_3mm

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Polybench_Atax

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Kmeans_fp32

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

MolecularDynamics

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

pbalcer force-pushed the bench-workflow branch 4 times, most recently from 84e2ab6 to e40a827 on February 20, 2025 at 16:50
uditagarwal97 pushed a commit to intel/llvm that referenced this pull request Feb 21, 2025
This is a first step towards reenabling UR performance testing CI. This
introduces the reusable yml workflow and a way to trigger it manually.

Here's an example how it looks:
pbalcer#2 (comment)
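A reusable workflow with a manual trigger generally combines the `workflow_call` and `workflow_dispatch` triggers. A minimal sketch, assuming hypothetical input names and a placeholder job body (this is not the actual workflow from the PR):

```yaml
# Minimal sketch of a reusable, manually triggerable GitHub Actions workflow.
# The input name and job body are illustrative assumptions.
name: benchmarks-reusable
on:
  workflow_dispatch:        # manual trigger from the Actions tab
    inputs:
      params:
        required: false
        type: string
  workflow_call:            # lets other workflows call this one
    inputs:
      params:
        required: false
        type: string
jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - run: echo "running benchmarks with params '${{ inputs.params }}'"
```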
kbenzie pushed a commit to kbenzie/unified-runtime that referenced this pull request Feb 21, 2025
This is a first step towards reenabling UR performance testing CI. This
introduces the reusable yml workflow and a way to trigger it manually.

Here's an example how it looks:
pbalcer/llvm#2 (comment)
pbalcer pushed a commit that referenced this pull request Jun 4, 2025
Fix for:
`Assertion failed: (false && "Architecture or OS not supported"),
function CreateRegisterContextForFrame, file
/usr/src/contrib/llvm-project/lldb/source/Plugins/Process/elf-core/ThreadElfCore.cpp,
line 182.
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and
include the crash backtrace.
#0 0x000000080cd857c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&,
int)
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
#1 0x000000080cd85ed4
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:797:3
#2 0x000000080cd82ae8 llvm::sys::RunSignalHandlers()
/usr/src/contrib/llvm-project/llvm/lib/Support/Signals.cpp:104:5
#3 0x000000080cd861f0 SignalHandler
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:403:3 intel#4
0x000000080f159644 handle_signal
/usr/src/lib/libthr/thread/thr_sig.c:298:3
`
pbalcer pushed a commit that referenced this pull request Jun 4, 2025
The mcmodel=tiny memory model is only valid on ARM targets. Trying it on X86
makes the compiler throw an internal error along with a stack dump.
#125641
This patch resolves the issue.
Reduced test case:
```
#include <stdio.h>
int main( void )
{
printf( "Hello, World!\n" ); 
return 0; 
}
```
```
0.	Program arguments: /opt/compiler-explorer/clang-trunk/bin/clang++ -gdwarf-4 -g -o /app/output.s -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics -mcmodel=tiny <source>
1.	<eof> parser at end of file
 #0 0x0000000003b10218 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b10218)
 #1 0x0000000003b0e35c llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b0e35c)
 #2 0x0000000003a5dbc3 llvm::CrashRecoveryContext::HandleExit(int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dbc3)
 #3 0x0000000003b05cfe llvm::sys::Process::Exit(int, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b05cfe)
 #4 0x0000000000d4e3eb LLVMErrorHandler(void*, char const*, bool) cc1_main.cpp:0:0
 #5 0x0000000003a67c93 llvm::report_fatal_error(llvm::Twine const&, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67c93)
 #6 0x0000000003a67df8 (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67df8)
 #7 0x0000000002549148 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x2549148)
 #8 0x00000000025491fc llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x25491fc)
 #9 0x0000000003db74cc clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3db74cc)
#10 0x0000000004460d95 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4460d95)
#11 0x00000000060005ec clang::ParseAST(clang::Sema&, bool, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x60005ec)
#12 0x00000000044614b5 clang::CodeGenAction::ExecuteAction() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44614b5)
#13 0x0000000004737121 clang::FrontendAction::Execute() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4737121)
#14 0x00000000046b777b clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x46b777b)
#15 0x00000000048229e3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x48229e3)
#16 0x0000000000d50621 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd50621)
#17 0x0000000000d48e2d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#18 0x00000000044acc99 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0
#19 0x0000000003a5dac3 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dac3)
#20 0x00000000044aceb9 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0
#21 0x00000000044710dd clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44710dd)
#22 0x0000000004472071 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4472071)
#23 0x000000000447c3fc clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x447c3fc)
#24 0x0000000000d4d2b1 clang_main(int, char**, llvm::ToolContext const&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd4d2b1)
#25 0x0000000000c12464 main (/opt/compiler-explorer/clang-trunk/bin/clang+++0xc12464)
#26 0x00007ae43b029d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#27 0x00007ae43b029e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#28 0x0000000000d488c5 _start (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd488c5)
```

---------

Co-authored-by: Shashwathi N <[email protected]>
pbalcer pushed a commit that referenced this pull request Jun 4, 2025
… `getForwardSlice` matchers (#115670)

Improve the mlir-query tool by implementing `getBackwardSlice` and
`getForwardSlice` matchers. In addition, `SetQuery` was added to enable
custom configuration for each query, e.g. `inclusive`,
`omitUsesFromAbove`, `omitBlockArguments`.

Note: the backwardSlice and forwardSlice algorithms are the same as the ones
in `mlir/lib/Analysis/SliceAnalysis.cpp`.
Example of the current matcher; the query was run against the file
`mlir/test/mlir-query/complex-test.mlir`:

```mlir
./mlir-query /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir -c "match getDefinitions(hasOpName(\"arith.addf\"),2)"

Match #1:

/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:5:8:
  %0 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0 : tensor<5x5xf32>) outs(%arg1 : tensor<5x5xf32>) {
       ^
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:7:10: note: "root" binds here
    %2 = arith.addf %in, %in : f32
         ^
Match #2:

/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:10:16:
  %collapsed = tensor.collapse_shape %0 [[0, 1]] : tensor<5x5xf32> into tensor<25xf32>
               ^
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:13:11:
    %c2 = arith.constant 2 : index
          ^
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:14:18:
    %extracted = tensor.extract %collapsed[%c2] : tensor<25xf32>
                 ^
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:15:10: note: "root" binds here
    %2 = arith.addf %extracted, %extracted : f32
         ^
2 matches.
```
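The slicing the new matchers perform can be illustrated independently of MLIR. Below is a language-agnostic sketch of a depth-bounded backward slice over a use-def mapping (the real implementation walks SSA use-def chains in `SliceAnalysis.cpp`; the `defs` dictionary and names here are illustrative only):

```python
def backward_slice(defs, root, max_depth=None, inclusive=True):
    """Collect ops transitively defining `root`'s operands.

    `defs` maps an op name to the ops that define its operands;
    `max_depth` bounds the walk like the depth argument of
    getDefinitions; `inclusive` keeps the root in the result.
    """
    seen = set()
    frontier = list(defs.get(root, ()))
    depth = 1
    while frontier and (max_depth is None or depth <= max_depth):
        next_frontier = []
        for op in frontier:
            if op not in seen:
                seen.add(op)
                next_frontier.extend(defs.get(op, ()))
        frontier = next_frontier
        depth += 1
    if inclusive:
        seen.add(root)
    return seen

# Mirrors Match #2 above: arith.addf depends on tensor.extract, which
# depends on tensor.collapse_shape and arith.constant.
example = {
    "arith.addf": ["tensor.extract"],
    "tensor.extract": ["tensor.collapse_shape", "arith.constant"],
    "tensor.collapse_shape": ["linalg.generic"],
}
print(sorted(backward_slice(example, "arith.addf", max_depth=2)))
```

With `max_depth=2` the walk stops before `linalg.generic`, matching the four ops reported in Match #2.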
pbalcer pushed a commit that referenced this pull request Jun 26, 2025
Fixes #123300

What is seen:
```
clang-repl> int x = 42;
clang-repl> auto capture = [&]() { return x * 2; };
In file included from <<< inputs >>>:1:
input_line_4:1:17: error: non-local lambda expression cannot have a capture-default
    1 | auto capture = [&]() { return x * 2; };
      |                 ^
zsh: segmentation fault  clang-repl --Xcc="-v"

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x0000000107b4f8b8 libclang-cpp.19.1.dylib`clang::IncrementalParser::CleanUpPTU(clang::PartialTranslationUnit&) + 988
    frame #1: 0x0000000107b4f1b4 libclang-cpp.19.1.dylib`clang::IncrementalParser::ParseOrWrapTopLevelDecl() + 416
    frame #2: 0x0000000107b4fb94 libclang-cpp.19.1.dylib`clang::IncrementalParser::Parse(llvm::StringRef) + 612
    frame #3: 0x0000000107b52fec libclang-cpp.19.1.dylib`clang::Interpreter::ParseAndExecute(llvm::StringRef, clang::Value*) + 180
    frame #4: 0x0000000100003498 clang-repl`main + 3560
    frame #5: 0x000000018d39a0e0 dyld`start + 2360
```

Though the error is justified, we shouldn't exit through a segfault in such
cases.

The issue is that empty named decls weren't being handled, tripping this
assert:

https://github.com/llvm/llvm-project/blob/c1a229252617ed58f943bf3f4698bd8204ee0f04/clang/include/clang/AST/DeclarationName.h#L503

This can also be seen when the example is attempted through xeus-cpp-lite.


![image](https://github.com/user-attachments/assets/9b0e6ead-138e-4b06-9ad9-fcb9f8d5bf6e)
pbalcer pushed a commit that referenced this pull request Jul 7, 2025
At optimization levels other than -O0, the call stack is not preserved; for
example, malloc_shared gets inlined, and the call stack looks like
```
 #0 in int* sycl::_V1::malloc_host<int>(unsigned long, sycl::_V1::context const&, sycl::_V1::property_list const&, sycl::_V1::detail::code_location const&) /tmp/syclws/include/sycl/usm.hpp:215:27
 #1 in ?? (/lib/x86_64-linux-gnu/libc.so.6+0x757867a2a1c9)
 #2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x757867a2a28a)
```
instead of
```
 #0 in int* sycl::_V1::malloc_host<int>(unsigned long, sycl::_V1::context const&, sycl::_V1::property_list const&, sycl::_V1::detail::code_location const&) /tmp/syclws/include/sycl/usm.hpp:215:27
 #1 in int* sycl::_V1::malloc_host<int>(unsigned long, sycl::_V1::queue const&, sycl::_V1::property_list const&, sycl::_V1::detail::code_location const&) /tmp/syclws/include/sycl/usm.hpp:223:10
 #2 in main /tmp/syclws/llvm/sycl/test-e2e/MemorySanitizer/track-origins/check_host_usm_initialized_on_host.cpp:15:17
 #3 in ?? (/lib/x86_64-linux-gnu/libc.so.6+0x7a67f842a1c9)
 #4 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x7a67f842a28a)
```

Also, add env to every %{run} directive to make sure the tests are not
affected by the system environment.