@lslusarczyk lslusarczyk commented Sep 27, 2024

Use the support for consuming a prebuilt UR binary through find_package, added in intel/compute-benchmarks#21.

Before this change, compute-benchmarks always built UR from source, which is unnecessary because UR is already built as part of SYCL on UR CI.

Also install the adapters with "cmake --install", which the new benchmarks code requires.
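
For readers unfamiliar with the flow, here is a minimal sketch of the two steps described above: installing an already built UR and pointing the benchmarks configure step at that install. The function names and directory arguments are hypothetical, and using CMAKE_PREFIX_PATH to let find_package locate the install is an assumption; the actual scripts in this PR may pass a different variable.

```python
from pathlib import Path
from typing import List


def ur_install_command(ur_build_dir: Path, install_dir: Path) -> List[str]:
    """Build the command that installs an already built UR tree.

    `cmake --install <build-dir> --prefix <dir>` is standard CMake (3.15+).
    """
    return ["cmake", "--install", str(ur_build_dir), "--prefix", str(install_dir)]


def benchmarks_configure_command(src_dir: Path, build_dir: Path, install_dir: Path) -> List[str]:
    """Build a configure command that lets find_package search the UR install.

    Assumption: CMAKE_PREFIX_PATH is enough for find_package to locate UR.
    """
    return [
        "cmake",
        "-S", str(src_dir),
        "-B", str(build_dir),
        f"-DCMAKE_PREFIX_PATH={install_dir}",
    ]
```

Each list can be passed to subprocess.run(cmd, check=True); keeping command construction separate from execution makes the steps easy to log and to test.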

@github-actions github-actions bot added the ci/cd, loader, level-zero, cuda, hip, and native-cpu labels on Sep 27, 2024
@lslusarczyk lslusarczyk force-pushed the bench_no_recompile_ur branch 2 times, most recently from e9582f3 to 91cd46a Compare September 27, 2024 22:10
def __init__(self, directory):
    self.directory = directory
    self.adapter_path = os.path.join(options.ur_dir, 'build', 'lib', f"libur_adapter_{options.ur_adapter_name}.so")
    for libs_dir_name in ['lib', 'lib64']:
Contributor

I think the make install fix is fine, but, in this case, not doing the install makes this deterministic - the file will be located in the lib directory.
Why the change to do a make install?

Contributor

can you make 'get lib path' into a generic function?

Contributor Author

@lslusarczyk lslusarczyk Sep 30, 2024

I think the make install fix is fine, but, in this case, not doing the install makes this deterministic - the file will be located in the lib directory. Why the change to do a make install?

Without the install, the benchmark fails on the assert in the next line with a message like "could not find adapter file /home/lslusarczyk/src2/ur_install/lib64/libur_adapter_level_zero.so (and in similar lib paths)". That is what I would like to see: the message shows a sample full path that was searched while still being short.

"make install" is used because it is the cleanest way to reuse the binaries and includes of an already built UR.

can you make 'get lib path' into a generic function?

Yes, I added a separate function.
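
A generic helper of the kind requested here could look roughly like the sketch below. The name find_adapter_library and its parameters are hypothetical and are not taken from the PR. It also raises an exception where the thread mentions an assert, while keeping the error message short by quoting one sample path.

```python
import os


def find_adapter_library(install_dir: str, adapter_name: str) -> str:
    """Return the adapter path under lib/ or lib64/, or fail with a short message."""
    file_name = f"libur_adapter_{adapter_name}.so"
    candidates = [
        os.path.join(install_dir, libs_dir_name, file_name)
        for libs_dir_name in ("lib", "lib64")
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Quote one sample path so the message stays short but shows where we looked.
    raise FileNotFoundError(
        f"could not find adapter file {candidates[-1]} (and in similar lib paths)"
    )
```

Searching both directories covers install layouts that place libraries in lib64 instead of lib, which is the case shown in the error message quoted above.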

@lslusarczyk lslusarczyk force-pushed the bench_no_recompile_ur branch from 7488aa7 to 053d092 Compare October 2, 2024 11:25
github-actions bot commented Oct 2, 2024

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11149704094

github-actions bot commented Oct 2, 2024

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11149704094
Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): cannot calculate
Benchmark This PR Relative perf Change -
api_overhead_benchmark_sycl SubmitKernel out of order 22.516000 μs
api_overhead_benchmark_sycl SubmitKernel in order 25.388000 μs
api_overhead_benchmark_ur SubmitKernel out of order 14.319000 μs
api_overhead_benchmark_ur SubmitKernel in order 15.928000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.403000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.648000 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 222.173000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 127.015000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.723000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.186000 GB/s
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 860.568000 GB/s
Relative perf in group Velocity-Bench (5): cannot calculate
Benchmark This PR Relative perf Change -
Velocity-Bench Hashtable 360.808426 M keys/sec
Velocity-Bench Bitcracker 35.465800 s
Velocity-Bench CudaSift 221.398000 ms
Velocity-Bench QuickSilver 118.090000 MMS/CTT
Velocity-Bench Sobel Filter 549.611000 ms
Relative perf in group Runtime (24): cannot calculate
Benchmark This PR Relative perf Change -
Runtime_BlockedTransform_iter_64_blocksize_1024 0.278000 ms
Runtime_BlockedTransform_iter_256_blocksize_2048 0.256000 ms
Runtime_BlockedTransform_iter_128_blocksize_8192 0.181000 ms
Runtime_BlockedTransform_iter_512_blocksize_8192 0.227000 ms
Runtime_BlockedTransform_iter_64_blocksize_4096 0.198000 ms
Runtime_BlockedTransform_iter_64_blocksize_2048 0.211000 ms
Runtime_BlockedTransform_iter_256_blocksize_8192 0.187000 ms
Runtime_BlockedTransform_iter_128_blocksize_1024 0.255000 ms
Runtime_BlockedTransform_iter_256_blocksize_4096 0.239000 ms
Runtime_BlockedTransform_iter_512_blocksize_1024 0.406000 ms
Runtime_BlockedTransform_iter_512_blocksize_4096 0.233000 ms
Runtime_BlockedTransform_iter_512_blocksize_2048 0.254000 ms
Runtime_BlockedTransform_iter_256_blocksize_1024 0.298000 ms
Runtime_BlockedTransform_iter_64_blocksize_8192 0.185000 ms
Runtime_BlockedTransform_iter_128_blocksize_4096 0.195000 ms
Runtime_BlockedTransform_iter_128_blocksize_2048 0.211000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 281.654000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 280.001000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 272.398000 ms
Runtime_IndependentDAGTaskThroughput_SingleTask 256.481000 ms
Runtime_DAGTaskThroughput_BasicParallelFor 1706.371000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1684.495000 ms
Runtime_DAGTaskThroughput_SingleTask 1634.206000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor 1649.073000 ms
Relative perf in group MicroBench (15): cannot calculate
Benchmark This PR Relative perf Change -
MicroBench_LocalMem_int32_4096 0.193000 ms
MicroBench_LocalMem_fp32_4096 0.191000 ms
MicroBench_L2_fp32_2 0.010000 ms
MicroBench_L2_fp32_1 0.010000 ms
MicroBench_L2_fp32_16 0.012000 ms
MicroBench_L2_int32_1 0.015000 ms
MicroBench_L2_fp32_8 0.010000 ms
MicroBench_L2_int32_8 0.011000 ms
MicroBench_L2_int32_2 0.011000 ms
MicroBench_L2_fp32_4 0.010000 ms
MicroBench_L2_int32_16 0.012000 ms
MicroBench_L2_int32_4 0.010000 ms
MicroBench_Arith_int32_512 0.041000 ms
MicroBench_Arith_fp32_512 0.020000 ms
MicroBench_sf_fp32_16 0.009000 ms
Relative perf in group Pattern (14): cannot calculate
Benchmark This PR Relative perf Change -
Pattern_Reduction_NDRange_int64 0.022000 ms
Pattern_Reduction_Hierarchical_int64 0.046000 ms
Pattern_Reduction_NDRange_int32 0.026000 ms
Pattern_Reduction_Hierarchical_int32 0.046000 ms
Pattern_Reduction_NDRange_fp32 0.022000 ms
Pattern_Reduction_Hierarchical_fp32 0.046000 ms
Pattern_SegmentedReduction_Hierarchical_int16 0.025000 ms
Pattern_SegmentedReduction_Hierarchical_int32 0.025000 ms
Pattern_SegmentedReduction_Hierarchical_int64 0.025000 ms
Pattern_SegmentedReduction_NDRange_int64 0.015000 ms
Pattern_SegmentedReduction_NDRange_int32 0.015000 ms
Pattern_SegmentedReduction_NDRange_fp32 0.014000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 0.024000 ms
Pattern_SegmentedReduction_NDRange_int16 0.019000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR Relative perf Change -
ScalarProduct_Hierarchical_int32 0.063000 ms
ScalarProduct_Hierarchical_fp32 0.062000 ms
ScalarProduct_NDRange_fp32 0.034000 ms
ScalarProduct_NDRange_int32 0.041000 ms
ScalarProduct_Hierarchical_int64 0.062000 ms
ScalarProduct_NDRange_int64 0.038000 ms
Relative perf in group SYCL2020 (2): cannot calculate
Benchmark This PR Relative perf Change -
SYCL2020_Accessors_Latency_fp32_out_of_order__ 36.924000 ms
SYCL2020_Accessors_Latency_fp32_in_order__ 34.933000 ms
Relative perf in group USM (17): cannot calculate
Benchmark This PR Relative perf Change -
USM_Latency_fp32_in_order__ 16.041000 ms
USM_Latency_fp32_out_of_order__ 24.358000 ms
USM_Allocation_latency_fp32_host 0.001000 ms
USM_Allocation_latency_fp32_device 0.002000 ms
USM_Allocation_latency_fp32_shared 0.030000 ms
USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch 12.076000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.049000 ms
USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch 12.053000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 0.891000 ms
USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch 11.357000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.434000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.592000 ms
USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch 11.685000 ms
USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1 0.168000 ms
USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1 0.012000 ms
USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1 0.004000 ms
USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1 0.003000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR Relative perf Change -
VectorAddition_int64 0.015000 ms
VectorAddition_fp32 0.014000 ms
VectorAddition_int32 0.015000 ms
Relative perf in group Polybench (12): cannot calculate
Benchmark This PR Relative perf Change -
Polybench_2DConvolution 0.196000 ms
Polybench_2mm 1.223000 ms
Polybench_3mm 1.739000 ms
Polybench_Atax 6.818000 ms
Polybench_Bicg 14.338000 ms
Polybench_Correlation 3009.203000 ms
Polybench_Covariance 2997.658000 ms
Polybench_Gesummv 7.247000 ms
Polybench_Gramschmidt 284.794000 ms
Polybench_Mvt 40.178000 ms
Polybench_Syr2k 4221.023000 ms
Polybench_Syrk 209.786000 ms
Relative perf in group ReductionAtomic (4): cannot calculate
Benchmark This PR Relative perf Change -
ReductionAtomic_int32 0.013000 ms
ReductionAtomic_fp64 0.021000 ms
ReductionAtomic_int64 0.013000 ms
ReductionAtomic_fp32 0.020000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR Relative perf Change -
Kmeans_fp32 16.169000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR Relative perf Change -
LinearRegressionCoeff_fp32 966.703000 ms
Relative perf in group LinearRegression (1): cannot calculate
Benchmark This PR Relative perf Change -
LinearRegression_fp32 408.148000 ms
Relative perf in group MatmulChain (1): cannot calculate
Benchmark This PR Relative perf Change -
MatmulChain 85.311000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR Relative perf Change -
MolecularDynamics 0.028000 ms

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),22.516,22.489,3.29%,21.845,233.033,[CPU],[us]
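
The CSV rows in these outputs carry one more field than the header (the unit follows the Type column), so a parser has to read the unit from the last field. The sketch below is one possible way to extract the mean and unit from such a row; parse_result_row is a hypothetical name and is not necessarily how the benchmark scripts do it.

```python
import csv
import io


def parse_result_row(output: str):
    """Return (test_case, mean, unit) from compute-benchmarks CSV output."""
    for row in csv.reader(io.StringIO(output)):
        # Skip blank lines and the header row if it is present.
        if not row or row[0] == "TestCase":
            continue
        test_case, mean = row[0], float(row[1])
        # The last field holds the unit, e.g. "[us]" or "bw [GB/s]".
        last = row[-1]
        unit = last[last.index("[") + 1 : last.rindex("]")]
        return test_case, mean, unit
    raise ValueError("no result row found in benchmark output")
```

Reading the unit from the row rather than assuming microseconds matters for tests such as StreamMemory and VectorSum, whose rows report GB/s.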

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.388,25.416,3.92%,22.137,105.543,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.319,14.228,3.62%,13.705,76.650,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),15.928,16.038,7.68%,12.781,255.103,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),222.173,222.084,1.12%,218.021,438.347,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),127.015,117.477,27.57%,113.667,305.394,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.723,5.547,13.54%,5.093,50.229,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.186,3.194,3.13%,0.421,3.410,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.403,2.435,7.78%,1.956,10.595,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.648,1.643,3.59%,1.554,5.748,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),860.568,861.253,0.46%,819.734,867.488,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.371992 s
360.808426 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00451105 s
bitcracker - total time for whole calculation: 35.4658 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1258 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1259 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1251 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1270 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1271 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1266 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1170 1259 31.7676% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1266 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1267 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1210 1263 32.8537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1202 1262 32.6364% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1256 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1206 1258 32.745% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1171 1244 31.7947% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1089 1265 29.5683% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1249 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1256 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1272 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1123 1261 30.4914% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1254 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1272 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1268 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1261 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1272 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1256 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1254 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1074 1270 29.161% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1260 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1265 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1256 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1207 1268 32.7722% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1206 1258 32.745% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1134 1271 30.7901% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1205 1261 32.7179% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1268 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1117 1263 30.3285% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1263 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1261 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1275 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1203 1273 32.6636% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1266 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1271 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1143 1253 31.0345% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1261 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1203 1262 32.6636% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1261 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1260 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1256 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 221.398 ms

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.321660e-01 6.168240e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.651260e-01 7.512570e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.364410e-01 7.652690e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.669210e-01 8.297590e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.602370e-01 7.975150e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.380780e-01 7.670290e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.315490e-01 7.673950e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.323860e-01 7.871690e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.318600e-01 7.855490e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.323590e-01 7.606550e-01 0.000000e+00

Timer                     Cumulative  Cumulative  Cumulative  Cumulative  Cumulative  Cumulative
Name                      number      microSecs   microSecs   microSecs   microSecs   Efficiency
                          of calls    min         avg         max         stddev      Rating
main                      1           1.116e+07   1.116e+07   1.116e+07   0.000e+00   100.00
cycleInit                 10          3.527e+06   3.527e+06   3.527e+06   0.000e+00   100.00
cycleTracking             10          7.628e+06   7.628e+06   7.628e+06   0.000e+00   100.00
cycleTracking_Kernel      104         4.942e+06   4.942e+06   4.942e+06   0.000e+00   100.00
cycleTracking_MPI         117         2.217e+05   2.217e+05   2.217e+05   0.000e+00   100.00
cycleTracking_Test_Done   0           0.000e+00   0.000e+00   0.000e+00   0.000e+00   0.00
cycleFinalize             20          4.100e+02   4.100e+02   4.100e+02   0.000e+00   100.00
Figure Of Merit 118.09 [Num Mega Segments / Cycle Tracking Time]
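
The Figure Of Merit line can be cross-checked against the per-cycle table above. The sketch below assumes (this is an interpretation of the bracketed label, not something the log states) that the figure is the sum of the `num_seg` column, in millions, divided by the sum of the `cycleTracking` column, in seconds. The helper name `figure_of_merit` is hypothetical; the numbers are copied from the ten cycle rows.

```python
# Per-cycle values copied from the Quicksilver cycle table in the log.
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]
cycle_tracking_s = [
    6.168240e-01, 7.512570e-01, 7.652690e-01, 8.297590e-01, 7.975150e-01,
    7.670290e-01, 7.673950e-01, 7.871690e-01, 7.855490e-01, 7.606550e-01,
]


def figure_of_merit(segments, tracking_seconds):
    """Mega-segments per second of cycle-tracking time (assumed definition)."""
    return sum(segments) / 1e6 / sum(tracking_seconds)


fom = figure_of_merit(num_seg, cycle_tracking_s)
print(f"Figure Of Merit: {fom:.2f}")
```

Under that assumption the result agrees with the logged value to two decimals, which suggests the per-cycle columns and the summary line are consistent with each other.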

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.43037 s
sobelfilter - total time for whole calculation: 0.549611 s

Runtime_BlockedTransform_iter_64_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000815', '0.000278', '0.000257', '0.000257 0.000278 0.001912', '0.000949', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
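
Each SYCL-bench result in this log is printed as a Python list literal, so it can be read back with `ast.literal_eval`. The sketch below parses the row above and recomputes the summary statistics from the space-separated samples field. The column positions (status at index 1, samples at index 14, standard deviation at index 15) are inferred from this row only and are an assumption, not taken from the SYCL-bench documentation; `summarize` is a hypothetical helper.

```python
import ast
import statistics

# Result row copied from the log output above.
ROW_TEXT = (
    "['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', "
    "'1024', '16384', '0.000815', '0.000278', '0.000257', "
    "'0.000257 0.000278 0.001912', '0.000949', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)


def summarize(row_text):
    """Parse one printed result row and recompute statistics from its samples."""
    row = ast.literal_eval(row_text)  # safe: evaluates literals only
    # Assumed layout: index 14 holds the per-run times, separated by spaces.
    samples = [float(s) for s in row[14].split()]
    return {
        "name": row[0],
        "status": row[1],
        "samples": samples,
        "min": min(samples),
        "median": statistics.median(samples),
        "stddev": statistics.stdev(samples),  # sample standard deviation
        "reported_stddev": float(row[15]),    # assumed column position
    }


summary = summarize(ROW_TEXT)
print(summary["name"], summary["status"], summary["median"])
```

Using the median rather than the mean is a reasonable choice for rows like this one, where one of the three runs is several times slower than the other two.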

Runtime_BlockedTransform_iter_256_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000282', '0.000256', '0.000256', '0.000256 0.000256 0.000333', '0.000045', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000182', '0.000181', '0.000180', '0.000180 0.000181 0.000184', '0.000002', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000228', '0.000227', '0.000220', '0.000220 0.000227 0.000237', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000218', '0.000198', '0.000191', '0.000191 0.000198 0.000267', '0.000042', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000198', '0.000211', '0.000123', '0.000123 0.000211 0.000261', '0.000070', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000199', '0.000187', '0.000184', '0.000184 0.000187 0.000226', '0.000023', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000259', '0.000255', '0.000237', '0.000237 0.000255 0.000286', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000230', '0.000239', '0.000196', '0.000196 0.000239 0.000255', '0.000031', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000402', '0.000406', '0.000394', '0.000394 0.000406 0.000407', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000224', '0.000233', '0.000201', '0.000201 0.000233 0.000237', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000270', '0.000254', '0.000254', '0.000254 0.000254 0.000301', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000309', '0.000298', '0.000249', '0.000249 0.000298 0.000381', '0.000067', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000202', '0.000185', '0.000177', '0.000177 0.000185 0.000243', '0.000036', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000201', '0.000195', '0.000193', '0.000193 0.000195 0.000215', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000211', '0.000211', '0.000206', '0.000206 0.000211 0.000216', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.284251', '0.281654', '0.280674', '0.280674 0.281654 0.290426', '0.005370', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.291002', '0.280001', '0.277531', '0.277531 0.280001 0.315473', '0.021229', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.272182', '0.272398', '0.270547', '0.270547 0.272398 0.273600', '0.001538', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.257160', '0.256481', '0.256441', '0.256441 0.256481 0.258559', '0.001211', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.705522', '1.706371', '1.701669', '1.701669 1.706371 1.708525', '0.003506', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.683313', '1.684495', '1.680821', '1.680821 1.684495 1.684622', '0.002159', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.637029', '1.634206', '1.632021', '1.632021 1.634206 1.644859', '0.006868', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.649680', '1.649073', '1.645641', '1.645641 1.649073 1.654324', '0.004373', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=512

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000197', '0.000193', '0.000188', '0.000188 0.000193 0.000212', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=512

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000192', '0.000191', '0.000187', '0.000187 0.000191 0.000198', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

MicroBench_L2_fp32_2

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_2', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000011', '0.000010', '0.000010', '0.000010 0.000010 0.000014', '0.000002', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000012', '0.000010', '0.000010', '0.000010 0.000010 0.000016', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_16', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000014', '0.000012', '0.000012', '0.000012 0.000012 0.000017', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000021', '0.000015', '0.000013', '0.000013 0.000015 0.000034', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_8

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_8', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000012', '0.000010', '0.000010', '0.000010 0.000010 0.000016', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_8

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_8', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000012', '0.000011', '0.000011', '0.000011 0.000011 0.000016', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_2

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_2', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000013', '0.000011', '0.000010', '0.000010 0.000011 0.000017', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_4

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_4', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000012', '0.000010', '0.000010', '0.000010 0.000010 0.000017', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_16', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000013', '0.000012', '0.000012', '0.000012 0.000012 0.000016', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_4

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_4', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000013', '0.000010', '0.000010', '0.000010 0.000010 0.000019', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000025', '0.000022', '0.000021', '0.000021 0.000022 0.000031', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000055', '0.000046', '0.000045', '0.000045 0.000046 0.000073', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000033', '0.000026', '0.000023', '0.000023 0.000026 0.000049', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000057', '0.000046', '0.000045', '0.000045 0.000046 0.000079', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000024', '0.000022', '0.000020', '0.000020 0.000022 0.000029', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000054', '0.000046', '0.000045', '0.000045 0.000046 0.000073', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000076', '0.000063', '0.000062', '0.000062 0.000063 0.000104', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000074', '0.000062', '0.000062', '0.000062 0.000062 0.000096', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000045', '0.000034', '0.000033', '0.000033 0.000034 0.000066', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000058', '0.000041', '0.000034', '0.000034 0.000041 0.000097', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000073', '0.000062', '0.000061', '0.000061 0.000062 0.000095', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000048', '0.000038', '0.000036', '0.000036 0.000038 0.000069', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000029', '0.000025', '0.000025', '0.000025 0.000025 0.000036', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000027', '0.000025', '0.000024', '0.000024 0.000025 0.000031', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000027', '0.000025', '0.000025', '0.000025 0.000025 0.000031', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000017', '0.000015', '0.000014', '0.000014 0.000015 0.000021', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000018', '0.000015', '0.000014', '0.000014 0.000015 0.000024', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000016', '0.000014', '0.000013', '0.000013 0.000014 0.000019', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000027', '0.000024', '0.000024', '0.000024 0.000024 0.000032', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000029', '0.000019', '0.000016', '0.000016 0.000019 0.000052', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

SYCL2020_Accessors_Latency_fp32_out_of_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['SYCL2020_Accessors_Latency_fp32_out_of_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.037422', '0.036924', '0.036704', '0.036704 0.036924 0.038637', '0.001058', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

SYCL2020_Accessors_Latency_fp32_in_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['SYCL2020_Accessors_Latency_fp32_in_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.034980', '0.034933', '0.034808', '0.034808 0.034933 0.035199', '0.000200', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Latency_fp32_in_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['USM_Latency_fp32_in_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.015454', '0.016041', '0.014166', '0.014166 0.016041 0.016154', '0.001117', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Latency_fp32_out_of_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['USM_Latency_fp32_out_of_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.023914', '0.024358', '0.023013', '0.023013 0.024358 0.024370', '0.000780', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000001', '0.000001', '0.000000', '0.000000 0.000001 0.000001', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000014', '0.000002', '0.000001', '0.000001 0.000002 0.000039', '0.000022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000035', '0.000030', '0.000021', '0.000021 0.000030 0.000053', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.012076', '0.012076', '0.012076', '0.012076', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001066', '0.001049', '0.001045', '0.001045 0.001049 0.001105', '0.000033', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.012053', '0.012053', '0.012053', '0.012053', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000897', '0.000891', '0.000890', '0.000890 0.000891 0.000910', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.011357', '0.011357', '0.011357', '0.011357', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001926', '0.001434', '0.001415', '0.001415 0.001434 0.002928', '0.000868', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001619', '0.001592', '0.001588', '0.001588 0.001592 0.001678', '0.000051', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.011685', '0.011685', '0.011685', '0.011685', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000188', '0.000168', '0.000034', '0.000034 0.000168 0.000364', '0.000166', '0.339165', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000015', '0.000012', '0.000012', '0.000012 0.000012 0.000022', '0.000006', '0.993152', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000026', '0.000004', '0.000002', '0.000002 0.000004 0.000071', '0.000039', '5.054811', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000012', '0.000003', '0.000002', '0.000002 0.000003 0.000031', '0.000016', '4.754504', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000018', '0.000015', '0.000015', '0.000015 0.000015 0.000023', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000016', '0.000014', '0.000013', '0.000013 0.000014 0.000021', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000023', '0.000015', '0.000013', '0.000013 0.000015 0.000041', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2DConvolution

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2DConvolution --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2DConvolution.csv

Output:

['Polybench_2DConvolution', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000195', '0.000196', '0.000186', '0.000186 0.000196 0.000203', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001223', '0.001223', '0.001215', '0.001215 0.001223 0.001230', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001739', '0.001739', '0.001731', '0.001731 0.001739 0.001747', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_Arith_int32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_int32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000060', '0.000041', '0.000038', '0.000038 0.000041 0.000102', '0.000036', '821.719695', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

MicroBench_Arith_fp32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_fp32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000023', '0.000020', '0.000020', '0.000020 0.000020 0.000031', '0.000006', '1593.249720', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006797', '0.006818', '0.006690', '0.006690 0.006818 0.006884', '0.000098', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000021', '0.000013', '0.000012', '0.000012 0.000013 0.000038', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000071', '0.000021', '0.000020', '0.000020 0.000021 0.000170', '0.000086', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000071', '0.000013', '0.000011', '0.000011 0.000013 0.000188', '0.000102', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000022', '0.000020', '0.000020', '0.000020 0.000020 0.000025', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Bicg

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/bicg --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Bicg.csv --size=20480

Output:

['Polybench_Bicg', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '20480', '0.014338', '0.014338', '0.014338', '0.014338', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Correlation

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/correlation --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Correlation.csv --size=2048

Output:

['Polybench_Correlation', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '2048', '3.009203', '3.009203', '3.009203', '3.009203', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Covariance

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/covariance --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Covariance.csv --size=2048

Output:

['Polybench_Covariance', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '2048', '2.997658', '2.997658', '2.997658', '2.997658', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gesummv

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gesummv --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gesummv.csv --size=8192

Output:

['Polybench_Gesummv', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.007247', '0.007247', '0.007247', '0.007247', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gramschmidt

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gramschmidt --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gramschmidt.csv --size=512

Output:

['Polybench_Gramschmidt', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.284794', '0.284794', '0.284794', '0.284794', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016170', '0.016169', '0.016166', '0.016166 0.016169 0.016175', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966610', '0.966703', '0.966412', '0.966412 0.966703 0.966716', '0.000172', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegression_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_error --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LinearRegression.csv --size=640000

Output:

['LinearRegression_fp32', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '640000', '0.408148', '0.408148', '0.408148', '0.408148', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MatmulChain

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/matmulchain --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MatmulChain.csv --size=2048

Output:

['MatmulChain', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '2048', '0.085311', '0.085311', '0.085311', '0.085311', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000036', '0.000028', '0.000026', '0.000026 0.000028 0.000055', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Mvt

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mvt --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Mvt.csv --size=32767

Output:

['Polybench_Mvt', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32767', '0.040178', '0.040178', '0.040178', '0.040178', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_sf_fp32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/sf --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/sf_16.csv --size=--size=100000000

Output:

['MicroBench_sf_fp32_16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '0', '0.000013', '0.000009', '0.000008', '0.000008 0.000009 0.000023', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

Polybench_Syr2k

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syr2k --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syr2k.csv --size=6144

Output:

['Polybench_Syr2k', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '6144', '4.221023', '4.221023', '4.221023', '4.221023', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Syrk

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syrk --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syrk.csv --size=4096

Output:

['Polybench_Syrk', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '4096', '0.209786', '0.209786', '0.209786', '0.209786', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
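
For anyone triaging the log above, here is a minimal sketch of how the printed result rows could be parsed to pick out the entries that need attention. The column positions are inferred from the values in this log, not taken from sycl-bench documentation: field 1 looks like the verification status, and fields 11 to 15 look like mean, median, min, the raw samples, and the standard deviation. The helper names `parse_row` and `needs_attention` are hypothetical. In the rows above, every `FAIL` entry records a single timing sample even though `--num-runs=3` was requested, so the sketch flags that case as well.

```python
import ast

# Field positions inferred from the rows in this log (an assumption, not a documented schema).
NAME, STATUS, MEDIAN, SAMPLES = 0, 1, 12, 14

# One row copied verbatim from the log above.
EXAMPLE_ROW = (
    "['Polybench_Bicg', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', "
    "'N/A', 'N/A', 'N/A', '256', '20480', '0.014338', '0.014338', '0.014338', '0.014338', "
    "'0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)


def parse_row(line):
    """Parse one printed result row, which is a Python list literal of strings."""
    fields = ast.literal_eval(line)
    return {
        "name": fields[NAME],
        "status": fields[STATUS],
        "median": float(fields[MEDIAN]),
        # The samples field holds the individual run times separated by spaces.
        "samples": [float(s) for s in fields[SAMPLES].split()],
    }


def needs_attention(row, expected_runs=3):
    """Flag rows that failed verification or recorded fewer runs than requested."""
    return row["status"] == "FAIL" or len(row["samples"]) < expected_runs


if __name__ == "__main__":
    row = parse_row(EXAMPLE_ROW)
    print(row["name"], row["status"], needs_attention(row))
```

Rows with status `N/A` (for example the `USM_Latency` entries) are not flagged by this sketch as long as they carry all three samples; whether they should be is a separate question.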

@github-actions
Contributor

github-actions bot commented Oct 3, 2024

Compute Benchmarks level_zero_v2 run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11157704223

@github-actions
Contributor

github-actions bot commented Oct 3, 2024

Compute Benchmarks level_zero_v2 run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11157704223
Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| api_overhead_benchmark_sycl SubmitKernel out of order | 20.564000 μs | | | |
| api_overhead_benchmark_sycl SubmitKernel in order | 21.222000 μs | | | |
| api_overhead_benchmark_ur SubmitKernel out of order | 14.275000 μs | | | |
| api_overhead_benchmark_ur SubmitKernel in order | 11.822000 μs | | | |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 1.468000 μs | | | |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 1.460000 μs | | | |
Relative perf in group memory (4): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 356.741000 μs | | | |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 84.317000 μs | | | |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 7.386000 μs | | | |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.373000 μs | | | |
Relative perf in group Velocity-Bench (5): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Velocity-Bench Hashtable | 363.307101 M keys/sec | | | |
| Velocity-Bench Bitcracker | 35.436000 s | | | |
| Velocity-Bench CudaSift | 219.354000 ms | | | |
| Velocity-Bench QuickSilver | 107.310000 MMS/CTT | | | |
| Velocity-Bench Sobel Filter | 551.541000 ms | | | |
Relative perf in group Runtime (20): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Runtime_BlockedTransform_iter_512_blocksize_4096 | 0.244000 ms | | | |
| Runtime_BlockedTransform_iter_64_blocksize_2048 | 0.233000 ms | | | |
| Runtime_BlockedTransform_iter_128_blocksize_8192 | 0.196000 ms | | | |
| Runtime_BlockedTransform_iter_256_blocksize_8192 | 0.198000 ms | | | |
| Runtime_BlockedTransform_iter_256_blocksize_1024 | 0.298000 ms | | | |
| Runtime_BlockedTransform_iter_128_blocksize_2048 | 0.213000 ms | | | |
| Runtime_BlockedTransform_iter_256_blocksize_2048 | 0.255000 ms | | | |
| Runtime_BlockedTransform_iter_128_blocksize_4096 | 0.196000 ms | | | |
| Runtime_BlockedTransform_iter_256_blocksize_4096 | 0.238000 ms | | | |
| Runtime_BlockedTransform_iter_512_blocksize_8192 | 0.233000 ms | | | |
| Runtime_BlockedTransform_iter_64_blocksize_8192 | 0.198000 ms | | | |
| Runtime_BlockedTransform_iter_64_blocksize_4096 | 0.203000 ms | | | |
| Runtime_BlockedTransform_iter_64_blocksize_1024 | 0.282000 ms | | | |
| Runtime_BlockedTransform_iter_128_blocksize_1024 | 0.257000 ms | | | |
| Runtime_BlockedTransform_iter_512_blocksize_1024 | 0.407000 ms | | | |
| Runtime_BlockedTransform_iter_512_blocksize_2048 | 0.256000 ms | | | |
| Runtime_DAGTaskThroughput_NDRangeParallelFor | 1649.073000 ms | | | |
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | 1679.087000 ms | | | |
| Runtime_DAGTaskThroughput_BasicParallelFor | 1706.371000 ms | | | |
| Runtime_DAGTaskThroughput_SingleTask | 1633.964000 ms | | | |
Relative perf in group MicroBench (16): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| MicroBench_HostDeviceBandwidth_1D_H2D_Strided | 791.859000 ms | | | |
| MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous | 796.541000 ms | | | |
| MicroBench_LocalMem_fp32_4096 | 0.191000 ms | | | |
| MicroBench_LocalMem_int32_4096 | 0.195000 ms | | | |
| MicroBench_L2_int32_2 | 0.011000 ms | | | |
| MicroBench_L2_fp32_8 | 0.011000 ms | | | |
| MicroBench_L2_int32_16 | 0.012000 ms | | | |
| MicroBench_L2_int32_1 | 0.015000 ms | | | |
| MicroBench_L2_int32_8 | 0.011000 ms | | | |
| MicroBench_L2_fp32_16 | 0.012000 ms | | | |
| MicroBench_L2_fp32_2 | 0.010000 ms | | | |
| MicroBench_L2_fp32_1 | 0.010000 ms | | | |
| MicroBench_L2_int32_4 | 0.010000 ms | | | |
| MicroBench_L2_fp32_4 | 0.010000 ms | | | |
| MicroBench_Arith_fp32_512 | 0.020000 ms | | | |
| MicroBench_Arith_int32_512 | 0.041000 ms | | | |
Relative perf in group Pattern (14): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Pattern_Reduction_NDRange_fp32 | 0.022000 ms | | | |
| Pattern_Reduction_Hierarchical_int32 | 0.054000 ms | | | |
| Pattern_Reduction_Hierarchical_int64 | 0.048000 ms | | | |
| Pattern_Reduction_Hierarchical_fp32 | 0.049000 ms | | | |
| Pattern_Reduction_NDRange_int32 | 0.027000 ms | | | |
| Pattern_Reduction_NDRange_int64 | 0.023000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int16 | 0.019000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int64 | 0.026000 ms | | | |
| Pattern_SegmentedReduction_NDRange_fp32 | 0.015000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int64 | 0.015000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int32 | 0.026000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_int16 | 0.025000 ms | | | |
| Pattern_SegmentedReduction_Hierarchical_fp32 | 0.028000 ms | | | |
| Pattern_SegmentedReduction_NDRange_int32 | 0.016000 ms | | | |
Relative perf in group ScalarProduct (6): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| ScalarProduct_Hierarchical_fp32 | 0.063000 ms | | | |
| ScalarProduct_NDRange_int64 | 0.038000 ms | | | |
| ScalarProduct_Hierarchical_int32 | 0.063000 ms | | | |
| ScalarProduct_NDRange_int32 | 0.041000 ms | | | |
| ScalarProduct_NDRange_fp32 | 0.034000 ms | | | |
| ScalarProduct_Hierarchical_int64 | 0.062000 ms | | | |

Relative perf in group USM (17): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| USM_Latency_fp32_out_of_order__ | 24.129000 ms | | | |
| USM_Latency_fp32_in_order__ | 16.088000 ms | | | |
| USM_Allocation_latency_fp32_host | 0.001000 ms | | | |
| USM_Allocation_latency_fp32_device | 0.002000 ms | | | |
| USM_Allocation_latency_fp32_shared | 0.029000 ms | | | |
| USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch | 11.358000 ms | | | |
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | 1.050000 ms | | | |
| USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch | 11.685000 ms | | | |
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | 0.892000 ms | | | |
| USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch | 12.654000 ms | | | |
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | 1.581000 ms | | | |
| USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch | 12.662000 ms | | | |
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | 1.435000 ms | | | |
| USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1 | 0.164000 ms | | | |
| USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1 | 0.003000 ms | | | |
| USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1 | 0.012000 ms | | | |
| USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1 | 0.004000 ms | | | |

Relative perf in group SYCL2020 (2): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| SYCL2020_Accessors_Latency_fp32_in_order__ | 34.240000 ms | | | |
| SYCL2020_Accessors_Latency_fp32_out_of_order__ | 36.447000 ms | | | |

Relative perf in group VectorAddition (3): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| VectorAddition_int64 | 0.016000 ms | | | |
| VectorAddition_fp32 | 0.014000 ms | | | |
| VectorAddition_int32 | 0.015000 ms | | | |

Relative perf in group Polybench (12): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Polybench_2DConvolution | 0.196000 ms | | | |
| Polybench_2mm | 1.223000 ms | | | |
| Polybench_3mm | 1.735000 ms | | | |
| Polybench_Atax | 6.858000 ms | | | |
| Polybench_Bicg | 14.444000 ms | | | |
| Polybench_Correlation | 3009.203000 ms | | | |
| Polybench_Covariance | 3016.879000 ms | | | |
| Polybench_Gesummv | 7.179000 ms | | | |
| Polybench_Gramschmidt | 284.794000 ms | | | |
| Polybench_Mvt | 40.186000 ms | | | |
| Polybench_Syr2k | 4316.102000 ms | | | |
| Polybench_Syrk | 207.153000 ms | | | |

Relative perf in group ReductionAtomic (4): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| ReductionAtomic_fp32 | 0.020000 ms | | | |
| ReductionAtomic_int32 | 0.013000 ms | | | |
| ReductionAtomic_int64 | 0.013000 ms | | | |
| ReductionAtomic_fp64 | 0.021000 ms | | | |

Relative perf in group Kmeans (1): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Kmeans_fp32 | 16.206000 ms | | | |

Relative perf in group LinearRegressionCoeff (1): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| LinearRegressionCoeff_fp32 | 966.732000 ms | | | |

Relative perf in group LinearRegression (1): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| LinearRegression_fp32 | 407.906000 ms | | | |

Relative perf in group MatmulChain (1): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| MatmulChain | 85.311000 ms | | | |

Relative perf in group MolecularDynamics (1): cannot calculate

| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| MolecularDynamics | 0.029000 ms | | | |

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),20.564,20.500,4.51%,19.728,277.503,[CPU],[us]
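The CSV output above can be turned into named fields with a few lines of Python. This is only a sketch: `parse_result` is a hypothetical helper, not part of the benchmark scripts, and it assumes that the extra trailing field in the data row (which has no counterpart in the header) is the unit of measurement.

```python
import csv
import io

# Header and data row copied verbatim from the benchmark output above.
HEADER = "TestCase,Mean,Median,StdDev,Min,Max,Type"
ROW = (
    "SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),20.564,20.500,4.51%,19.728,277.503,[CPU],[us]"
)


def parse_result(header: str, row: str) -> dict:
    """Hypothetical helper: map one CSV result row onto the header names."""
    names = next(csv.reader(io.StringIO(header)))
    values = next(csv.reader(io.StringIO(row)))
    result = dict(zip(names, values))
    # Assumption: the row carries one field more than the header, and that
    # last field is the unit (e.g. "[us]").
    if len(values) == len(names) + 1:
        result["Unit"] = values[-1]
    # Convert the numeric columns; StdDev is reported as a percentage.
    for key in ("Mean", "Median", "Min", "Max"):
        result[key] = float(result[key])
    result["StdDev"] = float(result["StdDev"].rstrip("%"))
    return result


result = parse_result(HEADER, ROW)
print(result["Mean"], result["Unit"])
```

The same helper should work for the other `--csv --noHeaders` outputs in this comment, since they share the header, but that has not been checked against every benchmark.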

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),21.222,21.167,4.20%,20.286,271.096,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.275,14.271,2.35%,12.975,63.156,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),11.822,11.745,3.63%,11.006,28.225,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),356.741,356.423,1.23%,349.012,498.438,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),84.317,83.312,14.16%,81.016,272.395,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),7.386,7.283,14.00%,6.118,81.928,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.373,3.402,3.75%,0.420,3.594,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),1.468,1.463,23.13%,1.329,106.390,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.460,1.455,17.12%,1.309,77.665,[CPU],[us]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.369433 s
363.307101 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00437926 s
bitcracker - total time for whole calculation: 35.436 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1129 1267 30.6544% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1260 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1150 1256 31.2245% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1261 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1267 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1265 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1252 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1059 1272 28.7537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1076 1258 29.2153% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1262 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1259 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1273 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1086 1271 29.4868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1185 1260 32.1749% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1274 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1155 1258 31.3603% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1262 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1147 1255 31.1431% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1268 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1257 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1201 1266 32.6093% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1256 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1259 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1216 1265 33.0166% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1255 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1268 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1271 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1266 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1108 1261 30.0842% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1258 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1089 1261 29.5683% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1262 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1261 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1258 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1271 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1251 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1276 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1272 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1269 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1267 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1257 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1187 1272 32.2292% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1252 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1259 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1262 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 219.354 ms

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.799980e-01 7.059100e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.348570e-01 8.240000e-01 1.000000e-06
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.357440e-01 8.400740e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.589160e-01 8.940810e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.321030e-01 8.707390e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.344490e-01 8.534110e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.345990e-01 8.433300e-01 1.000000e-06
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.351950e-01 8.655890e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.349460e-01 8.621210e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.355630e-01 8.357160e-01 0.000000e+00

| Timer name | Cumulative number of calls | Cumulative microSecs min | Cumulative microSecs avg | Cumulative microSecs max | Cumulative microSecs stddev | Cumulative efficiency rating |
|---|---|---|---|---|---|---|
| main | 1 | 1.181e+07 | 1.181e+07 | 1.181e+07 | 0.000e+00 | 100.00 |
| cycleInit | 10 | 3.416e+06 | 3.416e+06 | 3.416e+06 | 0.000e+00 | 100.00 |
| cycleTracking | 10 | 8.395e+06 | 8.395e+06 | 8.395e+06 | 0.000e+00 | 100.00 |
| cycleTracking_Kernel | 104 | 5.086e+06 | 5.086e+06 | 5.086e+06 | 0.000e+00 | 100.00 |
| cycleTracking_MPI | 117 | 1.953e+05 | 1.953e+05 | 1.953e+05 | 0.000e+00 | 100.00 |
| cycleTracking_Test_Done | 0 | 0.000e+00 | 0.000e+00 | 0.000e+00 | 0.000e+00 | 0.00 |
| cycleFinalize | 20 | 4.000e+02 | 4.000e+02 | 4.000e+02 | 0.000e+00 | 100.00 |

Figure Of Merit 107.31 [Num Mega Segments / Cycle Tracking Time]
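
The Figure Of Merit can be reproduced from the per-cycle table in the same output. The sketch below assumes that the `cycleTracking` column is in seconds (its sum, about 8.395 s, matches the 8.395e+06 microSecs reported for the `cycleTracking` timer) and that "Mega Segments" means the summed `num_seg` column divided by 1e6. The variable names are illustrative only.

```python
# (num_seg, cycleTracking) per cycle, copied from the cycle table above.
# Assumption: cycleTracking is expressed in seconds.
CYCLES = [
    (55527935, 7.059100e-01),
    (94633679, 8.240000e-01),
    (95010375, 8.400740e-01),
    (94953591, 8.940810e-01),
    (94599337, 8.707390e-01),
    (94148236, 8.534110e-01),
    (93689264, 8.433300e-01),
    (93216931, 8.655890e-01),
    (92768273, 8.621210e-01),
    (92324678, 8.357160e-01),
]

total_segments = sum(segments for segments, _ in CYCLES)
tracking_seconds = sum(seconds for _, seconds in CYCLES)

# Figure Of Merit = Num Mega Segments / Cycle Tracking Time
figure_of_merit = (total_segments / 1e6) / tracking_seconds
print(f"{figure_of_merit:.2f}")  # prints 107.31
```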

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.48093 s
sobelfilter - total time for whole calculation: 0.551541 s

Runtime_BlockedTransform_iter_512_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000250', '0.000244', '0.000241', '0.000241 0.000244 0.000264', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000247', '0.000233', '0.000227', '0.000227 0.000233 0.000283', '0.000031', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000211', '0.000196', '0.000190', '0.000190 0.000196 0.000246', '0.000031', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000208', '0.000198', '0.000190', '0.000190 0.000198 0.000235', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
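
The value shown in the summary table for this benchmark (0.198000 ms) appears to be the median of the three run times in this row, converted to milliseconds. A minimal sketch of that reading, assuming the row is a Python list literal, that field 14 holds the space-separated per-run samples, and that the samples are in seconds; none of this is taken from the benchmark scripts themselves:

```python
import ast
import statistics

# Output row copied verbatim from above.
ROW = (
    "['Runtime_BlockedTransform_iter_256_blocksize_8192', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', "
    "'1024', '16384', '0.000208', '0.000198', '0.000190', "
    "'0.000190 0.000198 0.000235', '0.000024', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)

fields = ast.literal_eval(ROW)  # safe parse of the list literal

# Assumption: index 14 is the list of per-run samples, in seconds.
samples = [float(sample) for sample in fields[14].split()]
median_ms = statistics.median(samples) * 1000.0

print(f"{fields[0]} {median_ms:.6f} ms")
# prints: Runtime_BlockedTransform_iter_256_blocksize_8192 0.198000 ms
```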

Runtime_BlockedTransform_iter_256_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000309', '0.000298', '0.000249', '0.000249 0.000298 0.000381', '0.000067', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000211', '0.000213', '0.000206', '0.000206 0.000213 0.000213', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000271', '0.000255', '0.000214', '0.000214 0.000255 0.000343', '0.000066', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000194', '0.000196', '0.000189', '0.000189 0.000196 0.000197', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000224', '0.000238', '0.000196', '0.000196 0.000238 0.000238', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000241', '0.000233', '0.000217', '0.000217 0.000233 0.000274', '0.000030', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_8192

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_8192', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000217', '0.000198', '0.000192', '0.000192 0.000198 0.000259', '0.000037', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000216', '0.000203', '0.000198', '0.000198 0.000203 0.000248', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000790', '0.000282', '0.000255', '0.000255 0.000282 0.001834', '0.000904', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000286', '0.000257', '0.000222', '0.000222 0.000257 0.000378', '0.000082', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000409', '0.000407', '0.000407', '0.000407 0.000407 0.000415', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=16384 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '16384', '0.000246', '0.000256', '0.000223', '0.000223 0.000256 0.000259', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.649680', '1.649073', '1.645641', '1.645641 1.649073 1.654324', '0.004373', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.678543', '1.679087', '1.676954', '1.676954 1.679087 1.679587', '0.001398', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.705522', '1.706371', '1.701669', '1.701669 1.706371 1.708525', '0.003506', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.636287', '1.633964', '1.632412', '1.632412 1.633964 1.642484', '0.005423', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.792080', '0.791859', '0.790973', '0.790973 0.791859 0.793408', '0.001233', '34.135190', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '27.000000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.796934', '0.796541', '0.790812', '0.790812 0.796541 0.803448', '0.006327', '34.142127', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '27.000000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=512

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000192', '0.000191', '0.000187', '0.000187 0.000191 0.000198', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=512

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.000198', '0.000195', '0.000188', '0.000188 0.000195 0.000211', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000000']

MicroBench_L2_int32_2

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_2', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000013', '0.000011', '0.000010', '0.000010 0.000011 0.000018', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_8

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_8', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000012', '0.000011', '0.000011', '0.000011 0.000011 0.000015', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_16', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000014', '0.000012', '0.000012', '0.000012 0.000012 0.000017', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000021', '0.000015', '0.000012', '0.000012 0.000015 0.000034', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_8

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_8', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000012', '0.000011', '0.000010', '0.000010 0.000011 0.000016', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_16', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000014', '0.000012', '0.000011', '0.000011 0.000012 0.000017', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_2

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_2', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000011', '0.000010', '0.000010', '0.000010 0.000010 0.000014', '0.000002', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000012', '0.000010', '0.000010', '0.000010 0.000010 0.000015', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_int32_4

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_int32_4', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000013', '0.000010', '0.000009', '0.000009 0.000010 0.000019', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_L2_fp32_4

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/pattern_L2 --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/L2_multi.csv

Output:

['MicroBench_L2_fp32_4', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000013', '0.000010', '0.000010', '0.000010 0.000010 0.000018', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000023', '0.000022', '0.000020', '0.000020 0.000022 0.000028', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000060', '0.000054', '0.000048', '0.000048 0.000054 0.000078', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000057', '0.000048', '0.000048', '0.000048 0.000048 0.000075', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000056', '0.000049', '0.000048', '0.000048 0.000049 0.000072', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000032', '0.000027', '0.000022', '0.000022 0.000027 0.000049', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv

Output:

['Pattern_Reduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000025', '0.000023', '0.000021', '0.000021 0.000023 0.000030', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000073', '0.000063', '0.000061', '0.000061 0.000063 0.000096', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000048', '0.000038', '0.000036', '0.000036 0.000038 0.000069', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000075', '0.000063', '0.000062', '0.000062 0.000063 0.000100', '0.000022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000058', '0.000041', '0.000034', '0.000034 0.000041 0.000097', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000044', '0.000034', '0.000032', '0.000032 0.000034 0.000065', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000073', '0.000062', '0.000062', '0.000062 0.000062 0.000095', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000030', '0.000019', '0.000015', '0.000015 0.000019 0.000055', '0.000022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000027', '0.000026', '0.000025', '0.000025 0.000026 0.000030', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000017', '0.000015', '0.000013', '0.000013 0.000015 0.000022', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000017', '0.000015', '0.000014', '0.000014 0.000015 0.000021', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000028', '0.000026', '0.000025', '0.000025 0.000026 0.000033', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000030', '0.000025', '0.000025', '0.000025 0.000025 0.000041', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000028', '0.000028', '0.000024', '0.000024 0.000028 0.000031', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000017', '0.000016', '0.000014', '0.000014 0.000016 0.000022', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Latency_fp32_out_of_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['USM_Latency_fp32_out_of_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.023615', '0.024129', '0.022292', '0.022292 0.024129 0.024425', '0.001156', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Latency_fp32_in_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['USM_Latency_fp32_in_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.015418', '0.016088', '0.013925', '0.013925 0.016088 0.016243', '0.001296', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

SYCL2020_Accessors_Latency_fp32_in_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['SYCL2020_Accessors_Latency_fp32_in_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.034288', '0.034240', '0.034107', '0.034107 0.034240 0.034516', '0.000208', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

SYCL2020_Accessors_Latency_fp32_out_of_order__

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_accessors_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Latency_multi.csv

Output:

['SYCL2020_Accessors_Latency_fp32_out_of_order__', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.037079', '0.036447', '0.036271', '0.036271 0.036447 0.038519', '0.001250', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000001', '0.000001', '0.000000', '0.000000 0.000001 0.000001', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000014', '0.000002', '0.000001', '0.000001 0.000002 0.000039', '0.000022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000033', '0.000029', '0.000021', '0.000021 0.000029 0.000049', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.011358', '0.011358', '0.011358', '0.011358', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001056', '0.001050', '0.001047', '0.001047 0.001050 0.001071', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.011685', '0.011685', '0.011685', '0.011685', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000899', '0.000892', '0.000889', '0.000889 0.000892 0.000916', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.012654', '0.012654', '0.012654', '0.012654', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001600', '0.001581', '0.001573', '0.001573 0.001581 0.001646', '0.000040', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.012662', '0.012662', '0.012662', '0.012662', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.001934', '0.001435', '0.001428', '0.001428 0.001435 0.002941', '0.000872', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000216', '0.000164', '0.000108', '0.000108 0.000164 0.000377', '0.000142', '0.106401', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000010', '0.000003', '0.000002', '0.000002 0.000003 0.000026', '0.000014', '4.861551', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000017', '0.000012', '0.000012', '0.000012 0.000012 0.000028', '0.000009', '0.932613', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_pinned_overhead --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/USM_Pinned_Overhead_multi.csv

Output:

['USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000025', '0.000004', '0.000002', '0.000002 0.000004 0.000069', '0.000038', '5.237571', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.000011']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000018', '0.000016', '0.000015', '0.000015 0.000016 0.000022', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000016', '0.000014', '0.000014', '0.000014 0.000014 0.000020', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000025', '0.000015', '0.000013', '0.000013 0.000015 0.000046', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2DConvolution

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2DConvolution --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2DConvolution.csv

Output:

['Polybench_2DConvolution', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000197', '0.000196', '0.000189', '0.000189 0.000196 0.000204', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001223', '0.001223', '0.001215', '0.001215 0.001223 0.001230', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001735', '0.001735', '0.001729', '0.001729 0.001735 0.001742', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_Arith_fp32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_fp32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000023', '0.000020', '0.000020', '0.000020 0.000020 0.000030', '0.000006', '1592.600143', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

MicroBench_Arith_int32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_int32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000060', '0.000041', '0.000038', '0.000038 0.000041 0.000102', '0.000036', '828.648706', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006848', '0.006858', '0.006821', '0.006821 0.006858 0.006865', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000022', '0.000020', '0.000020', '0.000020 0.000020 0.000026', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000021', '0.000013', '0.000012', '0.000012 0.000013 0.000039', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000070', '0.000013', '0.000011', '0.000011 0.000013 0.000185', '0.000100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000069', '0.000021', '0.000020', '0.000020 0.000021 0.000168', '0.000085', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Bicg

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/bicg --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Bicg.csv --size=20480

Output:

['Polybench_Bicg', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '20480', '0.014444', '0.014444', '0.014444', '0.014444', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Correlation

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/correlation --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Correlation.csv --size=2048

Output:

['Polybench_Correlation', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '2048', '3.009203', '3.009203', '3.009203', '3.009203', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Covariance

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/covariance --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Covariance.csv --size=2048

Output:

['Polybench_Covariance', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '2048', '3.016879', '3.016879', '3.016879', '3.016879', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gesummv

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gesummv --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gesummv.csv --size=8192

Output:

['Polybench_Gesummv', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.007179', '0.007179', '0.007179', '0.007179', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Gramschmidt

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/gramschmidt --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Gramschmidt.csv --size=512

Output:

['Polybench_Gramschmidt', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.284794', '0.284794', '0.284794', '0.284794', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016197', '0.016206', '0.016179', '0.016179 0.016206 0.016207', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966767', '0.966732', '0.966675', '0.966675 0.966732 0.966894', '0.000113', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegression_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_error --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/LinearRegression.csv --size=640000

Output:

['LinearRegression_fp32', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '640000', '0.407906', '0.407906', '0.407906', '0.407906', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MatmulChain

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/matmulchain --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MatmulChain.csv --size=2048

Output:

['MatmulChain', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '2048', '0.085311', '0.085311', '0.085311', '0.085311', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000036', '0.000029', '0.000025', '0.000025 0.000029 0.000056', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Mvt

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mvt --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Mvt.csv --size=32767

Output:

['Polybench_Mvt', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32767', '0.040186', '0.040186', '0.040186', '0.040186', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Syr2k

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syr2k --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syr2k.csv --size=6144

Output:

['Polybench_Syr2k', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '6144', '4.316102', '4.316102', '4.316102', '4.316102', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Syrk

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syrk --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syrk.csv --size=4096

Output:

['Polybench_Syrk', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '4096', '0.207153', '0.207153', '0.207153', '0.207153', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
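
A quick way to summarize a log like the one above is to count rows per status. The sketch below is only illustrative: it assumes the second field of each row is the verification status (`PASS`, `FAIL` or `N/A`), which the log itself does not state, and the name `tally_statuses` is hypothetical. The sample rows are copied from the log but truncated to their first three fields.

```python
import ast
from collections import Counter

# Rows as printed in the log, truncated to the first three fields for brevity.
LOG_LINES = [
    "['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100']",
    "['Polybench_Bicg', 'FAIL', 'Intel(R) Data Center GPU Max 1100']",
    "['USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1', 'N/A', 'Intel(R) Data Center GPU Max 1100']",
]


def tally_statuses(lines):
    """Count rows per status. Assumes field 1 holds the status (an assumption)."""
    counts = Counter()
    for line in lines:
        # Each result row in the log is a valid Python list literal.
        row = ast.literal_eval(line)
        counts[row[1]] += 1
    return counts


print(tally_statuses(LOG_LINES))
```

Running it over the full log would show at a glance how many benchmarks report `FAIL`.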

@lslusarczyk lslusarczyk force-pushed the bench_no_recompile_ur branch from 053d092 to 1360515 Compare October 3, 2024 09:49
Contributor

@pbalcer pbalcer left a comment

overall lgtm, but it might be easier to merge if we were to split this into two PRs: one that fixes make install (and maybe adds a test?) and the second one that does the benchmarks change.

@lslusarczyk
Contributor Author

lslusarczyk commented Oct 3, 2024

I've compared level_zero_v2 benchmarks on main and on my PR.

Results are also comparable, e.g. for "api_overhead_benchmark_sycl/ur SubmitKernel out of order/in order":

  • using main: 20.153 μs, 24.01 μs, 14.613 μs, 11.627 μs
  • using this PR: 20.564 μs, 21.222 μs, 14.275 μs, 11.822 μs

So it seems I have not broken anything.
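
The comparison above can be made concrete by computing the relative change per measurement. This is a minimal sketch using only the eight numbers quoted in this comment; the helper name `relative_change_percent` is made up for illustration, and pairing the values by position is an assumption.

```python
# Timings in microseconds, copied from the two bullet points above.
main_us = [20.153, 24.01, 14.613, 11.627]
pr_us = [20.564, 21.222, 14.275, 11.822]


def relative_change_percent(baseline, candidate):
    """Return (candidate - baseline) / baseline * 100 for each positional pair."""
    return [(c - b) / b * 100.0 for b, c in zip(baseline, candidate)]


changes = relative_change_percent(main_us, pr_us)
for base, cand, pct in zip(main_us, pr_us, changes):
    # A negative value means the PR run was faster than main for that measurement.
    print(f"main {base:.3f} us -> PR {cand:.3f} us: {pct:+.1f}%")
```

With only one value per configuration, differences of this size cannot be told apart from run-to-run noise, so repeated runs would be needed for a firmer conclusion.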

@lslusarczyk
Contributor Author

overall lgtm, but it might be easier to merge if we were to split this into two PRs: one that fixes make install (and maybe adds a test?) and the second one that does the benchmarks change.

separate PR: #2169
Please check if my way of testing cmake install looks sufficient to you there.

@lslusarczyk lslusarczyk force-pushed the bench_no_recompile_ur branch from 1360515 to bf46d7f Compare October 4, 2024 17:06
@lslusarczyk lslusarczyk marked this pull request as ready for review October 4, 2024 17:11
@lslusarczyk lslusarczyk requested a review from a team as a code owner October 4, 2024 17:11
@pbalcer
Contributor

pbalcer commented Oct 7, 2024

the cmake change got merged, please rebase.

@lslusarczyk
Contributor Author

the cmake change got merged, please rebase.

already rebased

@pbalcer pbalcer merged commit cf90cb1 into oneapi-src:main Oct 7, 2024
75 checks passed
Labels

  • ci/cd: Continuous integration/delivery
  • cuda: CUDA adapter specific issues
  • hip: HIP adapter specific issues
  • level-zero: L0 adapter specific issues
  • loader: Loader related feature/bug
  • native-cpu: Native CPU adapter specific issues

3 participants