Compute-benchmarks use already compiled UR #2146

lslusarczyk · 2024-09-27T21:41:22Z

Use the support for consuming a prebuilt UR binary through find_package, added in intel/compute-benchmarks#21.

Before this change, compute-benchmarks always built UR from source, which is unnecessary because UR is already built as part of SYCL in UR CI.

Also install the adapters with "cmake --install", which the new benchmarks code requires.
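As a rough illustration of the install step described above, the sketch below builds a "cmake --install" command line and the matching find_package search hint. The helper names (install_command, find_package_hint, install_ur) and any paths passed to them are hypothetical and not taken from this PR. It assumes CMake 3.15 or newer (which introduced "cmake --install") and uses the standard CMAKE_PREFIX_PATH variable; the exact variable compute-benchmarks expects is not stated here.

```python
import subprocess


def install_command(build_dir, install_dir):
    """Build the `cmake --install` command line for an existing build tree."""
    # --prefix overrides the install prefix chosen at configure time.
    return ["cmake", "--install", build_dir, "--prefix", install_dir]


def find_package_hint(install_dir):
    """Return a CMake cache argument so find_package also searches install_dir.

    CMAKE_PREFIX_PATH is the generic CMake mechanism; whether the benchmarks
    build relies on it is an assumption of this sketch.
    """
    return f"-DCMAKE_PREFIX_PATH={install_dir}"


def install_ur(build_dir, install_dir):
    """Run the install step; raises CalledProcessError if cmake fails."""
    subprocess.run(install_command(build_dir, install_dir), check=True)
```

Keeping the command construction separate from the subprocess call makes it possible to log or inspect the exact command without running CMake.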

scripts/benchmarks/benches/base.py

source/adapters/cuda/CMakeLists.txt

test/adapters/level_zero/CMakeLists.txt

pbalcer · 2024-09-30T09:01:22Z

scripts/benchmarks/benches/base.py

    def __init__(self, directory):
        self.directory = directory
-        self.adapter_path = os.path.join(options.ur_dir, 'build', 'lib', f"libur_adapter_{options.ur_adapter_name}.so")
+        for libs_dir_name in ['lib', 'lib64']:


I think the make install fix is fine, but, in this case, not doing the install makes this deterministic - the file will be located in the lib directory.
Why the change to do a make install?

can you make 'get lib path' into a generic function?

> I think the make install fix is fine, but, in this case, not doing the install makes this deterministic - the file will be located in the lib directory. Why the change to do a make install?

Without the install step, the benchmark fails on the assert in the next line with a message like "could not find adapter file /home/lslusarczyk/src2/ur_install/lib64/libur_adapter_level_zero.so (and in similar lib paths)". That is what I would like to see: the message shows one full sample path that was searched and still stays short.

"make install" is used as the cleanest way to consume the binaries and headers of an already built UR.

> can you make 'get lib path' into a generic function?

Yes, a separate function has been added.
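For readers following the thread, a generic lookup of this kind could look roughly like the sketch below. It is not the function added in this PR: the name find_library, its parameters, and the default directory list are assumptions for illustration. It checks the 'lib' and 'lib64' directories from the diff hunk under an install prefix and, on failure, reports one sample searched path in a short message, as described in the reply above.

```python
import os


def find_library(install_dir, lib_name, lib_dir_names=("lib", "lib64")):
    """Return the first existing path to lib_name under install_dir.

    Each candidate library directory is checked in order. If none contains
    the file, FileNotFoundError is raised with one sample searched path.
    """
    if not lib_dir_names:
        raise ValueError("lib_dir_names must not be empty")
    candidates = [os.path.join(install_dir, d, lib_name) for d in lib_dir_names]
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Report only the last candidate to keep the message short.
    raise FileNotFoundError(
        f"could not find adapter file {candidates[-1]} (and in similar lib paths)"
    )
```

A caller would pass the UR install directory and an adapter file name such as libur_adapter_level_zero.so. Raising an exception instead of asserting keeps the check active even when Python runs with optimizations enabled.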

github-actions · 2024-10-02T18:44:07Z

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11149704094

github-actions · 2024-10-02T19:37:56Z

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11149704094
Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): cannot calculate

Benchmark	This PR	Relative perf	Change	-
api_overhead_benchmark_sycl SubmitKernel out of order	22.516000 μs
api_overhead_benchmark_sycl SubmitKernel in order	25.388000 μs
api_overhead_benchmark_ur SubmitKernel out of order	14.319000 μs
api_overhead_benchmark_ur SubmitKernel in order	15.928000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	2.403000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	1.648000 μs

Relative perf in group memory (4): cannot calculate

Benchmark	This PR	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	222.173000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	127.015000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	5.723000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.186000 μs

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
miscellaneous_benchmark_sycl VectorSum	860.568000 μs

Relative perf in group Velocity-Bench (5): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Velocity-Bench Hashtable	360.808426 M keys/sec
Velocity-Bench Bitcracker	35.465800 s
Velocity-Bench CudaSift	221.398000 ms
Velocity-Bench QuickSilver	118.090000 MMS/CTT
Velocity-Bench Sobel Filter	549.611000 ms

Relative perf in group Runtime (24): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Runtime_BlockedTransform_iter_64_blocksize_1024	0.278000 ms
Runtime_BlockedTransform_iter_256_blocksize_2048	0.256000 ms
Runtime_BlockedTransform_iter_128_blocksize_8192	0.181000 ms
Runtime_BlockedTransform_iter_512_blocksize_8192	0.227000 ms
Runtime_BlockedTransform_iter_64_blocksize_4096	0.198000 ms
Runtime_BlockedTransform_iter_64_blocksize_2048	0.211000 ms
Runtime_BlockedTransform_iter_256_blocksize_8192	0.187000 ms
Runtime_BlockedTransform_iter_128_blocksize_1024	0.255000 ms
Runtime_BlockedTransform_iter_256_blocksize_4096	0.239000 ms
Runtime_BlockedTransform_iter_512_blocksize_1024	0.406000 ms
Runtime_BlockedTransform_iter_512_blocksize_4096	0.233000 ms
Runtime_BlockedTransform_iter_512_blocksize_2048	0.254000 ms
Runtime_BlockedTransform_iter_256_blocksize_1024	0.298000 ms
Runtime_BlockedTransform_iter_64_blocksize_8192	0.185000 ms
Runtime_BlockedTransform_iter_128_blocksize_4096	0.195000 ms
Runtime_BlockedTransform_iter_128_blocksize_2048	0.211000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	281.654000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	280.001000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	272.398000 ms
Runtime_IndependentDAGTaskThroughput_SingleTask	256.481000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	1706.371000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	1684.495000 ms
Runtime_DAGTaskThroughput_SingleTask	1634.206000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	1649.073000 ms

Relative perf in group MicroBench (15): cannot calculate

Benchmark	This PR	Relative perf	Change	-
MicroBench_LocalMem_int32_4096	0.193000 ms
MicroBench_LocalMem_fp32_4096	0.191000 ms
MicroBench_L2_fp32_2	0.010000 ms
MicroBench_L2_fp32_1	0.010000 ms
MicroBench_L2_fp32_16	0.012000 ms
MicroBench_L2_int32_1	0.015000 ms
MicroBench_L2_fp32_8	0.010000 ms
MicroBench_L2_int32_8	0.011000 ms
MicroBench_L2_int32_2	0.011000 ms
MicroBench_L2_fp32_4	0.010000 ms
MicroBench_L2_int32_16	0.012000 ms
MicroBench_L2_int32_4	0.010000 ms
MicroBench_Arith_int32_512	0.041000 ms
MicroBench_Arith_fp32_512	0.020000 ms
MicroBench_sf_fp32_16	0.009000 ms

Relative perf in group Pattern (14): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Pattern_Reduction_NDRange_int64	0.022000 ms
Pattern_Reduction_Hierarchical_int64	0.046000 ms
Pattern_Reduction_NDRange_int32	0.026000 ms
Pattern_Reduction_Hierarchical_int32	0.046000 ms
Pattern_Reduction_NDRange_fp32	0.022000 ms
Pattern_Reduction_Hierarchical_fp32	0.046000 ms
Pattern_SegmentedReduction_Hierarchical_int16	0.025000 ms
Pattern_SegmentedReduction_Hierarchical_int32	0.025000 ms
Pattern_SegmentedReduction_Hierarchical_int64	0.025000 ms
Pattern_SegmentedReduction_NDRange_int64	0.015000 ms
Pattern_SegmentedReduction_NDRange_int32	0.015000 ms
Pattern_SegmentedReduction_NDRange_fp32	0.014000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	0.024000 ms
Pattern_SegmentedReduction_NDRange_int16	0.019000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	Relative perf	Change	-
ScalarProduct_Hierarchical_int32	0.063000 ms
ScalarProduct_Hierarchical_fp32	0.062000 ms
ScalarProduct_NDRange_fp32	0.034000 ms
ScalarProduct_NDRange_int32	0.041000 ms
ScalarProduct_Hierarchical_int64	0.062000 ms
ScalarProduct_NDRange_int64	0.038000 ms

Relative perf in group SYCL2020 (2): cannot calculate

Benchmark	This PR	Relative perf	Change	-
SYCL2020_Accessors_Latency_fp32_out_of_order__	36.924000 ms
SYCL2020_Accessors_Latency_fp32_in_order__	34.933000 ms

Relative perf in group USM (17): cannot calculate

Benchmark	This PR	Relative perf	Change	-
USM_Latency_fp32_in_order__	16.041000 ms
USM_Latency_fp32_out_of_order__	24.358000 ms
USM_Allocation_latency_fp32_host	0.001000 ms
USM_Allocation_latency_fp32_device	0.002000 ms
USM_Allocation_latency_fp32_shared	0.030000 ms
USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch	12.076000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	1.049000 ms
USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch	12.053000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	0.891000 ms
USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch	11.357000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	1.434000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	1.592000 ms
USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch	11.685000 ms
USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1	0.168000 ms
USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1	0.012000 ms
USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1	0.004000 ms
USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1	0.003000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	Relative perf	Change	-
VectorAddition_int64	0.015000 ms
VectorAddition_fp32	0.014000 ms
VectorAddition_int32	0.015000 ms

Relative perf in group Polybench (12): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Polybench_2DConvolution	0.196000 ms
Polybench_2mm	1.223000 ms
Polybench_3mm	1.739000 ms
Polybench_Atax	6.818000 ms
Polybench_Bicg	14.338000 ms
Polybench_Correlation	3009.203000 ms
Polybench_Covariance	2997.658000 ms
Polybench_Gesummv	7.247000 ms
Polybench_Gramschmidt	284.794000 ms
Polybench_Mvt	40.178000 ms
Polybench_Syr2k	4221.023000 ms
Polybench_Syrk	209.786000 ms

Relative perf in group ReductionAtomic (4): cannot calculate

Benchmark	This PR	Relative perf	Change	-
ReductionAtomic_int32	0.013000 ms
ReductionAtomic_fp64	0.021000 ms
ReductionAtomic_int64	0.013000 ms
ReductionAtomic_fp32	0.020000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Kmeans_fp32	16.169000 ms

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
LinearRegressionCoeff_fp32	966.703000 ms

Relative perf in group LinearRegression (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
LinearRegression_fp32	408.148000 ms

Relative perf in group MatmulChain (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
MatmulChain	85.311000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
MolecularDynamics	0.028000 ms

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00451105 s
bitcracker - total time for whole calculation: 35.4658 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1258 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1259 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1251 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1270 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1271 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1266 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1170 1259 31.7676% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1266 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1267 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1210 1263 32.8537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1202 1262 32.6364% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1256 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1206 1258 32.745% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1171 1244 31.7947% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1089 1265 29.5683% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1249 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1256 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1272 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1123 1261 30.4914% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1254 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1272 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1268 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1261 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1272 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1256 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1254 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1074 1270 29.161% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1260 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1265 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1256 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1207 1268 32.7722% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1206 1258 32.745% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1134 1271 30.7901% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1205 1261 32.7179% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1268 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1117 1263 30.3285% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1263 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1261 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1275 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1203 1273 32.6636% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1266 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1271 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1143 1253 31.0345% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1261 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1203 1262 32.6636% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1261 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1260 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1256 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 221.398 ms

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.321660e-01 6.168240e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.651260e-01 7.512570e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.364410e-01 7.652690e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.669210e-01 8.297590e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.602370e-01 7.975150e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.380780e-01 7.670290e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.315490e-01 7.673950e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.323860e-01 7.871690e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.318600e-01 7.855490e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.323590e-01 7.606550e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.116e+07 1.116e+07 1.116e+07 0.000e+00 100.00
cycleInit 10 3.527e+06 3.527e+06 3.527e+06 0.000e+00 100.00
cycleTracking 10 7.628e+06 7.628e+06 7.628e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.942e+06 4.942e+06 4.942e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.217e+05 2.217e+05 2.217e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.100e+02 4.100e+02 4.100e+02 0.000e+00 100.00
Figure Of Merit 118.09 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syrk --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syrk.csv --size=4096

Output:

['Polybench_Syrk', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '4096', '0.209786', '0.209786', '0.209786', '0.209786', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

github-actions · 2024-10-03T08:16:04Z

Compute Benchmarks level_zero_v2 run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11157704223

github-actions · 2024-10-03T09:08:26Z

Compute Benchmarks level_zero_v2 run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11157704223
Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): cannot calculate

Benchmark	This PR	Relative perf	Change	-
api_overhead_benchmark_sycl SubmitKernel out of order	20.564000 μs
api_overhead_benchmark_sycl SubmitKernel in order	21.222000 μs
api_overhead_benchmark_ur SubmitKernel out of order	14.275000 μs
api_overhead_benchmark_ur SubmitKernel in order	11.822000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	1.468000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	1.460000 μs

Relative perf in group memory (4): cannot calculate

Benchmark	This PR	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	356.741000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	84.317000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	7.386000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.373000 μs

Relative perf in group Velocity-Bench (5): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Velocity-Bench Hashtable	363.307101 M keys/sec
Velocity-Bench Bitcracker	35.436000 s
Velocity-Bench CudaSift	219.354000 ms
Velocity-Bench QuickSilver	107.310000 MMS/CTT
Velocity-Bench Sobel Filter	551.541000 ms

Relative perf in group Runtime (20): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Runtime_BlockedTransform_iter_512_blocksize_4096	0.244000 ms
Runtime_BlockedTransform_iter_64_blocksize_2048	0.233000 ms
Runtime_BlockedTransform_iter_128_blocksize_8192	0.196000 ms
Runtime_BlockedTransform_iter_256_blocksize_8192	0.198000 ms
Runtime_BlockedTransform_iter_256_blocksize_1024	0.298000 ms
Runtime_BlockedTransform_iter_128_blocksize_2048	0.213000 ms
Runtime_BlockedTransform_iter_256_blocksize_2048	0.255000 ms
Runtime_BlockedTransform_iter_128_blocksize_4096	0.196000 ms
Runtime_BlockedTransform_iter_256_blocksize_4096	0.238000 ms
Runtime_BlockedTransform_iter_512_blocksize_8192	0.233000 ms
Runtime_BlockedTransform_iter_64_blocksize_8192	0.198000 ms
Runtime_BlockedTransform_iter_64_blocksize_4096	0.203000 ms
Runtime_BlockedTransform_iter_64_blocksize_1024	0.282000 ms
Runtime_BlockedTransform_iter_128_blocksize_1024	0.257000 ms
Runtime_BlockedTransform_iter_512_blocksize_1024	0.407000 ms
Runtime_BlockedTransform_iter_512_blocksize_2048	0.256000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	1649.073000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	1679.087000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	1706.371000 ms
Runtime_DAGTaskThroughput_SingleTask	1633.964000 ms

Relative perf in group MicroBench (16): cannot calculate

Benchmark	This PR	Relative perf	Change	-
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	791.859000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	796.541000 ms
MicroBench_LocalMem_fp32_4096	0.191000 ms
MicroBench_LocalMem_int32_4096	0.195000 ms
MicroBench_L2_int32_2	0.011000 ms
MicroBench_L2_fp32_8	0.011000 ms
MicroBench_L2_int32_16	0.012000 ms
MicroBench_L2_int32_1	0.015000 ms
MicroBench_L2_int32_8	0.011000 ms
MicroBench_L2_fp32_16	0.012000 ms
MicroBench_L2_fp32_2	0.010000 ms
MicroBench_L2_fp32_1	0.010000 ms
MicroBench_L2_int32_4	0.010000 ms
MicroBench_L2_fp32_4	0.010000 ms
MicroBench_Arith_fp32_512	0.020000 ms
MicroBench_Arith_int32_512	0.041000 ms

Relative perf in group Pattern (14): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Pattern_Reduction_NDRange_fp32	0.022000 ms
Pattern_Reduction_Hierarchical_int32	0.054000 ms
Pattern_Reduction_Hierarchical_int64	0.048000 ms
Pattern_Reduction_Hierarchical_fp32	0.049000 ms
Pattern_Reduction_NDRange_int32	0.027000 ms
Pattern_Reduction_NDRange_int64	0.023000 ms
Pattern_SegmentedReduction_NDRange_int16	0.019000 ms
Pattern_SegmentedReduction_Hierarchical_int64	0.026000 ms
Pattern_SegmentedReduction_NDRange_fp32	0.015000 ms
Pattern_SegmentedReduction_NDRange_int64	0.015000 ms
Pattern_SegmentedReduction_Hierarchical_int32	0.026000 ms
Pattern_SegmentedReduction_Hierarchical_int16	0.025000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	0.028000 ms
Pattern_SegmentedReduction_NDRange_int32	0.016000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	Relative perf	Change	-
ScalarProduct_Hierarchical_fp32	0.063000 ms
ScalarProduct_NDRange_int64	0.038000 ms
ScalarProduct_Hierarchical_int32	0.063000 ms
ScalarProduct_NDRange_int32	0.041000 ms
ScalarProduct_NDRange_fp32	0.034000 ms
ScalarProduct_Hierarchical_int64	0.062000 ms

Relative perf in group USM (17): cannot calculate

Benchmark	This PR	Relative perf	Change	-
USM_Latency_fp32_out_of_order__	24.129000 ms
USM_Latency_fp32_in_order__	16.088000 ms
USM_Allocation_latency_fp32_host	0.001000 ms
USM_Allocation_latency_fp32_device	0.002000 ms
USM_Allocation_latency_fp32_shared	0.029000 ms
USM_Instr_Mix_fp32_shared_1:1mix_no_init_no_prefetch	11.358000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	1.050000 ms
USM_Instr_Mix_fp32_shared_1:1mix_with_init_no_prefetch	11.685000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	0.892000 ms
USM_Instr_Mix_fp32_shared_1:1mix_no_init_with_prefetch	12.654000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	1.581000 ms
USM_Instr_Mix_fp32_shared_1:1mix_with_init_with_prefetch	12.662000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	1.435000 ms
USM_Pinned_Overhead_fp32_DeviceHost_NonPinned_Init_1	0.164000 ms
USM_Pinned_Overhead_fp32_HostDevice_Pinned_Init_1	0.003000 ms
USM_Pinned_Overhead_fp32_DeviceHost_Pinned_Init_1	0.012000 ms
USM_Pinned_Overhead_fp32_HostDevice_NonPinned_Init_1	0.004000 ms

Relative perf in group SYCL2020 (2): cannot calculate

Benchmark	This PR	Relative perf	Change	-
SYCL2020_Accessors_Latency_fp32_in_order__	34.240000 ms
SYCL2020_Accessors_Latency_fp32_out_of_order__	36.447000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	Relative perf	Change	-
VectorAddition_int64	0.016000 ms
VectorAddition_fp32	0.014000 ms
VectorAddition_int32	0.015000 ms

Relative perf in group Polybench (12): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Polybench_2DConvolution	0.196000 ms
Polybench_2mm	1.223000 ms
Polybench_3mm	1.735000 ms
Polybench_Atax	6.858000 ms
Polybench_Bicg	14.444000 ms
Polybench_Correlation	3009.203000 ms
Polybench_Covariance	3016.879000 ms
Polybench_Gesummv	7.179000 ms
Polybench_Gramschmidt	284.794000 ms
Polybench_Mvt	40.186000 ms
Polybench_Syr2k	4316.102000 ms
Polybench_Syrk	207.153000 ms

Relative perf in group ReductionAtomic (4): cannot calculate

Benchmark	This PR	Relative perf	Change	-
ReductionAtomic_fp32	0.020000 ms
ReductionAtomic_int32	0.013000 ms
ReductionAtomic_int64	0.013000 ms
ReductionAtomic_fp64	0.021000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
Kmeans_fp32	16.206000 ms

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
LinearRegressionCoeff_fp32	966.732000 ms

Relative perf in group LinearRegression (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
LinearRegression_fp32	407.906000 ms

Relative perf in group MatmulChain (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
MatmulChain	85.311000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	Relative perf	Change	-
MolecularDynamics	0.029000 ms

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00437926 s
bitcracker - total time for whole calculation: 35.436 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1129 1267 30.6544% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1260 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1150 1256 31.2245% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1261 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1267 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1265 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1252 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1059 1272 28.7537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1076 1258 29.2153% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1262 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1259 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1273 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1086 1271 29.4868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1185 1260 32.1749% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1274 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1155 1258 31.3603% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1262 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1147 1255 31.1431% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1268 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1257 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1201 1266 32.6093% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1256 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1259 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1216 1265 33.0166% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1255 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1268 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1271 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1266 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1108 1261 30.0842% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1258 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1089 1261 29.5683% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1262 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1261 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1258 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1271 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1251 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1276 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1272 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1269 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1267 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1257 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1187 1272 32.2292% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1252 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1259 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1262 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 219.354 ms

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.799980e-01 7.059100e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.348570e-01 8.240000e-01 1.000000e-06
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.357440e-01 8.400740e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.589160e-01 8.940810e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.321030e-01 8.707390e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.344490e-01 8.534110e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.345990e-01 8.433300e-01 1.000000e-06
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.351950e-01 8.655890e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.349460e-01 8.621210e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.355630e-01 8.357160e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.181e+07 1.181e+07 1.181e+07 0.000e+00 100.00
cycleInit 10 3.416e+06 3.416e+06 3.416e+06 0.000e+00 100.00
cycleTracking 10 8.395e+06 8.395e+06 8.395e+06 0.000e+00 100.00
cycleTracking_Kernel 104 5.086e+06 5.086e+06 5.086e+06 0.000e+00 100.00
cycleTracking_MPI 117 1.953e+05 1.953e+05 1.953e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.000e+02 4.000e+02 4.000e+02 0.000e+00 100.00
Figure Of Merit 107.31 [Num Mega Segments / Cycle Tracking Time]
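
The reported figure of merit can be cross-checked against the per-cycle table above. The sketch below is a reading of the bracketed label, not QuickSilver's own code: it assumes the value is the total `num_seg` count in millions divided by the cumulative `cycleTracking` time in seconds. The function and variable names are illustrative.

```python
# num_seg column for cycles 0..9, copied from the per-cycle table above.
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# Cumulative cycleTracking time from the timer report, in microseconds.
cycle_tracking_us = 8.395e6


def figure_of_merit(segments, tracking_time_us):
    """Mega segments per second of cycle-tracking time (assumed definition)."""
    mega_segments = sum(segments) / 1e6
    tracking_time_s = tracking_time_us / 1e6
    return mega_segments / tracking_time_s


fom = figure_of_merit(num_seg, cycle_tracking_us)
print(f"Figure of merit: {fom:.2f}")
```

Under that assumption the result agrees with the logged 107.31 to two decimal places. The timer value is itself rounded to four significant digits in the log, so further digits are not meaningful.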

Velocity-Bench Sobel Filter

Environment Variables:

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/syrk --warmup-run --num-runs=3 --output=/home/test-user/bench_workdir/Syrk.csv --size=4096

Output:

['Polybench_Syrk', 'FAIL', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '4096', '0.207153', '0.207153', '0.207153', '0.207153', '0.000000', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

pbalcer

overall lgtm, but it might be easier to merge if we were to split this into two PRs: one that fixes make install (and maybe adds a test?) and the second one that does the benchmarks change.

lslusarczyk · 2024-10-03T13:09:22Z

I've compared level_zero_v2 benchmarks on main and on my PR.

using main: https://github.com/oneapi-src/unified-runtime/actions/runs/11159687520/job/31018579072
using this PR: https://github.com/oneapi-src/unified-runtime/actions/runs/11157704223/job/31012487313
What passes and what fails looks similar in both runs.
Compared to the full benchmark run time, the gain is small (about 1 minute), but it will matter if one wants to run e.g. one specific benchmark multiple times.

Results are also comparable, e.g. for "api_overhead_benchmark_sycl/ur SubmitKernel out of order/in order":

using main: 20.153 μs, 24.01 μs, 14.613 μs, 11.627 μs
using this PR: 20.564 μs, 21.222 μs, 14.275 μs, 11.822 μs

So it seems I have not broken anything.
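
For reference, the size of the differences between the two runs quoted above can be worked out with a short Python sketch. The names are illustrative, and the four values are assumed to be listed in the same order for both runs.

```python
# Timings in microseconds, copied from the comparison above.
main_us = [20.153, 24.01, 14.613, 11.627]
pr_us = [20.564, 21.222, 14.275, 11.822]


def relative_change_percent(baseline, candidate):
    """Return (candidate - baseline) / baseline as a percentage."""
    return (candidate - baseline) / baseline * 100.0


changes = [relative_change_percent(m, p) for m, p in zip(main_us, pr_us)]

for m, p, c in zip(main_us, pr_us, changes):
    # Negative values mean the PR run was faster than main.
    print(f"main {m:7.3f} us -> PR {p:7.3f} us: {c:+.2f}%")
```

These are single runs, so differences of a few percent in either direction are as likely to be noise as a real effect.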

lslusarczyk · 2024-10-03T13:43:58Z

overall lgtm, but it might be easier to merge if we were to split this into two PRs: one that fixes make install (and maybe adds a test?) and the second one that does the benchmarks change.

separate PR: #2169
Please check if my way of testing cmake install looks sufficient to you there.

pbalcer · 2024-10-07T08:43:29Z

the cmake change got merged, please rebase.

lslusarczyk · 2024-10-07T08:44:57Z

the cmake change got merged, please rebase.

already rebased

github-actions bot added labels ci/cd (Continuous integration/delivery), loader (Loader related feature/bug), level-zero (L0 adapter specific issues), cuda (CUDA adapter specific issues), hip (HIP adapter specific issues), native-cpu (Native CPU adapter specific issues) on Sep 27, 2024

lslusarczyk force-pushed the bench_no_recompile_ur branch 2 times, most recently from e9582f3 to 91cd46a Compare September 27, 2024 22:10

lslusarczyk mentioned this pull request Sep 27, 2024

checking llvm compiling with new UR from 2146 PR intel/llvm#15542

Closed

igchor reviewed Sep 27, 2024

scripts/benchmarks/benches/base.py (outdated)

source/adapters/cuda/CMakeLists.txt

test/adapters/level_zero/CMakeLists.txt

source/adapters/cuda/CMakeLists.txt

lslusarczyk force-pushed the bench_no_recompile_ur branch from 91cd46a to c03d7d0 Compare September 30, 2024 08:22

pbalcer reviewed Sep 30, 2024

lslusarczyk force-pushed the bench_no_recompile_ur branch from 7488aa7 to 053d092 Compare October 2, 2024 11:25

lslusarczyk force-pushed the bench_no_recompile_ur branch from 053d092 to 1360515 Compare October 3, 2024 09:49

pbalcer reviewed Oct 3, 2024

lslusarczyk mentioned this pull request Oct 3, 2024

Install adapters by cmake #2169

Merged

Compute-benchmarks use already compiled UR

bf46d7f

lslusarczyk force-pushed the bench_no_recompile_ur branch from 1360515 to bf46d7f Compare October 4, 2024 17:06

lslusarczyk marked this pull request as ready for review October 4, 2024 17:11

lslusarczyk requested a review from a team as a code owner October 4, 2024 17:11

pbalcer merged commit cf90cb1 into oneapi-src:main Oct 7, 2024
75 checks passed
