-
Notifications
You must be signed in to change notification settings - Fork 124
Compute-benchmarks use already compiled UR #2146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e9582f3 to
91cd46a
Compare
91cd46a to
c03d7d0
Compare
| def __init__(self, directory): | ||
| self.directory = directory | ||
| self.adapter_path = os.path.join(options.ur_dir, 'build', 'lib', f"libur_adapter_{options.ur_adapter_name}.so") | ||
| for libs_dir_name in ['lib', 'lib64']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the make install fix is fine, but, in this case, not doing the install makes this deterministic - the file will be located in the lib directory.
Why the change to do a make install?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make 'get lib path' into a generic function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the make install fix is fine, but, in this case, not doing the install makes this deterministic - the file will be located in the lib directory. Why the change to do a
make install?
Not doing install will fail benchmark in an assert in the next line with message like could not find adapter file /home/lslusarczyk/src2/ur_install/lib64/libur_adapter_level_zero.so (and in similar lib paths) which looks what I would like to se, that is sample full path which was searched and still being a short message
"make install" used as cleanest way of using binaries and includes of already built UR
can you make 'get lib path' into a generic function?
yes, separate function added
7488aa7 to
053d092
Compare
|
Compute Benchmarks level_zero run (with params: ): |
|
Compute Benchmarks level_zero run (): SummaryNo diffs to calculate performance change (result is better) Performance change in benchmark groupsRelative perf in group api (6): cannot calculate
Relative perf in group memory (4): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group Velocity-Bench (5): cannot calculate
Relative perf in group Runtime (24): cannot calculate
Relative perf in group MicroBench (15): cannot calculate
Relative perf in group Pattern (14): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group SYCL2020 (2): cannot calculate
Relative perf in group USM (17): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (12): cannot calculate
Relative perf in group ReductionAtomic (4): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Relative perf in group LinearRegression (1): cannot calculate
Relative perf in group MatmulChain (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
DetailsBenchmark details - environment, command, output...api_overhead_benchmark_sycl SubmitKernel out of orderEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl SubmitKernel in orderEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel out of orderEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in orderEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type miscellaneous_benchmark_sycl VectorSumEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256 Output:TestCase,Mean,Median,StdDev,Min,Max,Type Velocity-Bench HashtableEnvironment Variables:Command:/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify Output:hashtable - total time for whole calculation: 0.371992 s Velocity-Bench BitcrackerEnvironment Variables:Command:/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000 Output:---------> BitCracker: BitLocker password cracking tool <--------- ==================================
|
|
Compute Benchmarks level_zero_v2 run (with params: ): |
|
Compute Benchmarks level_zero_v2 run (): SummaryNo diffs to calculate performance change (result is better) Performance change in benchmark groupsRelative perf in group api (6): cannot calculate
Relative perf in group memory (4): cannot calculate
Relative perf in group Velocity-Bench (5): cannot calculate
Relative perf in group Runtime (20): cannot calculate
Relative perf in group MicroBench (16): cannot calculate
Relative perf in group Pattern (14): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (17): cannot calculate
Relative perf in group SYCL2020 (2): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (12): cannot calculate
Relative perf in group ReductionAtomic (4): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Relative perf in group LinearRegression (1): cannot calculate
Relative perf in group MatmulChain (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
DetailsBenchmark details - environment, command, output...api_overhead_benchmark_sycl SubmitKernel out of orderEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl SubmitKernel in orderEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel out of orderEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in orderEnvironment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024Environment Variables:Command:/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type Velocity-Bench HashtableEnvironment Variables:Command:/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify Output:hashtable - total time for whole calculation: 0.369433 s Velocity-Bench BitcrackerEnvironment Variables:Command:/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000 Output:---------> BitCracker: BitLocker password cracking tool <--------- ==================================
|
053d092 to
1360515
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm, but it might be easier to merge if we were to split this into two PRs: one that fixes make install (and maybe adds a test?) and the second one that does the benchmarks change.
|
I've compared level_zero_v2 benchmarks on main and on my PR.
Results are also comparable. E.g. "api_overhead_benchmark_sycl/ur SubmitKernel out of order/in order.
So it seems I've not broken anything I think. |
separate PR: #2169 |
1360515 to
bf46d7f
Compare
|
the cmake change got merged, please rebase. |
already rebased |
Utilize support for using UR binary through find_package added in intel/compute-benchmarks#21
Before this change compute-benchmarks always builds UR from sources which is unnecessary since we already build UR as part of SYCL on UR CI.
Add installing adapters by "cmake --install" which is needed by new benchmarks code.