Update README.md #2
base: bench-workflow
Conversation
Compute Benchmarks level_zero_v2 run (with params: ):

Benchmarks level_zero_v2 run (): Summary (emphasized values are the best results)

Performance change in benchmark groups:

- Compute Benchmarks
  - Relative perf in group SubmitKernel (7)
  - Relative perf in group Other (17)
  - Relative perf in group SinKernelGraph (4)
  - Relative perf in group SubmitGraph (1)
  - Relative perf in group ExecGraph (1)
  - Relative perf in group SubmitKernel CPU count (3)
- Velocity Bench
  - Relative perf in group Other (5)
- SYCL-Bench
  - Relative perf in group Other (53)

Details (benchmark environment and commands). All binaries and output files live under `/home/test-user/bench_workdir`; that prefix is omitted below.

Compute Benchmarks:

- api_overhead_benchmark_{l0, sycl, ur} SubmitKernel, in order and out of order:
  `compute-benchmarks-build/bin/api_overhead_benchmark_<api> --test=SubmitKernel --csv --noHeaders --Ioq=<0|1> --DiscardEvents=0 --MeasureCompletion=<0|1> --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1`
  (`--Ioq=0` for out of order, `--Ioq=1` for in order; `--MeasureCompletion=1` only for the ur "with measure completion" variants; the ur out-of-order, in-order, and measure-completion variants are also reported as "CPU count" entries with the same command.)
- memory_benchmark_sycl QueueInOrderMemcpy, size 1024, from Device to Device and from Host to Device:
  `compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=<Device|Host> --destinationPlacement=Device --size=1024 --count=100`
- memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024:
  `compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024`
- memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240:
  `compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1`
- api_overhead_benchmark_sycl ExecImmediateCopyQueue, size 1024:
  `compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=<0|1> --IsCopyOnly=1 --MeasureCompletionTime=0 --src=<Device|Host> --dst=<Device|Host> --size=1024`
  (out of order: `--ioq=0 --src=Device --dst=Device`; in order: `--ioq=1 --src=Host --dst=Host`)
- miscellaneous_benchmark_sycl VectorSum:
  `compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256`
- multithread_benchmark_ur MemcpyExecute, shared command
  `compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --MeasureCompletion=1 --UseQueuePerThread=1` plus per-variant flags:

  | NumOpsPerThread | NumThreads | AllocSize | SrcUSM | DstUSM | UseEvents | iterations |
  |---|---|---|---|---|---|---|
  | 400 | 1 | 102400 | 1 | 1 | 1 | 10 |
  | 100 | 8 | 102400 | 1 | 1 | 1 | 10 |
  | 400 | 8 | 1024 | 1 | 1 | 1 | 1000 |
  | 10 | 16 | 1024 | 1 | 1 | 1 | 10000 |
  | 400 | 1 | 102400 | 0 | 1 | 1 | 10 |
  | 100 | 8 | 102400 | 0 | 1 | 1 | 10 |
  | 400 | 8 | 1024 | 0 | 1 | 1 | 1000 |
  | 10 | 16 | 1024 | 0 | 1 | 1 | 10000 |
  | 4096 | 1 | 1024 | 0 | 1 | 0 | 10 |
  | 4096 | 4 | 1024 | 0 | 1 | 0 | 10 |

- graph_api_benchmark_sycl SinKernelGraph, graphs {0, 1} x numKernels {10, 100}:
  `compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=<10|100> --withGraphs=<0|1>`
- graph_api_benchmark_sycl SubmitExecGraph, ioq:1, numKernels:100, submit {0, 1}:
  `compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=<0|1> --ioq=1 --numKernels=100`

Velocity-Bench:

- Hashtable: `hashtable/hashtable_sycl --no-verify`
- Bitcracker: `bitcracker/bitcracker -f velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000`
- CudaSift: `cudaSift/cudaSift`
- QuickSilver: `QuickSilver/qs -i velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp` (environment: `QS_DEVICE=GPU`)
- Sobel Filter: `sobel_filter/sobel_filter -i data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5` (environment: `OPENCV_IO_MAX_IMAGE_PIXELS=1677721600`)

SYCL-Bench (all run as `sycl-bench-build/<binary> --warmup-run --num-runs=3 --output=<csv> --size=<n>`):

- Runtime_IndependentDAGTaskThroughput_{SingleTask, BasicParallelFor, HierarchicalParallelFor, NDRangeParallelFor}: binary `dag_task_throughput_independent`, output `IndependentDAGTaskThroughput_multi.csv`, size 32768
- Runtime_DAGTaskThroughput_{SingleTask, BasicParallelFor, HierarchicalParallelFor, NDRangeParallelFor}: binary `dag_task_throughput_sequential`, output `DAGTaskThroughput_multi.csv`, size 327680
- MicroBench_HostDeviceBandwidth_{1D, 2D, 3D}_{H2D, D2H}_{Contiguous, Strided}: binary `host_device_bandwidth`, output `HostDeviceBandwidth_multi.csv`, size 512
- MicroBench_LocalMem_{int32, fp32}_4096: binary `local_mem`, output `LocalMem_multi.csv`, size 10240000
- Pattern_Reduction_{NDRange, Hierarchical}_int32: binary `reduction`, output `Pattern_Reduction_multi.csv`, size 10240000
- ScalarProduct_{NDRange, Hierarchical}_{int32, int64, fp32}: binary `scalar_prod`, output `ScalarProduct_multi.csv`, size 102400000
- Pattern_SegmentedReduction_{NDRange, Hierarchical}_{int16, int32, int64, fp32}: binary `segmentedreduction`, output `Pattern_SegmentedReduction_multi.csv`, size 102400000
- USM_Allocation_latency_fp32_{device, host, shared}: binary `usm_allocation_latency`, output `USM_Allocation_latency_multi.csv`, size 1024000000
- USM_Instr_Mix_fp32_{device, host}_1:1mix_{with_init, no_init}_no_prefetch: binary `usm_instr_mix`, output `USM_Instr_Mix_multi.csv`, size 8192
- VectorAddition_{int32, int64, fp32}: binary `vec_add`, output `VectorAddition_multi.csv`, size 102400000
- Polybench_2mm: binary `2mm`, output `2mm.csv`, size 512
- Polybench_3mm: binary `3mm`, output `3mm.csv`, size 512
- Polybench_Atax: binary `atax`, output `Atax.csv`, size 8192
- Kmeans_fp32: binary `kmeans`, output `Kmeans.csv`, size 700000000
- MolecularDynamics: binary `mol_dyn`, output `MolecularDynamics.csv`, size 8196
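The "relative perf in group" summaries above compare each run's timings within a benchmark group. As a minimal illustration (hypothetical helper, not the actual CI reporting script), relative performance between two result sets can be computed as a ratio of candidate time to baseline time:

```python
# Sketch: relative performance of benchmark timings against a baseline.
# Hypothetical helper names; not the actual benchmark-CI code.

def relative_perf(baseline: dict, candidate: dict) -> dict:
    """Return candidate/baseline time ratios per benchmark.
    A ratio below 1.0 means the candidate run was faster."""
    ratios = {}
    for name, base_time in baseline.items():
        if name in candidate and base_time > 0:
            ratios[name] = candidate[name] / base_time
    return ratios

baseline = {"SubmitKernel in order": 10.0, "QueueMemcpy 1024": 4.0}
candidate = {"SubmitKernel in order": 9.0, "QueueMemcpy 1024": 5.0}
print(relative_perf(baseline, candidate))
```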
`84e2ab6` to `e40a827` (Compare)
This is a first step towards re-enabling the UR performance-testing CI. It introduces a reusable YAML workflow and a way to trigger it manually. Here's an example of how it looks: pbalcer/llvm#2 (comment)
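Manually triggering such a workflow typically goes through GitHub's `workflow_dispatch` REST endpoint (`POST /repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches`). The endpoint shape is GitHub's documented API; the helper, workflow file name, and input names below are illustrative, not taken from this PR:

```python
# Sketch: assemble a workflow_dispatch request for the GitHub REST API.
# The URL/payload shape follows GitHub's documented endpoint; the helper
# and the example workflow/input names are hypothetical.

def build_dispatch_request(owner, repo, workflow, ref, inputs):
    """Return the (url, payload) pair for a workflow_dispatch POST."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow}/dispatches")
    payload = {"ref": ref, "inputs": inputs}
    return url, payload

url, payload = build_dispatch_request(
    "pbalcer", "llvm", "benchmarks.yml", "bench-workflow", {"params": ""})
print(url)
```

The actual POST would carry an authorization token; only the request construction is shown here.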
Fix for:

```
Assertion failed: (false && "Architecture or OS not supported"), function CreateRegisterContextForFrame, file /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/elf-core/ThreadElfCore.cpp, line 182.
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace.
 #0 0x000000080cd857c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x000000080cd85ed4 /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:797:3
 #2 0x000000080cd82ae8 llvm::sys::RunSignalHandlers() /usr/src/contrib/llvm-project/llvm/lib/Support/Signals.cpp:104:5
 #3 0x000000080cd861f0 SignalHandler /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:403:3
 #4 0x000000080f159644 handle_signal /usr/src/lib/libthr/thread/thr_sig.c:298:3
```
The `-mcmodel=tiny` memory model is only valid on ARM targets. When it is used on X86, the compiler throws an internal error along with a stack dump (#125641). This patch resolves the issue.

Reduced test case:

```
#include <stdio.h>
int main( void )
{
    printf( "Hello, World!\n" );
    return 0;
}
```

```
0. Program arguments: /opt/compiler-explorer/clang-trunk/bin/clang++ -gdwarf-4 -g -o /app/output.s -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics -mcmodel=tiny <source>
1. <eof> parser at end of file
 #0 0x0000000003b10218 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b10218)
 #1 0x0000000003b0e35c llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b0e35c)
 #2 0x0000000003a5dbc3 llvm::CrashRecoveryContext::HandleExit(int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dbc3)
 #3 0x0000000003b05cfe llvm::sys::Process::Exit(int, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b05cfe)
 #4 0x0000000000d4e3eb LLVMErrorHandler(void*, char const*, bool) cc1_main.cpp:0:0
 #5 0x0000000003a67c93 llvm::report_fatal_error(llvm::Twine const&, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67c93)
 #6 0x0000000003a67df8 (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67df8)
 #7 0x0000000002549148 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x2549148)
 #8 0x00000000025491fc llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x25491fc)
 #9 0x0000000003db74cc clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3db74cc)
 #10 0x0000000004460d95 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4460d95)
 #11 0x00000000060005ec clang::ParseAST(clang::Sema&, bool, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x60005ec)
 #12 0x00000000044614b5 clang::CodeGenAction::ExecuteAction() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44614b5)
 #13 0x0000000004737121 clang::FrontendAction::Execute() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4737121)
 #14 0x00000000046b777b clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x46b777b)
 #15 0x00000000048229e3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x48229e3)
 #16 0x0000000000d50621 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd50621)
 #17 0x0000000000d48e2d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
 #18 0x00000000044acc99 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0
 #19 0x0000000003a5dac3 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dac3)
 #20 0x00000000044aceb9 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0
 #21 0x00000000044710dd clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44710dd)
 #22 0x0000000004472071 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4472071)
 #23 0x000000000447c3fc clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x447c3fc)
 #24 0x0000000000d4d2b1 clang_main(int, char**, llvm::ToolContext const&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd4d2b1)
 #25 0x0000000000c12464 main (/opt/compiler-explorer/clang-trunk/bin/clang+++0xc12464)
 #26 0x00007ae43b029d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
 #27 0x00007ae43b029e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
 #28 0x0000000000d488c5 _start (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd488c5)
```

---------

Co-authored-by: Shashwathi N <[email protected]>
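A fix of this kind amounts to validating the requested code model against the target before constructing the target machine, so the driver can emit a normal diagnostic instead of reaching `report_fatal_error`. A toy sketch of such a check (the table and function below are illustrative; Clang's real logic lives in its target and option-handling code):

```python
# Sketch: validate a -mcmodel value against the target architecture.
# Illustrative only; not Clang's actual implementation. The code-model
# sets reflect the documented models per architecture ("tiny" is an
# AArch64 model, not an x86-64 one).

SUPPORTED_CODE_MODELS = {
    "x86_64": {"small", "kernel", "medium", "large"},
    "aarch64": {"tiny", "small", "large"},
}

def check_code_model(arch, model):
    """Return a driver-style error message, or None if the model is valid."""
    allowed = SUPPORTED_CODE_MODELS.get(arch, set())
    if model not in allowed:
        return f"error: invalid argument '-mcmodel={model}' for target '{arch}'"
    return None  # accepted

print(check_code_model("x86_64", "tiny"))
```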
… `getForwardSlice` matchers (#115670) Improve the mlir-query tool by implementing the `getBackwardSlice` and `getForwardSlice` matchers. In addition, `SetQuery` needed to be added to enable custom configuration for each query, e.g. `inclusive`, `omitUsesFromAbove`, `omitBlockArguments`. Note: the backwardSlice and forwardSlice algorithms are the same as the ones in `mlir/lib/Analysis/SliceAnalysis.cpp`. Example of the current matcher; the query was run on the file `mlir/test/mlir-query/complex-test.mlir`:

```mlir
./mlir-query /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir -c "match getDefinitions(hasOpName(\"arith.addf\"),2)"

Match #1:
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:5:8:
  %0 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0 : tensor<5x5xf32>) outs(%arg1 : tensor<5x5xf32>) {
  ^
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:7:10: note: "root" binds here
  %2 = arith.addf %in, %in : f32
  ^

Match #2:
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:10:16:
  %collapsed = tensor.collapse_shape %0 [[0, 1]] : tensor<5x5xf32> into tensor<25xf32>
  ^
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:13:11:
  %c2 = arith.constant 2 : index
  ^
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:14:18:
  %extracted = tensor.extract %collapsed[%c2] : tensor<25xf32>
  ^
/home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:15:10: note: "root" binds here
  %2 = arith.addf %extracted, %extracted : f32
  ^

2 matches.
```
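A backward slice is essentially a transitive closure over def-use edges, walking from an op to the ops that define its operands. A language-agnostic sketch over a toy def-use graph (this is an illustration of the idea, not the actual `SliceAnalysis.cpp` implementation; op names are hypothetical):

```python
# Sketch: backward slice as a transitive closure over def-use edges.
# `defs` maps each op to the ops that define its operands. Toy model
# only; the real algorithm lives in mlir/lib/Analysis/SliceAnalysis.cpp.

def backward_slice(root, defs, inclusive=False):
    """Collect every op that transitively defines an operand of `root`."""
    seen = set()
    stack = list(defs.get(root, []))
    while stack:
        op = stack.pop()
        if op not in seen:
            seen.add(op)
            stack.extend(defs.get(op, []))
    if inclusive:  # mirrors the `inclusive` query option
        seen.add(root)
    return seen

defs = {"addf": ["generic"], "generic": ["arg0"], "arg0": []}
print(sorted(backward_slice("addf", defs)))
```

The `inclusive` flag mirrors the query option of the same name; a forward slice is the same walk over use edges instead of def edges.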
Fixes #123300.

What is seen:

```
clang-repl> int x = 42;
clang-repl> auto capture = [&]() { return x * 2; };
In file included from <<< inputs >>>:1:
input_line_4:1:17: error: non-local lambda expression cannot have a capture-default
    1 | auto capture = [&]() { return x * 2; };
      |                 ^
zsh: segmentation fault  clang-repl --Xcc="-v"

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x0000000107b4f8b8 libclang-cpp.19.1.dylib`clang::IncrementalParser::CleanUpPTU(clang::PartialTranslationUnit&) + 988
    frame #1: 0x0000000107b4f1b4 libclang-cpp.19.1.dylib`clang::IncrementalParser::ParseOrWrapTopLevelDecl() + 416
    frame #2: 0x0000000107b4fb94 libclang-cpp.19.1.dylib`clang::IncrementalParser::Parse(llvm::StringRef) + 612
    frame #3: 0x0000000107b52fec libclang-cpp.19.1.dylib`clang::Interpreter::ParseAndExecute(llvm::StringRef, clang::Value*) + 180
    frame #4: 0x0000000100003498 clang-repl`main + 3560
    frame #5: 0x000000018d39a0e0 dyld`start + 2360
```

Though the error is justified, we shouldn't exit through a segfault in such cases. The issue is that empty named decls weren't being handled, resulting in this assert: https://github.com/llvm/llvm-project/blob/c1a229252617ed58f943bf3f4698bd8204ee0f04/clang/include/clang/AST/DeclarationName.h#L503 This can also be seen when the example is attempted through xeus-cpp-lite.
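The crash boils down to running name-based cleanup over a declaration whose name is empty. The defensive pattern, in a toy model (not the actual `IncrementalParser::CleanUpPTU` code; types and names here are hypothetical), is to filter out unnamed declarations before any name-based lookup:

```python
# Sketch: skip declarations with empty names before name-based cleanup.
# Toy model of the defensive check; the real fix is in Clang's
# incremental-parsing cleanup path.

class Decl:
    def __init__(self, name):
        self.name = name

def decls_to_unload(decls):
    """Only named declarations can safely be looked up and removed by name."""
    return [d for d in decls if d.name]

decls = [Decl("capture"), Decl(""), Decl("x")]
print([d.name for d in decls_to_unload(decls)])
```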
With optimization levels other than -O0 the call stack is not preserved; for example, malloc_shared will be inlined, so the call stack would look like

```
#0 in int* sycl::_V1::malloc_host<int>(unsigned long, sycl::_V1::context const&, sycl::_V1::property_list const&, sycl::_V1::detail::code_location const&) /tmp/syclws/include/sycl/usm.hpp:215:27
#1 in ?? (/lib/x86_64-linux-gnu/libc.so.6+0x757867a2a1c9)
#2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x757867a2a28a)
```

instead of

```
#0 in int* sycl::_V1::malloc_host<int>(unsigned long, sycl::_V1::context const&, sycl::_V1::property_list const&, sycl::_V1::detail::code_location const&) /tmp/syclws/include/sycl/usm.hpp:215:27
#1 in int* sycl::_V1::malloc_host<int>(unsigned long, sycl::_V1::queue const&, sycl::_V1::property_list const&, sycl::_V1::detail::code_location const&) /tmp/syclws/include/sycl/usm.hpp:223:10
#2 in main /tmp/syclws/llvm/sycl/test-e2e/MemorySanitizer/track-origins/check_host_usm_initialized_on_host.cpp:15:17
#3 in ?? (/lib/x86_64-linux-gnu/libc.so.6+0x7a67f842a1c9)
#4 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x7a67f842a28a)
```

Also, add `env` to every `%{run}` directive to make sure the tests are not affected by the system environment.
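Prefixing a run line with `env KEY=VALUE ...` makes the test's environment explicit rather than inherited. A small sketch of composing such a run line (the variable name below is illustrative, not the one used by these tests):

```python
# Sketch: prefix a lit-style run line with an explicit `env` invocation
# so the test does not silently inherit system environment settings.
# The MSAN_OPTIONS value here is illustrative.

def with_env(run_line, env_vars):
    """Prepend `env KEY=VALUE ...` (sorted for determinism) to a run line."""
    pairs = " ".join(f"{k}={v}" for k, v in sorted(env_vars.items()))
    return f"env {pairs} {run_line}" if pairs else run_line

line = with_env("%{run} %t", {"MSAN_OPTIONS": "origin_history_size=7"})
print(line)
```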