GPU Compute Driver Bench is a comprehensive performance evaluation suite focused on assessing GPU compute driver performance from practical development and deployment perspectives. It supports both Moore Threads MUSA drivers and CUDA-compatible GPU drivers, enabling fair and repeatable cross-platform evaluation.
- 🧠 **Realistic Workloads**: Covers diverse compute and memory scenarios closely aligned with real-world usage patterns.
- ⚙️ **Multi-Dimensional Driver Evaluation**: Evaluates driver performance, resource management, and execution efficiency across multiple dimensions.
- 📊 **Standardized Metrics & Baselines**: Provides standardized metrics and baselines for comparing hardware and software optimizations.
- 🔬 **Granular & Holistic Analysis**: Enables both fine-grained subsystem testing and holistic, end-to-end performance assessment.
- 🔁 **Cross-Platform Compatibility**: Supports cross-platform evaluation with MUSA and CUDA-compatible GPU drivers.
- 📈 **Automated Performance Scoring**: Provides a scoring system for automated performance regression tracking across driver and hardware versions.
**common**

This directory contains the core library built on the Celero framework, providing benchmark infrastructure such as test fixtures, timing utilities, result collection, and printing facilities. Both header and source files are included so users can access the interfaces and the implementation directly.
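As a rough illustration of how a Celero-based case is structured, here is a minimal sketch with hypothetical group and case names (`MemcpyGroup`, `HostToHost`, `HostToDevice` are not the suite's actual fixtures); Celero generates `main()`, drives the sample/iteration loops, and collects the timing statistics:

```cpp
// Minimal sketch of a Celero-style benchmark, written against the CUDA runtime.
// The suite's real fixtures are richer; this only shows the overall shape.
#include <celero/Celero.h>
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

CELERO_MAIN  // Expands to main() and handles Celero's own command-line options.

namespace
{
constexpr size_t kBytes = 1 << 20;  // 1 MiB payload shared by every case in the group.
}

// Baseline: a plain host-to-host copy that the other cases are scored against.
BASELINE(MemcpyGroup, HostToHost, 10, 100)
{
    static std::vector<char> src(kBytes), dst(kBytes);
    std::copy(src.begin(), src.end(), dst.begin());
}

// Benchmark: the same payload pushed host-to-device through the driver.
BENCHMARK(MemcpyGroup, HostToDevice, 10, 100)
{
    static std::vector<char> src(kBytes);
    static char* devBuf = [] { char* p = nullptr; cudaMalloc(&p, kBytes); return p; }();
    cudaMemcpy(devBuf, src.data(), kBytes, cudaMemcpyHostToDevice);
}
```

The same code can be translated for MUSA with the porting scripts under `scripts/`.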
**schedule**

Benchmarks focused on kernel execution and scheduling performance (a minimal timing sketch follows the list):
- Kernel execution time measurements for different types of kernels.
- Evaluation of memcpy and kernel co-execution scenarios to analyze dependency resolution efficiency.
- Event synchronization efficiency between operations in different streams.
- First kernel launch latency to assess module loading performance.
- Graph optimization benefits for small kernel launch performance.
- Kernel gap analysis and multi-stream execution concurrency.
- Module loading and function retrieval performance.
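For illustration, the standalone sketch below times a host-side kernel launch and a device-side execution window with CUDA events; it is not code from the suite, and the kernel and timing choices are only an example of this kind of measurement:

```cpp
// Sketch: host-side launch latency plus event-bracketed kernel execution time.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

__global__ void emptyKernel() {}

int main()
{
    cudaFree(0);  // Force context creation so it is not counted in the launch cost.

    // Host-side launch latency: time for the asynchronous launch call to return.
    auto t0 = std::chrono::steady_clock::now();
    emptyKernel<<<1, 1>>>();
    auto t1 = std::chrono::steady_clock::now();
    cudaDeviceSynchronize();
    double launchUs = std::chrono::duration<double, std::micro>(t1 - t0).count();

    // Device-side execution time bracketed by events in the same stream.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    emptyKernel<<<1, 1>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float execMs = 0.f;
    cudaEventElapsedTime(&execMs, start, stop);

    printf("launch latency: %.2f us, kernel time: %.3f ms\n", launchUs, execMs);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```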
**memory**

Comprehensive memory operation benchmarks (a bandwidth-measurement sketch follows the list):
- Memory allocation/deallocation performance across sizes from 1B to 4GB.
- Memory reuse efficiency when freeing and then re-allocating same-sized buffers.
- Bandwidth measurements for 1D aligned/unaligned, 2D, 3D, and array memory copies.
- Host and device memory read/write performance.
- Pinned memory and registered host memory copy performance.
- Inter-process memory handle operations.
- Memory set/clear performance benchmarks.
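As a simplified example of a bandwidth measurement, the sketch below times repeated host-to-device copies of one fixed size using pinned memory and CUDA events; the suite itself sweeps sizes and covers many more copy kinds, so treat the constants here as placeholders:

```cpp
// Sketch: host-to-device copy bandwidth for a single transfer size.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const size_t bytes = 256ull << 20;  // 256 MiB per copy (illustrative size).
    const int reps = 20;

    void* hostBuf = nullptr;
    void* devBuf  = nullptr;
    cudaMallocHost(&hostBuf, bytes);    // Pinned host memory for peak copy bandwidth.
    cudaMalloc(&devBuf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);  // Warm-up copy.

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * reps / (ms / 1e3) / 1e9;
    printf("H2D bandwidth: %.2f GB/s\n", gbps);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}
```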
**multicards**

Multi-GPU benchmarks for evaluating cross-device performance (a P2P copy sketch follows the list):
- P2P memory copy bandwidth and latency.
- Cross-device memory set operations.
- Complex multi-card kernel launching scenarios.
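A minimal standalone sketch of a P2P copy from device 0 to device 1 is shown below; it assumes two peer-capable GPUs and is only an illustration of the pattern, not the suite's code:

```cpp
// Sketch: device-to-device P2P copy bandwidth between GPU 0 and GPU 1.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) { printf("P2P between device 0 and 1 not supported\n"); return 0; }

    const size_t bytes = 64ull << 20;   // 64 MiB payload (illustrative size).
    void *buf0 = nullptr, *buf1 = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // Let device 0 access device 1's memory.
    cudaMalloc(&buf0, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&buf1, bytes);

    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);  // Direct copy: device 0 -> device 1.
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("P2P copy: %.3f ms (%.2f GB/s)\n", ms, bytes / (ms / 1e3) / 1e9);
    return 0;
}
```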
**resource**

Resource management benchmarks (a host-side timing sketch follows the list):
- Event management performance.
- Stream management and concurrency.
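For illustration, the sketch below times host-side stream and event creation/destruction, which is the kind of driver-resource cost these benchmarks target; it is a simplified standalone example, not the suite's fixtures:

```cpp
// Sketch: average host-side cost of creating and destroying streams and events.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

int main()
{
    cudaFree(0);  // Establish the context up front so it is not measured.
    const int n = 1000;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        cudaStream_t s;
        cudaStreamCreate(&s);
        cudaStreamDestroy(s);
    }
    auto t1 = std::chrono::steady_clock::now();

    for (int i = 0; i < n; ++i) {
        cudaEvent_t e;
        cudaEventCreate(&e);
        cudaEventDestroy(e);
    }
    auto t2 = std::chrono::steady_clock::now();

    printf("stream create/destroy: %.2f us each\n",
           std::chrono::duration<double, std::micro>(t1 - t0).count() / n);
    printf("event  create/destroy: %.2f us each\n",
           std::chrono::duration<double, std::micro>(t2 - t1).count() / n);
    return 0;
}
```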
**scripts**

Utility scripts for automation and result processing:
- Automated test execution (autorun.py).
- Performance scoring calculation (calculateScoreOfSuit.py).
- Result visualization tools (csv2pngs.py, visualize_results_demo.py).
- Code porting utilities between CUDA and MUSA (porting2cuda.sh, porting2musa.sh).
- Run `sudo apt install libcpuid-dev` to install libcpuid for reading CPU info.
- Run `sudo apt-get install libeigen3-dev` to install Eigen.
- Run `sudo apt-get install hwloc` to install the topology graph tool.
- Run `sudo apt install libblas-dev libopenblas-base libopenblas-dev` to install the BLAS libraries.
- Run the script `install.sh` in gpu-compute-driver-bench, adding `-m` (MUSA) or `-n` (CUDA).

> [!IMPORTANT]
> - Make sure that the MUSA Toolkit or CUDA Toolkit is properly installed.
> - We provide a script under `scripts/` for one-step conversion between CUDA and MUSA code. It depends on the `musify` tool and is ready to use.
- You can get the usage with `-h`, e.g. `./install.sh -h`; the message below will be shown.

> [!TIP]
> Some device functions have been precompiled. If you want to run on other architectures, please modify `gpu-compute-driver-bench/schedule/elf/gen.sh`, rerun `./gen.sh`, and specify the `-R` option when running `./install.sh`.

```
Usage: ./install.sh [OPTIONS]
Options:
  -R    : Rebuild (clean build directory and rebuild)
  -j N  : Number of parallel jobs for make (default: j12)
  -h    : Display this help message
  -m    : Enable MCC Compiler (default: OFF)
  -n    : Enable NVCC Compiler (default: OFF)
  -d    : Enable Debug Mode (default: OFF)
```
- You can run each case with a command like `./programName [-t tableName.csv] [b]`.
  - `[opt]` means the option may be given or omitted.
  - `[-t tableName.csv]` saves the results to a table if you need it.
  - `[b]` shows the basic info of your environment if you need it.
  - `[-h]` is also provided for printing the help message.
- Run the cases and collect the results.

  ```
  $ cd scripts/
  $ python3 autorun.py -h
  usage: autorun.py [-h] [--result RESULT]
                    [--suits {memoryOp,mulStreams,graphAndSchedule} [{memoryOp,mulStreams,graphAndSchedule} ...]]

  Run executables and save results.

  options:
    -h, --help            show this help message and exit
    --result RESULT       The result directory, default is projectPath/result
    --suits {memoryOp,mulStreams,graphAndSchedule} [{memoryOp,mulStreams,graphAndSchedule} ...]
                          The test suits to run, choose from ['memoryOp', 'mulStreams', 'graphAndSchedule']. Default is all.
  ```
- Calculate the score.

  ```
  $ cd scripts/
  $ python3 calculateScoreOfSuit.py -h
  usage: calculateScoreOfSuit.py [-h] [--base BASE] [--test TEST] [--score SCORE] [--config CONFIG]

  Calculate the score of test cases.

  options:
    -h, --help     show this help message and exit
    --base BASE    The basic result table fold, default is ../baseline/
    --test TEST    The test result table fold, default is ../result/
    --score SCORE  The score path, default is projectPath/score
  ```
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This project includes code from the Celero project (https://github.com/DigitalInBlue/Celero), which is also licensed under the Apache License 2.0. Copyright 2015-2023 John Farrier
Additional third-party libraries used in this project:
- libcpuid: Copyright 2008-2013 Veselin Georgiev, licensed under BSD License
- Eigen: Copyright (C) 2008 Gael Guennebaud, licensed under MPL2
- hwloc: Copyright 2006-2021 The University of Tennessee, licensed under BSD License
- OpenBLAS: Copyright 2015-2021 OpenBLAS project, licensed under BSD License