Busy GPU is a professional NVIDIA GPU stress-testing tool designed to max out your GPU, driving power draw and memory usage to their peak.
Perfect for:
- 🔥 Burn-in Testing: Push your GPU to its absolute limits
- 🧪 Stability Validation: Verify GPU stability under extreme load
- 📊 Performance Benchmarking: Measure peak GPU capabilities
- 🌡️ Thermal Testing: Test cooling solutions under maximum stress
- ✅ Dual-Mode Design: Extreme mode (max out GPU) + Light mode (low power)
- ✅ Auto-Adaptive: Automatically selects optimal workload for your GPU architecture
- Tensor Core FP16 GEMM: For SM 7.0+ (Volta, Turing, Ampere, Ada, Hopper)
- FP32 Extreme Compute: For older GPUs (Maxwell, Pascal)
- ✅ Multi-GPU Support: Stress test multiple GPUs simultaneously
- ✅ Flexible Duration: Run for specific time periods or indefinitely
- ✅ Background Mode: Run stress tests in the background with logging
- ✅ Interactive CLI: User-friendly interactive configuration
- ✅ Universal Compatibility: Works on all NVIDIA GPUs from Maxwell to Hopper
Extreme mode (tested on an NVIDIA A100 40GB):
| Metric | Value |
|---|---|
| Memory Usage | ~87% (36.9 GB / 42.5 GB) |
| Power Draw | ~82% TDP (329W / 400W) |
| GPU Utilization | 100% |
| Workload | Tensor Core FP16 GEMM |
Light mode (tested on the same NVIDIA A100 40GB):
| Metric | Value |
|---|---|
| Memory Usage | ~34% (14.6 GB / 42.5 GB) |
| Power Draw | ~52% TDP (209W / 400W) |
| GPU Utilization | 100% |
| Workload | Memory-bound compute |
Note: Actual power consumption varies by GPU model, architecture, and TDP. This tool guarantees maximum stress on your specific GPU.
Quick start:

```bash
# Clone repository
git clone https://github.com/Metaphorme/busy_gpu.git
cd busy_gpu
# Interactive mode - easiest way
./gpu_burn.sh
# Or use command line directly
./gpu_burn.sh -t 12h -b # Extreme mode, background, 12 hours
./gpu_burn.sh --light -t 30m  # Light mode, 30 minutes
```

The script will automatically compile if needed and provides:
- Interactive configuration menu
- Background execution with logging
- Real-time GPU monitoring
- PID file management
To use the busy_gpu binary directly:

```bash
# Compile (one-time only)
make
# Extreme mode - max out all GPUs!
./busy_gpu
# Extreme mode - specify GPUs and duration
./busy_gpu -d 0,1 -t 1h
# Light mode - low power operation
./busy_gpu --light -d 0 -t 30m
```

```text
Usage: ./busy_gpu [OPTIONS]
MODES:
Default (Extreme) Maximum stress (~90-95% TDP, 85% VRAM)
Auto-selects best workload for your GPU:
- Tensor Core FP16 GEMM (SM 7.0+)
- FP32 Extreme Compute (older GPUs)
--light Light mode (~30-40% TDP, 35% VRAM)
For background tasks or testing
OPTIONS:
-t, --time <dur> Duration (e.g., 30s, 5m, 2h, 1d), default: forever
-d, --devices <ids> GPU IDs (e.g., 0,1,2), default: all GPUs
-h, --help Show this help
-v, --version Show version
EXAMPLES:
./busy_gpu # Extreme mode on all GPUs (max out!)
./busy_gpu --light -t 30m # Light mode for 30 minutes
./busy_gpu -d 0,2            # Extreme mode on GPU 0 and 2
```
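The `-t`/`--time` flag takes suffixed durations (`30s`, `5m`, `2h`, `1d`). As an illustration of that format, here is a minimal C++ parser; it is a sketch of the grammar above, not busy_gpu's actual argument handling:

```cpp
#include <cstdlib>

// Parse "30s" / "5m" / "2h" / "1d" into seconds; -1 signals a malformed
// string. Illustrative only; the real CLI's parser may behave differently.
long parse_duration(const char *s) {
    char *end = nullptr;
    long value = std::strtol(s, &end, 10);
    if (end == s || value < 0) return -1;   // no digits or negative
    switch (*end) {
        case 's': return value;             // seconds
        case 'm': return value * 60;        // minutes
        case 'h': return value * 3600;      // hours
        case 'd': return value * 86400;     // days
        default:  return -1;                // unknown suffix
    }
}
```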
```text
Usage: ./gpu_burn.sh [OPTIONS]
OPTIONS:
--light Light mode
-g, --gpus <ids> GPU IDs (e.g., 0,1,2)
-t, --time <dur> Duration (e.g., 30s, 5m, 2h, 1d)
-b, --background Run in background
-h, --help Show this help
--status Check running status
--stop Stop background instance
EXAMPLES:
./gpu_burn.sh # Interactive mode
./gpu_burn.sh -t 30m # Extreme mode for 30 minutes
./gpu_burn.sh --light -t 12h -b # Light mode, background, 12 hours
./gpu_burn.sh --status # Check status
./gpu_burn.sh --stop            # Stop background instance
```

```bash
# Max out all GPUs for 1 hour
./busy_gpu -t 1h
```

```bash
# Run in background for 24 hours with logging
./gpu_burn.sh -t 24h -b
# Check status
./gpu_burn.sh --status
# View logs
tail -f logs/gpu_burn_*.log
tail -f logs/gpu_monitor_*.log
```

```bash
# Test only GPU 0 and GPU 2
./busy_gpu -d 0,2 -t 30m
```

```bash
# Light mode for long-term background tasks
./busy_gpu --light -t 12h
```

Requirements:

- CUDA Toolkit: 9.0 or higher
- GCC/G++: C++11 support required
- NVIDIA Driver: Compatible with your GPU
- Operating System: Linux (tested on Ubuntu 18.04+, CentOS 7+)
Supported architectures:

| Architecture | Compute Capability | Example GPUs | Workload |
|---|---|---|---|
| Hopper | SM 9.0 | H100 | Tensor Core FP16 GEMM |
| Ada Lovelace | SM 8.9 | RTX 4090, RTX 4080 | Tensor Core FP16 GEMM |
| Ampere | SM 8.0-8.6 | A100, RTX 3090, RTX 3080 | Tensor Core FP16 GEMM |
| Turing | SM 7.5 | RTX 2080 Ti, T4 | Tensor Core FP16 GEMM |
| Volta | SM 7.0 | V100, Titan V | Tensor Core FP16 GEMM |
| Pascal | SM 6.0-6.2 | GTX 1080 Ti, P100 | FP32 Extreme Compute |
| Maxwell | SM 5.0-5.3 | GTX 980, GTX 750 Ti | FP32 Extreme Compute |
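At runtime this table reduces to a single check: read each device's compute capability and branch on the SM major version. A minimal sketch with the CUDA runtime API follows; the `Workload` enum and `select_workload` helper are illustrative names, not busy_gpu's internals:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

enum class Workload { TensorCoreFP16, FP32Extreme, Unsupported };

// Mirror of the table above: SM 7.0+ gets Tensor Core FP16 GEMM,
// SM 5.x/6.x gets FP32 extreme compute, SM < 5.0 is unsupported.
Workload select_workload(const cudaDeviceProp &p) {
    if (p.major >= 7) return Workload::TensorCoreFP16;
    if (p.major >= 5) return Workload::FP32Extreme;
    return Workload::Unsupported;
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        Workload w = select_workload(prop);
        const char *kind = w == Workload::TensorCoreFP16 ? "Tensor Core FP16 GEMM"
                         : w == Workload::FP32Extreme    ? "FP32 Extreme Compute"
                                                         : "unsupported (SM < 5.0)";
        std::printf("GPU %d: %s (SM %d.%d) -> %s\n",
                    d, prop.name, prop.major, prop.minor, kind);
    }
    return 0;
}
```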
This tool is designed to work across diverse hardware configurations:
| Scenario | How It's Handled |
|---|---|
| Mixed GPU architectures | Each GPU is detected independently at runtime; workload is selected per-GPU |
| Different VRAM sizes | Memory allocation is calculated dynamically based on actual available memory |
| VRAM partially occupied | Queries free memory (not total), automatically adjusts allocation |
| Low system RAM | No dependency on host memory; all computation runs on GPU |
| Multi-GPU race conditions | 100ms staggered startup to avoid memory allocation conflicts |
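The last row is worth a sketch: one host thread per GPU, with thread *i* sleeping `i * 100 ms` before touching its device, so large startup allocations never race. The hypothetical `stress_gpu` below stands in for the real per-GPU work loop:

```cpp
#include <chrono>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Placeholder for the per-GPU allocation + compute loop.
void stress_gpu(int device) {
    cudaSetDevice(device);
    // ... allocate buffers and run the selected workload here ...
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    std::vector<std::thread> workers;
    for (int d = 0; d < count; ++d) {
        workers.emplace_back([d] {
            // Staggered startup: GPU d waits d * 100 ms before allocating.
            std::this_thread::sleep_for(std::chrono::milliseconds(100 * d));
            stress_gpu(d);
        });
    }
    for (auto &t : workers) t.join();
    return 0;
}
```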
Supported GPU Range:
- Consumer GPUs: GTX 750 Ti to RTX 4090 (2014 onwards)
- Data Center GPUs: P100, V100, T4, A100, A10, H100, etc.
- Coverage: ~95% of NVIDIA GPUs currently in use
Known limitations:

| Limitation | Details |
|---|---|
| Kepler and older | GTX 780, GTX 680, etc. (SM < 5.0) are not supported |
| Windows | Linux only; Windows support is planned |
| Non-NVIDIA GPUs | AMD and Intel GPUs are not supported |
| CUDA < 9.0 | Requires CUDA Toolkit 9.0 or higher |
Mixed architecture system:

```text
GPU 0: RTX 4090    (SM 8.9, 24GB) → Tensor Core FP16 GEMM, 10240x10240 matrix, 4 streams
GPU 1: GTX 1080 Ti (SM 6.1, 11GB) → FP32 Extreme Compute
GPU 2: V100        (SM 7.0, 32GB) → Tensor Core FP16 GEMM, 8192x8192 matrix, 3 streams
```
Memory-constrained GPUs:

```text
H100 80GB     → 10240x10240 matrix, 4 streams
RTX 3080 10GB → 8192x8192 matrix, 2 streams
GTX 1650 4GB  → 4096x4096 matrix, 1 stream
```
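These sizes fall out of budgeting the GEMM working set against free (not total) VRAM. The sketch below picks the largest candidate that fits; the candidate list and the budget fraction are assumptions chosen to match the examples above, not the tool's exact heuristic:

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Choose matrix size N and stream count from free VRAM. Working set is
// 3 FP16 matrices (A, B, C) per stream at 2 bytes per element; the rest
// of VRAM can then be filled separately to reach the ~85% target.
void choose_gemm_size(int device, size_t *n_out, int *streams_out) {
    cudaSetDevice(device);
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);           // query free, not total

    const struct { size_t n; int streams; } candidates[] = {
        {10240, 4}, {8192, 2}, {4096, 1},
    };
    for (const auto &c : candidates) {
        size_t working_set = 3 * c.n * c.n * 2 * c.streams;
        if (working_set < free_b / 8) {          // assumed budget fraction
            *n_out = c.n;
            *streams_out = c.streams;
            return;
        }
    }
    *n_out = 2048;                               // conservative fallback
    *streams_out = 1;
}
```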
How Extreme mode works:
- Auto-detect GPU Architecture: Read Compute Capability (SM version)
- Select Optimal Workload:
- SM 7.0+: Use Tensor Core FP16 GEMM (maximum power)
- SM < 7.0: Use FP32 extreme compute (maximize FP32 throughput)
- Allocate Memory: Allocate 85% of available VRAM
- Continuous Execution: Loop compute until time limit or user interrupt
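Condensed into code, the extreme path is essentially an FP16 GEMM issued in a tight loop until the deadline. A minimal single-GPU sketch with cuBLAS (`cublasHgemm` is routed through Tensor Cores on SM 7.0+); stream fan-out, the 85% VRAM filler, and error handling are all omitted:

```cpp
#include <chrono>
#include <cublas_v2.h>   // link with -lcublas
#include <cuda_fp16.h>
#include <cuda_runtime.h>

void burn(int device, int n, std::chrono::seconds duration) {
    cudaSetDevice(device);
    size_t bytes = (size_t)n * n * sizeof(__half);
    __half *a, *b, *c;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const __half alpha = __float2half(1.0f), beta = __float2half(0.0f);

    auto deadline = std::chrono::steady_clock::now() + duration;
    while (std::chrono::steady_clock::now() < deadline) {
        // C = A * B in FP16; back-to-back GEMMs keep the SMs saturated.
        cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, a, n, b, n, &beta, c, n);
        cudaDeviceSynchronize();   // the real tool overlaps work on streams
    }
    cublasDestroy(handle);
    cudaFree(a); cudaFree(b); cudaFree(c);
}
```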
How Light mode works:
- Memory-bound Workload: Heavy memory access + minimal compute
- Allocate Memory: Allocate 35% of available VRAM
- Low Power Operation: Power draw ~30-40% TDP
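A memory-bound workload can be pictured as a kernel that streams a large buffer with one add per element, so DRAM bandwidth rather than the ALUs is the bottleneck and power stays low. A hypothetical sketch, with the buffer sized at ~35% of free VRAM as described above:

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: one read and one write per element, almost no math.
__global__ void stream_buffer(float *buf, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (; i < n; i += stride)
        buf[i] += 1.0f;
}

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    size_t n = (free_b * 35 / 100) / sizeof(float);   // ~35% of free VRAM

    float *buf;
    cudaMalloc(&buf, n * sizeof(float));
    cudaMemset(buf, 0, n * sizeof(float));

    for (;;) {                      // run until interrupted (or a deadline)
        stream_buffer<<<1024, 256>>>(buf, n);
        cudaDeviceSynchronize();
    }
}
```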
FAQ:

Q: Why doesn't my GPU reach 100% TDP?
A: This is normal. Most GPUs rarely reach 100% TDP under real workloads. This tool is optimized to approach the theoretical limit (90-95% TDP).
Q: Will every GPU draw the same power?
A: No. Power consumption depends on GPU architecture, TDP, and workload characteristics. This tool guarantees maximum stress on your specific GPU, but absolute power values will vary.
Q: Why only two modes?
A: After extensive testing, we found users really need just two:
- Extreme Mode: Max out GPU for burn-in testing
- Light Mode: Low power for background tasks
Multi-level designs add complexity with limited practical value.
Q: Does it run on Windows?
A: Currently Linux only. Windows support is planned.
Q: Will this damage my GPU?
A: No. This tool only runs intensive compute workloads; it does not overclock or modify hardware settings. However, please ensure:
- Adequate cooling
- Sufficient power supply
- Temperatures stay below ~85°C (see the NVML sketch below)
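The simplest monitor is `watch -n 1 nvidia-smi`, shown below. For a programmatic temperature check during long runs, a minimal NVML sketch looks like this (not part of busy_gpu; link with `-lnvidia-ml`):

```cpp
#include <cstdio>
#include <nvml.h>

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);          // GPU 0
    unsigned int temp = 0;
    nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);
    std::printf("GPU 0 core temperature: %u C\n", temp);
    nvmlShutdown();
    return 0;
}
```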
```bash
# Run in another terminal
watch -n 1 nvidia-smi
```

```bash
# Start background run
./gpu_burn.sh -t 12h -b
# View program log
tail -f logs/gpu_burn_*.log
# View GPU monitoring log
tail -f logs/gpu_monitor_*.log
```

Issues and Pull Requests are welcome!
MIT License - see LICENSE file for details
Thanks to all contributors and users for their support!