Busy GPU - Max Out Your GPU

License: MIT | CUDA Platform

English | 中文文档 (Chinese documentation)

Introduction

Busy GPU is a professional NVIDIA GPU stress testing tool designed to max out your GPU, achieving maximum power consumption and memory usage.

Perfect for:

  • 🔥 Burn-in Testing: Push your GPU to its absolute limits
  • 🧪 Stability Validation: Verify GPU stability under extreme load
  • 📊 Performance Benchmarking: Measure peak GPU capabilities
  • 🌡️ Thermal Testing: Test cooling solutions under maximum stress

Key Features

  • Dual-Mode Design: Extreme mode (max out GPU) + Light mode (low power)
  • Auto-Adaptive: Automatically selects the optimal workload for your GPU architecture (see the sketch after this list)
    • Tensor Core FP16 GEMM: For SM 7.0+ (Volta, Turing, Ampere, Ada, Hopper)
    • FP32 Extreme Compute: For older GPUs (Maxwell, Pascal)
  • Multi-GPU Support: Stress test multiple GPUs simultaneously
  • Flexible Duration: Run for specific time periods or indefinitely
  • Background Mode: Run stress tests in the background with logging
  • Interactive CLI: User-friendly interactive configuration
  • Universal Compatibility: Works on all NVIDIA GPUs from Maxwell to Hopper
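
For reference, per-GPU selection like this can be done with the CUDA runtime API. The sketch below is only an illustration of the idea, not the tool's actual source; the helper name pick_workload is hypothetical.

// workload_select.cu -- illustrative sketch: query each GPU's compute capability
// and pick a workload the way the auto-adaptive feature describes.
#include <cstdio>
#include <cuda_runtime.h>

enum class Workload { TensorCoreFp16Gemm, Fp32ExtremeCompute };

// Hypothetical helper: SM 7.0+ gets Tensor Core FP16 GEMM, older GPUs get FP32 compute.
static Workload pick_workload(const cudaDeviceProp& prop) {
    return (prop.major >= 7) ? Workload::TensorCoreFp16Gemm
                             : Workload::Fp32ExtremeCompute;
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, dev);
        Workload w = pick_workload(prop);
        printf("GPU %d: %s (SM %d.%d) -> %s\n", dev, prop.name, prop.major, prop.minor,
               w == Workload::TensorCoreFp16Gemm ? "Tensor Core FP16 GEMM"
                                                 : "FP32 Extreme Compute");
    }
    return 0;
}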

Performance Metrics

Extreme Mode (Default)

Tested on NVIDIA A100 40GB:

Metric            Value
Memory Usage      ~87% (36.9 GB / 42.5 GB)
Power Draw        ~82% TDP (329W / 400W)
GPU Utilization   100%
Workload          Tensor Core FP16 GEMM

Light Mode

Tested on NVIDIA A100 40GB:

Metric            Value
Memory Usage      ~34% (14.6 GB / 42.5 GB)
Power Draw        ~52% TDP (209W / 400W)
GPU Utilization   100%
Workload          Memory-bound compute

Note: Actual power consumption varies by GPU model, architecture, and TDP; the tool applies the heaviest workload it can to your specific GPU, so absolute numbers will differ from those above.

Quick Start

Method 1: One-Click Script (Recommended)

# Clone repository
git clone https://github.com/Metaphorme/busy_gpu.git
cd busy_gpu

# Interactive mode - easiest way
./gpu_burn.sh

# Or use command line directly
./gpu_burn.sh -t 12h -b          # Extreme mode, background, 12 hours
./gpu_burn.sh --light -t 30m     # Light mode, 30 minutes

The script compiles the binary automatically if needed and provides:

  • Interactive configuration menu
  • Background execution with logging
  • Real-time GPU monitoring
  • PID file management

Method 2: Direct Binary Usage

# Compile (one-time only)
make

# Extreme mode - max out all GPUs!
./busy_gpu

# Extreme mode - specify GPUs and duration
./busy_gpu -d 0,1 -t 1h

# Light mode - low power operation
./busy_gpu --light -d 0 -t 30m

Detailed Usage

Command Line Arguments

Usage: ./busy_gpu [OPTIONS]

MODES:
  Default (Extreme)   Maximum stress (~90-95% TDP, 85% VRAM)
                      Auto-selects best workload for your GPU:
                      - Tensor Core FP16 GEMM (SM 7.0+)
                      - FP32 Extreme Compute (older GPUs)
  --light             Light mode (~30-40% TDP, 35% VRAM)
                      For background tasks or testing

OPTIONS:
  -t, --time <dur>    Duration (e.g., 30s, 5m, 2h, 1d), default: forever
  -d, --devices <ids> GPU IDs (e.g., 0,1,2), default: all GPUs
  -h, --help          Show this help
  -v, --version       Show version

EXAMPLES:
  ./busy_gpu                  # Extreme mode on all GPUs (max out!)
  ./busy_gpu --light -t 30m   # Light mode for 30 minutes
  ./busy_gpu -d 0,2           # Extreme mode on GPU 0 and 2
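
The duration strings accepted by -t/--time follow a simple number-plus-suffix pattern (s, m, h, d). A minimal C++ parser for that format could look like the sketch below; parse_duration is an illustrative helper, not the tool's actual implementation.

// Illustrative only: parse "30s", "5m", "2h", "1d" into seconds.
#include <cstdio>
#include <cstdlib>
#include <string>

long parse_duration(const std::string& s) {
    char* end = nullptr;
    long value = std::strtol(s.c_str(), &end, 10);
    if (end == s.c_str() || value <= 0) return -1;   // no digits or non-positive value
    switch (*end) {
        case 's': return value;
        case 'm': return value * 60;
        case 'h': return value * 3600;
        case 'd': return value * 86400;
        default:  return -1;                          // unknown suffix
    }
}

int main() {
    printf("12h -> %ld seconds\n", parse_duration("12h"));  // prints 43200
    return 0;
}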

One-Click Script Arguments

Usage: ./gpu_burn.sh [OPTIONS]

OPTIONS:
  --light               Light mode
  -g, --gpus <ids>      GPU IDs (e.g., 0,1,2)
  -t, --time <dur>      Duration (e.g., 30s, 5m, 2h, 1d)
  -b, --background      Run in background
  -h, --help            Show this help
  --status              Check running status
  --stop                Stop background instance

EXAMPLES:
  ./gpu_burn.sh                    # Interactive mode
  ./gpu_burn.sh -t 30m             # Extreme mode for 30 minutes
  ./gpu_burn.sh --light -t 12h -b  # Light mode, background, 12 hours
  ./gpu_burn.sh --status           # Check status
  ./gpu_burn.sh --stop             # Stop background instance

Use Cases

1. Quick Burn-in Test (Extreme Mode)

# Max out all GPUs for 1 hour
./busy_gpu -t 1h

2. Long-term Stability Test

# Run in background for 24 hours with logging
./gpu_burn.sh -t 24h -b

# Check status
./gpu_burn.sh --status

# View logs
tail -f logs/gpu_burn_*.log
tail -f logs/gpu_monitor_*.log

3. Multi-GPU Testing

# Test only GPU 0 and GPU 2
./busy_gpu -d 0,2 -t 30m

4. Low-Power Background Operation

# Light mode for long-term background tasks
./busy_gpu --light -t 12h

Build Requirements

  • CUDA Toolkit: 9.0 or higher
  • GCC/G++: C++11 support required
  • NVIDIA Driver: Compatible with your GPU
  • Operating System: Linux (tested on Ubuntu 18.04+, CentOS 7+)

Supported GPU Architectures

Architecture   Compute Capability   Example GPUs               Workload
Hopper         SM 9.0               H100                       Tensor Core FP16 GEMM
Ada Lovelace   SM 8.9               RTX 4090, RTX 4080         Tensor Core FP16 GEMM
Ampere         SM 8.0-8.6           A100, RTX 3090, RTX 3080   Tensor Core FP16 GEMM
Turing         SM 7.5               RTX 2080 Ti, T4            Tensor Core FP16 GEMM
Volta          SM 7.0               V100, Titan V               Tensor Core FP16 GEMM
Pascal         SM 6.0-6.2           GTX 1080 Ti, P100          FP32 Extreme Compute
Maxwell        SM 5.0-5.3           GTX 980, GTX 750 Ti        FP32 Extreme Compute

Compatibility

✅ Supported Environments

This tool is designed to work across diverse hardware configurations:

Scenario                    How It's Handled
Mixed GPU architectures     Each GPU is detected independently at runtime; workload is selected per-GPU
Different VRAM sizes        Memory allocation is calculated dynamically based on actual available memory
VRAM partially occupied     Queries free memory (not total), automatically adjusts allocation
Low system RAM              No dependency on host memory; all computation runs on GPU
Multi-GPU race conditions   100ms staggered startup to avoid memory allocation conflicts
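
As a rough illustration of the free-memory query and the 100ms staggered startup mentioned above, the CUDA runtime exposes cudaMemGetInfo, and each GPU's setup can simply be delayed by its index. This is a simplified sketch (sequential rather than one thread per GPU), not the tool's actual code; the 85% budget comes from the Extreme Mode description further below.

// Sketch: per-GPU free-memory query plus a 100 ms staggered start before allocation.
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        // Stagger each GPU's startup by 100 ms to avoid simultaneous allocations.
        std::this_thread::sleep_for(std::chrono::milliseconds(100 * dev));
        cudaSetDevice(dev);
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);                // free memory, not total
        size_t budget = static_cast<size_t>(free_bytes * 0.85);   // e.g. 85% in extreme mode
        void* buf = nullptr;
        if (cudaMalloc(&buf, budget) == cudaSuccess) {
            printf("GPU %d: allocated %.1f GB of %.1f GB free\n", dev,
                   budget / 1e9, free_bytes / 1e9);
            cudaFree(buf);
        }
    }
    return 0;
}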

Supported GPU Range:

  • Consumer GPUs: GTX 750 Ti to RTX 4090 (2014 onwards)
  • Data Center GPUs: P100, V100, T4, A100, A10, H100, etc.
  • Coverage: ~95% of NVIDIA GPUs currently in use

⚠️ Not Supported

Limitation          Details
Kepler and older    GTX 780, GTX 680, etc. (SM < 5.0) are not supported
Windows             Linux only; Windows support is planned
Non-NVIDIA GPUs     AMD and Intel GPUs are not supported
CUDA < 9.0          Requires CUDA Toolkit 9.0 or higher

Runtime Adaptation Examples

Mixed architecture system:

GPU 0: RTX 4090 (SM 8.9, 24GB) → Tensor Core FP16 GEMM, 10240x10240 matrix, 4 streams
GPU 1: GTX 1080 Ti (SM 6.1, 11GB) → FP32 Extreme Compute
GPU 2: V100 (SM 7.0, 32GB) → Tensor Core FP16 GEMM, 8192x8192 matrix, 3 streams

Memory-constrained GPUs:

H100 80GB     → 10240x10240 matrix, 4 streams
RTX 3080 10GB → 8192x8192 matrix, 2 streams
GTX 1650 4GB  → 4096x4096 matrix, 1 stream
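
The exact sizing rules are not documented here, but a heuristic of roughly this shape would reproduce the three examples above. The thresholds in choose_gemm_config are invented purely for illustration; the real tool evidently scales more finely (the V100 example earlier uses 3 streams).

// Hypothetical sizing heuristic: thresholds are invented to reproduce the examples above.
#include <cstdio>

struct GemmConfig { int n; int streams; };

GemmConfig choose_gemm_config(double free_gb) {
    if (free_gb >= 40.0) return {10240, 4};   // large data-center GPUs
    if (free_gb >= 8.0)  return {8192, 2};    // mid-range GPUs
    return {4096, 1};                          // small GPUs
}

int main() {
    GemmConfig c = choose_gemm_config(10.0);   // e.g. an RTX 3080-class card
    printf("%dx%d matrix, %d stream(s)\n", c.n, c.n, c.streams);  // 8192x8192, 2 stream(s)
    return 0;
}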

How It Works

Extreme Mode

  1. Auto-detect GPU Architecture: Read Compute Capability (SM version)
  2. Select Optimal Workload:
    • SM 7.0+: Use Tensor Core FP16 GEMM (maximum power)
    • SM < 7.0: Use FP32 extreme compute (maximize FP32 throughput)
  3. Allocate Memory: Allocate 85% of available VRAM
  4. Continuous Execution: Loop compute until the time limit or a user interrupt (sketched below)
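
A minimal sketch of what steps 2 and 4 amount to on an SM 7.0+ GPU, using cuBLAS FP16 GEMM (compile with nvcc and link -lcublas). The matrix size, the one-minute deadline, and the per-iteration synchronization are illustrative simplifications, not the tool's actual implementation.

// Sketch of the extreme-mode loop: allocate FP16 matrices, then run GEMM until the deadline.
#include <chrono>
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

int main() {
    const int n = 8192;                           // matrix dimension (illustrative)
    const size_t bytes = sizeof(__half) * n * n;
    __half *A, *B, *C;
    cudaMalloc(&A, bytes); cudaMalloc(&B, bytes); cudaMalloc(&C, bytes);
    cudaMemset(A, 0x3c, bytes);                   // fill with nonzero FP16 bit patterns
    cudaMemset(B, 0x3c, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const __half alpha = __float2half(1.0f), beta = __float2half(0.0f);

    auto deadline = std::chrono::steady_clock::now() + std::chrono::minutes(1);
    while (std::chrono::steady_clock::now() < deadline) {
        // FP16 GEMM; on SM 7.0+ cuBLAS dispatches this to Tensor Cores.
        cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, A, n, B, n, &beta, C, n);
        cudaDeviceSynchronize();                  // keep the launch queue bounded
    }
    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}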

Light Mode

  1. Memory-bound Workload: Heavy memory access + minimal compute
  2. Allocate Memory: Allocate 35% of available VRAM
  3. Low Power Operation: Power draw ~30-40% TDP
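
A sketch of what such a memory-bound workload can look like: a grid-stride kernel that streams through a buffer sized to ~35% of free VRAM while doing almost no arithmetic. The kernel and launch configuration are illustrative, not the tool's actual code.

// Sketch of a memory-bound light-mode workload: stream a large buffer with minimal math.
#include <chrono>
#include <cuda_runtime.h>

__global__ void touch(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (; i < n; i += stride)
        data[i] = data[i] * 1.0001f + 1.0f;       // one read + one write per element
}

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    size_t n = static_cast<size_t>(free_b * 0.35) / sizeof(float);  // ~35% of free VRAM
    float* data = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    cudaMemset(data, 0, n * sizeof(float));

    auto deadline = std::chrono::steady_clock::now() + std::chrono::minutes(1);
    while (std::chrono::steady_clock::now() < deadline) {
        touch<<<1024, 256>>>(data, n);
        cudaDeviceSynchronize();
    }
    cudaFree(data);
    return 0;
}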

FAQ

Q: Why doesn't power consumption reach 100% TDP?

A: This is normal. Most GPUs rarely reach 100% TDP under real workloads. This tool is optimized to approach theoretical limits (90-95% TDP).

Q: Will power consumption percentage be the same across different GPUs?

A: No. Power consumption depends on GPU architecture, TDP, and workload characteristics. The tool applies the heaviest workload it can to your specific GPU, but absolute power values will vary.

Q: Why only two modes?

A: After extensive testing, we found that users really need only two things:

  • Extreme Mode: Max out GPU for burn-in testing
  • Light Mode: Low power for background tasks

Multi-level designs add complexity with limited practical value.

Q: Does it support Windows?

A: Currently Linux only. Windows support is planned.

Q: Will it damage my GPU?

A: No. This tool only runs intensive compute workloads without overclocking or modifying hardware settings. However, please ensure:

  • Adequate cooling
  • Sufficient power supply
  • Monitor temperature (recommended < 85°C)
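
If you prefer programmatic monitoring over watching nvidia-smi, NVML exposes the temperature directly. A minimal sketch (compile against nvml.h and link with -lnvidia-ml) follows; the 85°C threshold mirrors the recommendation above.

// Sketch: read GPU 0's temperature via NVML and warn above 85 C.
#include <cstdio>
#include <nvml.h>

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        unsigned int temp = 0;
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);
        printf("GPU 0 temperature: %u C%s\n", temp,
               temp > 85 ? " (above the recommended limit!)" : "");
    }
    nvmlShutdown();
    return 0;
}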

Monitoring and Logging

Real-time Monitoring

# Run in another terminal
watch -n 1 nvidia-smi

Background Logging

# Start background run
./gpu_burn.sh -t 12h -b

# View program log
tail -f logs/gpu_burn_*.log

# View GPU monitoring log
tail -f logs/gpu_monitor_*.log

Contributing

Issues and Pull Requests are welcome!

License

MIT License - see LICENSE file for details

Acknowledgments

Thanks to all contributors and users for their support!


⚠️ Disclaimer: This tool is for testing purposes only. Prolonged high-load operation may reduce hardware lifespan. Ensure adequate cooling and power supply before use.
