This guide helps resolve "No GPU detected, running on CPU" errors.
Run the diagnostic tool to identify your issue:
```bash
python scripts/check_gpu.py
```

This will check your PyTorch installation, GPU availability, and environment configuration.
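If you cannot run the bundled script, the core of such a check can be reproduced in a few lines. This is a hedged stand-in, not the actual contents of scripts/check_gpu.py:

```python
# Minimal stand-in for a GPU diagnostic (NOT the real scripts/check_gpu.py).
# Uses only the standard library to probe whether torch is importable; the
# CUDA/HIP fields are filled in only when torch is actually installed.
import importlib.util

def describe_environment() -> dict:
    """Collect a few facts a GPU diagnostic would typically report."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if info["torch_installed"]:
        import torch  # safe: we just confirmed it is importable
        info["cuda_available"] = torch.cuda.is_available()
        info["cuda_version"] = torch.version.cuda  # None on CPU-only/ROCm builds
        info["hip_version"] = getattr(torch.version, "hip", None)  # None on CUDA builds
    return info

if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

A CPU-only build reports `cuda_version: None` and `hip_version: None`, which is the quickest way to distinguish "wrong build" from "driver problem".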
Symptoms:
- You have an AMD GPU (RX 6000/7000/9000 series)
- ROCm is installed
- Still getting "No GPU detected"
Solution:
The HSA_OVERRIDE_GFX_VERSION environment variable is required:

Linux/macOS:

```bash
export HSA_OVERRIDE_GFX_VERSION=11.0.0  # For RX 7900 XT/XTX, RX 9070 XT
export HSA_OVERRIDE_GFX_VERSION=11.0.1  # For RX 7800 XT, RX 7700 XT
export HSA_OVERRIDE_GFX_VERSION=11.0.2  # For RX 7600
```

Windows:

```bat
set HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Or use the provided launcher scripts, which set this automatically:

- start_gradio_ui_rocm.bat
- start_api_server_rocm.bat

For RX 6000 series GPUs (RDNA2), use 10.3.0 instead:

```bash
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```

```bat
set HSA_OVERRIDE_GFX_VERSION=10.3.0
```

Verify the result:

```bash
# Check if ROCm can see your GPU
rocm-smi

# Check PyTorch ROCm build
python -c "import torch; print(f'ROCm: {torch.version.hip}')"
```

Symptoms:
- Diagnostic shows "Build type: CPU-only"
Solution:
You need to reinstall PyTorch with GPU support.
Windows (ROCm 7.2):

Follow the detailed instructions in requirements-rocm.txt:

```bash
# 1. Install ROCm SDK components (see requirements-rocm.txt for full URLs)
pip install --no-cache-dir [ROCm SDK wheels...]

# 2. Install PyTorch for ROCm
pip install --no-cache-dir [PyTorch ROCm wheel...]

# 3. Install dependencies
pip install -r requirements-rocm.txt
```

Linux (ROCm 6.0+):

```bash
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
pip install -r requirements-rocm-linux.txt
```

NVIDIA (CUDA):

```bash
# For CUDA 12.1 (check PyTorch website for latest version)
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Or for CUDA 12.4+:
# pip install torch --index-url https://download.pytorch.org/whl/cu124
```

Note: Check https://pytorch.org/get-started/locally/ for the latest CUDA version supported by PyTorch.
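Picking the right wheel index can be scripted. The mapping below simply mirrors the commands above; the helper function and its name are our illustration, not part of the project:

```python
# Hypothetical helper mirroring the install commands above: map a backend
# key to the matching PyTorch wheel index URL. Always cross-check
# https://pytorch.org/get-started/locally/ for current versions.
PYTORCH_INDEX_URLS = {
    "rocm6.0": "https://download.pytorch.org/whl/rocm6.0",
    "cu121": "https://download.pytorch.org/whl/cu121",
    "cu124": "https://download.pytorch.org/whl/cu124",
}

def pip_install_command(backend: str) -> str:
    """Return the pip command for the given backend key, e.g. 'cu121'."""
    try:
        url = PYTORCH_INDEX_URLS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(PYTORCH_INDEX_URLS)}"
        )
    return f"pip install torch --index-url {url}"

print(pip_install_command("cu121"))
```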
Symptoms:
- You have an NVIDIA GPU
- CUDA is installed
- Still getting "No GPU detected"
Solution:

1. Check NVIDIA drivers:

   ```bash
   nvidia-smi
   ```

   If this fails, install/update NVIDIA drivers from: https://www.nvidia.com/download/index.aspx

2. Check CUDA version compatibility:

   The CUDA version in your PyTorch build must be compatible with your driver.

   Check PyTorch CUDA version:

   ```bash
   python -c "import torch; print(f'CUDA: {torch.version.cuda}')"
   ```

   Check driver CUDA version:

   ```bash
   nvidia-smi  # Look for "CUDA Version: X.X"
   ```

3. Reinstall PyTorch if needed:

   ```bash
   pip uninstall torch torchvision torchaudio
   # Check https://pytorch.org/get-started/locally/ for the latest CUDA version
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   ```
Symptoms:
- Running in WSL2 (Windows Subsystem for Linux)
- GPU not detected
Solution:
For NVIDIA GPUs in WSL2, you need CUDA on WSL2:
- Install NVIDIA drivers on Windows (not in WSL2)
- Install CUDA toolkit in WSL2
- Follow: https://docs.nvidia.com/cuda/wsl-user-guide/index.html
For AMD GPUs, ROCm support in WSL2 is limited. Consider:
- Running on native Linux
- Using Windows natively with start_gradio_ui_rocm.bat / start_api_server_rocm.bat
```bash
# Linux/macOS
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export MIOPEN_FIND_MODE=FAST
```

```bat
rem Windows (or use start_gradio_ui_rocm.bat / start_api_server_rocm.bat)
set HSA_OVERRIDE_GFX_VERSION=11.0.0
set MIOPEN_FIND_MODE=FAST
```

Same as RX 9070 XT above.

```bash
# Linux/macOS
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```

```bat
rem Windows
set HSA_OVERRIDE_GFX_VERSION=10.3.0
```

- ROCm Linux Setup: See docs/en/ACE-Step1.5-Rocm-Manual-Linux.md
- ROCm Windows Setup: See requirements-rocm.txt
- GPU Tiers: See docs/en/GPU_COMPATIBILITY.md
- General Installation: See README.md
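If you prefer launching from Python rather than the .bat scripts, the same variables can be set in-process, provided this happens before torch is imported. The values come from the reference tables below; the launcher itself is our sketch, not project code:

```python
import os
import subprocess
import sys

# Values from the environment-variable reference tables; adjust per GPU.
ROCM_ENV = {
    "HSA_OVERRIDE_GFX_VERSION": "11.0.0",  # 10.3.0 for RDNA2 cards
    "MIOPEN_FIND_MODE": "FAST",
}

def launch(script: str) -> int:
    """Run `script` in a child Python process with the ROCm variables applied."""
    env = {**os.environ, **ROCM_ENV}  # merge: do not clobber the parent environment
    return subprocess.call([sys.executable, script], env=env)
```

Setting the variables on a child process (rather than mutating `os.environ` after torch is loaded) guarantees the ROCm runtime sees them at initialization.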
If none of the above solutions work:
1. Run the diagnostic tool and save the output:

   ```bash
   python scripts/check_gpu.py > gpu_diagnostic.txt
   ```

2. Open an issue on GitHub with:
   - The diagnostic output
   - Your GPU model
   - Your OS (Windows/Linux/macOS)
   - ROCm/CUDA version installed
| Variable | Purpose | Example |
|---|---|---|
| `HSA_OVERRIDE_GFX_VERSION` | Override GPU architecture | `11.0.0` (RDNA3), `10.3.0` (RDNA2) |
| `MIOPEN_FIND_MODE` | MIOpen kernel selection mode | `FAST` (recommended) |
| `TORCH_COMPILE_BACKEND` | PyTorch compilation backend | `eager` (ROCm Windows) |
| `ACESTEP_LM_BACKEND` | Language model backend | `pt` (recommended for ROCm) |
| Variable | Purpose | Example |
|---|---|---|
| `CUDA_VISIBLE_DEVICES` | Select which GPU to use | `0` (first GPU) |
| Variable | Purpose | Example |
|---|---|---|
| `MAX_CUDA_VRAM` | Override detected VRAM for tier simulation (also enforces a hard VRAM cap via `set_per_process_memory_fraction`) | `8` (simulate an 8GB GPU) |
| `ACESTEP_VAE_ON_CPU` | Force VAE decode on CPU to save VRAM | `1` (enable) |
Note on `MAX_CUDA_VRAM`: When set, this variable not only changes the tier detection logic but also calls `torch.cuda.set_per_process_memory_fraction()` to enforce a hard VRAM limit. This means OOM errors during simulation are realistic and reflect actual behavior on GPUs with that amount of VRAM. See GPU_COMPATIBILITY.md for the full tier table.
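To see how a gigabyte cap becomes a fraction: `set_per_process_memory_fraction` takes a value in [0, 1], so the cap has to be divided by the card's real total. The arithmetic, as our illustration of the mechanism rather than the project's exact code:

```python
def vram_cap_fraction(cap_gb: float, total_gb: float) -> float:
    """Fraction to pass to torch.cuda.set_per_process_memory_fraction().

    E.g. simulating an 8GB card on a 24GB card caps the process at 1/3
    of physical VRAM.
    """
    if not 0 < cap_gb <= total_gb:
        raise ValueError("cap must be positive and no larger than total VRAM")
    return cap_gb / total_gb

print(vram_cap_fraction(8, 24))  # → 0.3333333333333333
```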
Symptoms:
- Cannot use LoRA on 24GB VRAM GPUs (e.g., RTX 4090)
- VRAM usage spikes to 25-30GB when loading LoRA
- Out of memory errors during LoRA inference
Status: ✅ FIXED (as of commit 731fabd)
Solution:
This issue was caused by inefficient memory management in the LoRA lifecycle code. The fix replaces memory-heavy deepcopy operations with efficient state_dict backups stored on CPU.
Memory Usage:
- Before fix: 24-33GB VRAM (exceeds 24GB cards)
- After fix: 14-18GB VRAM (fits on 24GB cards)
- Savings: ~10-15GB VRAM per LoRA operation
What Changed:
- LoRA base model backup is now stored on CPU (not GPU)
- Uses `state_dict` (weights only) instead of `deepcopy` (full model)
- Added memory diagnostics logging
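The backup pattern behind the fix can be shown without the real model classes. Below, a tiny stand-in class plays the role of a module exposing `state_dict()`/`load_state_dict()`, since the point is copying only the weights (and keeping the copy on CPU) instead of the whole object. A sketch, not the project's LoRA code:

```python
import copy

class TinyModule:
    """Stand-in for a model exposing state_dict()/load_state_dict()."""
    def __init__(self, weights):
        self.weights = dict(weights)
        self.hooks = object()  # non-weight state that deepcopy would also clone

    def state_dict(self):
        return dict(self.weights)  # weights only, nothing else

    def load_state_dict(self, state):
        self.weights = dict(state)

model = TinyModule({"layer.w": 1.0})

# Before the fix: deepcopy duplicates the entire module (weights, buffers,
# hooks, ...) -- with a real multi-GB model this doubles GPU residency.
full_backup = copy.deepcopy(model)

# After the fix: back up only the weights; with real torch tensors this is
# `{k: v.detach().cpu() for k, v in model.state_dict().items()}`.
weight_backup = model.state_dict()

model.weights["layer.w"] = 999.0      # simulate applying a LoRA
model.load_state_dict(weight_backup)  # unload: restore the base weights
print(model.weights["layer.w"])       # → 1.0
```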
Verify the Fix:
Run the validation script to confirm:
```bash
python scripts/validate_lora_memory.py
```

Expected output:

```
✓ No deepcopy found in load_lora/unload_lora
✓ Using state_dict backup (memory-efficient)
✓ Backing up to CPU (saves VRAM)
✓ Memory diagnostics enabled
```
Additional Information:
- Technical details: docs/lora_memory_optimization.md
- Full fix summary: docs/FIX_SUMMARY.md
- Unit tests: tests/test_lora_lifecycle_memory.py