# ROCm Setup Guide for Stable Diffusion WebUI

This guide helps you set up and optimize Stable Diffusion WebUI for AMD GPUs using ROCm 6.2.

## Quick Start

### 1. Copy the Optimized Launch Configuration

```bash
cp webui-user-rocm62.sh webui-user.sh
```

### 2. Launch the WebUI

```bash
./webui.sh
```

The launch script will automatically:
- Install PyTorch with ROCm 6.2 support
- Configure optimal VRAM settings for 16GB GPUs
- Set up memory management to prevent fragmentation
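
For reference, those three behaviors typically correspond to environment variables in `webui-user.sh`. The following is a hedged sketch only — the exact variable values are assumptions, so check the real `webui-user-rocm62.sh` shipped with this repository:

```bash
#!/bin/sh
# Sketch of the kind of settings webui-user-rocm62.sh is expected to provide.
# Exact values are assumptions; consult the actual file.

# Install PyTorch from the ROCm 6.2 wheel index instead of the CUDA default
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"

# Prevent VRAM fragmentation during long sessions
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

# VRAM-optimized launch flags for 16GB GPUs
export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
```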

### 3. Configure WebUI Settings

After the WebUI starts, navigate to **Settings → Optimizations** and configure:

- **Cross attention optimization:** `Doggettx` (default)
- **Enable quantization in K samplers:** ✓ Enabled
- **Token merging ratio:** `0.5`

See the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md) for detailed configuration instructions.

---

## System Requirements

### Supported AMD GPUs

- **RX 6000 Series** (RDNA 2): RX 6700 XT, 6800, 6800 XT, 6900 XT
- **RX 7000 Series** (RDNA 3): RX 7600, 7700 XT, 7800 XT, 7900 XT, 7900 XTX
- **RX 5000 Series** (RDNA 1): RX 5700 XT (with limitations)

### Recommended VRAM

- **Minimum:** 8GB
- **Recommended:** 16GB
- **Optimal:** 24GB

### Software Requirements

- **ROCm:** 6.2 or newer
- **Python:** 3.10 or 3.11
- **Linux:** Ubuntu 22.04, Fedora 38+, or Arch Linux

---

## Installation

### Option 1: Automatic Setup (Recommended)

The `webui.sh` script automatically detects AMD GPUs and installs ROCm 6.2 support.

```bash
# Clone the repository (if not already done)
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# Copy the optimized configuration
cp webui-user-rocm62.sh webui-user.sh

# Launch (installs dependencies automatically)
./webui.sh
```

### Option 2: Manual Setup

If you need manual control over the installation:

```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install PyTorch with ROCm 6.2 support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

# Install Stable Diffusion WebUI requirements
pip install -r requirements_versions.txt

# Prevent VRAM fragmentation
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

# Launch with optimized flags
python launch.py --skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae
```

---

## Configuration Files

### `webui-user-rocm62.sh`

Pre-configured launch script with optimal settings for AMD GPUs with 16GB VRAM.

**Key settings:**
- PyTorch with ROCm 6.2
- Memory fragmentation prevention
- VRAM-optimized command-line arguments

### `ROCM_VRAM_OPTIMIZATION.md`

Comprehensive guide covering:
- WebUI settings optimization
- Generation settings for different VRAM amounts
- ControlNet optimization
- Workflows for best quality
- Troubleshooting common issues

---

## Command-Line Arguments Explained

The optimized configuration uses these flags:

```bash
--skip-torch-cuda-test   # Skip the CUDA check (we're using ROCm/HIP)
--medvram                # Optimized for 8-16GB VRAM
--opt-split-attention    # Reduces VRAM usage during attention
--no-half-vae            # Keeps the VAE in full precision, preventing black-image errors
```

### For Different VRAM Amounts

**16GB VRAM (Recommended):**
```bash
--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae
```

**8GB VRAM:**
```bash
--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae
```

**6GB VRAM or less:**
```bash
--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae --opt-channelslast
```
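
The tiers above can be captured in a small helper so the right flag set is picked automatically. This is purely illustrative — `pick_flags` is not part of the WebUI:

```bash
#!/bin/sh
# Illustrative only: map a VRAM size in GB to the flag tiers listed above.
pick_flags() {
    if [ "$1" -ge 16 ]; then
        echo "--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
    elif [ "$1" -ge 8 ]; then
        echo "--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
    else
        echo "--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae --opt-channelslast"
    fi
}

# Example: a 16GB card gets the --medvram tier
export COMMANDLINE_ARGS="$(pick_flags 16)"
echo "$COMMANDLINE_ARGS"
```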

---

## Recommended Generation Settings

### For 16GB VRAM

**Safe Mode (Fast, No Errors):**
- Resolution: 512x512
- Hires fix: OFF
- Batch size: 1
- VRAM usage: ~4-6GB

**Quality Mode (Best Results):**
- Resolution: 512x512
- Hires fix: ON (1.5x upscale)
- Hires steps: 10
- VRAM usage: ~8-12GB

**With ControlNet:**
- Resolution: 512x512
- Hires fix: OFF
- ControlNet units: 1-2 maximum
- Low VRAM mode: ON
- VRAM usage: ~6-10GB

See the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md) for detailed workflows and settings.
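
As a sanity check on the Quality Mode numbers: a 1.5x Hires fix on a 512x512 base decodes at 768x768, which is why it costs roughly twice the VRAM of Safe Mode. A throwaway helper (illustrative only, not part of the WebUI) makes the arithmetic explicit:

```bash
#!/bin/sh
# Illustrative only: compute the Hires fix output resolution.
# Args: base_width base_height upscale_numerator upscale_denominator
hires_size() {
    echo "$(( $1 * $3 / $4 ))x$(( $2 * $3 / $4 ))"
}

hires_size 512 512 3 2   # 1.5x upscale of 512x512 -> 768x768
```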

---

## Verification

### Check PyTorch ROCm Installation

```bash
source venv/bin/activate
python -c "import torch; print('ROCm available:', torch.cuda.is_available()); print('ROCm version:', torch.version.hip)"
```

**Expected output:**
```
ROCm available: True
ROCm version: 6.2.x
```

### Monitor VRAM Usage

```bash
watch -n 1 rocm-smi
```

Or check current usage:

```bash
rocm-smi --showmeminfo vram
```
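
`rocm-smi` reports memory in raw bytes, which is awkward to eyeball. The snippet below converts such a value to GiB; note the embedded sample line is an assumption about `rocm-smi`'s output format, so verify it against your system before scripting around it:

```bash
#!/bin/sh
# Illustrative only: convert a raw byte count from rocm-smi output to GiB.
# The sample line is an assumed format for `rocm-smi --showmeminfo vram`;
# check the real output on your system.
sample='GPU[0]          : VRAM Total Used Memory (B): 6442450944'

# Take the last ": "-delimited field (the byte count) and scale to GiB
used_gib=$(printf '%s\n' "$sample" | awk -F': ' '{ printf "%.1f", $NF / (1024*1024*1024) }')
echo "${used_gib} GiB used"
```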

---

## Troubleshooting

### Out of Memory Errors

If you encounter OOM errors:

1. **Reduce resolution:** 768x768 → 512x512
2. **Disable Hires fix** or reduce the upscale ratio
3. **Use more aggressive flags:**
   ```bash
   export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
   ```

### Black Images or Artifacts

Ensure `--no-half-vae` is in your command-line arguments.

### Slow Generation

- Use `--medvram` instead of `--lowvram` on 16GB GPUs
- Reduce sampling steps to 20
- Try faster samplers: DPM++ 2M, Euler a

### Model Loading Errors

Verify the PyTorch installation:

```bash
source venv/bin/activate
python -c "import torch; print(torch.cuda.is_available())"
```

If it returns `False`, reinstall PyTorch:

```bash
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
```

For more troubleshooting, see the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md#troubleshooting).

---

## Performance Tips

1. **Two-Phase Workflow:**
   - Generate at 512x512 without Hires fix (fast)
   - Upscale separately using img2img or the Extras tab (best quality)

2. **ControlNet Best Practices:**
   - Use only 1-2 units at a time
   - Enable Low VRAM mode
   - Disable Hires fix when using ControlNet

3. **Batch Processing:**
   - Use `Batch count` instead of `Batch size`
   - Keep resolution at 512x512 for batches

4. **Memory Management:**
   - Restart the WebUI after 50-100 generations
   - Use "Unload SD checkpoint" when switching models

---

## Additional Resources

- **[VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md)** - Comprehensive optimization guide
- **[ROCm Documentation](https://rocm.docs.amd.com/)** - Official AMD ROCm docs
- **[SD WebUI Wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki)** - Official WebUI documentation
- **[AMD GPU Support](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs)** - WebUI AMD GPU guide

---

## Summary

✅ **Key Points:**
- Use ROCm 6.2 for best compatibility
- Enable `expandable_segments:True` to prevent memory fragmentation
- Use `--medvram` for 16GB VRAM
- Start at 512x512 and upscale separately for quality
- Enable ControlNet Low VRAM mode

❌ **Avoid:**
- Batch size > 1 (use Batch count instead)
- Hires fix with 2x upscale on 16GB VRAM
- More than 2 ControlNet units simultaneously
- Direct generation at resolutions above 768x768

---

**For detailed configuration and workflows, see [ROCM_VRAM_OPTIMIZATION.md](ROCM_VRAM_OPTIMIZATION.md)**