Commit 5e65b0e

Add ROCm 6.2 VRAM optimization for AMD GPUs (8-16GB)
This commit adds comprehensive ROCm 6.2 support and VRAM optimization for AMD GPUs, specifically targeting systems with 8-16GB VRAM.

Changes:
- Updated webui.sh to use ROCm 6.2 instead of 5.7 for AMD GPUs
- Added webui-user-rocm62.sh: optimized launch script with:
  * PyTorch ROCm 6.2 installation command
  * PYTORCH_HIP_ALLOC_CONF for memory fragmentation prevention
  * Optimized command-line flags (--medvram, --opt-split-attention, etc.)
  * Detailed inline documentation
- Added ROCM_VRAM_OPTIMIZATION.md: comprehensive 400+ line guide covering:
  * Launch configuration and environment variables
  * WebUI settings optimization
  * Generation settings for different VRAM amounts
  * ControlNet optimization techniques
  * Recommended workflows for quality and performance
  * Extensive troubleshooting section
  * Performance benchmarks
- Added README_ROCM.md: quick start guide for ROCm setup

Key optimizations:
- Memory fragmentation prevention via expandable_segments
- Optimal command-line arguments for 16GB VRAM
- Two-phase workflow (generate at 512x512, upscale separately)
- ControlNet low VRAM mode configuration
- Batch processing best practices

Benefits:
- Prevents OOM errors on 16GB VRAM GPUs
- Improved stability for long generation sessions
- Better quality outputs through optimized workflows
- Faster iteration with recommended settings
1 parent 82a973c commit 5e65b0e

File tree

4 files changed: +963 additions, −2 deletions


README_ROCM.md

Lines changed: 295 additions & 0 deletions
@@ -0,0 +1,295 @@
# ROCm Setup Guide for Stable Diffusion WebUI

This guide helps you set up and optimize Stable Diffusion WebUI for AMD GPUs using ROCm 6.2.

## Quick Start

### 1. Copy the Optimized Launch Configuration

```bash
cp webui-user-rocm62.sh webui-user.sh
```

### 2. Launch the WebUI

```bash
./webui.sh
```

The launch script will automatically:

- Install PyTorch with ROCm 6.2 support
- Configure optimal VRAM settings for 16GB GPUs
- Set up memory management to prevent fragmentation
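
The `webui-user-rocm62.sh` shipped with this commit is described as handling these steps. As a rough sketch (the variable names come from the stock `webui-user.sh`; the exact values in the shipped script may differ), the relevant exports might look like:

```shell
# Sketch only -- see webui-user-rocm62.sh for the actual values used.

# Override the default PyTorch install with the ROCm 6.2 wheels.
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"

# Reduce allocator fragmentation over long sessions
# (HIP analogue of PYTORCH_CUDA_ALLOC_CONF).
export PYTORCH_HIP_ALLOC_CONF="expandable_segments:True"

# VRAM-friendly launch flags for a 16GB card.
export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
```

Because `webui.sh` sources `webui-user.sh` on every launch, copying the optimized file in step 1 is all that is needed to pick these settings up.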

### 3. Configure WebUI Settings

After the WebUI starts, navigate to **Settings → Optimizations** and configure:

- **Cross attention optimization:** `Doggettx` (default)
- **Enable quantization in K samplers:** ✓ Enabled
- **Token merging ratio:** `0.5`

See the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md) for detailed configuration instructions.

---

## System Requirements

### Supported AMD GPUs

- **RX 6000 Series** (Navi 2): RX 6700 XT, 6800, 6800 XT, 6900 XT
- **RX 7000 Series** (Navi 3): RX 7600, 7700 XT, 7800 XT, 7900 XT, 7900 XTX
- **RX 5000 Series** (Navi 1): RX 5700 XT (with limitations)

### Recommended VRAM

- **Minimum:** 8GB
- **Recommended:** 16GB
- **Optimal:** 24GB

### Software Requirements

- **ROCm:** 6.2 or newer
- **Python:** 3.10 or 3.11
- **Linux:** Ubuntu 22.04, Fedora 38+, or Arch Linux

---

## Installation

### Option 1: Automatic Setup (Recommended)

The `webui.sh` script automatically detects AMD GPUs and installs ROCm 6.2 support.

```bash
# Clone the repository (if not already done)
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# Copy the optimized configuration
cp webui-user-rocm62.sh webui-user.sh

# Launch (will install dependencies automatically)
./webui.sh
```

### Option 2: Manual Setup

If you need manual control over the installation:

```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install PyTorch with ROCm 6.2
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

# Install Stable Diffusion WebUI requirements
pip install -r requirements_versions.txt

# Prevent allocator memory fragmentation
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

# Launch with optimized flags
python launch.py --skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae
```

---

## Configuration Files

### `webui-user-rocm62.sh`

Pre-configured launch script with optimal settings for AMD GPUs with 16GB VRAM.

**Key settings:**
- PyTorch with ROCm 6.2
- Memory fragmentation prevention
- VRAM-optimized command-line arguments

### `ROCM_VRAM_OPTIMIZATION.md`

Comprehensive guide covering:
- WebUI settings optimization
- Generation settings for different VRAM amounts
- ControlNet optimization
- Workflows for best quality
- Troubleshooting common issues

---

## Command-Line Arguments Explained

The optimized configuration uses these flags:

```bash
--skip-torch-cuda-test   # Skip the CUDA check (we're using ROCm/HIP)
--medvram                # Optimized for 8-16GB VRAM
--opt-split-attention    # Reduces VRAM usage during attention
--no-half-vae            # Keeps the VAE in full precision to prevent black-image errors
```

### For Different VRAM Amounts

**16GB VRAM (Recommended):**
```bash
--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae
```

**8GB VRAM:**
```bash
--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae
```

**6GB VRAM or less:**
```bash
--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae --opt-channelslast
```
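
The tiers above follow a mechanical pattern (a common base plus a VRAM-dependent memory flag), so they can be captured in a small helper. A sketch, where the `select_args` name is ours and not part of the commit:

```shell
# Hypothetical helper: pick COMMANDLINE_ARGS from the VRAM tiers above.
select_args() {
  local vram_gb=$1
  local base="--skip-torch-cuda-test --opt-split-attention --no-half-vae"
  if [ "$vram_gb" -ge 16 ]; then
    echo "$base --medvram"
  elif [ "$vram_gb" -ge 8 ]; then
    echo "$base --lowvram"
  else
    echo "$base --lowvram --opt-channelslast"
  fi
}

export COMMANDLINE_ARGS="$(select_args 16)"
echo "$COMMANDLINE_ARGS"
```

For a 16GB card this prints the base flags plus `--medvram`, matching the recommended tier.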

---

## Recommended Generation Settings

### For 16GB VRAM

**Safe Mode (Fast, No Errors):**
- Resolution: 512x512
- Hires fix: OFF
- Batch size: 1
- VRAM usage: ~4-6GB

**Quality Mode (Best Results):**
- Resolution: 512x512
- Hires fix: ON (1.5x upscale)
- Hires steps: 10
- VRAM usage: ~8-12GB

**With ControlNet:**
- Resolution: 512x512
- Hires fix: OFF
- ControlNet units: 1-2 maximum
- Low VRAM mode: ON
- VRAM usage: ~6-10GB

See the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md) for detailed workflows and settings.

---

## Verification

### Check PyTorch ROCm Installation

```bash
source venv/bin/activate
python -c "import torch; print('ROCm available:', torch.cuda.is_available()); print('ROCm version:', torch.version.hip)"
```

**Expected output:**
```
ROCm available: True
ROCm version: 6.2.x
```

### Monitor VRAM Usage

```bash
watch -n 1 rocm-smi
```

Or check current usage:
```bash
rocm-smi --showmeminfo vram
```

---

## Troubleshooting

### Out of Memory Errors

If you encounter OOM errors:

1. **Reduce resolution:** 768x768 → 512x512
2. **Disable Hires fix** or reduce the upscale ratio
3. **Use more aggressive flags:**
   ```bash
   export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
   ```

### Black Images or Artifacts

Ensure `--no-half-vae` is in your command-line arguments.

### Slow Generation

- Use `--medvram` instead of `--lowvram` on 16GB VRAM
- Reduce sampling steps to 20
- Try faster samplers: DPM++ 2M, Euler a

### Model Loading Errors

Verify the PyTorch installation:
```bash
source venv/bin/activate
python -c "import torch; print(torch.cuda.is_available())"
```

If it prints `False`, reinstall PyTorch:
```bash
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
```

For more troubleshooting, see the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md#troubleshooting).

---

## Performance Tips

1. **Two-Phase Workflow:**
   - Generate at 512x512 without Hires fix (fast)
   - Upscale separately using img2img or the Extras tab (best quality)

2. **ControlNet Best Practices:**
   - Use only 1-2 units at a time
   - Enable Low VRAM mode
   - Disable Hires fix when using ControlNet

3. **Batch Processing:**
   - Use `Batch count` instead of `Batch size`
   - Keep resolution at 512x512 for batches

4. **Memory Management:**
   - Restart the WebUI after 50-100 generations
   - Use "Unload SD checkpoint" when switching models
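
The batch-processing tip follows from how memory scales: `Batch size` generates images in parallel, so peak activation memory grows with it, while `Batch count` repeats sequential passes at a constant peak. A deliberately simplified illustration (all numbers invented for the example, not measurements):

```shell
# Toy model, illustration only: peak VRAM = base + per-image activations * batch size.
BASE_GB=2
PER_IMAGE_GB=4

peak_vram_gb() { echo $(( BASE_GB + PER_IMAGE_GB * $1 )); }

# 4 images in one pass with batch size 4: one tall peak.
echo "batch size 4:  $(peak_vram_gb 4) GB peak"
# 4 images via batch count 4 (size 1): four passes, each at the low peak.
echo "batch count 4: $(peak_vram_gb 1) GB peak per pass"
```

Under this toy model, batch size 4 triples the peak over four sequential single-image passes, which is why batch count is the safer knob on a 16GB card.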
266+
267+
---
268+
269+
## Additional Resources
270+
271+
- **[VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md)** - Comprehensive optimization guide
272+
- **[ROCm Documentation](https://rocm.docs.amd.com/)** - Official AMD ROCm docs
273+
- **[SD WebUI Wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki)** - Official WebUI documentation
274+
- **[AMD GPU Support](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs)** - WebUI AMD GPU guide
275+
276+
---
277+
278+
## Summary
279+
280+
**Key Points:**
281+
- Use ROCm 6.2 for best compatibility
282+
- Enable `expandable_segments:True` to prevent memory fragmentation
283+
- Use `--medvram` for 16GB VRAM
284+
- Start with 512x512, upscale separately for quality
285+
- Enable ControlNet Low VRAM mode
286+
287+
**Avoid:**
288+
- Batch size > 1 (use Batch count instead)
289+
- Hires fix with 2x upscale on 16GB VRAM
290+
- More than 2 ControlNet units simultaneously
291+
- Direct generation at resolutions > 768x768
292+
293+
---
294+
295+
**For detailed configuration and workflows, see [ROCM_VRAM_OPTIMIZATION.md](ROCM_VRAM_OPTIMIZATION.md)**
