Commit bdb0651
add support for Apple hardware using MPS acceleration
1 parent 1714816 commit bdb0651

File tree

16 files changed: +361 -52 lines

.gitignore

Lines changed: 3 additions & 1 deletion

```diff
@@ -180,4 +180,6 @@ outputs
 # created from generated embeddings.
 logs
 testtube
-checkpoints
+checkpoints
+# If it's a Mac
+.DS_Store
```

README-Mac-MPS.md

Lines changed: 228 additions & 0 deletions (new file; contents below)
# Apple Silicon Mac Users

Several people have gotten Stable Diffusion to work on Apple Silicon Macs using Anaconda. I've gathered up most of their instructions and put them in this fork (and readme). I haven't tested anything besides Anaconda, and I've read about issues with things like miniforge, so if you have an issue that isn't dealt with in this fork, head on over to the [Apple Silicon](https://github.com/CompVis/stable-diffusion/issues/25) issue on GitHub (that page is so long that GitHub hides most of it by default, so you need to find the hidden part and expand it to view the whole thing). This fork would not have been possible without the work done by the people on that issue.

You must have macOS 12.3 Monterey or later; anything earlier won't work.

BTW, I haven't tested any of this on Intel Macs.

How to:

```
git clone https://github.com/lstein/stable-diffusion.git
cd stable-diffusion

mkdir -p models/ldm/stable-diffusion-v1/
ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt

conda env create -f environment-mac.yaml
conda activate ldm
```
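Before going further, it's worth checking that the PyTorch nightly you just installed can actually see MPS. A quick sanity check:

```python
import torch

print(torch.backends.mps.is_built())      # True if this PyTorch build includes MPS
print(torch.backends.mps.is_available())  # True if macOS/hardware can actually use it
```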
These instructions are identical to the main repo's, except that I added environment-mac.yaml because Macs don't have cudatoolkit.

After you follow all the instructions and run txt2img.py, you might get several errors. Here are the errors I've seen and found solutions for.

### Doesn't work anymore?

We are using PyTorch nightly, which includes support for MPS. I don't know exactly how Anaconda does updates, but I woke up one morning and Stable Diffusion crashed, and I couldn't think of anything I'd done that would have changed anything the night before, when it worked. A day and a half later I finally got it working again. I don't know what changed overnight. PyTorch-nightly changes overnight, but I'm pretty sure I didn't manually update it. Either way, things are probably going to be bumpy on Apple Silicon until PyTorch releases a firm version that we can lock to.

To manually update to the latest version of PyTorch nightly (which could fix issues), run this command:

```
conda install pytorch torchvision torchaudio -c pytorch-nightly
```
## Debugging?

Tired of waiting for your renders to finish before you can see whether it works? Reduce the steps! The picture won't look like much, but if it finishes, hey, it works! This can also help you figure out whether you've got a memory problem, because one step shouldn't use much memory.

```
python ./scripts/txt2img.py --prompt "ocean" --ddim_steps 1
```
### "No module named cv2" (or some other module)
64+
65+
Did you remember to `conda activate ldm`? If your terminal prompt
66+
begins with "(ldm)" then you activated it. If it begins with "(base)"
67+
or something else you haven't.
68+
69+
If it says you're missing taming you need to rebuild your virtual
70+
environment.
71+
72+
conda env remove -n ldm
73+
conda env create -f environment-mac.yaml
74+
75+
If you have activated the ldm virtual environment and tried rebuilding
76+
it, maybe the problem could be that I have something installed that
77+
you don't and you'll just need to manually install it. Make sure you
78+
activate the virtual environment so it installs there instead of
79+
globally.
80+
81+
conda activate ldm
82+
pip install *name*
83+
84+
You might also need to install Rust (I mention this again below).
85+
86+
### "The operator [name] is not current implemented for the MPS device." (sic)
87+
88+
Example error.
89+
90+
```
91+
...
92+
NotImplementedError: The operator 'aten::index.Tensor' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on [https://github.com/pytorch/pytorch/issues/77764](https://github.com/pytorch/pytorch/issues/77764). As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
93+
```
94+
95+
Just do what it says:
96+
97+
export PYTORCH_ENABLE_MPS_FALLBACK=1
98+
99+
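If you'd rather set it from inside Python, a minimal sketch (my assumption: setting it before importing torch is the safest ordering, so the backend sees it when it initializes):

```python
import os

# Same effect as `export PYTORCH_ENABLE_MPS_FALLBACK=1`. Set it before
# importing torch so the variable exists when the MPS backend starts up.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # imported after setting the variable, on purpose
```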
### "Could not build wheels for tokenizers"
100+
101+
I have not seen this error because I had Rust installed on my computer before I started playing with Stable Diffusion. The fix is to install Rust.
102+
103+
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
104+
105+
### How come `--seed` doesn't work?

> Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

[PyTorch docs](https://pytorch.org/docs/stable/notes/randomness.html)

There is an [open issue](https://github.com/pytorch/pytorch/issues/78035) (as of August 2022) in PyTorch regarding gradient inconsistency. I am guessing that's what is causing this.
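As a concrete illustration of what the docs mean (my reading of them, not a guarantee): the CPU and MPS random-number generators are separate implementations, so identical seeds need not produce identical draws.

```python
import torch

torch.manual_seed(42)
cpu_noise = torch.randn(3)  # drawn with the CPU generator

if torch.backends.mps.is_available():
    torch.manual_seed(42)
    mps_noise = torch.randn(3, device='mps')  # drawn with the MPS generator
    # Same seed, but the two tensors are not guaranteed to match.
    print(cpu_noise, mps_noise.cpu())
```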
### libiomp5.dylib error?

```
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
```

There are several things you can do. First, you could use something besides Anaconda, like miniforge. I read a lot of things online telling people to use something else, but I am stuck with Anaconda for other reasons.

Or you can try this:

```
export KMP_DUPLICATE_LIB_OK=True
```

Or this (which takes forever on my computer and didn't work anyway):

```
conda install nomkl
```

This error happens with Anaconda on Macs, and [nomkl](https://stackoverflow.com/questions/66224879/what-is-the-nomkl-python-package-used-for) is supposed to fix the issue (it isn't a module but a fix of some sort). [There are more suggestions](https://stackoverflow.com/questions/53014306/error-15-initializing-libiomp5-dylib-but-found-libiomp5-dylib-already-initial), like uninstalling TensorFlow and reinstalling. I haven't tried them.
### Not enough memory

This seems to be a common problem and is probably the underlying cause of a lot of the symptoms listed below. The fix is to lower your image size or to add `model.half()` right after the model is loaded (I should probably test this myself). I've read that this helps because it converts the model from 32-bit to 16-bit floats, which leaves more RAM for everything else. I have no idea how it affects the quality of the images, though.

See [this issue](https://github.com/CompVis/stable-diffusion/issues/71).
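For what that's worth, here's roughly where the cast would go. A minimal sketch, untested by me, assuming the `load_model_from_config()` helper and config path from upstream's scripts/txt2img.py:

```python
# Inside scripts/txt2img.py, right after the model is loaded (sketch only;
# the names below match upstream's txt2img.py and are unverified in this fork):
from omegaconf import OmegaConf

config = OmegaConf.load('configs/stable-diffusion/v1-inference.yaml')
model = load_model_from_config(config, 'models/ldm/stable-diffusion-v1/model.ckpt')
model = model.half()  # float32 -> float16 weights, roughly halving model RAM
```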
### "Error: product of dimension sizes > 2**31'"
153+
154+
This error happens with img2img, which I haven't played with too much
155+
yet. But I know it's because your image is too big or the resolution
156+
isn't a multiple of 32x32. Because the stable-diffusion model was
157+
trained on images that were 512 x 512, it's always best to use that
158+
output size (which is the default). However, if you're using that size
159+
and you get the above error, try 256 x 256 or 512 x 256 or something
160+
as the source image.
161+
162+
BTW, 2**31-1 = [2,147,483,647](https://en.wikipedia.org/wiki/2,147,483,647#In_computing), which is also 32-bit signed [LONG_MAX](https://en.wikipedia.org/wiki/C_data_types) in C.
163+
164+
### I just got Rickrolled! Do I have a virus?

You don't have a virus. It's part of the project. Here's [Rick](https://github.com/lstein/stable-diffusion/blob/main/assets/rick.jpeg) and here's [the code](https://github.com/lstein/stable-diffusion/blob/69ae4b35e0a0f6ee1af8bb9a5d0016ccb27e36dc/scripts/txt2img.py#L79) that swaps him in. It's an NSFW filter, which, IMO, doesn't work very well (and we call this "computer vision", sheesh).

Actually, this could also be happening because there's not enough RAM. You could try the `model.half()` suggestion above or specify smaller output images.
### My images come out black

I haven't solved this issue. I just throw away my black images. There's a [similar issue](https://github.com/CompVis/stable-diffusion/issues/69) on CUDA GPUs where the images come out green. Maybe it's the same issue? Someone in that issue says to use `--precision full`, but this fork actually disables that flag. I don't know why; someone else provided that code and I don't know what it does. Maybe the `model.half()` suggestion above would fix this issue too. I should probably test it.
### "view size is not compatible with input tensor's size and stride"
187+
188+
```
189+
File "/opt/anaconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2511, in layer_norm
190+
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
191+
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
192+
```
193+
194+
Update to the latest version of lstein/stable-diffusion. We were
195+
patching pytorch but we found a file in stable-diffusion that we could
196+
change instead. This is a 32-bit vs 16-bit problem.
197+
198+
### The processor must support the Intel bla bla bla

What? Intel? On Apple Silicon?

```
Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
```

This fixed it for me:

```
conda clean --yes --all
```
### Still slow?

I changed the defaults of n_samples and n_iter to 1 so that it uses less RAM and makes fewer images, so it will be faster the first time you use it. I don't actually know what n_samples does internally, but I know it consumes a lot more RAM. The n_iter flag just loops around the image-creation code, so it shouldn't consume more RAM (and it should be faster if you're going to make multiple images, because the libraries and model will already be loaded; use a prompt file to get this speed boost, as sketched below).

These are the default sample and iter settings in this fork/branch:

```
python scripts/txt2img.py --prompt "ocean" --n_samples=1 --n_iter=1
```
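If you do want several images per run, a prompt file amortizes the model-loading cost. A sketch, assuming this fork keeps upstream's `--from-file` flag (I haven't verified that here):

```
# prompts.txt holds one prompt per line; the model loads once for all of them
echo "a calm ocean at dawn" > prompts.txt
echo "a stormy ocean at night" >> prompts.txt
python scripts/txt2img.py --from-file prompts.txt
```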

README.md

Lines changed: 23 additions & 9 deletions

````diff
@@ -387,7 +387,7 @@ Credit goes to @rinongal and the repository located at
 https://github.com/rinongal/textual_inversion Please see the
 repository and associated paper for details and limitations.
 
-# Latest
+# Latest Changes
 
 - v1.13 (in process)
 
@@ -403,9 +403,9 @@ For older changelogs, please visit **[CHANGELOGS](CHANGELOG.md)**.
 
 # Installation
 
-There are separate installation walkthroughs for [Linux/Mac](#linuxmac) and [Windows](#windows).
+There are separate installation walkthroughs for [Linux](#linux), [Windows](#windows) and [Macintosh](#macintosh).
 
-## Linux/Mac
+## Linux
 
 1. You will need to install the following prerequisites if they are not already available. Use your
 operating system's preferred installer
@@ -580,7 +580,15 @@ python scripts\dream.py -l
 python scripts\dream.py
 ```
 
-10. Subsequently, to relaunch the script, first activate the Anaconda command window (step 3), enter the stable-diffusion directory (step 5, "cd \path\to\stable-diffusion"), run "conda activate ldm" (step 6b), and then launch the dream script (step 9).
+10. Subsequently, to relaunch the script, first activate the Anaconda
+command window (step 3), enter the stable-diffusion directory (step 5,
+"cd \path\to\stable-diffusion"), run "conda activate ldm" (step 6b),
+and then launch the dream script (step 9).
+
+**Note:** Tildebyte has written an alternative ["Easy peasy Windows
+install"](https://github.com/lstein/stable-diffusion/wiki/Easy-peasy-Windows-install)
+which uses the Windows Powershell and pew. If you are having trouble
+with Anaconda on Windows, give this a try (or try it first!)
 
 ### Updating to newer versions of the script
 
@@ -595,11 +603,16 @@ git pull
 
 This will bring your local copy into sync with the remote one.
 
-## Simplified API for text to image generation
+## Macintosh
+
+See [README-Mac-MPS](README-Mac-MPS.md) for instructions.
+
+# Simplified API for text to image generation
 
 For programmers who wish to incorporate stable-diffusion into other
-products, this repository includes a simplified API for text to image generation, which
-lets you create images from a prompt in just three lines of code:
+products, this repository includes a simplified API for text to image
+generation, which lets you create images from a prompt in just three
+lines of code:
 
 ```
 from ldm.simplet2i import T2I
@@ -608,9 +621,10 @@ outputs = model.txt2img("a unicorn in manhattan")
 ```
 
 Outputs is a list of lists in the format [[filename1,seed1],[filename2,seed2]...]
-Please see ldm/simplet2i.py for more information.
+Please see ldm/simplet2i.py for more information. A set of example scripts is
+coming RSN.
 
-## Workaround for machines with limited internet connectivity
+# Workaround for machines with limited internet connectivity
 
 My development machine is a GPU node in a high-performance compute
 cluster which has no connection to the internet. During model
````
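Aside for readers of this diff: the hunk boundaries elide the middle line of the README's "three lines of code" example. Reconstructed below; the `T2I()` construction is my assumption, inferred from `model.txt2img(...)` in the hunk header:

```python
from ldm.simplet2i import T2I

model = T2I()  # assumed default construction; this line falls between the hunks
outputs = model.txt2img("a unicorn in manhattan")
# per the README, outputs is a list of [filename, seed] pairs
```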

environment-mac.yaml

Lines changed: 32 additions & 0 deletions (new file; contents below)

```yaml
name: ldm
channels:
  - apple
  - conda-forge
  - pytorch-nightly
  - defaults
dependencies:
  - python=3.10.4
  - pip=22.1.2
  - pytorch
  - torchvision
  - numpy=1.23.1
  - pip:
    - albumentations==0.4.6
    - opencv-python==4.6.0.66
    - pudb==2019.2
    - imageio==2.9.0
    - imageio-ffmpeg==0.4.2
    - pytorch-lightning==1.4.2
    - omegaconf==2.1.1
    - test-tube>=0.7.5
    - streamlit==1.12.0
    - pillow==9.2.0
    - einops==0.3.0
    - torch-fidelity==0.3.0
    - transformers==4.19.2
    - torchmetrics==0.6.0
    - kornia==0.6.0
    - -e git+https://github.com/openai/CLIP.git@main#egg=clip
    - -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
    - -e git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
    - -e .
```

ldm/dream/devices.py

Lines changed: 11 additions & 0 deletions (new file; contents below)

```python
import torch

def choose_torch_device() -> str:
    '''Convenience routine for guessing which GPU device to run model on'''
    if torch.cuda.is_available():
        return 'cuda'
    if torch.backends.mps.is_available():
        return 'mps'
    return 'cpu'
```
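A quick usage illustration (mine, not part of the commit), showing the intended call pattern:

```python
import torch
from ldm.dream.devices import choose_torch_device

device = torch.device(choose_torch_device())
x = torch.randn(4, 4, device=device)  # lands on cuda, mps, or cpu as available
```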

ldm/models/diffusion/ddim.py

Lines changed: 4 additions & 3 deletions

```diff
@@ -4,6 +4,7 @@
 import numpy as np
 from tqdm import tqdm
 from functools import partial
+from ldm.dream.devices import choose_torch_device
 
 from ldm.modules.diffusionmodules.util import (
     make_ddim_sampling_parameters,
@@ -14,17 +15,17 @@
 
 
 class DDIMSampler(object):
-    def __init__(self, model, schedule='linear', device='cuda', **kwargs):
+    def __init__(self, model, schedule='linear', device=None, **kwargs):
         super().__init__()
         self.model = model
         self.ddpm_num_timesteps = model.num_timesteps
         self.schedule = schedule
-        self.device = device
+        self.device = device or choose_torch_device()
 
     def register_buffer(self, name, attr):
         if type(attr) == torch.Tensor:
             if attr.device != torch.device(self.device):
-                attr = attr.to(torch.device(self.device))
+                attr = attr.to(dtype=torch.float32, device=self.device)
         setattr(self, name, attr)
 
     def make_schedule(
```
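A note on the `register_buffer` change, since the diff doesn't explain it: as I understand it, MPS has no float64 support, so buffers must be cast down to float32 before being moved to the device. A minimal sketch of the underlying constraint (on an MPS-capable machine):

```python
import torch

if torch.backends.mps.is_available():
    buf = torch.zeros(2, dtype=torch.float64)
    # buf.to('mps') alone would fail: the MPS framework doesn't support float64.
    buf = buf.to(dtype=torch.float32, device='mps')  # cast first, then move
    print(buf.dtype, buf.device)  # torch.float32 mps:0
```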
