Commit 9156597
Move environment-mac.yaml to Python 3.9 and patch dream.py for Macs.

I'm using stable-diffusion on a 2022 Macbook M2 Air with 24 GB unified memory. I see this taking about 2.0s/it. I've moved many deps from pip to conda-forge, to take advantage of the precompiled binaries.

Some notes for Mac users, since I've seen a lot of confusion about this:

One doesn't need the `apple` channel to run this on a Mac; that's only used by `tensorflow-deps`, required for running tensorflow-metal. For that, I have an example environment.yml here: https://developer.apple.com/forums/thread/711792?answerId=723276022#723276022

However, the `CONDA_SUBDIR=osx-arm64` environment variable *is* needed to ensure that you do not pull in any Intel-specific packages such as `mkl`, which will fail with [cryptic errors](CompVis/stable-diffusion#25 (comment)) on the ARM architecture and cause the environment to break. I've also added a comment in the env file about 3.10 not working yet. When it becomes possible to upgrade, those commands, run on an osx-arm64 machine, should work to determine the new version set.

Here's what a successful run of dream.py should look like:

```
$ python scripts/dream.py --full_precision
* Initializing, be patient...
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Using slower but more accurate full-precision math (--full_precision)
>> Setting Sampler to k_lms
model loaded in 6.12s
* Initialization done! Awaiting your command (-h for help, 'q' to quit)
dream> "an astronaut riding a horse"
Generating:   0%|          | 0/1 [00:00<?, ?it/s]/Users/corajr/Documents/lstein/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1662016319283/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  placeholder_idx = torch.where(
100%|██████████| 50/50 [01:37<00:00,  1.95s/it]
Generating: 100%|██████████| 1/1 [01:38<00:00, 98.55s/it]
Usage stats:
  1 image(s) generated in 98.60s
  Max VRAM used for this generation: 0.00G
Outputs:
outputs/img-samples/000001.1525943180.png: "an astronaut riding a horse" -s50 -W512 -H512 -C7.5 -Ak_lms -F -S1525943180
```
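A quick, illustrative way (not part of this commit) to confirm that the resulting environment actually resolved to native ARM packages is to ask Python which architecture it is running on:

```python
import platform

# Illustrative sanity check: inside the `ldm` environment on Apple Silicon,
# a native install reports "arm64". Seeing "x86_64" means a Rosetta/Intel
# interpreter leaked in, which is exactly what CONDA_SUBDIR=osx-arm64 is
# meant to prevent.
arch = platform.machine()
print(arch)
```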
1 parent 7011960 commit 9156597

File tree

3 files changed: +69 −55 lines changed

README-Mac-MPS.md

Lines changed: 20 additions & 31 deletions
````diff
@@ -12,8 +12,7 @@ issue](https://github.com/CompVis/stable-diffusion/issues/25), and generally on
 
 You have to have macOS 12.3 Monterey or later. Anything earlier than that won't work.
 
-BTW, I haven't tested any of this on Intel Macs but I have read that one person
-got it to work.
+Tested on a 2022 Macbook M2 Air with 10-core gpu 24 GB unified memory.
 
 How to:
 
@@ -22,17 +21,16 @@ git clone https://github.com/lstein/stable-diffusion.git
 cd stable-diffusion
 
 mkdir -p models/ldm/stable-diffusion-v1/
-ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt
+PATH_TO_CKPT="$HOME/Documents/stable-diffusion-v-1-4-original" # or wherever yours is.
+ln -s "$PATH_TO_CKPT/sd-v1-4.ckpt" models/ldm/stable-diffusion-v1/model.ckpt
 
-conda env create -f environment-mac.yaml
+CONDA_SUBDIR=osx-arm64 conda env create -f environment-mac.yaml
 conda activate ldm
 
 python scripts/preload_models.py
-python scripts/orig_scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
+python scripts/dream.py --full_precision # half-precision requires autocast and won't work
 ```
 
-We have not gotten lstein's dream.py to work yet.
-
 After you follow all the instructions and run txt2img.py you might get several errors. Here's the errors I've seen and found solutions for.
 
 ### Is it slow?
@@ -94,10 +92,6 @@ get quick feedback.
 
 python ./scripts/txt2img.py --prompt "ocean" --ddim_steps 5 --n_samples 1 --n_iter 1
 
-### MAC: torch._C' has no attribute '_cuda_resetPeakMemoryStats' #234
-
-We haven't fixed gotten dream.py to work on Mac yet.
-
 ### OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'...
 
 python scripts/preload_models.py
@@ -108,7 +102,7 @@ Example error.
 
 ```
 ...
-NotImplementedError: The operator 'aten::index.Tensor' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on [https://github.com/pytorch/pytorch/issues/77764](https://github.com/pytorch/pytorch/issues/77764). As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
+NotImplementedError: The operator 'aten::_index_put_impl_' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on [https://github.com/pytorch/pytorch/issues/77764](https://github.com/pytorch/pytorch/issues/77764). As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
 ```
 
 The lstein branch includes this fix in [environment-mac.yaml](https://github.com/lstein/stable-diffusion/blob/main/environment-mac.yaml).
@@ -137,27 +131,18 @@ still working on it.
 
 OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
 
-There are several things you can do. First, you could use something
-besides Anaconda like miniforge. I read a lot of things online telling
-people to use something else, but I am stuck with Anaconda for other
-reasons.
-
-Or you can try this.
-
-    export KMP_DUPLICATE_LIB_OK=True
-
-Or this (which takes forever on my computer and didn't work anyway).
+You are likely using an Intel package by mistake. Be sure to run conda with
+the environment variable `CONDA_SUBDIR=osx-arm64`, like so:
 
-    conda install nomkl
+`CONDA_SUBDIR=osx-arm64 conda install ...`
 
-This error happens with Anaconda on Macs, and
-[nomkl](https://stackoverflow.com/questions/66224879/what-is-the-nomkl-python-package-used-for)
-is supposed to fix the issue (it isn't a module but a fix of some
-sort). [There's more
-suggestions](https://stackoverflow.com/questions/53014306/error-15-initializing-libiomp5-dylib-but-found-libiomp5-dylib-already-initial),
-like uninstalling tensorflow and reinstalling. I haven't tried them.
+This error happens with Anaconda on Macs when the Intel-only `mkl` is pulled in by
+a dependency. [nomkl](https://stackoverflow.com/questions/66224879/what-is-the-nomkl-python-package-used-for)
+is a metapackage designed to prevent this, by making it impossible to install
+`mkl`, but if your environment is already broken it may not work.
 
-Since I switched to miniforge I haven't seen the error.
+Do *not* use `os.environ['KMP_DUPLICATE_LIB_OK']='True'` or equivalents as this
+masks the underlying issue of using Intel packages.
 
 ### Not enough memory.
 
@@ -226,4 +211,8 @@ What? Intel? On an Apple Silicon?
 The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
 The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
 
-This was actually the issue that I couldn't solve until I switched to miniforge.
+This is due to the Intel `mkl` package getting picked up when you try to install
+something that depends on it-- Rosetta can translate some Intel instructions but
+not the specialized ones here. To avoid this, make sure to use the environment
+variable `CONDA_SUBDIR=osx-arm64`, which restricts the Conda environment to only
+use ARM packages, and use `nomkl` as described above.
````
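The NotImplementedError in the diff above suggests `PYTORCH_ENABLE_MPS_FALLBACK=1` as a temporary CPU-fallback workaround. A minimal sketch of applying it from Python (an illustration, not from the commit; the variable must be in the environment before torch initializes, so set it before `import torch` or in the shell):

```python
import os

# Hedged sketch: mirror the workaround suggested in the error message above.
# setdefault leaves any value you already exported in the shell untouched.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
# `import torch` would go after this point so the flag is seen at startup.
print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```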

environment-mac.yaml

Lines changed: 46 additions & 22 deletions
````diff
@@ -1,33 +1,57 @@
 name: ldm
 channels:
-  - apple
-  - conda-forge
   - pytorch-nightly
-  - defaults
+  - conda-forge
 dependencies:
-  - python=3.10.4
-  - pip=22.1.2
+  - python==3.9.13
+  - pip==22.2.2
+
+  # pytorch-nightly, left unpinned
   - pytorch
+  - torchmetrics
   - torchvision
-  - numpy=1.23.1
+
+  # I suggest to keep the other deps sorted for convenience.
+  # If you wish to upgrade to 3.10, try to run this:
+  #
+  # ```shell
+  # CONDA_CMD=conda
+  # sed -E 's/python==3.9.13/python==3.10.5/;s/ldm/ldm-3.10/;21,99s/- ([^=]+)==.+/- \1/' environment-mac.yaml > /tmp/environment-mac-updated.yml
+  # CONDA_SUBDIR=osx-arm64 $CONDA_CMD env create -f /tmp/environment-mac-updated.yml && $CONDA_CMD list -n ldm-3.10 | awk ' {print " - " $1 "==" $2;} '
+  # ```
+  #
+  # Unfortunately, as of 2022-08-31, this fails at the pip stage.
+  - albumentations==1.2.1
+  - coloredlogs==15.0.1
+  - einops==0.4.1
+  - grpcio==1.46.4
+  - humanfriendly
+  - imageio-ffmpeg==0.4.7
+  - imageio==2.21.2
+  - imgaug==0.4.0
+  - kornia==0.6.7
+  - mpmath==1.2.1
+  - nomkl
+  - numpy==1.23.2
+  - omegaconf==2.1.1
+  - onnx==1.12.0
+  - onnxruntime==1.12.1
+  - opencv==4.6.0
+  - pudb==2022.1
+  - pytorch-lightning==1.6.5
+  - scipy==1.9.1
+  - streamlit==1.12.2
+  - sympy==1.10.1
+  - tensorboard==2.9.0
+  - transformers==4.21.2
   - pip:
-    - albumentations==0.4.6
-    - opencv-python==4.6.0.66
-    - pudb==2019.2
-    - imageio==2.9.0
-    - imageio-ffmpeg==0.4.2
-    - pytorch-lightning==1.4.2
-    - omegaconf==2.1.1
-    - test-tube>=0.7.5
-    - streamlit==1.12.0
-    - pillow==9.2.0
-    - einops==0.3.0
-    - torch-fidelity==0.3.0
-    - transformers==4.19.2
-    - torchmetrics==0.6.0
-    - kornia==0.6.0
-    - -e git+https://github.com/openai/CLIP.git@main#egg=clip
+    - invisible-watermark
+    - test-tube
+    - tokenizers
+    - torch-fidelity
+    - -e git+https://github.com/huggingface/[email protected]#egg=diffusers
     - -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
+    - -e git+https://github.com/openai/CLIP.git@main#egg=clip
     - -e git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
     - -e .
 variables:
````
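The sed one-liner in the comment above works by stripping exact pins from dependency lines so conda can re-resolve versions for a new Python. A small Python sketch of that same substitution (the sample line is a hypothetical illustration):

```python
import re

# Sketch of what the sed recipe above does to each dependency line:
# turn "- name==version" into "- name" so conda re-resolves the version.
def unpin(line: str) -> str:
    return re.sub(r"- ([^=\s]+)==.+", r"- \1", line)

print(unpin("  - numpy==1.23.2"))  # → "  - numpy"
```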

ldm/simplet2i.py

Lines changed: 3 additions & 2 deletions
````diff
@@ -272,14 +272,15 @@ def process_image(image,seed):
         if not(width == self.width and height == self.height):
             width, height, _ = self._resolution_check(width, height, log=True)
 
-        scope = autocast if self.precision == 'autocast' else nullcontext
+        scope = autocast if self.precision == 'autocast' and torch.cuda.is_available() else nullcontext
 
         if sampler_name and (sampler_name != self.sampler_name):
             self.sampler_name = sampler_name
             self._set_sampler()
 
         tic = time.time()
-        torch.cuda.torch.cuda.reset_peak_memory_stats()
+        if torch.cuda.is_available():
+            torch.cuda.torch.cuda.reset_peak_memory_stats()
         results = list()
 
         try:
````
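Both hunks above apply one pattern: guard every CUDA-only call with `torch.cuda.is_available()` so that on MPS or CPU machines the code degrades to a no-op instead of raising. A standalone sketch of that guard (`cuda_available` and `autocast_factory` are hypothetical stand-ins for `torch.cuda.is_available()` and CUDA autocast, so this runs without torch installed):

```python
from contextlib import nullcontext

# Standalone sketch of the guard pattern used in the patch above.
def choose_scope(precision: str, cuda_available: bool, autocast_factory=object):
    if precision == 'autocast' and cuda_available:
        return autocast_factory
    return nullcontext  # no-op context manager on Mac (MPS) or plain CPU

# On a Mac, even precision == 'autocast' falls back to nullcontext:
scope = choose_scope('autocast', cuda_available=False)
with scope():
    pass  # inference would run inside this scope
print(scope is nullcontext)  # → True
```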
