# Apple Silicon Mac Users

Several people have gotten Stable Diffusion to work on Apple Silicon
Macs using Anaconda. I've gathered up most of their instructions and
put them in this fork (and readme). I haven't tested anything besides
Anaconda, and I've read about issues with things like miniforge, so if
you have an issue that isn't dealt with in this fork, head on over to
the [Apple Silicon](https://github.com/CompVis/stable-diffusion/issues/25)
issue on GitHub (that page is so long that GitHub hides most of it by
default, so you need to find the hidden part and expand it to view the
whole thing). This fork would not have been possible without the work
done by the people on that issue.

You have to have macOS 12.3 Monterey or later. Anything earlier than that won't work.

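Not sure which macOS version you're running? You can check from the terminal:

    sw_vers -productVersion
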
BTW, I haven't tested any of this on Intel Macs.

How to:

```
git clone https://github.com/lstein/stable-diffusion.git
cd stable-diffusion

mkdir -p models/ldm/stable-diffusion-v1/
ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt

conda env create -f environment-mac.yaml
conda activate ldm
```

These instructions are identical to the main repo's, except that I
added environment-mac.yaml because there's no cudatoolkit on macOS.

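Before hunting down errors, it's worth confirming that PyTorch can actually see the GPU. Recent nightlies expose this through `torch.backends.mps` (if your build predates that API, this one-liner will just throw):

    python -c "import torch; print(torch.backends.mps.is_available())"

If that prints `True`, the MPS backend is usable.
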
After you follow all the instructions and run txt2img.py you might get
several errors. Here are the errors I've seen and found solutions for.

### Doesn't work anymore?

We are using PyTorch nightly, which includes support for MPS. I don't
know exactly how Anaconda does updates, but I woke up one morning and
Stable Diffusion crashed, and I couldn't think of anything I'd done
the night before (when it worked) that would have changed anything. A
day and a half later I finally got it working again. I don't know what
changed overnight. PyTorch-nightly changes overnight, but I'm pretty
sure I didn't manually update it. Either way, things are probably
going to be bumpy on Apple Silicon until PyTorch releases a stable
version that we can lock to.

To manually update to the latest version of PyTorch nightly (which could fix issues), run this command.

    conda install pytorch torchvision torchaudio -c pytorch-nightly

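If you need to know which nightly build you actually ended up with (handy when filing bug reports), ask torch itself:

    python -c "import torch; print(torch.__version__)"
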
## Debugging?

Tired of waiting for your renders to finish before you can see if it
works? Reduce the steps! The picture won't look like anything, but if
it finishes, hey, it works! This could also help you figure out if
you've got a memory problem, because I'm betting one step doesn't use
much memory.

    python ./scripts/txt2img.py --prompt "ocean" --ddim_steps 1

### "No module named cv2" (or some other module)

Did you remember to `conda activate ldm`? If your terminal prompt
begins with "(ldm)" then you activated it. If it begins with "(base)"
or something else, you haven't.

If it says you're missing taming, you need to rebuild your virtual
environment.

    conda env remove -n ldm
    conda env create -f environment-mac.yaml

If you have activated the ldm virtual environment and tried rebuilding
it, the problem could be that I have something installed that you
don't, and you'll just need to install it manually. Make sure you
activate the virtual environment so it installs there instead of
globally.

    conda activate ldm
    pip install *name*

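For example, the `cv2` module doesn't come from a package named cv2; it's provided by opencv-python:

    pip install opencv-python
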
You might also need to install Rust (I mention this again below).

### "The operator [name] is not current implemented for the MPS device." (sic)

Example error.

```
...
NotImplementedError: The operator 'aten::index.Tensor' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
```

Just do what it says:

    export PYTORCH_ENABLE_MPS_FALLBACK=1

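That export only lasts for the current terminal session. To set it every time, append it to your shell profile (macOS defaults to zsh):

    echo 'export PYTORCH_ENABLE_MPS_FALLBACK=1' >> ~/.zshrc
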
### "Could not build wheels for tokenizers"

I have not seen this error because I had Rust installed on my computer
before I started playing with Stable Diffusion. The fix is to install
Rust.

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

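When the installer finishes, open a new terminal, or load Rust into the current one (the installer prints this suggestion too):

    source "$HOME/.cargo/env"
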
### How come `--seed` doesn't work?

> Completely reproducible results are not guaranteed across PyTorch
> releases, individual commits, or different platforms. Furthermore,
> results may not be reproducible between CPU and GPU executions, even
> when using identical seeds.

[PyTorch docs](https://pytorch.org/docs/stable/notes/randomness.html)

There is an [open issue](https://github.com/pytorch/pytorch/issues/78035)
(as of August 2022) in PyTorch regarding gradient inconsistency. I am
guessing that's what is causing this.

### libiomp5.dylib error?

    OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.

There are several things you can do. First, you could use something
besides Anaconda, like miniforge. I read a lot of things online telling
people to use something else, but I am stuck with Anaconda for other
reasons.

Or you can try this.

    export KMP_DUPLICATE_LIB_OK=True

Or this (which takes forever on my computer and didn't work anyway).

    conda install nomkl

This error happens with Anaconda on Macs, and
[nomkl](https://stackoverflow.com/questions/66224879/what-is-the-nomkl-python-package-used-for)
is supposed to fix the issue (it isn't a module but a fix of some
sort). [There are more
suggestions](https://stackoverflow.com/questions/53014306/error-15-initializing-libiomp5-dylib-but-found-libiomp5-dylib-already-initial),
like uninstalling tensorflow and reinstalling. I haven't tried them.

### Not enough memory

This seems to be a common problem and is probably the underlying cause
of a lot of the symptoms listed below. The fix is to lower your image
size or to add `model.half()` right after the model is loaded. I
should probably test it out. I've read that the reason this fixes
problems is that it converts the model from 32-bit to 16-bit floats,
which leaves more RAM for other things. I have no idea how that would
affect the quality of the images, though.

See [this issue](https://github.com/CompVis/stable-diffusion/issues/71).

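For reference, here's roughly where that change would go (a sketch only, assuming your copy of `scripts/txt2img.py` still loads the model through `load_model_from_config()` the way the upstream repo does):

```
# in scripts/txt2img.py, right after the checkpoint is loaded
model = load_model_from_config(config, f"{opt.ckpt}")
model = model.half()  # convert weights to 16-bit floats to free up RAM
```
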
### "Error: product of dimension sizes > 2**31"

This error happens with img2img, which I haven't played with too much
yet. But I know it's because your image is too big or the resolution
isn't a multiple of 32x32. Because the stable-diffusion model was
trained on images that were 512 x 512, it's always best to use that
output size (which is the default). However, if you're using that size
and you get the above error, try 256 x 256 or 512 x 256 or something
as the source image.

BTW, 2**31-1 = [2,147,483,647](https://en.wikipedia.org/wiki/2,147,483,647#In_computing), which is the maximum value of a 32-bit signed integer ([INT_MAX](https://en.wikipedia.org/wiki/C_data_types) in C).

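On macOS you can shrink a source image from the terminal with the built-in `sips` tool (the filenames here are just placeholders):

    sips -Z 512 input.png --out input-small.png

`-Z` resamples so neither dimension exceeds 512 while preserving the aspect ratio; you may still need to crop so both dimensions are multiples of 32.
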
### I just got Rickrolled! Do I have a virus?

You don't have a virus. It's part of the project. Here's
[Rick](https://github.com/lstein/stable-diffusion/blob/main/assets/rick.jpeg)
and here's [the
code](https://github.com/lstein/stable-diffusion/blob/69ae4b35e0a0f6ee1af8bb9a5d0016ccb27e36dc/scripts/txt2img.py#L79)
that swaps him in. It's an NSFW filter, which, IMO, doesn't work very
well (and we call this "computer vision", sheesh).

Actually, this could be happening because there's not enough RAM. You could try the `model.half()` suggestion or specify smaller output images.

### My images come out black

I haven't solved this issue. I just throw away my black
images. There's a [similar
issue](https://github.com/CompVis/stable-diffusion/issues/69) on CUDA
GPUs where the images come out green. Maybe it's the same issue?
Someone in that issue says to use `--precision full`, but this fork
actually disables that flag. I don't know why; someone else provided
that code and I don't know what it does. Maybe the `model.half()`
suggestion above would fix this issue too. I should probably test it.

### "view size is not compatible with input tensor's size and stride"

```
  File "/opt/anaconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2511, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```

Update to the latest version of lstein/stable-diffusion. We were
patching PyTorch, but we found a file in stable-diffusion that we
could change instead. This is a 32-bit vs 16-bit problem.

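Updating just means pulling the latest changes from inside your clone:

    git pull
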
### The processor must support the Intel bla bla bla

What? Intel? On an Apple Silicon Mac?

    Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
    The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
    The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
    The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.

This fixed it for me:

    conda clean --yes --all

### Still slow?

I changed the defaults of n_samples and n_iter to 1 so that it uses
less RAM and makes fewer images, which makes it faster the first time
you use it. I don't actually know what n_samples does internally, but
I know it consumes a lot more RAM. The n_iter flag just loops around
the image creation code, so it shouldn't consume more RAM (it should
be faster if you're going to do multiple images because the libraries
and model will already be loaded; use a prompt file to get this speed
boost, as shown below).

These flags are the default sample and iter settings in this fork/branch:

```
python scripts/txt2img.py --prompt "ocean" --n_samples=1 --n_iter=1
```

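If your copy of txt2img.py still has the upstream `--from-file` flag (an assumption on my part; check `python scripts/txt2img.py --help`), you can feed it one prompt per line and pay the model-loading cost only once:

```
printf 'ocean\nforest\n' > prompts.txt
python scripts/txt2img.py --from-file prompts.txt --n_samples=1 --n_iter=1
```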