This repository comprises:

- `python_coreml_stable_diffusion`, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face [diffusers](https://github.com/huggingface/diffusers) in Python
- `StableDiffusion`, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by `python_coreml_stable_diffusion`
If you run into issues during installation or runtime, please refer to the [FAQ](#faq) section. Please refer to the [System Requirements](#system-requirements) section before getting started.
## <a name="example-results"></a> Example Results
Please see the [Important Notes on Performance Benchmarks](#important-notes-on-performance-benchmarks) section for details.
## <a name="system-requirements"></a> System Requirements
The following is recommended to use all the functionality in this repository:

Python | macOS | Xcode | iPadOS, iOS |
:------:|:------:|:------:|:------:|
3.8 | 13.1 | 14.2 | 16.2 |
## <a name="converting-models-to-coreml"></a> Converting Models to Core ML
**WARNING:** This command will download several GB worth of PyTorch checkpoints from Hugging Face. Please ensure that you are on Wi-Fi and have enough disk space.

This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful execution, the 4 neural network models that comprise Stable Diffusion will have been converted from PyTorch to Core ML (`.mlpackage`) and saved into the specified `<output-mlpackages-directory>`. Some additional notable arguments (an example invocation combining several of them follows this list):
- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `<output-mlpackages-directory>/Resources`, which should be provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline.
- `--chunk-unet`: Splits the Unet model into two approximately equal chunks (each with less than 1GB of weights) for mobile-friendly deployment. This is **required** for Neural Engine deployment on iOS and iPadOS but not for macOS. The Swift CLI is able to consume both the chunked and regular versions of the Unet model but prioritizes the former. Note that the chunked Unet is not compatible with the Python pipeline because the Python pipeline is intended for macOS only; chunking is for on-device deployment with Swift only.
- `--attention-implementation`: Defaults to `SPLIT_EINSUM`, which is the implementation described in [Deploying Transformers on the Apple Neural Engine](https://machinelearning.apple.com/research/neural-engine-transformers). `--attention-implementation ORIGINAL` switches to an alternative that should be used for CPU or GPU deployment. Please refer to the [Performance Benchmark](#performance-benchmark) section for further guidance.
- `--check-output-correctness`: Compares the original PyTorch model's outputs to the final Core ML model's outputs. This flag increases RAM consumption significantly, so it is recommended only for debugging purposes.
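
For reference, a conversion run that combines several of these flags might look like the sketch below (the `--convert-*` flags for the individual model components and the `-o` output flag are assumed names; run the module with `--help` for the authoritative argument list):

```shell
# Sketch of a conversion invocation; per-model flag names are assumed,
# so verify them with --help before running.
python -m python_coreml_stable_diffusion.torch2coreml \
    --convert-unet --convert-text-encoder --convert-vae-decoder \
    --bundle-resources-for-swift-cli \
    --attention-implementation SPLIT_EINSUM \
    -o <output-mlpackages-directory>
# Add --chunk-unet only when targeting iOS/iPadOS deployment.
```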
</details>
## <a name="image-gen-swift"></a> Image Generation with Swift
<details>
<summary> Click to expand </summary>
### <a name="swift-requirements"></a> System Requirements
**Building** (recommended):

- Xcode 14.2
- Command Line Tools for Xcode 14.2

Check [developer.apple.com](https://developer.apple.com/download/all/?q=xcode) for the latest versions.

**Running** (minimum):

| Mac | iPad\* | iPhone\* |
|:------:|:------:|:------:|
| macOS 13.1 | iPadOS 16.2 | iOS 16.2 |
| M1 | M1 | iPhone 12 Pro |

You will also need the resources generated by the `--bundle-resources-for-swift-cli` option described in [Converting Models to Core ML](#converting-models-to-coreml).

\* Please see [FAQ](#faq) [Q6](#q-mobile-app) regarding deploying on iPad and iPhone.
### Example CLI Usage
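
A rough sketch of a CLI run (the executable name and flags below are assumptions; consult the Swift CLI's `--help` output for the authoritative usage):

```shell
# Assumed sample executable and flags; verify with: swift run StableDiffusionSample --help
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" \
    --resource-path <output-mlpackages-directory>/Resources/ \
    --seed 93 \
    --output-path </path/to/output/image>
```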

Please use the `--help` flag to learn about batched generation and more.
### Example Library Usage

```swift
import StableDiffusion
...
let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL)
pipeline.loadResources()
let image = try pipeline.generateImages(prompt: prompt, seed: seed).first
```
On iOS, the `reduceMemory` option should be set to `true` when constructing `StableDiffusionPipeline`.
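
For example, a minimal sketch (the exact `StableDiffusionPipeline` initializer signature, including where `reduceMemory` appears among its parameters, is an assumption):

```swift
// `reduceMemory` is assumed to be an initializer parameter; check the
// StableDiffusionPipeline API in the package for the exact signature.
let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL,
                                           reduceMemory: true)
pipeline.loadResources()
```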
### Swift Package Details
- The image generation procedure follows the standard configuration: 50 inference steps, 512x512 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for unet).
- The actual prompt length does not impact performance because the Core ML model is converted with a static shape that computes the forward pass for all of the 77 elements (`tokenizer.model_max_length`) in the text token sequence regardless of the actual length of the input text.
- Pipelining across the 4 models is not optimized and these performance numbers are subject to variance under increased system load from other applications. Given these factors, we do not report sub-second variance in latency.
- Weights and activations are in float16 precision for both the GPU and the Neural Engine.
- The Swift CLI program consumes a peak memory of approximately 2.6GB (without the safety checker), 2.1GB of which is model weights in float16 precision. We applied [8-bit weight quantization](https://coremltools.readme.io/docs/compressing-ml-program-weights#use-affine-quantization) to reduce peak memory consumption by approximately 1GB. However, we observed that it had an adverse effect on generated image quality and we rolled it back. We encourage developers to experiment with other advanced weight compression techniques such as [palettization](https://coremltools.readme.io/docs/compressing-ml-program-weights#use-a-lookup-table) and/or [pruning](https://coremltools.readme.io/docs/compressing-ml-program-weights#use-sparse-representation) which may yield better results.
- In the [benchmark table](#performance-benchmark), we report the best performing `--compute-unit` and `--attention-implementation` values per device. The former does not modify the Core ML model and can be applied during runtime (see the sketch after this list). The latter modifies the Core ML model. Note that the best performing compute unit is specific to the model version and hardware.
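
For illustration, applying a compute unit at generation time might look like the sketch below (the `python_coreml_stable_diffusion.pipeline` module path and the `--prompt`, `-i`, `-o`, and `--seed` flags are assumptions; only `--compute-unit` and `--attention-implementation` are named above):

```shell
# Select a compute unit at runtime; this does not modify the converted Core ML model.
# Module path and most flags are assumed; check the package's --help output.
python -m python_coreml_stable_diffusion.pipeline \
    --prompt "a photo of an astronaut riding a horse on mars" \
    -i <output-mlpackages-directory> \
    -o </path/to/output> \
    --compute-unit CPU_AND_NE \
    --seed 93
```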
</details>
## <a name="faq"></a> FAQ
<details>
<summary> Click to expand </summary>
</details>
<details>
<summary> <b> <a name="low-mem-conversion"></a> Q3: </b> My Mac has 8GB RAM and I am converting models to Core ML using the example command. The process is getting killed because of memory issues. How do I fix this issue? </summary>
<b> A3: </b> In order to minimize the memory impact of the model conversion process, please execute the following command instead:
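
The idea is to convert the models one at a time, each in its own process, so that memory is released between conversions. A sketch, assuming the converter exposes per-model flags such as `--convert-unet` (verify the exact spellings with `--help`):

```shell
# Convert each model in a separate process to keep peak memory low.
# Per-model flag names are assumed; check --help for the exact spellings.
python -m python_coreml_stable_diffusion.torch2coreml --convert-vae-decoder -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-safety-checker -o <output-mlpackages-directory>
```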
<summary> <b> Q5: </b> Every time I generate an image using the Python pipeline, loading all the Core ML models takes 2-3 minutes. Is this expected? </summary>
<b> A5: </b> Yes, and using the Swift library reduces this to just a few seconds. The reason is that `coremltools` loads Core ML models (`.mlpackage`) and each model is compiled to run on the requested compute unit during load time. Because of the size and number of operations in the Unet model, it takes around 2-3 minutes to compile it for Neural Engine execution. Other models should take at most a few seconds. Note that `coremltools` does not cache the compiled model for later loads, so each load takes equally long. In order to benefit from compilation caching, the `StableDiffusion` Swift package relies by default on compiled Core ML models (`.mlmodelc`), which are compiled for the requested compute unit upon first load; the cache is then reused on subsequent loads until it is purged due to lack of use.

If you intend to use the Python pipeline in an application, we recommend initializing the pipeline once so that the load time is only incurred once. Afterwards, generating images using different prompts and random seeds will not incur the load time for the current session of your application.
</details>
<details>
<summary> <b> <a name="q-mobile-app"></a> Q6: </b> I want to deploy <code>StableDiffusion</code>, the Swift package, in my mobile app. What should I be aware of? </summary>
<b> A6: </b> The [Image Generation with Swift](#image-gen-swift) section describes the minimum SDK and OS versions as well as the device models supported by this package. We recommend carefully testing the package on the device with the least amount of RAM available among your deployment targets.

The image generation process in `StableDiffusion` can yield over 2 GB of peak memory during runtime depending on the compute units selected. On iPadOS, we recommend using `.cpuAndNeuralEngine` in your configuration and the `reduceMemory` option when constructing a `StableDiffusionPipeline` to minimize memory pressure.
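
A minimal sketch of that configuration, assuming the pipeline initializer accepts a Core ML `MLModelConfiguration` and a `reduceMemory` flag (the exact parameter names may differ across package versions):

```swift
import CoreML
import StableDiffusion

// Prefer the CPU + Neural Engine path to keep peak memory low on iPhone and iPad.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine  // Swift counterpart of coremltools CPU_AND_NE

// `configuration:` and `reduceMemory:` are assumed initializer parameters;
// check the StableDiffusionPipeline API for the exact signature.
let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL,
                                           configuration: config,
                                           reduceMemory: true)
pipeline.loadResources()
```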
If your app crashes during image generation, consider adding the [Increased Memory Limit](https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_developer_kernel_increased-memory-limit) capability to inform the system that some of your app’s core features may perform better by exceeding the default app memory limit on supported devices.

On iOS, depending on the iPhone model, the Stable Diffusion model version, the selected compute units, system load, and the design of your app, this may still not be sufficient to keep your app's peak memory under the limit. Please remember that because the device shares memory between apps and iOS processes, one app using too much memory can compromise the user experience across the whole device.
<b> A10: </b> This warning is safe to ignore in the context of this repository.
</details>
<details>
<summary> <b> Q11: </b> <code> TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect </code> </summary>
<b> A11: </b> This warning is safe to ignore in the context of this repository.
</details>
<details>
<summary> <b> Q12: </b> <code> UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown </code> </summary>
<b> A12: </b> If this warning is printed right after <code> zsh: killed python -m python_coreml_stable_diffusion.torch2coreml ... </code>, then it is highly likely that your Mac has run out of memory while converting models to Core ML. Please see [Q3](#low-mem-conversion) from above for the solution.