| iPhone 14 Pro Max |`CPU_AND_NE`|`SPLIT_EINSUM_V2`| 7.9 | 2.69 |
| iPad Pro (M1) |`CPU_AND_NE`|`SPLIT_EINSUM_V2`| 11.2 | 2.19 |
| iPad Pro (M2) |`CPU_AND_NE`|`SPLIT_EINSUM_V2`| 7.0 | 3.07 |
<details>
<summary> Details (Click to expand) </summary>
- This benchmark was conducted by Apple using public beta versions of iOS 17.0, iPadOS 17.0 and macOS 14.0 Seed 8 in August 2023.
- The performance data was collected by running the `StableDiffusion` Swift pipeline.
- Swift code is not fully optimized, introducing up to ~10% overhead unrelated to Core ML model execution.
- The median latency value across 5 back-to-back end-to-end executions is reported.
- The image generation procedure follows the standard configuration: 20 inference steps, 512x512 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for unet).
- The actual prompt length does not impact performance because the Core ML model is converted with a static shape that computes the forward pass for all of the 77 elements (`tokenizer.model_max_length`) in the text token sequence regardless of the actual length of the input text.
- Weights are compressed to 6-bit precision. Please refer to [this section](#compression) for details.
- In the benchmark table, we report the best performing `--compute-unit` and `--attention-implementation` values per device. The former does not modify the Core ML model and can be applied during runtime. The latter modifies the Core ML model. Note that the best performing compute unit is model version and hardware-specific.
- Note that the performance optimizations in this repository (e.g. `--attention-implementation`) are generally applicable to Transformers and not customized to Stable Diffusion. Better performance may be observed upon custom kernel tuning. Therefore, these numbers do not represent **peak** HW capability.
- Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
- Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state.

</details>
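The benchmarking procedure described above (reporting the median wall-clock latency of several back-to-back end-to-end runs) can be sketched as follows. This is an illustrative helper, not the harness Apple used; `median_latency` and the dummy workload are hypothetical names.

```python
import statistics
import time

def median_latency(fn, runs=5):
    """Run fn back-to-back `runs` times and return the median wall-clock latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    # The median is robust to a single slow run (e.g. a thermally throttled outlier).
    return statistics.median(latencies)

# Dummy workload standing in for an end-to-end image generation call:
print(f"median latency: {median_latency(lambda: sum(range(100_000))):.4f} s")
```

Back-to-back runs also capture steady-state behavior (warm caches, compiled model already loaded), which is why the first run is not treated specially here.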
coremltools-7.0 supports advanced weight compression techniques for [pruning](https://coremltools.readme.io/v7.0/docs/pruning), [palettization](https://coremltools.readme.io/v7.0/docs/palettization-overview) and [linear 8-bit quantization](https://coremltools.readme.io/v7.0/docs/quantization-aware-training). For these techniques, `coremltools.optimize.torch.*` includes APIs that require fine-tuning to maintain accuracy at higher compression rates whereas `coremltools.optimize.coreml.*` includes APIs that are applied post-training and are data-free.
We demonstrate how data-free [post-training palettization](https://coremltools.readme.io/v7.0/docs/post-training-palettization) implemented in `coremltools.optimize.coreml.palettize_weights` enables us to achieve greatly improved performance for Stable Diffusion on mobile devices. This API implements the [Fast Exact k-Means](https://arxiv.org/abs/1701.07204) algorithm for optimal weight clustering, which yields more accurate palettes. Using `--quantize-nbits {2,4,6,8}` during [conversion](#converting-models-to-coreml) applies this compression to the unet and text_encoder models.
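To build intuition for what n-bit palettization does, here is a simplified, self-contained sketch using plain Lloyd's k-means on 1-D weights in NumPy. It is not the Fast Exact k-Means algorithm used by `coremltools`, and `palettize` is a hypothetical name; the idea is the same: cluster all weights into a palette of 2^nbits centroids, then store only the centroid index per weight.

```python
import numpy as np

def palettize(weights, nbits, iters=25):
    """Cluster weights into 2**nbits centroids (a palette) with plain 1-D k-means,
    then replace each weight by its nearest centroid."""
    flat = weights.ravel().astype(np.float64)
    k = 2 ** nbits
    # Initialize centroids at evenly spaced quantiles of the weight distribution.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign every weight to its nearest centroid, then recenter each cluster.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = flat[idx == j]
            if members.size:  # leave empty clusters where they are
                centroids[j] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[idx].reshape(weights.shape), centroids

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
w6, palette = palettize(w, nbits=6)
print(palette.size, np.abs(w - w6).max())
```

With 6 bits the palette has only 64 distinct values, yet the per-weight error stays small because the centroids adapt to the weight distribution; this is why palettization degrades accuracy far less than naive uniform quantization at the same bit width.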
Resources:
- [Core ML Tools Docs: Optimizing Models](https://coremltools.readme.io/v7.0/docs/optimizing-models)
- [WWDC23 Session Video: Use Core ML Tools for machine learning model compression](https://developer.apple.com/videos/play/wwdc2023/10047)