Skip to content

Commit cf16df8

Browse files
authored
Merge pull request #348 from lamtrinhdev/Update_README.md
Correct wording in README.md
2 parents c891f43 + cf4e1e8 commit cf16df8

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -194,7 +194,7 @@ MBP builds a per-layer "palettization recipe" by picking a suitable number of bi
194194

195195
For example, the original float16 [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model has an ~82 dB signal strength. Naively applying [linear 8-bit quantization](https://coremltools.readme.io/docs/data-free-quantization) to the Unet model drops the signal to ~65 dB. Instead, applying MBP yields an average of 2.81-bits quantization while maintaining a signal strength of ~67 dB. This technique generally yields better results compared to using `--quantize-nbits` during model conversion but requires a "pre-analysis" run that takes up to a few hours on a single GPU (`mps` or `cuda`).
196196

197-
Here is the signal strength (PSNR in dB) versus model size reduction (% of float16 size) for `stabilityai/stable-diffusion-xl-base-1.0`. The `{1,2,4,6,8}-bit` curves are generated by progresssively palettizing more layers using a palette with fixed number of bits. The layers were ordered in ascending order of their isolated impact to end-to-end signal strength so the cumulative compression's impact is delayed as much as possible. The mixed-bit curve is based on falling back to a higher number of bits as soon as a layer's isolated impact to end-to-end signal integrity drops below a threshold. Note that all curves based on palettization outperform linear 8-bit quantization at the same model size except for 1-bit.
197+
Here is the signal strength (PSNR in dB) versus model size reduction (% of float16 size) for `stabilityai/stable-diffusion-xl-base-1.0`. The `{1,2,4,6,8}-bit` curves are generated by progressively palettizing more layers using a palette with fixed number of bits. The layers were ordered in ascending order of their isolated impact to end-to-end signal strength so the cumulative compression's impact is delayed as much as possible. The mixed-bit curve is based on falling back to a higher number of bits as soon as a layer's isolated impact to end-to-end signal integrity drops below a threshold. Note that all curves based on palettization outperform linear 8-bit quantization at the same model size except for 1-bit.
198198

199199
<img src="assets/mbp/stabilityai_stable-diffusion-xl-base-1.0_psnr_vs_size.png" width="640">
200200

0 commit comments

Comments
 (0)