Models tested:

- https://huggingface.co/dynasoulstudio/stable-diffusion-v1-5-fp16-onnx
- https://huggingface.co/dynasoulstudio/stable-diffusion-2-1-fp16-onnx
When running both Stable Diffusion 1.5 and 2.1 as fp16 ONNX models on an Android device with a Snapdragon 8 Gen 1 chipset, I get about 20 seconds per step (roughly 10 seconds per UNet run, times two for the positive and negative prompts) when running single batches, i.e. the positive and negative prompts separately. If I batch the positive and negative prompts together so they run at the same time, it actually gets slower, at about 35 seconds per step. The models also take a lot of memory: the UNet is less than 3 GB on disk, but when loaded alone it uses up almost all of the memory available on my 16 GB device. Since generation takes so long, it also produces a lot of heat, which is sure to be a problem on phones without a built-in fan.
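For context, this is roughly the shape of the two execution patterns I'm comparing. A minimal sketch only: the input names ("sample", "timestep", "encoder_hidden_states"), the int64 timestep, and the 768-dim text embeddings are assumed from a typical diffusers ONNX export of SD 1.5 and may differ for other exports (SD 2.1 uses 1024-dim embeddings):

```python
import numpy as np
import onnxruntime as ort

# Sketch of the two ways of running classifier-free guidance per step.
sess = ort.InferenceSession("unet/model.onnx", providers=["CPUExecutionProvider"])

latents = np.random.randn(1, 4, 64, 64).astype(np.float16)
timestep = np.array([981], dtype=np.int64)
cond = np.random.randn(1, 77, 768).astype(np.float16)    # positive-prompt embedding
uncond = np.random.randn(1, 77, 768).astype(np.float16)  # negative-prompt embedding

# Pattern 1: two single-batch UNet runs per step (~10 s each on my device).
noise_cond = sess.run(None, {"sample": latents, "timestep": timestep,
                             "encoder_hidden_states": cond})[0]
noise_uncond = sess.run(None, {"sample": latents, "timestep": timestep,
                               "encoder_hidden_states": uncond})[0]

# Pattern 2: one batch-of-2 run per step (expected to be faster, but ~35 s for me).
noise_both = sess.run(None, {
    "sample": np.concatenate([latents, latents], axis=0),
    "timestep": timestep,
    "encoder_hidden_states": np.concatenate([uncond, cond], axis=0),
})[0]
```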
I've spent a couple of days trying to quantize the models, but I'm working on a laptop that is slower than my phone and has no GPU, and I keep running into issues getting it to work. The quantization tools in the ONNX libraries often break the model, so I must be using them wrong or making incorrect assumptions.
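What I've been attempting is essentially the standard dynamic quantization path from onnxruntime.quantization; a rough sketch with placeholder paths (I'm not certain this is the right recipe for a model of this size, which may be part of my problem):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="unet/model.onnx",
    model_output="unet/model_quant.onnx",
    weight_type=QuantType.QUInt8,     # 8-bit weights, activations stay float
    use_external_data_format=True,    # the UNet is over the 2 GB protobuf limit
)
```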
What's the preferred way to prepare Stable Diffusion for local inference on Android? Does the model need to be in the ORT format instead of ONNX?
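By ORT format I mean the output of the converter tool that ships with ONNX Runtime; as far as I understand, the invocation is something along these lines, with the model directory as a placeholder:

```
python -m onnxruntime.tools.convert_onnx_models_to_ort unet/
```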
Edit: Looking into this:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/onnx_model_unet.py
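If I'm reading that script right, it is driven through the onnxruntime.transformers optimizer. A minimal sketch of how I believe it is meant to be used for the UNet (the model_type value, the fp16 conversion call, and the paths are my assumptions, not something I've verified end to end):

```python
from onnxruntime.transformers import optimizer

# Fuse attention/group-norm patterns in the exported UNet graph.
opt_model = optimizer.optimize_model(
    "unet/model.onnx",
    model_type="unet",
    opt_level=0,  # rely on the offline graph fusions only
)

# Convert weights to fp16 while keeping the graph inputs/outputs as exported.
opt_model.convert_float_to_float16(keep_io_types=True)
opt_model.save_model_to_file("unet/model_opt.onnx", use_external_data_format=True)
```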
Edit 2: I was able to convert the model, but it would need support for the NhwcConv node on Android. I believe the chipset might support that layout for convolutions, but the Android build of ONNX Runtime that is currently released does not support it.
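For anyone trying to reproduce this, a simple way to list which operator types the converted model contains (and spot contrib ops like NhwcConv that the released Android build may not register) is to count node types with the standard onnx Python API; paths are placeholders:

```python
import collections
import onnx

# Load only the graph structure; skip the multi-GB external weight files.
model = onnx.load("unet/model_opt.onnx", load_external_data=False)

op_counts = collections.Counter(node.op_type for node in model.graph.node)
for op, count in sorted(op_counts.items()):
    print(f"{op}: {count}")
```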