Just an idea.
It's not a problem or anything...
I've been using a custom offload for my potato GPU. Maybe there is another way to do it...

In short, I've been using sequential offloading for a long time. When I enable it, it uses a minimal amount of VRAM, but I know it could use more VRAM to do less I/O. So I created a mixin for partial CPU offload, where the model keeps several layers on the GPU and offloads only the rest.
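For context, the baseline I'm comparing against is the stock sequential offload in diffusers, which keeps everything on CPU and copies each submodule to the GPU only for its own forward pass (minimal sketch; the model id is just a placeholder):

```python
import torch
from diffusers import DiffusionPipeline

# Baseline: diffusers' built-in sequential offload. VRAM use is minimal,
# but every denoising step pays the full CPU<->GPU transfer cost for
# every submodule.
pipe = DiffusionPipeline.from_pretrained(
    "some/model-id",                  # placeholder model id
    torch_dtype=torch.float16,
)
pipe.enable_sequential_cpu_offload()  # stock diffusers API
image = pipe("a photo of a potato").images[0]
```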
See code here: https://gist.github.com/rodjjo/20e2e842fea9ed58114adb560a4566b6
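The gist has the full mixin; the core idea is roughly this (a minimal sketch using plain PyTorch forward hooks, not the gist's actual code; `PartialOffloadMixin`, `keep_on_gpu`, and the hook names are all placeholders):

```python
import torch
from torch import nn

class PartialOffloadMixin:
    """Sketch of partial sequential offload: the first `keep_on_gpu`
    blocks stay resident on the GPU; the rest live on CPU and are
    streamed in and out around their own forward pass."""

    def enable_partial_offload(self, blocks: nn.ModuleList,
                               keep_on_gpu: int, device: str = "cuda"):
        gpu = torch.device(device)
        for i, block in enumerate(blocks):
            if i < keep_on_gpu:
                block.to(gpu)    # resident layers: no per-step I/O
            else:
                block.to("cpu")  # offloaded layers: streamed on demand
                self._attach_streaming_hooks(block, gpu)

    @staticmethod
    def _attach_streaming_hooks(block: nn.Module, gpu: torch.device):
        def load(module, args):
            module.to(gpu)       # copy weights in just before the block runs

        def evict(module, args, output):
            module.to("cpu")     # free VRAM right after the block runs

        block.register_forward_pre_hook(load)
        block.register_forward_hook(evict)
```

The trade-off is just `keep_on_gpu`: the more blocks stay resident, the fewer transfers per step, at the cost of more VRAM.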
It saves me 12 to 13 seconds of inference in zimage turbo (my custom pipeline with this partial-layer offloading):
Before (normal sequential offloading): (timing screenshot)

After (partial sequential offloading): (timing screenshot)