Description
When using the GGUF version of Wan2.2 I2V, the first run works normally, but the second run consumes significantly more memory, which looks like a memory leak. The same issue occurs with the GGUF version of Qwen Image Edit: memory usage increases with each loop iteration until it eventually causes an out-of-memory (OOM) error.
Based on my observation, the cause might be the following: when GPU memory is insufficient, the model is offloaded to CPU memory (system RAM), and when it is loaded back onto the GPU for the second run, the copy in system RAM is not released. I'm not certain this is the root cause, or how to resolve it. Could someone please offer some suggestions or guidance? Thank you very much.
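To help narrow this down, here is a minimal sketch for logging memory after each run, so it is easier to see whether system RAM or GPU memory is what grows. It assumes Linux (it reads `/proc/self/statm`, with a peak-RSS fallback elsewhere) and treats PyTorch as optional. `run_once` is a hypothetical placeholder for whatever triggers a single generation; it is not part of any library.

```python
import gc
import os
import sys

try:
    import torch  # optional: only used to report CUDA memory
except ImportError:
    torch = None


def rss_mib():
    """Return the resident set size of this process in MiB."""
    try:
        # Linux: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20
    except OSError:
        # Fallback: *peak* RSS, so it can only show growth, never a drop.
        import resource
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS and in KiB on Linux.
        return peak / 2**20 if sys.platform == "darwin" else peak / 1024


def log_memory_across_runs(run_once, iterations=3):
    """Call run_once() repeatedly and record memory after each call."""
    readings = []
    for i in range(iterations):
        run_once()  # hypothetical: one full generation
        gc.collect()  # drop unreachable Python objects first
        entry = {"iteration": i, "rss_mib": rss_mib()}
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
            entry["cuda_allocated_mib"] = torch.cuda.memory_allocated() / 2**20
        readings.append(entry)
        print(entry)
    return readings
```

If `rss_mib` keeps climbing across iterations while `cuda_allocated_mib` stays roughly flat, that would be consistent with the system RAM copy not being released. Note that `gc.collect()` and `torch.cuda.empty_cache()` cannot free memory that is still referenced somewhere, so this is a diagnostic rather than a fix.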