It looks like there is a PR well underway upstream to make MMQ the default when the optimization is available.
-
I recently ran experiments on all of my NVIDIA cards and got better results, with lower memory usage, when using MMQ.
With MMQ I can now offload more layers, and even when offloading the same number of layers I still see a slight performance improvement.
I believe this improvement comes from the upstream PR.
Have you had a similar experience? Do you think it's time to update the MMQ documentation?
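In case it helps anyone reproduce this, here is a minimal sketch using the llama-cpp-python bindings. The model path and layer count are placeholders, and the `mul_mat_q` parameter assumes a bindings version from the period when MMQ was still an explicit toggle rather than the default:

```python
from llama_cpp import Llama

# Minimal sketch: enable the MMQ (quantized mat-mul) kernels and offload
# layers to the GPU. model_path and n_gpu_layers are placeholders; raise
# n_gpu_layers until VRAM runs out, since MMQ's lower memory footprint is
# what lets you fit more layers.
llm = Llama(
    model_path="models/model.Q4_K_M.gguf",  # hypothetical model file
    n_gpu_layers=40,       # number of layers to offload; tune per card
    mul_mat_q=True,        # use MMQ kernels (assumes a bindings version
                           # where this is still exposed as a toggle)
)

# Quick smoke test that the configuration loads and generates.
out = llm("Q: What does MMQ change in llama.cpp? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

With an older build you can compare VRAM usage and tokens per second with `mul_mat_q=True` versus `False` at the same `n_gpu_layers` to see the effect described above.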