GGUF appears to run much slower compared to same version of GGML model #959
              
J-Scott-Dav asked this question in Q&A
Unanswered · Replies: 0 comments
  
I am comparing the performance of similar models pulled from HuggingFace. Using the prompt "Name the planets in the solar system?", I measured how long each model took to answer and found a striking difference. Could someone comment on the observations below? Is this a fair comparison? Should differences this large be expected, or am I doing something wrong?
Model name: wizardlm-13b-v1.1-superhot-8k-ggmlv3.q4_0.bin
File size: 7.1 GB
llama-cpp-python version: 0.1.78
Time to answer question: 7.9 seconds

Model name: wizardlm-13b-v1.2.ggmlv3.q4_0.bin
File size: 7.1 GB
llama-cpp-python version: 0.1.78
Time to answer question: 15.6 seconds

Model name: wizardlm-13b-v1.2.q4_k_m.gguf
File size: 7.6 GB
llama-cpp-python version: 0.2.12
Time to answer question: 64.4 seconds
Any comments/suggestions would be appreciated.
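
The timings above are wall-clock time per answer, which also depends on how many tokens each model happens to generate. A minimal sketch of a more comparable measurement, reporting tokens per second over several runs, might look like the following. The helper names `benchmark` and `make_llama_generator` are hypothetical and not part of llama-cpp-python. The sketch assumes the `Llama` constructor and the `usage` field of the completion result behave the same way in both library versions, which has not been verified here.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(generate: Callable[[str], int], prompt: str, runs: int = 3) -> Dict[str, float]:
    """Time generate(prompt) several times.

    `generate` is any callable that runs the model on the prompt and
    returns the number of tokens it produced (hypothetical interface).
    """
    seconds: List[float] = []
    tokens_per_second: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        seconds.append(elapsed)
        tokens_per_second.append(n_tokens / elapsed if elapsed > 0 else float("inf"))
    return {
        "median_seconds": statistics.median(seconds),
        "median_tokens_per_second": statistics.median(tokens_per_second),
    }


def make_llama_generator(model_path: str, max_tokens: int = 128, **llama_kwargs) -> Callable[[str], int]:
    """Wrap a llama-cpp-python model as a `generate` callable for benchmark()."""
    # Imported lazily so the timing harness can be used without the library installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, **llama_kwargs)

    def generate(prompt: str) -> int:
        # Cap the output length so every model is asked for the same amount of work.
        result = llm(prompt, max_tokens=max_tokens)
        return result["usage"]["completion_tokens"]

    return generate
```

For example, `benchmark(make_llama_generator("wizardlm-13b-v1.2.q4_k_m.gguf"), "Name the planets in the solar system?")` would time the GGUF file. Passing the same constructor settings (such as `n_threads` and `n_gpu_layers`) for every model keeps the runs comparable. Note that the measurements above also differ in quantization type (q4_0 versus q4_k_m) and library version, so more than one variable changes between them.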