Something magic in here... #10

trinhdoduyhungss started this conversation in General
            Replies: 1 comment 7 replies
-
Please tell the model size and data type (FP16, FP32, Q4_0, Q4_1). This is a known issue: #8. I've bumped the memory this morning. Please try the latest commit and check if it works. If it does not work, then that's a bug in
  
-
I tried to run your code on my computer (Windows). Everything was smooth and worked well. Even though my local machine has only 16 GB RAM (just 8 GB available), the 14B Raven model still works (though slowly: about 2 minutes to answer "What is your name"). However... the magic started when I pushed the weights to my server (Linux) with 126 GB RAM (100 GB available) and a 32-core CPU, and I got this error:
```
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 10662867344, available 10662862009)
Segmentation fault (core dumped)
```
What is happening here?
I tried downloading the model again, converting it to ggml, and quantizing it one more time on the server, then running it again, but I still got the error above. Hmm, I have no idea...