run_localGPT.py runs but very slow - How to run it faster? #231
-
I have an NVIDIA GeForce GTX 1060 with 6 GB of VRAM. My OS is Ubuntu 22.04, Python 3.10.11, 32 GB of system RAM.
-
Your GPU is probably not being used at all, which would explain the slow answering speed. You are running a 7-billion-parameter model without quantization, so with 16-bit weights (2 bytes each) the model alone is 14 GB. Since your GPU has only 6 GB, it will not be able to hold any reasonably sized model. For example, I have a 3070 with 8 GB, and even with the 2-bit quantized version of a 7-billion-parameter model (which probably has very low quality) I run out of GPU RAM, because cuBLAS requires extra space on top of the weights.
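To see how the 14 GB figure comes about, here is a minimal sketch of the weight-memory arithmetic at different quantization levels. The parameter count and bit widths are illustrative, and the estimate covers the weights only, not the KV cache or the cuBLAS workspace overhead mentioned above:

```python
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Size of the weights alone in GB; ignores activations and cuBLAS workspace."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 7e9  # assumed 7-billion-parameter model
for bits in (16, 8, 4, 2):
    print(f"{bits:2d}-bit weights: {model_size_gb(n_params, bits):.2f} GB")
```

At 16 bits this gives the 14 GB quoted above; even a 4-bit quantization (~3.5 GB) leaves little headroom on a 6 GB card once activations and library workspaces are counted.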