Replies: 2 comments 1 reply
-
This sounds like a llama.cpp problem, not an LMQL issue. Make sure you install llama.cpp with the following command:
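(The original comment does not include the command itself. The usual CUDA-enabled install of the llama-cpp-python backend looks roughly like the line below; the exact CMake flag is an assumption and depends on your llama-cpp-python version, with older releases using -DLLAMA_CUBLAS=on and newer ones -DGGML_CUDA=on.)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir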
-
Yes, it turns out it is a llama-cpp-python problem, because it works with llama.cpp directly. Maybe they will fix it in a future release; they are always far behind llama.cpp.
-
Hello, when trying to use the model through serve-model, the model always gets reloaded on the CPU instead of using the model already loaded on the GPU.
I am able to load my model locally, using all 3 GPUs, with this code:
import lmql
query_string = '''
"Q: In one word, what is the capital of {country}? \n"
"A: [CAPITAL] \n"
"Q: What is the main sight in {CAPITAL}? \n"
"A: [ANSWER]" where (len(TOKENS(CAPITAL)) < 10)
and (len(TOKENS(ANSWER)) < 200) and STOPS_AT(CAPITAL, '\n')
and STOPS_AT(ANSWER, '\n')
'''
print(lmql.run_sync(query_string,
    country="united kingdom",
    model=lmql.model("local:llama.cpp:/home/ebudmada/llama.cpp/phind-codellama-34b-v2.Q8_0.gguf",
        cuda=True,
        n_ctx=512,
        n_gpu_layers=-1,
        tokenizer='Phind/Phind-CodeLlama-34B-v2')).variables)
But I am unable to use the serve-model instance. I start it from the terminal with:
lmql serve-model llama.cpp:/home/ebudmada/llama.cpp/phind-codellama-34b-v2.Q8_0.gguf --cuda --n_ctx=512 n_gpu_layer=-1 --trust_remote_code True
[Serving LMTP endpoint on ws://localhost:8080/]
I see a little GPU RAM being used (276 MB on each of the 3 GPUs),
but when I try to use the model from my Python script, it always loads a new model on the CPU. I have tried a lot of things, but essentially I am using:
print(lmql.run_sync(query_string,
    country="united kingdom",
    model=lmql.model("llama.cpp:/home/ebudmada/llama.cpp/phind-codellama-34b-v2.Q8_0.gguf",
        cuda=True,
        n_ctx=512,
        n_gpu_layers=-1,
        endpoint="localhost:8080",
        tokenizer='Phind/Phind-CodeLlama-34B-v2')).variables)
Can anyone help me? Thank you.
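For comparison, a minimal client call against the already-running LMTP server would look roughly like the sketch below. It assumes the serve-model process at localhost:8080 already owns the weights, so the client passes only the model name, endpoint, and tokenizer and leaves the loading arguments (cuda, n_ctx, n_gpu_layers) to the server; this is a sketch of that assumption, not a confirmed fix.
import lmql

result = lmql.run_sync(query_string,
    country="united kingdom",
    # assumption: the serve-model process has already loaded the weights on GPU,
    # so no cuda/n_ctx/n_gpu_layers options are passed on the client side
    model=lmql.model("llama.cpp:/home/ebudmada/llama.cpp/phind-codellama-34b-v2.Q8_0.gguf",
        endpoint="localhost:8080",
        tokenizer='Phind/Phind-CodeLlama-34B-v2'))
print(result.variables)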