Running Unsloth's quants on KTransformers running in K8S #995
maximeozenne asked this question in Q&A · Unanswered
Hello!
I bought a gaming computer some years ago, and I'm trying to use it to run LLMs locally. To be more precise, I want to use CrewAI.
I don't want to buy more GPUs just to run heavier models, so I'm trying to use KTransformers as my inference engine. If I understand correctly, it lets me run an LLM on a hybrid setup, split across GPU and RAM.
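As a rough sanity check that a Q4_K_M quant of Phi-4 fits my hardware at all, I did a back-of-envelope estimate (the parameter count and bits-per-weight below are assumptions, not measured values):

```python
# Rough estimate, all numbers approximate:
params = 14.7e9          # Phi-4 parameter count (~14B-class model)
bits_per_weight = 4.85   # commonly cited average for Q4_K_M quants
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.1f} GB of weights")  # ~8.9 GB, before KV cache/overhead
```

So the weights alone should fit either in the 4090's 24 GB of VRAM or in my 32 GB of system RAM, even before I buy more.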
I currently have an RTX 4090 and 32 GB of RAM. My motherboard and CPU can handle up to 192 GB of RAM, which I'm planning to buy if I can get my current test working. Here is what I've done so far:
I've set up a dual boot, so I'm running Ubuntu 24.04.2 on bare metal. No WSL.
I've set up microk8s to provide some functionality that KTransformers does not offer at the moment:
Now I'm trying to run Unsloth's quants of Phi-4, because I really like the Unsloth team's work and because they provide GGUFs, which I assume we can use with KTransformers? I've seen people running Unsloth's DeepSeek R1 quants on KTransformers, so I guess it should work with their other models too.
But I'm not able to run it. I don't know what I'm doing wrong.
I've tried two KTransformers images: 0.2.1 and latest-AVX2 (I have an i7-13700K, so I can't use the AVX512 version). Both failed: 0.2.1 is AVX512-only, and latest-AVX2 requires injecting an openai component, something I wanted to avoid. I'm assuming the images are correct, and that if something doesn't work, the fault is on my side.
So I'm now running v0.2.2rc2-AVX2, and it seems the problem comes from the model or the tokenizer.
I downloaded the Q4_K_M quant from Unsloth's phi-4 repo: https://huggingface.co/unsloth/phi-4-GGUF/tree/main
My first issue was a missing config.json, so I downloaded it, plus the other config files, from the official microsoft/phi-4 repo: https://huggingface.co/microsoft/phi-4/tree/main
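For reference, here is roughly how I assembled the model directory (a sketch using huggingface_hub; the local paths are just examples, and the GGUF filename should be checked against the Unsloth repo):

```python
from huggingface_hub import hf_hub_download, snapshot_download

MODEL_DIR = "/models/phi-4"  # example path; this is what I point --model_path at

# GGUF weights from Unsloth (filename must match the file in the repo)
hf_hub_download(
    repo_id="unsloth/phi-4-GGUF",
    filename="phi-4-Q4_K_M.gguf",
    local_dir="/models/phi-4-gguf",  # example path for --gguf_path
)

# config.json, tokenizer files, etc. from the official repo (no weights)
snapshot_download(
    repo_id="microsoft/phi-4",
    local_dir=MODEL_DIR,
    allow_patterns=["*.json", "*.txt", "tokenizer*"],
)
```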
But now I get the following error:
But I'm still receiving the same error, even when it's not one of Unsloth's custom quants.
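One check I ran to isolate things (a sketch, assuming KTransformers loads the config and tokenizer from the model directory via transformers, and using my example path from above):

```python
from transformers import AutoConfig, AutoTokenizer

MODEL_DIR = "/models/phi-4"  # same example path as above

# If the config/tokenizer files are broken or mismatched, these calls fail
# on their own, independently of KTransformers.
cfg = AutoConfig.from_pretrained(MODEL_DIR)
tok = AutoTokenizer.from_pretrained(MODEL_DIR)
print(cfg.model_type, tok.__class__.__name__)
```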
ChatGPT tells me the binary is passing the value for "prefill_device" twice, and that I should patch the KTransformers code myself. I don't want to patch or recompile the Docker image; as I said, I think the official image is fine and that I'm the one doing something wrong.
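If it helps, here is a toy reproduction of the error class ChatGPT is describing (not KTransformers code, just an illustration): a keyword argument supplied both positionally and by name.

```python
def inject_layer(module, prefill_device="cuda", generate_device="cuda"):
    """Stand-in for an optimize-rule injection hook (hypothetical)."""
    return module, prefill_device, generate_device

# "cuda" binds positionally to prefill_device, then the keyword repeats it:
inject_layer("linear", "cuda", prefill_device="cuda")
# TypeError: inject_layer() got multiple values for argument 'prefill_device'
```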
Can you help me get KTransformers running, please?
Replies: 2 comments

- KTransformers can only accelerate MoE models via the following optimize_rules.

- When I had this problem, I simply tried "pip install openai", I think, and it worked. And try with the latest image!
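For querying it afterwards, something like this is the usual pattern (a sketch; the host, port, and model name are placeholders for your own setup, since KTransformers advertises an OpenAI-compatible endpoint):

```python
from openai import OpenAI

# Example endpoint; replace host/port with your service address
# (e.g. the microk8s Service in front of the KTransformers pod).
client = OpenAI(base_url="http://localhost:10002/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="phi-4",  # whatever model name your server reports
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```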