Running Unsloth's quants on KTransformers running in K8S #995

maximeozenne · 2025-03-28T18:03:29Z

maximeozenne
Mar 28, 2025

Hello!

I bought a gaming computer some years ago, and I'm trying to use it to locally run LLM. To be more precise, I want to use CrewAI.

I don't want to buy others GPU to be able to run heavier models, so I'm trying to use KTransformers as my inference engine. If I'm correct, it allows me to run my LLM on a hybrid setup, GPU and RAM.

I actually own a RTX 4090 and 32gb of RAM. My motherboard and CPU can handle up to 192gb of RAM, which I'm planning to buy if I'm able to achieve my actual test. Here is what I've done so far :

I've set up a dual boot, so I'm running Ubuntu 24.04.2 on my bare computer. No WSL.

I've set up a microk8s to support some functionalities that KTransformers does not offer at the moment :

deploy multiple pods running KTransformers, behind one endpoint per model ( /qwq, /mistral...) and switch model at each request
Unload unused pods after 5 minutes of inactivity, to save my RAM
Load balance the needs of CrewAI by deploying one pod per agent

Now I'm trying to run the unsloth's quants of Phi-4, because I really like the work of the unsloth team, and because they provide GGUF, I assume we can use it with KTransformers? I've seen some people running unsloth's Deepseek R1 quants on KTransformers so I guess we can do it with their other models.

But I'm not able to run it. I don't know what I'm doing wrong.

I've tried with 2 KTransformers images : 0.2.1 and latest-AVX2 (I have a I7-13700K so I can't use the AVX512 version). Both failed either because the 0.2.1 is AVX512 only, and the latest-AVX2 require to inject an openai component, something I want to avoid. I'm assuming the image is correct, and if it does not work, the fault is on my side :

from openai.types.completion_usage import CompletionUsage
ModuleNotFoundError: No module named 'openai'

So I'm actually running the v0.2.2rc2-AVX2, and now it seems the problem comes from the model or the tokenizer.

I've downloaded the Q4_K_M quants from unsloth's phi-4 repo : https://huggingface.co/unsloth/phi-4-GGUF/tree/main
My first issue was the missing config.json. So I've downloaded it, plus the others config files from the official microsoft/phi-4 repo : https://huggingface.co/microsoft/phi-4/tree/main

But now the error is the following :

TypeError: BaseInjectedModule.__init__() got multiple values for argument 'prefill_device'
I don't know what I can try next. I've tried with another model, from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF

But I'm still receiving the same error. Even if it's not an Unsloth's custom quants.

ChatGPT is telling me that the binary is passing the value for "prefill_device" twice, and I should patch the code of the KTransformers image myself. I don't want to patch or recompile the docker image, as I said I think the official image is good and I'm the one who's doing something wrong.

Can you help me running KTransformers please?

cyhasuka · 2025-03-31T03:32:13Z

cyhasuka
Mar 31, 2025

The ktransformers can only accelerated for the MoE model in the following optimize_rules.
So, if u want to run phi-4 or Llama-2-7B (which is the dense arc, NOT MoE), maybe Ollama or vllm is the best choice.

0 replies

lililolo0927 · 2025-04-01T11:24:09Z

lililolo0927
Apr 1, 2025

When I had

 from openai.types.completion_usage import CompletionUsage
ModuleNotFoundError: No module named 'openai'

I simply tried "pip install openai" i think and it worked. And try with the latest image!
I am looking forward to see the result of ur attempt with k8s !

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Running Unsloth's quants on KTransformers running in K8S #995

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 2 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Running Unsloth's quants on KTransformers running in K8S #995

Uh oh!

Uh oh!

maximeozenne Mar 28, 2025

Replies: 2 comments

Uh oh!

cyhasuka Mar 31, 2025

Uh oh!

Uh oh!

lililolo0927 Apr 1, 2025

maximeozenne
Mar 28, 2025

cyhasuka
Mar 31, 2025

lililolo0927
Apr 1, 2025