What happened?
I can no longer load Kimi K2 after the "Tool calls support from mainline" (0f9ecae) patch:
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 44063.76 MiB on device 0: cudaMalloc failed: out of memory
llama_model_load: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/mnt/neuro/models/Kimi-K2-Instruct/Kimi-K2-Instruct-IQ4_XS.gguf'
ERR [ load_model] unable to load model | tid="131404752666624" timestamp=1756967078 model="/mnt/neuro/models/Kimi-K2-Instruct/Kimi-K2-Instruct-IQ4_XS.gguf"
free(): invalid pointer
I use this command:
numactl --cpunodebind=0 --interleave=all /home/lissanro/pkgs/ik_llama.cpp/build/bin/llama-server \
--model /mnt/neuro/models/Kimi-K2-Instruct/Kimi-K2-Instruct-IQ4_XS.gguf \
--ctx-size 131072 --n-gpu-layers 62 --tensor-split 15,25,30,30 -mla 3 -fa -ctk q8_0 -amb 512 -fmoe -b 4096 -ub 4096 \
-ot "blk\.3\.ffn_up_exps=CUDA0, blk\.3\.ffn_gate_exps=CUDA0, blk\.3\.ffn_down_exps=CUDA0" \
-ot "blk\.4\.ffn_up_exps=CUDA1, blk\.4\.ffn_gate_exps=CUDA1, blk\.4\.ffn_down_exps=CUDA1" \
-ot "blk\.5\.ffn_up_exps=CUDA2, blk\.5\.ffn_gate_exps=CUDA2, blk\.5\.ffn_down_exps=CUDA2" \
-ot "blk\.6\.ffn_up_exps=CUDA3, blk\.6\.ffn_gate_exps=CUDA3, blk\.6\.ffn_down_exps=CUDA3" \
-ot "ffn_down_exps=CPU, ffn_up_exps=CPU, gate_exps=CPU" \
--threads 64 --host 0.0.0.0 --port 5000 \
--slot-save-path /var/cache/ik_llama.cpp/k2
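For context, here is a minimal Python sketch (my own illustration, not ik_llama.cpp's actual parser) of how I expect these -ot rules to resolve. It assumes the patterns are tried in the order given and the first match wins, which is why the per-layer CUDA overrides come before the catch-all CPU rules:

import re

# Assumption: -ot patterns are evaluated in order; first match decides
# the buffer for a tensor. Rules for blk.5 (CUDA2) and blk.6 (CUDA3)
# are omitted here for brevity.
rules = [
    (r"blk\.3\.ffn_up_exps",   "CUDA0"),
    (r"blk\.3\.ffn_gate_exps", "CUDA0"),
    (r"blk\.3\.ffn_down_exps", "CUDA0"),
    (r"blk\.4\.ffn_up_exps",   "CUDA1"),
    (r"blk\.4\.ffn_gate_exps", "CUDA1"),
    (r"blk\.4\.ffn_down_exps", "CUDA1"),
    (r"ffn_down_exps", "CPU"),
    (r"ffn_up_exps",   "CPU"),
    (r"gate_exps",     "CPU"),
]

def buffer_for(tensor_name: str) -> str:
    for pattern, buft in rules:
        if re.search(pattern, tensor_name):
            return buft
    return "default (GPU split per --tensor-split)"

# blk.3/blk.4 experts hit their per-layer CUDA rule before the CPU
# catch-all; experts of all other layers fall through to CPU.
print(buffer_for("blk.3.ffn_up_exps.weight"))   # CUDA0
print(buffer_for("blk.10.ffn_up_exps.weight"))  # CPU

So with this setup the bulk of the expert tensors should stay on CPU, which is why a single 44 GiB cudaMalloc on device 0 is unexpected.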
Reverting patch 0f9ecae fixes this issue. In addition to reverting it, I also had to add the following to src/llama-vocab.cpp to fix a minor compile error caused by missing logging defines:
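// Forward declarations and logging macros needed by llama-vocab.cpp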
LLAMA_ATTRIBUTE_FORMAT(2, 3)
void llama_log_internal (ggml_log_level level, const char * format, ...);
void llama_log_callback_default(ggml_log_level level, const char * text, void * user_data);
#define LLAMA_LOG_INFO(...) llama_log_internal(GGML_LOG_LEVEL_INFO , __VA_ARGS__)
#define LLAMA_LOG_DEBUG(...) llama_log_internal(GGML_LOG_LEVEL_DEBUG , __VA_ARGS__)
#define LLAMA_LOG_WARN(...) llama_log_internal(GGML_LOG_LEVEL_WARN , __VA_ARGS__)
#define LLAMA_LOG_ERROR(...) llama_log_internal(GGML_LOG_LEVEL_ERROR, __VA_ARGS__)
Name and Version
latest git
What operating system are you seeing the problem on?
Linux
Relevant log output
See the error output above.