Closed
Labels
bug (Something isn't working)
Description
Name and Version
build: 6387 (4fd1242) with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./build/bin/llama-server -np 2 -dev none --embedding --port 9999 --ctx-size 8192 -m models/jina-embeddings-v2-base-en-Q8_0.gguf
Problem description & steps to reproduce
The given command line fails with:
/home/user/src/llama.cpp/ggml/src/ggml.c:3023: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
[New LWP 1246574]
[New LWP 1246578]
[New LWP 1246579]
[New LWP 1246580]
[New LWP 1246581]
[New LWP 1246582]
[New LWP 1246583]
[New LWP 1246584]
[New LWP 1246585]
[New LWP 1246586]
[New LWP 1246587]
[New LWP 1246588]
[New LWP 1246589]
[New LWP 1246590]
[New LWP 1246591]
[New LWP 1246592]
[New LWP 1246593]
[New LWP 1246594]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f9957e53c17 in wait4 () from /lib/x86_64-linux-gnu/libc.so.6
#0 0x00007f9957e53c17 in wait4 () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f995829335b in ggml_print_backtrace () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#2 0x00007f99582934ae in ggml_abort () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#3 0x00007f99582982e3 in ggml_mul_mat () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#4 0x00007f99583e6e50 in llm_graph_context::build_pooling(ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*) const () from /home/mogs/src/llama.cpp/build/bin/libllama.so
#5 0x00007f995841d26c in llama_model::build_graph(llm_graph_params const&) const () from /home/user/src/llama.cpp/build/bin/libllama.so
#6 0x00007f99583bb56c in llama_context::graph_reserve(unsigned int, unsigned int, unsigned int, llama_memory_context_i const*, bool) () from /home/user/src/llama.cpp/build/bin/libllama.so
#7 0x00007f99583bfdf2 in llama_context::llama_context(llama_model const&, llama_context_params) () from /home/user/src/llama.cpp/build/bin/libllama.so
#8 0x00007f99583c0536 in llama_init_from_model () from /home/user/src/llama.cpp/build/bin/libllama.so
#9 0x000055fc1ac1a123 in common_init_from_params(common_params&) ()
#10 0x000055fc1aaf8664 in server_context::load_model(common_params const&) ()
#11 0x000055fc1aa9306c in main ()
[Inferior 1 (process 1246560) detached]
Aborted
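For context on the assertion itself: as far as I understand ggml, `ggml_can_mul_mat(a, b)` requires the first dimension of both tensors to match and the two higher dimensions of `b` to be multiples of those of `a` (so `a` can be broadcast). Below is a minimal Python sketch of that check. The function name `can_mul_mat` and the shapes are made up for illustration; the shapes actually passed from `build_pooling` do not appear in the backtrace above.

```python
def can_mul_mat(a_ne, b_ne):
    """Sketch of the shape check, assuming ggml's 4-element `ne` layout
    where ne[0] is the innermost (row length) dimension."""
    return (
        a_ne[0] == b_ne[0]          # inner dimensions must match
        and b_ne[2] % a_ne[2] == 0  # a must be broadcastable along dim 2
        and b_ne[3] % a_ne[3] == 0  # a must be broadcastable along dim 3
    )


# Hypothetical shapes, for illustration only.
ok = can_mul_mat((768, 512, 1, 1), (768, 2, 1, 1))   # inner dims agree
bad = can_mul_mat((512, 768, 1, 1), (768, 2, 1, 1))  # inner dims differ
```

A mismatch like the second case is the kind of condition that would trip the `GGML_ASSERT`; which tensor pair is involved here is not something the log shows.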
By contrast, the same command without --embedding starts without error:
./build/bin/llama-server -np 2 -dev none --port 9999 --ctx-size 8192 -m models/jina-embeddings-v2-base-en-Q8_0.gguf
However, I need --embedding, so dropping it is not a workaround for me.
Curiously, it also reports a per-sequence context size smaller than the one requested:
llama_context: n_ctx_per_seq (4096) < n_ctx_train (8192) -- the full capacity of the model will not be utilized
This is despite --ctx-size being explicitly set to 8192, a value that works when --embedding is omitted.
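The 4096 figure is consistent with the total context being divided evenly across the two parallel slots requested with `-np 2`. Here is a small sketch of that arithmetic, assuming the per-sequence context is simply `n_ctx / n_parallel`. That is my reading of the log line, not something confirmed against the source, and the helper name is hypothetical.

```python
def n_ctx_per_seq(n_ctx: int, n_parallel: int) -> int:
    """Assumed rule: the total context is split evenly across parallel slots."""
    return n_ctx // n_parallel


per_seq = n_ctx_per_seq(8192, 2)  # --ctx-size 8192 with -np 2
print(per_seq)                    # 4096, matching the warning above
```

Under this assumption, giving each slot the full 8192 tokens would need either `-np 1` or `--ctx-size 16384`.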
First Bad Commit
No response
Relevant log output