Misc. bug: Can't Parallel with --embedding #15849

@Fluffkin

Description

Name and Version

build: 6387 (4fd1242) with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./build/bin/llama-server -np 2 -dev none --embedding --port 9999 --ctx-size 8192 -m models/jina-embeddings-v2-base-en-Q8_0.gguf

Problem description & steps to reproduce

The given command line fails with:

/home/user/src/llama.cpp/ggml/src/ggml.c:3023: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
[New LWP 1246574]
[New LWP 1246578]
[New LWP 1246579]
[New LWP 1246580]
[New LWP 1246581]
[New LWP 1246582]
[New LWP 1246583]
[New LWP 1246584]
[New LWP 1246585]
[New LWP 1246586]
[New LWP 1246587]
[New LWP 1246588]
[New LWP 1246589]
[New LWP 1246590]
[New LWP 1246591]
[New LWP 1246592]
[New LWP 1246593]
[New LWP 1246594]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f9957e53c17 in wait4 () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007f9957e53c17 in wait4 () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f995829335b in ggml_print_backtrace () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#2  0x00007f99582934ae in ggml_abort () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#3  0x00007f99582982e3 in ggml_mul_mat () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#4  0x00007f99583e6e50 in llm_graph_context::build_pooling(ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*) const () from /home/mogs/src/llama.cpp/build/bin/libllama.so
#5  0x00007f995841d26c in llama_model::build_graph(llm_graph_params const&) const () from /home/user/src/llama.cpp/build/bin/libllama.so
#6  0x00007f99583bb56c in llama_context::graph_reserve(unsigned int, unsigned int, unsigned int, llama_memory_context_i const*, bool) () from /home/user/src/llama.cpp/build/bin/libllama.so
#7  0x00007f99583bfdf2 in llama_context::llama_context(llama_model const&, llama_context_params) () from /home/user/src/llama.cpp/build/bin/libllama.so
#8  0x00007f99583c0536 in llama_init_from_model () from /home/user/src/llama.cpp/build/bin/libllama.so
#9  0x000055fc1ac1a123 in common_init_from_params(common_params&) ()
#10 0x000055fc1aaf8664 in server_context::load_model(common_params const&) ()
#11 0x000055fc1aa9306c in main ()
[Inferior 1 (process 1246560) detached]
Aborted

But ./build/bin/llama-server -np 2 -dev none --port 9999 --ctx-size 8192 -m models/jina-embeddings-v2-base-en-Q8_0.gguf
works, i.e. the same command without --embedding, which is the flag I actually need.

Curiously, it also reports the context size setting as incorrect:
llama_context: n_ctx_per_seq (4096) < n_ctx_train (8192) -- the full capacity of the model will not be utilized
even though it is explicitly set to 8192, which is accepted without complaint when --embedding is omitted.
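For what it's worth, the reported n_ctx_per_seq (4096) looks like the 8192 context divided across the two parallel slots requested with -np 2, rather than a misread of --ctx-size. A minimal sketch of that arithmetic, assuming (unconfirmed here) that the server computes the per-sequence context this way:

```python
# Assumption: per-sequence context = total context / number of parallel slots.
# These names mirror the CLI flags, not actual llama.cpp identifiers.
n_ctx = 8192      # --ctx-size
n_parallel = 2    # -np
n_ctx_per_seq = n_ctx // n_parallel
print(n_ctx_per_seq)  # 4096, matching the value in the warning above
```

If that assumption holds, the warning is about each slot's share of the context, not the total, though it is unclear why it only appears with --embedding.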

First Bad Commit

No response

Relevant log output

Labels

bug (Something isn't working)
