Misc. bug: Can't Parallel with --embedding #15849

@Fluffkin

Description

Name and Version

build: 6387 (4fd1242) with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./build/bin/llama-server -np 2 -dev none --embedding --port 9999 --ctx-size 8192 -m models/jina-embeddings-v2-base-en-Q8_0.gguf

Problem description & steps to reproduce

The given command line fails with:

/home/user/src/llama.cpp/ggml/src/ggml.c:3023: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
[New LWP 1246574]
[New LWP 1246578]
[New LWP 1246579]
[New LWP 1246580]
[New LWP 1246581]
[New LWP 1246582]
[New LWP 1246583]
[New LWP 1246584]
[New LWP 1246585]
[New LWP 1246586]
[New LWP 1246587]
[New LWP 1246588]
[New LWP 1246589]
[New LWP 1246590]
[New LWP 1246591]
[New LWP 1246592]
[New LWP 1246593]
[New LWP 1246594]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f9957e53c17 in wait4 () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007f9957e53c17 in wait4 () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f995829335b in ggml_print_backtrace () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#2  0x00007f99582934ae in ggml_abort () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#3  0x00007f99582982e3 in ggml_mul_mat () from /home/user/src/llama.cpp/build/bin/libggml-base.so
#4  0x00007f99583e6e50 in llm_graph_context::build_pooling(ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*) const () from /home/mogs/src/llama.cpp/build/bin/libllama.so
#5  0x00007f995841d26c in llama_model::build_graph(llm_graph_params const&) const () from /home/user/src/llama.cpp/build/bin/libllama.so
#6  0x00007f99583bb56c in llama_context::graph_reserve(unsigned int, unsigned int, unsigned int, llama_memory_context_i const*, bool) () from /home/user/src/llama.cpp/build/bin/libllama.so
#7  0x00007f99583bfdf2 in llama_context::llama_context(llama_model const&, llama_context_params) () from /home/user/src/llama.cpp/build/bin/libllama.so
#8  0x00007f99583c0536 in llama_init_from_model () from /home/user/src/llama.cpp/build/bin/libllama.so
#9  0x000055fc1ac1a123 in common_init_from_params(common_params&) ()
#10 0x000055fc1aaf8664 in server_context::load_model(common_params const&) ()
#11 0x000055fc1aa9306c in main ()
[Inferior 1 (process 1246560) detached]
Aborted

But ./build/bin/llama-server -np 2 -dev none --port 9999 --ctx-size 8192 -m models/jina-embeddings-v2-base-en-Q8_0.gguf
works, i.e. the same command without --embedding, which is the flag I actually need.

Curiously, it also reports the context size setting as incorrect:
llama_context: n_ctx_per_seq (4096) < n_ctx_train (8192) -- the full capacity of the model will not be utilized
even though it is explicitly set to 8192, which is accepted without complaint when --embedding is omitted.
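For what it's worth, the reported n_ctx_per_seq (4096) looks like the 8192 context divided across the two parallel slots requested with -np 2, rather than a misread of --ctx-size. A minimal sketch of that arithmetic, assuming (unconfirmed here) that the server computes the per-sequence context this way:

```python
# Assumption: per-sequence context = total context / number of parallel slots.
# These names mirror the CLI flags, not actual llama.cpp identifiers.
n_ctx = 8192      # --ctx-size
n_parallel = 2    # -np
n_ctx_per_seq = n_ctx // n_parallel
print(n_ctx_per_seq)  # 4096, matching the value in the warning above
```

If that assumption holds, the warning is about each slot's share of the context, not the total, though it is unclear why it only appears with --embedding.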

First Bad Commit

No response

Relevant log output

Labels

bug (Something isn't working)
