[BUG]: Qwen3 VL + HandleRunOutOfContext() #1336

@aropb

Description

The error occurs while the context window is filling up.

I tried the following InferenceParams options:
TokensKeep: -1, 0, and a positive value

Code:

...
InteractiveExecutor executor = new(context, logger);
...
// session is a ChatSession built around the executor (setup elided above)
await foreach (string text in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, query), false, inferenceParams, cancellationToken))
{
}
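
For reference, the TokensKeep variants mentioned above were configured roughly like this (a minimal sketch; everything other than TokensKeep is an assumption, not the exact code from my app):

```csharp
// Sketch of the inferenceParams passed to ChatAsync above.
// Only TokensKeep was varied between runs; other properties are assumptions.
var inferenceParams = new InferenceParams
{
    TokensKeep = 0,  // also tried -1 and a concrete positive value
};
```

All three TokensKeep values lead to the same assertion failure once the context fills and HandleRunOutOfContext() runs.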

Critical error:

D:\a\LLamaSharp\LLamaSharp\src\llama-kv-cache.cpp:398: GGML_ASSERT(hparams.n_pos_per_embd() == 1 && "seq_add() is only supported for n_pos_per_embd() == 1") failed

Environment & Configuration

  • Operating system: Windows 11
  • .NET runtime version: .NET 10.0.3
  • LLamaSharp version: 0.26.0
  • CUDA version (if you are using cuda backend): 12.9
  • CPU & GPU device: GPU RTX 5090
