[BUG]: Qwen3 VL + HandleRunOutOfContext() #1336

@aropb

Description

The error occurs while the context window is filling up.

I tried the following InferenceParams options:
TokensKeep: -1, 0, and a positive value

Code:

...
InteractiveExecutor executor = new(context, logger);
...
// session is a ChatSession built around the executor (setup elided above)
await foreach (string text in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, query), false, inferenceParams, cancellationToken))
{
}
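
For reference, the TokensKeep variants mentioned above were configured roughly like this (a minimal sketch; everything other than TokensKeep is an assumption, not the exact code from my app):

```csharp
// Sketch of the inferenceParams passed to ChatAsync above.
// Only TokensKeep was varied between runs; other properties are assumptions.
var inferenceParams = new InferenceParams
{
    TokensKeep = 0,  // also tried -1 and a concrete positive value
};
```

All three TokensKeep values lead to the same assertion failure once the context fills and HandleRunOutOfContext() runs.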

Critical error:

D:\a\LLamaSharp\LLamaSharp\src\llama-kv-cache.cpp:398: GGML_ASSERT(hparams.n_pos_per_embd() == 1 && "seq_add() is only supported for n_pos_per_embd() == 1") failed

Environment & Configuration

  • Operating system: Windows 11
  • .NET runtime version: .NET 10.0.3
  • LLamaSharp version: 0.26.0
  • CUDA version (if you are using cuda backend): 12.9
  • CPU & GPU device: GPU RTX 5090
