Eval bug: gpt-oss debugging response issues #15113

### Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition, compute capability 12.0, VMM: yes
version: 6097 (9515c613)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

Dual EPYC 9355, RTX PRO 6000 Blackwell

### Models

I have two Open Hands AI instances pointed at the same llama.cpp server running gpt-oss 120b.

When I give the instance running on a Mac in Docker (Linux) this prompt:

```
What do you see in the /workspace dir?
```

gpt-oss sends back this response:

```json
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "role": "assistant",
                    "reasoning_content": "The user asks: \"What do you see in the /workspace dir?\" So we need to list the contents of /workspace. Use appropriate tool. The best tool is read_file? Actually we need to list directory contents. The str_replace_editor view command can view a directory: view command with path /workspace. That will list non-hidden files and directories up to 2 levels deep. Or we can use execute_bash \"ls -R /workspace\" or \"ls -la /workspace\". The guidelines: Use efficient tools like find, grep, etc. For listing directory, we can use execute_bash. Let's do that.\n\nWe need to output the function call. Use <function=execute_bash> with command \"ls -R /workspace\". That will list recursively. Or \"ls -la /workspace\". Let's do \"ls -R /workspace\". That will show all files.\n\nThus we will call execute_bash.",
                    "content": "<function=execute_bash>\n<parameter=command>ls -R /workspace</parameter>\n"
                }
            }
        ],
```

This is fine and the agent session continues.
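
For reference, the tool call in that `content` string has a `<function=NAME>` / `<parameter=KEY>` shape. Below is a minimal sketch of pulling it apart when inspecting responses by hand. The regexes and the `parse_function_call` helper are my own assumptions based on this single sample, not the parser the agent actually uses.

```python
import re

# Patterns inferred from the single sample above; hypothetical, not an official grammar.
FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*)", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def parse_function_call(content: str):
    """Return {"name": ..., "parameters": {...}} or None if no call is found."""
    match = FUNCTION_RE.search(content)
    if match is None:
        return None
    name, body = match.group(1), match.group(2)
    # Each parameter is a (key, value) pair captured between the tags.
    parameters = {key: value for key, value in PARAM_RE.findall(body)}
    return {"name": name, "parameters": parameters}


if __name__ == "__main__":
    sample = "<function=execute_bash>\n<parameter=command>ls -R /workspace</parameter>\n"
    print(parse_function_call(sample))
```

An empty `content`, as in the Windows case further down, makes this helper return `None`, which is a quick way to flag responses where no call reached the client.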

However, if I do the same thing on the Open Hands AI Windows instance, with this prompt:

```
What do you see at `C:\Users\jesse\Documents\dev` ?
```

gpt-oss sends back this response:

```json
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "role": "assistant",
                    "reasoning_content": "The user asks: \"What do you see at `C:\\Users\\jesse\\Documents\\dev` ?\" This is a Windows path. The environment is a Linux container? The repository is likely a codebase. The user wants to see the contents of that directory. We need to explore the file system. The path is absolute Windows path. In the container, we may not have that path. We need to explore the file system to locate that path. The instructions: \"When a user provides a file path, do NOT assume it's relative to the current working directory. First explore the file system to locate the file before working on it.\"\n\nThus we need to explore the file system. Use execute_bash to list directories. However, the path is Windows style. In Linux, the path might be under /mnt/c/Users/jesse/Documents/dev if using WSL. But we are in a container environment. The repository root is /workspace. The user asks \"What do you see at `C:\\Users\\jesse\\Documents\\dev` ?\" Possibly they want to list the contents of that directory. We need to check if that path exists. Use `ls -R` or `find`. Let's try to see if that path exists. Use `execute_bash` with `ls -R` or `find`. But we need to be careful: the path may not exist. We can try `ls -l /mnt/c/Users/jesse/Documents/dev` or `ls -l \"C:\\Users\\jesse\\Documents\\dev\"` but in Linux, backslashes are escape characters. We can try `ls -l /c/Users/jesse/Documents/dev`? Not sure.\n\nBetter approach: Use `find` to locate any directory named \"dev\" under /workspace or root. But the path is Windows style, maybe the repository includes a folder named \"C:\\Users\\jesse\\Documents\\dev\"? Unlikely. Could be a Windows path in a config file. But the user asks \"What do you see at `C:\\Users\\jesse\\Documents\\dev` ?\" So we need to list the contents of that directory. Let's try to see if that path exists in the container. Use `execute_bash` with `ls -la` on that path. In Linux, we can try `ls -la /mnt/c/Users/jesse/Documents/dev`. 
If not, we can try `ls -la /c/Users/jesse/Documents/dev`. Or we can try `ls -la /Users/jesse/Documents/dev`. But we need to explore.\n\nFirst, let's list root directories: `ls -R /` maybe too large. Let's start with `ls -la /` to see top-level. Then we can see if any mount points. Use `df -h` maybe.\n\nBut the user wants to see the contents of that path. So we need to try to locate it. Let's start by checking if `/mnt/c/Users/jesse/Documents/dev` exists. Use `execute_bash`.<|start|>assistant<|channel|>commentary<|message|>We will run a bash command to list the directory.<|start|>assistant<|channel|>commentary to=execute_bash json<|message|>{\n  \"command\": \"ls -la \\\"/mnt/c/Users/jesse/Documents/dev\\\"\"\n}",
                    "content": ""
                }
            }
        ],
```

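Note that the `reasoning_content` contains raw template-style tags (`<|start|>`, `<|channel|>`, `<|message|>`), which I am loosely calling jinja tags, followed by what looks like the intended `execute_bash` call. Also, the `content` is empty.

I don't understand why these tags end up in the `reasoning_content`.

It also makes me wonder whether gpt-oss accidentally emitted a tag that ended the turn early.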

How can I debug this sort of thing further?
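
As a first step, responses could be checked mechanically for the symptom seen in the second response: `<|...|>` markers inside `reasoning_content` together with an empty `content`. The sketch below does that for one `message` object. The `inspect_message` helper and its output keys are hypothetical names of mine, and the token pattern is only inferred from the markers visible in the response above.

```python
import re

# Matches markers such as <|start|>, <|channel|>, <|message|> seen in the response above.
# The exact character set is an assumption based on those three examples.
CONTROL_TOKEN = re.compile(r"<\|[a-z_]+\|>")


def inspect_message(message: dict) -> dict:
    """Summarise one `choices[i].message` dict from a chat completion response."""
    reasoning = message.get("reasoning_content") or ""
    content = message.get("content") or ""
    leaked = CONTROL_TOKEN.findall(reasoning)
    # Everything before the first marker is the part that reads like real reasoning.
    first = CONTROL_TOKEN.search(reasoning)
    clean_len = first.start() if first else len(reasoning)
    return {
        "content_empty": content.strip() == "",
        "leaked_tokens": leaked,
        "clean_reasoning_chars": clean_len,
        "tail_after_leak": reasoning[clean_len:],
    }


if __name__ == "__main__":
    bad = {
        "reasoning_content": "Use `execute_bash`.<|start|>assistant<|channel|>commentary"
        "<|message|>We will run a bash command.",
        "content": "",
    }
    report = inspect_message(bad)
    print(report["content_empty"], report["leaked_tokens"])
```

Running this over logged responses from both instances would at least show whether the markers appear only for the Windows prompt, and `tail_after_leak` keeps the unparsed remainder for comparison against the working case.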

### Problem description & steps to reproduce

See above

### First Bad Commit

_No response_

### Relevant log output

```shell
N/A
```
