Eval bug: gpt-oss debugging response issues #15113

@createthis

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition, compute capability 12.0, VMM: yes
version: 6097 (9515c61)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

Dual EPYC 9355, NVIDIA RTX PRO 6000 Blackwell Workstation Edition

Models

I have two Open Hands AI instances pointed at the same llama.cpp server running gpt-oss 120b.

When I send this prompt to the instance running on Mac (in Docker, Linux):

What do you see in the /workspace dir?

gpt-oss sends back this response:

        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "role": "assistant",
                    "reasoning_content": "The user asks: \"What do you see in the /workspace dir?\" So we need to list the contents of /workspace. Use appropriate tool. The best tool is read_file? Actually we need to list directory contents. The str_replace_editor view command can view a directory: view command with path /workspace. That will list non-hidden files and directories up to 2 levels deep. Or we can use execute_bash \"ls -R /workspace\" or \"ls -la /workspace\". The guidelines: Use efficient tools like find, grep, etc. For listing directory, we can use execute_bash. Let's do that.\n\nWe need to output the function call. Use <function=execute_bash> with command \"ls -R /workspace\". That will list recursively. Or \"ls -la /workspace\". Let's do \"ls -R /workspace\". That will show all files.\n\nThus we will call execute_bash.",
                    "content": "<function=execute_bash>\n<parameter=command>ls -R /workspace</parameter>\n"
                }
            }
        ],

This is fine and the agent session continues.

However, if I do the same thing on the Open Hands AI Windows instance, with this prompt:

What do you see at `C:\Users\jesse\Documents\dev` ?

gpt-oss sends back this response:

        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "role": "assistant",
                    "reasoning_content": "The user asks: \"What do you see at `C:\\Users\\jesse\\Documents\\dev` ?\" This is a Windows path. The environment is a Linux container? The repository is likely a codebase. The user wants to see the contents of that directory. We need to explore the file system. The path is absolute Windows path. In the container, we may not have that path. We need to explore the file system to locate that path. The instructions: \"When a user provides a file path, do NOT assume it's relative to the current working directory. First explore the file system to locate the file before working on it.\"\n\nThus we need to explore the file system. Use execute_bash to list directories. However, the path is Windows style. In Linux, the path might be under /mnt/c/Users/jesse/Documents/dev if using WSL. But we are in a container environment. The repository root is /workspace. The user asks \"What do you see at `C:\\Users\\jesse\\Documents\\dev` ?\" Possibly they want to list the contents of that directory. We need to check if that path exists. Use `ls -R` or `find`. Let's try to see if that path exists. Use `execute_bash` with `ls -R` or `find`. But we need to be careful: the path may not exist. We can try `ls -l /mnt/c/Users/jesse/Documents/dev` or `ls -l \"C:\\Users\\jesse\\Documents\\dev\"` but in Linux, backslashes are escape characters. We can try `ls -l /c/Users/jesse/Documents/dev`? Not sure.\n\nBetter approach: Use `find` to locate any directory named \"dev\" under /workspace or root. But the path is Windows style, maybe the repository includes a folder named \"C:\\Users\\jesse\\Documents\\dev\"? Unlikely. Could be a Windows path in a config file. But the user asks \"What do you see at `C:\\Users\\jesse\\Documents\\dev` ?\" So we need to list the contents of that directory. Let's try to see if that path exists in the container. Use `execute_bash` with `ls -la` on that path. In Linux, we can try `ls -la /mnt/c/Users/jesse/Documents/dev`. 
If not, we can try `ls -la /c/Users/jesse/Documents/dev`. Or we can try `ls -la /Users/jesse/Documents/dev`. But we need to explore.\n\nFirst, let's list root directories: `ls -R /` maybe too large. Let's start with `ls -la /` to see top-level. Then we can see if any mount points. Use `df -h` maybe.\n\nBut the user wants to see the contents of that path. So we need to try to locate it. Let's start by checking if `/mnt/c/Users/jesse/Documents/dev` exists. Use `execute_bash`.<|start|>assistant<|channel|>commentary<|message|>We will run a bash command to list the directory.<|start|>assistant<|channel|>commentary to=execute_bash json<|message|>{\n  \"command\": \"ls -la \\\"/mnt/c/Users/jesse/Documents/dev\\\"\"\n}",
                    "content": ""
                }
            }
        ],

Note that the reasoning_content contains what look like jinja template tags (`<|start|>`, `<|channel|>`, `<|message|>`), and the content field is empty.

I don't understand why these tags end up in the reasoning_content.

It also makes me wonder whether gpt-oss accidentally emitted a tag that ended the turn.
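A minimal sketch for flagging this symptom automatically, assuming the tags always have the `<|name|>` shape seen in the response above. The helper names `find_leaked_tokens` and `looks_broken` are hypothetical and not part of llama.cpp or Open Hands; the input is one `message` object taken from `choices`.

```python
import re

# Matches markers of the form <|name|>, as seen in the response above.
# Assumption: every leaked marker follows this shape.
SPECIAL_TOKEN = re.compile(r"<\|[A-Za-z_]+\|>")


def find_leaked_tokens(message: dict) -> dict:
    """Return the markers found in each text field of a chat message."""
    found = {}
    for field in ("reasoning_content", "content"):
        text = message.get(field) or ""
        tokens = SPECIAL_TOKEN.findall(text)
        if tokens:
            found[field] = tokens
    return found


def looks_broken(message: dict) -> bool:
    """Heuristic: markers leaked into a field and content is empty."""
    content = (message.get("content") or "").strip()
    return bool(find_leaked_tokens(message)) and not content
```

Running this over logged responses from both instances would show whether the leak only happens with the Windows prompt or is simply more frequent there.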

How can I debug this sort of thing further?
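As a starting point, here is a rough sketch that pulls the tool call the model apparently tried to make out of the reasoning_content. It assumes the layout seen in the failing response: each leaked message starts at `<|start|>`, has a header split by `<|channel|>`, names its target with `to=...`, and carries a JSON body after `<|message|>`. That layout is inferred from this one sample, and the function names are hypothetical.

```python
import json


def split_leaked_messages(reasoning: str) -> list:
    """Split text on <|start|> and extract role, channel header, and body."""
    parts = reasoning.split("<|start|>")
    messages = []
    for part in parts[1:]:  # parts[0] is the ordinary reasoning text
        header, sep, body = part.partition("<|message|>")
        if not sep:
            continue  # no <|message|> marker, skip the malformed segment
        role, _, channel = header.partition("<|channel|>")
        messages.append({"role": role, "channel": channel, "body": body})
    return messages


def recover_tool_call(reasoning: str):
    """Return the first leaked message addressed to a tool, or None."""
    for msg in split_leaked_messages(reasoning):
        words = msg["channel"].split()
        target = next((w[len("to="):] for w in words if w.startswith("to=")), None)
        if target is None:
            continue
        try:
            args = json.loads(msg["body"])
        except json.JSONDecodeError:
            args = None  # keep the raw body for manual inspection
        return {"tool": target, "arguments": args, "raw": msg["body"]}
    return None
```

If this recovers a well-formed call to `execute_bash`, the model output itself was probably fine, which would suggest the server-side response parsing is worth a closer look than the prompt.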

Problem description & steps to reproduce

See above

First Bad Commit

No response

Relevant log output

N/A
