[Bug] Differing behavior when predicting dicts in UI vs Python library #374

**Describe the bug**
Given a JSON output schema that includes a dict field (an object with free-form keys, so the model can choose any keys), the output dict is always empty when I run the task with the Python library from the command line. When I run the same task in the UI, the dict is populated. The UI behavior is the one I want!

Tested this with `google/gemini-2.0-flash-001` and `openai/gpt-4o-mini`, both via OpenRouter.

**Checks**

- [x] I've read the [troubleshooting guide](https://docs.getkiln.ai/docs/troubleshooting-and-logs)
- [x] I've tried to reproduce the problem using another model, and confirmed it's not an issue specific to the model I've chosen.
- [x] I've searched [the docs](https://docs.getkiln.ai) for a solution
- [x] I've searched for existing Github issues/discussions

**To Reproduce**
Steps to reproduce the behavior:

1. Use the Kiln task defined below.
2. Open the task in the UI. Paste in any text, e.g., "The judge was Steve Cosman and the defendant was John Smith."
3. Run task.
4. See a structured output, e.g., `{"entities": {"judge": "Steve Cosman", "defendant": "John Smith"}}`
5. Run the same task again, this time programmatically from the command line, loading the task with the Python library.
6. Get an empty output, e.g., `{"entities": {}}`. This happens for any input text.

```json
{
  "v": 1,
  "id": "168947134826",
  "created_at": "2025-06-13T09:14:37.854319",
  "created_by": "aryamccarthy",
  "name": "Entity extraction v1",
  "description": "",
  "instruction": "Extract the entities (i.e., organizations and people) in this document, as key/value pairs to signify that the role (key) X was performed by entity Y (value). For instance, you may learn that in a court case, the defendant was John Smith.",
  "requirements": [],
  "output_json_schema": "{\"description\": \"Model for validating and storing key entities.\\n\\nThe goal of this model is only to standardize the collection of entities in a document;\\\\nit does not attempt to capture the relationships between them or normalize their names.\\nThat'll be the responsibility of another model.\", \"properties\": {\"entities\": {\"additionalProperties\": {\"anyOf\": [{\"type\": \"string\"}, {\"items\": {\"type\": \"string\"}, \"type\": \"array\"}]}, \"default\": {}, \"description\": \"A dictionary of entities found in the document. The keys are the entity types, and the values are either a single entity or a list of entities. The values may be strings or lists of strings, depending on how many of a given entity type arefound.\", \"title\": \"Entities\", \"type\": \"object\"}}, \"title\": \"KeyEntities\", \"type\": \"object\"}",
  "input_json_schema": null,
  "thinking_instruction": "",
  "model_type": "task"
}
```
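For reference, both outputs in the steps above are valid against this schema: `entities` accepts any keys whose values are strings or lists of strings, and nothing requires at least one key, so `{}` passes (it is also the declared `default`). That would explain why no error is raised. A minimal stdlib-only sketch of that check follows; `check_entities` and `value_ok` are hypothetical helpers written for this one schema shape, not a general JSON Schema validator and not part of Kiln:

```python
import json


def value_ok(value) -> bool:
    """True if value matches one anyOf branch: a string, or a list of strings."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def check_entities(output) -> bool:
    """Hypothetical helper that checks output against this task's schema shape.

    Not a general JSON Schema validator: it only encodes the rules for `entities`.
    """
    if not isinstance(output, dict):
        return False
    # "entities" is not listed as required, so omitting it is also accepted.
    entities = output.get("entities", {})
    if not isinstance(entities, dict):
        return False
    # additionalProperties constrains each value but sets no minimum number of
    # keys, so an empty dict passes.
    return all(value_ok(v) for v in entities.values())


ui_output = json.loads('{"entities": {"judge": "Steve Cosman", "defendant": "John Smith"}}')
cli_output = json.loads('{"entities": {}}')

print(check_entities(ui_output))   # True
print(check_entities(cli_output))  # True
print(check_entities({"entities": {"judge": 42}}))  # False
```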

**Expected behavior**
I expected the command-line (Python library) behavior to match the UI behavior. Is there a structured output setting I'm not enabling in the UI, or one I can disable from the library? I know OpenAI's strict structured outputs don't let you freely generate dict keys; is the same true for other models?
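Some context on the OpenAI part: OpenAI's strict Structured Outputs mode requires every object in the schema to set `additionalProperties: false`, so a free-form dict like `entities` can't be expressed there as written. I haven't confirmed whether the UI and the library send different structured output settings. As a sketch (`free_form_objects` is a hypothetical helper, not a Kiln or OpenAI API), this lists the objects in a schema that would violate that rule:

```python
def free_form_objects(schema: dict, path: str = "$") -> list[str]:
    """Return JSON paths of object schemas whose additionalProperties is not False.

    Hypothetical helper: under OpenAI's strict Structured Outputs, every object
    must set additionalProperties to false, so each returned path breaks that rule.
    """
    found = []
    if schema.get("type") == "object" and schema.get("additionalProperties") is not False:
        found.append(path)
    # Recurse into named properties.
    for name, sub in schema.get("properties", {}).items():
        found.extend(free_form_objects(sub, f"{path}.{name}"))
    # Recurse into dict value schemas, array items, and anyOf branches.
    extra = schema.get("additionalProperties")
    if isinstance(extra, dict):
        found.extend(free_form_objects(extra, f"{path}.*"))
    items = schema.get("items")
    if isinstance(items, dict):
        found.extend(free_form_objects(items, f"{path}[]"))
    for i, branch in enumerate(schema.get("anyOf", [])):
        found.extend(free_form_objects(branch, f"{path}|anyOf[{i}]"))
    return found


# Trimmed copy of this task's output_json_schema (descriptions omitted).
SCHEMA = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "object",
            "default": {},
            "additionalProperties": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "array", "items": {"type": "string"}},
                ]
            },
        }
    },
}

print(free_form_objects(SCHEMA))  # ['$', '$.entities']
```

The root object could simply be given `additionalProperties: false`, but `$.entities` can't be without losing the free-form keys. If that rule were forced onto `entities`, which has no named properties, the only valid value would be `{}`, which would match what I see from the library. That's a guess, not something I've verified.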

**Error Logs**
No error

**System Information:**

- OS: macOS
- Browser: Safari
- Kiln app version: v0.16
