[Bug] Differing behavior when predicting dicts in UI vs Python library #386

aryamccarthy · 2025-06-18T22:55:07Z

aryamccarthy
Jun 18, 2025

Describe the bug
Given a JSON output schema that includes a dict field (such that any keys can be predicted), when generating from the CLI, then the output dict is always empty. When attempting the same in the UI, this is not the case. The UI case is preferable!

Tested this with google/gemini-2.0-flash-001 and openai/gpt-4o-mini, both via OpenRouter.

Checks

I've read the troubleshooting guide
I've tried to reproduce the problem using another model, and confirmed it's not an issue specific to the model I've chosen.
I've searched the docs for a solution
I've searched for existing Github issues/discussions

To Reproduce
Steps to reproduce the behavior:

1. Use the Kiln task defined below.
2. Open the task in the UI. Paste in any text, e.g., "The judge was Steve Cosman and the defendant was John Smith."
3. Run task.
4. See a structured output, e.g., {entities: {"judge": "Steve Cosman", "defendant": "John Smith"}}
5. Try again, programmatically this time, loading the task from the command line.
6. Get empty output, e.g., {entities: {}}. This happens for any input text.

{
  "v": 1,
  "id": "168947134826",
  "created_at": "2025-06-13T09:14:37.854319",
  "created_by": "aryamccarthy",
  "name": "Entity extraction v1",
  "description": "",
  "instruction": "Extract the entities (i.e., organizations and people) in this document, as key/value pairs to signify that the role (key) X was performed by entity Y (value). For instance, you may learn that in a court case, the defendant was John Smith.",
  "requirements": [],
  "output_json_schema": "{\"description\": \"Model for validating and storing key entities.\\n\\nThe goal of this model is only to standardize the collection of entities in a document;\\\\nit does not attempt to capture the relationships between them or normalize their names.\\nThat'll be the responsibility of another model.\", \"properties\": {\"entities\": {\"additionalProperties\": {\"anyOf\": [{\"type\": \"string\"}, {\"items\": {\"type\": \"string\"}, \"type\": \"array\"}]}, \"default\": {}, \"description\": \"A dictionary of entities found in the document. The keys are the entity types, and the values are either a single entity or a list of entities. The values may be strings or lists of strings, depending on how many of a given entity type arefound.\", \"title\": \"Entities\", \"type\": \"object\"}}, \"title\": \"KeyEntities\", \"type\": \"object\"}",
  "input_json_schema": null,
  "thinking_instruction": "",
  "model_type": "task"
}

Expected behavior

I expected the CLI behavior to match the UI behavior. Is there a structured prediction setting I'm not enabling in the UI—or one that I can disable from the CLI? I know OAI doesn't let you freely generate dict keys; is this true for other models?
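For context on the dict-key question, here is a minimal sketch (not Kiln's own code) that walks the task's output schema and lists every object that allows arbitrary keys. OpenAI's strict structured-output mode requires each object to set `additionalProperties` to `false`, so these are the places where a strict mode and a looser JSON mode could diverge. The helper name `open_ended_objects` is made up for illustration, the schema is a trimmed copy of the one above with descriptions omitted, and whether this is actually the cause of the empty output is an assumption.

```python
import json


def open_ended_objects(schema, path="$"):
    """Yield paths of object schemas that accept arbitrary keys.

    Hypothetical helper for illustration; not part of the Kiln library.
    """
    if not isinstance(schema, dict):
        return
    # In JSON Schema, a missing additionalProperties means extra keys are allowed.
    if schema.get("type") == "object" and schema.get("additionalProperties", True) is not False:
        yield path
    for name, sub in schema.get("properties", {}).items():
        yield from open_ended_objects(sub, f"{path}.{name}")
    extra = schema.get("additionalProperties")
    if isinstance(extra, dict):
        yield from open_ended_objects(extra, f"{path}.*")
    for i, sub in enumerate(schema.get("anyOf", [])):
        yield from open_ended_objects(sub, f"{path}.anyOf[{i}]")
    if isinstance(schema.get("items"), dict):
        yield from open_ended_objects(schema["items"], f"{path}[]")


# Trimmed copy of the task's output_json_schema (descriptions omitted).
schema = json.loads("""
{
  "properties": {
    "entities": {
      "additionalProperties": {
        "anyOf": [
          {"type": "string"},
          {"items": {"type": "string"}, "type": "array"}
        ]
      },
      "default": {},
      "title": "Entities",
      "type": "object"
    }
  },
  "title": "KeyEntities",
  "type": "object"
}
""")

open_paths = list(open_ended_objects(schema))
entities_required = "entities" in schema.get("required", [])

print(open_paths)          # objects that allow free-form keys
print(entities_required)   # whether the schema forces `entities` to be present
```

Note that the schema has no `required` list and gives `entities` a default of `{}`, so an empty dict is a schema-valid answer; validation alone would not flag the library's output as wrong.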

Error Logs

No error

System Information:

OS: macOS
Browser: Safari
Kiln app version: v0.16

scosman · 2025-06-19T00:44:17Z

scosman
Jun 19, 2025
Maintainer

We don’t have a CLI - I assume you mean python library?

Also: better the judge than the defendant I guess…

aryamccarthy · 2025-06-20T14:56:15Z

aryamccarthy
Jun 20, 2025
Author

You're right - I should say, it happens with the Python library.

I can try to isolate a minimal example; right now, it's wrapped in a bit of indirection. Before I do: is there a reason (off the top of your head) why the default running configuration would be different within the library versus the app?

scosman · 2025-06-23T15:49:22Z

scosman
Jun 23, 2025
Maintainer

Sounds like a difference in config or runtime. The UI uses the library under the hood, so some parameter is likely to blame: it might be the model, JSON mode, JSON prompt generation, or something else.

Easiest is to point your code at the same Kiln task file you are running via the UI. That should keep the two aligned.

You can also check the model logs (new option in v0.17; you can open the logs folder from settings in the UI). Compare what's being called from the UI with what's being called from the Python library and spot the difference.
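One way to do that comparison, as a rough sketch: copy the request parameters for each call out of the logs into two dicts and diff them recursively. The log format is not described in this thread, so the payloads below are hypothetical placeholders (including the `response_format` values), and `diff_payloads` is an illustrative helper, not a Kiln function.

```python
def diff_payloads(a, b, path=""):
    """Return (path, value_in_a, value_in_b) for every difference between two nested dicts."""
    diffs = []
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else key
            if key not in a:
                diffs.append((sub, "<missing>", b[key]))
            elif key not in b:
                diffs.append((sub, a[key], "<missing>"))
            else:
                diffs.extend(diff_payloads(a[key], b[key], sub))
    elif a != b:
        diffs.append((path, a, b))
    return diffs


# Hypothetical request parameters, hand-copied from the logs of each run.
ui_call = {
    "model": "openai/gpt-4o-mini",
    "response_format": {"type": "json_schema"},
    "temperature": 1.0,
}
library_call = {
    "model": "openai/gpt-4o-mini",
    "response_format": {"type": "json_object"},
    "temperature": 1.0,
}

differences = diff_payloads(ui_call, library_call)
for where, ui_value, lib_value in differences:
    print(f"{where}: UI={ui_value!r} library={lib_value!r}")
```

Any path printed here is a candidate for the parameter that differs between the two runs.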
