[Bug] Differing behavior when predicting dicts in UI vs Python library #374

@aryamccarthy

Describe the bug
Given a JSON output schema that includes a dict field (one where any keys can be predicted), the output dict is always empty when generating programmatically with the Python library from the command line. When running the same task in the UI, the dict is populated as expected. The UI behavior is preferable!

Tested this with google/gemini-2.0-flash-001 and openai/gpt-4o-mini, both via OpenRouter.

Checks

  • I've read the troubleshooting guide
  • I've tried to reproduce the problem using another model, and confirmed it's not an issue specific to the model I've chosen.
  • I've searched the docs for a solution
  • I've searched for existing GitHub issues/discussions

To Reproduce
Steps to reproduce the behavior:

  1. Use the Kiln task defined below.
  2. Open the task in the UI. Paste in any text, e.g., "The judge was Steve Cosman and the defendant was John Smith."
  3. Run task.
  4. See a structured output, e.g., {entities: {"judge": "Steve Cosman", "defendant": "John Smith"}}
  5. Try again, programmatically this time, loading the task from the command line.
  6. Get empty output, e.g., {entities: {}}. This happens for any input text.
{
  "v": 1,
  "id": "168947134826",
  "created_at": "2025-06-13T09:14:37.854319",
  "created_by": "aryamccarthy",
  "name": "Entity extraction v1",
  "description": "",
  "instruction": "Extract the entities (i.e., organizations and people) in this document, as key/value pairs to signify that the role (key) X was performed by entity Y (value). For instance, you may learn that in a court case, the defendant was John Smith.",
  "requirements": [],
  "output_json_schema": "{\"description\": \"Model for validating and storing key entities.\\n\\nThe goal of this model is only to standardize the collection of entities in a document;\\\\nit does not attempt to capture the relationships between them or normalize their names.\\nThat'll be the responsibility of another model.\", \"properties\": {\"entities\": {\"additionalProperties\": {\"anyOf\": [{\"type\": \"string\"}, {\"items\": {\"type\": \"string\"}, \"type\": \"array\"}]}, \"default\": {}, \"description\": \"A dictionary of entities found in the document. The keys are the entity types, and the values are either a single entity or a list of entities. The values may be strings or lists of strings, depending on how many of a given entity type arefound.\", \"title\": \"Entities\", \"type\": \"object\"}}, \"title\": \"KeyEntities\", \"type\": \"object\"}",
  "input_json_schema": null,
  "thinking_instruction": "",
  "model_type": "task"
}
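
As a possible diagnostic, the sketch below walks the schema and lists the object fields whose keys are left open, i.e. objects that declare `additionalProperties` but no fixed `properties`. These are the fields a strict structured-output mode may be unable to express. The schema literal is a trimmed copy of the `output_json_schema` above (descriptions omitted), and `open_dict_fields` is a hypothetical helper written for illustration; it is not part of Kiln. Whether the UI and the library actually request different structured-output modes is not confirmed here.

```python
import json

# Trimmed copy of the task's output_json_schema (descriptions omitted).
OUTPUT_SCHEMA = json.loads("""
{
  "properties": {
    "entities": {
      "additionalProperties": {
        "anyOf": [
          {"type": "string"},
          {"items": {"type": "string"}, "type": "array"}
        ]
      },
      "default": {},
      "title": "Entities",
      "type": "object"
    }
  },
  "title": "KeyEntities",
  "type": "object"
}
""")


def open_dict_fields(schema, path="$"):
    """Return paths of object nodes whose keys are not fixed by the schema.

    Hypothetical helper for illustration; only follows "properties".
    """
    found = []
    if not isinstance(schema, dict):
        return found
    if schema.get("type") == "object":
        extra = schema.get("additionalProperties")
        # additionalProperties set to a schema (or true) with no declared
        # properties means every key is chosen by the model.
        if extra not in (None, False) and not schema.get("properties"):
            found.append(path)
    for name, sub in schema.get("properties", {}).items():
        found.extend(open_dict_fields(sub, f"{path}.{name}"))
    return found


open_fields = open_dict_fields(OUTPUT_SCHEMA)
print(open_fields)
```

For this task the only open-ended field is `entities`, which is the field that comes back empty.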

Expected behavior
I expected the CLI behavior to match the UI behavior. Is there a structured prediction setting I'm not enabling in the UI, or one that I can disable from the CLI? I know OpenAI doesn't let you freely generate dict keys; is this true for other models?

Error Logs
No error
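
The absence of an error is consistent with the schema itself: `entities` is not listed as required and defaults to `{}`, so an empty dict is a valid output. The minimal sketch below illustrates this with a hand-rolled check written for this one schema. `valid_entities_output` is a hypothetical helper, not a Kiln function and not a general JSON Schema validator.

```python
def valid_entities_output(output):
    """Check an output against this task's schema only (illustrative)."""
    if not isinstance(output, dict):
        return False
    # "entities" is optional in the schema and defaults to an empty dict.
    entities = output.get("entities", {})
    if not isinstance(entities, dict):
        return False
    for value in entities.values():
        is_str = isinstance(value, str)
        is_str_list = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        # Each value must be a string or a list of strings (the anyOf branch).
        if not (is_str or is_str_list):
            return False
    return True


ui_output = {"entities": {"judge": "Steve Cosman", "defendant": "John Smith"}}
library_output = {"entities": {}}

print(valid_entities_output(ui_output))
print(valid_entities_output(library_output))
```

Both outputs pass, so schema validation alone cannot flag the empty result. Catching it would need an explicit non-empty check on `entities`.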

System Information:

  • OS: macOS
  • Browser: Safari
  • Kiln app version: v0.16
