
Retry Causes Concatenation #2210

@cemde

Description

DISCLAIMER: I personally validated that the unexpected and unwanted behavior exists, but I used an LLM for the root-cause analysis of the bug, which I have not yet validated.

  • This is actually a bug report.
  • I am not getting good LLM results
  • I have tried asking for help in the community on discord or discussions and have not received a response.
  • I have tried searching the documentation and have not found an answer.

What Model are you using?

  • gpt-3.5-turbo
  • gpt-4-turbo
  • gpt-4
  • Other (please specify)

Gemini 2.5 Flash via google-genai SDK (GENAI_STRUCTURED_OUTPUTS mode)

Describe the bug

When a Gemini response is truncated due to hitting the output token limit (finish_reason=MAX_TOKENS), the GENAI_STRUCTURED_OUTPUTS code path does not detect this. Instead of raising IncompleteOutputException (non-retryable), instructor tries to parse the truncated JSON, gets a ValidationError (retryable), and enters the retry loop. Each retry appends the full truncated output to the prompt via reask_genai_structured_outputs, causing exponential prompt growth:

Attempt  Prompt tokens  Output tokens  finish_reason
1        15             1              MAX_TOKENS
2        450            4              MAX_TOKENS
3        1,227          1              MAX_TOKENS

Other providers already check for truncation before parsing — for example OpenAI checks finish_reason == "length" and Anthropic checks stop_reason == "max_tokens", both in function_calls.py. The parse_genai_structured_outputs method is missing this check.
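For illustration, the guards those providers use can be sketched like this (paraphrased from the description above, not copied from instructor's source; IncompleteOutputException is stubbed locally to keep the sketch self-contained):

```python
# Sketch of the pre-parse truncation guards other providers apply.
# Paraphrased from the issue text; the exception class is a local stub.

class IncompleteOutputException(Exception):
    """Non-retryable error for completions cut off at the token limit."""
    def __init__(self, last_completion=None):
        self.last_completion = last_completion
        super().__init__("output truncated at max tokens")

def check_openai_truncation(completion):
    # OpenAI signals truncation with finish_reason == "length"
    if completion.choices[0].finish_reason == "length":
        raise IncompleteOutputException(last_completion=completion)

def check_anthropic_truncation(completion):
    # Anthropic signals truncation with stop_reason == "max_tokens"
    if completion.stop_reason == "max_tokens":
        raise IncompleteOutputException(last_completion=completion)
```

Because these checks run before any JSON parsing, a truncated response surfaces as a non-retryable error instead of a retryable ValidationError.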

In production with the default 65,536 max output token limit, this burns ~590K output tokens and ~920K prompt tokens per failure when the model happens to generate long string content (e.g. repetitive text inside a {"text": "..."} schema).

To Reproduce

import os
import instructor
from instructor.core.exceptions import InstructorRetryException
from google import genai
from pydantic import BaseModel

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
structured_client = instructor.from_genai(
    client, mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS
)

class Response(BaseModel):
    text: str

try:
    result = structured_client.chat.completions.create(
        model="gemini-2.5-flash",
        response_model=Response,
        messages=[
            {"role": "user", "content": "List all prime numbers between 1 and 500."}
        ],
        max_retries=3,
        generation_config={"max_tokens": 5},  # force truncation
    )
except InstructorRetryException as e:
    for attempt in e.failed_attempts:
        resp = attempt.completion
        candidate = resp.candidates[0]
        usage = resp.usage_metadata
        print(f"Attempt {attempt.attempt_number}: "
              f"finish_reason={candidate.finish_reason}, "
              f"prompt_tokens={usage.prompt_token_count}, "
              f"exception={type(attempt.exception).__name__}")

Output:

Attempt 1: finish_reason=FinishReason.MAX_TOKENS, prompt_tokens=15, exception=ValidationError
Attempt 2: finish_reason=FinishReason.MAX_TOKENS, prompt_tokens=450, exception=ValidationError
Attempt 3: finish_reason=FinishReason.MAX_TOKENS, prompt_tokens=1227, exception=ValidationError

Expected behavior

parse_genai_structured_outputs should check finish_reason before parsing and raise IncompleteOutputException when the response was truncated, matching the behavior of all other provider paths. Suggested fix:

# In instructor/processing/function_calls.py, parse_genai_structured_outputs:

@classmethod
def parse_genai_structured_outputs(cls, completion, validation_context=None, strict=None):
    from google.genai import types

    if (
        hasattr(completion, "candidates")
        and completion.candidates
        and completion.candidates[0].finish_reason == types.FinishReason.MAX_TOKENS
    ):
        raise IncompleteOutputException(last_completion=completion)

    return cls.model_validate_json(
        completion.text, context=validation_context, strict=strict
    )
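The guard can be exercised without calling the API by feeding it a stub completion. In this sketch the FinishReason enum is replaced with a plain string and the exception class is stubbed, purely to keep the example self-contained:

```python
from types import SimpleNamespace

# Stand-ins so the sketch runs without google-genai or instructor installed.
MAX_TOKENS = "MAX_TOKENS"  # stand-in for types.FinishReason.MAX_TOKENS

class IncompleteOutputException(Exception):
    def __init__(self, last_completion=None):
        self.last_completion = last_completion
        super().__init__("output truncated at max tokens")

def guard_genai_truncation(completion):
    # Same shape as the suggested fix above, with the enum stubbed out.
    if (
        hasattr(completion, "candidates")
        and completion.candidates
        and completion.candidates[0].finish_reason == MAX_TOKENS
    ):
        raise IncompleteOutputException(last_completion=completion)

# A truncated completion: a valid-JSON prefix cut off mid-string.
truncated = SimpleNamespace(
    candidates=[SimpleNamespace(finish_reason=MAX_TOKENS)],
    text='{"text": "2, 3, 5, 7, 11',
)
```

With such a guard in place, a truncated response raises immediately instead of falling through to JSON parsing and feeding the reask loop.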

Versions

  • instructor==1.14.4
  • google-genai==1.46.0
  • Python 3.12

Labels

  • bug: Something isn't working
  • python: Pull requests that update python code
  • status:needs-investigation: Issue needs investigation to determine scope
