Retry Causes Concatenation #2210
Description
DISCLAIMER: I personally validated that the unexpected and unwanted behavior exists, but I used an LLM for the root-cause analysis below, which I have not yet validated.
- This is actually a bug report.
- I am not getting good LLM Results
- I have tried asking for help in the community on discord or discussions and have not received a response.
- I have tried searching the documentation and have not found an answer.
What Model are you using?
- gpt-3.5-turbo
- gpt-4-turbo
- gpt-4
- Other (please specify)
Gemini 2.5 Flash via google-genai SDK (GENAI_STRUCTURED_OUTPUTS mode)
Describe the bug
When a Gemini response is truncated due to hitting the output token limit (`finish_reason=MAX_TOKENS`), the `GENAI_STRUCTURED_OUTPUTS` code path does not detect this. Instead of raising `IncompleteOutputException` (non-retryable), instructor tries to parse the truncated JSON, gets a `ValidationError` (retryable), and enters the retry loop. Each retry appends the full truncated output to the prompt via `reask_genai_structured_outputs`, causing exponential prompt growth:
| Attempt | Prompt tokens | Output tokens | finish_reason |
|---|---|---|---|
| 1 | 15 | 1 | MAX_TOKENS |
| 2 | 450 | 4 | MAX_TOKENS |
| 3 | 1,227 | 1 | MAX_TOKENS |
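The growth can be sketched with a toy simulation. This is illustrative only, not instructor's actual reask logic, and the token counts are hypothetical; it uses a deliberately simplified linear model, while the growth observed above is even steeper, possibly because the reask text itself quotes the growing payload:

```python
# Toy model of the failure mode (not instructor code): each retry appends
# the previous truncated output plus reask framing to the conversation, so
# the prompt re-sent to the model keeps growing instead of staying constant.
def simulate_prompts(base_prompt: int, reask_overhead: int, attempts: int) -> list[int]:
    prompt = base_prompt
    history = []
    for _ in range(attempts):
        history.append(prompt)
        # the next attempt re-sends everything so far plus the reask payload
        prompt = prompt + reask_overhead
    return history

# Hypothetical numbers: 15-token prompt, ~400 extra tokens appended per reask.
print(simulate_prompts(15, 400, 3))  # → [15, 415, 815]
```

With a non-retryable exception raised on the first truncation, the list would stop at one entry instead of climbing on every attempt.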
Other providers already check for truncation before parsing: OpenAI checks `finish_reason == "length"` and Anthropic checks `stop_reason == "max_tokens"`, both in `function_calls.py`. The `parse_genai_structured_outputs` method is missing this check.
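The guard the other provider paths apply can be sketched in isolation (the names and the exception class here are illustrative stand-ins, not instructor's actual API):

```python
# Sketch of the pre-parse truncation guard (hypothetical names).
class IncompleteOutputException(Exception):
    """Output hit the token limit; retrying with the same limit cannot succeed."""

# Per-provider markers for "output was cut off at the token limit".
TRUNCATION_MARKERS = {
    "openai": "length",         # choice.finish_reason
    "anthropic": "max_tokens",  # message.stop_reason
    "genai": "MAX_TOKENS",      # candidate.finish_reason
}

def guard_truncation(provider: str, finish_reason: str) -> None:
    # Raise before JSON parsing, so a truncated payload is never mistaken
    # for a retryable ValidationError.
    if finish_reason == TRUNCATION_MARKERS[provider]:
        raise IncompleteOutputException(f"{provider}: output truncated")

guard_truncation("openai", "stop")  # normal completion: no exception
try:
    guard_truncation("genai", "MAX_TOKENS")
except IncompleteOutputException as e:
    print(e)  # → genai: output truncated
```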
In production with the default 65,536 max-output-token limit, this burns ~590K output tokens and ~920K prompt tokens per failure when the model happens to generate long string content (e.g. repetitive text inside a `{"text": "..."}` schema).
To Reproduce
```python
import os

import instructor
from instructor.core.exceptions import InstructorRetryException
from google import genai
from pydantic import BaseModel

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
structured_client = instructor.from_genai(
    client, mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS
)


class Response(BaseModel):
    text: str


try:
    result = structured_client.chat.completions.create(
        model="gemini-2.5-flash",
        response_model=Response,
        messages=[
            {"role": "user", "content": "List all prime numbers between 1 and 500."}
        ],
        max_retries=3,
        generation_config={"max_tokens": 5},  # force truncation
    )
except InstructorRetryException as e:
    for attempt in e.failed_attempts:
        resp = attempt.completion
        candidate = resp.candidates[0]
        usage = resp.usage_metadata
        print(
            f"Attempt {attempt.attempt_number}: "
            f"finish_reason={candidate.finish_reason}, "
            f"prompt_tokens={usage.prompt_token_count}, "
            f"exception={type(attempt.exception).__name__}"
        )
```

Output:

```
Attempt 1: finish_reason=FinishReason.MAX_TOKENS, prompt_tokens=15, exception=ValidationError
Attempt 2: finish_reason=FinishReason.MAX_TOKENS, prompt_tokens=450, exception=ValidationError
Attempt 3: finish_reason=FinishReason.MAX_TOKENS, prompt_tokens=1227, exception=ValidationError
```
Expected behavior
`parse_genai_structured_outputs` should check `finish_reason` before parsing and raise `IncompleteOutputException` when the response was truncated, matching the behavior of all other provider paths. Suggested fix:
```python
# In instructor/processing/function_calls.py, parse_genai_structured_outputs:
@classmethod
def parse_genai_structured_outputs(cls, completion, validation_context=None, strict=None):
    from google.genai import types

    if (
        hasattr(completion, "candidates")
        and completion.candidates
        and completion.candidates[0].finish_reason == types.FinishReason.MAX_TOKENS
    ):
        raise IncompleteOutputException(last_completion=completion)

    return cls.model_validate_json(
        completion.text, context=validation_context, strict=strict
    )
```

Versions

- instructor==1.14.4
- google-genai==1.46.0
- Python 3.12