Commit 16c7412

fpagny, nerda-codes, RoRoJ, bene2k1, and ofranc authored
feat(genapi): update troubleshooting (#5197)
* feat(genapi): update troubleshooting

* docs(review): rowena's review

Co-authored-by: Rowena Jones <[email protected]>

* Apply suggestions from code review

Co-authored-by: Rowena Jones <[email protected]>

* Apply suggestions from code review

Co-authored-by: Océane <[email protected]>

---------

Co-authored-by: Néda <[email protected]>
Co-authored-by: Rowena Jones <[email protected]>
Co-authored-by: Benedikt Rollik <[email protected]>
Co-authored-by: Océane <[email protected]>
1 parent 1b1a7a5 commit 16c7412

File tree: 1 file changed

pages/generative-apis/troubleshooting/fixing-common-issues.mdx

Lines changed: 10 additions & 0 deletions
@@ -111,15 +111,25 @@ Below are common issues that you may encounter when using Generative APIs, their
 - The model goes into an infinite loop while processing the input (which is a known structural issue with several AI models)
 
 ### Solution
+For queries that are too long to process:
 - Set a stricter **maximum token limit** to prevent overly long responses.
 - Reduce the size of the input tokens, or split the input into multiple API requests.
 - Use [Managed Inference](/managed-inference/), where no query timeout is enforced.
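As an illustration of the first point, here is a minimal sketch using the OpenAI-compatible Python client; the base URL, model name, and token limit are assumptions for illustration, not values from this commit:

```python
# Minimal sketch: cap completion length with max_tokens so a single
# response cannot run long enough to hit the gateway timeout.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="SCW_SECRET_KEY",               # placeholder credential
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Summarize the following document: ..."}],
    max_tokens=512,  # stricter limit bounds response length and latency
)
print(response.choices[0].message.content)
```

Splitting an oversized input into several smaller requests follows the same pattern, with one `create` call per chunk.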

+For queries where the model enters an infinite loop (more frequent when using **structured output**):
+- Set `temperature` to the default value recommended for the model. These values can be found in the [Generative APIs Playground](https://console.scaleway.com/generative-api/models/fr-par/playground) when selecting the model. Avoid using temperature `0`, as this can lock the model into repeatedly emitting the same most probable token.
+- Ensure the `top_p` parameter is not set too low (we recommend the default value of `1`).
+- Add a `presence_penalty` value in your request (`0.5` is a good starting value). This option helps the model choose tokens other than the one it is looping on, although it might reduce accuracy on tasks that require repeating similar outputs.
+- Use more recent models, which are usually better optimized to avoid loops, especially when using structured output.
+- Optimize the system prompt to provide clearer and simpler tasks. Currently, JSON output accuracy still relies on heuristics to constrain models to output only valid JSON tokens, and thus depends on the prompts given. As a counter-example, providing contradictory requirements to a model - such as `Never output JSON` in the system prompt and `response_format` set to `json_schema` in the query - may lead to the model never outputting closing JSON brackets `}`.
+
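The first three points map directly onto request parameters. A minimal sketch, assuming the same OpenAI-compatible Python client as above and a model whose recommended default temperature is `0.7` (check the Playground for the actual value):

```python
# Minimal sketch: sampling settings that make token loops less likely.
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="SCW_SECRET_KEY")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Return the result as JSON."}],
    temperature=0.7,       # assumed recommended default; never 0 here
    top_p=1,               # keep at the default; very low values favor loops
    presence_penalty=0.5,  # suggested starting value to break repetition
)
```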
 ## Structured output (e.g., JSON) is not working correctly
 
 ### Description
 - Structured output response contains invalid JSON
 - Structured output response is valid JSON but content is less relevant
+- Structured output response never ends (it loops over characters such as `"`, `\t`, or `\n`). For this issue, see the advice on infinite loops in [504 Gateway Timeout](#504-gateway-timeout).
+
 
 ### Causes
 - Incorrect field naming in the request, such as using `"format"` instead of the correct `"response_format"` field.
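For reference, a correctly named `response_format` field looks like the sketch below; the schema is a toy example, and the client, endpoint, and model name are the same assumptions as in the snippets above:

```python
# Minimal sketch: structured output with the correct field name.
# The API expects `response_format`, not `format`.
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="SCW_SECRET_KEY")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Extract the city from: 'I live in Paris.'"}],
    response_format={                # correct field name (not "format")
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
)
print(response.choices[0].message.content)  # e.g. {"city": "Paris"}
```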
