
Commit 7596f54

Authored by bene2k1

feat(ai): added troubleshooting for generative apis (#4230)

* feat(ai): added troubleshooting for generative apis
* fix(ai): fix meta
* Update fixing-common-issues.mdx: add details on 429: Too Many Requests error
* Apply suggestions from code review

Co-authored-by: ldecarvalho-doc <[email protected]>
Co-authored-by: fpagny <[email protected]>

1 parent bf15187 commit 7596f54

File tree

3 files changed: +105 −0 lines changed
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
---
meta:
  title: Fixing common issues with Generative APIs
  description: This page lists common issues that you may encounter while using Scaleway's Generative APIs, their causes, and recommended solutions.
content:
  h1: Fixing common issues with Generative APIs
  paragraph: Generative APIs offer serverless AI models hosted at Scaleway - no need to configure hardware or deploy your own models.
tags: generative-apis ai-data common-issues
dates:
  validation: 2025-01-16
  posted: 2025-01-16
---
Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.
## 429: Too Many Requests - You exceeded your current quota of requests/tokens per minute

### Cause
- You performed too many API requests over a given minute.
- You consumed too many tokens (input and output) with your API requests over a given minute.

### Solution
- [Ask our support](https://console.scaleway.com/support/tickets/create) to raise your quota.
- Smooth out your API request rate by limiting the number of API requests you perform in parallel.
- Reduce the number of input and output tokens processed by your API requests.
- Use [Managed Inference](/ai-data/managed-inference/), where these quotas do not apply (your throughput is only limited by the number of Inference Deployments you provision).
## 504: Gateway Timeout

### Cause
- The query takes too long to process (even if the context length stays [within the supported context window and maximum token limits](https://www.scaleway.com/en/docs/ai-data/generative-apis/reference-content/supported-models/)).
- The model goes into an infinite loop while processing the input (a known structural issue with several AI models).

### Solution
- Set a stricter **maximum token limit** to prevent overly long responses.
- Reduce the size of the input tokens, or split the input into multiple API requests.
- Use [Managed Inference](/ai-data/managed-inference/), where no query timeout is enforced.
## Structured output (e.g., JSON) is not working correctly

### Cause
- Incorrect field naming in the request, such as using `"format"` instead of the correct `"response_format"` field.
- Lack of a JSON schema, which can lead to ambiguity in the output structure.

### Solution
- Ensure the proper field `"response_format"` is used in the query.
- Provide a JSON schema in the request to guide the model's structured output.
- Refer to the [documentation on structured outputs](/ai-data/generative-apis/how-to/use-structured-outputs/) for examples and additional guidance.
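A request body using `response_format` with a JSON schema could look like the sketch below. The nested `json_schema` layout follows the common OpenAI-compatible shape and the model name is illustrative; check both against the structured outputs documentation.

```python
def build_structured_request(messages, schema, model="llama-3.1-8b-instruct"):
    """Attach a JSON schema via `response_format` (OpenAI-compatible shape)."""
    return {
        "model": model,  # illustrative model name
        "messages": messages,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "answer", "schema": schema},
        },
    }

# Example schema constraining the output to a single string field
city_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
```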
## Multiple successive `"role": "user"` messages

### Cause
- Successive messages with `"role": "user"` are sent in the API request instead of alternating between `"role": "user"` and `"role": "assistant"`.

### Solution
- Ensure the `"messages"` array alternates between `"role": "user"` and `"role": "assistant"`.
- If multiple `"role": "user"` messages need to be sent, concatenate them into one `"role": "user"` message or intersperse them with appropriate `"role": "assistant"` responses.

#### Example error message (for Mistral models)
```json
{
  "object": "error",
  "message": "After the optional system message, conversation roles must alternate user/assistant/user/assistant/...",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}
```
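Concatenating successive user messages can be automated with a small helper like this sketch:

```python
def merge_successive_user_messages(messages):
    """Merge runs of consecutive `"role": "user"` messages into one.

    Produces the alternating user/assistant pattern required by models
    (e.g. Mistral) that reject successive user messages.
    """
    merged = []
    for msg in messages:
        if merged and msg["role"] == "user" and merged[-1]["role"] == "user":
            # Fold this message into the previous user message
            merged[-1] = {
                "role": "user",
                "content": merged[-1]["content"] + "\n" + msg["content"],
            }
        else:
            merged.append(dict(msg))
    return merged
```

Running the helper over the `"messages"` array just before sending the request keeps the conversation valid without changing its content.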
## Best practices for optimizing model performance

### Input size management
- Avoid overly long input sequences; break them into smaller chunks if needed.
- Use summarization techniques for large inputs to reduce token count while maintaining relevance.
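Breaking a long input into smaller chunks can be approximated by word count, a rough proxy for tokens; the limit below is an illustrative value, not an API constant.

```python
def chunk_text(text, max_words=500):
    """Split `text` into chunks of at most `max_words` words.

    Word count is only a rough proxy for token count; for precise
    limits, use the tokenizer of the target model.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```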
### Use proper parameter configuration
- Double-check parameters like `"temperature"`, `"max_tokens"`, and `"top_p"` to ensure they align with your use case.
- For structured output, always include a `"response_format"` and, if possible, a detailed JSON schema.

### Debugging silent errors
- For cases where no explicit error is returned:
  - Verify all fields in the API request are correctly named and formatted.
  - Test the request with smaller and simpler inputs to isolate potential issues.
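Verifying field names before sending a request can catch silent errors early. The allowed-field set below is a partial, illustrative list, not the full API schema.

```python
# Partial, illustrative set of top-level request fields; not the full API schema.
KNOWN_FIELDS = {"model", "messages", "max_tokens", "temperature",
                "top_p", "response_format", "stream"}

def check_request_fields(body):
    """Return the set of unrecognized top-level fields in a request body."""
    return set(body) - KNOWN_FIELDS
```

A non-empty result (e.g. a stray `"format"` key) flags a likely typo before the request is sent.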
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
---
2+
meta:
3+
title: Generative APIs - Troubleshooting
4+
description: Generative APIs - Troubleshooting
5+
content:
6+
h1: Generative APIs - Troubleshooting
7+
paragraph: Generative APIs - Troubleshooting
8+
---

menu/navigation.json

Lines changed: 10 additions & 0 deletions
@@ -838,6 +838,16 @@
```diff
       ],
       "label": "Additional Content",
       "slug": "reference-content"
+    },
+    {
+      "items": [
+        {
+          "label": "Fixing common issues",
+          "slug": "fixing-common-issues"
+        }
+      ],
+      "label": "Troubleshooting",
+      "slug": "troubleshooting"
     }
   ],
   "label": "Generative APIs",
```