
Commit 7596f54

Authored by bene2k1

feat(ai): added troubleshooting for generative apis (#4230)

* feat(ai): added troubleshooting for generative apis
* fix(ai): fix meta
* Update fixing-common-issues.mdx: add details on 429: Too Many Requests error
* Apply suggestions from code review

Co-authored-by: ldecarvalho-doc <[email protected]>
Co-authored-by: fpagny <[email protected]>

1 parent bf15187 commit 7596f54

File tree

3 files changed: +105 −0 lines changed
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
---
meta:
  title: Fixing common issues with Generative APIs
  description: This page lists common issues that you may encounter while using Scaleway's Generative APIs, their causes, and recommended solutions.
content:
  h1: Fixing common issues with Generative APIs
  paragraph: Generative APIs offer serverless AI models hosted at Scaleway - no need to configure hardware or deploy your own models.
tags: generative-apis ai-data common-issues
dates:
  validation: 2025-01-16
  posted: 2025-01-16
---
Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.
## 429: Too Many Requests - You exceeded your current quota of requests/tokens per minute

### Cause
- You performed too many API requests over a given minute.
- You consumed too many tokens (input and output) with your API requests over a given minute.

### Solution
- [Ask our support](https://console.scaleway.com/support/tickets/create) to raise your quota.
- Smooth out your API request rate by limiting the number of API requests you perform in parallel.
- Reduce the number of input and output tokens processed by your API requests.
- Use [Managed Inference](/ai-data/managed-inference/), where these quotas do not apply (your throughput is only limited by the number of Inference Deployments you provision).
## 504: Gateway Timeout

### Cause
- The query takes too long to process (even if the context length stays [within the supported context window and maximum token limits](https://www.scaleway.com/en/docs/ai-data/generative-apis/reference-content/supported-models/)).
- The model goes into an infinite loop while processing the input (a known structural issue with several AI models).

### Solution
- Set a stricter **maximum token limit** to prevent overly long responses.
- Reduce the size of the input tokens, or split the input into multiple API requests.
- Use [Managed Inference](/ai-data/managed-inference/), where no query timeout is enforced.
## Structured output (e.g., JSON) is not working correctly

### Cause
- Incorrect field naming in the request, such as using `"format"` instead of the correct `"response_format"` field.
- Lack of a JSON schema, which can lead to ambiguity in the output structure.

### Solution
- Ensure the proper field `"response_format"` is used in the query.
- Provide a JSON schema in the request to guide the model's structured output.
- Refer to the [documentation on structured outputs](/ai-data/generative-apis/how-to/use-structured-outputs/) for examples and additional guidance.
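A request body using `response_format` with a JSON schema could look like the sketch below. The nested `json_schema` layout follows the common OpenAI-compatible shape and the model name is illustrative; check both against the structured outputs documentation.

```python
def build_structured_request(messages, schema, model="llama-3.1-8b-instruct"):
    """Attach a JSON schema via `response_format` (OpenAI-compatible shape)."""
    return {
        "model": model,  # illustrative model name
        "messages": messages,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "answer", "schema": schema},
        },
    }

# Example schema constraining the output to a single string field
city_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
```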
## Multiple successive `"role": "user"` messages

### Cause
- Successive messages with `"role": "user"` are sent in the API request instead of alternating between `"role": "user"` and `"role": "assistant"`.

### Solution
- Ensure the `"messages"` array alternates between `"role": "user"` and `"role": "assistant"`.
- If multiple `"role": "user"` messages need to be sent, concatenate them into one `"role": "user"` message or intersperse them with appropriate `"role": "assistant"` responses.

#### Example error message (for Mistral models)
```json
{
  "object": "error",
  "message": "After the optional system message, conversation roles must alternate user/assistant/user/assistant/...",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}
```
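Concatenating successive user messages can be automated with a small helper like this sketch:

```python
def merge_successive_user_messages(messages):
    """Merge runs of consecutive `"role": "user"` messages into one.

    Produces the alternating user/assistant pattern required by models
    (e.g. Mistral) that reject successive user messages.
    """
    merged = []
    for msg in messages:
        if merged and msg["role"] == "user" and merged[-1]["role"] == "user":
            # Fold this message into the previous user message
            merged[-1] = {
                "role": "user",
                "content": merged[-1]["content"] + "\n" + msg["content"],
            }
        else:
            merged.append(dict(msg))
    return merged
```

Running the helper over the `"messages"` array just before sending the request keeps the conversation valid without changing its content.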
## Best practices for optimizing model performance

### Input size management
- Avoid overly long input sequences; break them into smaller chunks if needed.
- Use summarization techniques for large inputs to reduce token count while maintaining relevance.
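Breaking a long input into smaller chunks can be approximated by word count, a rough proxy for tokens; the limit below is an illustrative value, not an API constant.

```python
def chunk_text(text, max_words=500):
    """Split `text` into chunks of at most `max_words` words.

    Word count is only a rough proxy for token count; for precise
    limits, use the tokenizer of the target model.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```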
### Use proper parameter configuration
- Double-check parameters like `"temperature"`, `"max_tokens"`, and `"top_p"` to ensure they align with your use case.
- For structured output, always include a `"response_format"` and, if possible, a detailed JSON schema.

### Debugging silent errors
- For cases where no explicit error is returned:
  - Verify all fields in the API request are correctly named and formatted.
  - Test the request with smaller and simpler inputs to isolate potential issues.
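Verifying field names before sending a request can catch silent errors early. The allowed-field set below is a partial, illustrative list, not the full API schema.

```python
# Partial, illustrative set of top-level request fields; not the full API schema.
KNOWN_FIELDS = {"model", "messages", "max_tokens", "temperature",
                "top_p", "response_format", "stream"}

def check_request_fields(body):
    """Return the set of unrecognized top-level fields in a request body."""
    return set(body) - KNOWN_FIELDS
```

A non-empty result (e.g. a stray `"format"` key) flags a likely typo before the request is sent.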
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
---
2+
meta:
3+
title: Generative APIs - Troubleshooting
4+
description: Generative APIs - Troubleshooting
5+
content:
6+
h1: Generative APIs - Troubleshooting
7+
paragraph: Generative APIs - Troubleshooting
8+
---

menu/navigation.json

Lines changed: 10 additions & 0 deletions
@@ -838,6 +838,16 @@
```diff
       ],
       "label": "Additional Content",
       "slug": "reference-content"
+    },
+    {
+      "items": [
+        {
+          "label": "Fixing common issues",
+          "slug": "fixing-common-issues"
+        }
+      ],
+      "label": "Troubleshooting",
+      "slug": "troubleshooting"
     }
   ],
   "label": "Generative APIs",
```