diff --git a/ai-data/generative-apis/troubleshooting/fixing-common-issues.mdx b/ai-data/generative-apis/troubleshooting/fixing-common-issues.mdx
new file mode 100644
index 0000000000..9db31bacb9
--- /dev/null
+++ b/ai-data/generative-apis/troubleshooting/fixing-common-issues.mdx
@@ -0,0 +1,87 @@
+---
+meta:
+  title: Fixing common issues with Generative APIs
+  description: This page lists common issues that you may encounter while using Scaleway's Generative APIs, their causes, and recommended solutions.
+content:
+  h1: Fixing common issues with Generative APIs
+  paragraph: Generative APIs offer serverless AI models hosted at Scaleway - no need to configure hardware or deploy your own models
+tags: generative-apis ai-data common-issues
+dates:
+  validation: 2025-01-16
+  posted: 2025-01-16
+---
+
+Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.
+
+## 429: Too Many Requests - You exceeded your current quota of requests/tokens per minute
+
+### Cause
+- You performed too many API requests within a given minute.
+- You consumed too many tokens (input and output) with your API requests within a given minute.
+
+### Solution
+- [Ask our support](https://console.scaleway.com/support/tickets/create) to raise your quota.
+- Smooth out your API request rate by limiting the number of API requests you perform in parallel.
+- Reduce the number of input or output tokens processed by your API requests.
+- Use [Managed Inference](/ai-data/managed-inference/), where these quotas do not apply (your throughput is limited only by the number of Inference Deployments you provision).
+
+
+## 504: Gateway Timeout
+
+### Cause
+- The query takes too long to process (even if the context length stays [within the supported context window and maximum tokens](https://www.scaleway.com/en/docs/ai-data/generative-apis/reference-content/supported-models/)).
+- The model goes into an infinite loop while processing the input (which is a known structural issue with several AI models).
+
+### Solution
+- Set a stricter **maximum token limit** to prevent overly long responses.
+- Reduce the size of the input tokens, or split the input into multiple API requests.
+- Use [Managed Inference](/ai-data/managed-inference/), where no query timeout is enforced.
+
+## Structured output (e.g., JSON) is not working correctly
+
+### Cause
+- Incorrect field naming in the request, such as using `"format"` instead of the correct `"response_format"` field.
+- Lack of a JSON schema, which can lead to ambiguity in the output structure.
+
+### Solution
+- Ensure the proper field `"response_format"` is used in the query.
+- Provide a JSON schema in the request to guide the model's structured output.
+- Refer to the [documentation on structured outputs](/ai-data/generative-apis/how-to/use-structured-outputs/) for examples and additional guidance.
+
+
+## Multiple "role": "user" successive messages
+
+### Cause
+- Successive messages with `"role": "user"` are sent in the API request instead of alternating between `"role": "user"` and `"role": "assistant"`.
+
+### Solution
+- Ensure the `"messages"` array alternates between `"role": "user"` and `"role": "assistant"`.
+- If multiple `"role": "user"` messages need to be sent, concatenate them into one `"role": "user"` message or intersperse them with appropriate `"role": "assistant"` responses.
+
+#### Example error message (for Mistral models)
+```json
+{
+  "object": "error",
+  "message": "After the optional system message, conversation roles must alternate user/assistant/user/assistant/...",
+  "type": "BadRequestError",
+  "param": null,
+  "code": 400
+}
+```
+
+## Best practices for optimizing model performance
+
+### Input size management
+- Avoid overly long input sequences; break them into smaller chunks if needed.
+- Use summarization techniques for large inputs to reduce token count while maintaining relevance.
+
+### Use proper parameter configuration
+- Double-check parameters like `"temperature"`, `"max_tokens"`, and `"top_p"` to ensure they align with your use case.
+- For structured output, always include a `"response_format"` and, if possible, a detailed JSON schema.
+
+### Debugging silent errors
+- For cases where no explicit error is returned:
+  - Verify all fields in the API request are correctly named and formatted.
+  - Test the request with smaller and simpler inputs to isolate potential issues.
+
+
diff --git a/ai-data/generative-apis/troubleshooting/index.mdx b/ai-data/generative-apis/troubleshooting/index.mdx
new file mode 100644
index 0000000000..6b9669bbeb
--- /dev/null
+++ b/ai-data/generative-apis/troubleshooting/index.mdx
@@ -0,0 +1,8 @@
+---
+meta:
+  title: Generative APIs - Troubleshooting
+  description: Generative APIs - Troubleshooting
+content:
+  h1: Generative APIs - Troubleshooting
+  paragraph: Generative APIs - Troubleshooting
+---
\ No newline at end of file
diff --git a/menu/navigation.json b/menu/navigation.json
index 13acef88fe..bbaad53a05 100644
--- a/menu/navigation.json
+++ b/menu/navigation.json
@@ -838,6 +838,16 @@
         ],
         "label": "Additional Content",
         "slug": "reference-content"
+      },
+      {
+        "items": [
+          {
+            "label": "Fixing common issues",
+            "slug": "fixing-common-issues"
+          }
+        ],
+        "label": "Troubleshooting",
+        "slug": "troubleshooting"
       }
     ],
     "label": "Generative APIs",
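
Note on the `Multiple "role": "user" successive messages` section of the new page: the advice to concatenate successive user messages could be illustrated with a short client-side helper. The following is only a minimal sketch, not part of the patch. It assumes messages are plain dicts with string `"role"` and `"content"` values, and the function name `merge_consecutive_user_messages` and the `separator` parameter are hypothetical names introduced here for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def merge_consecutive_user_messages(
    messages: List[Message], separator: str = "\n\n"
) -> List[Message]:
    """Join each run of successive "user" messages into a single message.

    Hypothetical helper for illustration only. Assumes every message is a
    dict with string "role" and "content" values.
    """
    merged: List[Message] = []
    for message in messages:
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous["role"] == "user"
            and message["role"] == "user"
        ):
            # Append to the previous user turn instead of adding a new one.
            previous["content"] = previous["content"] + separator + message["content"]
        else:
            # Copy the dict so the caller's messages are never mutated.
            merged.append(dict(message))
    return merged


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Here is my document."},
    {"role": "user", "content": "Please summarize it."},
    {"role": "assistant", "content": "Sure, here is a summary."},
    {"role": "user", "content": "Thanks, now translate it."},
]
merged_messages = merge_consecutive_user_messages(messages)
```

Running the helper on the `"messages"` array before sending the request should leave the roles alternating after the optional system message, provided the input has no other ordering problems. It only merges user turns, so it does not insert missing `"role": "assistant"` responses.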