14 changes: 6 additions & 8 deletions pages/generative-apis/troubleshooting/fixing-common-issues.mdx
Below are common issues that you may encounter when using Generative APIs, their causes, and solutions.
- You consumed too many tokens (input and output) with your API requests over a given minute

### Solution
- Smooth out your API request rate by limiting the number of API requests you perform over a given minute so that you remain below your [Organization quotas for Generative APIs](/organizations-and-projects/additional-content/organization-quotas/#generative-apis).
- [Add a payment method](/billing/how-to/add-payment-method/#how-to-add-a-credit-card) and [validate your identity](/account/how-to/verify-identity/) to automatically increase your quotas [based on standard limits](/organizations-and-projects/additional-content/organization-quotas/#generative-apis).
- [Ask our support](https://console.scaleway.com/support/tickets/create) to raise your quota.
- Reduce the number of input and output tokens processed by your API requests.
- Use [Managed Inference](/managed-inference/), where these quotas do not apply (your throughput is only limited by the number of Inference Deployments you provision).
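The first solution above (smoothing your request rate) can be implemented client-side with a small sliding-window limiter. This is an illustrative sketch, not an official Scaleway client; the `max_per_minute` value is an assumption that you should replace with your Organization's actual quota.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Block before each request so that no more than `max_per_minute`
    calls happen in any sliding 60-second window (client-side smoothing)."""

    def __init__(self, max_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.max_per_minute = max_per_minute
        self._clock = clock     # injectable for testing
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        now = self._clock()
        # Forget requests that have left the 60 s window.
        while self._stamps and now - self._stamps[0] >= 60:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_per_minute:
            # Sleep until the oldest request ages out of the window.
            self._sleep(60 - (now - self._stamps[0]))
            now = self._clock()
            while self._stamps and now - self._stamps[0] >= 60:
                self._stamps.popleft()
        self._stamps.append(now)


# Usage: call limiter.acquire() before each API request.
limiter = MinuteRateLimiter(max_per_minute=300)  # assumed quota; check yours
```

Calling `acquire()` before every request keeps you under the per-minute quota without tracking server-side state.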
### Solution
- Ensure the proper field `"response_format"` is used in the query.
- Provide a JSON schema in the request to guide the model's structured output.
- Ensure the `max_tokens` value is higher than the response `completion_tokens` value. Otherwise, the model answer may be cut off before it can finish the proper JSON structure (and lack closing JSON brackets `}`, for example). Note that if the `max_tokens` value is not set in the query, [default values apply for each model](/generative-apis/reference-content/supported-models/).
- Ensure the `temperature` value is set in a lower range for the model. Otherwise, the model answer may output invalid JSON characters. Note that if the `temperature` value is not set in the query, [default values apply for each model](/generative-apis/reference-content/supported-models/). For example:
- for `llama-3.3-70b-instruct`, `temperature` should be set lower than `0.6`
- for `mistral-nemo-instruct-2407`, `temperature` should be set lower than `0.3`
- Refer to the [documentation on structured outputs](/generative-apis/how-to/use-structured-outputs/) for examples and additional guidance.
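As a sketch, the checklist above maps onto an OpenAI-compatible request body like the following. The endpoint shape and model name come from this page; the schema, field values, and the `SCW_SECRET_KEY` environment variable are illustrative assumptions — refer to the structured outputs documentation for the exact `response_format` envelope supported.

```python
import json
import os
import urllib.request


def build_structured_request(project_id: str) -> tuple[str, dict]:
    """Assemble an OpenAI-compatible chat completion payload that asks
    for schema-constrained JSON output. Values are illustrative."""
    url = f"https://api.scaleway.ai/{project_id}/v1/chat/completions"
    payload = {
        "model": "llama-3.3-70b-instruct",
        "messages": [
            {"role": "user", "content": "Extract the city from: 'I live in Paris.'"}
        ],
        # Provide a JSON schema to guide the structured output.
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "city_extraction",
                "schema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        },
        # Keep max_tokens above the expected completion length so the
        # JSON is not cut off before its closing bracket.
        "max_tokens": 512,
        # Below the 0.6 guideline for llama-3.3-70b-instruct.
        "temperature": 0.2,
    }
    return url, payload


def send(project_id: str) -> dict:
    """Send the request (requires a valid secret key in SCW_SECRET_KEY)."""
    url, payload = build_structured_request(project_id)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['SCW_SECRET_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction from sending makes it easy to inspect the request body when debugging malformed-output issues.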

### Causes
- Cockpit is isolated by `project_id` and only displays token consumption related to one Project.
- Cockpit `Tokens Processed` graphs over time can take up to an hour to update (to provide more accurate average consumption over time). The overall `Tokens Processed` counter is updated in real time.

### Solution
- Ensure you are connecting to the Cockpit corresponding to your Project. Cockpits are currently isolated by `project_id`, which you can see in their URL: `https://PROJECT_ID.dashboard.obs.fr-par.scw.cloud/`. This Project should correspond to the one used in the URL you used to perform Generative APIs requests, such as `https://api.scaleway.ai/{PROJECT_ID}/v1/chat/completions`. You can list your projects and their IDs in your [Organization dashboard](https://console.scaleway.com/organization/projects).
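To double-check that the Cockpit and the API endpoint refer to the same Project, you can compare the Project ID embedded in each URL. A minimal helper (illustrative, standard library only), based on the URL shapes shown above:

```python
from urllib.parse import urlparse


def project_id_from_api_url(api_url: str) -> str:
    """First path segment of https://api.scaleway.ai/<PROJECT_ID>/v1/..."""
    return urlparse(api_url).path.strip("/").split("/")[0]


def project_id_from_cockpit_url(cockpit_url: str) -> str:
    """Leftmost subdomain of https://<PROJECT_ID>.dashboard.obs.fr-par.scw.cloud/"""
    return urlparse(cockpit_url).hostname.split(".")[0]


def same_project(api_url: str, cockpit_url: str) -> bool:
    """True when both URLs point at the same Project."""
    return project_id_from_api_url(api_url) == project_id_from_cockpit_url(cockpit_url)
```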
### Debugging silent errors
- For cases where no explicit error is returned:
- Verify all fields in the API request are correctly named and formatted.
- Test the request with smaller and simpler inputs to isolate potential issues.
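The first verification step can be automated by linting the request body locally before sending anything. The field lists below are illustrative assumptions, not the API's exhaustive schema; extend them to match the fields you actually use.

```python
REQUIRED_FIELDS = {"model", "messages"}
# Illustrative subset of accepted fields; extend for your use case.
KNOWN_FIELDS = REQUIRED_FIELDS | {
    "max_tokens", "temperature", "top_p", "stream", "stop", "response_format",
}


def lint_payload(payload: dict) -> list[str]:
    """Return likely problems (missing or misnamed fields) so silent
    errors can be caught before the request is sent."""
    problems = [
        f"missing required field: {name}"
        for name in sorted(REQUIRED_FIELDS - payload.keys())
    ]
    problems += [
        f"unknown field (typo?): {name}"
        for name in sorted(payload.keys() - KNOWN_FIELDS)
    ]
    return problems
```

An empty result means the payload at least uses recognized field names; a non-empty result points at the most common cause of silently ignored options.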